Martin Rubli

Building a Webcam Infrastructure for GNU/Linux
Master Thesis
EPFL, Switzerland, 2006

Prof. Matthias Grossglauser, Laboratory for Computer Communications and Applications, EPFL

Richard Nicolet, Logitech
Remy Zimmermann, Logitech

© 2006 Martin Rubli
School of Computer and Communication Sciences, Swiss Federal Institute of Technology, Lausanne, Switzerland
Logitech, Fremont, California

Revision a. All trademarks used are the property of their respective owners.
This document was set in Meridien LT and Frutiger using the LaTeX typesetting system on Debian GNU/Linux.

Abstract

In this thesis we analyze the current state of webcam support on the GNU/Linux platform. Based on the results gained from that analysis we develop a framework of new software components and improve the current platform with the goal of enhancing the user experience of webcam owners. Along the way we gain a close insight into the components involved in streaming video from a webcam and into what today's hardware is capable of doing.

Contents

1 Introduction

2 Current state of webcam hardware
  2.1 Introduction
  2.2 Terminology
  2.3 Logitech webcams
    2.3.1 History
    2.3.2 Cameras using proprietary protocols
    2.3.3 USB Video Class cameras
  2.4 USB Video Class
    2.4.1 Introduction
    2.4.2 Device descriptor
    2.4.3 Device topology
    2.4.4 Controls
    2.4.5 Payload formats
    2.4.6 Transfer modes
  2.5 Non-Logitech cameras

3 An introduction to Linux multimedia
  3.1 Introduction
  3.2 Linux kernel multimedia support
    3.2.1 A brief history of Video4Linux
    3.2.2 Linux audio support
  3.3 Linux user mode multimedia support
    3.3.1 GStreamer
    3.3.2 NMM
  3.4 Current discussion

4 Current state of Linux webcam support
  4.1 Introduction
    4.1.1 Webcams and audio
  4.2 V4L2: Video for Linux Two
    4.2.1 Overview
    4.2.2 The API
    4.2.3 Summary
  4.3 Drivers
    4.3.1 The Philips USB Webcam driver
    4.3.2 The Spca5xx Webcam driver
    4.3.3 The QuickCam Messenger & Communicate driver
    4.3.4 The QuickCam Express driver
    4.3.5 The Linux USB Video Class driver
  4.4 Applications
    4.4.1 V4L2 applications
    4.4.2 V4L applications
    4.4.3 GStreamer applications
  4.5 Problems and design issues
    4.5.1 Kernel mode vs. user mode
    4.5.2 Video4Linux
    4.5.3 V4L2 related problems

5 Designing the webcam infrastructure
  5.1 Introduction
  5.2 Goals
  5.3 Architecture overview
  5.4 Components
    5.4.1 Overview
    5.4.2 UVC driver
    5.4.3 V4L2
    5.4.4 GStreamer
    5.4.5 v4l2src
    5.4.6 lvfilter
    5.4.7 LVGstCap (part 1 of 3: video streaming)
    5.4.8 libwebcam
    5.4.9 libwebcampanel
    5.4.10 LVGstCap (part 2 of 3: camera controls)
    5.4.11 liblumvp
    5.4.12 LVGstCap (part 3 of 3: feature controls)
    5.4.13 lvcmdpanel
  5.5 Flashback: current problems

6 Enhancing existing components
  6.1 Linux UVC driver
    6.1.1 Multiple open
    6.1.2 UVC extension support
    6.1.3 V4L2 controls in sysfs
    6.1.4 Bits and pieces
  6.2 The Video4Linux user mode library

7 New components
  7.1 Libraries
    7.1.1 libwebcam (enumeration functions, thread-safety)
    7.1.2 liblumvp and lvfilter
    7.1.3 libwebcampanel (meta information, feature controls)
  7.2 Applications
    7.2.1 LVGstCap
    7.2.2 lvcmdpanel
  7.4 Build system
  7.5 Limitations
  7.6 Outlook
  7.7 Licensing
    7.7.1 UVC driver
    7.7.2 Linux webcam framework
  7.8 Distribution

8 The new webcam infrastructure at work
  8.1 LVGstCap
  8.2 lvcmdpanel

9 Conclusion

A List of Logitech webcam USB PIDs

Chapter 1 Introduction

Getting a webcam to work on Linux is a challenge on different levels. Making the system recognize the device properly sets the bar to a level that many users feel unable to cross, often for mostly unsubstantiated fear of compiling kernel drivers. Even once that first hurdle is cleared, the adventure has only just started. A webcam is perfectly useless without good software that takes advantage of its features, so where do users go from here?

Since the first webcams appeared on the market, they have evolved from simple devices that captured relatively poor quality videos the size of a postage stamp to high-tech devices that allow screen-filling videos to be recorded, all while applying complex real-time video processing in hardware and software.

Traditionally, Linux has been used for server installations and only in recent years has it started to conquer the desktop. This fact still shows in the form of two important differences when one compares webcam support on Linux and Windows. For one, Linux applications have primarily focused on retrieving still images from the cameras, oftentimes for "live" cameras on the Internet that update a static picture every few seconds. These programs often work in a headless environment, i.e. one that does not require a graphical user interface and a physical screen. For another, webcam manufacturers have provided little support for the Linux platform, most of which was in the form of giving technical information to the open source community without taking the opportunity to actively participate and influence the direction that webcam software takes.

This project is an attempt by Logitech to change this in order to provide Linux users with an improved webcam experience that eventually converges towards the one that Windows users enjoy today. Obviously, the timeline of such an undertaking is in the order of years due to the sheer amount of components and people involved, and the challenges are not only of a technical nature but also in terms of establishing discussions between the parties involved. Luckily, the scope of a Master thesis is enough to lay the foundations that are required.

In the course of this project, apart from presenting the newly developed framework, we will look at many of the components that already exist today, highlighting their strengths but also their weaknesses. It was this extensive analysis that eventually led to the design of the proposed framework in an attempt to learn from previous mistakes and raise awareness of current limitations. The foundations we laid with the Linux webcam framework make it easier for developers to base their products on a common core, which reduces development time, increases stability, and makes applications easier to maintain. The latter is especially important for a platform that has to keep up with powerful and agile competitors. All of these are key to establishing a successful multimedia platform and delivering users the experience they expect from an operating system that has officially set out to conquer the desktop.

I would like to thank first of all my supervisors at Logitech, Richard Nicolet and Remy Zimmermann, for their advice and the expertise they shared with me, but also the rest of the video driver and firmware team for their big help with various questions that kept coming up. Thanks also to Matthias Grossglauser, my supervisor at EPFL, for his guidance. A big thank you to the people in the open source community I got to work with or ask questions to. In particular this goes to Laurent Pinchart, the author of the Linux UVC driver, first of all for having written the driver, and second of all for the constructive collaboration in extending it, thereby letting me concentrate on the higher-level components. Last but not least, thanks to everybody who helped make this project happen in one way or another but whose name did not make it into this section.

Fremont, USA, September 2006

Therefore.x case.1 Introduction The goal of this chapter is to give an overview of the webcams that are currently on the market. 2. or simply UVC. low-speed is irrelevant. the majority of the chapter is dedicated to UVC cameras as devices using proprietary protocols are slowly phased out by the manufacturers. There also exists a mode called low-speed that was designed for very low bandwidth devices like keyboards or mice. We will nevertheless mention the most important past generations of webcams because some of them remain in broad use and it will be interesting to see how they differ in functionality. We will sometimes use these acronyms for readability’s 3 . We will also give an overview of the USB Video Class.Chapter 2 Current state of webcam hardware 2.2 Terminology There are a few terms that will keep coming up in the rest of the report.0 operation and full-speed for the USB 1. specification. For webcams. The Linux webcam framework was designed primarily with UVC devices in mind and the main goal of this chapter is to present the hardware requirements of the framework. We will first focus on Logitech devices and devote a small section to cameras of other vendors later on. Image resolutions There is a number of standard resolutions that have corresponding acronyms. which is the designated standard for all future USB camera devices. USB modes In the context of USB we will often use the terms high-speed to denote USB 2. Let us quickly go over some of them to avoid any terminology related confusion.

Table 2.1 has a list of the most common ones.¹

Width [px]   Height [px]   Acronym
160          120           QSIF
176          144           QCIF
320          240           QVGA (also SIF)
352          288           CIF
640          480           VGA
1024         768           XGA
1280         960           SXGA (4:3)
1280         1024          SXGA (5:4)

Table 2.1: List of standard resolutions and commonly used acronyms.

¹ For some of the acronyms there exist different resolutions depending on the analog video standard they were derived from. For example, 352x288 is the PAL version of CIF whereas NTSC CIF is 352x240.

2.3 Logitech webcams

2.3.1 History

In the last years the market has seen a myriad of different webcam models and technologies. The first webcams were devices for the parallel port, allowing very limited bandwidth and a user experience that was far from the plug-and-play that users take for granted nowadays. With the advent of the Universal Serial Bus, webcams finally became comfortable and simple enough to use for the average PC user. Driver installation became simple and multiple devices could share the bus. Using a printer and a webcam at the same time was no longer a problem.

One of the limitations of USB, however, was a bandwidth that was still relatively low; image resolutions above 320x240 pixels required compression algorithms that could send VGA images over the bus at tolerable frame rates. Higher resolution video at 25 or more frames per second only became possible when USB 2.0 was introduced. A maximum theoretical transfer rate of 480 Mb/s provides enough reserves for the next generations of webcams with multi-megapixel sensors. All recent Logitech cameras take advantage of USB 2.0, although they still work on USB 1.x controllers, albeit with a limited resolution set.
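A back-of-the-envelope calculation with illustrative numbers shows why compression was unavoidable on the original bus: an uncompressed VGA frame in a typical 16 bits-per-pixel YUV format occupies 640 × 480 × 2 = 614,400 bytes, so even a modest 15 frames per second amounts to roughly 74 Mb/s, several times the 12 Mb/s that a full-speed USB 1.x bus offers in total. The same stream at 30 frames per second, about 147 Mb/s, fits comfortably into the 480 Mb/s of high-speed USB, which is why uncompressed VGA video only became practical with USB 2.0.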

2.3.2 Cameras using proprietary protocols

From a driver point of view Logitech cameras are best distinguished by the ASIC² they are based on. While the sensors are also an important component that the driver has to know about, in the case of UVC cameras such knowledge becomes less important because the firmware hides sensor-specific commands from the USB interface. Even the ASIC is completely abstracted by the protocol and–in the optimal case–every UVC camera works with any UVC driver, at least as far as the functionality covered by the standard is concerned. We will see in chapter 4 that this categorization is useful when it comes to selecting a driver. The following list shows a number of Logitech's non-UVC cameras and is therefore grouped by the ASIC family they use.

Vimicro 30x based

Cameras with the Vimicro 301 or 302 chips are USB 1.1 devices, in the case of the 302 with built-in audio support. They support a maximum resolution of VGA at 15 frames per second. Apart from uncompressed YUV data, they can also deliver uncompressed 8 or 9-bit RGB Bayer data or, with the help of an integrated encoder chip, JPEG frames.

• Logitech QuickCam IM
• Logitech QuickCam Connect
• Logitech QuickCam Chat
• Logitech QuickCam Messenger
• Logitech QuickCam for Notebooks
• Logitech QuickCam for Notebooks Deluxe
• Logitech QuickCam Communicate STX
• Labtec Webcam Plus
• Labtec Notebook Pro

Philips SAA8116 based

The Philips SAA8116 is also a USB 1.1 chipset that supports VGA at a maximum of 15 fps. It has built-in microphone support and delivers image data in 8, 9, or 10-bit RGB Bayer format. It can also use a proprietary YUV compression format that we will encounter again in section 4.3.1 where we talk about the Linux driver for cameras based on this chip.

• Logitech QuickCam Zoom
• Logitech QuickCam Pro 3000
• Logitech QuickCam Pro 4000
• Logitech QuickCam Orbit/Sphere³
• Logitech QuickCam Pro for Notebooks
• Logitech ViewPort AV100
• Cisco VT Camera

² The application-specific integrated circuit in a webcam is the processor designed to process the image data and communicate them to the host.
³ There also exists a model of this camera that does not use Philips ASICs but the SPCA525 described below. This model has a different USB identifier as can be seen in the table in appendix A.

Sunplus SPCA561 based

The Sunplus SPCA561 is a low-end USB 1.1 chipset that only supports the CIF format at up to 15 fps. The following is a list of cameras that are based on this chip:

• Logitech QuickCam Chat
• Logitech QuickCam Express
• Logitech QuickCam for Notebooks
• Labtec Webcam
• Labtec Webcam Plus

2.3.3 USB Video Class cameras

Logitech was the first webcam manufacturer to offer products that use the USB Video Class protocol, although this transition was done in two steps. It started with a first set of cameras containing the Sunplus SPCA525 chip, which supports both a proprietary protocol as well as the UVC standard. The USB descriptors of these cameras still announce the camera as a so-called vendor class device. This conservative approach was due to the fact that the first models did not pass all the tests required to qualify as UVC devices. Nevertheless, the UVC support of these cameras is still fairly complete. As we will see later on when we talk about the Linux UVC driver in more detail, the driver knows about these devices, which is why it simply overrides the device class and treats them as ordinary UVC devices.

All SPCA525 based cameras are USB 2.0 compliant and include an audio chip. They support VGA at 30 fps and, depending on the sensor used, higher resolutions up to 1.3 megapixels at lower frame rates. To reduce the traffic on the bus they feature a built-in JPEG encoder to support streaming of MJPEG data in addition to uncompressed YUV. The following is a complete list of these devices:

• Logitech QuickCam Fusion
• Logitech QuickCam Orbit MP/Sphere MP
• Logitech QuickCam Pro 5000
• Logitech QuickCam for Notebooks Pro
• Logitech QuickCam for Dell Notebooks (built-in camera for notebooks)
• Acer OrbiCam (built-in camera for notebooks)
• Cisco VT Camera II

Figure 2.1 shows product photos of some of these cameras.

1: The first Logitech webcams with UVC support. 7 .(a) QuickCam Fusion (b) QuickCam Orbit MP (c) QuickCam Pro 5000 (d) QuickCam for Notebooks Pro Figure 2.

The next generation of Logitech webcams scheduled for the second half of 2006 are pure UVC-compliant cameras. Among those are the QuickCam Ultra Vision and the 2006 model of the QuickCam Fusion. All of these new cameras are supported by the Linux UVC driver and are automatically recognized because their USB descriptors mark them as USB Video Class devices, therefore eliminating the need to hardcode their product identifiers in the software.

Figure 2.2: The first pure Logitech UVC webcam: QuickCam UltraVision

2.4 USB Video Class

2.4.1 Introduction

We have already quickly mentioned the concept of USB device classes. Each device can classify itself either as a custom, vendor-specific, device or as belonging to one of the different device classes that the USB forum has defined. There exist many device classes, some of the best-known being mass storage, HID (Human Interface Devices), printers, and audio devices. If an operating system comes with a USB class driver for a given device class, it can take advantage of most or all of the device's features without requiring the installation of a specific driver, hence greatly adding to the user's plug-and-play experience.

The USB Video Class standard follows the same strategy, supporting video devices such as digital camcorders, television tuners, and webcams. It supports a variety of features that cover the most frequently used cases while allowing device manufacturers to add their own extensions. The remainder of this section gives the reader a short introduction to some of the key concepts of UVC. We will only cover what is important to understand the scope of this report and refer the interested reader to [6] for the technical details.
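Whether a given device announces itself as a video class device can be checked from user space, for example with the lsusb tool (the vendor and product ID below are placeholders for an arbitrary camera):

    lsusb -v -d 046d:08c1 | grep -i binterfaceclass

A UVC camera reports bInterfaceClass 14 (Video) on its control and streaming interfaces, whereas the older, proprietary devices report the vendor-specific class 255.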

2.4.2 Device descriptor

USB devices are self-descriptive to a large degree, exporting all information necessary for a driver to make the device work in a so-called descriptor. While the USB standard imposes a few ground rules on what the descriptor must contain and on the format of that data, different device classes build their own class-specific descriptors on top of these. The UVC descriptor contains such information as the list of video standards, resolutions, and frame rates supported by the device, as well as a description of all the entities that the device defines. The host can retrieve all information it needs from these descriptors and make the device's features available to applications.

2.4.3 Device topology

The functionality of UVC devices is divided up into two different entities: units and terminals. Terminals are data sources or data sinks, with typical examples being a CCD sensor or a USB endpoint. Terminals only have a single pin through which they can be connected to other entities. Units, on the other hand, are intermediate entities that have at least one input and one output pin. They can be used to select one of many inputs (selector unit) or to control image attributes (processing unit). There is a special type of unit that we will talk most about in this report: the extension unit. Extension units are the means through which vendors can add features to their devices that the UVC standard does not specify. To do anything useful with the functionality that extension units provide, the host driver or application must have additional knowledge about the device because, while the extension units themselves are self-descriptive, the controls they contain are not. We shall see the implications of this fact later on when we discuss the Linux UVC driver.

When the driver initializes the device, it enumerates its entities and builds a graph with two terminal nodes, an input and an output terminal, and one or multiple units in between.

2.4.4 Controls

Both units and terminals contain sets of so-called controls through which a wide range of camera settings can be changed or retrieved. Table 2.2 lists a few typical examples of such controls grouped by the entities they belong to. Note that the controls in the third column are not specified by the standard but are instead taken from the list of extension controls that the current Logitech UVC webcams provide.

Camera terminal            Processing unit             Extension units
• Exposure time            • Backlight compensation    • Pan/tilt reset
• Lens focus               • Brightness                • Firmware version
• Zoom                     • Contrast                  • LED state
• Motor control            • Hue                       • Pixel defect correction
  (pan/tilt/roll)          • Saturation
                           • White balance

Table 2.2: A selection of UVC terminal and unit controls. The controls in the first two columns are defined in the standard; the availability and definition of the controls in the last column depends on the camera model.

2.4.5 Payload formats

The UVC standard defines a number of different formats for the streaming data that is to be transferred from the device to the host, such as DV, MPEG-2, MJPEG, or uncompressed. Each of these formats has its own adapted header format that the driver needs to be able to parse and process correctly. MJPEG and uncompressed are the only formats used by today's Logitech webcams, and they are also currently the only ones understood by the Linux UVC driver.

2.4.6 Transfer modes

UVC devices have the choice between using bulk and isochronous data transfer. Bulk transfers guarantee that all data arrives without loss but do not make any similar guarantees as to bandwidth or latency. They are commonly used in file transfers where reliability is more important than speed. Isochronous transfers are used when a minimum speed is required but the loss of certain packets is tolerable. Most webcams use isochronous transfers because it is more acceptable to drop a frame than to transmit and display the frames delayed. In the case of a lost frame, the driver can simply repeat the previous frame, something that is barely noticeable by the user, whereas delayed frames are usually considered more disruptive of a video conversation.

2.5 Non-Logitech cameras

Creative WebCam

Creative has a number of webcams that work on Linux, most of them with the SPCA5xx driver. A list of supported devices can be found on the developer's website[23]. Creative also has a collection of links to drivers that work with some of their older camera models[3].

Microsoft LifeCam

In summer 2006 Microsoft entered the webcam market with two new products, the LifeCam VX-3000 and VX-6000 models. Neither of them is currently supported by Linux due to the fact that they use a proprietary protocol. Further models are scheduled, but none of them are reported to be UVC compliant at this time.

Chapter 3 An introduction to Linux multimedia

3.1 Introduction

This chapter gives an overview of what the current state of multimedia support looks like on GNU/Linux. We shall first look at the history of the involved components and then proceed to the more technical details. At the end of this chapter the reader should have an overview of the different multimedia components available on Linux and how they work together.

3.2 Linux kernel multimedia support

3.2.1 A brief history of Video4Linux

Video devices were available long before webcams became popular. TV tuner cards formed the first category of devices to spur the development of a multimedia framework for Linux. In 1996 a series of drivers targeted at the popular BrookTree Bt848 chipset that was used in many TV cards made it into the 2.0 kernel under the name of bttv. The driver evolved quickly to include support for radio tuners and other chipsets. Eventually, more drivers started to show up, among others the first webcam driver for the Connectix QuickCam. The next stable kernel version, Linux 2.2, was released in 1999 and included a multimedia framework called Video4Linux, or short V4L, that provided a common API for the available video drivers. It must be said that the name is somewhat misleading in the sense that Video4Linux not only supports video devices but a whole range of related functions like radio tuners or teletext decoders.

With V4L being criticized as too inflexible, work on a successor had started as early as 1998 and, after four years, was merged into version 2.5 of the official Linux kernel development tree.

When version 2.6 of the kernel was released, it was the first version of Linux to officially include Video for Linux Two, or simply V4L2¹. Backports of V4L2 to earlier kernel versions, in particular 2.4, were developed and are still being used today. V4L and V4L2 coexisted for a long time in the Linux 2.6 series, but as of July 2006 the old V4L1 API was officially deprecated and removed from the kernel. This leaves Video4Linux 2 as the sole kernel subsystem for video processing on current Linux versions.

3.2.2 Linux audio support

Linux has traditionally separated audio and video support. For one thing, audio has been around much longer than video has, and for another, both subsystems have followed a rather strict separation of concerns. Even though they were developed by different teams at different times, their history is marked by somewhat similar events.

Open Sound System

The Open Sound System, or simply OSS, was originally developed not only for the Linux operating system but for a number of different Unix derivatives. While successful for a long time, its rather simple architecture suffers from a number of problems, the most serious of which to the average user being the inability to share a sound device between different applications. The first application to claim the device blocks the device for all other applications. As an example, it is not possible to hear system notification sounds while an audio application is playing music in the background. Together with a number of non-technical reasons this eventually led to the development of ALSA, the Advanced Linux Sound Architecture.

Advanced Linux Sound Architecture

Starting with Linux 2.6, ALSA became the standard Linux sound subsystem, although OSS is still available as a deprecated option. The reason for this is the lack of ALSA audio drivers for some older sound devices. Thanks to features like allowing devices to be shared among applications, most new applications come with ALSA support built in and many existing applications make the conversion from older audio frameworks.

3.3 Linux user mode multimedia support

The Linux kernel community tries to move as many components as possible into user space. On the one hand this approach brings a number of advantages like easier debugging, faster development, and increased stability.

¹ Note the variety in spelling. Depending on the author and the context, Video for Linux Two is also referred to as Video4Linux 2 or just Video4Linux.

On the other hand, user space solutions can suffer from problems such as reduced flexibility, the lack of transparency, or lower performance due to increased overhead. Nevertheless the gains seem to outweigh the drawbacks, which is why a lot of effort has gone into the development of user space multimedia frameworks. The available choices range from simple media decoding libraries to fully grown network-oriented and pipeline-based frameworks. Depending on the point of view, the fact that there is a variety of such frameworks available can be seen as a positive or negative outcome of this trend. The lack of a single common multimedia framework undoubtedly makes it more difficult for application developers to pick a basis for their software.

For the rest of this section we will present two of what we consider the most promising frameworks available today: GStreamer and NMM. The latter one is still relatively young and therefore not as wide-spread as GStreamer, which has found its way into all current Linux distributions, albeit not always in its latest and most complete version. Both projects are available under open source licenses (LGPL and LGPL/GPL combined, respectively).

3.3.1 GStreamer

GStreamer can be thought of as a rather generic multimedia layer that provides solid support for pipeline centric primitives such as elements, pads, and buffers. It bears some resemblance to Microsoft DirectShow, which has been the center of Windows multimedia technology for many years now. The GStreamer architecture is strongly plugin-based: the core library provides basic functions like capability negotiation, routing facilities, or synchronization, while all input, processing, and output is handled by plugins that are loaded on the fly. Table 3.1 lists a few plugins for each category.

Each plugin has an arbitrary number of so-called pads. Two elements can be linked by their pads, with the data flowing from the source pad to the sink pad. A typical pipeline consists of one or more sources that are connected via multiple processing elements to one or more sinks. Source elements are characterized by the fact that they only have source pads, sink elements only have sink pads, and processing elements have at least one of each.

Figure 3.1 shows a very simple example. The mad plugin decodes the MP3 data that it receives from the file source and sends the raw audio data to the ALSA sink.

Figure 3.1: A simple GStreamer pipeline that plays an MP3 audio file on the default ALSA sink.
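A pipeline like the one in figure 3.1 can be reproduced from the command line with the gst-launch tool that ships with GStreamer (the file name is an assumption, and the exact tool name depends on the installed GStreamer version, e.g. gst-launch-0.10):

    gst-launch filesrc location=clip.mp3 ! mad ! alsasink

The exclamation marks link the pads of adjacent elements, mirroring the arrows of the figure.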

Sources        Processing        Sinks
• filesrc      • audioresample   • udpsink
• alsasrc      • identity        • alsasink
• v4l2src      • videoflip       • xvimagesink

Table 3.1: An arbitrary selection of GStreamer source, processing, and sink plugins.

3.3.2 NMM

NMM stands for Network-Integrated Multimedia Middleware and, as the name already suggests, it tightly integrates network resources into the process. By doing so NMM sets a counterpoint to most other multimedia frameworks that take a machine centric approach where input, processing, and output usually all happen on the same machine. Let us look at two common examples of how today's multimedia software interacts with the network:

1. Playback of a file residing on a file server in the network
2. Playback of an on-demand audio or video stream coming from the network

1. Playback of a network file

From the point of view of a player application, this is the easiest case because it is almost entirely transparent to the applications. The main requirement is that the underlying layers (operating system or desktop environment) know how to make network resources available to their applications in a manner that resembles access to local resources as closely as possible. There are different ways how this can be realized, e.g. in kernel mode or user mode, but all of these are classified under the name of a virtual file system. As an example, an application can simply open a file path such as \\192.168.0.10\media\clip.avi (UNC path for a Windows file server resource) or sftp://192.168.0.2/home/mrubli/music/clip.ogg (generic URL for a secure FTP resource as used by many Linux environments). The underlying layers make sure that all the usual input/output functions work the same on these files as on local files. So apart from supporting the syntax of such network paths, the burden is not on the application writer.

2. Playback of an on-demand stream

Playing back on-demand multimedia streams has been made popular by applications such as RealPlayer or Windows Media Player. The applications communicate with a streaming server via partially proprietary protocols based on UDP or TCP, which strongly reduces platform independence and interoperability. Note how there is no transparency from the point of view of the streaming client. The burden of flow control, loss detection, and loss recovery lies entirely on the application's shoulders. It requires deep knowledge of different network layers and protocols. Apart from that, the client plays a rather passive role by just processing the received data locally and exercising relatively little control over the provided data flow. It is usually limited to starting or stopping the stream and jumping to a particular location within the stream. In particular, the application has no way of actively controlling remote devices, e.g. the zoom factor of the camera from which the video stream originates.

NMM tries to escape this machine centric view by providing an infrastructure that makes the entire network topology transparent to applications using the framework. The elements of the flow graph can be distributed within a network without requiring the application to be aware of this fact. This allows applications to access remote hardware as if it were plugged into the local computer. It can change channels on a remote TV tuner card or control the zoom level of a digital camera connected to a remote machine. The NMM framework abstracts all these controls and builds communication channels that reliably transmit data between the involved machines. The website of the NMM project[16] lists a number of impressive examples of the software's capabilities. One of them can be seen in figure 3.2. The photo is from an article that describes the setup of a video wall in detail[13].

3.4 Current discussion

Over the years many video device drivers have been developed by many different people. Each one of these developers had their own vision of what a driver should or should not do. While the V4L2 API specifies the syntax and semantics of the function calls that drivers have to implement, it does not provide much help in terms of higher-level guidance, therefore leaving room for interpretation.

The classic example where different people have different opinions is the case of video formats and whether V4L2 drivers should include support for format conversion. Some devices provide uncompressed data streams whereas others offer compressed video data in addition to uncompressed formats. Not every application, however, may be able to process compressed data, which is why certain driver writers have included decompressor modules in their drivers. In the case of a decompressor-enabled driver, format conversion can occur transparently if an application asks for uncompressed data but the device provides only compressed data. This guarantees maximum compatibility and allows applications to focus on their core business: processing or displaying video data.

Figure 3.2: Video wall based on NMM. It uses two laptop computers to display one half of a video each and a third system that renders the entire video.

Other authors take the view that decompressor modules have no place in the kernel and base their opinion partly on ideological and partly on technical reasons, like the inability of using floating point mathematics in kernel-space. Therefore, for an application to work with devices that provide compressed data, it has to supply its own decompressor module, possibly leading to code–and bug–duplication unless a common library is used to carry out such tasks.

What both sides have in common is the view that the main task of a multimedia framework is to abstract the device in a high-level manner so that applications need as little as possible a priori knowledge of the nature, brand, and model of the device they are talking to. We will see the advantages and disadvantages of both approaches together with possible solutions–existing and non-existing–in more detail in the next chapter.

Chapter 4 Current state of Linux webcam support

4.1 Introduction

In the previous chapter we saw a number of components involved in getting multimedia data from the device to the user's eyes and ears. This chapter will show how these components are linked together in order to support webcams. We will find out what exactly they do and don't do and what the interfaces between them look like. After this chapter readers should understand what is going on behind the scenes when a user opens his favorite webcam application, and they should have enough background to understand the necessity of the enhancements and additions that were part of this project.

4.1.1 Webcams and audio

With the advent of USB webcams, vendors started including microphones in the devices. To the host system these webcams appear as two separate devices, one of them being the video part, the other being the microphone. The microphone adheres to the USB Audio Class standard and is available to every host that supplies a USB audio class driver. On Linux, this driver is called snd-usb-audio and exposes recognized device functions as ALSA devices. Due to the availability of the Linux USB audio class driver, and the fact that Video4Linux does not (need to) know about the audio part of webcams, there was no particular need for us to concentrate on the audio part of current webcams as they work out of the box. For this reason, audio will only come up when it requires particular attention in the remainder of this report.
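As a side note, whether the microphone of a camera was picked up by snd-usb-audio can be verified with the standard ALSA utilities: for example, arecord -l lists all capture devices, and a webcam microphone typically shows up there as a separate USB audio card (the exact card name depends on the device).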

4.2 V4L2: Video for Linux Two

Video for Linux was already quickly introduced in section 3.2.1, where we saw the evolution from the first video device drivers into what is today known as Video for Linux Two, or just V4L2. This section focuses on the technical aspects of this subsystem.

4.2.1 Overview

In a nutshell, V4L2 abstracts different video devices behind a common API that applications can use to retrieve video data without being aware of the particularities of the involved hardware. Figure 4.1 shows a schematic of the architecture. The gray box shows which components run in kernel space. The dashed arrows indicate that there are further operating system layers involved between the driver and the hardware.

Figure 4.1: Simplified view of the components involved when a V4L2 application displays video.

The full story is a little more complicated than that. V4L2 not only supports video devices but related subdevices like audio chips integrated on multimedia boards, DVB decoders, or remote control interfaces. The fact that these subdevices have relatively little in common makes the job of specifying a common API difficult. The following is a list of device types that are supported by V4L2 and, where available, a few examples:

• Video capture devices (TV tuners, webcams)
• Video overlay devices (TV tuners)
• Raw and sliced VBI input devices (Teletext, EPG, and closed captioning decoders)
• Radio receivers (Radio tuners integrated on some TV tuner cards)
• Video output devices

In addition, the V4L2 specification talks about codecs and effects, which are not real devices but virtual ones that can modify video data. However, support for these was never implemented in drivers and practical applications, mostly due to disagreement how they should be implemented, i.e. in user space or kernel space. The scope of this project merely encloses the first category of the above list, video capture devices. It is also the category that has by far the greatest number of devices; webcam drivers also fall into this category.

4.2.2 The API

Due to its nature as a subsystem that communicates both with kernel space components and user space processes, V4L2 has two different interfaces, one for user space and one for kernel space.

The V4L2 user space API

Every application that wishes to use the services that V4L2 provides needs a way to communicate with the V4L2 subsystem. Like most devices on Unix-like systems, V4L2 devices appear as so-called device nodes in a special tree within the file system. These device nodes can be read from and written to in a similar manner as ordinary files. This communication is based on two basic mechanisms: file I/O and ioctls. Ioctls are a way for an application and a kernel space component to communicate data without the usual read and write system calls. While ioctls are not used to exchange large amounts of data, they are an ideal means to exchange control commands. In V4L2 everything that is not reading or writing of video data is accomplished through ioctls¹. The V4L2 API[5] defines more than 50 such ioctls, ranging from video format enumeration to stream control.

¹ In the case of memory mapped communication, even the readiness of buffers is communicated via ioctls.

Using the read and write system calls is one of two ways to exchange data between video devices and applications. The other one is the use of mapped memory, or mmap, where kernel space buffers are mapped into an application's address space to eliminate the need to copy memory around, thereby increasing performance. The fact that the entire V4L2 API is based on these two relatively basic elements makes it quite simple. That simplicity does, however, come with a few caveats, as we will see later on when we discuss the shortcomings of the current Linux video architecture.

The V4L2 kernel interface

The user space API is only one half of the V4L2 subsystem. The other half consists of the driver interface that every driver that abstracts a device for V4L2 must implement. Obviously kernel space does not know the same abstractions as user space, so in the case of the V4L2 kernel interface all exchange is done through standard function calls. When a V4L2 driver loads, it registers itself with the V4L2 subsystem and gives it a number of function addresses that are called whenever V4L2 needs something from the driver–usually in response to a user space ioctl or read/write system call. At each callback the driver carries out the requested action and returns a value indicating success or failure.

The V4L2 kernel interface does not specify how drivers have to work internally because the devices that these drivers talk to are fundamentally different. While webcam drivers usually communicate with their webcams through the USB subsystem, other drivers find themselves accessing the PCI bus to which TV tuner cards are connected. Therefore, each driver depends on its own set of kernel subsystems.

4.2.3 Summary

We have seen that the V4L2 subsystem itself is a rather thin layer that provides a standardized way through which video applications and video device drivers can communicate. Compared to other platforms where the multimedia subsystems have many additional tasks like converting between formats, managing data flow, clocks, and pipelines, the V4L2 subsystem is rather low level and focused on its core task: the exchange of video data and controls.

4.3 Drivers

This section presents five drivers that are in one way or another relevant to the Logitech QuickCam series of webcams. All of them are either V4L1 or V4L2 drivers and available as open source. What makes them V4L2 drivers is the fact that they all implement a small number of V4L2 functions.

4.3.1 The Philips USB Webcam driver

The Philips USB Webcam driver, or simply PWC, has a troubled history and has caused a lot of discussion and controversy in the Linux community. The original version of the driver was written by a developer known under the pseudonym Nemosoft as a project he did with the support of Philips. At the time there was no USB 2.0, so video compression had to be applied for video streams above a certain data rate. These compression algorithms were proprietary and Philips did not want to release them into open source. Therefore, the driver was split into two parts: the actual device driver (pwc) that supported the basic video modes that could be used without compression, and a decompressor module (pwcx) that attached to the driver and enabled the higher resolutions. Only the former one was released in source code; the decompressor module remained available in binary form.

The pwc driver eventually made it into the official kernel, but the pwcx module had to be downloaded and installed separately. In August 2004, Greg Kroah-Hartman, the maintainer of the Linux kernel USB subsystem, decided to remove the hook that allowed the pwcx module to hook into the video stream. The reason he gave was the fact that the kernel is licensed under the GPL and such functionality is considered in violation of it. As a reaction, Nemosoft demanded that the pwc driver be removed entirely from the kernel because he felt that his work had been crippled and did not agree with the way the situation was handled by the kernel maintainers. Only a few weeks later, Luc Saillard published a pure open source version of the driver after having reverse-engineered large parts of the original pwcx module. Much of the history can be found in [1] and the links in the article.

Ever since, the driver has been under continuous development and was even ported to V4L2. The driver works with many Philips-based webcams from different vendors, among others a number of Logitech cameras. The complete list of Logitech USB PIDs compatible with the PWC driver can be found in appendix A.

4.3.2 The Spca5xx Webcam driver

The name of the Spca5xx Webcam driver is a little misleading because it suggests that it only works with the Sunplus SPCA5xx series of chipsets. While that was true at one time, Michel Xhaard has developed the Spca5xx driver into one of the most versatile Linux webcam drivers that exist today. Next to the mentioned Sunplus chipsets it supports a number of others from manufacturers such as Pixart, Vimicro, Sonix, or Zoran. The (incomplete) list of supported cameras at [23] contains more than 200 cameras and the author is working on additional chipsets. Among the many supported cameras on the list, there is a fair number of Logitech's older camera models as well as some newer ones; appendix A has a list of these devices.

The main drawback of the Spca5xx driver is the fact that it does not support the V4L2 API yet. This limitation, and the way the driver has quickly grown over time, are the main reasons why the author has recently started rewriting the driver from scratch, this time based on V4L2 and under the name of gspca.

4.3.3 The QuickCam Messenger & Communicate driver

This driver supports a relatively small number of cameras, notably a few models of the QuickCam Messenger, QuickCam Communicate, and QuickCam Express series. They are all based on the STMicroelectronics 6422 chip. The driver supports only V4L1 at the time of this writing and can be found at [14].

4.3.4 The QuickCam Express driver

Another relatively limited V4L1 driver, [19] focuses on the Logitech QuickCam Express and QuickCam Web models that contain chipsets from STMicroelectronics' 6xx series. It is still actively maintained, although there are no signs yet of a V4L2 version.

4.3.5 The Linux USB Video Class driver

Robot contests have been the starting point for many an open source software project. The Linux UVC driver is one of the more prominent examples. It was developed in 2005 by Laurent Pinchart because he needed support for the Logitech QuickCam for Notebooks Pro camera that he was planning to use for his robot. The project quickly earned a lot of interest with Linux users who tried to get their cameras to work. Driven by both personal and community interest, the driver has left the status of a hobby project behind and is designated to become the official UVC driver of the Linux kernel. The official project website can be found at [17]. Since this driver is one of the corner stones of this project, we will give here a basic overview of the driver. Later in section 6.1 we shall discuss extensions and changes that were done to support the Linux webcam infrastructure.

Technical overview

The Linux UVC driver, or short uvcvideo, is a Video4Linux 2 and a USB driver at the same time. It registers with the USB stack as a handler for devices of the UVC device class and, whenever a matching device is connected, the driver initializes the device and registers it as a V4L2 device. Let us now look at a few tasks and aspects of the UVC driver in the order they typically occur.

Device enumeration

The first task of any USB driver is to define a criteria list for the operating system so that the latter one knows which devices the driver is willing and able to handle. We saw in section 2.3.3 that some Logitech cameras do not announce themselves as UVC devices even though they are capable of the protocol. For this reason, uvcvideo includes a hard-coded list of product IDs of such devices in addition to the generic class specifier.

Device initialization

As soon as a supported device is discovered, the driver reads and parses the device's control descriptor and, if successful, sets up the internal data structures for units and terminals before it finally registers the camera with the V4L2 subsystem. At this point, the device becomes visible to user space, usually in the form of a device node, e.g. /dev/video0.
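From this point on an application can drive the camera through the generic V4L2 calls described in section 4.2.2. The following fragment is a minimal illustration of that interface; it is not code from uvcvideo or from any of the tools discussed in this report, and the device node and brightness value are assumptions:

    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    int main(void)
    {
        /* V4L2 devices appear as device nodes and are opened like files. */
        int fd = open("/dev/video0", O_RDWR);
        if (fd < 0)
            return 1;

        /* Everything except video data is exchanged through ioctls:
         * VIDIOC_QUERYCAP asks the driver for its name and capabilities. */
        struct v4l2_capability cap;
        memset(&cap, 0, sizeof(cap));
        if (ioctl(fd, VIDIOC_QUERYCAP, &cap) == 0) {
            printf("driver: %s, card: %s\n", cap.driver, cap.card);
            if (cap.capabilities & V4L2_CAP_VIDEO_CAPTURE)
                printf("video capture is supported\n");
        }

        /* Camera settings are changed the same way, e.g. the brightness
         * control (the value 128 is an arbitrary example). */
        struct v4l2_control ctrl;
        memset(&ctrl, 0, sizeof(ctrl));
        ctrl.id = V4L2_CID_BRIGHTNESS;
        ctrl.value = 128;
        ioctl(fd, VIDIOC_S_CTRL, &ctrl);

        close(fd);
        return 0;
    }

For UVC cameras, requests like the brightness change above are what the driver translates into the corresponding UVC control requests, as described next.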

Stream setup and streaming

If a V4L2 application requests a video stream, the driver enters the so-called probe/commit phase to negotiate the parameters of the video stream. This includes setting attributes like video data format, frame size, and frame rate. When the driver finally receives video data from the device, it must parse the packets, check them for errors, and reassemble the raw frame data before it can send a frame to the application.

Controls

Video streaming does not only consist of receiving video data from the device; applications can also use different controls to change the settings of the camera or the properties of the video stream. These control requests must be translated from the V4L2 requests that the driver receives to UVC requests understood by the device. This process requires some mapping information because the translation is all but obvious. We will have a closer look at this problem and how it can be solved later on.

Outlook

For obvious reasons V4L2 cannot support all possible features that the UVC specification defines. The driver thus needs to take measures that allow user space applications to access such features nonetheless. In section 6.1 we shall see one such example that was realized with the help of the sysfs virtual file system and is about to be included in the project.

It is safe to say that the Linux USB Video Class driver is going to be the most important Linux webcam driver in the foreseeable future. Logitech is already moving all cameras onto the UVC track and other vendors are expected to follow, given that UVC is a Windows Vista logo requirement. For Linux users this means that all these cameras will be natively supported by the Linux UVC driver. The driver is licensed under the GPL; documentation, sources and binary packages can be downloaded from [18].

4.4 Applications

4.4.1 V4L2 applications

Ekiga

Ekiga is a VoIP and video conferencing application that supports SIP and H.323, which makes it compatible not only to applications such as NetMeeting but also to conferencing hardware that supports the same standards. Given the resemblance to other popular conferencing software, Ekiga is one of the main applications for webcams on Linux. It comes with plugins for both V4L1 and V4L2 and is therefore able to support a large number of different webcams.

Figure 4.2: The main window of Ekiga during a call.

luvcview

This tool was developed by the author of the Spca5xx driver with the intention to support some features unique to the Linux UVC driver. It is based on V4L2 for video input and the SDL library for video output. The simple user interface allows basic camera controls to be manipulated, including some of the custom controls that the UVC driver provides to enable mechanical pan/tilt for the Logitech QuickCam Orbit camera series. The latest version includes a patch that was written during this project to help with debugging of camera and driver issues. It allows to easily save the raw data received from the device into files with the help of command line options. Figure 4.3 shows a screenshot of the luvcview user interface and the command line used to start it in the background. luvcview can be downloaded from [22].

fswebcam

This nifty application is the proof that not all webcam software needs a GUI to be useful. Purely command-line based, it can be used to retrieve pictures from a webcam and store them in files, e.g. for uploading them to a web server in regular intervals. Thanks to its simplicity it has become one of the favorite programs for testing whether a newly installed camera works. The fswebcam website can be found at [9].
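A typical invocation looks roughly as follows (the flags and file name are illustrative; consult the fswebcam documentation for the exact options of a given version):

    fswebcam -d /dev/video0 -r 640x480 snapshot.jpg

This grabs a single frame from the given device at VGA resolution and stores it as a JPEG image, exactly the kind of one-shot operation that works well in scripts and cron jobs.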

Figure 4.3: The window of luvcview and the console used to start it in the background.

4.4.2 V4L applications

Camorama

Camorama is a V4L1-only application made for taking pictures either manually or in specified intervals. It can even upload the pictures to a remote web server. Camorama allows adjusting the most common camera controls and includes a number of video filters, some of which don't seem very stable. Unfortunately development seems to stand still at the moment. It can be downloaded from [11] and is part of many Linux distributions. Figure 4.4 shows Camorama in action.

4.4.3 GStreamer applications

There are many small multimedia applications that use the GStreamer engine as a back-end but a relatively small number of prominent ones. The most used ones are probably Amarok, the default KDE music player, and Totem, the GNOME's main media player. At the moment Amarok is limited to audio, although video support is being discussed. What makes Totem interesting from the point of view of webcam users is a little webcam utility called Vanity. Unfortunately it has received very little attention from both developers and users and it remains to be seen whether the project is revived or even integrated into Totem.
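Even without a dedicated application, a webcam stream can be viewed through GStreamer directly from the command line, which is useful for quick tests (illustrative; element availability depends on the installed plugin set and the exact tool name on the GStreamer version):

    gst-launch v4l2src ! xvimagesink

This connects the V4L2 source element from table 3.1 straight to an X video output window.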

Figure 4.4: Camorama streaming at QVGA resolution from a Logitech QuickCam Messenger camera using the Spca5xx driver.

We will see another webcam application based on GStreamer in the next chapter when we look at the software that was developed for this project. At that time we shall also see how GStreamer and V4L2 work together.

4.5 Problems and design issues

As with every architecture, there are a number of drawbacks, some of which were briefly hinted at in the previous sections. We will now look at these issues in more detail and see what their implications on webcam support on the Linux platform are. At the same time we will look at possible solutions to these problems and how other platforms handle them.

4.5.1 Kernel mode vs. user mode

The discussion whether functionality X should be implemented in user mode or in kernel mode is an all-time classic in the open source community, particularly in the Linux kernel. Unfortunately these discussions are oftentimes far from conclusive, leading to slower progress in the implementation of certain features or, in the worst case, to factually discontinued projects due to lack of consent and acceptance.

In some cases, as in the case of the cameras using the PWC driver, it may even be impossible for someone to integrate certain algorithms for legal reasons.

Table 4.1 shows the most notable differences between kernel mode and user mode implementations of multimedia functionality. While the points are focused on webcam applications, many of them can also be applied to other domains like audio processing, or even to devices completely unrelated to multimedia. In the following we will analyze these different points and present possible solutions and workarounds.

    Kernel space
      + Transparency for user space
      + Direct device access
      + Device works "out of the box"
      - No floating point math
      - Complicated debugging
      - Open source only
      - No callback functions

    User space
      + Simple upgrading
      + Simple debugging
      + Safer (bugs only affect one process)
      + More flexible licensing
      - Difficult to establish standard
      - Requires flexible kernel back-end

Table 4.1: Kernel space vs. user space software development

Format transparency
One of the main problems in multimedia applications is the myriad of formats that are in use. Different vendors use different compression schemes for a number of reasons: licensing and implementation costs, backward compatibility, memory and processing power constraints, and personal or corporate preference. For application developers it becomes increasingly difficult to stay current on which devices use which formats and to support them all. This is a strong argument for hiding the entire format conversion layer from the application, so that every application only needs to support a very small number of standard formats to remain compatible with all hardware and drivers.

A typical example is the way the current Logitech webcam drivers for Windows are implemented. While the devices usually provide two formats, compressed MJPEG and uncompressed YUY2, applications get to see neither of these formats. Instead, they are offered the choice between I420 and 24-bit RGB, with the latter being especially easy to process because each pixel is represented by a red, green, and blue 8-bit color value.
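To make such a conversion concrete: turning YUY2 into 24-bit RGB is pure per-pixel arithmetic. The following is a minimal sketch using the standard BT.601 equations with fixed-point coefficients (scaled by 2^16); whether the Windows driver uses exactly these coefficients is an assumption, not something documented here.

    #include <stdint.h>

    /* Convert pairs of YUY2 pixels (byte order Y0 U Y1 V) to 24-bit RGB.
     * The constants are 1.402, 0.344, 0.714 and 1.772 scaled by 65536. */
    static uint8_t clamp(int v)
    {
        return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
    }

    void yuy2_to_rgb24(const uint8_t *src, uint8_t *dst, int pixel_pairs)
    {
        while (pixel_pairs-- > 0) {
            int u = src[1] - 128, v = src[3] - 128;
            for (int i = 0; i < 2; i++) {
                int y = src[i * 2];
                *dst++ = clamp(y + ((91881 * v) >> 16));             /* R */
                *dst++ = clamp(y - ((22554 * u + 46802 * v) >> 16)); /* G */
                *dst++ = clamp(y + ((116130 * u) >> 16));            /* B */
            }
            src += 4;   /* advance to the next two-pixel group */
        }
    }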

These formats are provided independent of the mode in which the camera is being used. For example, if the camera is streaming in MJPEG mode and the capturing software requests RGB data, the driver uses its internal decompressor module to convert the JPEG data coming from the camera into uncompressed RGB. The capturing software is not aware of this process and does not need to have its own JPEG decoder, one nontrivial module less to implement.

At which layer this format conversion should happen depends on a number of factors of both technical and historical nature. Traditionally, Windows and Linux have seen different attempts at multimedia frameworks, and many of them have only survived because their removal would break compatibility with older applications still relying on these APIs. If vendors and driver developers are interested in the support of these outdated frameworks, they may need to provide format filters for each one of these frameworks in the case of a proprietary streaming format. If, however, the conversion takes place in the driver itself, all frameworks can be presented with some standard format that they are guaranteed to understand. This can greatly simplify development by concentrating the effort on a single driver instead of different framework components.

There are also performance considerations when deciding on which level a conversion should take place. If the format conversion–or any other computationally intensive process–is done in the user space framework, the same process has to be carried out in the pipeline of each application because there is no way through which the applications could share the result. If two or more applications want to access the video stream of a camera at the same time, they will create as many different pipelines as there are applications. This has the effect of multiplying the required work, something that leads to poor scalability of the solution. In the opposite case, where the conversion process is carried out before the stream is multiplexed, the work is done just once in the driver and all the frameworks receive the processed data as an input, therefore considerably reducing the overhead associated with multiple parallel streams.

Feature transparency
Up until now our discussion has focused primarily on format conversion. There exists another category of video processing that is different in a very important way: computer vision. Computer vision is a form of image or video processing with the goal of extracting meta data that enables computers to "see" or at least recognize certain features and patterns. A few classic examples are face tracking, where the algorithm tries to keep track of the position of one or multiple faces; feature tracking, where the computer locates not only the face but features like eyes, nose, or mouth; and face recognition, where software can recognize faces it has previously memorized.

To see the fundamental difference between computer vision and format conversion modules we have to look first at a basic mechanism of multimedia frameworks: pipeline graph construction.

When an application wants to play a certain media source, it should not have to know the individual filters that become part of the pipeline in order to do so. When an application wants to play an .mp3 file, for example, it can simply request a pipeline that has the given .mp3 file as input and delivers audio/x-wav data as output. The framework should automatically build a flow graph that puts the right decoders and converters in the right order. The algorithms that do this are usually based on capability descriptors that belong to each element, combined with priorities to resolve ambiguities. For example, a decoder filter could have a capability descriptor that says "Able to parse and decode .mp3 files" and "Able to output uncompressed audio/x-wav data".

In many cases there exist multiple graphs that are able to fulfill the given task, so the graph builder algorithm has to take decisions. Back in our example there could be two MP3 decoders on the system, one that uses the SIMD instruction set of the CPU if available and one that uses only simple arithmetic. Let us call the first module mp3_simd and assume it has a priority of 100. The default MP3 decoder is called mp3_dec and has a lower priority of 50. The graph builder algorithm will first try to build the graph using mp3_simd. If the current CPU supports the required SIMD instructions, the graph construction will obviously succeed. In the opposite case where the current machine lacks SIMD, mp3_simd can refuse to be part of the graph, but the framework will still be able to build a working graph because it can fall back to our standard decoder, mp3_dec.

Imagine now an audio quality improvement filter called audio_qual that accepts uncompressed audio/x-wav data as input and outputs the same type of data. How can the application benefit from audio_qual without having to know about it? This problem is not easy to solve because making every audio application aware of the plugin's existence is not always practical. The graph builder algorithm, on the other hand, will always take the simplest graph possible: the filter's input and output formats are the same, so it does not see an advantage in introducing an additional filter element that–from the algorithm's capability oriented perspective–is nothing but a null operation.

The case of computer vision is very similar with respect to the pipeline graph creation process. The computer vision module does not modify the data, so the input and output formats are the same and the framework does not see the need to include the element into the graph. One elegant solution to this problem is to do the processing in kernel mode in the webcam driver before the data actually reaches the pipeline source. Naturally, this approach can require a format conversion in the driver if the computer vision algorithms cannot work directly on the video format delivered by the camera. So the solution presented in the previous section becomes not only a performance advantage but a necessity to support certain features transparently for all applications.
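GStreamer implements precisely this priority mechanism through element ranks. The sketch below registers the two decoders from the example; gst_element_register and the GST_RANK_* constants are real GStreamer API, while the element names and their GType accessors are hypothetical.

    #include <gst/gst.h>

    /* Assumed to be provided by the respective decoder implementations. */
    GType mp3_simd_get_type(void);
    GType mp3_dec_get_type(void);

    /* The rank plays the role of the priority from the example: automatic
     * graph builders try higher-ranked elements first and fall back to
     * lower-ranked ones if an element refuses to join the graph. */
    static gboolean plugin_init(GstPlugin *plugin)
    {
        if (!gst_element_register(plugin, "mp3_simd",
                                  GST_RANK_PRIMARY, mp3_simd_get_type()))
            return FALSE;
        return gst_element_register(plugin, "mp3_dec",
                                    GST_RANK_SECONDARY, mp3_dec_get_type());
    }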

Direct device access
Another main advantage of a kernel mode multimedia framework is that the framework has easy access to special features that the device provides. For example, a new camera model can introduce motion control for pan and tilt. If the user mode multimedia framework is not aware of this or incapable of mapping these controls onto its primitives, applications running on top of it cannot use these features. For an application to be able to communicate with the driver, it is not enough to use the framework API; a special side channel has to be established. The design of such a side channel can turn out to be rather complicated if future reusability is a requirement, because of the difficulty of predicting the features of upcoming devices. Obviously this point is also valid for kernel mode frameworks, but it is generally easier to communicate between kernel components than across the barrier between user mode and kernel mode. We will see a concrete example of this issue–and a possible solution–later on when we look at how the webcam framework developed as part of this project communicates with the device driver.

Callback
Many APIs rely on callbacks to implement certain features, as opposed to polling or waiting on handles. There are many cases where such notification schemes are useful:

• Notification about newly available or unplugged devices
• Notification about controls whose value has changed, possibly as a result of some device built-in automatism
• Notification about device buttons that have been pressed
• Notification about the success or failure of an action asynchronously triggered by the application (e.g. a pan or tilt request that can take some time to finish)
• Notification about non-fatal errors on the bus or in the driver

Unfortunately, current operating systems provide no way to do direct callbacks from kernel mode to user mode. Therefore, for V4L2 applications to be able to use the comfort of callback notification, a user space component would have to be introduced that wraps polling or waiting and calls the application whenever an event occurs. The advantage of this approach is that it has no impact on performance (especially compared to polling) and is much simpler for the application because it does not require the application to use multiple threads to poll or wait. In chapter 7 we propose a design that does just that.
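Such a wrapper component is not hard to sketch. In the following fragment a worker thread blocks in poll() on behalf of the application and invokes a callback whenever the driver signals an event. Which poll flag a driver would use to signal such events is an assumption here, since no corresponding kernel mechanism exists yet.

    #include <poll.h>
    #include <pthread.h>

    typedef void (*event_callback)(int fd);

    struct event_watcher {
        int fd;                    /* device handle to watch */
        event_callback callback;   /* invoked for every event */
    };

    /* Worker thread: sleeps in poll() so the application does not have to. */
    static void *watch_events(void *arg)
    {
        struct event_watcher *w = arg;
        struct pollfd pfd = { .fd = w->fd, .events = POLLPRI };

        for (;;) {
            if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLPRI))
                w->callback(w->fd);
        }
        return NULL;
    }

    int install_event_callback(struct event_watcher *w, pthread_t *thread)
    {
        return pthread_create(thread, NULL, watch_events, w);
    }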

separately. Such behavior is obviously desirable because it frees users from having to compile and install the driver themselves, something that not every Linux user may be comfortable doing. If a certain device works "out of the box" it provides for a good user experience because people can immediately start using the device and launch their favorite applications.

On the other hand, the disadvantage of such an approach is the limited upgradeability of kernel components. Even though current distributions provide comfortable packaging of precompiled kernels, such an upgrade usually requires rebooting the machine. In high-availability environments, e.g. in the case of a popular webcam streaming server, the downtime incurred by a reboot can be unacceptable. In comparison, upgrading a user mode application is as easy as restarting the application once the application package has been upgraded.

Development aspects
For a number of reasons programming in user mode tends to be easier than programming in kernel mode. Three of these reasons are the variety of development tools, the debugging possibilities, and the comfort of the API.

Traditionally there are many more tools available for developing applications than kernel components. The simple reason is, for one, that the development of user space tools itself is easier and, for another, that the number of application developers is just much higher than the number of system developers. There is a large variety of debug tools and helper libraries out there, but almost none of them are applicable to kernel mode software. Therefore the Linux kernel mode developer has to rely mostly on kernel built-in tools. While these are very useful, they cannot compare with the comfort of the kernel debugger tools available on the Windows platform.

If a problem in a kernel component occurs, the implications can be manifold. In some cases the entire machine can freeze without so much as a single line of output that would help locate the problem. In less severe cases the kernel manages to write enough useful debug information to the system log and may even continue to run without the component in question. Nevertheless, such an isolated crash often requires a reboot of the test machine because the crashed component cannot be replaced by a new version anymore. These circumstances inevitably call for two machines, one for development and one for testing. In user mode an application bug is almost always limited to a single process, where it can be analyzed and possibly fixed, and trying out a new version is as easy as recompiling and relaunching the program.

Finally, not all the comfort of the API that application programmers are used to is available in kernel space. Seemingly simple tasks like memory allocation, string handling, and basic mathematics can suddenly become much more complicated. One important difference is that floating point operations are oftentimes not available in kernel mode for performance reasons.²

² Banning floating point from kernel mode allows the kernel to omit the otherwise expensive saving and restoring of floating point registers when the currently executing code is preempted.

One has to resort to algorithms that avoid floating point computations or apply tricks that are unlikely to receive a positive echo in the Linux kernel community. All of these points make the development of multimedia software in user mode much easier, an important point given the complexity that the involved algorithms and subsystems often have.

Licensing
Nothing speaks against writing closed source software for Linux. The GNU General Public License (GPL), under which the Linux kernel and most of the system software is released, does not forbid closed source applications. As a matter of fact, there is a large number of commercial Linux applications out there that were ported from other operating systems or written from scratch without releasing their source code.

The situation for kernel modules, however, is more complicated than that. Since the GPL requires derived works of a GPL-licensed product to be published under the same terms, most kernel modules are assumed derived works, therefore ruling out the development of closed source kernel modules[20]. There seems, however, to be an acceptable way of including a binary module into the Linux kernel. It basically consists of having a wrapper module, itself under the GPL, that serves as a proxy for the kernel functions required by the second module. This second module can be distributed in binary only form and does not have to adopt the kernel's license because it cannot be considered a derived work anymore.

Even after sidestepping the legal issues of a binary only kernel module, there remain a few arguments against realizing a project in such a way, notably the lack of acceptance in the community and the difficult maintenance given the large number of different kernel packages that exist. The software would have to be recompiled for every minor upgrade and for every flavor and architecture of the supported Linux distributions. This can drastically limit the scope of supported platforms.

4.5.2 The Video4Linux user mode library
One solution to most of the problems just described keeps coming up when new and missing features and design issues are discussed on the V4L mailing list: a widely available, open source, user mode library that complements the kernel part of V4L2. Such a library could take over tasks like format conversion, providing a flexible interface for more direct hardware access, and taking complexity away from today's applications. At the same time, the kernel part could entirely concentrate on providing the drivers that abstract device capabilities and on making sure that they implement the interfaces required by the V4L library.

While the approach sounds very promising and would bring the Linux multimedia platform a large step forward, nobody has found themselves willing or able to start such a project. In the meantime, other user mode frameworks like GStreamer or NMM have partly stepped into the breach. Unfortunately, since these frameworks do not primarily target V4L, they are rarely able to abstract all desirable features. The growing popularity of these multimedia architectures, in turn, makes it increasingly harder for a V4L library to become widespread and eventually the tool of choice for V4L2 front-ends. It seems fair to say that the project of the V4L user mode library died long before it even got to the stage of a draft, and it would require a fair amount of initiative for it to be revived.

4.5.3 V4L2 related problems
Video4Linux has a number of problems that have their roots partially in the legacy of V4L1 and Unix systems in general, as well as in design decisions that were made with strictly analog devices in mind. For some of them easy fixes are possible, for others solutions are more difficult.

Input and output
We saw in section 4.2 that V4L2 provides two different ways for applications to read and write video data: the use of the standard read and write system calls, and memory mapped buffers (mmap). The classic I/O-based approach has the advantage of enabling every application that supports file I/O to work with V4L2 devices. Device input and output using the read/write interface used to be–and still is in some cases–very popular, but it is not the technique of choice due to the fact that it does not allow meta information such as frame timestamps to be communicated alongside the data.

While it would be possible for drivers to implement both techniques, some of them choose not to support read/write and mmap at the same time. The uvcvideo driver, for example, does not support the read/write protocol in favor of the more flexible mmap. To be on the safe side, an application would have to implement both protocols at the same time, again something that not all application authors choose to do. Usually their decision depends on the purpose of their tool and the hardware they have access to during development. The fact that for the application the availability of either protocol depends on the driver in use erodes the usefulness of the abstraction layer that V4L is supposed to provide.
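The two protocols differ more in bookkeeping than in spirit, as the following fragments show. The mmap variant assumes the usual REQBUFS/QUERYBUF/mmap/STREAMON setup has already been performed; note how the v4l2_buffer structure carries the frame timestamp for which the read() call has no room.

    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    /* Classic I/O: one read() per frame, no meta information. */
    ssize_t grab_frame_read(int fd, void *frame, size_t size)
    {
        return read(fd, frame, size);
    }

    /* Memory mapped streaming: dequeue a filled buffer, requeue it when
     * done. buf->index identifies the mapped buffer, buf->bytesused how
     * much of it is valid, buf->timestamp when the frame was captured. */
    int grab_frame_mmap(int fd, struct v4l2_buffer *buf)
    {
        memset(buf, 0, sizeof(*buf));
        buf->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf->memory = V4L2_MEMORY_MMAP;
        if (ioctl(fd, VIDIOC_DQBUF, buf) < 0)   /* blocks for next frame */
            return -1;
        /* ...process the mapped buffer here... */
        return ioctl(fd, VIDIOC_QBUF, buf);     /* hand the buffer back */
    }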

The legacy of ioctl
The ioctl system call was first introduced with AT&T Unix version 7 in the late seventies. It was used to exchange control data that did not fit into the stream-oriented I/O model. The operating system forwards ioctl requests directly to the driver responsible for the device. Let us look at the prototype of the ioctl function to understand where some of the design limitations in V4L2 come from:

    int ioctl(int device, int request, void *argp);

There are two properties that stick out for an interface based on this function:

1. Every call needs a device handle.
2. There is only one untyped argument for passing data.

The fact that ioctl provides only one argument for passing data between caller and callee is not a serious technical limitation in practice, and neither is its untypedness. It does, however, deprive the compiler of doing any sort of compile-time type checking, leading to possibly hard to find bugs if a wrong data type is passed. For developers this also makes for a rather unintuitive interface, since even relatively simple requests require data structures to be used where a few individual arguments of basic types would be simpler.

While the first point is mostly a cosmetic one, the second one imposes a more important limitation on applications: there are no "stateless" calls to the V4L2 subsystem possible. Since the operating system requires a device handle to be passed to the ioctl request, the application has no choice but to open the device prior to doing the ioctl call, as the example following the list below shows. As a consequence this eliminates the possibility of device independent V4L2 functions. It is easy to come up with a few occasions where such stateless functions would be desirable:

• Device enumeration. It is currently left to the application to enumerate the device nodes in the /dev directory and filter those that belong to V4L2 devices.
• Device information querying. Unless the driver supports multiple opening of the same device, applications have no more information than what the name of the device node itself provides. Currently this is restricted to the device type (video devices are called videoN, radio devices radioN, where N is a number, etc.)
• Module enumeration. If the V4L2 system were to provide format conversion and other processing filters, applications would want to retrieve a list of the currently available modules without requiring opening a device first.
• System capability querying. Similarly, V4L2 capabilities whose existence is independent of a device's presence in the system could be queried without the need for the application to know which capability was introduced with which kernel version and without hardcoding corresponding conditionals.
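The following snippet illustrates the second bullet point: even a query as simple as "what is this camera called?" forces the application to open the device node first. VIDIOC_QUERYCAP and struct v4l2_capability are standard V4L2; the device path is an example.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    int main(void)
    {
        struct v4l2_capability cap;
        int fd = open("/dev/video0", O_RDWR);   /* no way around this open */

        if (fd < 0)
            return 1;
        if (ioctl(fd, VIDIOC_QUERYCAP, &cap) == 0)
            printf("card: %s, driver: %s\n",
                   (char *)cap.card, (char *)cap.driver);
        close(fd);
        return 0;
    }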

It is clear that the current API was designed to blend in nicely with the Unix way of communicating between applications and system components. While this keeps the API rather simple from a technical point of view, it has to be asked whether it is worth sticking to these legacy interfaces that clearly were not–and could not at the time–designed to handle all the cases that come up nowadays. Especially for fast advancing areas like multimedia, a less generic but more flexible approach is often desirable.

Missing frame format enumeration
We have mentioned that the current Video4Linux API was designed mostly with analog devices in mind. Analog video devices have a certain advantage over digital ones in that they oftentimes have no constraints as to the video size and frame rate they can deliver. For digital devices this is different. So while an analog TV card may very well be capable of delivering an image 673 pixels wide and 187 pixels high, most webcams are not. While the sensors used by digital webcams theoretically provide similar capabilities, these are hidden by the firmware to adapt to the way that digital video data is transmitted and used. Instead, they limit the supported resolutions to a finite set, most of them with a particular aspect ratio such as 4:3. Similar restrictions apply for frame rates, where multiples of 5 or 2.5 dominate.

One implication of this is that at the time V4L2 was designed, there was no need to provide applications with a way to retrieve these finite sets. Once a selection is made, the application can test the given resolution. This has peculiar effects at times:

• Many applications are completely unaware of the frame rate and rely on the driver to apply a default value.
• Since a one-by-one enumeration of resolutions is impossible due to the sheer number of possible value combinations, applications simply have to live with this limitation and either provide a hardcoded list of resolutions likely to be supported or have the user enter them by hand.
• The only way for V4L2 applications to enumerate frame rates is to test them one by one and check if the driver accepts them.

To make this process less frustrating than it seems, V4L2 drivers return the nearest valid resolution if a resolution switch fails. As an example, if an application requests 660x430, the driver would be likely to set the resolution to 640x480. We shall see in 6.2 how this severe limitation was removed by enhancing the V4L2 API.
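In code, the try-and-check dance around the 660x430 example looks like this: the driver silently replaces the requested size with the nearest one it supports, and the application learns the outcome from the very same structure.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    int main(void)
    {
        struct v4l2_format fmt;
        int fd = open("/dev/video0", O_RDWR);

        if (fd < 0)
            return 1;
        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
        fmt.fmt.pix.field = V4L2_FIELD_ANY;
        fmt.fmt.pix.width = 660;    /* what we ask for... */
        fmt.fmt.pix.height = 430;

        if (ioctl(fd, VIDIOC_S_FMT, &fmt) == 0)   /* ...and what we get */
            printf("driver chose %ux%u\n",
                   fmt.fmt.pix.width, fmt.fmt.pix.height);
        close(fd);
        return 0;
    }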

Control value size
Another limitation that is likely to become a severe problem in the future is the structure that V4L2 uses to get and set the values of device controls:

    struct v4l2_control {
        __u32 id;      /* Identifies the control. */
        __s32 value;   /* New value or current value. */
    };

The value field is limited to 32 bits, which is satisfactory for most simple controls but not for more complex ones. This has already given rise to the recent introduction of extended controls (see the VIDIOC_G_EXT_CTRLS, VIDIOC_S_EXT_CTRLS, and VIDIOC_TRY_EXT_CTRLS ioctls in [5]), which allow applications to group several control requests and provide some room for extension.

Lack of current documentation
The last problem we want to look at is unfortunately not limited to V4L2 but affects a wide range of software products, especially in the non-commercial and open source sector: poor documentation. The V4L2 documentation is split into two parts, an API specification for application programmers[5] and a driver writer's guide[4]. While the first one is mostly complete and up-to-date, the latter one is completely outdated and little helpful except for getting a first overview. Moreover, there is little documentation available on what the V4L2 subsystem actually does and doesn't do, and it gives no guidelines on how to implement a driver and what to watch out for.

The main source of information on how to write a V4L2 driver is therefore the source code of existing drivers. The lack of a reference driver doesn't make the choice easy, though, and there exist some poorly written drivers out there, something that in turn prevents code sharing and modularization of common features. Again, delving into the source code is the best and only way to get answers. It sets the threshold for newcomers quite high and makes it hard for established developers to find common guidelines to adhere to. This lack of starting points for developers is likely one of the biggest problems of V4L2 at the moment. One can only hope that the current developers eventually take a little time out of their schedules to document the existing code as long as the knowledge and recollection is still there. As part of this project the author has tried to set a good example by properly documenting the newly added frame format enumeration features and providing a reference implementation that demonstrates their usage. We will come back to this issue at the beginning of chapter 5 when we discuss the goals of our webcam framework.

Stream synchronization
There is one important aspect normally present in multimedia frameworks that all applications known to the author have blissfully ignored without any obviously bad consequences: synchronization of multimedia streams.

Whenever a computer processes audio and video inputs simultaneously, there is an inevitable tendency for the two streams to slowly drift apart when they are recorded. This has numerous reasons and there are different strategies to reduce the problem, many of which are explained in [12], an excellent article by the author of VirtualDub, an extremely popular video processing utility for Windows, as it provides some ideas and many real solutions. The fact that no bad consequences can be observed with current Linux webcam software does not mean, however, that the problem does not exist on the Linux platform. The problem only becomes apparent when videos are recorded that include an audio stream, and none of the common applications seem to do that yet. V4L2 on its own cannot prevent this because it has no access to the audio data. Once this has changed, applications will need to figure out a way to avoid the problem of having the video and audio streams drift apart.

Despite all these problems, Linux has a functioning platform for webcams today. It is only a matter of time and effort to resolve them one by one. The next chapter is a first step in that direction.

Chapter 5

Designing the webcam infrastructure

5.1 Introduction
After having seen all the relevant requirements for operating a webcam on Linux, we can finally discuss what our webcam framework looks like. This chapter treats the ideas and goals behind the project, how we have tackled the difficulties, and why the solution looks as it looks today. We will present all the components involved in a high-level manner and save the technical details for the two following chapters. To conclude, we shall revisit the problems discussed in the previous chapters and summarize how our solution solves them and strives to avoid similar problems in the future. Before doing so, we need to be clear about the goals we want to achieve and set priorities. Software engineering without having clear goals in mind is almost guaranteed to lose focus of the main tasks over the little things and features.

5.2 Goals
The main goal of the project, enhancing the webcam experience of Linux users, is a rather vague one and does not primarily lend itself as a template for a technical specification. It does, however, entail a number of secondary goals, or means, that fit together to achieve the primary goal. These goals are of a more concrete nature and can be broken down into technical or environmental requirements.

Apart from the obvious technical challenges that need to be solved, there is another group of problems that are less immediate but must nevertheless be carefully considered: business and legal decisions. When a company takes a go at open source software, conflicts inevitably arise, usually between the protection of intellectual property and the publishing of source code.

Their consideration has played an important role in defining the infrastructure of the webcam framework, and we will return to the topic when discussing the components affected by it. Let us now look at the different goals one by one and how they were achieved.

A solution that works
As trivial as it may sound, the solution should work. Not only on a small selection of systems that happens to be supported by the developer, but on as broad a system base as possible and for as many users as possible. Nothing is more frustrating for a user than downloading a program just to find out that it does not work on his system. Unfortunately it cannot always be avoided to limit the system base to a certain degree, for practical and technical reasons. Practical reasons are mostly due to the fact that it is impossible to test the software on every system combination out there. Many different versions of the kernel can be combined with just as many different versions of the C runtime library. On the technical side there is an entire list of features that a given solution is based on and without which it cannot properly work. The size of the supported system base is therefore a tradeoff between development and testing effort on one side and satisfying as many users as possible on the other.

Making this tradeoff was not particularly difficult for this project, as one of the pillars of the webcam framework already sets a quite strict technical limit. For USB 2.0 isochronous mode to work properly, a Linux kernel with version 2.6.15 or higher is strongly recommended because the USB stack of earlier versions is known to have issues that can cause errors in the communication between drivers and devices. In a similar way, certain features of Video4Linux 2 only became available in recent versions of the kernel, notably the frame format enumeration that we will see in 6.2. This does not mean, however, that the solution does not work at all on systems that do not meet these requirements. The feature set of the webcam framework on older platforms is just smaller. Everything that does not depend on features of the UVC driver works on kernels older than 2.6.15, and a V4L2 implementation that does not provide frame format enumeration prevents only this particular feature from working.

A solution that works best–but not exclusively–with Logitech cameras
Parts of the solution we have developed are clearly optimized for the latest Logitech cameras, no need to hide this fact. Logitech has invested large amounts of money and time into developing the QuickCam hardware and software. There is a lot of intellectual property contained in the software as well as some components licensed from third party companies. Even if Logitech wanted to distribute these features in source code form, it would not be legally possible. As a result, these components must be distributed in binary format, and they are designed to work only if a Logitech camera is present in the system, because other cameras don't implement the necessary features.


These binary components are limited to a single dynamic library that is not required for the webcam infrastructure to work. For users this means that there is some extra functionality available if they are using a Logitech camera, but nothing stops them from using the same software with any other UVC compliant camera.

Planning ahead
In the fast moving world of consumer electronics it is sometimes hard to predict where technology will lead us in a few years from now. Future webcams will have many features that today's software does not know about. It is therefore important to be prepared for such features by designing interfaces in a way that makes them easily extensible to accommodate new challenges.

A typical example of this necessity is the set of values of certain camera controls. Most controls are limited to 32-bit integer values, which is enough for simple controls such as image brightness or camera tilt. One can imagine, however, that certain software supported features could need to transmit chunks of data to the camera that do not fit in 32 bits. Image processing on the host could compute a list of defect pixels that the camera should interpolate in the firmware, or it could transmit region information to help the camera use different exposure settings for foreground and background.

In the provided solution we have avoided fixed-length value limitations wherever possible. Each control can have arbitrarily long values, and all fixed-length strings, often used in APIs for simplicity reasons, have been replaced by variable-length, null-terminated strings. While it is true that this approach is slightly more complicated for all involved parties, it assures that future problems do not encounter data width bottlenecks. We have carefully planned the API in a way that puts the burden on the libraries and not on the applications and their developers. For applications, buffer management is mostly transparent, and the enumeration API functions are no different than if fixed-width data had been used.

Another example that guarantees future extensibility is the generic access to UVC extension units that we added to the UVC driver. Without such a feature, the driver would need to be updated for every new camera model, the very process that generalizing standards like UVC strive to avoid. The new sysfs interface of the UVC driver allows user mode applications generic raw access to controls provided by a camera's UVC extension units. Since these extension units are self-descriptive, the driver can retrieve all required information at runtime and need not be recompiled.

There are a few other places where we have planned ahead for future extensions, such as the abstraction layers we are taking advantage of and the modularity of some of the involved modules. These examples will be explained in more detail in the rest of this chapter.
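To see why a sysfs interface is attractive, consider that accessing an extension unit control reduces to plain file I/O, which works equally well from C, from a shell script, or with a simple echo command. The attribute path and the value format in this sketch are invented for illustration; the actual layout is defined by the driver.

    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical extension unit attribute exposed by the driver. */
        FILE *f = fopen("/sys/class/video4linux/video0/pantilt", "r+");
        char value[64];

        if (f == NULL)
            return 1;
        if (fgets(value, sizeof(value), f) != NULL)
            printf("current raw value: %s", value);   /* read... */
        rewind(f);
        fprintf(f, "1 0\n");                          /* ...and write */
        fclose(f);
        return 0;
    }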


Dealing with current problems
A prerequisite for, and at the same time a goal of, this project was solving the problems we saw in chapter 4 in the best manner for everybody. This means that we did not want to further complicate the current situation by introducing parallel systems, but instead help solve these problems so that currently existing applications can also leverage off the improvements we required for our framework. Admittedly, it may sometimes seem easier to reinvent the wheel than to improve the wheels already in place, but in the end having a single solution that suits multiple problems is preferable because a combined effort often achieves a higher quality than two half-baked solutions do. The effects of a developer branching the software out of frustration with the line a project is following can be seen quite often in the open source community. The recent Mambo/Joomla dispute¹ is a typical example where it is doubtful that the split has resulted in an advantage for any of the involved parties.

Let us use the UVC driver as an example to illustrate the situation in the webcam context. Creating our own driver or forking the current one would have made it easier to introduce features that are interesting for Logitech because we could have changed the interface without discussing the implications with anyone. By doing so, both drivers would have received less testing and it would have been harder to synchronize changes applicable to both branches. Keeping a single driver is a big win for the Linux webcam user and avoids the frustrating situation where two similar devices require two slightly different drivers.

Community acceptance
Many Linux projects with a commercial background have received a lukewarm reception from the open source community in the past, sometimes for valid reasons, sometimes out of fear and skepticism. There is no recipe for guaranteed acceptance by the Linux community, but there are a few traps one can try to avoid. One of the traps that many companies fall into is that they strictly limit use of their software to their own products. Obviously, for certain device classes they may not have any choice, take the example of a graphics board. Fortunately, for the scope of this project, this was relatively easy given that the webcams for which it was primarily designed adhere to the USB Video Class standard. Linux users have every interest in good UVC support, so there were very few negative reactions to Logitech's involvement. The fact that somebody was already developing a UVC driver when we started the project may also have helped convince some of the more suspicious characters out there that it was not our intent to create a software solution that was merely for Logitech's benefit.

Throughout the project we have strived to add features to the UVC driver that we depend on for the best support of our cameras in the most generic way, so that devices of other vendors can take advantage of them.
1 The open source content management system Mambo was forked in August 2005 after the company that owned the trademark founded a non-profit organization with which many of the developers did not agree. The fork was named Joomla.


A typical example for this is the support for UVC extensions. While not strictly necessary for streaming video, all additional camera features are built on top of UVC extension units. It can therefore be expected that other vendors will use the same mechanisms as Logitech, so that by the time that more UVC devices appear on the market, they will already be natively supported by Linux.

Avoid the slowness of democracy
This goal may at first seem diametrical to the previous point. The open source community is a democracy where everyone can contribute their opinions, concerns, and suggestions. While this often helps make sure that bad solutions never even end up being realized, it renders the process similarly slow as in politics. For projects with time constraints and full-time jobs behind them, this is less than optimal, so we had to avoid being stalled by long discussions that dissolve without yielding an actual solution. Like so often, it can turn out to be more fruitful to confront people with an actual piece of software that they can touch and test. Feedback becomes more concrete, the limitations become better visible, and so do the good points. If a project finds rapid acceptance with users, developers are likely to become inspired and contribute, or eventually use some of the ideas for other projects. We are confident that the webcam framework will show some of the pros as well as the cons that a user mode library brings. Maybe one day somebody revives the project of a V4L2 user mode library and integrates parts of the webcam framework as a subset of its functionality, because that is where it would ideally lie.

5.3 Architecture overview
With a number of high-level goals in mind, we can start to translate these goals into an architecture of components and specify each component's tasks and interfaces. To start off, let us compare what the component stack looks like with the conventional approach on one side and with the webcam framework on the other.

From section 4.2 we already know how V4L2 interfaces with the UVC driver on one side and the webcam application on the other (figure 5.1a). The stack is relatively simple, as all data, i.e. control and video data, flows through V4L2 without V4L2 carrying out any processing itself. This approach is used by all current webcam applications and suffers from a few issues identified in section 4.5.

Figure 5.1b illustrates the different subsystems involved and where the core of the webcam framework is located. The webcam framework positions itself between the operating system and the application that receives live video from a camera. We see that the webcam framework fills a relatively small spot in the entire system, but it is one of the two interfaces that a webcam application uses to communicate with the camera.

Figure 5.1: Layer schema of the components involved in a video stream with (a) the conventional approach and (b) the webcam framework in action. Note the border between user space and kernel space and how both V4L2 and sysfs have interfaces to either side.

5.4 Components

5.4.1 Overview
Despite what the previous schema suggests, the Linux webcam framework is not a single monolithic component but a collection of different libraries with strictly separated tasks. This modularity ensures that no single component grows too complicated and that the package remains easy to maintain and use. This leaves the application the flexibility to choose for every task the component that performs it best: V4L2 for video streaming and related tasks such as frame format enumeration or stream setup, and the webcam framework for accessing camera controls and advanced features that require more detailed information than what V4L2 provides.

Figure 5.2 gives an overview of the entire framework in the context of the GStreamer and Qt based webcam application, as well as a panel application. Both these applications are provided as part of the package and can be seen in action in chapter 8. The dashed box shows the three components that use the GStreamer multimedia framework.

Figure 5.2: Overview of the webcam framework kernel space and user space components.

In the remainder of this section we will look at all of these components, what their tasks are, and what the interfaces between them look like.

While doing so we shall see how they accomplish the goals discussed above.

5.4.2 UVC driver
The UVC driver was already introduced in chapter 4, therefore we will only give a short recapitulation at this point. Its key tasks are:

• Supervise device enumeration and register the camera with the system.
• Respond to V4L2 requests originating from applications.
• Communicate with the camera using the UVC protocol over USB.
• Verify and interpret the received data.
• Provide additional interfaces for features not supported by V4L2.

It is the last of these points that makes it a key component in the webcam framework. The challenge stems from the fact that the functions described in the UVC standard are not a subset of those supported by V4L2. Conventional webcam drivers oriented themselves at the features supported by V4L2 and tried to implement these as far as possible. This was not an easy task, since the specifications available to the developers were often incomplete or even had to be reverse engineered from scratch. Therefore the necessity to support features unknown to V4L2 rarely arose. With the USB Video Class standard this is completely different. The standard is publicly available, and if both device manufacturers and driver engineers stick to it, compatibility comes naturally. It is therefore impossible for a Video4Linux application to make use of the entire UVC feature spectrum without resorting to interfaces that work in parallel to the V4L2 API. For the UVC driver, the sysfs virtual file system takes over this role: it provides raw access to user mode software in a generic manner.

5.4.3 V4L2
We have seen previously that Video4Linux has two key tasks relevant to webcams:

• Make the video stream captured by the device driver available to applications.
• Provide image and camera related controls to applications.

V4L2 is good at the first point, but it has some deficiencies when it comes to the second one due to its limitation of control values to 32 bits (see 4.5.3). This is why our scenario does not rely solely on V4L2 for webcam controls but uses the UVC driver's sysfs interface where necessary, all of this in parallel to the V4L2 API, which is still used for the entire video streaming part and provides support for a fairly general subset of the camera controls.

5.4.4 GStreamer
Parts of our webcam framework are built on top of GStreamer because, in our opinion, it is currently the most advanced multimedia framework on the Linux platform. Its integration with the GNOME desktop environment proves that it has reached a respectable grade of stability and flexibility, and Phonon, the multimedia framework of KDE 4, will have a back-end for GStreamer. Together with the ongoing intensive development that takes place, this makes it a safe choice for multimedia applications and is likely to guarantee a smooth integration into future software.

Technically speaking, GStreamer is currently the only framework supported by the Linux webcam framework. However, plugins for different libraries like NMM can be written very easily. All that needs to be ported in such a case is the lvfilter plugin, the interface between GStreamer and liblumvp. This will become clear as we talk more about the components involved.

Figure 5.3 visualizes this by comparing the component overview of a V4L2 application to a GStreamer application that uses a V4L2 video source. There are three elements in the figure that take advantage of the GStreamer multimedia framework. Note that even though the box labeled GStreamer is the "application" as far as V4L2 is concerned, all other components use techniques provided by the GStreamer library to exchange data.

Another important point is that V4L2 is not limited to talking to one application at a time. As long as the driver supports it–there is no multiplexing done on Video4Linux' part–the same device can be opened multiple times by one or more processes. This is required by the current webcam framework because the video application is not the only component to access the V4L2 device handle; applications use both V4L2 and the webcam framework simultaneously to access the device. We will see the different access scenarios as we go.

5.4.5 v4l2src
As the name already suggests, this plugin is the source of all V4L2 data that flows through the GStreamer pipeline. Of all the components in the schema, only the GStreamer v4l2src plugin uses the V4L2 API. It translates V4L2 device properties into pad capabilities and pipeline state changes into V4L2 commands. Note that v4l2src does not directly process the GStreamer state transitions but is based on the GstPushSrc plugin, which wraps those and uses a callback mechanism. Table 5.1 shows the functions that v4l2src uses and the V4L2 counterparts that they call. The capability negotiation that is carried out during stream initialization uses the information retrieved from V4L2 function calls like ENUMFMT or G_FMT to create a special data description format that GStreamer uses internally to check pads for compatibility. This is best illustrated by an example.

(a) Components involved when a V4L2 application displays video.
(b) Components involved when a GStreamer based application displays V4L2 video.

Figure 5.3: Component overview with and without the use of the GStreamer multimedia framework.

    GStreamer    V4L2                 Description
    start        open                 Initialization
    get_caps     ENUMFMT, TRY_FMT     Format enumeration
    set_caps     S_FMT, STREAMON      Stream setup
    create       DQBUF, QBUF, ...     Streaming
    stop         STREAMOFF, close     Cleanup

Table 5.1: Translation between GStreamer and V4L2 elements and functions.

There are two so-called caps descriptors involved in our example: the pad capabilities and the fixed capabilities. The former is created by enumerating the device features during the get_caps phase. It is a set that contains the supported range of formats, resolutions, and frame rates and looks something like this:

    video/x-raw-yuv, format=YUY2, width=[ 160, 1280 ],
        height=[ 120, 960 ], framerate=[ 5/1, 30/1 ];
    image/jpeg, width=[ 160, 960 ],
        height=[ 120, 720 ], framerate=[ 5/1, 25/1 ]

The format is mostly self-explanatory. The camera supports two pixel formats, YUV (uncompressed) and MJPEG (compressed), and the intervals give the upper and lower limits on frame size and frame rate. Note that the section for the uncompressed format has an additional format attribute that specifies the FourCC code. This is necessary for the pipeline to identify the exact YUV format used, as there are many different ones, with YUY2 being only one of them.

The descriptor for the fixed capabilities is set only after the set_caps phase, when the stream format has been negotiated with V4L2. This capability contains no ranges or lists but is a simple subset of the pad capabilities. After requesting an uncompressed VGA stream at 25 fps from the camera, it would look as follows:

    video/x-raw-yuv, format=YUY2, width=640, height=480, framerate=25/1

We can clearly see that the format chosen for the pipeline is a subset of the pad capabilities seen above. The intervals have disappeared and all attributes have fixed values now. All data that flows through the pipeline after the caps are fixed is of this format.

5.4.6 lvfilter
The Logitech video filter, or lvfilter for short, is also realized as a GStreamer plugin. Its task is relatively simple: intercept the video stream when enabled (filter mode) and act as a no-op when disabled (pass-through mode).
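From the application's point of view, enabling the filter is a matter of adding one element to the pipeline description. The sketch below builds such a pipeline with gst_parse_launch; apart from v4l2src and lvfilter, the element names are just common GStreamer plugins, and the device path is an example.

    #include <gst/gst.h>

    int main(int argc, char *argv[])
    {
        GError *error = NULL;
        GstElement *pipeline;
        GMainLoop *loop;

        gst_init(&argc, &argv);
        pipeline = gst_parse_launch(
            "v4l2src device=/dev/video0 ! lvfilter ! "
            "ffmpegcolorspace ! xvimagesink", &error);
        if (pipeline == NULL) {
            g_printerr("pipeline error: %s\n", error->message);
            return 1;
        }
        gst_element_set_state(pipeline, GST_STATE_PLAYING);
        loop = g_main_loop_new(NULL, FALSE);
        g_main_loop_run(loop);   /* stream until interrupted */
        return 0;
    }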

• Start. These features work with all webcams as long as the camera is supported by Linux and its driver works with the GStreamer v4l2src plugin. • Provide detailed information about the detected devices. and frame rate) and select one. use both V4L2 and the webcam framework simultaneously to access the device.e. • Provide unified access to V4L2 and sysfs camera controls. Today libwebcam provides the following core features: • Enumeration of all cameras available in the system. It is the third component in our schema that uses GStreamer and the only one with a user interface.8 libwebcam The Webcam library is a cornerstone of the webcam framework in that all other new components rely on it in one way or another.4. i. 5.g. On top of this basic functionality LVGstCap supports some additional features. We will talk about them in parts 2 and 3. • Wrapper for the V4L2 frame format enumeration.1b. image resolution. the interface is prepared to handle device events ranging from newly detected cameras over control value changes to device button events. let lvfilter be a no-op. in particular liblumvp. 50 . a combination of pixel format.e. Being more than only an important technical element. and freeze the video stream.7 LVGstCap (part 1 of 3: video streaming) The sample webcam software provided as part of the framework is LVGstCap. contrast. stop. For the moment. the Logitech Video GStreamer Capture application. LVGstCap is also the first webcam capture program to use the approach depicted in 5.We will come back to the functionality of lvfilter when we look at some of the other components. libwebcam realizes part of what the Video4Linux user space library was always supposed to be: an easy to use library that shields its users from many of the difficulties and problems of using the V4L2 API directly. • List the available frame formats (i. This fact remains completely transparent to the user as everything is nicely integrated into a single interface.4. Among others. LVGstCap provides the basic features expected from a webcam capture application: • List the available cameras and select one. In addition. • Modify image controls (e. sharpness). 5. brightness. It is easy to add new features without breaking application compatibility and the addition of new controls or events is straightforward.

5.4.9 libwebcampanel
The Webcam panel library takes libwebcam one step further. It combines internal information about specific devices with the controls provided by libwebcam to provide applications with meta information and other added value. The core features of libwebcampanel are:

• Provide meta data that applications need to display camera information and user-friendly control elements.
• Implement a superset of libwebcam's functionality.
• Give access to the feature controls that liblumvp provides.

The last point of the above list will become clear when we discuss liblumvp. We can see that the main goal of libwebcampanel is making the development of generic webcam applications easier. This makes it a common repository for device-specific information that would otherwise be distributed and duplicated within various applications. It is for this reason that most applications will want to use libwebcampanel instead of the lower-level libwebcam.

While libwebcam is still relatively low-level and does not interpret any of the controls or events directly, libwebcampanel does just that. Ordinarily, in the case of V4L2 controls, there is no additional information on a control apart from the value range and whether the control is a number, a Boolean, or a list of choices. While most controls can be made to fit in one of these categories, in practice there are a number of controls for which this representation is not quite right. Two examples are controls whose value is a bitmask and read-only controls. In the former case it seems inappropriate to present the user with an integer control that accepts values from, say, 0 to 255 when each bit has a distinct meaning. libwebcampanel might transform such a control either into a list of eight choices if the bits are mutually exclusive, or split it up into eight different Boolean controls if arbitrary bit combinations are allowed. In the case of read-only controls, the user should not be allowed to change the GUI element but still be able to read its current value. We will see a few concrete examples of such cases later in chapter 7.

Before doing so, let us look at LVGstCap one more time to see how it uses the control meta information.

5.4.10 LVGstCap (part 2 of 3: camera controls)
When the user selects a device in LVGstCap, it immediately enumerates the controls that the chosen device provides and displays them in a side panel. This allows LVGstCap to display the controls in a generic manner. Therefore, if libwebcampanel sets the read-only flag on a certain control, LVGstCap will disable user interaction with it and gray it out to make this fact visually clear to the user.

5.4.11 liblumvp
The name liblumvp stands for Logitech user mode video processing library. It is the only component of the webcam framework that is not open source, because it contains Logitech intellectual property. liblumvp consists of a fairly simple video pipeline that passes the video data it receives through a list of plugins that can process and modify the images before they are output again. One can think of a multitude of plugins that liblumvp could include; basically it could implement all the features that Logitech QuickCam provides on Windows.

The library receives all its input from lvfilter. Whenever lvfilter is in filter mode, it sends the video data it intercepts to liblumvp and uses the–possibly modified–video buffer it receives back as its output. All of this remains transparent to the application.²

Applications must be able to communicate with these plugins, for example to enable or disable them or change certain parameters. For this reason, the library exposes a number of controls, so-called feature controls, in a manner almost identical to how libwebcam does it. This is where the second reason for the additional layer introduced by libwebcampanel lies: it can provide applications with a list of hardware camera controls on the one hand and a list of liblumvp software controls on the other hand. Applications can handle both categories in an almost symmetric manner.³

5.4.12 LVGstCap (part 3 of 3: feature controls)
LVGstCap uses libwebcampanel not only for presenting camera controls to the user but also for feature controls if liblumvp is currently enabled. When a video stream is started, the feature control list is retrieved and its control items are displayed to the user in a special tab next to the ordinary controls. The application also has access to the names of the different features that liblumvp has compiled in. This information can be used to group the controls into categories when required.

When the user changes a feature control, LVGstCap communicates this to libwebcampanel, which takes care of the communication with liblumvp. In the example of a video application that incorporates both video output and control panel in a single process, there is no need for special measures. There is, however, a case where this does not hold true: panel applications. We will later see that this communication is not as trivial in all cases as it may look at first.

² As a matter of fact, the application must explicitly include lvfilter in its GStreamer pipeline, but once the pipeline stands, its presence is transparent and needs no further attention. We will see the advantages and disadvantages of this in chapter 7.
³ The reasons why the two are not treated exactly the same are explained in chapter 7.

5.4.13 lvcmdpanel

A panel application is a (usually simple) program that does not do any video handling itself but allows the user to control a video stream that is currently active in another application. There are a few situations where panel applications are useful:

• Allow command line tools or scripts to modify video stream parameters.
• Provide an additional way of changing controls, e.g. from a tray application.
• Permit control over the video stream of an application that does not have its own control panel.

Our webcam framework includes an example application of the first kind, a command line tool called lvcmdpanel. Figure 5.4 shows the output of the help command. Chapter 8 has a sample session to illustrate some of the commands.

    lvcmdpanel 0.1
    Control webcam video using the command line
    Usage: lvcmdpanel [OPTIONS]... [VALUES]...

      -h, --help                Print help and exit
      -V, --version             Print version and exit
      -v, --verbose             Enable verbose output
      -d, --device=devicename   Specify the device to use
      -l, --list                List available cameras
      -c, --clist               List available controls
      -g, --get=control         Retrieve the current control value
      -s, --set=control         Set a new control value

Figure 5.4: Command line options supported by lvcmdpanel.

5.5 Flashback: current problems

In chapter 4 we discovered a number of issues that current V4L2 applications have to deal with. Let us now revisit them one by one and show how our webcam framework avoids or solves them. Note that we do not go into great technical detail here but save those details for chapter 7.

Avoid kernel mode components

Apart from some work on the UVC driver and V4L2 that is necessary to exploit the full feature set provided by current webcams, the entire framework consists of user mode components. This demonstrates that there are good ways to realize video processing and related tasks in user mode today and that for most of the associated drawbacks good solutions can be found.
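A typical interaction could look as follows. The control name brightness and the device name video0 are illustrative only; chapter 8 contains an actual sample session.

    lvcmdpanel --list                          # show available cameras
    lvcmdpanel -d video0 --clist               # list the controls of a camera
    lvcmdpanel -d video0 --get=brightness      # read a control value
    lvcmdpanel -d video0 --set=brightness 128  # write a control value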

Complicated device enumeration

Applications should not have to loop through the huge number of device nodes in the system and filter out the devices they can handle. This approach requires the applications to know criteria they should not have to know, like the decision whether a given device node is a video device or not. If these criteria change, all applications have to be updated, which is a big problem if certain programs are no longer maintained. This problem is solved by the device enumeration function of libwebcam.

No stateless device information querying

It seems unnecessary to open a device just to retrieve its name and other information an application may want to present to its user. In the same way that listing the contents of a directory with ls does not open each single file, it would be desirable to query the device information at enumeration time. libwebcam does this by maintaining an internal list of camera devices that contains such data. It can be retrieved at any time by any application without opening a V4L2 device.

Missing frame format enumeration

As we will see later on, this problem was solved by adding the missing functionality directly to V4L2, with the UVC driver being the first one to support it. In addition, libwebcam has a wrapper for frame format enumeration that severely reduces the complexity associated with retrieving the supported frame formats.

Simple API

We have seen that mechanisms such as function callbacks are valuable, if not indispensable, for certain features like event notification. The webcam framework provides the corresponding interfaces, which can be used as soon as the kernel space components implement the necessary underlying mechanisms. While some V4L2 functions like frame format enumeration can require dozens of ioctl calls and the management of dynamic data structures in the client, our framework allows all enumeration data to be retrieved in two function calls: the first one returns the required buffer size and the second one returns the data in one self-contained block of memory. The complexity on the application's side is minimal and so is the overhead. In addition, the enumeration APIs that our libraries provide are superior in terms of usability to those that V4L2 offers.

Direct device access

While direct device access can never be achieved without the support of select kernel mode components, we tackled this problem by extending the UVC driver so that it allows user mode applications to access the full spectrum of UVC extensions. With the help of sysfs, we have developed an interface that is superior to any standard C interface in that it allows shell scripts and system commands to access the hardware in an intuitive way.

Lack of current documentation

While we have not solved the problem of parts of the V4L2 documentation being outdated or incomplete, we did make sure that all libraries that application developers can interact with are thoroughly documented. An extensive API specification is available in HTML format. In addition, this report gives a vast amount of design and implementation background. This is a big advantage for developers who want to use parts of the webcam framework for their own applications.

The next two chapters are devoted to the more technical details of what was presented in this chapter. We will first look at the extensions and changes that were applied to currently existing components before we focus on the newly developed aspects of the webcam framework.

Chapter 6

Enhancing existing components

In order to realize the webcam framework as it was described in the previous chapter, a few extensions and changes to existing components were necessary. These range from small patches that correct wrong or inflexible behavior to rewrites of bigger software parts. This chapter sums up the most important of these and lists them in the order of their importance.

6.1 Linux UVC driver

With UVC devices being at the center of the Linux webcam framework, the UVC driver was the main focus of attention as far as preexisting components are concerned. The following sections describe some important changes and give an outlook of what is about to change in the near future.

6.1.1 Multiple open

From chapter 5 we know that multiple open is a useful feature to work around some of V4L2's limitations. Since the webcam framework relies on the camera driver being able to manage multiple simultaneously opened file handles to a given device, this was one of the most important extensions to the UVC driver.

The main challenges when developing a concept for multiple device opening are permissions and priorities. As with ordinary file handles, where the operating system must make sure that readers and writers do not disrupt each other, the video subsystem must make sure that two video device handles cannot influence each other in unwanted ways. Webcam drivers that are unable to multiplex the video stream must make sure that only a single device handle is streaming at a time. While this seems easy enough to do, the problem arises because the concept of "streaming" is not clearly definable. When does streaming start? When does it stop?

There are several steps involved between when an application decides to start the video stream and when it frees the device again:

1. Open the device.
2. Set up the stream format.
3. Start the stream.
4. Stop the stream.
5. Close the device.

Drawing the line at the right place is a trade-off between preventing ill interactions on the one hand and allowing a maximum of parallel access on the other. We decided to make the boundary right before the stream setup. To this end we divided the Video4Linux functions into privileged (or streaming) ioctls and unprivileged ioctls and introduced a state machine for the device handles (figure 6.1).

Figure 6.1: The state machine for the device handles of the Linux UVC driver used to guarantee device consistency for concurring applications. The rounded rectangles show which ioctls can be carried out in the corresponding state.

There are four different states:

• Closed: The first unprivileged state. While not technically a state in the software, this state serves as a visualization for all inexistent handles that are about to spring into existence when they are opened by an application. It is also the state that all handles end up in when the application closes them.

• Passive: The second unprivileged state. Every handle is created in this state. It stands for the fact that the application has opened the device but has not yet made any steps towards starting the stream. Querying device information or enumerating controls can already happen in this state.

• Active: The first privileged state. A handle moves from passive to active when it starts setting up the video stream. Four ioctls can be identified in the UVC driver that applications use before they start streaming: TRY_FMT, S_FMT, and S_PARM for stream format setup and REQBUFS for buffer allocation. As soon as an application calls one of these functions, its handle moves into the active state, unless there already is another handle for the same device in a privileged state, in which case an error is returned.

• Streaming: The second privileged state. Using the STREAMON ioctl lets a handle move from active to streaming. Obviously only one handle can be in this state at a time for any given device because the driver made sure that no two handles could get into the active state in the first place.

The categorization of all ioctls into privileged and unprivileged ones not only yields the state transition events but also decides which ioctls can be used in which states. Table 6.1 contains a list of privileged ioctls. Also note that the only way for an application with a handle in a privileged state to give up its privileges is to close the handle.

    ioctl       Description
    S_INPUT     Select the current video input (no-op in uvcvideo).
    QUERYBUF    Retrieve information about a buffer.
    QBUF        Queue a video buffer.
    DQBUF       Dequeue a video buffer.
    STREAMON    Start streaming.
    STREAMOFF   Stop streaming.

Table 6.1: Privileged ioctls in the uvcvideo state machine used for multiple open.

This schema guarantees that different device handles for the same device can perform the tasks required for panel applications and the Linux webcam framework while ensuring that the panel application cannot stop the stream or change its attributes in a way that could endanger the video application.
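The following C sketch captures the gist of this privilege check. The identifiers and structures are simplified stand-ins for illustration, not the actual uvcvideo code:

    #include <errno.h>
    #include <linux/videodev2.h>

    enum handle_state { UVC_HANDLE_PASSIVE, UVC_HANDLE_ACTIVE, UVC_HANDLE_STREAMING };

    struct uvc_device;
    struct uvc_handle {
        enum handle_state state;
        struct uvc_device *dev;
    };
    struct uvc_device {
        struct uvc_handle *owner;      /* handle currently holding privileges */
    };

    /* Returns nonzero for ioctls that require (or acquire) privileges. */
    static int uvc_ioctl_privileged(unsigned int cmd)
    {
        switch (cmd) {
        case VIDIOC_TRY_FMT:  case VIDIOC_S_FMT:   case VIDIOC_S_PARM:
        case VIDIOC_REQBUFS:  case VIDIOC_S_INPUT: case VIDIOC_QUERYBUF:
        case VIDIOC_QBUF:     case VIDIOC_DQBUF:
        case VIDIOC_STREAMON: case VIDIOC_STREAMOFF:
            return 1;
        default:
            return 0;
        }
    }

    /* Called before an ioctl is dispatched: a passive handle may become
     * active only while no other handle of the same device is privileged. */
    static int uvc_acquire_privileges(struct uvc_handle *h, unsigned int cmd)
    {
        if (!uvc_ioctl_privileged(cmd))
            return 0;                     /* unprivileged: always allowed */
        if (h->dev->owner != NULL && h->dev->owner != h)
            return -EBUSY;                /* another handle is privileged */
        h->dev->owner = h;
        if (h->state == UVC_HANDLE_PASSIVE)
            h->state = UVC_HANDLE_ACTIVE; /* STREAMON moves it on later */
        return 0;
    }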

6.1.2 UVC extension support

We saw in section 2.4, when we discussed the USB Video Class specification, that extension units are important for device manufacturers to add additional features. UVC drivers should have an interface that allows applications to access these extension units. Otherwise, they may not be able to exploit the full range of device capabilities.

Raw extension control support through sysfs

The first and obvious way to expose UVC extension controls in a generic way is to give applications raw access. Under Linux, sysfs is an ideal way to realize such an interface. Extensions and their controls are mapped to a hierarchical structure of virtual directories and files that applications can read from and write to. Let us look at a simplified example of such a sysfs directory structure:

    extensions/
    |-- 63610682-5070-49AB-B8CC-B3855E8D221D
    |-- 63610682-5070-49AB-B8CC-B3855E8D221E
    |-- 63610682-5070-49AB-B8CC-B3855E8D221F
    +-- 63610682-5070-49AB-B8CC-B3855E8D2256
        |-- ctrl_1
        +-- ctrl_2
            |-- cur
            |-- def
            |-- info
            |-- len
            |-- max
            |-- min
            +-- res

We can see that the camera supports four different extension units, each of which is identified by a unique ID. The contents of the last one show two controls, one of which has its virtual files visible. All these files correspond directly to the UVC commands of the same name. For example, the read-only files def and len map to GET_DEF and GET_LEN. In the case of the only writable file, cur, there are two corresponding UVC commands: GET_CUR and SET_CUR. Whatever is written to the cur file is wrapped within a SET_CUR command and sent to the device. In the opposite case, where an application opens cur and reads from it, the driver creates a GET_CUR request, sends it to the device, and turns the device response into the file contents.

During this whole process no interpretation of the relayed data is done on the driver's side. The files are treated like binary files, i.e. what the application writes to the file is sent as is to the device, and what the application reads from the file is the same buffer that the driver has received from the device, followed by an end-of-file marker. If an error occurs during the process, the corresponding read or write call returns an error message.
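Given the structure above, a shell user or script can talk to an extension control directly. The /sys/... prefix is abbreviated here because the exact location of the extensions/ directory within sysfs depends on the driver version; the written bytes are arbitrary example values:

    cd /sys/.../extensions/63610682-5070-49AB-B8CC-B3855E8D2256/ctrl_2
    od -A n -t u1 len           # GET_LEN: size of the control data
    od -A n -t x1 cur           # GET_CUR: dump the current raw value
    printf '\x01\x00' > cur     # SET_CUR: send two raw bytes to the device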

Through sysfs, this approach works well and is supported by our extended UVC driver. There is, however, a limitation associated with it that has to do with the way that ownership and permissions are set on these virtual files. This can lead to security issues on multi-user machines, as section 7.5.1 will show. Another problem with this approach of using raw data is that applications must know exactly what they are doing. This is undesirable in the case of generic applications because the knowledge has to be duplicated in every single one of them. Obviously, it would be easier to just use a library like libwebcam or libwebcampanel that can wrap any sort of controls behind a simple and consistent interface, but there are situations where this may not be an option, for example in the case of applications that are no longer maintained. The following section describes a possible way to resolve this issue.

Mapping UVC to V4L2 controls

V4L2 applications cannot use the raw sysfs controls unless they include the necessary tools and knowledge. If such an application has functions to enumerate V4L2 controls and present them in a generic manner, then all it would take to allow the program to use UVC extension controls is a mapping between the two. Figure 6.2 gives a schema of such a mapping. Designing and implementing a flexible mechanism that can cover most of the cases to be expected in the foreseeable future is an ongoing process for which the ground stones were laid as part of this project.

One of the assumptions we made was that there could be a 1:n mapping between UVC and V4L2 controls but not in the opposite direction. The rationale behind this is that V4L2 controls must already be as simple as possible and sensible, since the application is in contact with them. For UVC controls, however, due to the limited control description capabilities of UVC, it is conceivable that a device would pack multiple related settings into a single control¹. If that is the case, applications should see multiple V4L2 controls without knowing that the driver maps them to one and the same UVC control in the background.

The next fundamental point was the question where the mapping definitions should come from. The obvious answer is from the driver itself, but with the perspective of an increasing release frequency of new UVC devices in mind this cannot be the final answer. It would mean that new driver versions would have to be released on a very frequent basis only to update the mappings. We therefore came to the conclusion that the driver should hardcode as few control mappings as possible, with the majority coming from user space. The decision on how such mappings are going to be fed to the driver has not yet been made. Two solutions seem reasonable:

1. User space applications could write mapping data to a sysfs file and the driver would generate a mapping from the data. The main challenge here would be to find a reasonable format that is both human-readable and easily parseable by the driver. XML would be ideal for the first requirement, but a driver cannot be expected to parse XML. Binary data would be easier for the driver to parse but contradicts the philosophy of sysfs, after which exchanged data should be human-readable. Whatever the format looks like, the mapping setup would be as easy as redirecting a configuration file to a sysfs file.

2. Through custom ioctls. For the driver side the same argument as for a binary sysfs file applies here, with the difference that ioctls were designed for binary data. The drawback is that a specialized user space application, such as a control daemon, would be necessary to install the mapping data.

The future will show which way turns out to be the best to manage the mapping configuration from user space. For the moment, we restrict ourselves to hardcoded mappings.

Figure 6.2: Schema of a UVC control to V4L2 control mapping. The UVC control descriptor contains information about how to locate and access the UVC control. The V4L2 control part has attributes that determine offset and length inside the UVC control as well as the properties of the V4L2 control.

Internally the driver manages a global list of control descriptors with their V4L2 mappings. A device-dependent list of controls, the so-called control instances, is used to store information about each device's controls, like the range of valid values.

¹ As a matter of fact, we shall see such an example in section 7.3.1.
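To make the discussion more concrete, a human-readable mapping definition could hypothetically look like the following. This format is purely invented for illustration, since, as discussed above, no format has been adopted; the V4L2 control IDs are likewise made up:

    # One UVC extension control, identified by its unit GUID and selector,
    # split into two V4L2 controls (offsets and sizes in bits):
    uvc_control  63610682-5070-49AB-B8CC-B3855E8D2256  selector=5  size=32
    v4l2_control id=0x0A046D01 offset=0  size=16 type=signed name="Pan (relative)"
    v4l2_control id=0x0A046D02 offset=16 size=16 type=signed name="Tilt (relative)"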

When a control descriptor is added, the driver loops through all devices and adds a control instance only if the device in question supports the new control. This process required another change to the driver's architecture: the addition of a global device list. Many drivers do not need to maintain an internal list of devices because almost all APIs provide space for a custom pointer in the structures that they make available when they call an application. Such a pointer allows for better scaling and less overhead because the driver does not have to walk any data structures to retrieve its internal state. This is indispensable for performance critical applications and helps simplify the code in any case. The Linux UVC driver also uses this technique whenever possible, but for adding and removing control mappings it must fall back to using the device list. Luckily, this does not cause any performance problems because these are exceptional events that do not occur during streaming.

Once all the data structures are in place, the V4L2 control access functions must be rewritten to use the mappings. Laurent Pinchart is currently working on this as part of his rewrite of the control code that fixes a number of other small problems. While currently not more than a point of discussion and an entry on the wish list, it is likely that Video4Linux will eventually receive such a control mapping layer.

6.1.3 V4L2 controls in sysfs

In connection with the topics mentioned above there is an interesting discussion going on whether all V4L2 controls could be exposed to sysfs by default and in a generic manner. The idea comes from the pvrusb2 driver[10], which does just that. What originally started out as a tool for debugging turned out to be a useful option for scripting the supported devices. Given the broad application scenarios and generic nature of the feature, it would be optimal if the V4L2 core took care of automatically exposing all device controls to sysfs in addition to the V4L2 controls that are available today. It would complete the sysfs interface of uvcvideo in a very nice manner and open the doors for entirely new tools. If such an interface became reality, libwebcam could automatically use it if the current driver does not support multiple opening of the same device, because this would prevent it from using the V4L2 controls that it uses now. This switch would be completely transparent to users of libwebcam.

6.2 Video4Linux

In section 4.5.3 we saw a number of issues that developers of software using the current version of V4L2 have to deal with. While most of them could not be fixed without breaking backwards compatibility, the most severe one, the lack of frame format enumeration, was relatively easy to overcome.
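As an illustration of the kind of scripting such an interface enables, controls exposed as sysfs attributes could be manipulated along these lines. The paths and attribute names are hypothetical; pvrusb2's actual layout differs in its details:

    cat /sys/class/video4linux/video0/device/brightness        # read a control
    echo 128 > /sys/class/video4linux/video0/device/brightness # set a new value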

V4L2 currently provides a way to enumerate a device's supported pixel formats using the VIDIOC_ENUM_FMT ioctl. It does this by using the standard way for list enumeration in V4L2: the application repeatedly calls a given ioctl with an increasing index, starting at zero, and receives all the corresponding list entries in return. If there are no more entries left, i.e. the index is out of bounds, the driver returns the EINVAL error value. There are two fundamental problems with this approach:

• Application complexity. This shifts the complexity towards the application. The application cannot know how many entries there are in the list. Using a single dynamically allocated memory buffer is therefore out of the question unless the buffer size is chosen much bigger than the average expected size. The only reliable and scalable way is to build up a linked list within the application and add an entry for each ioctl call, something that should be avoided by an API in order to encourage developers to use it in the first place and to discourage possibly unreliable hacks.

• Non-atomicity. If the list that the application wants to enumerate does not remain static over time, there is always a chance that the list changes while an application is enumerating its contents. If this happens, the received data is inevitably inconsistent, leading to unexpected behavior in the best case or crashes in the worst case. The first idea for a workaround that comes to mind is that the driver could return a special error value indicating that the data has changed and that the application should restart the enumeration. Unfortunately this does not work because the driver has no way of knowing if an application is currently enumerating at all. Nothing forbids the application to start with a different index than zero or to quit the enumeration process before the driver has had a chance to return the end of list marker.

The advantage of the second approach is its obvious simplicity for the driver side. It is left up to the reader to decide whether driver simplicity justifies the above problems. When we decided to add frame size and frame rate enumeration, our first draft would have solved both of these problems at once: the entire list would have been returned in a single buffer, making it easy for the application to parse on the one hand and rendering the mechanism insusceptible to consistency issues on the other. The draft received little positive feedback, however, and we had to settle for a less elegant version that we present in the remainder of this section.

No matter what enumeration approach is chosen, an important point must be kept in mind: the attributes pixel format, frame size, and frame rate are not independent of each other. For any given pixel format, there is a list of supported frame sizes, and any given combination of pixel format and frame size determines the supported frame rates. This seems to imply a certain hierarchy of these three attributes, but it is not necessarily clear what this hierarchy should look like.

However, for users it may not be obvious why they should even care about the pixel format. A video stream should mainly have a large enough image and a high enough frame rate. The pixel format, and whether compression is used, is just a technicality that the application should deal with in an intelligent and transparent manner. As a result, a user might prefer a list of frame sizes to choose from first and, possibly, a list of frame rates as a function of the selected resolution. Technical details, like the UVC descriptor format, suggest the following hierarchy:

1. Pixel format
2. Frame size
3. Frame rate

The highest level has no dependency on lower levels; the lower levels have dependencies on only the higher levels. In order to keep the V4L2 frame format enumeration API consistent with the other layers, we decided to leave the hierarchy in the order mentioned above. An application can still opt to collect the entire attribute hierarchy and present the user with a more suitable order.

Once such a hierarchy has been established, the input and output values of each of the enumeration functions become obvious. Table 6.2 gives the situation for the three attributes used by webcams. This mechanism can theoretically be extended to an arbitrary number of attributes, although in practice there are limits to what can be considered a reasonable number of input values.

    Enumeration attribute   Input parameters              Output values
    Pixel format            none                          Pixel formats
    Frame size              Pixel format f                Frame sizes supported for frame format f
    Frame rate              Pixel format f, Frame size s  Frame rates supported for frame format f and frame size s

Table 6.2: Input and output values of the frame format enumeration functions.

As it happens, the V4L2 API already provided a function for pixel format enumeration, which means that it could be seamlessly integrated with our design for frame size and frame rate enumeration. These functions are now part of the official V4L2 API, the documentation for which can be found at [5].
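The three levels of the hierarchy translate into three nested enumeration loops. The following C sketch uses the ioctls as they ended up in the official V4L2 API; error handling is reduced to stopping on EINVAL, and the stepwise/continuous variants are skipped for brevity:

    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    void enum_frame_formats(int fd)
    {
        struct v4l2_fmtdesc fmt;
        struct v4l2_frmsizeenum size;
        struct v4l2_frmivalenum ival;

        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        for (fmt.index = 0; ioctl(fd, VIDIOC_ENUM_FMT, &fmt) == 0; fmt.index++) {
            printf("format: %s\n", fmt.description);

            memset(&size, 0, sizeof(size));
            size.pixel_format = fmt.pixelformat;
            for (size.index = 0;
                 ioctl(fd, VIDIOC_ENUM_FRAMESIZES, &size) == 0; size.index++) {
                if (size.type != V4L2_FRMSIZE_TYPE_DISCRETE)
                    continue;       /* stepwise/continuous ranges not shown */
                printf("  %ux%u\n", size.discrete.width, size.discrete.height);

                memset(&ival, 0, sizeof(ival));
                ival.pixel_format = fmt.pixelformat;
                ival.width  = size.discrete.width;
                ival.height = size.discrete.height;
                for (ival.index = 0;
                     ioctl(fd, VIDIOC_ENUM_FRAMEINTERVALS, &ival) == 0;
                     ival.index++) {
                    if (ival.type != V4L2_FRMIVAL_TYPE_DISCRETE)
                        continue;
                    printf("    %u/%u s per frame\n",
                           ival.discrete.numerator, ival.discrete.denominator);
                }
            }
        }
    }

Note how each inner loop takes exactly the input parameters listed in table 6.2.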

6.3 GStreamer

GStreamer has had V4L2 support in the form of the v4l2src plugin for a while, but it had not received any testing with webcams using the UVC driver, and there were two issues with unsupported ioctls and the frame rate computation. There is a particularity about the UVC driver that causes it not to work with a few applications: notably, the absence of the VIDIOC_G_PARM and VIDIOC_S_PARM ioctls that do not apply to digital devices. The GStreamer V4L2 source was one of these applications that relied on these functions being present and failed in the adverse case. After two small patches, the first to remove the above dependency and the second to fix a small bug in the frame rate detection code, the v4l2src plugin worked great with UVC webcams and proved to be a good choice as a basis for our project. In September 2006, Edgard Lima, one of the plugin's authors, added proper support for frame rate negotiation using GStreamer capabilities, which allows GStreamer applications to take full advantage of the spectrum of streaming parameters.

6.4 Bits and pieces

Especially during the first few weeks, the project involved a lot of testing and bug fixing in various applications. Some of these changes are listed below.

Ekiga: During some tests with a prototype camera a bug in the JPEG decoder of Ekiga became apparent. The JPEG standard allows an encoder to add a customized Huffman table if it does not want to use the one defined in the standard. The decoder did not process such images properly and failed to display the image as a result.

Spca5xx: The Spca5xx driver already supports a large number of webcams, as we saw in section 4.3.2, and the author relies to a large part on user feedback to maintain his compatibility list. We also did some tests at Logitech with a number of our older cameras and found a few that were not recognized by the driver but would still work with it after patching its USB PID list.

luvcview: The luvcview tool had a problem with empty frames that could occur with certain cameras and which would make the application crash. This was fixed as part of a patch that added two different modes for capturing raw frames, very similar to those in GStreamer's v4l2src. One mode writes each received frame into a separate file (raw frame capturing), the other one creates one single file where it stores the complete video stream (raw frame stream capturing). The first mode can be used to easily capture frames from the camera, although, depending on the pixel format, the data may require some post processing, e.g. the adding of an image header.

Chapter 7

New components

Chapter 5 gave an overview of our webcam framework and described its goals without going into much technical detail. This chapter is dedicated to elaborating how some of these goals were achieved and implemented. It will also explain the design decisions and why we have chosen certain solutions over others. At the same time we will show the limitations of the current solution and their implications towards future extensibility. Another topic of this chapter is the licensing model of the framework, a crucial topic of any open source project. We will also give an outlook on future work and opportunities.

7.1 libwebcam

The goals of the Webcam library, or simply libwebcam, were briefly covered in section 5.4.8. The API is described in great detail in the documentation that comes with the sources. The functions can be grouped into the following categories:

• Initialization and cleanup
• Opening and closing devices
• Device enumeration and information retrieval
• Frame format enumeration
• Control enumeration and usage
• Event enumeration and registration

The general usage is rather simple. Each application must initialize the library before it is first used. This allows the library to properly set up its internal data structures. The client can now continue by either enumerating devices or, if it already knows which device it wants to open, directly go ahead and open a device. If a device was successfully opened, the library returns a handle that the application has to use for all subsequent requests.

This handle is then used for tasks such as enumerating frame formats or controls and reading or writing control values. Once the application is done, it should close the device handles and uninitialize the library to properly free any resources that the library may have allocated.

Let us now look at a few implementation details that are important for application developers using libwebcam to know.

7.1.1 Enumeration functions

All enumeration functions use an approach that makes it very easy for applications to retrieve the contents of the list in question. This means that enumeration usually takes exactly two calls: the first one to determine the required buffer size, the second one to fill the buffer. In the rare occasion where the list changes between the two calls, a third call can be necessary, but with the current implementation this situation can only arise for devices. The following pseudo code illustrates the usage of this enumeration schema from the point of view of the application:

    buffer := NULL
    buffer_size := 0
    required_size := c_enum(buffer : NULL, size : buffer_size)
    while (required_size > buffer_size)
        buffer_size := required_size
        buffer := allocate_memory(size : buffer_size)
        required_size := c_enum(buffer : buffer, size : buffer_size)

Obviously, the syntax of the actual API looks slightly different, and applications must do proper memory management and error handling. Even though the buffer can contain variable-sized data, it can be treated as an array through which the application can loop. Another aspect that makes this type of enumeration very easy for its users is that the buffer is completely self-contained. Figure 7.1 illustrates the memory layout of such a buffer.

7.1.2 Thread-safety

The entire library is programmed in a way that makes it safe to use from multi-threaded applications. All internal data structures are protected against simultaneous changes from different threads that could otherwise lead to inconsistent data or program errors. Since most GUI applications are multi-threaded, this spares the application developer from taking additional steps to prevent multiple simultaneous calls to libwebcam functions.

Figure 7.1: Illustration of the memory block returned by a libwebcam enumeration function. The buffer contains three list items and a number of variable-sized items (strings in the example). Each list item has four words of fixed-sized data and two char pointers. The second item shows pointers to two strings in the variable-sized data area at the end of the buffer; pointers can also be NULL, in which case there is no space reserved for them to point to. Note that only the pointers belonging to the second item are illustrated.
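Translated into C, the two-call schema above could be used as follows. The function c_enum and the item type c_item are placeholders standing in for the actual libwebcam names, whose exact signatures are documented with the sources:

    #include <stdlib.h>

    /* Placeholder declarations; see the libwebcam API documentation
     * for the real names, signatures and error codes. */
    typedef struct c_item c_item;
    unsigned int c_enum(c_item *buffer, unsigned int buffer_size);

    c_item *enumerate_list(void)
    {
        c_item *buffer = NULL;
        unsigned int buffer_size = 0;
        unsigned int required_size = c_enum(NULL, 0);    /* query size */

        while (required_size > buffer_size) {            /* usually one pass */
            buffer_size = required_size;
            buffer = realloc(buffer, buffer_size);
            if (buffer == NULL)
                return NULL;
            required_size = c_enum(buffer, buffer_size); /* fill buffer */
        }
        return buffer;   /* self-contained block; caller free()s it */
    }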

7.2 liblumvp and lvfilter

The Logitech user mode video processing library is in some ways similar to libwebcam. It also provides controls, as we have seen in section 5.4.11, and its interface is very similar when it comes to library initialization/cleanup or control enumeration. The function categories are:

• Initialization and cleanup
• Opening and closing devices
• Video stream initialization
• Feature enumeration and management
• Feature control enumeration and usage
• Video processing

In our webcam framework, liblumvp is not directly used by the application. Instead, its two clients are lvfilter, the video interception filter that delivers video data, and libwebcampanel, from which it receives commands directed at the features that it provides. Nothing, however, prevents an application from directly using liblumvp, apart from the fact that this would make the application directly dependent on a library that was designed to act transparently in the background.

lvfilter hooks into the GStreamer video pipeline, where it influences the stream capability negotiation in a way that makes sure that the format is understood by liblumvp. It then initializes the latter with the negotiated stream parameters and waits for the pipeline state to change. When the stream starts, it redirects all video frames through liblumvp, where they can be processed and, possibly, modified before it outputs them to the remaining elements in the pipeline.

We have mentioned that applications must explicitly make use of liblumvp by including lvfilter in their GStreamer pipeline. This has positive and negative sides. The list of drawbacks is led by the fact that it does not smoothly integrate into existing applications and that each application must test for the existence of lvfilter if it wants to use the extra features. It is this very fact, however, that can also be seen as an opportunity. Some users do not like components to work transparently, either because they could potentially have negative interactions that would make problems hard to debug or because they do not trust closed source libraries. While lvfilter takes care of the proper initialization of liblumvp, it does not use the feature controls that liblumvp provides. Interaction with these happens through libwebcampanel, as we will see shortly.

Before we move on to the next topic, a few words about the two plugins that are currently available:

Mirror: The first, very simple plugin is available on any camera and lets the user mirror the image vertically and horizontally. While the first one can be used to turn a webcam into an actual mirror, the second one can be useful for laptops with cameras built into the top of the screen, because these are usually rotatable by 180 degrees along the upper edge and allow switching between targeting the user and what is in front of the user.

Face tracking: This module corresponds closely to what users of the QuickCam software on Windows know as the "Track two or more of us" mode of the face tracking feature. The algorithm detects people's faces and zooms in on them, so that they are better visible when the user moves away from the camera. If the camera supports mechanical pan and tilt, like the Logitech QuickCam Orbit, it does so by moving the lens head in the right direction. For other cameras the same is done digitally. This feature is only available for Logitech cameras that are UVC compatible.

In the future, more features from the Logitech QuickCam software will be made available for Linux users through similar plugins.

7.3 libwebcampanel

The interface of the Webcam panel library is very similar to the one provided by libwebcam. This was a design decision that should make it easy for applications that started out using libwebcam to switch to libwebcampanel when they want more functionality.

7.3.1 Meta information

Section 5.4.9 gave a high-level overview of what sort of information filtering libwebcampanel adds on top of what libwebcam provides. Let us look at these in more detail.

• Devices
  – Camera name change: The camera name string in libwebcam comes from the V4L2 driver and is usually generic. In the case of the UVC driver it is always "USB Video Class device", which is not very helpful for the user who has three different UVC cameras connected. For this reason libwebcampanel has a built-in database of device names that it associates with the help of their USB vendor and product IDs. This gives the application more descriptive names like "Logitech QuickCam Fusion". If the library recognizes only the vendor but not the exemplary device ID 0x1234, it is still able to provide a somewhat useful string like "Unknown Logitech camera (0x1234)".

• Controls
  – Control deletion: A control can be hidden from the application. This can be useful in cases where a driver wrongly reports a generic control that is not supported by the hardware. The library can filter those out, stopping them from appearing in the application and confusing users.
  – Control attribute modification: These modifications range from simple name changes to more complex ones like modification of the value ranges or completely changing the type of a control. Controls can also be made read-only or write-only.
  – Control splitting: A single control can be split into multiple controls. As a fictitious example, a 3D motion control could be split up into three different motion controls, one for each axis.

While the first point is pretty self-explanatory, the other two deserve a few real life examples.

Example 1: Control attribute modification. The UVC standard defines a control called Auto-exposure mode. It determines what parameters the camera changes to adapt to different lighting conditions. This control is an 8-bit wide bitmask with only four of the eight bits actually being used. The bits are mutually exclusive, leaving 1, 2, 4, 8 as the set of legal values. Unfortunately, due to the limited control description capabilities of UVC, the control is usually exported as an integer control with valid values ranging from 1 to 255. For an application such a control is virtually unusable without the knowledge of how the control values have to be interpreted. If an application uses a generic algorithm to display such a control, it might present the user with a slider or range control that can take all possible values between 1 and 255. However, most values will have no effect because they do not represent a valid bitmask. libwebcampanel comes with enough information to avoid this situation by turning the auto-exposure mode control into a selection control instead that allows only four different settings: the ones defined in the UVC standard. Now the user will see a list box, or whatever the application developer decided to use to represent a selection control, with each entry having a distinct and clear meaning and no chance for the user to accidentally select invalid values: a major gain in usability.

Example 2: Control splitting. The Logitech QuickCam Orbit series has mechanical pan and tilt capabilities with the help of two little motors. Both motors can be moved separately by a given angle. The control through which these capabilities are exposed, however, combines both values, relative pan angle and relative tilt angle, in a single 4-byte control containing a signed 2-byte integer for each. libwebcampanel solves this problem very elegantly by splitting up the control into two separate controls: relative pan angle and relative tilt angle. It also marks both controls as write-only, because it makes no sense to read a relative angle, and as action controls, meaning that changing the controls causes a one-time action to be performed. The application can use this information, as in the example of LVGstCap, to present the user with a slider control that can be dragged to either side and jumps back to the neutral position when let go of.
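For illustration, the recombination that the library has to perform when writing one of the split controls back to the device might look as follows. The little-endian byte order within the 4-byte control is an assumption made for this sketch:

    #include <stdint.h>

    /* Pack the two split V4L2 controls back into the single 4-byte UVC
     * control described above: one signed 16-bit integer per axis. */
    void pack_pan_tilt(int16_t pan, int16_t tilt, uint8_t out[4])
    {
        uint16_t p = (uint16_t)pan, t = (uint16_t)tilt;
        out[0] = p & 0xff;          /* pan, little-endian */
        out[1] = p >> 8;
        out[2] = t & 0xff;          /* tilt, little-endian */
        out[3] = t >> 8;
    }

An application that moves only one axis simply leaves the other value at zero; neither the packing nor the write-only nature of the underlying control is visible to it.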

Obviously, most of this information is device-specific and needs to be kept up-to-date whenever new devices become available. At this time, it can therefore be expected that new minor versions of the library appear rather frequently, including only minor changes. An alternative approach would be to move all device-specific information outside the library, e.g. into XML configuration files. While this would make it easier to keep the information current, it would also make it harder to describe device-specific behavior. The future will show which one of these approaches is more suitable.

7.3.2 Feature controls

Feature controls influence directly what goes on inside liblumvp. They can enable or disable certain features or change the way video effects operate. They are different to ordinary controls in a few ways, and they require a few special provisions, as we shall see now.

Controls vs. feature controls

We have previously mentioned that controls and feature controls are handled in an almost symmetrical manner. The small but important difference between the two is that ordinary controls are device-related but feature controls are stream-related. The driver, and therefore V4L2, know about ordinary controls from the very start. What this means is that the list of device controls can be queried before the application takes any steps to start the video stream; at that point, the GStreamer pipeline may not even be built and lvfilter and liblumvp not loaded. So in practice, a video application will probably query the camera controls right after device connection but feature controls only when the video is about to be displayed. This timing difference would make it considerably more complicated for applications to manage a combined control list in a user-friendly manner. As a nice side-effect of keeping the two lists apart, it becomes easy for the application to selectively support only one set of controls or to clearly separate the two sets.

Communication between client and library

There is another very important point that was left unmentioned until now and that only occurs in the case of a panel application. The video stream, and therefore liblumvp, and the panel application that uses libwebcampanel run in two different processes.

liblumvp could well be loaded into the application's address space, but it would be a second and completely independent instance. This means that the application would in vain try to change feature controls. To avoid this problem, the two libraries must be able to communicate across process borders, a clear case for inter-process communication. Both libwebcampanel and liblumvp have a socket implementation over which they can transfer all requests related to feature controls. Whenever a client opens a device using liblumvp (in our case this is done by lvfilter), it creates a socket server thread that waits for such requests. libwebcampanel, on the other side, has a socket client that it uses to send requests to liblumvp whenever one of the feature control functions is used. However, the IPC implementation does not cause any noticeable delays, since the amount of transmitted data remains in the order of kilobytes.

There is a possible optimization here, namely the use of the C interface instead of the IPC interface whenever both libraries run in the same process. Their semantics are completely identical, only the medium differs. We opted for the simpler solution of using the same interface in both cases, although the C version is still available and ready to use if circumstances make it seem preferable.

7.4 Build system

The build system of the webcam framework is based on the Autotools suite, the traditional choice for most Linux software. The project is mostly self-contained, with the exception of liblumvp, which has some dependencies on convenience libraries¹ outside the build tree. These convenience libraries contain some of the functionality that liblumvp plugins rely on and were ported from the corresponding Windows libraries. The directory structure of the open source part looks as follows:

    /
    +-- lib
    |   +-- libwebcam
    |   +-- libwebcampanel
    +-- gstlvfilter
    |   +-- src
    +-- lvgstcap

The top level Makefile generated by Autotools compiles all the components, although each component can also be built and installed on its own. Generic build instructions are included in the source archive.

¹ Convenience libraries group a number of partially linked object files together. While they are not suitable for use as-is, they can be compiled into other projects in a similar way to ordinary object files.
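For a standard Autotools project, building typically boils down to the usual three steps; the instructions shipped with the source archive take precedence over this sketch:

    ./configure        # check dependencies and generate the Makefiles
    make               # build the libraries, plugins and applications
    make install       # install everything (usually requires root)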

7.5 Limitations

Each solution has its trade-offs and limitations, and it is important to be aware of them. Some of them have technical reasons; others are the result of time constraints or are beyond the project's scope. This section is dedicated to making developers and users of the Linux webcam framework aware of these limitations. At the same time it gives pointers for future work.

7.5.1 UVC driver

Even though the Linux UVC driver is stable and provides support for all basic UVC features needed to do video streaming and manage video controls, it is still work in progress and there remains a lot of work to be done for it to implement the entire UVC standard. For one thing, the UVC standard describes many features and technologies for which there exist no devices today, and for another, not even Windows ships with such a driver. Therefore, having a complete UVC driver cannot be more than a long-term goal. What is important, however, is that the driver supports the features that today's devices use, and this is a short-term goal that will be achieved soon. Luckily, the list of tasks to get there is now down to a relatively small number of items. A few of these are discussed below.

Support for status interrupts

The UVC standard defines a status interrupt endpoint that devices must implement if they want to take advantage of certain special features. These are:

• Hardware triggers (e.g. buttons on the camera device for functions such as still image capturing)
• Asynchronous controls (e.g. motor controls whose execution can take a considerable amount of time and after completion of which the driver should be notified)
• AutoUpdate controls (controls whose values can change without an external set request, e.g. sensor-based controls)

When such an event occurs, the device sends a corresponding interrupt packet to the host and the UVC driver can take the necessary action, for example update the internal state or pass the notification on to user space applications. At the moment, the Linux UVC driver has no support for status interrupts and consequently ignores the packets. While this has no influence on the video stream itself, it prevents applications from receiving device button events or being notified when a motor control command has finished. The latter can be quite useful for applications because they may want to prevent the user from sending further motion commands while the device is still moving.

In the context of mechanical pan/tilt there are two other issues that the lack of such a notification brings with it:

1. Motion tracking. When a motion tracking algorithm, like the one used for multiple face tracking in liblumvp, issues a pan or tilt command to the camera, it must temporarily stop processing the video frames for the duration of the movement. Otherwise, the entire scene would be interpreted as being in motion due to the viewport translation that happens. After the motion has completed, the algorithm must resynchronize. For this purpose, it needs to know whether a given command has succeeded and, if so, at what point in time, in order to avoid overlapping requests. If the algorithm has no way of knowing the exact completion time, it must resort to approximations and guesswork, therefore decreasing its performance. This is what liblumvp does at the moment.

2. Keeping track of the current angle. If the hardware itself does not provide the driver with information as to the current pan and tilt angles, the driver or user space library can approximate this by keeping track of the relative motion commands it sends to the device. This is clearly a task for a library like libwebcam, which already has an interface designed for this exact purpose. As soon as the driver is up to the task, applications will be able to register callback functions for individual events, some of them coming from the hardware, others being synthesized by the library itself.

One of the reasons why the UVC driver does not currently process status interrupts is that the V4L2 API does not itself have any event notification support. As we saw in section 4.1, such a scheme is not easy to implement due to the lack of callback techniques that kernel space components have at their disposition. The sysfs interface that is about to be included in the UVC driver is a first step in the direction of adding a notification scheme. Since kernel 2.6.17 it is possible to make sysfs attributes pollable (see [2] for an overview of the interface). The polling process sleeps and wakes up as soon as one of the monitored attributes changes. It does not impose any CPU load on the system because it is implemented with the help of the poll system call. For the application this incurs some extra complexity, notably the necessity of multi-threading. The polling functionality only needs to be written once, however, and at the same time the notifications can be sent using a more application-friendly mechanism like callbacks.

Sysfs permissions

Another problem that still awaits resolution is to find a method to avoid giving all users arbitrary access to the controls exported to the sysfs virtual file system. Since sysfs attributes have fixed root:root ownership when the UVC driver creates them, this does not leave it much choice when it comes to defining permissions.
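The canonical usage pattern for a pollable sysfs attribute looks roughly like the following sketch; the attribute path is a placeholder, and error handling is reduced to the bare minimum:

    #include <fcntl.h>
    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Block until a pollable sysfs attribute signals a change, then
     * read and print its new contents. */
    int watch_sysfs_attribute(const char *path)  /* e.g. "/sys/.../ctrl/cur" */
    {
        char value[64];
        ssize_t n;
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        for (;;) {
            struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };

            pread(fd, value, sizeof(value) - 1, 0); /* arm: consume current value */
            if (poll(&pfd, 1, -1) < 0)              /* sleep until notified */
                break;
            n = pread(fd, value, sizeof(value) - 1, 0);
            if (n > 0) {
                value[n] = '\0';
                printf("attribute changed: %s", value);
            }
        }
        close(fd);
        return -1;
    }

A library like libwebcam would run this loop in a dedicated thread and turn each wake-up into a callback, hiding the multi-threading from the application.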

Mode 0666 would permit every user to change the behavior of the attached video devices, leading to a rather undesirable situation: a user guest that happens to be logged in via SSH on a machine on which a video conference is in progress could change settings such as brightness or even cause the camera to tilt, despite not having access to the video stream or the V4L2 interface itself. Modes 0660 and 0664, on the other hand, would only give the superuser write access to the sysfs attributes.

For device nodes this problem is usually resolved by changing the group ownership to something like root:video and giving it 0660 permissions. This still does not give fine grained permissions to individual users, but at least a user has to be a member of the video group to be able to access the camera. A good solution would be to duplicate the ownership and permissions from the device node and apply them to the sysfs nodes. This would make sure that whoever has access to the V4L2 video device also has access to the device's UVC extensions and controls. Currently, however, such a solution does not seem feasible due to the hard-coded attribute ownership. Another approach to the problem would be to let user space handle the permissions. Even though sysfs attributes have their UID and GID set to 0 on creation, they do preserve new values when set from user space, e.g. using chmod. A user space application running with elevated privileges could therefore take care of this task.

Ongoing development

The ongoing development of the UVC driver is of course not a limitation in itself; on the contrary, that can also be seen as an opportunity. The limitation merely stems from the fact that not all of the proposed changes have made their way into the main driver branch yet. As of the time of this writing, the author is rewriting parts of the driver to be more modular and to better adapt them to future needs. At the same time, he is integrating the extensions presented in 6.1 piece by piece. Another aspect of the current rewrite is the consolidation of some internal structures, notably the combination of the uvc_terminal and uvc_unit structs. This will simplify large parts of the control code because both entity types can contain controls.

The latest SVN version of the UVC driver does not yet contain the sysfs interface, and therefore the UVC extension controls, but it will be added as soon as the completely rewritten control management is finished. Therefore, users who want to try out the webcam framework in its entirety, in particular functions that require raw access to the extension units, need to use the version distributed as part of the framework for the time being. The version distributed with the framework does not properly support controls on the camera terminal. This only affects the controls related to exposure parameters and will automatically be fixed during the merge back.

Still image support

The UVC specification includes features to retrieve still images from the camera. Still images are treated differently from streaming video in that they do not have to be real-time, which gives the camera time to apply image quality enhancing algorithms and techniques. At the moment, the Linux UVC driver does not support this method at all. This is hardly a limitation right now because current applications are simply not prepared for such a special mode. All single frame capture applications that currently exist open a video stream and then process single frames only, something that obviously works perfectly fine with the UVC driver.

In the future one could, however, think of some interesting features like the ability to read still images directly from /dev/videoX after setting a few parameters in sysfs. This would allow frame capturing with simple command line tools or amazingly simple scripts. Imagine the following, for example:

    dd if=/dev/video0 of=capture.jpg

It would be fairly simple to extend the driver to support such a feature, but the priorities are clearly elsewhere at the moment.

7.5.2 Linux webcam framework

Missing event support

The fact that libwebcam currently lacks support for events, despite the fact that the interface is there, was already mentioned above. To give the reader an idea of what the future holds, let us look at the list of events that libwebcam and libwebcampanel could support:

• Device discovered/unplugged
• Control value changed automatically (e.g. for UVC AutoUpdate controls)
• Control value changed by client (to synchronize multiple clients of libwebcam)
• Control value change completed (for asynchronous controls)
• Other, driver-specific events
• Feature control value changed (libwebcampanel only)
• Events specific to liblumvp feature plugins (libwebcampanel only)

Again, the events supported by libwebcampanel will be a superset of those known to libwebcam, in a manner analogous to controls.

Single stream per device

The entire framework is laid out to only work with a single video stream at a time. This means that it is impossible to multiplex the stream, for example with the help of the GStreamer tee element, and control the feature plugins separately for both substreams. This design decision was made for simple practicality. There are no applications today that provide multiple video windows per camera at the same time, and the possible use cases seem restricted to development and debug purposes; the additional work required would hardly justify the benefits. There is another reason why it is unlikely that such applications appear in the near future: the XVideo extension used on Linux to accelerate video rendering can only be used by one stream at a time, so that any additional streams would have to be rendered using unaccelerated methods. In GStreamer terms this means that the slower ximagesink would have to be used instead of xvimagesink, which is the default in LVGstCap. For most conceivable applications this is not a limitation, though.

7.6 Outlook

Providing an outlook of the further development of the Linux webcam framework at this moment is not easy, given that it has not been published yet and has therefore received very little feedback. There are, however, a few signs that there is quite some demand out there for Linux webcam software as well as related information. For one thing, requests and responses that come up on the Linux UVC mailing list clearly show that the current software has deficits. A classic example is the fact that there are still many programs out there that do not support V4L2 but are still based on the deprecated V4L1 interface. Even V4L2 applications still use API calls that are not suitable for digital devices, clearly showing their origins in the world of TV cards. For another thing, the demand for detailed and reliable information is quite large. Linux users who want to use webcams have a number of information related problems to overcome. Typical questions that arise are:

• What camera should I buy so that it works on Linux?
• I have camera X. Does it work on Linux?
• Which driver do I need? Where do I download it?
• How do I compile and install the driver? How can I verify its proper functioning?
• What applications are there? What can they do?
• What camera features are supported? What would it take to fix this?

All these questions are not easy to answer. Even though the information is present somewhere on the web, it is usually not easy to find because there is no single point to start from.

7.6 Outlook

Providing an outlook of the further development of the Linux webcam framework at this moment is not easy given that it has not been published yet and has therefore received very little feedback. There are, however, a few signs that there is quite some demand out there for Linux webcam software as well as related information.

For one thing, requests and responses that come up on the Linux UVC mailing list clearly show that the current software has deficits. A classic example is the fact that there are still many programs out there that do not support V4L2 but are still based on the deprecated V4L1 interface. Even V4L2 applications still use API calls that are not suitable for digital devices, clearly showing their origins in the world of TV cards.
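To illustrate the difference, the snippet below probes a device for V4L2 support using VIDIOC_QUERYCAP, the standard entry point of the modern API; a V4L1-only program would issue the old VIDIOCGCAP ioctl instead. This is a minimal sketch, and /dev/video0 is just an example path.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/videodev2.h>

int main(void)
{
    struct v4l2_capability cap;
    int fd = open("/dev/video0", O_RDWR);

    if (fd < 0) {
        perror("open");
        return 1;
    }

    memset(&cap, 0, sizeof(cap));
    if (ioctl(fd, VIDIOC_QUERYCAP, &cap) == 0) {
        printf("V4L2 device: %s\n", (char *)cap.card);
        if (cap.capabilities & V4L2_CAP_VIDEO_CAPTURE)
            printf("device supports video capture\n");
    } else {
        /* Most likely a V4L1-only driver. */
        printf("not a V4L2 device\n");
    }

    close(fd);
    return 0;
}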

For another thing, the demand for detailed and reliable information out there is quite large. Linux users who want to use webcams have a number of information related problems to overcome. Typical questions that arise are:

• What camera should I buy so that it works on Linux?
• I have camera X. Does it work on Linux?
• Which driver do I need? Where do I download it?
• How do I compile and install the driver? How can I verify its proper functioning?
• What applications are there? What can they do?
• What camera features are supported? What would it take to fix this?

All these questions are not easy to answer. Even though the information is present somewhere on the web, it is usually not easy to find because there is no single point to start from. Many sites are incomplete and/or feature outdated information, making the search even harder. Providing software is thus not the only task on the to-do list of Linux webcam developers: more and better information is required, something that Logitech is taking initiative in. Together with the webcam framework, Logitech will publish a website that is designated to become such an information portal. At the end of this chapter we will give more details about that project.

In terms of software, the Linux webcam framework certainly has the potential to spur the development of new and great webcam applications as well as giving new and improved tools to preexisting ones. Our hope is that, on the one hand, the broader use of the framework will bring forth further needs that can be satisfied by future versions and, on the other hand, that the project will give impulses for improving the existing components. The Linux UVC driver is one such component that is rapidly improving; as we have seen during the discussion of limitations above, new versions will create the need for libwebcam extensions. But libwebcam is not the only component that will see further improvements. Logitech will add more feature plugins to liblumvp as the framework gains momentum, with the most prominent one being an algorithm for face tracking. Compared to the current motion tracker algorithm it performs much better when there is only a single person visible in the picture.

7.7 Licensing

The licensing of open source software is a complex topic, especially when combined with closed source components. There are literally hundreds of different open source licenses out there and many projects choose to use their own, adapted license, further complicating the situation. In our case the situation is quite easy. Table 7.1 gives an overview of the licenses used for the different components of this project. The complete text of the GPL and LGPL licenses can be found in [7] and [8].

Component        License
libwebcam        LGPL
libwebcampanel   LGPL
lvfilter         LGPL
liblumvp         Closed source
LVGstCap         GPL
lvcmdpanel       GPL
Samples          Public domain

Table 7.1: Overview of the licenses used for the Linux webcam framework components.

7.7.1 Libraries

One key point that poses constraints on the licensing of a project is the set of licenses used by the underlying components. The only closed source component of our framework, liblumvp, uses GStreamer, which is in turn developed under the LGPL. The LGPL is considered one of the most appropriate licenses for libraries because it allows both open and closed source components to link against it. Such a licensing scheme considerably increases the number of potential users because developers of closed source applications do not need to reinvent the wheel, but can instead rely on libraries proven to be stable. For this reason libwebcam and libwebcampanel are also released under the LGPL, enabling any application to link against them and use their features. The same reasoning applies to the lvfilter GStreamer plugin.

The only closed source component of the webcam framework is the liblumvp library. Some of the feature plugins contain code that Logitech has licensed from third parties under conditions that disallow their distribution in source code form. While liblumvp is free of charge, it is covered by an end-user license agreement very similar to the one that is used for Logitech's Windows applications. There is one question that keeps coming up in Internet forums when closed source components are discussed: "Why doesn't the company want to publish the source code?" The answer is usually not that companies do not want to but that they cannot for legal reasons. Hardware manufacturers often buy software modules from specialized companies and these licenses do not allow the source to be made public.

7.7.2 Applications

All non-library code, in particular LVGstCap and lvcmdpanel, is licensed under version 2 of the GNU GPL. This allows anybody to make changes to the code and publish new versions as long as the modified source code is also made available.

7.8 Distribution

Making the webcam framework public, getting people to use and test it, and receiving feedback will be an important task of the upcoming months. Logitech is currently setting up a web server that is expected to go online in the last quarter of 2006 and will contain the following:

• List of drivers: Overview of the different webcam drivers available for Logitech cameras.
• Compatibility information: Which devices work with which drivers?
• FAQ: Answers to questions that frequently come up in the context of webcams, for example on the mailing list of the Linux UVC driver.
• Downloads: All components of the Linux webcam framework (incl. sources except for liblumvp).
• Forum: Possibility for users to discuss problems with each other and ask questions to Logitech developers.

The address will be announced through the appropriate channels.

Chapter 8

The new webcam infrastructure at work

After the technical details it is now time to see the webcam framework in action–or at least static snapshots of this action. The user only has direct contact with the video capture application LVGstCap and the panel application lvcmdpanel. The work of the remaining components is, however, still visible, especially in the case of lvcmdpanel, whose interface is very close to libwebcampanel's.

8.1 LVGstCap

Figure 8.1 shows a screenshot of LVGstCap with its separation into video and control area. The video window to the left displays the current picture streaming from the webcam while the right-hand side contains both camera and feature controls in separate tabs. The Camera tab allows the user to change settings directly related to the image and the camera itself. The Features tab gives control over the plugins that liblumvp contains. Currently it allows flipping the image about the horizontal and vertical axes and enabling or disabling the face tracker. All control elements are dynamically generated from the information that libwebcampanel provides.

Figure 8.1: A screenshot of LVGstCap with the format choice menu open.

8.2 lvcmdpanel

The following console transcript shows an example of how lvcmdpanel can be used.

$ lvcmdpanel -l
Listing available devices:
  video0    Unknown Logitech camera (0x08cc)
  video1    Logitech QuickCam Fusion

There are two devices in the system: one was recognized, the other one was detected as an unknown Logitech device and its USB PID is displayed instead.

$ lvcmdpanel -d video1 -c
Listing available controls for device video1:
  Power Line Frequency
  Backlight Compensation
  Gamma
  Contrast
  Brightness

$ lvcmdpanel -d video1 -cv
Listing available controls for device video1:
Power Line Frequency
  ID      : 13,
  Type    : Choice,
  Flags   : { CAN_READ, CAN_WRITE, IS_CUSTOM },
  Values  : { 'Disabled'[0], '50 Hz'[1], '60 Hz'[2] },
  Default : 2
Backlight Compensation
  ID      : 12,
  Type    : Dword,
  Flags   : { CAN_READ, CAN_WRITE, IS_CUSTOM },
  Values  : [ 0 .. 2, step size: 1 ],
  Default : 1
Gamma
  ID      : 6,
  Type    : Dword,
  Flags   : { CAN_READ, CAN_WRITE },
  Values  : [ 100 .. 220, step size: 120 ],
  Default : 220
Contrast
  ID      : 2,
  Type    : Dword,
  Flags   : { CAN_READ, CAN_WRITE },
  Values  : [ 0 .. 255, step size: 1 ],
  Default : 32
Brightness
  ID      : 1,
  Type    : Dword,
  Flags   : { CAN_READ, CAN_WRITE },
  Values  : [ 0 .. 255, step size: 1 ],
  Default : 127

The -c command line switch outputs a list of controls supported by the specified video device, in this case the second one. For the second list the verbose switch was enabled, which yields detailed information about the type of control, the accepted and default values, etc. (Note that the output was slightly shortened by leaving out a number of less interesting controls.)

The last example shows how simple it is to create scripts to automate tasks with the help of panel applications. The final part of the transcript can be followed easiest by first starting an instance of luvcview in the background. The commands below change the brightness of the image while luvcview–or any other video application–is running.

$ lvcmdpanel -d video1 -g brightness
127
$ lvcmdpanel -d video1 -s brightness 255
$ lvcmdpanel -d video1 -g brightness
255

The current brightness value is 127 as printed by the first command. The second command changes the brightness value to the maximum of 255 and the third one shows that the value was in fact changed.

lvcmdpanel consists of less than 400 lines of code and already covers the basic functionality. Even writing an actual panel application is very straightforward, as the sketch below suggests.
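Here is a minimal sketch of such a panel application. The API used here (WPHandle, wp_open_device, wp_get_control, wp_set_control) is an illustrative placeholder rather than the actual libwebcampanel interface, and the functions are stubbed out so that the sketch compiles on its own; a real program would simply link against the library instead.

#include <stdio.h>

/* Placeholder handle type and stubbed API; the real libwebcampanel
 * interface differs in names and detail. */
typedef struct { const char *name; } WPHandle;

static WPHandle *wp_open_device(const char *name)
{
    static WPHandle h;
    h.name = name;
    return &h;
}

static int wp_get_control(WPHandle *h, const char *ctrl, int *value)
{
    (void)h; (void)ctrl;
    *value = 127;   /* stub: pretend the driver reported 127 */
    return 0;
}

static int wp_set_control(WPHandle *h, const char *ctrl, int value)
{
    (void)h; (void)ctrl; (void)value;
    return 0;       /* stub: a real call would reach the driver */
}

int main(void)
{
    int brightness = 0;
    WPHandle *h = wp_open_device("video1");

    wp_get_control(h, "brightness", &brightness);
    printf("brightness was %d\n", brightness);

    /* The equivalent of "lvcmdpanel -d video1 -s brightness 255". */
    wp_set_control(h, "brightness", 255);
    return 0;
}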

Chapter 9

Conclusion

Jumping into work in the open source community with the support of a company in the back is a truly gratifying job. Having been on the user side of hardware and software products for many years myself, I know how helpful the little insider tips can be. Until recently most companies were unaware of the fact that small pieces of information that seem obvious on the inside of a product team can have a much higher value when carried outside. Simple information that is given out comes back in the form of improved product support, reputation, and, last but not least, drivers written from scratch. The success of modern media like Internet forums with employee participation and corporate blogs is a clear sign for this. Open source is in some ways similar to these media. The Logitech video team has had such a relationship with the open source community for a while, although in a rather low-profile manner leading to little public perception. This is the first time that we have actively participated, and while it remains to be seen what the influence of the project will be, the little feedback we have received makes us confident that the project is a success and will not end here.

As far as the author's personal experience is concerned, I was in contact with project mailing lists, developers, and ordinary users of open source software without a strong programming background. The vast majority of these interactions was of a positive nature. Out of these three, the last two are certainly the easiest to work with. Developers are grateful for feedback, suggestions, test results, and patches, whereas users appreciate help with questions to which the answers are not necessarily obvious. The expression "it's the little things that count" immediately comes to mind, and the positive reactions one receives, even for small favors, are a great motivation along the way.

Mailing lists are a category of their own. While many fruitful discussions are held, some of them reminded me of modern politics. What makes democracy a successful process is the fact that everybody has their say and everybody is encouraged to speak up. Unfortunately, the good and bad sides go hand in hand, and so mailing lists inherit

the dangers of slow decision making and standstill. Many discussions fail to reach a conclusion and silently dissolve, much to the frustration of the person who brought up the topic, a fact that each developer must learn to live with. The pragmatic solution often beats the technically more elegant one in terms of utility. If open source developers need to learn one thing, it is seeing their users as customers and treating them as such.

The Linux platform has undoubtedly become a competitive platform, but in order not to lose its momentum Linux must focus on its weaknesses, and multimedia is clearly one of them. The components are there for the most part but they need to be consistently improved to make sure that they work together more closely. There are high hopes on KDE 4 with its multimedia architecture, and camera support will definitely have its place in it. The future will show whether we are able to reach our long-term goal: achieving a webcam experience among users that can catch up with what Windows offers nowadays. The moment when Linux users can plug in their webcam, start their favorite instant messenger and have a video conference taking advantage of all the camera's features is within grasp–an opportunity not to be missed.

Appendix A

List of Logitech webcam USB PIDs
This appendix contains a list of webcams manufactured by Logitech, their USB identifiers, and the name of the driver they are reported or tested to work with. We use the following abbreviated driver names in the table:

Key         Driver
pwc         Philips USB Webcam driver (see 4.3.1)
qcexpress   QuickCam Express driver (see 4.3.4)
quickcam    QuickCam Messenger & Communicate driver (see 4.3.3)
spca5xx     Spca5xx Webcam driver (see 4.3.2)
uvcvideo    Linux USB Video Class driver (see 4.3.5)

The table below contains the following information:

1. The USB product ID as reported, for example, by lsusb. Note that the vendor ID is always 0x046D.
2. The ASIC that the camera is based on.
3. The name under which the product was released.
4. The driver by which the camera is supported. An asterisk means that the state of support for the given camera is untested but that the camera is likely to work with the driver given the ASIC. Possibly the driver may need patching in order to recognize the given PID; the sketch below shows what such a patch typically amounts to. A dash means that the camera is not currently supported.
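As a side note to item 4, "patching the driver" usually amounts to no more than adding the new PID to the driver's USB device table. The fragment below is a generic sketch in the style of 2.6-series kernel drivers; the table name is a placeholder and the two entries are merely examples taken from the table that follows.

/* Sketch: how a USB webcam driver announces the devices it handles.
 * Supporting an untested camera with a known ASIC is often just a
 * matter of adding its PID here and recompiling. */
#include <linux/usb.h>
#include <linux/module.h>

static struct usb_device_id example_id_table[] = {
    { USB_DEVICE(0x046d, 0x08f0) },  /* Logitech QuickCam Messenger */
    { USB_DEVICE(0x046d, 0x08f5) },  /* Logitech QuickCam Communicate */
    /* a new, untested PID would be added here */
    { }                              /* terminating entry */
};
MODULE_DEVICE_TABLE(usb, example_id_table);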


PID    ASIC       Product name                                                        Driver
0840   ST600      Logitech QuickCam Express                                           qcexpress
0850   ST610      Logitech QuickCam Web                                               qcexpress
0870   ST602      Logitech QuickCam Express / Logitech QuickCam for Notebooks /
                  Labtec WebCam                                                       qcexpress
0892   VC321      Acer OrbiCam                                                        –
0896   VC321      Acer OrbiCam                                                        –
08A0   VC301      Logitech QuickCam IM                                                spca5xx
08A2   VC302      Labtec Webcam Plus                                                  spca5xx
08A4   VC301      Logitech QuickCam IM                                                spca5xx (*)
08A7   VC302      Logitech QuickCam Image                                             spca5xx (*)
08A9   VC302      Logitech QuickCam for Notebooks Deluxe                              spca5xx
08AA   VC302      Labtec Notebook Pro                                                 spca5xx
08AC   VC301      Logitech QuickCam IM                                                spca5xx (*)
08AD   VC302      Logitech QuickCam Communicate STX                                   spca5xx
08AE   VC302      Logitech QuickCam for Notebooks                                     spca5xx
08B0   SAA8116    Logitech QuickCam Pro / Logitech QuickCam Pro 3000                  pwc
08B1   SAA8116    Logitech QuickCam Pro for Notebooks                                 pwc
08B2   SAA8116    Logitech QuickCam Pro 4000                                          pwc
08B3   SAA8116    Logitech QuickCam Zoom                                              pwc
08B4   SAA8116    Logitech QuickCam Zoom                                              pwc
08B5   SAA8116    Logitech QuickCam Orbit / Logitech QuickCam Sphere                  pwc
08B6   SAA8116    Cisco VT Camera                                                     pwc
08B7   SAA8116    Logitech ViewPort AV100                                             pwc
08BD   SAA8116    Logitech QuickCam Pro 4000                                          pwc
08BE   SAA8116    Logitech QuickCam Zoom                                              pwc
08C1   SPCA525    Logitech QuickCam Fusion                                            uvcvideo
08C2   SPCA525    Logitech QuickCam Orbit / Logitech QuickCam Sphere MP               uvcvideo
08C3   SPCA525    Logitech QuickCam for Notebooks Pro                                 uvcvideo
08C5   SPCA525    Logitech QuickCam Pro 5000                                          uvcvideo
08C6   SPCA525    QuickCam for Dell Notebooks                                         uvcvideo
08C7   SPCA525    Cisco VT Camera II                                                  uvcvideo
08D9   VC302      Logitech QuickCam IM / Logitech QuickCam Connect                    spca5xx
08DA   VC302      Logitech QuickCam Messenger MP                                      spca5xx
08F0   ST6422     Logitech QuickCam Messenger                                         quickcam
08F1   ST6422     Logitech QuickCam Express                                           quickcam (*)
08F4   ST6422     Labtec WebCam                                                       quickcam (*)
08F5   ST6422     Logitech QuickCam Communicate                                       quickcam
08F6   ST6422     Logitech QuickCam Communicate                                       quickcam
0920   ICM532     Logitech QuickCam Express                                           spca5xx
0921   ICM532     Labtec WebCam                                                       spca5xx
0922   ICM532     Logitech QuickCam Live                                              spca5xx (*)
0928   SPCA561B   Logitech QuickCam Express                                           spca5xx
0929   SPCA561B   Labtec WebCam                                                       spca5xx
092A   SPCA561B   Logitech QuickCam for Notebooks                                     spca5xx
092B   SPCA561B   Labtec WebCam Plus                                                  spca5xx
092C   SPCA561B   Logitech QuickCam Chat                                              spca5xx
092D   SPCA561B   Logitech QuickCam Express                                           spca5xx (*)
092E   SPCA561B   Logitech QuickCam Chat                                              spca5xx (*)
092F   SPCA561B   Logitech QuickCam Express                                           spca5xx
09C0   SPCA525    QuickCam for Dell Notebooks                                         uvcvideo

Bibliography

[1] Jonathan Corbet. Linux loses the Philips webcam driver. LWN, 2004. URL http://lwn.net/Articles/99615/.

[2] Jonathan Corbet. Some upcoming sysfs enhancements. LWN, 2006. URL http://lwn.net/Articles/174660/.

[3] Creative. Creative Open Source: Webcam support, 2005. URL http://opensource.creative.com/.

[4] Bill Dirks. Video for Linux Two - Driver Writer's Guide, 1999. URL http://www.thedirks.org/v4l2/v4l2dwg.htm.

[5] Bill Dirks, Michael H. Schimek, and Hans Verkuil. Video for Linux Two API Specification, 1999-2006. URL http://www.linuxtv.org/downloads/video4linux/API/V4L2_API/.

[6] USB Implementers Forum. Universal Serial Bus Device Class Definition for Video Devices, Revision 1.1 edition, 2005. URL http://www.usb.org/developers/devclass_docs.

[7] Free Software Foundation. GNU General Public License, 1991. URL http://www.gnu.org/copyleft/gpl.html.

[8] Free Software Foundation. GNU Lesser General Public License, 1999. URL http://www.gnu.org/copyleft/lesser.html.

[9] Philip Heron. fswebcam. URL http://www.firestorm.cx/fswebcam/.

[10] Mike Isely. pvrusb2 driver, 2005. URL http://www.isely.net/pvrusb2/pvrusb2.html.

[11] Greg Jones and Jens Knutson. Camorama, 2006. URL http://camorama.fixedgear.net/.

[12] Avery Lee. Capture timing and capture sync, 2005. URL http://www.virtualdub.org/blog/pivot/entry.php?id=78.

[13] Marco Lohse. Network-Integrated Multimedia Middleware, 2006. URL http://www.networkmultimedia.org/.

[14] Christian Magnusson. QuickCam Messenger & Communicate driver for Linux, 2006. URL http://home.mag.cx/messenger/.

[15] Juan Antonio Martínez. VideoForLinux: El canal del Pingüino ("The Penguin Channel"). TLDP-ES/LuCAS, 1998. URL http://es.tldp.org/Articulos-periodisticos/jantonio/video4linux/v4l_1.html.

[16] Motama and Saarland University Computer Graphics Lab. Setting up a Video Wall with NMM. URL http://graphics.cs.uni-sb.de/NMM/current/Docs/videowall/index.html.

[17] Laurent Pinchart. Linux UVC driver, 2006. URL http://linux-uvc.berlios.de/.

[18] Damien Sandras. Ekiga, 2006. URL http://www.ekiga.org/.

[19] Tuukka Toivonen and Kurt Wal. QuickCam Express Driver, 2006. URL http://qce-ga.sourceforge.net/.

[20] Linus Torvalds. Linux GPL and binary module exception clause?, December 2003. URL http://www.ussg.iu.edu/hypermail/linux/kernel/0312.0/0670.html.

[21] Dave Wilson. FOURCC, 2006. URL http://www.fourcc.org/.

[22] Michel Xhaard. luvcview, 2006. URL http://mxhaard.free.fr/spca50x/Investigation/uvc/.

[23] Michel Xhaard. SPCA5xx Webcam driver, 2006. URL http://mxhaard.free.fr/spca5xx.html.