
MULTIMEDIA FUNDAMENTALS
VOLUME 1:
Media Coding and Content Processing

Ralf Steinmetz
Klara Nahrstedt

PRENTICE HALL PTR


UPPER SADDLE RIVER, NJ 07458
WWW.PHPTR.COM
Contents

Preface xv

1 Introduction 1
1.1 Interdisciplinary Aspects of Multimedia 2
1.2 Contents of This Book 3
1.3 Organization of This Book 4
1.3.1 Media Characteristics and Coding 5
1.3.2 Media Compression 5
1.3.3 Optical Storage 6
1.3.4 Content Processing 6
1.4 Further Reading About Multimedia 6

2 Media and Data Streams 7


2.1 The Term "Multimedia" 7
2.2 The Term "Media" 7
2.2.1 Perception Media 8
2.2.2 Representation Media 8
2.2.3 Presentation Media 8
2.2.4 Storage Media 9
2.2.5 Transmission Media 9


2.2.6 Information Exchange Media 9


2.2.7 Presentation Spaces and Presentation Values 9
2.2.8 Presentation Dimensions 10
2.3 Key Properties of a Multimedia System 11
2.3.1 Discrete and Continuous Media 12
2.3.2 Independent Media 12
2.3.3 Computer-Controlled Systems 12
2.3.4 Integration 12
2.3.5 Summary 13
2.4 Characterizing Data Streams 13
2.4.1 Asynchronous Transmission Mode 13
2.4.2 Synchronous Transmission Mode 14
2.4.3 Isochronous Transmission Mode 14
2.5 Characterizing Continuous Media Data Streams 15
2.5.1 Strongly and Weakly Periodic Data Streams 15
2.5.2 Variation of the Data Volume of Consecutive Information Units 16
2.5.3 Interrelationship of Consecutive Packets 18
2.6 Information Units 19

3 Audio Technology 21
3.1 What Is Sound? 21
3.1.1 Frequency 22
3.1.2 Amplitude 23
3.1.3 Sound Perception and Psychoacoustics 23
3.2 Audio Representation on Computers 26
3.2.1 Sampling Rate 27
3.2.2 Quantization 27
3.3 Three-Dimensional Sound Projection 28
3.3.1 Spatial Sound 28
3.3.2 Reflection Systems 30
3.4 Music and the MIDI Standard 30
3.4.1 Introduction to MIDI 31
3.4.2 MIDI Devices 31
3.4.3 The MIDI and SMPTE Timing Standards 32
3.5 Speech Signals 32

3.5.1 Human Speech 32


3.5.2 Speech Synthesis 33
3.6 Speech Output 33
3.6.1 Reproducible Speech Playout 34
3.6.2 Sound Concatenation in the Time Range 34
3.6.3 Sound Concatenation in the Frequency Range 36
3.6.4 Speech Synthesis 36
3.7 Speech Input 37
3.7.1 Speech Recognition 38
3.8 Speech Transmission 40
3.8.1 Pulse Code Modulation 40
3.8.2 Source Encoding 41
3.8.3 Recognition-Synthesis Methods 42
3.8.4 Achievable Quality 43

4 Graphics and Images 45


4.1 Introduction 45
4.2 Capturing Graphics and Images 46
4.2.1 Capturing Real-World Images 46
4.2.2 Image Formats 48
4.2.3 Creating Graphics 53
4.2.4 Storing Graphics 54
4.3 Computer-Assisted Graphics and Image Processing 55
4.3.1 Image Analysis 56
4.3.2 Image Synthesis 71
4.4 Reconstructing Images 72
4.4.1 The Radon Transform 73
4.4.2 Stereoscopy 74
4.5 Graphics and Image Output Options 75
4.5.1 Dithering 76
4.6 Summary and Outlook 77

5 Video Technology 79
5.1 Basics 79

5.1.1 Representation of Video Signals 79


5.1.2 Signal Formats 83
5.2 Television Systems 87
5.2.1 Conventional Systems 87
5.2.2 High-Definition Television (HDTV) 88
5.3 Digitization of Video Signals 90
5.3.1 Composite Coding 91
5.3.2 Component Coding 91
5.4 Digital Television 93

6 Computer-Based Animation 95
6.1 Basic Concepts 95
6.1.1 Input Process 95
6.1.2 Composition Stage 96
6.1.3 Inbetween Process 96
6.1.4 Changing Colors 97
6.2 Specification of Animations 97
6.3 Methods of Controlling Animation 98
6.3.1 Explicitly Declared Control 98
6.3.2 Procedural Control 99
6.3.3 Constraint-Based Control 99
6.3.4 Control by Analyzing Live Action 99
6.3.5 Kinematic and Dynamic Control 100
6.4 Display of Animation 100
6.5 Transmission of Animation 101
6.6 Virtual Reality Modeling Language (VRML) 101

7 Data Compression 105


7.1 Storage Space 105
7.2 Coding Requirements 106
7.3 Source, Entropy, and Hybrid Coding 110
7.3.1 Entropy Coding 110
7.3.2 Source Coding 111
7.3.3 Major Steps of Data Compression 111

7.4 Basic Compression Techniques 113


7.4.1 Run-Length Coding 113
7.4.2 Zero Suppression 113
7.4.3 Vector Quantization 114
7.4.4 Pattern Substitution 114
7.4.5 Diatomic Encoding 114
7.4.6 Statistical Coding 114
7.4.7 Huffman Coding 115
7.4.8 Arithmetic Coding 116
7.4.9 Transformation Coding 117
7.4.10 Subband Coding 117
7.4.11 Prediction or Relative Coding 117
7.4.12 Delta Modulation 118
7.4.13 Adaptive Compression Techniques 118
7.4.14 Other Basic Techniques 120
7.5 JPEG 120
7.5.1 Image Preparation 122
7.5.2 Lossy Sequential DCT-Based Mode 126
7.5.3 Expanded Lossy DCT-Based Mode 132
7.5.4 Lossless Mode 134
7.5.5 Hierarchical Mode 135
7.6 H.261 (p×64) and H.263 135
7.6.1 Image Preparation 137
7.6.2 Coding Algorithms 137
7.6.3 Data Stream 139
7.6.4 H.263+ and H.263L 139
7.7 MPEG 139
7.7.1 Video Encoding 140
7.7.2 Audio Coding 144
7.7.3 Data Stream 146
7.7.4 MPEG-2 148
7.7.5 MPEG-4 152
7.7.6 MPEG-7 165
7.8 Fractal Compression 165
7.9 Conclusions 166

8 Optical Storage Media 169


8.1 History of Optical Storage 170
8.2 Basic Technology 171
8.3 Video Discs and Other WORMs 173
8.4 Compact Disc Digital Audio 175
8.4.1 Technical Basics 175
8.4.2 Eight-to-Fourteen Modulation 176
8.4.3 Error Handling 177
8.4.4 Frames, Tracks, Areas, and Blocks of a CD-DA 178
8.4.5 Advantages of Digital CD-DA Technology 180
8.5 Compact Disc Read Only Memory 180
8.5.1 Blocks 181
8.5.2 Modes 182
8.5.3 Logical File Format 183
8.5.4 Limitations of CD-ROM Technology 184
8.6 CD-ROM Extended Architecture 185
8.6.1 Form 1 and Form 2 186
8.6.2 Compressed Data of Different Media 187
8.7 Further CD-ROM-Based Developments 188
8.7.1 Compact Disc Interactive 188
8.7.2 Compact Disc Interactive Ready Format 190
8.7.3 Compact Disc Bridge Disc 191
8.7.4 Photo Compact Disc 192
8.7.5 Digital Video Interactive and Commodore Dynamic Total Vision 193
8.8 Compact Disc Recordable 194
8.9 Compact Disc Magneto-Optical 196
8.10 Compact Disc Read/Write 197
8.11 Digital Versatile Disc 198
8.11.1 DVD Standards 198
8.11.2 DVD-Video: Decoder 201
8.11.3 Eight-to-Fourteen+ Modulation (EFM+) 201
8.11.4 Logical File Format 202
8.11.5 DVD-CD Comparison 202
8.12 Closing Observations 203

9 Content Analysis 205


9.1 Simple vs. Complex Features 206
9.2 Analysis of Individual Images 207
9.2.1 Text Recognition 207
9.2.2 Similarity-Based Searches in Image Databases 209
9.3 Analysis of Image Sequences 210
9.3.1 Motion Vectors 210
9.3.2 Cut Detection 214
9.3.3 Analysis of Shots 220
9.3.4 Similarity-Based Search at the Shot Level 221
9.3.5 Similarity-Based Search at the Scene and Video Level 224
9.4 Audio Analysis 226
9.4.1 Syntactic Audio Indicators 226
9.4.2 Semantic Audio Indicators 227
9.5 Applications 229
9.5.1 Genre Recognition 229
9.5.2 Text Recognition in Videos 233
9.6 Closing Remarks 234

Bibliography 235

Index 257