UNIVERSITY OF CALGARY

An Imaging System With Watermarking And Compression Capabilities

by

Yonatan Shoshan

A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING CALGARY, ALBERTA SEPTEMBER 2009

© Yonatan Shoshan 2009


Abstract

This thesis presents an imaging system with a novel watermarking embedder and JPEG compression capabilities. The proposed system enhances data security in surveillance camera networks and similar applications, thus improving the reliability of the received images for security and/or evidence use. A novel watermarking algorithm was developed for watermarking images in the DCT domain. The algorithm was optimized for an efficient implementation in hardware while still maintaining a high level of security. The imaging system was physically implemented on an evaluation board including a CMOS image sensor, an FPGA for digital control and processing and a frame grabber for image presentation and analysis. The digital circuitry implemented on the FPGA included the proposed watermarking logic as well as all the required peripheral modules and control signals. The accomplishments of this work have been published in four scientific papers. This work was part of a commercialization effort based on the proposed novel watermarking algorithm.


Acknowledgements

Several people have made important contributions to my work on this thesis project. I hold a great deal of appreciation for my supervisor Dr. Orly Yadid-Pecht and for my co-supervisor Dr. Alexander Fish for their help in each and every part of my work on this thesis project. Under their support and supervision I have had the chance to experience a very liberal research atmosphere in an excellent environment. I would also like to thank Dr. Graham Jullien, who has offered me his exceptional knowledge and experience in academic research through endless discussions and mutual work. I would also like to thank Dr. Denis Onen for valuable suggestions and productive discussions we held from time to time. In addition, I would like to thank my fellow students in the ISL lab: Mr. Xin Li, with whom I have worked closely on this project and shared assignments, thoughts and ideas, and Ms. Marianna Beiderman, who provided her support and advice.

Table of Contents

Abstract .......... ii
Acknowledgements .......... iii
Table of Contents .......... iv
List of Tables .......... vi
List of Figures and Illustrations .......... vii
List of Symbols, Abbreviations and Nomenclature .......... viii
CHAPTER 1: INTRODUCTION .......... 1
CHAPTER 2: CONSIDERATIONS IN THE DEVELOPMENT OF AN IMAGING SYSTEM WITH WATERMARKING CAPABILITIES .......... 7
2.1 Theory and implementation of watermark algorithms .......... 7
2.1.1 Watermark Classifications .......... 7
2.1.2 Watermark Design Considerations .......... 10
2.1.2.1 Robustness to Attacks .......... 11
2.1.2.2 Image quality .......... 13
2.1.2.3 Computational complexity .......... 14
2.1.3 Figures of Merit for Watermarking Systems .......... 15
2.2 Watermark implementations – Software vs. Hardware .......... 16
2.3 State of the art in hardware watermarking .......... 18
CHAPTER 3: THE PROPOSED IMAGING SYSTEM .......... 21
3.1 Image acquisition and reordering .......... 21
3.2 Compression module .......... 22
3.2.1 DCT based compression .......... 22
3.2.2 Implementation of the compression module in the proposed system .......... 24
3.3 Watermark embedding module .......... 26
3.3.1 The novel watermark embedding algorithm .......... 26
3.3.2 Implementation of the embedding module in HW .......... 29
3.4 Watermark generation .......... 31
3.4.1 RNG based watermark generation .......... 31
3.4.2 Existing RNG structures .......... 33
3.4.2.1 The LFSR .......... 33
3.4.2.2 The FCSR .......... 34
3.4.2.3 The Filtered FCSR (F-FCSR) .......... 35
3.4.3 RNG based watermark generator design method and implementation .......... 36
CHAPTER 4: IMPLEMENTATION, TESTING AND RESULTS .......... 38
4.1 Software implementation and algorithm functionality verification .......... 38
4.1.1 Fragile watermarking and benchmarking .......... 38
4.1.2 Algorithm performance evaluation .......... 40
4.2 Hardware design and verification .......... 42
4.3 Physical proof of concept implementation .......... 45
4.3.1 The CMOS Image sensor .......... 46
4.3.2 Digital signal processing and control .......... 47
4.3.3 Output image capture .......... 47
4.3.4 Hardware Experimental Results .......... 50
CHAPTER 5: CONCLUSION .......... 52
5.1 Thesis summary .......... 52
5.2 Issues that still need attention and future work .......... 53
5.3 Possible future directions for development .......... 54
REFERENCES .......... 56
APPENDIX A: MATLAB CODE .......... 59
A.1. Algorithm Implementation .......... 59
A.1.1. Embedding .......... 59
A.1.2. Detection .......... 61
A.1.3. Compression/Decompression .......... 63
A.2. Simulation Envelope .......... 65
APPENDIX B: VERILOG CODE .......... 67
B.1. CMOS Imager Control Logic and Interface .......... 67
B.2. JPEG Encoding and Watermark Embedding .......... 80
B.2.1. DCT IDCT Modules .......... 80
B.2.2. Zigzag Modules .......... 92
B.2.3. Watermark Embedding .......... 99
B.3. Top Level and Peripheral Modules .......... 99
B.4. Simulation Testbench and peripherals .......... 120

List of Tables

Table 2.1: Existing work in hardware digital watermarking research .......... 20
Table 4.1: N vs. Quantization-Level Tradeoffs .......... 42
Table 4.2: FPGA Synthesis Results .......... 45
Table 4.3: Resource utilization by modules in the overall design .......... 50

List of Figures and Illustrations

Figure 2.1: General classification of existing watermarking algorithms .......... 8
Figure 2.2: Examples of (a) the original image and (b) the image with a visible watermark .......... 9
Figure 2.3: Scheme of a general watermark system .......... 17
Figure 3.1: An imaging system with watermarking capabilities .......... 22
Figure 3.2: Example quantization table given in the JPEG standard [4] .......... 23
Figure 3.3: Schematic of a DCT based compression encoder .......... 24
Figure 3.4: Schematic of a HW DCT transform module .......... 25
Figure 3.5: 8x8 Block DCT conversion and 1x64 Zigzag reordering .......... 27
Figure 3.6: Reorganization of the DCT data in the Zigzag order .......... 27
Figure 3.7: Example DCT data for blocks J3, J2 .......... 29
Figure 3.8: Schematic description of the watermarking module implementation in HW .......... 31
Figure 3.9: A generalized Fibonacci implementation of an LFSR circuit .......... 34
Figure 3.10: Galois implemented FCSR .......... 35
Figure 3.11: A Gollman cascade RNG .......... 37
Figure 4.1: Algorithm Matlab© simulation results .......... 39
Figure 4.2: Additional sample images .......... 41
Figure 4.3: Test setup schematic .......... 43
Figure 4.4: Hardware watermarked image .......... 44
Figure 4.5: A general implementation of an imaging system .......... 46
Figure 4.6: Mixed signal SoC fast prototyping custom development board .......... 46
Figure 4.7: Internal structure of the FPGA digital design .......... 47
Figure 4.8: Sample output image from the physically implemented system .......... 50

List of Symbols, Abbreviations and Nomenclature

Symbol   Definition
CMOS     Complementary Metal-Oxide Semiconductor
FPN      Fixed Pattern Noise
RNG      Random Number Generator
LFSR     Linear Feedback Shift Register
DCT      Discrete Cosine Transform
IDCT     Inverse DCT
FCSR     Feedback with Carry Shift Register
FPGA     Field Programmable Gate Array
LSB      Least Significant Bit
DWT      Discrete Wavelet Transform
PSNR     Peak Signal to Noise Ratio
PC       Personal Computer
JPEG     Joint Photographic Experts Group
RAM      Random Access Memory
KLT      Karhunen-Loeve Transform
HVS      Human Visual System
ROM      Read Only Memory
XOR      Exclusive Or
FSM      Finite State Machine
ADC      Analog to Digital Converter
I/O      Input/Output
CPU      Central Processing Unit
LC       Logic Cell
ASIC     Application Specific Integrated Circuit

CHAPTER 1: Introduction

The field of digital imaging and its subsidiaries has been going through continuous and rapid growth during the last decade. Digital imaging is taking over from traditional analog imaging in almost all imaging applications, from professional photography and broadcasting to the everyday consumer digital camera. The ease of integrating CMOS imagers with supporting peripheral elements, together with a significant reduction in power consumption, has introduced a variety of new portable products such as imagers on cell phones and network based surveillance and public cameras [2]. Since digital images are very susceptible to manipulations and alterations, such as cropping, scaling, covering, blurring and many more, a variety of security problems are introduced. Whether or not a digital image is authentic has become a highly nontrivial question, and there is a need to establish digital media as an acceptable authentic information source.

Digital watermarking has shown the potential for solving many digital imaging problems, including image authentication, copyright control and broadcast monitoring. A watermark is an additional, identifying message, covered under the more significant image raw data, without perceptually changing it. By adding a transparent watermark to the image, it becomes possible to detect alterations inflicted upon the image. Research activity has been extensive in both the academic and commercial communities, and significant advances and breakthroughs are constantly being published [1].

At the onset of this thesis work a basic implementation of a CMOS image sensor with watermarking capabilities was suggested [3]. In this implementation analog noise was used as a seed to the algorithm. The Fixed Pattern Noise (FPN) [2] in CMOS imagers was considered as an imager specific analog noise that provides a unique seed to a Random Number Generator (RNG). A Linear Feedback Shift Register (LFSR), using the FPN based key as a seed, generated the pseudo-random noise, and the system added this pseudo-random digital noise to the pixel data.

The performance of the original concept needed to be tested and verified. Issues of great concern were the effect that the embedding of the watermark had on the original image quality, the viability of the watermark under compression and the robustness of the watermark against common attacks. Upon further investigation of the proposed watermarking technique, it became apparent that a more sophisticated version needed to be developed. The objective of the thesis was to come up with a commercially attractive watermarking system in hardware, with potential applications in, for example, criminal investigation, video surveillance and ownership disputes – arenas where evidence of an indisputable nature is essential. In order to be more practical, the new version would target a specific range of applications. Different applications require the utilization of different watermarking techniques, and no universal watermarking algorithm that can satisfy the requirements of all kinds of applications has been presented in the literature. A watermarking technique that incorporates a high level of robustness with low image quality degradation was found to require a high level of complexity. Increased complexity in hardware implementations means increased area and power consumption. Therefore, the appropriate application to target would be one in which the hardware implementation introduces a major advantage on the one hand, but robustness requirements are liberal on the other hand; techniques that utilize fragile watermarking are inherently non-robust. The authentication of images taken by remotely spread image sensors is an application that fits the aforementioned conditions. A portable device that works in real time would obviously be able to make the best use of efficient hardware-based processing.

Portable devices deliver captured data over communication channels. These channels have limited bandwidth and therefore the raw image data must first be compressed. The vast majority of available compression standards, for both video and still images, utilize the Discrete Cosine Transform (DCT) as part of the compression algorithm. In the DCT form the image is represented in the frequency domain. The frequency domain representation is a more compact representation of the image. The zero frequency (DC) component is the most significant, as it holds the average intensity level of the transformed pixel data, and the higher the frequency is, the less significant it is in the description of the image. As a general trend, it is expected that the remainder of the data will also lie in the lower frequencies. Compression is achieved by quantizing the DCT data, thus reducing its size while suffering a certain loss of accuracy. The compression algorithm does not quantize the DCT frequencies evenly but rather uses quantization tables to define different levels of quantization for each frequency [4]. Modifying the quantization table changes the tradeoff between visual quality and compression ratio. It is possible to embed a watermark in the quantized DCT data. The watermark will be robust to any level of quantization that is less than or equal to the level of quantization utilized by the encoder [4]. A semi-fragile algorithm is therefore suitable in image authentication applications, as it is robust to legitimate compression but sensitive to other malicious modifications. In general, when compressing an image, it is divided into blocks (the standard size is 8x8) before DCT and quantization. Another built-in advantage of this approach is the fact that the image is first divided into 8x8 blocks. By uniquely embedding the watermark into each 8x8 block, tamper localization and better detection ratios are achieved.

A novel watermarking algorithm was developed in the frame of this thesis. The watermark is embedded in the DCT domain and is intended to be implemented as part of a secure compression module. Hardware-based watermarking is the most attractive approach for a combination with real time compression; the watermark embedder can be naturally merged as an integral part of the compression module, which is also implemented in hardware. A fast, efficient and low cost hardware implementation allows real-time, on-the-scene security enhancement of the data collected by any system of remotely spread sensors.

Further development has also been done towards a more secure watermark generation technique. The original RNG was an LFSR, which has very good statistical properties and is most simple to implement in hardware. However, it has long been shown that the LFSR can be easily cryptanalyzed and its seed can be recovered by observing a short sample of the output sequence [5]. A new approach for designing secure RNGs in hardware was developed, employing Gollman cascaded Filtered Feedback with Carry Shift Register (F-FCSR) cores [6],[7]. This structure was found to provide a modular tool for designing secure RNGs that may be suitable for a wide variety of watermarking applications. In addition to the RNG, the generation unit may include a module to embed significant data in the watermark. Embedding identifying features such as date, time, location and others in each frame enhances the security of watermarking a series of frames or a video stream, as well as inherently providing authentic information about the circumstances involved in the capturing of said image.

The end goal of the development efforts in this thesis was to provide a proof of concept prototype that demonstrates the feasibility of the implementation of the proposed system. An evaluation board was utilized as a platform for the implementation of the prototype. The evaluation board combines digital and analog front ends. A CMOS image sensor may be employed as the input image data source, while the onboard Field Programmable Gate Array (FPGA) is used to implement the imager control signals and digital data processing. The design process included algorithmic testing and verification in software using Matlab©. Once the properties of the algorithm were established, the system was described in hardware using Verilog HDL and simulated in Modelsim, including the compression and watermarking modules. Finally, after the hardware description had been verified, the design was synthesized to the onboard FPGA and the whole system was physically tested.

The thesis is organized as follows: Chapter 2 provides background on the process of developing an imaging system with compression and watermarking capabilities. The proposed imaging system is described in detail in Chapter 3. Chapter 4 presents all stages of the design process along with measured and simulated results. Conclusions and future work are discussed in Chapter 5.

CHAPTER 2: Considerations in the development of an imaging system with watermarking capabilities

2.1 Theory and implementation of watermark algorithms

2.1.1 Watermark Classifications

Different applications require utilization of watermarking with different properties, and no universal watermarking algorithm that can satisfy the requirements for all kinds of applications has been presented in the literature. Digital watermarking can be classified into different categories according to various criterions. Figure 2.1 shows a general classification of existing watermarking algorithms.

First, a watermark is either intended to be visible or invisible. Sometimes a watermark is intentionally visible, for example when adding the network logo on the corner of videos in broadcast TV, in which case the identifying image is embedded into the host image and both are visually noticeable. Invisible watermarking has the considerable advantage of not degrading the host data and not reducing its commercial value. Therefore, more research attention has been drawn to this field. Generally, most watermarking algorithms aim for the watermark to be as invisible as possible, while visible watermarking has received substantially less attention [8]-[16]. Figure 2.2 shows examples of the original image and the image with a visible watermark. The invisibility of a watermark is determined by how it affects the image perceptually, though no standard definition exists to explicitly determine which is which.

Watermarking can also be classified according to the level of robustness to image changes and alterations. Three main categories of watermarking can be identified: Fragile, Semi-fragile and Robust. Different applications will have different requirements: while one application would need the algorithm to be as robust as possible, another may be designed to detect even the slightest modification made to an image. Such a watermark is defined as fragile.

Figure 2.1: General classification of existing watermarking algorithms

A fragile watermark is practical when a user wishes to directly authenticate that the image he is observing has not been altered in any way since it has been watermarked. This might be the case in applications where raw data is used. However, in most existing applications, modifications such as lossy compression and mild geometric changes are inherently performed to the image. For those applications it is most efficient to use a semi-fragile algorithm that is designed to withstand certain legitimate modifications, but to detect malicious ones. Finally, some applications, such as copyright protection, require that the watermark be detectable even after an image undergoes severe modifications and degradation, including digital-to-analog and analog-to-digital conversions, cropping, scaling, segment removal and others [13],[14]. A watermark that answers these requirements would be called robust.

Figure 2.2: Examples of (a) the original image and (b) the image with a visible watermark

The dependency of the watermark on the original content is another important distinction. Making the algorithm depend upon the content of the image is good against counterfeiting attacks; however, it complicates the algorithm implementation and therefore the embedding and extracting processes.

An additional classification relates to the domain in which the watermarking is performed. The most straightforward and simple approach is a watermarking implementation in the spatial domain, which relates to applying the watermark to the

original image, for example by replacing the least significant bit (LSB) plane with an encoded one [17],[18]. Two other common representations are the discrete cosine (DCT) and the discrete wavelet (DWT) transforms [19]-[21], in which the image first goes through a certain transformation, the watermark is embedded in the transform domain and then the result is inversely transformed to receive the watermarked image.

2.1.2 Watermark Design Considerations

Let us introduce a number of watermarking properties that affect design considerations.

(1) Capacity (the term is adopted from the communications systems field [22]): in a watermarking system the cover image can be thought of as a channel used to deliver the identifying data (the watermark). The capacity of the system is defined as the amount of identifying data contained in the cover image.

(2) False detection ratio: this ratio is characterized according to the probability of issuing the wrong decision. It is comprised of the probability to falsely detect an unauthentic watermark (false positive) and the probability to miss a legitimate one (false negative). It is possible to manipulate the detection algorithm in order to minimize one or the other, according to the application. The value of this ratio is usually determined experimentally.

(3) Image quality degradation: the embedding of foreign contents in the image has a degrading effect on image quality. That parameter is relatively hard to quantify, and different measures such as peak signal-to-noise ratio (PSNR) or a subjective human perception measure may be applied [23].

These properties are elementary in every watermarking system and need to be carefully appreciated. The following subsections show how they are considered from different design points of view and indicate several trade-offs between them.

2.1.2.1 Robustness to Attacks

An effective attack on fragile and semi-fragile watermarking will attempt to modify the perceptual content of the image without affecting the watermark data embedded in it. Knowledge of the embedding and extracting methods is assumed. There are two approaches for an attack: the first requires the decryption of the encoded mark in order to produce a suitable watermark on an unauthentic image, while the second aims to maintain the original mark on a modified image without knowing the mark itself. Decrypting the original watermark is a cryptographic computational problem and is directly related to the capacity of the watermarking system. In watermarking, however, the potential for such an attack is even greater (compared with the cryptographic case), as the attacker does not have to find the exact key, but only one that would be close enough to pass over the detector's threshold. And still, if the capacity of the watermark is large enough - using a key of several hundred bits - this attack may not be computationally tractable. The second approach is to imitate the embedded mark and fool the watermark detector into believing the integrity of the image it is inspecting is intact. Known attacks are cover-up [28], counterfeiting [29] and transplantation [28].

In the cover-up attack, the attacker simply replaces parts of the original watermarked image with parts from other watermarked images or with other parts of the original image. For example, if the image contains homogeneous areas such as a wall or a floor, and the attacker wishes to hide a smaller object, he may do so by copying other blocks in such a way that the change would be perceptually un-noticeable, but the detector would still recognize a valid watermark on the copied block. The vector quantization and counterfeiting attacks use multiple images that are marked using the same watermark data in order to synthesize fake marked images. These attacks are only possible when directed at block-wise independent watermarking algorithms [30]. The transplantation attack makes even watermarking algorithms with block dependencies vulnerable. It is shown in [31] that deterministic block dependency is not sufficient against a transplantation attack. The proposed algorithm incorporates non-deterministic block dependency, as well as an image long watermark data sequence. This combination makes the attacks mentioned above ineffective.

Attacks on watermarking for copyright protection are designed to cause defects to the embedded watermark such that it will be undetectable, while still maintaining reasonable image quality. Such attacks may include one or more of the following: (1) a geometric attack such as cropping, rotation, scaling etc.; (2) a digital-to-analog conversion, such as printing, followed by analog-to-digital conversion by scanning (this can also be done by resampling); (3) lossy compression; and (4) duplicating small segments of the picture and deleting others (jitter attack) [25].

It is shown, then, that several parameters must be considered for each application in order to optimize the use of counter-measures. The goal is to maintain the required image quality desired for each application and still be robust to potential attacks. That trade-off is discussed in the next two subsections.

2.1.2.2 Image quality

As mentioned, an important objective of a good invisible watermark is minimizing image quality degradation. In the early stages of the project [3],[26], we have shown that for a blind content-independent algorithm the trade-off between the security (capacity) of the mark and the negative effect on image quality is straightforward. The watermark is embedded by adding a pseudo-random noise to each pixel. Increasing the bit size of the mark increases the variance of the noise, effectively degrading the image quality. It also adds significant high frequency values to the original image, especially in homogeneous parts of the picture.

To avoid such a significant degradation, it is possible to increase the security of the mark by making it content dependent [19]. In a content dependent watermarking system, the embedded data and/or the embedding location are also a function of the cover image. Unlike in content independent watermarking, where the detection algorithm disregards the cover image data, here all the data is relevant. Thus, faking an un-authentic image using watermark data from a different authentic image would not work. This relates directly to a better false detection ratio. This introduces higher computational complexity, but features a more secure mark without influencing the cover data severely. If the marked image is expected to go through lossy compression, one may consider embedding the watermark in the frequency domain. If, for instance, tamper localization is important, a partition of the image into blocks may be of use. Other algorithms employ global and local mean values, temporal dependencies (in video watermarking) and a variety of extra features to enhance their performance [15],[16]. Depending on the intended application, complex schemes can be implemented to withstand expected attacks, as will be described in the next subsection. However, each additional feature added to the algorithm increases the computational effort and the hardware resources (such as memory and adders/multipliers) used.

2.1.2.3 Computational complexity

Intuitively, it is obvious that in order to apply a more complicated algorithm, more complex embedding and detecting blocks would be required. The motivation to keep the computational complexity low depends on the application and on the method of implementation. In real time applications, computations must be done in a very short time period. The speed and processing power of the computational platform at hand limit the level of algorithm complexity that can be computed in a given time frame. When implementing in hardware, higher complexity requires additional hardware, which means more area and additional costs. Therefore, an optimized scheme will be comprised of the minimum number of features needed to satisfy the needs of the application it is designed for.

2.1.3 Figures of Merit for Watermarking Systems

As with all security and cryptography related applications, unless mathematically proven otherwise, an attacker could always potentially come up with a way to break the system. Two common ways to characterize the quality of a watermarking system are to test its robustness against known attacks and to measure its performance using third party benchmarks. The StirMark code, which is used for evaluating the robustness of watermarking algorithms designed for copyright protection applications, applies a series of attacks on a marked image [27]. The Checkmark benchmark provides another framework for application-oriented evaluation of watermarking schemes [22]. A designer can use the system under evaluation to embed a watermark in a series of test images, then run them through the benchmark and evaluate the performance by observing the quality of detection. In addition, it is possible to evaluate robustness to specific attacks by manually adding them to the benchmark. The use of such independent, third party evaluation tools provides a good perspective on how well a watermarking system performs with respect to both known attacks and in comparison with other available systems.

The evaluation of fragile and semi-fragile watermarking algorithms is more implicit, however. An attack on a fragile or semi-fragile system must specifically address the particular algorithm. While an attack on robust algorithms does not have to tune the parameters of the attack to the specific algorithm, for fragile and semi-fragile watermarking the detector is designed to be highly sensitive to modifications. As a result, the attacker must be aware of the specific watermarking procedure in order to avoid modifications that may alert the detector. Therefore, custom designed attacks and theoretical analysis are required, and it is not practical to consider a commonly used benchmark.

2.2 Watermark implementations – Software vs. Hardware

Figure 2.3 shows a scheme of a general watermarking system. The system consists of watermark generation, embedding and detection algorithms. First, the identifying data is encoded using a secret key, K. The identifying data (W in Figure 2.3) can be meaningful, like a logo, or it can simply be a known stream of bits. As previously mentioned, the watermark can be visible or invisible, as shown in Figure 2.2. Then the encoded identifying data is embedded into the original image (I in Figure 2.3). The result is the watermarked image. The detector part is at the receiving end. The objective is to extract the identifying data embedded in the received image, using the secret key and an inverse algorithm. Finally, the decision is made by correlating the extracted mark with the original and applying a chosen threshold.

The system can be implemented on either software or hardware platforms, or some combination of the two. A pure software watermarking scheme may be implemented in a PC environment. Such an implementation is relatively slow, as it shares computational resources and its performance is limited by the operating system. It is unsuitable for real-time applications, for it would be too slow, and it cannot be implemented on portable imaging devices that have limited processing power. On the other hand, it can be easily

programmed to realize any algorithm of any level of complexity, and it can be used on everyday consumer PCs.

Figure 2.3: Scheme of a general watermark system

A good example of a software watermarking solution was presented by Li [19]. In this work he proposes software implemented fragile watermarking, embedded in the coefficients of the block DCT. This algorithm is designed for authentication and content integrity verification of JPEG images. The algorithm embeds the watermark only in a few selected DCT coefficients of every block in order to minimize the effect on the image. The author directly addresses known issues in similar previous works [20],[28],[30], inserting additional complexity to overcome security gaps. The system utilizes the advantages of software implementation by using the resources needed to store image data,

transform coefficients and watermark mappings. Using a combination of different security resources, including a non-deterministic mapping of the location of coefficient modulation and block dependencies, the system security is improved. At the same time, the computations involved in the embedding process are kept relatively basic, suggesting suitability for future hardware implementation as well. The system succeeds in facing several attacks without changing the effect on image quality, when compared to similar works.

In contrast to software solutions, hardware implementations offer an optimized, specific design that incorporates a small, fast and potentially cheap watermark embedder. It is most suitable for real time applications, where computation time has to be deterministic (unlike software running on a Windows system, for example) and short. Optimizing the marking system hardware enables it to be added into various portable imaging devices. Moreover, in a full imaging system that includes both the imager and the watermark embedder, security is further enhanced, as it is certain that the data entering the system is untouched by any external party. However, hardware implementations usually limit the algorithm complexity and are harder to upgrade. The algorithm must be carefully designed, minimizing any unexpected deficiencies.

2.3 State of the art in hardware watermarking

In the last few years we have seen a significant effort in the field of digital watermarking in hardware. This effort is mainly concentrated on implementing invisible robust watermarking algorithms in hardware. While earlier implementations used more simplistic watermarking techniques, such as LSB replacement [33], more recent publications have also introduced the incorporation of more sophisticated embedding procedures [34]-[42]. The presented implementations mainly focus on implementing algorithms previously presented in software on a real-time platform. In one of the most recent surveys of hardware digital watermarking [32], a comprehensive review of hardware watermarking related topics is given. Table 2.1 is based on this survey and presents many of the studies published in the field. The table shows the variety of different watermarking applications and possible research directions. It also shows that most of the work has been done on spatial domain watermarking. Still, most of the work is concentrated on the implementation of robust watermarking algorithms for copyright applications and its subsidiaries, such as broadcast monitoring. On the other hand, fragile watermarking has been given much less attention. However, it seems that potential attacks on fragile and semi-fragile algorithms, such as counterfeiting and collusion, are not covered well [33].

The algorithm and system proposed in this thesis offer improved properties in terms of hardware utilization, robustness against known attacks and tolerance to legitimate compression. The reader is referred to the chapters below for a detailed presentation and explanation of these improved properties. This implementation addresses a field of applications which is still not treated in previous work. The proposed watermark algorithm is intended to be integrated with real-time, pipelined JPEG encoders. It is demonstrated how the watermark embedder can be naturally integrated as a part of a JPEG compressor. Thus, the combined system


could serve both still image sensors as well as M-JPEG video recorders. The HW implementation of the watermark embedder consists of simple memory and logic elements. It features minimal image quality degradation and good detection ratios.

Table 2.1: Existing work in hardware digital watermarking research
Research Work            Platform          WM Type         Application  Domain
Garimella et al. [33]    ASIC              Fragile         Image        Spatial
Mohanty et al. [34]      ASIC              Robust-Fragile  Image        Spatial
Tsai and Lu [35]         ASIC              Robust          Image        DCT
Mohanty et al. [36]      ASIC              Robust          Image        DCT
Hsiao et al. [37]        ASIC              Robust          Image        Wavelet
Seo and Kim [38]         FPGA              Robust          Image        Wavelet
Strycker et al. [39]     DSP               Robust          Video        Spatial
Maes et al. [40]         FPGA/ASIC         Robust          Video        Spatial
Tsai and Wu [41]         ASIC              Robust          Video        Spatial
Brunton and Zhao [42]    GPU               Fragile         Video        Spatial
Mathai et al. [43]       ASIC              Robust          Video        Spatial
Vural et al. [44]        Not implemented   Robust          Video        Wavelet
Petitjean et al. [45]    FPGA/DSP          Robust          Video        Fractal


CHAPTER 3: The proposed imaging system

This chapter provides a general description of the proposed imaging system, followed by a more detailed discussion of the different building blocks. Figure 3.1 presents an overview of the system. Naturally, the system is designed as a pipeline, where each stage adds latency but does not compromise the overall throughput. Every stage has an initial transition phase, after which it issues a new valid output on every clock. The latency of the system is the total of the transition phase lengths of all the stages in the pipeline.

3.1 Image acquisition and reordering

Image acquisition can be done using any digital imaging device. However, employing a CMOS image sensor provides an opportunity for a higher level of integration. In the most general form, a raster scan digital pixel output is considered. A dual port memory buffer, capable of storing 16 rows of raw image data, is used to reorder the pixels into 8x8 blocks. Reordering is performed by outputting the pixels in a different order than that in which they were input. Data is sent forward to the DCT module 55 clocks before 7 complete rows have been stored in the memory buffer. This is done to ensure that valid data for the DCT module is present in the memory buffer on every clock from this point on. The DCT module processes the data that is stored in rows 1-8 of the buffer memory, block after block, while new data is stored in rows 9-16. Once all the data in rows 1-8 has been processed, new data is again written there, while the data in rows 9-16 is being processed.
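The reordering behaviour can be illustrated in software. The following Matlab sketch is only an illustration of the block reordering (the variable names are ours, and it ignores the clock-level timing and the ping-pong use of the two buffer halves); eight incoming raster rows are collected and then read out one 8x8 block at a time:

% Illustrative sketch of the 8x8 block reordering performed by the buffer:
% rows arrive in raster order, blocks leave in 8x8 order.
img    = uint8(randi([0 255], 1024, 1024));   % stand-in for the sensor output
blocks = cell(size(img,1)/8, size(img,2)/8);  % one cell per 8x8 block

for r = 1:8:size(img,1)          % every 8 newly stored rows...
    stripe = img(r:r+7, :);      % ...form one stripe of the image
    for c = 1:8:size(img,2)      % read the stripe out block by block
        blocks{(r-1)/8+1, (c-1)/8+1} = stripe(:, c:c+7);
    end
end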


The buffer memory size and latency depend on the size of the image sensor. For example, for a 1M pixel array having 1024 rows and 1024 columns with 8 bits per pixel, the size of the memory buffer is 128Kbits and the latency would be 8137 clocks.
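As a rough check of these figures (our own arithmetic, assuming the 55-clock head start is measured against the eighth stored row):

% 16 buffered rows of 1024 pixels at 8 bits each:
buffer_bits = 16 * 1024 * 8      % = 131072 bits = 128 Kbits
% Processing starts 55 clocks before 8 full rows have been written:
latency = 8 * 1024 - 55          % = 8137 clocks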

Figure 3.1: An imaging system with watermarking capabilities

3.2 Compression module

3.2.1 DCT based compression

Following is a brief overview of DCT based compression; for a detailed discussion, the reader is referred to the literature [46],[48].


It has been shown that transforming a spatial image into the DCT domain provides a more compact presentation of the information contained in the image [48]. In the DCT domain the image is represented by its spatial frequency components. In fact, the DCT presentation of natural images is considered to be a good approximation of the Karhunen-Loeve Transform (KLT), which is the most compact representation. Furthermore, the Human Visual System (HVS) was found to be less sensitive to changes in higher frequency components [48]. Therefore, representing the image in the DCT domain enables compression by concentrating the data in fewer coefficients and further identifying those portions of the data that are more visually significant.

Figure 3.2: Example quantization table given in the JPEG standard [4]

Compression is achieved by applying different levels of quantization to different DCT coefficients. Each of the 64 DCT coefficients is assigned a specific value for quantization. Quantization is done by dividing the value of the coefficient by the quantization value assigned to it. Division by a larger quantization value results in a more coarse representation of the coefficient (more information is lost in the process). Figure 3.2 presents the quantization table suggested in the JPEG standard. In addition, many of the low value, higher frequency coefficients get zeroed out in the process, thus increasing the compression ratio. Different applications use different quantization tables. Of course, if a higher level of compression is required, higher quantization values are used.

In the scope of this work a very important property of DCT based compression is that once an image has been compressed with a certain quantization level, it can be recompressed with a smaller level of quantization without incurring any further loss of information. Therefore, embedding a watermark in the quantized coefficients ensures that the watermark is robust to DCT compression with a quantization level less than or equal to the quantization level used during the watermarking process [19]. To complete the compression and translate the reduced amount of data into a representation using fewer bits, entropy encoding is applied. In JPEG, entropy encoding involves run-length encoding followed by Huffman coding or arithmetic encoding [4]. Figure 3.3 presents a schematic view of a DCT based compression module in hardware.
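A minimal Matlab sketch of this quantization step is given below. It is our illustration, not the Appendix A code; it uses the JPEG luminance table of Figure 3.2 and shows that re-quantizing already quantized coefficients with the same table changes nothing, which is exactly the property the watermark relies on. (In hardware, the division can equivalently be replaced by multiplication with a pre-computed reciprocal, as described in the next subsection.)

% Quantization of one 8x8 DCT block with the JPEG luminance table (Figure 3.2).
Q = [16 11 10 16 24 40 51 61; 12 12 14 19 26 58 60 55; ...
     14 13 16 24 40 57 69 56; 14 17 22 29 51 87 80 62; ...
     18 22 37 56 68 109 103 77; 24 35 55 64 81 104 113 92; ...
     49 64 78 87 103 121 120 101; 72 92 95 98 112 100 103 99];

block  = double(randi([0 255], 8, 8)) - 128;  % zero-shifted pixel block
coeffs = dct2(block);                         % 8x8 DCT (Image Processing Toolbox)
quant  = round(coeffs ./ Q);                  % quantized coefficients

% De-quantize and quantize again with the same table: the data is unchanged,
% so a watermark embedded in 'quant' survives this re-compression step.
requant = round((quant .* Q) ./ Q);
isequal(quant, requant)                       % returns logical 1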

Figure 3.3: Schematic of a DCT based compression encoder

3.2.2 Implementation of the compression module in the proposed system

A DCT transform module was implemented following a design available from Xilinx Inc. [46], based on the architecture described in Figure 3.4.


This implementation takes advantage of the separable property of the DCT transform, i.e. the 2-D DCT transform can be calculated as a series of two 1-D DCT transforms, where the first transform is applied in one direction and the second is applied in the orthogonal direction. Using matrix form, the 8x8 DCT transform Y of an input block X is given by Y = CXC^t, where C is the matrix of cosine coefficients and C^t is its transpose. In hardware this is realized by storing the output of the first 1-D DCT in a transpose memory buffer, line after line, and then applying a second 1-D DCT transform on the columns of the result.
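The separability can be checked numerically with a few lines of Matlab. This is our own sketch (the hardware module uses fixed-point arithmetic rather than the double precision shown here):

% Build the 8x8 orthonormal DCT-II coefficient matrix C.
C = zeros(8);
for k = 0:7
    for n = 0:7
        if k == 0, a = sqrt(1/8); else, a = sqrt(2/8); end
        C(k+1, n+1) = a * cos(pi * (2*n + 1) * k / 16);
    end
end

X = rand(8) * 255;           % arbitrary 8x8 input block
Z = X * C';                  % first 1-D pass: DCT along each row
Y = C * Z;                   % second 1-D pass on the columns of the intermediate result
max(max(abs(Y - dct2(X))))   % negligible difference from the direct 2-D DCT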

Figure 3.4: Schematic of a HW DCT transform module

It is much easier to implement multiplication in HW than division, especially in FPGA devices, where dedicated multipliers are often available. It is therefore possible to predetermine the inverses of all 256 possible quantization values and store them in a ROM. Multiplying the DCT value by the inverse quantization value gives the desired quantized DCT coefficient.

The entropy encoding modules were not necessary for the purposes of this work. The proposed watermark embedder processes the quantized output of the DCT transform module. The entropy encoding stage merely changes the way the data is represented but does not lose any more information in the process.


Therefore, encoding the watermarked data only to decode it back to the exact same form does not provide any additional insight.

3.3 Watermark embedding module

3.3.1 The novel watermark embedding algorithm

Presented here is a novel watermarking algorithm that allows a very simple and efficient implementation of the watermark embedder in hardware. The algorithm modulates N cells in each DCT block. The values of the processed DCT block are considered along with the values of its neighbour to the left in order to choose which cells are to be modulated and in what way. A secret pseudo-random sequence, with the same length as the image, serves as the watermark data. It is used to mask the operation of the algorithm and resist attacks. As shown in Figure 3.5, the original image is divided into 8x8 blocks indexed I1-IMxN. After DCT transformation the DCT blocks J1-JMxN are reorganized into blocks of size 1x64, according to the zigzag order shown in Figure 3.6.

Let us consider the watermarking procedure for the block I3 of the example image I. Figures 3.7a-b present J3 and J2, which are the DCT transforms of the blocks I3 and I2 respectively. Figure 3.7c is the secret watermark data W3 generated for I3. The binary matrix P3 in Figure 3.7d is the logical AND between J3, J2 and W3, thus enabling dependency of the procedure on the neighbouring block J2, masked by the secret watermark data W3.
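The zigzag reordering referred to above can be generated programmatically. The following Matlab sketch is our own illustration (not the Appendix A or B code); it builds the 64-entry zigzag index map of Figure 3.6 once and then applies it to any 8x8 block:

% Build the zigzag scan order for an 8x8 block as 64 linear indices.
[cols, rows] = meshgrid(1:8, 1:8);
d = rows + cols;                        % anti-diagonal index of each cell
order = zeros(1, 64); k = 0;
for s = 2:16                            % walk the anti-diagonals
    idx = find(d == s);                 % linear indices on this diagonal
    [r, ~] = ind2sub([8 8], idx);
    if mod(s, 2) == 1                   % odd diagonal: travel down-left
        [~, p] = sort(r, 'ascend');
    else                                % even diagonal: travel up-right
        [~, p] = sort(r, 'descend');
    end
    order(k+1:k+numel(idx)) = idx(p);
    k = k + numel(idx);
end
zigzag = @(B) B(order);                 % maps an 8x8 block to its 1x64 zigzag vector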


Figure 3.5: 8x8 Block DCT conversion and 1x64 Zigzag reordering

Figure 3.6: Reorganization of the DCT data in the Zigzag order

The matrix P3 is used to embed the watermark in J3. Considering N=2 (this will be assumed in all the examples from now on) and following the zigzag order, the first two non-zero cells in the matrix P3 (marked black in Figure 3.7d) indicate the two cells that are going to be modulated in J3, in the example cells 47 and 43. The remaining cells in P3 determine the LSB of the indicated cells. Still following the zigzag order, the cells are alternately divided into two groups, such that the first cell belongs to the first group, the next cell to the second group, and so forth until all cells have been assigned. In the example, the two groups are marked by different backgrounds. The bits in each group are XORed to determine the corresponding LSB value for the designated two cells. In the example, XORing the cells of each group yields a value of 1 for both groups, so the embedder only needs to change the value of cell 43 from 2 to 3. It is obvious, then, that embedding the mark has only a slight effect on the hosting block data. A block that produces less than two non-zero cells in a matrix P is considered un-markable and is therefore disregarded. In principle, as N is increased so does the robustness of the algorithm, while at the same time image quality is reduced. Simulations show that even for N as low as 2, performance is satisfactory. Only blocks that are distinctively homogeneous and have very low values for mid and high frequency DCT coefficients are problematic.

The detection procedure is very similar to the embedding procedure. The input is the watermarked image in compressed format. The detector first decodes the data to receive the quantized DCT data. The matrix P3 is created in the exact same manner as for the embedding process, only that instead of modifying the cells, the modulated cells are identified and a comparison is made to verify that they are indeed equal to the expected modulation value. Taking advantage of the software platform, additional processing can aid detection. In particular, it is reasonable to assume that an attacker would attempt to cover continuous surfaces rather than isolated spots, and so morphological closure on the output detection image can improve detection ratios.
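The per-block embedding just described can be summarized in a short Matlab function (saved, for example, as embed_block.m). This is our own sketch of the procedure for N = 2; the names, the treatment of the AND as a logical operation on non-zero values, and the handling of edge cases are our assumptions, not the thesis's Appendix A code.

% Jb and Jprev are 1x64 zigzag-ordered quantized DCT blocks (current block and
% its left neighbour); Wb is the 1x64 secret watermark bit sequence for Jb.
function JbWM = embed_block(Jb, Jprev, Wb)
    P = (Jb ~= 0) & (Jprev ~= 0) & (Wb ~= 0);   % logical AND of J(b), J(b-1) and W(b)
    marked = find(P, 2);                        % first two non-zero cells of P
    JbWM = Jb;
    if numel(marked) < 2                        % un-markable block: left untouched
        return;
    end
    rest = setdiff(1:64, marked);               % remaining cells, still in zigzag order
    g1 = rest(1:2:end);  g2 = rest(2:2:end);    % alternate the cells into two groups
    lsb1 = mod(sum(P(g1)), 2);                  % XOR of the bits in each group
    lsb2 = mod(sum(P(g2)), 2);
    JbWM(marked(1)) = 2*floor(Jb(marked(1))/2) + lsb1;  % force the LSBs of the two
    JbWM(marked(2)) = 2*floor(Jb(marked(2))/2) + lsb2;  % marked coefficients
end

For the example of Figure 3.7, such a call would change cell 43 of J3 from 2 to 3, exactly as described above.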

Figure 3.7: Example DCT data for blocks J3, J2

3.3.2 Implementation of the embedding module in HW

Let us examine how the watermarking of the example block J3 is done in hardware. Figure 3.8 presents a schematic view of the hardware implementation of the watermark embedder. Each clock, the module receives as inputs a 12 bit quantized DCT coefficient, Jb(i), from the Zigzag buffer and a watermark bit, Wb(i), from the watermark generator.

30 Here b is the number of the block within the current frame and i is the number of the DCT cell within the block b. the Ind register holds the indexes of the cells where the mark is to be embedded. The value of Pb(i)=AND(Jb(i). The modulator reads the data registered in the Previous Ind and Val registers.Wb(i)) is calculated. Jb(i) is stored in the DCT data buffer to be used in the watermarking of the next block. 30 . while the Val register holds the values for the LSB of these cells. Recall that it is now required to divide the remaining Pb values into two groups. thus using the associative nature of the XOR operation to progressively calculate the value of the XOR between all the bits in the group.e. getting JWMb-1(i). Jb+1. These values are then copied to the registers marked previous to make room for new calculations. not before XORing it with the current value of the register. The value of the index i will be recorded by the Ind register for the two first non-zero occurrences of Pb(i). This is done by alternately referring Pb(i) into the cells of the Val register. Pb(i) is forwarded on to the Current “Ind” and “Val” registers. If the value of the index i is found in the Ind register then the corresponding bit in the Val register is written to the LSB of Jb-1(i). all of the block Jb has been read.Jb-1(i). When i turns to zero i.

the initial state will determine the whole sequence. where Sn is the n-th bit of the sequence and t is the length of the period.4 Watermark generation 3. no repeating patterns and when security is a requirement. S n = S n +t .4. a pseudorandom sequence has low cross-correlation values between different samples. ∀n ≥ n0 .1 RNG based watermark generation Pseudorandom sequences are sequences that have similar statistical properties as true random sequences but still allow regeneration.8: Schematic description of the watermarking module implementation in HW 3. In hardware.31 Figure 3.e. Any sequence generated by an FSM will eventually be periodic. using a Finite State Machine (FSM) may generate a pseudorandom sequence. The initial state will be 31 . the prediction of future samples or otherwise regeneration of the sequence based on observation should not be possible. i. For a pseudo Random Number Generator (RNG). In general.

The 32 .32 determined according to a secret key. one of the common methods is to combine different RNG architectures to get one that is more secure [49]. The size of the RNG (the number of bits in the shift register) is in proportion to the key range. sequences from both LFSR and FCSR can be easily recovered from their outputs using cryptanalysis [49]. In this case. As watermarking is a security oriented application it is important to have secure RNG design and size. Thus. where the non-linear output of the FCSR core is linearly filtered. By sharing the knowledge of the secret key (which is much shorter than sharing knowledge of the whole watermark sequence). the output will be much more robust to cryptanalysis. The vast majority of proposed RNGs are based on the use of feedback shift registers (SR) where the input bit is a function of the current shift register state. both sides are able to generate the same watermark data sequence. In order to increase the complexities. Different feedback functions can be implemented. such as Linear Feedback Shift Register (LFSR) with a linear feedback function and Feedback with Carry Shift Register (FCSR). Identical RNGs are to be implemented in both the embedding side and the receiving side. An RNG can be used as a watermark generator. is a good example of such a combination [7]. The linearity or non-linearity of the RNG will determine the mathematical tools used to analyze the output of the RNG. A secure RNG is designed in such a way that a potential attacker would have to consider all the possible secret keys in order to regenerate the sequence. which has a nonlinear feedback function. A Filtered-FCSR (F-FCSR).

The maximal proportion between the shift register size and the key range is $2^n - 1$, where n is the number of bits in the shift register.

3.4.2 Existing RNG structures

3.4.2.1 The LFSR

In an LFSR the input bit is a linear function of its previous state. The shift register is driven by the sum modulo 2 (XOR) of some bits of the overall shift register state. Figure 3.9 shows the generalized Fibonacci implementation of an LFSR. The feedback function is determined by the connection polynomial $q(x) = \sum_{i=0}^{k} q_i x^i - 1$, $q_i \in \{0,1\}$. In order to be able to analyze the properties of the output sequence, the mathematical tools of finite binary fields are used. The polynomial $S(X) = \sum_{n=0}^{\infty} s_n x^n \in GF(2)[[X]]$, where $s_n \in \{0,1\}$, is the generating function for the output sequence. It can be shown that $S(x) = u(x)/q(x)$, where u(x) is a polynomial defined by the initial state [5]. It is clear that the sequence $s_n$ is periodic. If q(x) is a primitive polynomial then $s_n$ is an m-sequence, a sequence with the maximal period of $2^k - 1$. m-sequences have very good statistical properties and are distributed uniformly. However, as mentioned in [5], they are not very secure: given 2k bits of the sequence, the Berlekamp-Massey algorithm can be used to regenerate the whole sequence.
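A minimal Verilog sketch of such a Fibonacci LFSR follows (illustrative only, not the thesis design; the 8-bit width and the taps, which correspond to the primitive polynomial x^8 + x^6 + x^5 + x^4 + 1, are assumed example choices):

// Illustrative sketch: 8-bit Fibonacci LFSR with an assumed primitive tap set.
module lfsr8_fib (
    input  wire       clk,
    input  wire       rst_n,
    input  wire [7:0] seed,    // secret key / initial state, must be non-zero
    output wire       out_bit
);
    reg [7:0] sr;
    // feedback: sum modulo 2 (XOR) of selected bits of the register state
    wire fb = sr[7] ^ sr[5] ^ sr[4] ^ sr[3];

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) sr <= seed;
        else        sr <= {sr[6:0], fb};   // shift and feed the new bit in
    end

    assign out_bit = sr[7];
endmodule

With a primitive connection polynomial and a non-zero seed, such a register cycles through all 2^8 - 1 non-zero states, i.e. it produces an m-sequence.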

Figure 3.9: A generalized Fibonacci implementation of an LFSR circuit

3.4.2.2 The FCSR

The Galois mode implementation of an FCSR, shown in Figure 3.10, is an expansion of the LFSR where, instead of sum modulo 2, a carry from the last summation is added [47]. This introduces non-linearity and enhances the security of the output sequence. The sequence can no longer be analyzed using finite fields; the related structure is the 2-adic integer numbers [5]. The set of 2-adic integers is denoted by Z2. A 2-adic integer is formally a power series $s = \sum_{n=0}^{\infty} s_n 2^n$, $s_n \in \{0,1\}$, so a 2-adic integer can be associated to any binary sequence. An important observation is that $-1 = \sum_{i=0}^{\infty} 2^i$, which can be understood if we consider the result of $1 + \sum_{i=0}^{\infty} 2^i = 0$. Another intuitive association is to digital arithmetic, where Z2 can be associated to an infinitely large 2's complement system. To receive a strictly (without any transient phase) periodic sequence s using the FCSR shown above, we need to consider two co-prime integers p and q, where q must be odd and negative and p < -q.
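A minimal Verilog sketch of a Galois-mode FCSR is shown below (illustrative only, not the thesis RTL; the 8-bit width and the tap pattern qtaps are arbitrary example values, not a connection integer chosen for maximal period):

// Illustrative sketch: Galois-mode FCSR. At tapped stages an addition device
// with a carry memory replaces the plain XOR of the LFSR.
module fcsr8_galois (
    input  wire       clk,
    input  wire       rst_n,
    input  wire [7:0] seed,     // initial state (related to the parameter p / secret key)
    output wire       out_bit
);
    wire [8:1] qtaps = 8'b1101_0011;  // q_i bits of the connection integer (assumed example)
    reg  [7:0] m;                     // main register cells
    reg  [7:0] c;                     // carry memory for the tapped stages
    wire       fb = m[0];             // output / feedback bit

    integer i;
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            m <= seed;
            c <= 8'b0;
        end else begin
            for (i = 0; i < 7; i = i + 1) begin
                if (qtaps[i+1])
                    {c[i], m[i]} <= m[i+1] + fb + c[i]; // addition device with carry
                else
                    m[i] <= m[i+1];                     // plain shift where q_i = 0
            end
            m[7] <= qtaps[8] & fb;                      // top cell driven by the feedback
        end
    end

    assign out_bit = fb;
endmodule

The carries are what make the state update non-linear over GF(2), which is why the 2-adic framework described above is needed to analyze the output.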

Figure 3.10: Galois implemented FCSR

The FCSR basically performs the 2-adic division p/q to receive the sequence s. The 2-adic integer q is the connection integer of the FCSR, i.e. it determines where an addition device will be added between two cells and where the bit would simply be shifted. The integer p is a function of the initial state; the parameter p is thus related to the initial state of the generator. The period and statistical properties of the output sequence depend only upon q. If q is odd and p and q are co-prime, the period of s is the order of 2 modulo q, that is, the smallest integer t such that $2^t \equiv 1 \bmod q$. In order to achieve the maximal period, q must be a prime number with 2 as a primitive root, i.e. $|q|-1$ is the smallest exponent for which $2^{|q|-1} - 1$ is divided by q. In that case, the period T would be equal to (|q|-1)/2. The FCSR is not a secure RNG: similar to the LFSR, available cryptanalysis methods may be used to recover the secret key by observing a short minimal output sequence.

3.4.2.3 The Filtered FCSR (F-FCSR)

As its name suggests, the F-FCSR utilizes an FCSR in addition to a linear filter. The linear filter is a XOR gate with selected bits of the FCSR serving as inputs.
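A minimal sketch of such a filter is given below (illustrative only; the module name and the example tap pattern FTAPS are assumptions, and the rule for selecting the taps from the connection integer is formalized in the next paragraph):

// Illustrative sketch: the F-FCSR linear filter, a XOR gate over selected
// cells of the FCSR main register.
module ffcsr_filter #(
    parameter WIDTH = 8,
    parameter [WIDTH-1:0] FTAPS = 8'b0110_1001  // assumed example selection
)(
    input  wire [WIDTH-1:0] fcsr_state,  // main register cells M_i of the FCSR core
    output wire             out_bit
);
    assign out_bit = ^(fcsr_state & FTAPS);  // XOR of the selected cells
endmodule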

It is suggested that if $q_i$ equals one then $M_{i-1}$ is connected to the filter. In other words, if the output sequence is denoted by $s = \bigoplus_{i=0}^{N-1} f_i M_i$, then $f_i = q_{i+1}$, where $f_i$ indicates whether $M_i$ is connected to the filter or not, as shown in [7]. The addition of the linear filter breaks the 2-adic nature of the FCSR and introduces a new mathematical structure which is neither 2-adic nor linear. This undefined structure makes the F-FCSR robust to any known cryptanalysis methods [7].

3.4.3 RNG based watermark generator design method and implementation

A novel design technique involving a Gollmann cascade with F-FCSR cores was proposed and implemented. Utilizing cascades offers a straightforward and simple approach to enhance the performance of many systems. Shift register based RNGs are cascaded by making the registers be clock controlled by their predecessors. A Gollmann cascade RNG, introduced in [6], is depicted in Figure 3.11. The important feature of this method of cascading is the use of the XOR function for the coupling of the register clock; earlier cascades utilizing AND functions resulted in a very low clock rate for the registers further down the cascade. Originally, cascading is used to complicate the structure of the RNG and make it more secure. Here, we proposed to use cascading to achieve a high level of modularity for the designer. By creating an initial pool of relatively small-sized core RNGs it is possible to construct a custom sized generator without any significant design effort. The period of a cascaded generator is the product of the periods of the different cores.
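The sketch below illustrates the clock-controlled cascading of two cores (illustrative only; the simple shift-register cores stand in for the F-FCSR cores of the thesis, and the exact XOR-based clock coupling of [6]/[50] is not reproduced, only the predecessor-controlled stepping):

// Illustrative sketch: two-core clock-controlled cascade. Core 1 advances only
// as a function of core 0's output; the thesis couples the register clock
// through an XOR per [6], which this simplified stand-in does not reproduce.
module cascade2_sketch (
    input  wire        clk,
    input  wire        rst_n,
    input  wire [15:0] key,      // secret key split between the two cores
    output wire        out_bit
);
    reg  [7:0] core0, core1;
    wire fb0 = core0[7] ^ core0[5] ^ core0[4] ^ core0[3];
    wire fb1 = core1[7] ^ core1[5] ^ core1[4] ^ core1[3];
    wire step1 = core0[7];       // predecessor output gates the next core

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            core0 <= key[7:0];
            core1 <= key[15:8];
        end else begin
            core0 <= {core0[6:0], fb0};            // core 0 runs every cycle
            if (step1) core1 <= {core1[6:0], fb1}; // core 1 is clock controlled by core 0
        end
    end

    assign out_bit = core1[7];
endmodule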

When identical cores are used, $T = (T')^l$, where T is the period of the cascade, T' is the period of one core and l is the number of the cores in the cascade.

Figure 3.11: A Gollmann cascade RNG

This structure was found to provide a modular tool for designing secure RNGs that may be suitable for a wide variety of applications. Using F-FCSR cores in a Gollmann cascade structure makes the best out of both concepts. Each core is inherently secure and the design complexity remains simple. In watermarking applications the modularity of the design method allows for worry-free adjustments of the RNG size according to the specific implementation requirements. The tool was utilized to design a 22-bit RNG composed of two 11-bit cores connected as a Gollmann cascade [50]. The implemented RNG outputs a binary sequence with a period of 3,568,321 bits. In this implementation the periodic binary sequence is used directly as the secret watermark. Considering a 256x256 sensor array as an example, the watermark will repeat itself every 54.45 frames. The detector would need to have knowledge of the starting frame and the secret key in order to extract the correct watermark.
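As a quick consistency check of these figures (the per-core period implied by $T = (T')^2 = 3{,}568{,}321$ is $T' = 1889$, and the per-frame consumption of $256 \times 256 = 65{,}536$ bits assumes one watermark bit per DCT cell, i.e. 64 bits per 8x8 block):

\[
T = (T')^{2} = 1889^{2} = 3{,}568{,}321 \ \text{bits}, \qquad
\frac{3{,}568{,}321}{65{,}536 \ \text{bits/frame}} \approx 54.45 \ \text{frames}.
\]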

CHAPTER 4: Implementation, testing and results

4.1 Software implementation and algorithm functionality verification

First, the algorithm performance was evaluated using a Matlab© simulation. A sample image is embedded with a watermark according to the proposed algorithm. The cover-up attack is applied to the watermarked sample image. The attacked image is analyzed by the watermark detector that outputs a detection map. The detection map is used to indicate which blocks of the image are suspected as inauthentic. The results of a simulation of the proposed algorithm are presented in Figure 4.1. Figure 4.1(a) shows the original 128x128 example image. Figure 4.1(b) is the image after it was compressed and embedded with the watermark. With only 35% of the DCT data being non-zero, the PSNR is still very high at 43.4dB and the difference between the images is practically unnoticeable. The tampered image is shown in Figure 4.1(c). Two areas of the image have been modified after the cover-up attack has been applied to the watermarked sample image. On the upper right corner, the airplane in the original image is removed by copying the contents of adjacent blocks onto the blocks where the airplane is supposed to appear. To an innocent observer the original existence of the airplane in the image is visually undetectable. This is facilitated by the homogeneous nature of the surrounding neighbourhood. A more easily noticeable example of such modifications is shown on the lower left corner of the image, where the reflection of the sun is partially removed. Both modifications would be easily noticed using the detection map created by the WM detector and presented in Figure 4.1(d). This example shows that the watermark is as effective on homogeneous surfaces (where only a small portion of the DCT data is non-zero) as it is on high spatial frequency surfaces.
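For reference, the quoted PSNR figure is computed as in the Matlab code of Appendix A:

\[
\mathrm{PSNR} = 20\log_{10}\!\left(\frac{b}{\mathrm{RMS}}\right)\ \text{dB},
\]

where b is the largest possible pixel value (255 for 8-bit images, or 1 for normalized data) and RMS is the root-mean-square difference between the original and the watermarked image.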

Figure 4.1: Algorithm Matlab© simulation results

4.2 Algorithm performance evaluation

The results shown above are achieved while using N=2. That means only two cells in each block of DCT data were modulated. Several experiments have been conducted to examine the optimal number of cells to modulate. The cost of increasing N is additional hardware and a reduction in image quality. The results, shown in Table 4.1, summarize the effect of changing N along with the level of quantization. The level of quantization is indicated by the average ratio of nonzero cells in a block after quantization. N=4 exhibits slightly better detection ratios, becoming more significant as the quantization level increases; a larger N should therefore be considered when an aggressive level of quantization is desired. In terms of hardware costs the difference is reasonable, and it sums up to extra registers and larger multiplexers. As to image quality, in terms of PSNR, Table 4.1 shows that the difference is less than 0.5dB, which is mostly negligible.

4.2.1 Fragile watermarking and benchmarking

It is important to emphasize that benchmarking for a fragile watermarking algorithm is a tricky issue. As mentioned before, in robust algorithms it is possible to utilize a third party benchmark with relative ease. There, the objective of the benchmark is to apply known attacks on a marked image such that the mark would be undetectable. The user merely needs to embed a mark, run the marked image through the benchmark and then try to detect the mark.

In fragile algorithms, however, this flow is not practical. The objective of a potential attacker is to make modifications to the marked image without damaging the embedded mark. Therefore, running a marked image through any of the available benchmarks would simply damage the watermark such that the detector thinks (rightfully) the image has been tampered with. An attack on a fragile watermark must consider the specific watermarking algorithm, as it needs to try and imitate what the authentic embedder is doing in order to be able to fool the detector.

Figure 4.2: Additional sample images (Monkey and Lena): Original Image, WM Image, Tampered Image and Detection Zones

The evaluation strategy taken in this thesis includes a combination of experimental results from the tamper detection on sample images, such as the ones presented in Figures 4.1 and 4.2.

Table 4.1: N vs. Quantization-Level Tradeoffs

Nonzero Cells [%]   N   PSNR [dB]   Detection Ratio [%]
43.0                2   45.85       94.6
43.0                4   45.64       94.7
28.0                2   41.59       89.0
28.0                4   41.24       90.7
21.0                2   39.22       76.3
21.0                4   38.77       82.8

In addition, the algorithm is inherently resilient to known attacks against fragile watermarking thanks to inter-block dependency and a non-deterministic choice of the watermarked cells within each block. A detailed proof of the sufficiency of these measures is given in [19].

4.3 Hardware design and verification

The watermark embedder block was described using Verilog HDL. An evaluation board was employed to assess the performance of the algorithm when synthesized to an Altera Cyclone FPGA device. The onboard Cyclone FPGA is used to implement the proposed watermark embedding architecture, as well as the necessary peripheral blocks. The schematic of the test setup on the evaluation board is given in Figure 4.3. First, an image is copied onto the onboard frame memory. The memory is used to emulate a digital image sensor, and the data output from the memory is treated as if an image sensor generated it. The data is then DCT transformed, quantized and arranged in the zigzag order according to the procedure described above. The DCT and inverse DCT transform blocks were borrowed from [46]. A preset watermark sequence is used by the watermark embedder module to embed the watermark in the DCT coefficients of the test image.

Finally, the DCT data is de-quantized and rearranged before it is inversely transformed back to the spatial domain. The output is the original image that now contains the watermark.

Figure 4.3: Test setup schematic

4.3.1 Hardware Experimental Results

The Verilog description has been verified through HDL simulation and experimented on the evaluation board. The DCT data of a sample image was pre-calculated in software and read to a virtual buffer. The Verilog module reads the DCT data from the virtual buffer the same way it would do with the output of the zigzag buffer. The DCT data was embedded with the watermark. Finally, the marked DCT data was stored in a file and analyzed to verify it was marked correctly. Figure 4.4(a) shows the input image to the watermark hardware test setup and (b) presents the watermarked image at the output. The design was synthesized to an Altera Cyclone EP1C20 FPGA device using the Altera Quartus II design software. In addition, the design was also mapped to an Altera FLEX10KE FPGA such that it is possible to compare the results with the results of Agostini et al. [51] for a complete JPEG compressor system.

Table 4.2 summarizes the performance of the three mappings in terms of hardware usage, throughput and latency. The results clearly show that the watermark embedder can easily be added to an existing JPEG compressor, even when that compressor is oriented at low-cost, high-throughput applications. The addition does not affect the desired throughput and requires a negligible addition of hardware resources and power, compared to the complete system to which it is added.

Figure 4.4: Hardware watermarked image

Table 4.2: FPGA Synthesis Results

Design                                       Logic Cells   Memory Bits   Frequency\Throughput   Latency
WM Embedder FLEX                             132           744           187.16 MHz             64
WM Embedder Cyclone                          113           744           209.84 MHz             64
Agostini et al. [16] JPEG compressor FLEX    4844          7436          39.93 MHz              238

The hardware embedder would add 132 more logic cells, which is a negligible addition of 2.73% to the hardware of the JPEG compressor. The combined system would easily fit in the original FPGA device. In general, any device that is large enough for the implementation of the JPEG compressor would be enough to accommodate the additional hardware required for the watermark embedder block.

4.4 Physical proof of concept implementation

As part of a commercialization effort and to provide further validation, the proposed imaging system was physically implemented. Figure 4.5 presents the necessary elements required for the physical implementation. The evaluation board shown in Figure 4.6 was utilized as the implementation platform. The onboard FPGA device provides memory and a platform for control signal generation and digital image processing. The evaluation board has an LVDS I/O port for fast communication with a digital frame grabber. A CMOS imager with an internal ADC and analog biasing circuitry is employed as the test imager.

There are two important attributes to 46 . A 256x256 pixels rolling shutter CMOS image sensor with an internal 12 bit pipelined ADC was borrowed from [52].4.5: A general implementation of an imaging system Figure 4.46 Figure 4.1 The CMOS Image sensor Following is a brief description of the properties and mode of operation of the test CMOS image sensor.6: Mixed signal SoC fast prototyping custom development board 4.

Rolling shutter operation introduces a non-continuous readout sequence, with a setup phase for performing row reset and analog pixel data readout operations, and the 12 bit pipelined ADC introduces a latency of 6 clock cycles. These two attributes, while requiring special attention, offer insight on the applicability of the proposed system to a common imaging system setup.

4.4.2 Digital signal processing and control

All the digital signal processing and control was performed on the onboard FPGA device. The FPGA device handles I/O communication with multiple devices on the board. A schematic diagram describing the internal design structure of the FPGA is given in Figure 4.7.

Figure 4.7: Internal structure of the FPGA digital design
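The six-clock-cycle ADC latency noted above means that the qualifying strobes generated by the control logic must be delayed so that they line up with the converted samples. A minimal sketch of such an alignment delay (module and signal names assumed) is:

// Illustrative sketch: delaying a pixel-valid strobe by the pipelined ADC
// latency so it accompanies the corresponding converted data word.
module adc_align #(
    parameter LATENCY = 6
)(
    input  wire clk,
    input  wire rst_n,
    input  wire sample_valid_in,    // asserted when a pixel is sampled
    output wire pixel_valid_out     // asserted when the ADC word is actually valid
);
    reg [LATENCY-1:0] dly;
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) dly <= {LATENCY{1'b0}};
        else        dly <= {dly[LATENCY-2:0], sample_valid_in};
    end
    assign pixel_valid_out = dly[LATENCY-1];
endmodule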

The CPU interface is responsible for communications with the onboard microcontroller, which in turn handles communications with a PC. The microcontroller handles the CPU interface in a similar manner to an external memory. This allows the user to change internal FPGA register values online. The imager interface is in charge of generating control signals to the CMOS imager and of receiving and synchronizing its output pixel data. The control signal generation is based on a fundamental line sequence that is repeated periodically. A line setup phase takes place before the readout of every line. The analog data of every pixel in the line is then sampled by the ADC one pixel at a time. A sample clock and a decoder that controls an analog multiplexer handle the sampling. The imager interface accounts for the sample clock, the decoder input and the latency of the ADC, and generates synchronization signals including pixel clock, line enable and frame enable. The image data can be transmitted to a digital frame grabber without further processing. In this implementation the encoder/embedder receives as input the pixel data and synchronization signals generated by the imager interface. JPEG requires that the pixel data be input in 8x8 blocks. An input memory buffer is used to reorder the pixels; its size is 256x16 words. It has a capacity of sixteen lines, allowing alternating read/write operation where eight lines are being written to the memory while the other eight lines are being processed. According to the imager interface operation, the data is written one line at a time with a pause between every two lines. The timing and length of the pause are determined according to the synchronization signals.
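A minimal sketch of such a ping-pong line buffer is given below (illustrative only; the module and signal names, the 12-bit data width and the addressing scheme are assumptions, and the block-order read addresses would be driven by the encoder's own sequencing):

// Illustrative sketch: 16-line buffer used in ping-pong fashion. While eight
// lines are written in raster order, the other eight are read in 8x8-block
// order for the DCT/JPEG pipeline.
module line_pingpong_buf (
    input  wire        clk,
    input  wire        rst_n,
    // write side: raster-order pixels from the imager interface
    input  wire        wr_en,
    input  wire [7:0]  wr_col,        // column within the line (0..255)
    input  wire [2:0]  wr_line,       // line within the current half (0..7)
    input  wire [11:0] wr_pixel,
    input  wire        half_done,     // pulses after eight complete lines
    // read side: block-order pixels toward the encoder
    input  wire [7:0]  rd_col,
    input  wire [2:0]  rd_line,
    output reg  [11:0] rd_pixel
);
    reg [11:0] mem [0:4095];          // 16 lines x 256 pixels = 256x16 words
    reg        wr_half;               // which half is currently being written

    wire [11:0] wr_addr = {wr_half,  wr_line, wr_col};
    wire [11:0] rd_addr = {~wr_half, rd_line, rd_col};   // read the other half

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)          wr_half <= 1'b0;
        else if (half_done)  wr_half <= ~wr_half;        // swap halves every 8 lines
    end

    always @(posedge clk) begin
        if (wr_en) mem[wr_addr] <= wr_pixel;
        rd_pixel <= mem[rd_addr];
    end
endmodule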

While the general architecture of the encoder is meant to function as a pipeline, it is obvious that, due to the imager mode of operation, it is impossible to achieve a completely continuous operation. Instead, the data from the input memory buffer is read eight lines at a time, followed by a pause to wait for the next eight lines to be completely written into the input memory buffer. To comply with the pipelined nature of the encoder, an input enable has been added to all the registers in the design such that it is possible to freeze the module state at any point without loss of information. Every element in the encoder has a data output enable signal to facilitate synchronization between the different elements in the design. When valid watermarked data is output from the inverse DCT module, a data output enable for the entire encoder is turned on. Finally, after the watermark has been embedded and the watermarked DCT data has been inverse transformed back into spatial pixel data, an output memory buffer is used to reverse the 8x8 block order back to standard rolling shutter order. The data is written to the output memory buffer, which operates in a similar manner to the input memory buffer except in the reverse direction. Data is read out in chunks of eight lines at a time, with suitable synchronization signals generated for the transmission of the watermarked data to the frame grabber. Table 4.3 summarizes the resource utilization in the final implementation of the design. The encoding module and its interface make the main demand for hardware resources. The inclusion of the watermark-embedding module, including the RNG, adds only 113 logic cells (LCs), which is less than 0.9% of the LCs used by the encoding module.

The watermark-embedding module also utilizes 744 memory bits to store the DCT values of one block, about 1.1% of the memory used in the overall encoding module.

Table 4.3: Resource utilization by modules in the overall design

Module               Logic Cells   Registers   Memory Bits
Overall design       13471         8307        66280
Imager interface     198           103         0
CPU interface        282           154         0
Encoder + buffers    12573         7905        66280
WM embedder          113           78          744
RNG                  30            28          0

Figure 4.8: Sample output image from the physically implemented system: (a) output image w/ watermark, (b) output image w/o watermark

4.4.3 Output image capture

A National Instruments (NI) PCI-1428 digital frame grabber card is used to capture the image output from the evaluation board. NI LabVIEW© software is used for the analysis and presentation of the captured data. An example image, taken by the implemented system, is given in Figure 4.8.

The system offers the option to produce a watermarked image and a reference image that only goes through DCT and IDCT for the purpose of comparison.

CHAPTER 5: Conclusion

5.1 Thesis summary

This thesis has been conducted as part of an ongoing hardware watermarking I2I project. At the starting point of the thesis work, a prototype had already been designed. This prototype implemented an elementary watermarking algorithm in the spatial domain [3]. However, in order to provide a more commercially appealing implementation, it has been determined that a more sophisticated design must be realized. Watermark embedding in hardware introduces an opportunity to enhance the security of a real-time imaging system, but it is also a design challenge. The objective was to come up with a concept that would allow the addition of a secure watermark in hardware without resulting in performance degradation and/or increasing costs significantly. The watermark algorithm must incorporate security features but have low complexity. An extensive literature survey has been conducted to explore existing watermarking methods and applications. It was found that while much work has been done in the field of watermarking in software, watermarking in hardware was still an emerging field of research. Watermarking in the DCT domain was identified as having the potential to accommodate these requirements. In most existing watermarking algorithms in the DCT domain, the DCT transform and quantization are the most hardware intensive elements. Therefore, the addition of a watermark embedding module based on a low complexity algorithm would require very little overhead.

5. The result of this limited accuracy is a significant distortion of the image due only to the DCT and IDCT modules operation. The novel watermarking algorithm was presented in the IEEE ICECS 2008 conference along with our proposed RNG design technique [54]. The paper was submitted to the IEEE Transactions on Information Forensics and Security. Hardware synthesis and experimental results were described in a paper on the application of the proposed system to publicly spread surveillance cameras. Currently. An effort has been made to achieve better performance. it is possible to apply the proposed design with little to no additional cost. As illustrated in the proof of concept physical implementation. the design is expected to fit either discrete systems with separate chips for the imager and digital processing or systems integrating the complete design in an ASIC. tested and implemented. The findings of the extensive literature survey were presented in a paper that appeared in the 2008 IJ ITK [53]. The accomplishments achieved during the course of this thesis have contributed to the publication of four papers with the most recent results summarized in a fifth paper awaiting review. [55]. As DCT compression is widely used in most common imaging systems.53 A novel watermarking algorithm for the DCT domain was designed. the implementation of the algorithm resulted in a hardware increase of a mere 1%.2 Issues that still need attention and future work To allow a more in –depth view of the proposed system it is important to achieve a more reliable implementation of the DCT and IDCT hardware modules. the implemented modules have only limited accuracy. however to this point with no 53 . As expected.

satisfactory results. Because hardware implementation of the compression modules has only been approached as a subsidiary assignment in this thesis (these are readily available for purchase and hold no novelty), it was decided to make use of the imperfect modules in order to get basic initial results. Presentable results were obtained by applying input images with reduced quality. Hence it is still possible to demonstrate the functionality of the physically implemented system. After the DCT and IDCT modules have been improved, further testing can be performed. In particular, it is recommended that the detection performance be evaluated under real hardware conditions, as well as to examine multiple quantization tables and levels. Further analysis should consider the effects of data transmission over the communication channel, wired or wireless. As a semi-fragile watermark technique that addresses mainly tamper detection, robustness against potential attacks is mainly based on mathematical analysis of the nature of the algorithm. While this analysis is a significant indication of the algorithm's robustness, it may be useful to custom design a test bench with known attacks on semi-fragile watermarking for further validation.

5.3 Possible future directions for development

One of the important features of the proposed technology is its compatibility with a wide range of implementation platforms. Many real time applications use digital signal processing (DSP) dedicated processors for the implementation of digital processing algorithms. These processors are microcontrollers with powerful arithmetic logic units

(ALUs) specific for speeding up signal processing related computations. Open source modules are available for both still and video DCT based compression standards. It is expected that the proposed watermarking system can be accommodated in a DSP based platform while allowing it to maintain performance, making the main challenge the integration of the algorithm on the DSP platform along with the other existing components of the compression module. The proposed algorithm has been designed to be compatible with DCT based compression standards. At the prototype stage, work was concentrated on still imaging compression. However, it has also been determined that the algorithm is compatible with DCT based video compression standards, e.g. MJPEG, MPEG-x or H.26x. Future work will include the implementation of an imaging system employing DCT based video compression. A video standard based system will also provide a chance to introduce temporal security features in the watermark generation unit.

Ed.. C. and I. Canada Sep. Anderson and M.” in IEEE Trans. Nov. " CMOS imagers: from phototransduction to image processing". Potdar. May 2005. no. pp. pp. 5326-5329 ISO/IEC JTC1/SC2/WG10 Digital Compression and Coding of Continuous-Tone Still Images Draft. 1997. 9. on Circuits and Systems (ISCAS '05). 1062-1078 R. Kobe. Yeung. C. 1985. 10. 2005. G. Han. G. Image Process. 4. Jan. 2002." in Proc. “Image authenticity and integrity verification via content-based watermarks and a public key cryptosystem”. Eds. 1992. 0216-0220 F. and M. applications and evaluation of known watermarking algorithms with checkmark. Tirkel and C. Braudaway." IEEE Trans. 1374-1383. pp. Tsai and H. Hilton New York & Towers. Lecture Notes in Computer Science Berlin. 1998 K. G. of the IEEE International Conference on Multimedia and Expo.. 151. Y.” IEEE Proceedings of the Fourth Annual ACIS International Conference on Computer and Information Science. T. Springer-Verlag. Aug. US Patent 6 993 151. Ingemarsson. C. 6. T. B. Wolfgang and E. A. 5. 9–12. vol. 3313-3316. no. Tsai. "A new class of stream ciphers combining LFSR and FCSR architectures. 2001. 783-791. Image Proc. A.vol. Workshop. pp. Lou and T. vol. "A watermark for digital images. no. 5. Arnault. Li. "Watermark embedding and extracting method and embedding hardware structure used in image compression system". Vis. Bartolini and A. S. Wang. “Attacks applications and evaluation of known watermarking algorithms with Checkmark”. Acoustics. Li. Kuhn. 10. Proc." in IEE Proc. 3. M.11. Vancouver. “Why is image quality assessment so difficult?.. P. Computers. "Digital fragile watermarking scheme for authentication of JPEG images.716 Yadid-Pecht and R. Electronic Imaging. May. Jullien and O. 2004. "Attacks.0-76952296-3/05. Chang. P." In Advances in Cryptology ." in Proc. M.” in Proc. “Counterfeiting attacks on oblivious block-wise independent invisible watermarking schemes. Symp. 2005. Petitcolas. Berlin: Springer-Verlag. 86–90. Beth. S. F. Conf.” in Proc. Lu. 209. H. number 2551 in Lecture Notes in Computer Science. vol. pp. F. pp. vol. Arnault and T.. pp. Mohanty. “Effective and ineffective digital watermarks. pp 22-33." in Proc. C. van Schyndel. Berger and A. Osborne. IEEE Int. 709. pp. NY. F. Mohanty. 1525. Meerwald and S. vol. J. N.csee. pp. 3. 1996. Y. Y. D. P. Berger.August 2. Conf. Petitcolas. Germany: SpringerVerlag.” Information Hiding: 2nd Int. Nelson. URL: http://www. Necer. vol. Austin. 694-697. . 432–411. A. T." IEEE Trans. R. Memon. Nakamura and K.. vol. IEEE Int. Barni. 2002. "Design and properties of a new pseudorandom generator based on a filtered FCSR automaton." in Advances in Cryptology: Proceedings of Eurocrypt ‘84. 2006. Delp. C. Conf. Holliman and N. SPIE Electron. Lausanne. III. G. 3rd IEEE International Conference on Industrial Informatics (INDIN '05). Speech and Sig.INDOCRYPT 2002. vol. Images Processing. 2000. G. IEEE Int. 460-466. P. "Pseudo-random properties of cascade connections of clock-controlled shift registers. Matsui. Bovik and L. IEEE Int. pp. M. 219–222. “Attacks on copyright marking systems. P. 1994. Proc. R. "A DCT Domain Visible Watermarking Technique for Images". Pereira.usf.Aucsmith. D. "Improved Wavelet-Based Watermarking Through Pixel-Wise Masking. et al. on Image Processing.pdf F. Image Signal Processing. Piva.A Survey" Proc. 2002. Conf. T. IEEE Int. "CMOS image sensor with watermarking capabilities". 93-98. International Standard 10918-1. TX. E. A. in Proc. F. 56 .2000. Conf. H. Pereira. Switzerland. Z. 
"A survey of digital image watermarking techniques". Lu. D. of SPIE. Yadid-Pecht. P. Etienne-Cummings. Nov. "Information Hiding . 2. S. R. 54. v4675 F. pp.edu/~smohanty/research/Reports/WMSurvey1999Mohanty. Mintzer. "Embedding Secret Information into a Dithered Multi-level Image" IEEE Military Communications Conference 1990 pp. Imaging. Kuhn. no. Chen. vol.2005. IEEE Int. pp. Kluwer Academic Publishers G. Japan. Proc. ser. USA. M. Z. Meerwald S. Lecture Notes in Computer Science. IEEE 87(7) Jul. pp. Anderson.. J. T. Gollmann. Image Processing. New York City. Tanaka. J. 2000. vol. and M. "Digital Watermarking: A Tutorial Review". "A digital watermark. 1999 pp. R. “Wavelet Transform Based Digital Watermarking for Image Authentication. Image Process. July 30. J.56 References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] V. Security and Watermarking of Multimedia Contents IV. Cot. 218-238.

and Its Applications (Vol. J. Y. "A fair benchmark for image watermarking systems. "Real-time Video Watermarking on Programmable Graphics Hardware. Conf. S. Tsai. pp. T." in Proc. Sys. S. Mohanty. J. "Towards Real-time Video Watermarking for SystemsOn-Chip. P. pp. "Secret and public key authentication watermarking schemes that resist vector quantization attack. Video Compression Using DCT. “Attacks on copyright marking systems. pp. Proc. Available: http://www. C. A. Imaging. G. pp. pp. pp. S. Petitcolas." in Proc. Eng. vol.” in Information [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] Hiding: 2nd Int. Talstra. A. “Digital Watamarking for DVD Video Copyright Protection. Barreto. "VLSI Implementation of a Real-Time Video Watermark Embedder and Detector. 3971. Nicolai. 216–217. Multimedia and Expo (Vol. 339-358. 88–93. G. P. 292–295. C. .. G. 2000. M. R. Xilinx. CA: IS&T and SPIE. P. Hawkes. S. 2. 1st Workshop on Embedded Sys. Mohanty. L. Maes. "An Implementation of Configurable Digital Watermarking Systems in MPEG Video Encoder. Mathai. Electron. G. 57 .” in Proc. Circuits and Systems. 3657." in Proc. N. 2005. "Security Issue and Collusion Attacks in Video Watermarking. pp. Rey. pp." in Proc. 2).” Elsevier IJ Comp. Rijmen. G. Opt. Sig. Mahapatra.L. Lu. P.com/support/documentation/topicaudiovideoimageprocess_compression. M.” in Proc. Wong and N. K.57 [25] F. France. 597–600. J. 2005. J. Workshop. 1996. & Dig. S. Computer and Communications Security. complementarity and complexity of 2-adic FCSR combiner generators” In Proceedings of the ACM Symposium on Information. Y. Haitsma.. Strycker. H. Symp. N. 2002. 303–304. “Real-Time Blind Watermarking Algorithm and its Hardware Implementation for Motion JPEG2000 Image Codec. SPIE Int. M. Lecture Notes in Computer Science. A. in proc. Hsiao. 1998. 2001. IEEE Int. V. Circ. ser. V. Jullien and O. pp. Consumer Electron. Workshop on Intelligent Sig. M. S. Linnartz. 1). N. 772–775. 2004." New York: Wiley. "Further attack on Yeung-Mintzer watermarking scheme. Opt. P.Aucsmith. A.). IEEE Int. Inc. Ranganathan. Maes.” IET Comp. vol. 17 (5) (2000) 47–57. “A Systems Level Design for Embedded Watermark Technique using DSC Systems. L. Ranganathan and R. J. 53 (5) (2006) 394– 398. G. 11th Int. E. Shoshan.Y. Yadid-Pecht. no. A. M. R. Tomii. H. Techniques (CDT) 1 (5) (2007) 600–611. pp. Niranjan. Workshop. pp.. 2003. Yamauchi. “A Simplified Approach for Designing Secure Random Number Generators in HW”. pp. no. J. 2). P. Eng. 57-62. Li. ASIACCS ’06. Kutter and F. 417-427. vol.. 2009." in Proc. (2008." in Proc. "An Implementation of a Real-time Digital Watermarking Process for Broadcast Monitoring on a Trimedia VLIW Processor. and Comp. Memon. J. Seo. 46 (3) (2000) 628–636. pp. and Elect. Satyanarayana. G. 2nd Ed. D. H. IEEE Workshop Signal Processing Systems. [online]. D. M. Murugesh. 2005. G. Goljan and N.” IEEE Transactions on Circuits and Systems II (TCAS-II). for Real-Time Multimedia. P. Sheikholeslami. 1525. 218-238. “A Dual Voltage-Frequency VLSI Chip for Image Watermarking in DCT Domain. 1999. 148. and Comm. F. Vision Image Signal Process. Y. Termont. Chang. Eng. Image Proc. Mohanty. S. A. Sept 2008. Proc. Consumer Electron.xilinx. vol. Kalker. vol. Haitsma. Canadian Conf. Balakrishnan.. Y. P. Ramanan. IEE Int. 2005 B. U. P. Malta." Ph. “Hardware assisted watermarking for multimedia.” IEEE Sig. Gabriele. on Electronics. Germany: Springer-Verlag.D. Eng. 
Universite de Nice Sophia-Antipolis. pp. V. Doerr. 2. Tsai. Kougianos. T. “ASIC for Digital Color Image Watermarking. 2002. K. dissertation. H. Fridrich. "VLSI implementation of invisible digital watermarking algorithms towards the developement of a secure JPEG encoder. 1312–1315. Kundur. 35." in Proc. K.htm S. Mag." in Proc. vol. 2003. Proc. G. Brunton. 775–779. 11th IEEE Dig. D. Kim. Dugelay. 2003. E. Vural. N. H. 2006 Handbook of Image and Video Processing By Alan Conrad Bovik Published by Academic Press. Taiwan. Y.. Taipei. "Video Watermarking For Digital Cinema Contents. Wu. San Jose. G. “VLSI Design of an Efficient Embedded Zerotree Wavelet Coder with Function of Digital Watermarking. "Toward secure publickey blockwise fragile authentication watermarking. Conf.W. IEEE Int. IEEE Int. Ed. Depovere. S. Kalker. "Applied Cryptography. Apr. 13th European Sig. Zhao. Garimella. Conf." in Proc. J. J. C." in Proc. Ranganathan. on Elec. Conf. A." in IEE Proc. Namballa. 428-437. W. Kim and V. Schneier. X. “Periodicity. Anand. H. (Vol. Soc. H. IEEE Int. 2003. Conf. Berlin. J. Petitjean. Mohanty. T. A. C. Tai. Kuhn.S. Memon. 1999. C. D. Proc. Anderson and M. Kougianos.M. and Sys. J..” IEEE Trans. Depovere. F. P. 183-188. “VLSI architecture and chip for combined invisible robust and fragile watermarking. N. Fish. Soc. Vandewege.” in Proc. P. Symp. Petitcolas. 3971. SPIE Int. 2000.

I. pp. and O. "VLSI Watermark Implemetations and Applications. Shoshan.” in Sensors and Actuators." in IJ Information and Knowledge Technologies. Silva and S. Jullien. Fish. 31(8). 2008.2006. 134. pp. X. Li.002 S. Y. Malta. 2008. X.. vol. Jullien and O.” in Proc. A.. G. 372-375. 2008.1016/j.doi. G. Y. A. Hamami. Elec. “CMOS Image Sensor Employing 3. A. Microprocessors and Microsystems [online]. 58 . “A simplified approach for designing secure Random Number Generators in HW. V. 368-371.” in Proc. 119-125.2. Shoshan. 1. 2007. pp. pp. Yadid-Pecht. Available: http://dx. A. Fish. February). no.. A: Physical. S. “Hardware Implementation of a DCT Watermark for CMOS Image Sensors. and Syst. A. Li. Vol. Bampi (2006. A. Conf. Fleshel and O. Circ and Syst. Yadid-Pecht. Yadid-Pecht. 487-497. Y.02. Agostini.micpro.V 12 bit 6. Fish. Malta. Shoshan. L. Yadid-Pecht. Conf. IEEE Int.3 MS/s Pipelines ADC..58 [51] L.org/10. Circ. Multiplierless and fully pipelined JPEG compression [52] [53] [54] [55] soft IP targeting FPGAs.3. Elec. IEEE Int. Jullien and O. G.

'NumberTitle'.['WM Image'].[27 12].['Detection Zones'].[27 13].[6 3]).nc).[27 7].'off') imshow(LOC). I3 = copyblock(I3.'NumberTitle'. I3 = copyblock(I3.[27 10].[9 26]. imshow(uint8(I)).['Tampered Image'].'off') %watermarked image imshow(uint8(I2)). 59 . [C LOC] = detect_hardware(I3.[27 18]. Simulation Envelope NEW_HW_Sim.[6 3]). I21 = dct2im(J.[7 26].A.32). J2 = im2dct(I. I3 = copyblock(I3.'off') %original after % decompression imshow(uint8(I21)).59 APPENDIX A: A. q = 4.[27 16].1.ma MATLAB CODE % This code simulates the proposed watermarking system nc = 2.[27 13].se).q). LOC = reshape(LOC.q.[27 4].['Compressed Image']. % % LOC = imclose(LOC.[27 13].'NumberTitle'. %number of coefficients marked I = double(imread('testimage.nc). %optional closing stage to enhance detection figure('Name'.[2 4]).32.q.[6 3]).A. I2 = dct2im(J.1. %% simulate a cover-up attack I3 = copyblock(I2. %load saved wm data sequence [J H] = embed_hardware(I. I3 = copyblock(I3.[27 1].1).[6 2]). se = strel('diamond'. Simulation Testbench and peripherals A. figure('Name'. figure('Name'.q).1.H.'NumberTitle'. figure('Name'.[6 1]). % quantization factor % Step 2 load WM_data. I3 = copyblock(I3.'off') imshow(uint8(I3)).bmp')).q).

Iout(mht:mlt. Iout = Iin.nlt:nrt) = Iin(mhs:mls. nlt = (Bt(2)-1)*8 + 1. and rms is the root mean square difference between two images. nls = (Bs(2)-1)*8 + 1.60 PSNR(I/255. which measure the ratio of the peak signal and the difference between two images. CompRatio = mean(sum(J ~= 0))/64 end copyblock. nrs = (Bs(2)+Bsize(2)-1)*8. The PSNR is given in decibel units (dB). mls = (Bs(1)+Bsize(1)-1)*8.nls:nrs).double(uint8(I2))/255).m % this function simulates the cover up attack function Iout = copyblock(Iin.Bsize) % calculate coordinates mhs = (Bs(1)-1)*8 + 1. 60 . nrt = (Bt(2)+Bsize(2)-1)*8.Bs.m function PSNR(A.B) % % % % % % if A == B error('Images are identical: PSNR has infinite value') end max2_A = max(max(A)).Bt. mht = (Bt(1)-1)*8 + 1. mlt = (Bt(1)+Bsize(1)-1)*8. PSNR = 20 * log10(b/rms) where b is the largest possible value of the signal (typically 255 or 1). PSNR.

2f dB'.nargin)).decibels)) A. disp(sprintf('PSNR = +%5.^2))))). if max2_A > 1 | max2_B > 1 | min2_A < 0 | min2_B < 0 error('input matrices must have values in the interval [0.1]') end err = A .61 max2_B = max(max(B)).2.quality) % im2dct receives an image x and a quality factor quality as inputs % the output is the 8x8 DCT transformed image % the coefficients are reorganized in the zigzag order and quantized % with a pre-set quantization table error(nargchk(1. decibels = 20*log10(1/(sqrt(mean(mean(err. min2_A = min(min(A)).m function y=im2dct(x . % quantization table Q=[1 1 1 1 1 1 2 1 1 1 1 1 2 3 1 1 1 1 3 3 4 1 1 1 1 4 3 4 1 1 2 2 4 4 5 2 3 3 4 6 5 1 3 3 3 4 6 1 1 3 3 3 3 1 1 1 61 . min2_B = min(min(B)).2. end % Compute DCTs of 8x8 blocks and quantize the coefficients. quality=1. if nargin<2.B. Compression/Decompression im2dct.

if nargin<2.Q). %reorder column elements in zigzag format 62 .fun).[8 8]. : ). end % quantization table Q=[1 1 1 1 1 1 1 1 1 1 1 2 2 3 5 5 1 1 1 1 3 3 4 5 1 1 1 1 4 3 4 5 1 1 2 2 4 4 5 1 2 3 3 4 6 5 1 1 3 3 3 4 6 1 1 1 3 3 3 3 1 1 1 1] * quality. 'distinct'). dct2im.2. %perform 8x8 block dct J = blkproc(J./P1)'. fun = @dct2. %organize 8x8 blocks in 1x64 columns J = J(order. [8 8]. J = blkproc(x. [8 8]. %quantize the result %reorder the coefficients with the zigzag pattern J = im2col(J.quality) % dct2im receives the transformed and quantized image J and % performs inverse quantization and transform to output the % reconstructed image error(nargchk(1. order = [1 9 2 3 10 17 25 18 11 4 5 12 19 26 33 41 34 27 20 13 6 7 14 21 28 35 42 49 57 50 43 36 29 22 15 8 16 23 30 37 44 51 58 59 52 45 38 31 24 32 39 46 53 60 61 54 47 40 48 55 62 63 56 64].'round(x.nargin)).62 5 5 5 5 1 1 1 1] * quality.m function y=dct2im(J . y = J. quality=1.

Embedding embed_hardware. Jfun(:.3.fun). zigzag NumOfC = nc. S = zeros(NumOfC. % arrange 64 % columns into 8x8 blocks J = blkproc(J. [8 8]. Neg = J < 0. : ). Jfun(:. %de-quantize the % coefficients fun = @idct2.fun). H = H . M = zeros(NumOfC. quantization.Q).'distinct').[8 8]. % zig-zag format % reorder column elements back from J = col2im(J. I = blkproc(J. fun = @findh.[8 8]. A. %dct.63 inv_order = [1 3 4 10 11 21 22 36 2 5 9 12 20 23 35 37 6 8 13 19 24 34 38 49 7 14 18 25 33 39 48 50 15 17 26 32 40 47 51 58 16 27 31 41 46 52 57 59 28 30 42 45 53 56 60 63 29 43 44 54 55 61 62 64]. Algorithm Implementation A.q).nc) J = im2dct(I.3.m function [y H] = embed_hardware(I.1:end-1).2:end)).1.[64 1]. % find the location of two higest indexes of non-zero cells H = blkproc(Jfun.'x.A. %inverse transform y = I.q.*P1'.60/NumOfC).60/NumOfC).1. 63 .J(:.2:end) = and(J(:.1) = 1.[256 256]. J = J(inv_order. % perform logical and between adjacent blocks Jfun = J.

b)).64 P = zeros(1. end if x == 0 x = 2. %embedding sequence for m = 2:32 b = m.:)).1.b)) ~= mod(sum(S(s.b).b) + 1. else x = x + 1.b).b) = toggle2(J(H(s.1)*32.b).:) = P(Sm+s).*A(1:H(NumOfC. P(1:H(NumOfC.b). for n = 2:32 b = m + (n . for s = 1:NumOfC S(s. if CheckLSB(J(H(s. 64 . end % LSB check end % s loop end end y = J.m function y=CheckLSB(x) C = bitget(uint16(abs(x)).b)) = Jfun(1:H(NumOfC. Sm = 0:NumOfC:60-NumOfC.60).1:16).2) J(H(s.b) + 1.2) x = x . P(1:end) = 0. y = C(1).m function j = toggle2(x) if mod(x. toggle2.b) + 1. CheckLSB.

if count == (n + 1) return end end end A.J(:.[64 1]. count = count + 1. Jfun(:.2:end) = and(J(:.1) = 1. % set the number of marked coefficients NumOfC = nc. findh.A. H = H .q.h'.3.2. count = 1.fun). Jfun = J.m % this function is responsible for finding the two highest indexes of % non-zero cells in the block function h = findh(J) n = 2. h = 0:n-1.q).1:end-1). Detection Detect_hardware. h = n . H = blkproc(Jfun.nc) J = im2dct(I. fun = @findh. 65 .H.65 end j = x.2:end)).1.m function [C LOC] = detect_hardware(I. for i = 1:64 if J(65-i) h(count) = 65-i. Jfun(:.

b). LOC = sum(HD)>0. for m = 2:32 b = m. S = zeros(NumOfC.sum(sum(HD))/4096. P(1:H(NumOfC.b)) ~= mod(sum(S(s.:)).b).66 HD = zeros(size(H)). for s = 1:NumOfC S(s.b)) = Jfun(1:H(NumOfC.*A(1:H(NumOfC.:) = P(Sm+s).1)*32. P(1:end) = 0. HD(s. M = zeros(NumOfC.60).2). end % s loop end end C = 1 .60/NumOfC).b).60/NumOfC).b)+1. 66 .b). P = zeros(1.b) = CheckLSB(J(H(s. Sm = 0:NumOfC:60-NumOfC. for n = 2:32 b = m + (n .

SRAM_ZZ. // target interfce input [11:0] TARGET_OUT. inout [7:0] CPU_AD. SADC_ARD. SDAC_RST. PI_CLK. FADC_BOE. SADC_BRDY. CPU_WR. SADC_CLK. SHS. CPU_ALE. COL. output LED.v //--------------------------------------------------------------------// Description : This is the top level module for the integrated system //--------------------------------------------------------------------`timescale 1ns / 100ps module littel (CLK. CPU_RD. TARGET_OUT. fg_pen. // cpu interface input CPU_RD. aps_ext_in. output [15:0] aps_ext_in. GEN1. ExtCtr. SRAM_CS. ExtBias. SADC_AOTR. output SHS. fg_len. GEN2. CPU_AD. input [7:0] CPU_A. // globals input CLK. Rset. RowDecColDec. FADC_BDV. input CPU_ALE. input CPU_WR. SADC_BRD. FADC_AOV. SHR. SRAM_RD.fg_dout). Cset. PI_IN. FADC_A. ROW. SADC_BOTR. SADC_DOUT. D. CntExt_. fg_fen . SRAM_WR. FADC_BOV. RRst. RST. PixelRst. SADC_APD. SRAM_D. ExtCtrPipe. // debug output [7:0] COL. PI_EN.67 APPENDIX B: VERILOG CODE B. ROW. RST. FADC_CLK. TARGET_CLK. 67 . output TARGET_CLK. SADC_ARDY. CPU_A. FADC_ADV. SADC_BPD. Top Level and Peripheral Modules little_top. LVDS_PD. LED. SRAM_A.1. FADC_AOE.

output FADC_BOE. output PixelRst. output SRAM_RD. output ExtCtrPipe. // waveform generator output [15:0] GEN1. output ExtCtr. FADC_AOV. SADC_ARDY. SRAM_ZZ. // fast AD convertor output FADC_CLK. SADC_BPD. output SADC_AOTR. output RowDecColDec. output FADC_AOE. SRAM_WR. output SADC_BOTR. GEN2. output [17:0] SADC_DOUT. input [12:1] FADC_A. output Rset. output [38:23] D. input FADC_ADV. output SDAC_RST. SADC_APD. inout [17:0] SRAM_D. 68 . input FADC_BDV. output SHR. SADC_ARD. SADC_BRD. output CntExt_. output ExtBias.FADC_BOV. output Cset. // slow AD convertor output SADC_CLK. // ZBT sram intrerface output [19:0] SRAM_A. // frame grabber (LVDS camera link interface) output LVDS_PD. output [1:0] SRAM_CS. SADC_BRDY.68 output RRst.

wire [7:0] aps_time_ref. wire [7:0] aps_fra_dat. output fg_fen. // PI interface output PI_EN. wire [20:0] cpu_add. wire [15:0] CPU_GEN1. wire [1:0] select_zbt_user. wire APS_TARGET_CLK. wire [7:0] wm_fra_out. wire pic_module_active. wire aps_pen. output fg_len. wire [11:0] aps_dout. wire aps_len. wire clk_pulse. input PI_CLK. wire [7:0] aps_controls. wire [17:0] zbt_dout. //top level buses wire cpu_zbt_rd_n. output [11:0] fg_dout. wire [11:0] user_data_fg. wire cpu_zbt_we_n. input [14:0] PI_IN. wire aps_fen. wire zbt_dout_v_n.69 output fg_pen. wire [7:0] aps_clk_counter. wire [17:0] cpu_din. /////////////////////////////////////begin////////////////////////////// //////// led led_inst 69 .

. .sram_zz(SRAM_ZZ).cpu_rd_n(cpu_zbt_rd_n).CLK(CLK). .gen2value().pic_module_active(pic_module_active).zbt_dout_v_n(zbt_dout_v_n) ).select_zbt_user(select_zbt_user). . . .sram_wr_n(SRAM_WR).sram2_ce1_n(SRAM_CS[1]).70 ( . . . cpu_int cpu_int ( .cpu_add(cpu_add). 70 . .zbt_dout(zbt_dout). . . . .aps_controls(aps_controls). .sram_oe_n(SRAM_RD).sram1_ce1_n(SRAM_CS[0]).ale_n(CPU_ALE). . .led_out(LED) ).cpu_add(cpu_add). . . zbt_mux U2 ( . .cpu_din(cpu_din). . . . .gen1value(CPU_GEN1).rst(RST).rst(RST).aps_time_ref(aps_time_ref). .cpu_we_n(cpu_zbt_we_n).sram_add(SRAM_A).CLK(CLK).sram_data(SRAM_D).clk(CLK).

pic_module_active(pic_module_active). . .user_data_fg(user_data_fg) ). .fg_len(aps_len).cpu_zbt_rd_n(cpu_zbt_rd_n).ExtCtrPipe(ExtCtrPipe).zbt_dout_v_n(zbt_dout_v_n). .Cset(Cset).RowDecColDec(RowDecColDec). .ExtCtr(ExtCtr). . . .71 . .zbt_dout_v_n(zbt_dout_v_n).SHR(SHR). .cpu_rd_n(CPU_RD). .Rset(Rset).cpu_wr_n(CPU_WR).rst(RST).rst(RST). . .SHS(SHS). .fg_pen(aps_pen).PixelRst(PixelRst). .aps_ext_in(aps_ext_in).target_out(TARGET_OUT). .port2(CPU_A).zbt_dout(zbt_dout). . .cpu_din(cpu_din). . .zbt_dout(zbt_dout).CntExt_(CntExt_). .aps_controls(aps_controls). . . .select_zbt_user(select_zbt_user). 71 . . . .ExtBias(ExtBias).cpu_zbt_we_n(cpu_zbt_we_n). . . .CLK(CLK).port0(CPU_AD). .RRst(RRst). aps_int aps_int ( .

.fg_fen(fg_fen).aps_fen(aps_fen).rst(!RST). . .72 . . . .aps_clk_pulse(aps_clk_pulse).clk(CLK). .adc_dv(FADC_ADV).fg_dout(fg_dout) ). .fg_len(fg_len).aps_clk_counter(aps_clk_counter). assign FADC_AOE = 1'b0. .aps_clk(APS_TARGET_CLK). . assign LVDS_PD = 1'b1.user_data_fg(user_data_fg) ). assign GEN1 = 16'h7FFF. assign TARGET_CLK = aps_controls[7] ? aps_controls[2] : APS_TARGET_CLK.aps_dout(aps_dout).aps_clk_counter(aps_clk_counter). .aps_pen(aps_pen).target_clk(APS_TARGET_CLK).time_ref(aps_time_ref). . endmodule 72 .fg_pen(fg_pen). . wm_int wm_int ( . .fg_fen(aps_fen). . . .aps_clk_pulse(aps_clk_pulse). .fg_dout(aps_dout).aps_len(aps_len). . assign GEN2 = 16'h7FFF.

input ale_n .port0 .cpu_wr_n .73 cpu_int. 73 .pic_module_active . // zbt interface output [1:0] select_zbt_user . It is responsible for communications with an external PC // It contains registers which can be written by the onboard CPU `timescale 1ns / 100ps module cpu_int ( zbt_dout_v_n . input CLK .aps_controls .cpu_add . `include "chip_def.user_data_fg . output cpu_zbt_we_n .aps_time_ref . // cpu bus input cpu_rd_n . output cpu_zbt_rd_n . input [7:0] port2 . input zbt_dout_v_n .gen2value). output [20:0] cpu_add .cpu_zbt_we_n .rst .v // Title : cpu_int // Description : This is the cpu interface for the TestBoard Altera // chip. inout [7:0] port0 . // gen interface output [15:0] gen1value.cpu_rd_n . gen2value. output [17:0] cpu_din . input [17:0] zbt_dout .cpu_din .ale_n .gen1value .cpu_zbt_rd_n .CLK .zbt_dout . input cpu_wr_n .port2 .v" input rst .select_zbt_user .

wire altera_busy. reg cpu_we_d. reg cpu_zbt_we_n. reg inst_go. reg [4:0] ADD. reg [31:0] Add. always @(negedge ale_n) ADD_l = {port2.74 //aps_int output pic_module_active. // regs reg [15:0] ADD_l. reg [15:0] Gen1_reg. reg [7:0] Aps_time_ref. data_out <= 0.port0}. reg [3:0] Cmd. Aps_controls. output [7:0] aps_controls. Gen2_reg. reg [7:0] data_in_s. output [11:0] user_data_fg. reg [2:0] Mux_control. reg [11:0] user_data_fg. Dat. ///////////////////////////////////////////////////////////////// // Chip registers ///////////////////////////////////////////////////////////////// always @(posedge CLK or negedge rst) begin if (~rst) begin data_in_s <= 0. output [7:0] aps_time_ref. cpu_we_ddd. cpu_we_dd <= 1. cpu_we_d <= 1. // sample latch 74 . cpu_zbt_rd_n . cpu_we_dd. // simple async //latch for address always @(posedge CLK) ADD = ADD_l[4:0]. data_out.

if (~zbt_dout_v_n) begin Dat[17:0] <= zbt_dout. Aps_time_ref <= 0. Gen2_reg <= 0. inst_go <= 1. end 75 . Mux_control <= 0. cpu_we_d <= cpu_wr_n. inst_go <= 0. Dat <= 0. Dat[31:18] <= 0. end // write if ((cpu_we_ddd == 1) && (cpu_we_dd == 0)) // transition to low case (ADD) CMD_REG_ADD : begin Cmd <= data_in_s[3:0].75 cpu_we_ddd <= 1. cpu_we_ddd <= cpu_we_dd. Cmd <= 0. cpu_we_dd <= cpu_we_d. end else begin // defaults inst_go <= 0. Gen1_reg <= 0. // sampled signals data_in_s <= port0. Add <= 0. user_data_fg <=0. Aps_controls <= 0.

MUXC_REG_ADD : data_out <={5'h0.Mux_control}. ADD3_REG_ADD : data_out <=Add[31:24]. ADD1_REG_ADD : data_out <=Add[15:8]. <= data_in_s. <= data_in_s. APS_RATE_ADD : data_out <=Aps_time_ref. <= data_in_s. <= data_in_s. but by // the time the cpu will get to read data_out it sould be stable case (ADD) TEST_REG_ADD : data_out <= 8'h5a. ADD2_REG_ADD : data_out <=Add[23:16]. default endcase end end : data_out <= 0.altera_busy}. <= data_in_s. // RO CMD_REG_ADD : data_out <= {4'b0. DAT2_REG_ADD : data_out <=Dat[23:16]. 76 .Cmd}. ALTERA_BUSY_ADD : data_out <={7'h0. APS_CONTROLS_ADD : data_out <=Aps_controls. DAT0_REG_ADD : data_out <=Dat[7:0]. <= data_in_s.76 ADD3_REG_ADD : Add[31:24] ADD2_REG_ADD : Add[23:16] ADD1_REG_ADD : Add[15:8] ADD0_REG_ADD : Add[7:0] DAT3_REG_ADD : Dat[31:24] DAT2_REG_ADD : Dat[23:16] DAT1_REG_ADD : Dat[15:8] DAT0_REG_ADD : Dat[7:0] MUXC_REG_ADD : Mux_control APS_RATE_ADD : Aps_time_ref APS_CONTROLS_ADD : Aps_controls endcase <= data_in_s. ADD0_REG_ADD : data_out <=Add[7:0]. <= data_in_s. DAT1_REG_ADD : data_out <=Dat[15:8]. <= data_in_s. <= data_in_s[2:0]. DAT3_REG_ADD : data_out <=Dat[31:24]. <= data_in_s. // read mux // note : ADD is a synchronous signal sampling an async latch.

// if active if (~inst_running_n) case (Cmd[3:0]) 77 . // connect reg outputs assign select_zbt_user = Mux_control[1:0]. assign aps_controls = Aps_controls. assign aps_time_ref = Aps_time_ref. assign port0 = (~cpu_rd_n) ? data_out : 8'hZZ. cpu_zbt_rd_n <= 1. always @(posedge CLK or negedge rst) begin if (~rst) begin inst_running_n <= 1. cpu_zbt_we_n <= 1. end else begin // signals default values cpu_zbt_we_n <= 1. cpu_zbt_rd_n <= 1. pic_module_active <= 0. reg [1:0] stage. // NOTE : TBUF is open in every read cycle. reg pic_module_active.77 // CPU output tbuf. if (inst_go) inst_running_n <= 0. stage <= 0. //////////////////////////////////////////////////////////////// // Command State Machine ///////////////////////////////////////////////////////////////// reg inst_running_n.

end 1 : begin if (zbt_dout_v_n==0) begin inst_running_n <= 1. end end endcase CMD_SRAM_WRITE_REG : // ZBT write begin cpu_zbt_we_n <= 0. 78 . stage <= 1. end CMD_PIC_STOP : begin pic_module_active <= 0. stage <= 0. inst_running_n <= 1. inst_running_n <= 1. inst_running_n <= 1. end default : // return to inactive state inst_running_n <= 1. end CMD_PIC_GO : begin pic_module_active <= 1.78 CMD_SRAM_READ_REG : // ZBT read case (stage) 0: begin cpu_zbt_rd_n <= 0.

5'ha . = 4'h9 . = 4'hc . 5'h3 . 5'h4 . 5'h5 . 5'h9 . = 4'hb .v // altera def file // reg address parameter TEST_REG_ADD parameter CMD_REG_ADD parameter MUXC_REG_ADD parameter ADD3_REG_ADD parameter ADD2_REG_ADD parameter ADD1_REG_ADD parameter ADD0_REG_ADD parameter DAT3_REG_ADD parameter DAT2_REG_ADD parameter DAT1_REG_ADD parameter DAT0_REG_ADD parameter APS_RATE_ADD = = = = = = = = = = = = 5'h0 . endmodule chip_def. assign cpu_din = Dat[17:0]. parameter APS_CONTROLS_ADD = parameter ALTERA_BUSY_ADD = // command coding parameter CMD_SRAM_READ_REG parameter CMD_SRAM_WRITE_REG parameter CMD_PIC_GO parameter CMD_PIC_STOP = 4'h8 . 5'h16. 5'h14. assign altera_busy = ~inst_running_n. 5'h7 . 5'h8 .79 endcase end end // ZBT assign cpu_add = Add[20:0]. 79 . 5'h15. 5'h6 . 5'h2 . 5'h1 .

aps_clk_pulse . target_out.PixelRst . output aps_clk_pulse. input pic_module_active.aps_controls. adc_dv). output [7:0] aps_clk_counter. fg_len.aps_clk_counter .ExtCtr . input [7:0] time_ref. B. user_data_fg. //zbt_mux interface input [17:0] zbt_dout. //globlas input CLK. zbt_dout_v_n. zbt_dout. CntExt_. 80 .80 // SRAM SOURCE MUX states parameter SRAM_MUX_CPU = 2'b00 .RRst . output RRst.ExtBias . time_ref .2. Rset. fg_fen. pic_module_active. output [15:0] aps_ext_in.ExtCtrPipe . // cpu_int interface input [7:0] aps_controls. //aps interface input [11:0] target_out.Cset . output SHS. CMOS Imager Control Logic and Interface aps_int. aps_ext_in. input rst. input zbt_dout_v_n.v // Description: This module functions as a top level for the imager // control logic `timescale 1ns / 100ps module aps_int (CLK.target_clk . output ExtCtrPipe. output target_clk. RowDecColDec. fg_pen. rst. SHR. SHS. fg_dout.

output [11:0] fg_dout. wire [15:0] cdsIn_level. wire APS_ExtCtrPipe.clk_pulse(aps_clk_pulse).81 output SHR. wire [11:0] fg_dout_board. clk_gen U1 ( . output ExtCtr. output CntExt_. . . . wire [7:0] aps_clk_counter. .clk_counter(aps_clk_counter) 81 . output fg_len. output ExtBias. //fadc int input adc_dv.target_clk(target_clk). output fg_fen. output PixelRst. wire aps_clk_pulse. output RowDecColDec.rst(rst). wire target_clk.CLK(CLK). // frame grabber output fg_pen. wire APS_RRst. output Rset. wire APS_ExtCtr_in. wire APS_SHS. wire APS_RowDecColDec. wire APS_ExtCtr_out. output Cset.time_ref(time_ref). .

= aps_controls[6] ? 1'b1 : APS_ExtCtrPipe.module_active(pic_module_active).aps_ext_ctr(APS_ExtCtr_out). = aps_controls[1]. . . endmodule 82 .aps_shs(APS_SHS). assign CntExt_ assign Cset assign ExtBias assign Rset assign ExtCtrPipe assign RowDecColDec assign SHS assign RRst assign ExtCtr = ~aps_controls[0]. assign APS_ExtCtr_in = aps_controls[4].aps_ext_in(aps_ext_in). .aps_shr(SHR). assign APS_ExtCtrPipe = 0. .fg_pen(fg_pen).ext_in_en(CntExt_). pic_extract U4 ( .aps_clk(target_clk). = APS_ExtCtr_out. .aps_rowdeccoldec(APS_RowDecColDec). . . .82 ). . = aps_controls[3]. = aps_controls[5]. .fg_fen(fg_fen). . .rst(rst).aps_out(target_out). . .clk(CLK). = ExtCtr ? 1'b1 : APS_RowDecColDec.fg_len(fg_len). .ext_ctr_en(APS_ExtCtr_in). . .aps_rst(PixelRst). . = APS_RRst.aps_clk_pulse(aps_clk_pulse).fg_dout(fg_dout).aps_clk_counter(aps_clk_counter) ).aps_rst_cnt(APS_RRst). . = APS_SHS.

pic_extract.v
// this is the core for control signal generation for the CMOS image
// sensor. The second structure handles reception of pixel data from
// the imager and generation of synchronization signals to an external
// frame grabber
`timescale 1 ns / 100 ps
module pic_extract (clk, rst, module_active, aps_clk_pulse, aps_clk_counter,
                    aps_clk, aps_out, ext_ctr_en, ext_in_en,
                    aps_shs, aps_rst, aps_shr, aps_rowdeccoldec,
                    aps_ext_ctr, aps_rst_cnt, aps_ext_in,
                    fg_pen, fg_len, fg_fen, fg_dout);

/*
   Input/Output Declarations
*/
input clk;                     // main clock
input rst;                     // main reset
// controls
input module_active;           // running while '1'
input aps_clk_pulse;           // sync to APS clk
input [7:0] aps_clk_counter;
input aps_clk;
input [11:0] aps_out;
input ext_ctr_en;              // enables cdsin control
input ext_in_en;               // enables ext in output

// Output lines to the sensor
output aps_shs;                // shs signal to the sensor (Active High)
output aps_rst;                // reset signal to the sensor (Active High)
output aps_shr;                // shr signal to the sensor (Active High)
output aps_rowdeccoldec;       // row / col enable
output aps_ext_ctr;            // counter output
output aps_rst_cnt;            // APS Counter Reset (Active High)
output [15:0] aps_ext_in;

// frame grabber
output fg_pen;                 // pixelclk to output lvds
output fg_len;                 // lineclk to output lvds
output fg_fen;                 // frameclk to output lvds
output [11:0] fg_dout;         // fg data lines

//*************************** Wires and registers ******************//
// output regs
reg aps_shs, aps_shr, aps_rst, aps_rst_cnt, aps_rowdeccoldec;
reg aps_ext_ctr;
reg fg_len_int, fg_fen_int;
reg fg_len, fg_fen;
reg [11:0] fg_dout;

// internal regs
reg [3:0]  control_state;
reg [11:0] din_d;
reg        aps_clk_d;
reg        first_run;
reg        aquire;
reg [7:0]  col_cnt;
reg [7:0]  row_cnt;
reg [9:0]  delay_cnt;
reg [1:0]  wait_cnt;
reg [3:0]  ext_ctr_cnt;

// internal wires
wire aps_clk;
wire fg_pen;

//*************************** constants def ****************//
parameter STATE_IDLE        = 4'h0;
parameter STATE_IDLE_TO_SHS = 4'h1;
parameter STATE_SHS_HIGH    = 4'h2;
parameter STATE_SHS_TO_RST  = 4'h3;
parameter STATE_RST_HIGH    = 4'h4;
parameter STATE_RST_TO_SHR  = 4'h5;
parameter STATE_SHR_HIGH    = 4'h6;
parameter STATE_SHR_TO_READ = 4'h7;
parameter STATE_READ        = 4'h8;
parameter STATE_WAIT        = 4'h9;

fg_len_int <= 0. // main_clock aps_rst_cnt <= 0. row_cnt <= 0. = 7. //200ns = 39. //0. always @(posedge clk or negedge rst) begin if (~rst) begin delay_cnt <= 0. wait_cnt <= 0. //200ns = 19. aps_ext_ctr <= 0. //200ns = 39. <= 1'b0. end else begin // rising_edge(clk) 85 . fg_fen_int <= 0. //1us = 7. aps_rowdeccoldec <= 0. aps_shs aps_shr aps_rst <= 1'b0. aps_rst_cnt <= 0. control_state <= STATE_IDLE. //1us = 7. //200ns =7. ext_ctr_cnt <= 0. <= 1'b0. first_run <= 1. reg [PIPE_DELAY:0] fg_len_d. col_cnt <= 0.85 // Note : 0 is one clk delay clk is 40MHz //(that is 25ns for etch clock cycle) // (add 1 ) parameter IDLE_TO_SHS_TIME parameter SHS_HIGH_TIME parameter SHS_TO_RST_TIME parameter RST_HIGH_TIME parameter RST_TO_SHR_TIME parameter SHR_HIGH_TIME parameter SHR_TO_READ_TIME parameter PIPE_DELAY reg [PIPE_DELAY:0] fg_fen_d.5us = 7.

col_cnt <= 0. 86 . end STATE_IDLE_TO_SHS : begin aps_rowdeccoldec <= 1. aps_rowdeccoldec <= 0. end else begin delay_cnt <= delay_cnt + 1'b1. <= 1'b0. <= 0. first_run <= 1. if (delay_cnt == IDLE_TO_SHS_TIME) begin control_state <= STATE_SHS_HIGH. if (module_active) control_state <= STATE_IDLE_TO_SHS. if (first_run) aps_rst_cnt <= 1. // changed to provide samller ROI row_cnt <= 0. // changed to provide samller ROI delay_cnt <= 0. <= 1'b0. delay_cnt <= 0. end end STATE_SHS_HIGH : begin aps_shs <= 1'b1.86 case (control_state) STATE_IDLE : begin // reset aps_shs aps_shr aps_rst <= 1'b0. if (delay_cnt == SHS_HIGH_TIME) begin control_state <= STATE_SHS_TO_RST. aps_rst_cnt <= 0. fg_len_int fg_fen_int wait_cnt <= 0. <= 0.

aps_rst_cnt <= 0. fg_fen_int <= 1. if (delay_cnt == SHS_TO_RST_TIME) begin control_state <= STATE_RST_HIGH. end else begin delay_cnt <= delay_cnt + 1'b1. if (delay_cnt == RST_HIGH_TIME) begin control_state <= STATE_RST_TO_SHR. end end STATE_RST_TO_SHR : begin aps_rst <= 1'b0. end else begin delay_cnt <= delay_cnt + 1'b1. delay_cnt <= 0. delay_cnt <= 0. end else begin delay_cnt <= delay_cnt + 1'b1. end else begin delay_cnt <= delay_cnt + 1'b1. end end 87 . end end STATE_SHS_TO_RST : begin aps_shs <= 1'b0. delay_cnt <= 0. end end STATE_RST_HIGH : begin aps_rst <= 1'b1. if (delay_cnt == RST_TO_SHR_TIME) begin control_state <= STATE_SHR_HIGH.87 delay_cnt <= 0.

if (delay_cnt == 3) col_cnt <= col_cnt + 1'b1. control_state <= STATE_WAIT. fg_len_int <= 1. end else begin delay_cnt <= delay_cnt + 1'b1. row_cnt <= 0. if (aps_clk_pulse) begin // clk en delay_cnt <= 0. end end STATE_SHR_TO_READ : begin aps_shr <= 1'b0.88 STATE_SHR_HIGH : begin aps_shr <= 1'b1. delay_cnt <= 4. end end STATE_READ : begin delay_cnt <= delay_cnt + 1'b1. ext_ctr_cnt <= ext_ctr_cnt + 1'b1. if (col_cnt == 255) begin col_cnt <= col_cnt + 1'b1. end else begin delay_cnt <= delay_cnt + 1'b1. if (delay_cnt == SHR_HIGH_TIME) begin control_state <= STATE_SHR_TO_READ. delay_cnt <= 0. aps_rowdeccoldec <= 0. 88 . if (delay_cnt == SHR_TO_READ_TIME) begin control_state <= STATE_READ. if (row_cnt == 255) begin fg_fen_int <= 1'b0.

end if (~ext_ctr_en) aps_ext_ctr <= 1'b0. row_cnt <= row_cnt + 1'b1. if (ext_ctr_cnt == 8) begin aps_ext_ctr <= 1'b0. end end if (ext_ctr_cnt > 0) ext_ctr_cnt <= ext_ctr_cnt + 1'b1. end fg_len_int first_run <= 0. if (ext_ctr_cnt == 2) aps_ext_ctr <= 1'b1.89 end else begin fg_fen_int <= fg_fen_int. endcase // case // stop override <= 0. control_state <= STATE_IDLE_TO_SHS. 89 . end default : control_state <= STATE_IDLE. ext_ctr_cnt <= 4'b0. control_state <= STATE_IDLE_TO_SHS. end STATE_WAIT : begin if (&wait_cnt) begin fg_fen_int <= 0. end wait_cnt <= wait_cnt + 1'b1.

<= fg_fen_d[PIPE_DELAY]. fg_fen <= 0. fg_len <= 0.1 : 0]. fg_dout <= fg_dout. fg_fen_d <= {fg_fen_d[PIPE_DELAY .1 : 0]. fg_fen_d <= 0. fg_len_int}. end // Else end // Always //Output delay and sync lines to the FG always @(posedge clk or negedge rst) begin if (~rst) begin fg_len_d <= 0. fg_len_d <= {fg_len_d[PIPE_DELAY . end else if ((aps_clk_counter == 8'h9) & ~aps_clk) begin din_d <= aps_out. fg_fen_d <= fg_fen_d. fg_dout <= din_d. fg_dout <= 0. fg_fen fg_len end else begin din_d <= din_d. <= fg_len_d[PIPE_DELAY].90 if (~module_active) control_state <= STATE_IDLE. <= fg_len. fg_len_d <= fg_len_d. fg_fen fg_len end end //note that fg_pen has an offset of ~2ns from the aps_clk <= fg_fen. din_d <= 0. // main_clock 90 . fg_fen_int}.

assign fg_pen     = fg_len & aps_clk_pulse;
assign aps_ext_in = ext_in_en ? 16'h0 : {row_cnt, col_cnt};

endmodule

clk_gen.v
//generates target clock
`timescale 1ps / 1ps
module clk_gen (CLK, rst, time_ref, target_clk, clk_pulse, clk_counter);

//globals
input CLK;
input rst;
//cpu bus
input [7:0] time_ref;
//outputs
output target_clk;
output clk_pulse;
output [7:0] clk_counter;

reg target_clk;
reg clk_pulse;
reg [7:0] clk_counter;

always @(posedge CLK or negedge rst)
begin
  if (~rst)
    begin
      target_clk  <= 0;
      clk_counter <= 0;
      clk_pulse   <= 0;
    end
  else
    begin
      //defaults
      clk_counter <= clk_counter + 1'b1;
      clk_pulse   <= 0;
      if (clk_counter == time_ref)
        begin
          if (~target_clk)
            clk_pulse <= 1'b1;
          target_clk  <= ~target_clk;
          clk_counter <= 0;
        end
      else
        target_clk <= target_clk;
    end
end

endmodule

B.3. JPEG Encoding and Watermark Embedding

`resetall
`timescale 1ns / 100ps
// this module is responsible for interfacing the encoder\embedder and
// the imager controller
// there are two pixel data buffer to allow reordering from row scan to
// 8x8 blocks and vice versa. The module includes state machines that
// synchronize the imager data output with the encoder\embedder modules
module wm_int (clk, rst, aps_clk, aps_clk_pulse, aps_clk_counter,
               aps_dout, aps_pen, aps_fen, aps_len,
               fg_dout, fg_pen, fg_len, fg_fen, wm_ena);

input clk;
input rst;
input aps_clk;
input aps_clk_pulse;
input [7:0] aps_clk_counter;
input aps_pen;

output fg_len. //regs and wires reg [11:0] in_pic_add_wr_cnt. in_pic_add_rd. //write enable sync state machine reg [1:0] state. input aps_fen. next_state. input wm_ena. //next state logic always @(state or in_pic_add_wr_cnt or in_pic_add_rd_cnt) begin case (state) IDLE : 93 . wire [7:0] in_pic_dat_wr. parameter READ = 2'b01. output fg_fen. else state <= state. else if ((aps_clk_counter == 8'h9) & ~aps_clk) state <= next_state. in_pic_dat_rd. in_pic_add_rd_cnt. //state parameters parameter IDLE = 2'b00. wire in_pic_wren = aps_len. always @(posedge clk or posedge rst) if (rst) state <= IDLE. output fg_pen. output [11:0] fg_dout. wire [11:0] in_pic_add_wr.93 input aps_len. input [11:0] aps_dout.

endcase end wire in_pic_rden. else next_state = IDLE. else next_state = READ. always @ (posedge clk or posedge rst) if (rst) in_pic_add_wr_cnt <= 12'h0. else in_pic_rden_d <= in_pic_rden.94 if (&in_pic_add_wr_cnt[10:0]) next_state = READ. reg in_pic_rden_d. else if ((aps_clk_counter == 8'h9) & ~aps_clk) begin if (aps_len) 94 . always @ (posedge clk or posedge rst) if (rst) in_pic_rden_d <= 0. READ : if (&in_pic_add_rd_cnt[10:0]) next_state = IDLE. //output logic assign in_pic_rden = (state == READ). default: next_state = IDLE.

in _pic_add_rd_cnt[2:0]}.95 in_pic_add_wr_cnt <= in_pic_add_wr_cnt + 1. else in_pic_add_wr_cnt <= in_pic_add_wr_cnt.in_pic_add_rd_cnt[5:3]. else out_state <= out_state. assign in_pic_dat_wr = aps_dout[3] ? aps_dout[11:4] + 1 : aps_dout[11:4]. always @ (posedge clk or posedge rst) if (rst) in_pic_add_rd_cnt <= 12'h0. else if ((aps_clk_counter == 8'h9) & ~aps_clk) out_state <= out_next_state. else if (&in_pic_add_rd_cnt[10:0]) in_pic_add_rd_cnt <= in_pic_add_rd_cnt + 1. 95 . else if ((aps_clk_counter == 8'h9) & ~aps_clk) begin if (in_pic_rden) in_pic_add_rd_cnt <= in_pic_add_rd_cnt + 1. else in_pic_add_rd_cnt <= in_pic_add_rd_cnt.in_pic_add_rd_cnt[10:6]. //state machine to control output memory rd/wr reg [1:0] out_state. end assign in_pic_add_wr = in_pic_add_wr_cnt. out_next_state. always @(posedge clk or posedge rst) if (rst) out_state <= IDLE. end //address manipulation to account for the 8x8 block readout order assign in_pic_add_rd = {in_pic_add_rd_cnt[11].

wire [11:0] out_pic_add_wr. else if ((aps_clk_counter == 8'h9) & ~aps_clk) 96 . else out_next_state = READ. else out_next_state = IDLE. READ : if (&out_pic_add_rd_cnt[10:0]) out_next_state = IDLE. //output logic assign out_pic_rden = (out_state == READ). always @ (posedge clk or posedge rst) if (rst) out_pic_add_wr_cnt <= 12'h0. endcase end wire out_pic_rden. reg [15:0] out_pic_add_rd_cnt. default: out_next_state = IDLE.96 reg [11:0] out_pic_add_wr_cnt. out_pic_add_rd. wire out_pic_wren. wire [7:0] out_pic_dat_wr. //next state logic always @(out_state or out_pic_add_wr_cnt or out_pic_add_rd_cnt) begin case (out_state) IDLE : if (&out_pic_add_wr_cnt[10:0]) out_next_state = READ.

out_pic_add_wr_cnt[5:3]. else out_pic_add_rd_cnt <= out_pic_add_rd_cnt. end assign out_pic_add_wr = {out_pic_add_wr_cnt[11]. else if (&out_pic_add_rd_cnt[10:0]) out_pic_add_rd_cnt <= out_pic_add_rd_cnt + 1. always @ (posedge clk or posedge rst) begin if (rst) begin fg_len <= 1'b0. else if ((aps_clk_counter == 8'h8) & ~aps_clk) begin if (out_pic_rden) out_pic_add_rd_cnt <= out_pic_add_rd_cnt + 1. end else 97 . wire fg_pen. reg fg_len. fg_fen <= 1'b0. fg_fen.out_pic_add_wr_cnt[2:0]}. end assign out_pic_add_rd = out_pic_add_rd_cnt[11:0]. else out_pic_add_wr_cnt <= out_pic_add_wr_cnt.out_pic_add_wr_cnt[10:6] . always @ (posedge clk or posedge rst) if (rst) out_pic_add_rd_cnt <= 16'h0.97 begin if (out_pic_wren) out_pic_add_wr_cnt <= out_pic_add_wr_cnt + 1. wire [11:0] fg_dout.

data_b(). . else if (out_pic_rden) fg_fen <= 1'b1.address_a(in_pic_add_wr). else if (&out_pic_add_wr_cnt[10:0] & (aps_clk_counter == 8'h8)) fg_fen <= 1'b1. end end wire [7:0] out_pic_dat_rd. wire wm_ena. else if (out_pic_rden) fg_len <= 1'b1.98 begin if ((&out_pic_add_rd_cnt[7:0]) & aps_clk) fg_len <= 1'b0. assign fg_dout = {4'h0.data_a(in_pic_dat_wr). else fg_fen <= fg_fen. .clock(clk). else fg_len <= fg_len.out_pic_dat_rd}. pixel_mem_buffer in_mem_buffer ( . else if (&out_pic_add_wr_cnt[10:0] & (aps_clk_counter == 8'h8) & (!aps_clk)) fg_len <= 1'b1. assign fg_pen = aps_clk_pulse & fg_len. . 98 . .address_b(in_pic_add_rd). if (&out_pic_add_rd_cnt & aps_clk) fg_fen <= 1'b0.

    .wren_a(in_pic_wren),
    .wren_b(1'b0),
    .q_a(),
    .q_b(in_pic_dat_rd));

pixel_mem_buffer out_mem_buffer (
    .clock(clk),
    .data_a(out_pic_dat_wr),
    .data_b(),
    .address_a(out_pic_add_wr),
    .address_b(out_pic_add_rd),
    .wren_a(out_pic_wren),
    .wren_b(1'b0),
    .q_a(),
    .q_b(out_pic_dat_rd));

encoder encoder_embedder_decoder (
    .clk(aps_clk),
    .rst(rst),
    .ena(in_pic_rden_d),
    .din(in_pic_dat_rd),
    .dout(out_pic_dat_wr),
    .douten(out_pic_wren),
    .wm_ena(wm_ena) );

endmodule

B.3.1. Watermark Embedding

WM_plus_RNG_top.v
// This module is the top level that connects the watermark embedding
// module with the watermark generator (RNG) module
`resetall
`timescale 1ns / 100ps
module top (clk, rst, ena, serial_data_out, WM_out, douten);

parameter COEFF_SIZE = 12;
parameter N = 22;
parameter MKEY_VAL = 3;
parameter CKEY_VAL = 4;

input clk;
input rst;
input ena;
input [COEFF_SIZE-1:0] serial_data_out;   //unmarked DCT data from Zigzag buffer
output [COEFF_SIZE-1:0] WM_out;           //watermarked DCT data
output douten;

wire WM_data_in;
wire s;
wire douten;
reg  shift;
reg  ddata_valid;
reg  douten_reg;
reg  [5:0] cntr64;

WM_top #(COEFF_SIZE) WM_embedder (
    .clk(clk),
    .rst(rst),
    .serial_data_out(serial_data_out),
    .shift(shift),
    .WM_data_in(WM_data_in),
    .WM_out(WM_out) );

ffcsr22 #(.N(N), .MKEY_VAL(MKEY_VAL), .CKEY_VAL(CKEY_VAL)) WM_RNG (
    .clk(clk),
    .rst(rst),
    .shift(shift),
    .s(s) );

//control and sync signals
always @(posedge clk or posedge rst)
begin
  if (rst)
    cntr64 <= #1 0;
  else if (shift)
    begin
      if (cntr64 < 6'h3f)
        cntr64 <= #1 cntr64 + 1'b1;
      else
        cntr64 <= #1 cntr64;
    end
  else
    cntr64 <= #1 cntr64;
end

always @ (posedge clk or posedge rst)
  if (rst)
    shift <= 1'b0;
  else
    shift <= ena;

assign WM_data_in = s;

always @ (posedge clk or posedge rst)
  if (rst)
    douten_reg <= 0;
  else
    douten_reg <= (&cntr64);

assign douten = douten_reg & shift;

endmodule

WM_top.v
`resetall
`timescale 1ns/10ps
// this module is the top level for the watermark embedder and connects
// the DCT data buffer and the watermarking logic
module WM_top (clk, rst, serial_data_out, shift, WM_data_in, B_P_out, WM_out);

parameter COEFF_SIZE = 12;

input clk;
input rst;
input shift;
input WM_data_in;
input  [COEFF_SIZE-1:0] serial_data_out;
output [COEFF_SIZE-1:0] B_P_out;
output [COEFF_SIZE-1:0] WM_out;

wire [COEFF_SIZE-1:0] d_out;
wire [COEFF_SIZE-1:0] d_in;
wire [COEFF_SIZE-1:0] B_P_out;
wire [COEFF_SIZE-1:0] WM_out;
wire p_i;
wire pointer_full;
wire shift;

// Modules instantiation
// DCT data buffer
ram_sr B_P_reg (
    .clock(clk),
    .clken(shift),
    .shiftin(d_in),
    .shiftout(d_out),
    .taps() );

// watermarking logic
WM_point_logic #(COEFF_SIZE) WM_point_logic1 (
    .clk(clk),
    .rst(rst),
    .shift(shift),
    .p_i(p_i),
    .pointer_full(pointer_full),
    .B_P_out(B_P_out),
    .WM_out(WM_out) );

//assignments
// d_in is the i-th coefficient from the current block out of the DCT
// block
// B_P_out is the i-th coefficient stored in B_P_reg from the previous
// block
assign d_in    = serial_data_out;
assign B_P_out = d_out;
assign p_i     = (|(d_in)) & (|(B_P_out)) & (pointer_full ? WM_data_in : 1'b1);

endmodule

WM_point_logic_ver2.v
/* This module does two simultaneous assignments:
   1. Identifying the first (representing highest spatial frequency) N cells
      (after anding 2 neighbors) that are non-zero. The results are stored
      in the next_pointer register. When all N cells of next_pointer are
      full, the module starts to calculate the value to be embedded in the
      corresponding coefficients. The computation simply stores the XOR
      between the current value in next_p_reg[n] and the output of p_i.
   2. When all coefficients from a DCT block are received, the values from
      the 'next' registers are copied to the current registers, and the
      'next' regs are reset. The 'current' regs are used to embed the WM in
      the selected coefficients, and the p_reg reg holds the value to embed.
   Finally, for each block there are two phases:
   Each clock cycle, one coefficient is output from the DCT block and
   stored in the reg_stack, while the coefficient of the same index from
   the previous block is being output from the stack. During that stage,
   both coefficients are used to calculate the value of p_i and the
   coefficient of the previous block is embedded with the WM and sent out
   as secured image data. */
`resetall
`timescale 1ns/10ps
module WM_point_logic( clk, rst, shift, p_i, pointer_full, B_P_out, WM_out );

parameter COEFF_SIZE = 12;
parameter N = 2;                  // N is the number of coefficients to embed

// Internal Declarations
input clk;
input rst;
input shift;
input p_i;
input  [COEFF_SIZE-1:0] B_P_out;
output pointer_full;
output [COEFF_SIZE-1:0] WM_out;

wire [COEFF_SIZE-1:0] WM_out;
wire [5:0] inc;
reg  pointer_cnt;
reg  p_rst;
integer j;

// pointer reg declarations and assignments
reg [5:0] pointer      [N-1:0];   // mux for shift enabled SR
reg [5:0] next_pointer [N-1:0];
reg [N-1:0] p_reg, next_p_reg;    // this register stores the LSB values
                                  // to embed

wire pointer_full    = !(&next_pointer[0]);
wire shift_condition = (p_i && !pointer_full && shift);

//This shift register has async rst and syncronous p_rst
//It shifts only when the conditions for shift are met:
//The register is still not full, the new input is non-zero and a data
//valid signal is on (shift)
always @(posedge clk or posedge rst)
begin : synchronous_sr
  if (rst)
    for (j = 0; j < N; j = j + 1)
      next_pointer[j] <= #1 6'h3f;
  else if (p_rst)
    for (j = 0; j < N; j = j + 1)
      next_pointer[j] <= #1 6'h3f;
  else if (shift_condition)
    begin
      for (j = 0; j < N-1; j = j + 1)
        next_pointer[j] <= #1 next_pointer[j+1];
      next_pointer[N-1] <= #1 inc;
    end
  else
    for (j = 0; j < N; j = j + 1)
      next_pointer[j] <= #1 next_pointer[j];
end

always @ (posedge clk or posedge rst)
begin
  if (rst)
    begin
      pointer_cnt <= #1 1'b0;
      p_reg       <= #1 0;
      next_p_reg  <= #1 0;
      p_rst       <= #1 0;
    end
  else if (shift)
    begin
      if (pointer[pointer_cnt] == inc)
        pointer_cnt <= #1 1'b1;
      else
        pointer_cnt <= #1 pointer_cnt;

      if (&inc)
        begin
          p_reg       <= #1 next_p_reg;
          next_p_reg  <= #1 0;
          p_rst       <= #1 1;
          pointer_cnt <= #1 1'b0;
          for (j = 0; j < N; j = j + 1)
            pointer[j] <= #1 next_pointer[j];
        end
      else
        begin
          p_reg <= #1 p_reg;
          p_rst <= #1 0;
        end

      if (pointer_full)
        begin
          case (inc[0])
            1'b0 : next_p_reg[0] <= #1 next_p_reg[0] ^ p_i;
            1'b1 : next_p_reg[1] <= #1 next_p_reg[1] ^ p_i;
            // 2'b10 : next_p_reg[2] <= next_p_reg[2] ^ p_i;
            // 2'b11 : next_p_reg[3] <= next_p_reg[3] ^ p_i;
          endcase
        end
    end //shift operations
  else
    p_rst <= #1 0;
end //sync always

// two options to embed the WM in the LSB - just set\rst, or make the
// number's parity be equal to the WM bit
// wire parity;
// assign parity = ^(B_P_out);
// assign WM_out = (pointer[pointer_cnt] != inc) ? B_P_out :
//                 (parity == p_reg[pointer_cnt]) ? B_P_out :
//                 {B_P_out[COEFF_SIZE-1:1], ~B_P_out[0]};

assign WM_out = (pointer[pointer_cnt] == inc) ?
                {B_P_out[COEFF_SIZE-1:1], p_reg[pointer_cnt]} : B_P_out;

incr pointer_gen (
    .clk(clk),
    .rst(rst),
    .en_cnt(shift),
    .inc_out(inc) );

endmodule

module incr (clk, en_cnt, rst, inc_out);

input clk;

reg [5:0] inc_out.v // Megafunction Name(s): // altshift_taps // ============================================================ // ************************************************************ // THIS IS A WIZARD-GENERATED FILE. else inc_out <= inc_out.108 input rst. output [5:0] inc_out. end endmodule ram_sr.v // megafunction wizard: %Shift register (RAM-based)% // GENERATION: STANDARD // VERSION: WM1. input en_cnt.0 // MODULE: altshift_taps // ============================================================ // File Name: ram_sr. always @ (posedge clk or posedge rst) begin if (rst) inc_out <= 6'h00.0 Build 148 04/26/2005 SJ Full Version // ************************************************************ 108 . DO NOT EDIT THIS FILE! // // 5. else if (en_cnt) inc_out <= inc_out + 1'b1.

//without limitation. wire [11:0] shiftout = sub_wire1[11:0]. taps). shiftout.109 //Copyright (C) 1991-2005 Altera Corporation //Your use of Altera Corporation's design tools. including. clken. that your use is for the sole purpose of //programming logic devices manufactured by Altera and sold by //Altera or its authorized distributors. and any output files any of the foregoing //(including device programming or simulation files). taps. or other applicable license agreement. 109 . wire [11:0] taps = sub_wire0[11:0]. logic functions //and other software and tools. clock. altshift_taps altshift_taps_component ( . Altera MegaCore Function License //Agreement.clken (clken). and any //associated documentation or information are expressly subject //to the terms and conditions of the Altera Program License //Subscription Agreement. and its AMPP partner logic //functions. input [11:0] input input output output shiftin. clken. wire [11:0] sub_wire1. // synopsys translate_off `timescale 1 ns / 10 ps // synopsys translate_on module ram_sr ( shiftin. wire [11:0] sub_wire0. clock. Please refer to the //applicable agreement for further details. [11:0] [11:0] shiftout.

altshift_taps_component. parameter MKEY_VAL = 3. endmodule ffcsr22. [N-1:0] mkey = MKEY_VAL.shiftin (shiftin).lpm_type = "altshift_taps". input shift. altshift_taps_component.v // this module implements a 22 bits ffcsr RNG `resetall `timescale 1ns/10ps module ffcsr22 parameter parameter parameter (clk. s). [5:0] s. [N-1:0] mstate.tap_distance = 64. q= 4194793. rst. [N-1:0] mstate_N. N = 22. parameter CKEY_VAL = 4. . output reg wire reg wire wire wire s. 110 .clock (clock). defparam altshift_taps_component. .taps (sub_wire0). [5:0] cstate. d=22'b1000000000000011110101.width = 12. shift. input rst.shiftout (sub_wire1)). .number_of_taps = 1. cstate_N.110 . altshift_taps_component. input clk.

assign mstate_N[18]=mstate[19]. assign cstate_N[2]=mstate[5]&cstate[2]^cstate[2]&mstate[0]^mstate[5]&mstate[0]. assign mstate_N[7]=mstate[8]^d[7]&cstate[5]^d[7]&mstate[0]. // Define the FCSR and Filter function assign mstate_N[0]=mstate[1]^d[0]&cstate[0]^d[0]&mstate[0]. assign cstate_N[1]=mstate[3]&cstate[1]^cstate[1]&mstate[0]^mstate[3]&mstate[0]. 111 . assign mstate_N[19]=mstate[20]. assign mstate_N[3]=mstate[4]. assign cstate_N[0]=mstate[1]&cstate[0]^cstate[0]&mstate[0]^mstate[1]&mstate[0]. assign mstate_N[21]=mstate[0]. assign mstate_N[10]=mstate[11]. assign mstate_N[20]=mstate[21]. assign mstate_N[15]=mstate[16]. assign mstate_N[5]=mstate[6]^d[5]&cstate[3]^d[5]&mstate[0]. assign mstate_N[16]=mstate[17]. assign mstate_N[2]=mstate[3]^d[2]&cstate[1]^d[2]&mstate[0]. assign cstate_N[3]=mstate[6]&cstate[3]^cstate[3]&mstate[0]^mstate[6]&mstate[0]. assign mstate_N[13]=mstate[14]. assign mstate_N[12]=mstate[13]. assign mstate_N[1]=mstate[2]. assign mstate_N[11]=mstate[12]. assign cstate_N[5]=mstate[8]&cstate[5]^cstate[5]&mstate[0]^mstate[8]&mstate[0]. assign mstate_N[6]=mstate[7]^d[6]&cstate[4]^d[6]&mstate[0]. assign mstate_N[17]=mstate[18]. assign cstate_N[4]=mstate[7]&cstate[4]^cstate[4]&mstate[0]^mstate[7]&mstate[0].111 wire [5:0] ckey = CKEY_VAL. assign mstate_N[4]=mstate[5]^d[4]&cstate[2]^d[4]&mstate[0]. assign mstate_N[9]=mstate[10]. assign mstate_N[14]=mstate[15]. assign mstate_N[8]=mstate[9].
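// Editorial note (not part of the original source): ffcsr22 appears to
// follow the filtered FCSR (F-FCSR) construction. mstate is the 22-bit
// main feedback-with-carry shift register, cstate holds the carry cells,
// and the feedback taps are selected by
//     d = 22'b1000000000000011110101 (= 2097397),
// which is consistent with the connection integer q = 4194793 quoted in
// the listing, since 2*d - 1 = q. The output bit s is a linear filter of
// the main register, computed in the always block below.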



// Calculate the output sequence
always @(posedge clk or posedge rst)
begin
  if(rst)
    begin
      mstate <= #1 mkey;
      cstate <= #1 ckey;
    end
  else if (shift)
    begin
      mstate <= #1 mstate_N;
      cstate <= #1 cstate_N;
    end
  else
    begin
      mstate <= #1 mstate;
      cstate <= #1 cstate;
    end
end

assign s = (mstate[0]^mstate[2]) ^ (mstate[4]^mstate[5]) ^
           (mstate[6]^mstate[7]^mstate[21]);
//the parentheses will hopefully minimize delay

endmodule
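To show how the generator above is meant to be driven - shift acts as a clock enable and one keystream bit s is produced per enabled clock - a minimal simulation sketch is given below. It is an editorial illustration only; the testbench name and stimulus values are assumptions and are not part of the thesis sources.

// ffcsr22_tb.v (editorial sketch, not part of the original design)
`timescale 1ns/10ps
module ffcsr22_tb;

  reg  clk, rst, shift;
  wire s;

  // keys are taken from the module's default parameters (MKEY_VAL, CKEY_VAL)
  ffcsr22 dut (.clk(clk), .rst(rst), .shift(shift), .s(s));

  always #5 clk = ~clk;          // free-running simulation clock

  initial begin
    clk = 0; rst = 0; shift = 0;
    #2  rst = 1;                 // load mstate/cstate with mkey/ckey
    #20 rst = 0;
    #20 shift = 1;               // enable the generator
    repeat (64) @(posedge clk)
      $display("s = %b", s);     // print 64 keystream bits
    $finish;
  end

endmodule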

B.3.2. DCT IDCT Modules

The DCT and IDCT modules were borrowed from [ref], where the source code is also available.
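For context, the transform such modules are expected to compute is the standard 8x8 two-dimensional DCT used by JPEG (this equation is quoted here for reference and is not taken from the borrowed sources):

F(u,v) = \frac{1}{4} C(u) C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16},
where C(0) = 1/\sqrt{2} and C(k) = 1 for k > 0.

The watermark embedder operates on the coefficients F(u,v) after they have been reordered by the zigzag unit listed in the next subsection.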
B.3.3. Zigzag Modules
Zigzag.v
/////////////////////////////////////////////////////////////////////
////                                                             ////
////  Zig-Zag Unit                                               ////
////  Performs zigzag-ing, as used by many DCT based encoders    ////
////                                                             ////
////  Author: Richard Herveille                                  ////
////          richard@asics.ws                                   ////
////          www.asics.ws                                       ////
////                                                             ////
/////////////////////////////////////////////////////////////////////
////                                                             ////
//// Copyright (C) 2002 Richard Herveille                        ////
////                    richard@asics.ws                         ////
////                                                             ////
//// This source file may be used and distributed without        ////
//// restriction provided that this copyright statement is not   ////
//// removed from the file and that any derivative work contains ////
//// the original copyright notice and the associated disclaimer.////
////                                                             ////
/////////////////////////////////////////////////////////////////////

`timescale 1ns/10ps module zigzag( clk, rst, ena, dct_2d, dout, douten ); parameter do_width = 12; // // inputs & outputs // input clk; input rst; input ena; // clk enable // system clock



input [do_width-1:0] dct_2d; output [do_width-1:0] dout; output // // variables // wire block_rdy; reg ld_zigzag; reg douten_reg; wire douten; reg [do_width-1:0] sresult_in [63:0]; // store results for zig// zagging reg [do_width-1:0] sresult_out[63:0]; reg [5:0] sample_cnt; // // module body // always @ (posedge clk or posedge rst) if (rst) sample_cnt <= 6'h0; else if (ena) sample_cnt <= sample_cnt + 1'b1; else sample_cnt <= sample_cnt; assign block_rdy = &sample_cnt; douten; // data-out enable

always @ (posedge clk) ld_zigzag <= block_rdy; always @ (posedge clk or posedge rst) if (rst) douten_reg <= 1'b0;


//
// Generate zig-zag structure
//
// This implicates that the quantization step be performed after
// the zig-zagging.
//
//      0:  1:  2:  3:  4:  5:  6:  7:
// 0:   63  62  58  57  49  48  36  35
// 1:   61  59  56  50  47  37  34  21
// 2:   60  55  51  46  38  33  22  20
// 3:   54  52  45  39  32  23  19  10
// 4:   53  44  40  31  24  18  11  09
// 5:   43  41  30  25  17  12  08  03
// 6:   42  29  26  16  13  07  04  02
// 7:   28  27  15  14  06  05  01  00
//
// (same ordering in hexadecimal)
//      0:  1:  2:  3:  4:  5:  6:  7:
// 0:   3f  3e  3a  39  31  30  24  23
// 1:   3d  3b  38  32  2f  25  22  15
// 2:   3c  37  33  2e  26  21  16  14
// 3:   36  34  2d  27  20  17  13  0a
// 4:   35  2c  28  1f  18  12  0b  09
// 5:   2b  29  1e  19  11  0c  08  03
// 6:   2a  1d  1a  10  0d  07  04  02
// 7:   1c  1b  0f  0e  06  05  01  00
//
// zig-zag the DCT results
integer n;

always @(posedge clk)
  if (ena)
    begin
      for (n=1; n<=63; n=n+1)   // sresult_in[0] gets the new input
        begin
          sresult_in[n] <= #1 sresult_in[n -1];
        end
      sresult_in[0] <= #1 dct_2d;

      if(ld_zigzag)             // reload results-register file
        begin

sresult_out[24] <= #1 sresult_in[27]. sresult_out[33] <= #1 sresult_in[21]. sresult_out[21] <= #1 sresult_in[06]. 116 . sresult_out[15] <= #1 sresult_in[40]. sresult_out[10] <= #1 sresult_in[04]. sresult_out[29] <= #1 sresult_in[49]. sresult_out[34] <= #1 sresult_in[14]. sresult_out[12] <= #1 sresult_in[18]. sresult_out[18] <= #1 sresult_in[19]. sresult_out[23] <= #1 sresult_in[20]. sresult_out[09] <= #1 sresult_in[03]. sresult_out[03] <= #1 sresult_in[02]. sresult_out[19] <= #1 sresult_in[12]. sresult_out[36] <= #1 sresult_in[15]. sresult_out[13] <= #1 sresult_in[25]. sresult_out[35] <= #1 sresult_in[07].116 sresult_out[00] <= #1 sresult_in[00]. sresult_out[17] <= #1 sresult_in[26]. sresult_out[04] <= #1 sresult_in[09]. sresult_out[07] <= #1 sresult_in[17]. sresult_out[27] <= #1 sresult_in[48]. sresult_out[02] <= #1 sresult_in[01]. sresult_out[20] <= #1 sresult_in[05]. sresult_out[37] <= #1 sresult_in[22]. sresult_out[28] <= #1 sresult_in[56]. sresult_out[05] <= #1 sresult_in[16]. sresult_out[14] <= #1 sresult_in[32]. sresult_out[16] <= #1 sresult_in[33]. sresult_out[22] <= #1 sresult_in[13]. sresult_out[31] <= #1 sresult_in[35]. sresult_out[08] <= #1 sresult_in[10]. sresult_out[26] <= #1 sresult_in[41]. sresult_out[11] <= #1 sresult_in[11]. sresult_out[06] <= #1 sresult_in[24]. sresult_out[30] <= #1 sresult_in[42]. sresult_out[25] <= #1 sresult_in[34]. sresult_out[01] <= #1 sresult_in[08]. sresult_out[32] <= #1 sresult_in[28].

sresult_out[54] <= #1 sresult_in[60]. sresult_out[63] <= #1 sresult_in[63]. n<63. sresult_out[40] <= #1 sresult_in[43]. sresult_out[50] <= #1 sresult_in[38]. sresult_out[55] <= #1 sresult_in[53]. sresult_out[56] <= #1 sresult_in[46].117 sresult_out[38] <= #1 sresult_in[29]. end end assign dout = sresult_out[0]. n=n+1) // do not change sresult[0] sresult_out[n] <= #1 sresult_out[n +1]. sresult_out[45] <= #1 sresult_in[44]. sresult_out[42] <= #1 sresult_in[57]. sresult_out[48] <= #1 sresult_in[23]. sresult_out[53] <= #1 sresult_in[59]. sresult_out[46] <= #1 sresult_in[37]. end else begin for (n=0. endmodule 117 . sresult_out[62] <= #1 sresult_in[55]. sresult_out[58] <= #1 sresult_in[47]. sresult_out[43] <= #1 sresult_in[58]. sresult_out[60] <= #1 sresult_in[61]. sresult_out[44] <= #1 sresult_in[51]. sresult_out[52] <= #1 sresult_in[52]. sresult_out[39] <= #1 sresult_in[36]. sresult_out[51] <= #1 sresult_in[45]. sresult_out[61] <= #1 sresult_in[62]. sresult_out[57] <= #1 sresult_in[39]. sresult_out[59] <= #1 sresult_in[54]. sresult_out[41] <= #1 sresult_in[50]. sresult_out[47] <= #1 sresult_in[30]. sresult_out[49] <= #1 sresult_in[31].

ena. wire douten. input clk. din. // data-out enable reg ld_zigzag. 118 .118 reverse_zigzag. always @ (posedge clk or posedge rst) if (rst) sample_cnt <= 6'h0. // store results for zig// zagging reg [do_width-1:0] sresult_out[63:0]. input [do_width-1:0] din. douten. reg douten_reg. rst. input ena. else sample_cnt <= sample_cnt. dct_2d. input rst. reg [do_width-1:0] sresult_in [63:0]. douten ).v `timescale 1ns/10ps module reverse_zigzag( clk. parameter do_width = 12. reg [5:0] sample_cnt. else if (ena) sample_cnt <= sample_cnt + 1'b1. output // system clock // clk ena output [do_width-1:0] dct_2d.

n=n+1) // sresult_in[0] gets the new input begin sresult_in[n] <= #1 sresult_in[n -1]. always @ (posedge clk or posedge rst) if (rst) douten_reg <= 1'b0. else if (ld_zigzag) douten_reg <= 1'b1. sresult_in[0] <= #1 din. else douten_reg <= douten_reg. // // Generate zig-zag structure // // // 0: 1: 2: 3: 4: 5: 6: 7: // 0: 63 62 58 57 49 48 36 35 // 1: 61 59 56 50 47 37 34 21 // 2: 60 55 51 46 38 33 22 20 // 3: 54 52 45 39 32 23 19 10 // 4: 53 44 40 31 24 18 11 09 // 5: 43 41 30 25 17 12 08 03 // 6: 42 29 26 16 13 07 04 02 // 7: 28 27 15 14 06 05 01 00 // // zig-zag the DCT results integer n. always @(posedge clk) if (ena) begin for (n=1. end 0: 1: 2: 3: 4: 5: 6: 7: 3f 3e 3a 39 31 30 24 23 3d 3b 38 32 2f 25 22 15 3c 37 33 2e 26 21 16 14 36 34 2d 27 20 17 13 0a 35 2c 28 1f 18 12 0b 09 2b 29 1e 19 11 0c 08 03 2a 1d 1a 10 0d 07 04 02 1c 1b 0f 0e 06 05 01 00 119 . assign douten = douten_reg & ena. n<=63.119 always @ (posedge clk) ld_zigzag <= &sample_cnt.

sresult_out[13] <= #1 sresult_in[22]. sresult_out[03] <= #1 sresult_in[09]. sresult_out[24] <= #1 sresult_in[06]. sresult_out[40] <= #1 sresult_in[15]. sresult_out[42] <= #1 sresult_in[30]. sresult_out[25] <= #1 sresult_in[13].120 if(ld_zigzag) // reload results-register file begin sresult_out[00] <= #1 sresult_in[00]. sresult_out[41] <= #1 sresult_in[26]. sresult_out[49] <= #1 sresult_in[29]. sresult_out[14] <= #1 sresult_in[34]. sresult_out[28] <= #1 sresult_in[32]. sresult_out[20] <= #1 sresult_in[23]. sresult_out[08] <= #1 sresult_in[01]. sresult_out[48] <= #1 sresult_in[27]. sresult_out[56] <= #1 sresult_in[28]. sresult_out[10] <= #1 sresult_in[08]. sresult_out[34] <= #1 sresult_in[25]. sresult_out[12] <= #1 sresult_in[19]. sresult_out[04] <= #1 sresult_in[10]. sresult_out[33] <= #1 sresult_in[16]. sresult_out[11] <= #1 sresult_in[11]. sresult_out[02] <= #1 sresult_in[03]. sresult_out[06] <= #1 sresult_in[21]. sresult_out[05] <= #1 sresult_in[20]. sresult_out[21] <= #1 sresult_in[33]. 120 . sresult_out[27] <= #1 sresult_in[24]. sresult_out[18] <= #1 sresult_in[12]. sresult_out[01] <= #1 sresult_in[02]. sresult_out[09] <= #1 sresult_in[04]. sresult_out[35] <= #1 sresult_in[31]. sresult_out[16] <= #1 sresult_in[05]. sresult_out[26] <= #1 sresult_in[17]. sresult_out[17] <= #1 sresult_in[07]. sresult_out[19] <= #1 sresult_in[18]. sresult_out[32] <= #1 sresult_in[14].

sresult_out[57] <= #1 sresult_in[42]. sresult_out[51] <= #1 sresult_in[44]. sresult_out[59] <= #1 sresult_in[53].121 sresult_out[07] <= #1 sresult_in[35]. sresult_out[44] <= #1 sresult_in[45]. sresult_out[47] <= #1 sresult_in[58]. n=n+1) // do not change sresult[63] sresult_out[n] <= #1 sresult_out[n +1]. end end assign dct_2d = sresult_out[00]. sresult_out[43] <= #1 sresult_in[40]. sresult_out[30] <= #1 sresult_in[47]. sresult_out[50] <= #1 sresult_in[41]. sresult_out[61] <= #1 sresult_in[60]. sresult_out[58] <= #1 sresult_in[43]. sresult_out[37] <= #1 sresult_in[46]. sresult_out[31] <= #1 sresult_in[49]. sresult_out[23] <= #1 sresult_in[48]. sresult_out[60] <= #1 sresult_in[54]. sresult_out[54] <= #1 sresult_in[59]. sresult_out[46] <= #1 sresult_in[56]. n<63. sresult_out[29] <= #1 sresult_in[38]. sresult_out[53] <= #1 sresult_in[55]. sresult_out[62] <= #1 sresult_in[61]. sresult_out[52] <= #1 sresult_in[52]. sresult_out[36] <= #1 sresult_in[39]. sresult_out[15] <= #1 sresult_in[36]. sresult_out[38] <= #1 sresult_in[50]. end else begin for (n=0. sresult_out[39] <= #1 sresult_in[57]. sresult_out[22] <= #1 sresult_in[37]. sresult_out[55] <= #1 sresult_in[62]. sresult_out[63] <= #1 sresult_in[63]. endmodule 121 . sresult_out[45] <= #1 sresult_in[51].
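As an editorial illustration of how the zigzag unit above can be exercised in simulation, a minimal testbench sketch follows. The testbench name, stimulus pattern and clock period are assumptions and are not part of the original sources; the zigzag module and its ports are taken from the listing above.

// zigzag_tb.v (editorial sketch, not part of the thesis sources)
`timescale 1ns/10ps
module zigzag_tb;

  parameter do_width = 12;

  reg                 clk, rst, ena;
  reg  [do_width-1:0] dct_2d;
  wire [do_width-1:0] dout;
  wire                douten;
  integer             i;

  zigzag dut (.clk(clk), .rst(rst), .ena(ena),
              .dct_2d(dct_2d), .dout(dout), .douten(douten));

  always #5 clk = ~clk;

  initial begin
    clk = 0; rst = 0; ena = 0; dct_2d = 0;
    #2  rst = 1;
    #20 rst = 0;
    ena = 1;
    // stream two 8x8 blocks of coefficients numbered 0..63; once douten
    // rises, dout should follow the zig-zag order shown in the table
    // inside the module
    for (i = 0; i < 128; i = i + 1) begin
      @(negedge clk);
      dct_2d = i % 64;
      if (douten) $display("%0d", dout);
    end
    $finish;
  end

endmodule

A complementary check is to feed dout into the reverse_zigzag module above and confirm that the original coefficient order is restored one block later.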