UNIVERSITY OF CALGARY

An Imaging System With Watermarking And Compression Capabilities

by

Yonatan Shoshan

A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING CALGARY, ALBERTA SEPTEMBER 2009

© Yonatan Shoshan 2009

Library and Archives Canada, Published Heritage Branch, 395 Wellington Street, Ottawa ON K1A 0N4, Canada

ISBN: 978-0-494-54570-6

NOTICE: The author has granted a nonexclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or noncommercial purposes, in microform, paper, electronic and/or any other formats. The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission. In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis. While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.


Abstract

This thesis presents an imaging system with a novel watermarking embedder and JPEG compression capabilities. The proposed system enhances data security in surveillance camera networks and similar applications, thus improving the reliability of the received images for security and/or evidence use. A novel watermarking algorithm was developed for watermarking images in the DCT domain. The algorithm was optimized for an efficient implementation in hardware while still maintaining a high level of security. The imaging system was physically implemented on an evaluation board including a CMOS image sensor, an FPGA for digital control and processing, and a frame grabber for image presentation and analysis. The digital circuitry implemented on the FPGA included the proposed watermarking logic as well as all the required peripheral modules and control signals. The accomplishments of this work have been published in four scientific papers. This work was part of a commercialization effort based on the proposed novel watermarking algorithm.


Acknowledgements

Several people have had an important contribution to my work on this thesis project. I hold a great deal of appreciation for my supervisor Dr. Orly Yadid-Pecht and for my co-supervisor Dr. Graham Jullien. Under their support and supervision I have had the chance to experience a very liberal research atmosphere in an excellent environment. I would also like to thank Dr. Alexander Fish for his help in each and every part of my work on this thesis project. He has offered me his exceptional knowledge and experience in academic research through endless discussions and mutual work. In addition, I would like to thank Mr. Denis Onen for valuable suggestions and the productive discussions we held from time to time. My fellow students in the ISL lab: Mr. Xin Li, with whom I have worked closely on this project and shared assignments, thoughts and ideas, and Ms. Marianna Beiderman, who provided her support and advice.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures and Illustrations
List of Symbols, Abbreviations and Nomenclature

CHAPTER 1: INTRODUCTION

CHAPTER 2: CONSIDERATIONS IN THE DEVELOPMENT OF AN IMAGING SYSTEM WITH WATERMARKING CAPABILITIES
  2.1 Theory and implementation of watermark algorithms
    2.1.1 Watermark Classifications
    2.1.2 Watermark Design Considerations
      2.1.2.1 Robustness to Attacks
      2.1.2.2 Image quality
      2.1.2.3 Computational complexity
    2.1.3 Figures of Merit for Watermarking Systems
  2.2 Watermark implementations – Software vs. Hardware
  2.3 State of the art in hardware watermarking

CHAPTER 3: THE PROPOSED IMAGING SYSTEM
  3.1 Image acquisition and reordering
  3.2 Compression module
    3.2.1 DCT based compression
    3.2.2 Implementation of the compression module in the proposed system
  3.3 Watermark embedding module
    3.3.1 The novel watermark embedding algorithm
    3.3.2 Implementation of the embedding module in HW
  3.4 Watermark generation
    3.4.1 RNG based watermark generation
    3.4.2 Existing RNG structures
      3.4.2.1 The LFSR
      3.4.2.2 The FCSR
      3.4.2.3 The Filtered FCSR (F-FCSR)
    3.4.3 RNG based watermark generator design method and implementation

CHAPTER 4: IMPLEMENTATION, TESTING AND RESULTS
  4.1 Software implementation and algorithm functionality verification
    4.1.1 Fragile watermarking and benchmarking
    4.1.2 Algorithm performance evaluation
  4.2 Hardware design and verification
  4.3 Hardware Experimental Results
  4.4 Physical proof of concept implementation
    4.4.1 The CMOS Image sensor
    4.4.2 Digital signal processing and control
    4.4.3 Output image capture

CHAPTER 5: CONCLUSION
  5.1 Thesis summary
  5.2 Issues that still need attention and future work
  5.3 Possible future directions for development

REFERENCES

APPENDIX A: MATLAB CODE
  Algorithm Implementation
    Embedding
    Detection
  Compression/Decompression

APPENDIX B: VERILOG CODE
  JPEG Encoding and Watermark Embedding
    DCT IDCT Modules
    Zigzag Modules
    Watermark Embedding
  CMOS Imager Control Logic and Interface
  Simulation Envelope
    Top Level and Peripheral Modules
    Simulation Testbench and peripherals

List of Tables

Table 2.1: Existing work in hardware digital watermarking research
Table 4.1: N vs. Quantization-Level Tradeoffs
Table 4.2: FPGA Synthesis Results
Table 4.3: Resource utilization by modules in the overall design

List of Figures and Illustrations

Figure 2.1: General classification of existing watermarking algorithms
Figure 2.2: Examples of (a) the original image and (b) the image with a visible watermark
Figure 2.3: Scheme of a general watermark system
Figure 3.1: An imaging system with watermarking capabilities
Figure 3.2: Example quantization table given in the JPEG standard [4]
Figure 3.3: Schematic of a DCT based compression encoder
Figure 3.4: Schematic of a HW DCT transform module
Figure 3.5: 8x8 Block DCT conversion and 1x64 Zigzag reordering
Figure 3.6: Reorganization of the DCT data in the Zigzag order
Figure 3.7: Example DCT data for blocks J3, J2
Figure 3.8: Schematic description of the watermarking module implementation in HW
Figure 3.9: A generalized Fibonacci implementation of an LFSR circuit
Figure 3.10: Galois implemented FCSR
Figure 3.11: A Gollman cascade RNG
Figure 4.1: Algorithm Matlab© simulation results
Figure 4.2: Additional sample images
Figure 4.3: Test setup schematic
Figure 4.4: Hardware watermarked image
Figure 4.5: A general implementation of an imaging system
Figure 4.6: Mixed signal SoC fast prototyping custom development board
Figure 4.7: Internal structure of the FPGA digital design
Figure 4.8: Sample output image from the physically implemented system

List of Symbols, Abbreviations and Nomenclature

Symbol  Definition
CMOS    Complementary Metal-Oxide Semiconductor
FPN     Fixed Pattern Noise
RNG     Random Number Generator
LFSR    Linear Feedback Shift Register
DCT     Discrete Cosine Transform
IDCT    Inverse DCT
FCSR    Feedback with Carry Shift Register
FPGA    Field Programmable Gate Array
LSB     Least Significant Bit
DWT     Discrete Wavelet Transform
PSNR    Peak Signal to Noise Ratio
PC      Personal Computer
JPEG    Joint Photographic Experts Group
RAM     Random Access Memory
KLT     Karhunen-Loeve Transform
HVS     Human Visual System
ROM     Read Only Memory
XOR     Exclusive Or
FSM     Finite State Machine
ADC     Analog to Digital Converter
I/O     Input Output
CPU     Central Processing Unit
LC      Logic Cell
ASIC    Application Specific Integrated Circuit

CHAPTER 1: Introduction

Whether or not a digital image is authentic has become a highly nontrivial question. The field of digital imaging and its subsidiaries has been going through a continuous and rapid growth during the last decade. Research activity has been extensive in both the academic and commercial communities, and significant advances and breakthroughs are being constantly published [1]. Digital imaging is taking over the traditional analog imaging in almost all imaging applications, from professional photography and broadcasting to the everyday consumer digital camera. The ease of integrating CMOS imagers with supporting peripheral elements, together with a significant reduction in power consumption, introduced a variety of new portable products such as imagers on cell phones and network based surveillance and public cameras [2].

Since digital images are very susceptible to manipulations and alterations, such as cropping, covering, scaling, blurring and many more, a variety of security problems are introduced. There is a need to establish the digital media as an acceptable authentic information source. Digital watermarking has shown the potential for solving many digital imaging problems, including image authentication, copyright control, and broadcast monitoring. A watermark is an additional, identifying message, covered under the more significant image raw data, without perceptually changing it. By adding a transparent watermark to the image, it can be made possible to detect alterations inflicted upon the image.

At the onset of this thesis work a basic implementation of a CMOS image sensor with watermarking capabilities was suggested [3]. In this implementation analog noise was used as a seed to the algorithm.

The Fixed Pattern Noise (FPN) [2] in CMOS imagers was considered as an imager specific analog noise to provide a unique seed to a Random Number Generator (RNG). The system added a pseudo-random digital noise to the pixel data; a Linear Feedback Shift Register (LFSR) that used the FPN based key as a seed generated the pseudo-random noise. The performance of the original concept needed to be tested and verified. Issues of great concern were the effect that the embedding of the watermark had on the original image quality, the viability of the watermark under compression and the robustness of the watermark against common attacks. Upon further investigation of the proposed watermarking technique, it had become apparent that a more sophisticated version needed to be developed.

The objective of the thesis was to come up with a commercially attractive watermarking system in hardware. In order to be more practical, the new version would target a specific range of applications, with potential applications in, for example, video surveillance, criminal investigation, biometric authentication, and ownership disputes – arenas where evidence of an indisputable nature is essential. Different applications require the utilization of different watermarking techniques, and no universal watermarking algorithm that can satisfy the requirements for all kinds of applications has been presented in the literature. A watermarking technique that incorporates a high level of robustness with low image quality degradation was found to require a high level of complexity. Increased complexity in hardware implementations means increased area and

power consumption. Therefore the appropriate application to target would be one in which the hardware implementation introduces a major advantage on the one hand, but robustness requirements are liberal on the other hand. A portable device that works in real time would obviously be able to make the best of efficient hardware-based processing. At the same time, techniques that utilize fragile watermarking are inherently non-robust. The authentication of images taken by remotely spread image sensors is an application that fits the aforementioned conditions.

Portable devices deliver captured data over communication channels. These channels have limited bandwidth and therefore the raw image data must first be compressed. The vast majority of available compression standards, for both video and still images, utilize the Discrete Cosine Transform (DCT) as part of the compression algorithm. In the DCT form the image is represented in the frequency domain. The frequency domain representation is a more compact representation of the image. The zero frequency (DC) component is the most significant, as it holds the average intensity level of the transformed pixel data, and the higher the frequency is, the less significant it is in the description of the image. As a general trend, it is expected that the remainder of the data will also lie in the lower frequencies. Compression is achieved by quantizing the DCT data, thus reducing its size while suffering a certain loss of accuracy. The compression algorithm does not quantize the DCT frequencies evenly but rather uses quantization tables to define different levels of quantization for each frequency [4]. Modifying the quantization table changes the tradeoff between visual quality and compression ratio. It is possible to

embed a watermark in the quantized DCT data. The watermark will be robust to any level of quantization that is less than or equal to the level of quantization utilized by the encoder [4]. A semi-fragile algorithm is suitable in image authentication applications, as it is robust to the legitimate compression but sensitive to other malicious modifications.

A novel watermarking algorithm was developed in the frame of this thesis. The watermark is embedded in the DCT domain and is intended to be implemented as a part of a secure compression module, which is also implemented in hardware. Hardware-based watermarking is the most attractive approach for a combination with real time compression. A fast, efficient and low cost hardware implementation allows real-time, on-the-scene security enhancement of the data collected by any system of remotely spread sensors. Moreover, the watermark embedder can be naturally merged as an integral part of the compression module. In general, when compressing an image it is divided into blocks (the standard size is 8x8) before DCT and quantization. Another built-in advantage of this approach is therefore the fact that the image is first divided into 8x8 blocks. By uniquely embedding the watermark into each 8x8 block, tamper localization and better detection ratios are achieved.

Further development has also been done towards a more secure watermark generation technique. The original RNG was an LFSR, which has very good statistical properties and is most simple to implement in hardware. However, it has long been shown that the LFSR can be easily cryptanalyzed and its seed can be recovered by observing a short sample of the output sequence [5]. A new approach for designing secure RNGs in hardware was

developed, employing Gollman cascaded Filtered Feedback with Carry Shift Register (F-FCSR) cores [6],[7]. This structure was found to provide a modular tool for designing secure RNGs that may be suitable for a wide variety of watermarking applications. In addition to the RNG, the generation unit may include a module to embed significant data in the watermark. Embedding identifying features such as date, time, location and others in each frame enhances the security of watermarking a series of frames or a video stream, as well as inherently providing authentic information about the circumstances involving the capturing of said image.

The end goal for the development efforts in this thesis was to provide a proof of concept prototype that will demonstrate the feasibility of the implementation of the proposed system. The design process included algorithmic testing and verification in software using Matlab©. Once the properties of the algorithm were established, the system was described in hardware using Verilog HDL and simulated in Modelsim. Finally, after the hardware description had been verified, the design was synthesized to the onboard FPGA and the whole system was physically tested. An evaluation board was utilized as a platform for the implementation of the prototype. The evaluation board combines digital and analog front ends. A CMOS image sensor may be employed as the input image data source, while the onboard Field Programmable Gate Array (FPGA) is used to implement the imager control signals and digital data processing, including the compression and watermarking modules.

The thesis is organized as follows. Chapter 2 provides background on the process of developing an imaging system with compression and watermarking capabilities. The proposed imaging system will be described in detail in Chapter 3. Chapter 4 presents all stages of the design process along with measured and simulated results. Conclusions and future work are discussed in Chapter 5.

CHAPTER 2: Considerations in the development of an imaging system with watermarking capabilities

2.1 Theory and implementation of watermark algorithms

2.1.1 Watermark Classifications
Different applications require utilization of watermarking with different properties, and no universal watermarking algorithm that can satisfy the requirements for all kinds of applications has been presented in the literature. Digital watermarking can be classified into different categories according to various criteria. Figure 2.1 shows a general classification of existing watermarking algorithms. First, a watermark is either intended to be visible or invisible. Sometimes a watermark is intentionally visible, for example, adding the network logo on the corner of videos in broadcast TV, in which case the identifying image is embedded into the host image and both are visually noticeable. Figure 2.2 shows examples of the original image and the image with a visible watermark. Generally, most watermarking algorithms aim for the watermark to be as invisible as possible. The invisibility of a watermark is determined by how it affects the image perceptually. Invisible watermarking has the considerable advantage of not degrading the host data and not reducing its commercial value. Therefore, more research attention has been drawn to this field, while visible watermarking has received substantially less attention [8]-[16]. Watermarking can also be classified according to the level of robustness to image changes and alterations. Three main categories of watermarking can be identified: Fragile, Semi-fragile and Robust, though no standard definition exists to explicitly

determine which is which. Different applications will have different requirements; while one would need the algorithm to be as robust as possible, another may be designed to detect even the slightest modification made to an image. Such a watermark is defined as fragile.

Figure 2.1: General classification of existing watermarking algorithms

A fragile watermark is practical when a user wishes to directly authenticate that the image he is observing has not been altered in any way since it has been watermarked. This might be the case in applications where raw data is used. However, in most existing applications, such modifications as lossy compression and mild geometric changes are inherently performed to the image. For those applications it is most efficient to use a semi-fragile algorithm that is designed to withstand certain legitimate modifications but

to detect malicious ones. Finally, some applications, such as copyright protection, require that the watermark be detectable even after an image goes under severe modifications and degradation, including digital-to-analog and analog-to-digital conversions, cropping, scaling, segment removal and others [13],[14]. A watermark that answers these requirements would be called robust.

Figure 2.2: Examples of (a) the original image and (b) the image with a visible watermark

The dependency of the watermark on the original content is another important distinction. Making the algorithm depend upon the content of the image is good against counterfeiting attacks; however, it complicates the algorithm implementation and therefore the embedding and extracting processes. An additional classification relates to the domain in which the watermarking is performed. The most straightforward and simple approach is a watermarking implementation in the spatial domain, which relates to applying the watermark to the

original image, for example by replacing the least significant bit (LSB) plane with an encoded one [17],[18]. Two other common representations are the discrete cosine (DCT) and the discrete wavelet transforms (DWT) [19]-[21], in which the image first goes through a certain transformation, the watermark is embedded in the transform domain and then it is inversely transformed to receive the watermarked image.

2.1.2 Watermark Design Considerations
Let us introduce a number of watermarking properties that affect design considerations. (1) Capacity (the term is adopted from the communications systems field [22]): in a watermarking system the cover image can be thought of as a channel used to deliver the identifying data (the watermark). The capacity of the system is defined as the amount of identifying data contained in the cover image. (2) False detection ratio: this ratio is characterized according to the probability of issuing the wrong decision. It is comprised of the probability to falsely detect an unauthentic watermark (false positive) and the probability to miss a legitimate one (false negative). It is possible to manipulate the detection algorithm in order to minimize one or the other, according to the application. The value of this ratio is usually determined experimentally. (3) Image quality degradation: the embedding of foreign contents in the image has a degrading effect on image quality. That parameter is relatively hard to quantify, and different measures such as peak signal-to-noise ratio (PSNR) or a subjective human perception measure may be applied [23].
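As an illustration of the PSNR measure mentioned in (3), the following Matlab© sketch computes the PSNR between an original 8-bit image and its watermarked copy. The random test image and the toy one-LSB distortion are placeholders introduced only for this example; they are not data or code from this work.

    % PSNR between an original 8-bit image and a distorted copy (MAX = 255).
    orig   = uint8(randi([0 255], 256, 256));    % placeholder original image
    marked = orig;
    marked(1:2:end, 1:2:end) = bitxor(marked(1:2:end, 1:2:end), uint8(1));  % toy 1-LSB change
    d       = double(orig) - double(marked);
    mse     = mean(d(:).^2);
    psnr_db = 10 * log10(255^2 / mse);           % higher PSNR means less visible degradation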

These properties are elementary in every watermarking system and need to be carefully appreciated. The following subsections show how they are considered from different design points of view and indicate several trade-offs between them.

2.1.2.1 Robustness to Attacks
An effective attack on fragile and semi-fragile watermarking will attempt to modify the perceptual content of the image without affecting the watermark data embedded in it. Knowledge of the embedding and extracting methods is assumed. There are two approaches for an attack: while the first approach requires the decryption of the encoded mark in order to produce a suitable watermark on an unauthentic image, the second one aims to maintain the original mark on a modified image without knowing the mark itself. Decrypting the original watermark is a cryptographic computational problem and is directly related to the capacity of the watermarking system. If the capacity of the watermark is large enough (using a key of several hundred bits), this attack may not be computationally tractable. The second approach is to imitate the embedded mark and fool the watermark detector into believing the integrity of the image it is inspecting is intact. Still, in watermarking the potential for such an attack is even greater (compared with the cryptographic case), as the attacker does not have to find the exact key, but only one that would be close enough to pass over the detector's threshold. Known attacks are cover-up [28], counterfeiting [29] and transplantation [28]. In the cover-up attack, the attacker simply replaces parts of the original watermarked image with parts from other

watermarked images or with other parts of the original image. For example, if the image contains homogeneous areas such as a wall or a floor and the attacker wishes to hide a smaller object, he may do so by copying other blocks in such a way that the change would be perceptually un-noticeable, but the detector would still recognize a valid watermark on the copied block. The vector quantization and counterfeiting attacks use multiple images that are marked using the same watermark data in order to synthesize fake marked images. These attacks are only possible when directed at block-wise independent watermarking algorithms [30]. The transplantation attack makes even watermarking algorithms with block dependencies vulnerable; it is shown in [31] that deterministic block dependency is not sufficient against a transplantation attack. The proposed algorithm incorporates non-deterministic block dependency, as well as an image-long watermark data sequence, while still maintaining reasonable image quality. This combination makes the attacks mentioned above ineffective.

Attacks on watermarking for copyright protection are designed to cause defects to the embedded watermark such that it will be undetectable. Such attacks may include one or more of the following: (1) a geometric attack such as cropping, rotation, scaling etc.; (2) a Digital-to-Analog conversion, such as printing, and then Analog-to-Digital conversion by scanning (can also be done by resampling); (3) lossy compression; and (4) duplicating small segments of the picture and deleting others (jitter attack) [25].

It is shown, then, that several parameters must be considered for each application in order to optimize the use of counter-measures. The goal is to maintain the required image quality desired for each application and still be robust to potential attacks. That trade-off is discussed in the next two subsections.

2.1.2.2 Image quality
As mentioned, an important objective of a good invisible watermark is minimizing image quality degradation. In the early stages of the project [3], we have shown that for a blind content-independent algorithm, the trade-off between the security (capacity) of the mark and the negative effect on image quality is straightforward. The watermark is embedded by adding a pseudo-random noise to each pixel. Increasing the bit size of the mark, which is the measure for the capacity of that algorithm, increases the variance of the noise, effectively degrading the image quality. It also adds significant high frequency values to the original image, especially in homogeneous parts of the picture. To avoid such a significant degradation, it is possible to increase the security of the mark by making it content dependent [19],[26]. In a content dependent watermarking system, the embedded data and/or the embedding location are also a function of the cover image. Unlike in content independent watermarking, where the detection algorithm disregards the cover image data, here all the data is relevant. This relates directly to a better false detection ratio. Thus, faking an un-authentic image using watermark data from a different authentic image would not work. This introduces higher

computational complexity, as will be described in the next subsection, but features a more secure mark without influencing the cover data severely. Each additional feature added to the algorithm increases the computational effort and the hardware resources (such as memory and adders/multipliers) used. Other algorithms employ global and local mean values, temporal dependencies (in video watermarking) and a variety of extra features to enhance their performance [15],[16].

2.1.2.3 Computational complexity
Intuitively, it is obvious that in order to apply a more complicated algorithm, more complex embedding and detecting blocks would be required. The motivation to keep the computational complexity low depends on the application and on the method of implementation. The speed and processing power of the computational platform at hand limit the level of algorithm complexity that can be computed in a given time frame. In real time applications, for instance, computations must be done in a very short time period. When implementing in hardware, higher complexity requires additional hardware, which means more area and additional costs. Depending on the intended application, complex schemes can be implemented to withstand expected attacks. If, for instance, tamper localization is important, a partition of the image into blocks may be of use. If the marked image is expected to go through lossy compression, one may consider embedding the watermark in the frequency domain. Therefore, an optimized scheme will be comprised of the

minimum number of features needed to satisfy the needs of the application it is designed for.

2.1.3 Figures of Merit for Watermarking Systems
As with all security and cryptography related applications, unless mathematically proven otherwise, an attacker could always potentially come up with a way to break the system. Two common ways to characterize the quality of a watermarking system are to test its robustness against known attacks and to measure performance using third party benchmarks. The use of such independent, third party evaluation tools provides a good perspective on how well a watermarking system performs with respect to both known attacks and in comparison with other available systems. The StirMark code, which is used for evaluating the robustness of watermarking algorithms designed for copyright protection applications, applies a series of attacks on a marked image [27]. The Checkmark benchmark provides another framework for application-oriented evaluation of watermarking schemes [22]. A designer can use the system under evaluation to embed a watermark in a series of test images, then run them through the benchmark and evaluate the performance by observing the quality of detection. In addition, it is possible to evaluate robustness to specific attacks by manually adding them to this benchmark.

The evaluation of fragile and semi-fragile watermarking algorithms is more implicit, however. An attack on a fragile or semi-fragile system must specifically address the particular algorithm. While an attack on robust algorithms does not have to tune the parameters of the attack to the specific algorithm, the

detector of a fragile or semi-fragile system is designed to be highly sensitive to modifications. As a result, the attacker must be aware of the specific watermarking procedure in order to avoid modifications that may alert the detector. Therefore, for fragile and semi-fragile watermarking, custom designed attacks and theoretical analysis are required and it is not practical to consider a commonly used benchmark.

2.2 Watermark implementations – Software vs. Hardware
Figure 2.3 shows a scheme of a general watermarking system. The system consists of watermark generation, embedding and detection algorithms. The identifying data (W in Figure 2.3) can be meaningful, like a logo, or it can simply be a known stream of bits. As previously mentioned, the watermark can be visible or invisible. First, the identifying data is encoded using a secret key, K. Then the encoded identifying data is embedded into the original image (I in Figure 2.3). The result is the watermarked image. The detector part is at the receiving end. The objective is to extract the identifying data embedded in the received image, using the secret key and an inverse algorithm. Finally, correlating the extracted mark with the original and applying a chosen threshold makes the decision.

The system can be implemented on either software or hardware platforms, or some combination of the two. A pure software watermarking scheme may be implemented in a PC environment. Such an implementation is relatively slow, as it shares computational resources and its performance is limited by the operating system. It is unsuitable for real-time applications, for it would be too slow, and it cannot be implemented on portable imaging devices that have limited processing power. On the other hand, it can be easily

programmed to realize any algorithm of any level of complexity, and can be used on everyday consumer PCs.

Figure 2.3: Scheme of a general watermark system

A good example of a software watermarking solution was presented by Li [19]. In this work he proposes software implemented fragile watermarking, embedded in the coefficients of the block DCT. The algorithm is designed for authentication and content integrity verification of JPEG images. The author directly addresses known issues in similar previous works [20],[28],[30], inserting additional complexity to overcome security gaps. The algorithm embeds the watermark only in a few selected DCT coefficients of every block in order to minimize the effect on the image. The system utilizes the advantages of software implementation by using the resources needed to store image data,

transform coefficients and watermark mappings. Using a combination of different security resources, including a non-deterministic mapping of the location of coefficient modulation and block dependencies, the system succeeds in facing several attacks without changing the effect on image quality, when compared to similar works. However, the computations involved in the embedding process are kept relatively basic, suggesting suitability for future hardware implementation as well.

2.3 State of the art in hardware watermarking
In the last few years we have seen a significant effort in the field of digital watermarking in hardware. This effort is mainly concentrated in implementing invisible robust watermarking algorithms in hardware. In contrast to software solutions, hardware implementations offer an optimized specific design to incorporate a small, fast and potentially cheap watermark embedder. It is most suitable for real time applications, where computation time has to be deterministic (unlike software running on a Windows system, for example) and short. Optimizing the marking system hardware enables it to be added into various portable imaging devices. In a full imaging system that includes both the imager and the watermark embedder, the system security is improved, as it is certain that the data entering the system is untouched by any external party. However, hardware implementations usually limit the algorithm complexity and are harder to upgrade. The algorithm must be carefully designed, minimizing any unexpected deficiencies. While earlier implementations used more simplistic watermarking techniques, such as LSB replacement [33], more recent

publications have also introduced incorporation of more sophisticated embedding procedures [34],[42]. The presented implementations mainly focus on implementing algorithms previously presented in software on a real-time platform. In one of the most recent surveys of hardware digital watermarking [32], a comprehensive review of hardware watermarking related topics is given. Table 2.1 is based on this survey and presents many of the studies published in the field. The table shows the variety of different watermarking applications and possible research directions. It also shows that most of the work has been done on spatial domain watermarking. Still, most of the work is concentrated on the implementation of robust watermarking algorithms for copyright applications and its subsidiaries, such as broadcast monitoring. On the other hand, fragile watermarking has been given much less attention. Moreover, it seems that potential attacks on fragile and semi-fragile algorithms, such as counterfeiting and collusion, are not covered well [33].

The algorithm and system proposed in this thesis offer improved properties in terms of hardware utilization, robustness against known attacks and tolerance to legitimate compression. The reader is referred to the chapters below for a detailed presentation and explanation of these improved properties. This implementation addresses a field of applications which is still not treated in previous work. The proposed watermark algorithm is intended to be integrated with real-time, pipelined JPEG encoders. It is demonstrated how the watermark embedder can be naturally integrated as a part of a JPEG compressor. Thus, the combined system


could serve both still image sensors as well as M-JPEG video recorders. The HW implementation of the watermark embedder consists of simple memory and logic elements. It features minimal image quality degradation and good detection ratios.

Table 2.1: Existing work in hardware digital watermarking research
Research Work             Platform         WM Type         Application  Domain
Garimella et al. [33]     ASIC             Fragile         Image        Spatial
Mohanty et al. [34]       ASIC             Robust-Fragile  Image        Spatial
Tsai and Lu [35]          ASIC             Robust          Image        DCT
Mohanty et al. [36]       ASIC             Robust          Image        DCT
Hsiao et al. [37]         ASIC             Robust          Image        Wavelet
Seo and Kim [38]          FPGA             Robust          Image        Wavelet
Strycker et al. [39]      DSP              Robust          Video        Spatial
Maes et al. [40]          FPGA/ASIC        Robust          Video        Spatial
Tsai and Wu [41]          ASIC             Robust          Video        Spatial
Brunton and Zhao [42]     GPU              Fragile         Video        Spatial
Mathai et al. [43]        ASIC             Robust          Video        Spatial
Vural et al. [44]         Not implemented  Robust          Video        Wavelet
Petitjean et al. [45]     FPGA/DSP         Robust          Video        Fractal


CHAPTER 3: The proposed imaging system

This chapter will provide a general description of the proposed imaging system, followed by a more detailed discussion of the different building blocks. Figure 3.1 presents an overview of the system. Naturally, the system is designed as a pipeline, where each stage adds latency but does not compromise the overall throughput. Every stage has an initial transition phase, after which it issues a new valid output on every clock. The latency of the system is the sum of the transition phase lengths of all the stages in the pipeline.

3.1 Image acquisition and reordering
Image acquisition can be done using any digital imaging device. However, employing a CMOS image sensor provides an opportunity for a higher level of integration. In the most general form, a raster scan digital pixel output is considered. A dual port memory buffer, capable of storing 16 rows of raw image data, is used to reorder the pixels into 8x8 blocks. Reordering is performed by outputting the pixels in a different order than that in which they were input. Data is sent forward to the DCT module 55 clocks before 7 complete rows have been stored in the memory buffer. This is done to ensure that valid data for the DCT module is present in the memory buffer on every clock from this point on. The DCT module processes the data that is stored in rows 1-8 of the buffer memory, block after block, while new data is stored in rows 9-16. Once all the data in rows 1-8 has been processed, new data is again written there, while the data in rows 9-16 is being processed.
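For illustration only, the following Matlab© sketch models the effect of this reordering: a raster-scanned frame is regrouped into the stream of 8x8 blocks that the DCT module consumes. The 64x64 random frame is a stand-in for raw sensor data, and the sketch is a behavioural model written for this description, not the hardware buffer logic itself.

    % Behavioural model of the pixel reordering performed by the dual port buffer.
    img = uint8(randi([0 255], 64, 64));       % stand-in for raster-scanned sensor output
    [H, W] = size(img);
    blocks = zeros(8, 8, (H/8) * (W/8), 'uint8');
    k = 1;
    for br = 1:8:H                             % block row
        for bc = 1:8:W                         % block column
            blocks(:, :, k) = img(br:br+7, bc:bc+7);
            k = k + 1;
        end
    end
    % blocks(:,:,k) holds the k-th 8x8 block, in the order the DCT module receives them.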


The buffer memory size and latency depend on the size of the image sensor. For example, for a 1M pixel array having 1024 rows and 1024 columns with 8 bits per pixel, the size of the memory buffer is 128Kbits and the latency would be 8137 clocks.

Figure 3.1: An imaging system with watermarking capabilities

3.2 Compression module

3.2.1 DCT based compression
Following is a brief overview of DCT based compression; for a detailed discussion, the reader is referred to the literature [46],[48]. It has been shown that transforming a spatial


image into the DCT domain provides a more compact presentation of the information contained in the image [48]. In other words, the image is represented by its spatial frequency components. In fact, the DCT presentation of natural images is considered to be a good approximation of the Karhunen-Loeve Transform (KLT), which is the most compact representation. Furthermore, the Human Visual System (HVS) was found to be less sensitive to changes in the higher frequency components [48]. Therefore, representing the image in the DCT domain enables compression by concentrating the data in fewer coefficients and further identifying those portions of the data that are more visually significant.

Compression is achieved by applying different levels of quantization to different DCT coefficients. Each of the 64 DCT coefficients is assigned a specific value for quantization. Figure 3.2 presents the quantization table suggested in the JPEG standard.

Figure 3.2: Example quantization table given in the JPEG standard [4]

Quantization is done by dividing the value of the coefficient by the quantization value assigned to it. Division by a larger quantization value results in a more coarse representation of the coefficient (more information is lost in the process). In addition, many of the low value


higher frequency coefficients get zeroed out in the process thus increasing the compression ratio. Different applications use different quantization tables. Of course, if a higher level of compression is required, higher quantization values are used. In the scope of this work a very important property of DCT based compression is that once an image has been compressed with a certain quantization level, it can be recompressed with a smaller level of quantization without incurring any further loss of information. Therefore, embedding a watermark in the quantized coefficients ensures that the watermark is robust to DCT compression with a quantization level equal or less to the quantization level used during the watermarking process [19]. To complete the compression and translate the reduced amount of data into a representation using fewer bits, entropy encoding is applied. In JPEG, entropy encoding involves run-length encoding followed by Huffman coding or arithmetic encoding [4]. Figure 3.3 presents a schematic view of a DCT based compression module in hardware.

Figure 3.3: Schematic of a DCT based compression encoder

3.2.2 Implementation of the compression module in the proposed system
A DCT transform module was implemented following a design available from Xilinx Inc. [46], based on the architecture described in Figure 3.4. This implementation takes


advantage of the separable property of the DCT transform, i.e. the 2-D DCT transform can be calculated as a series of two 1-D DCT transforms, where the first transform is applied in one direction and the second is applied in the orthogonal direction. Using vector form, the 8x8 DCT transform Y of an input block X is given by Y = C·X·Ct, where C is the matrix of cosine coefficients and Ct is its transpose. In hardware this is realized by storing the output of the first 1-D DCT in a transpose memory buffer, line after line, and then applying a second 1-D DCT transform on the columns of the result.

Figure 3.4: Schematic of a HW DCT transform module
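The following Matlab© sketch illustrates the separable computation Y = C·X·Ct described above, together with the multiply-by-precomputed-reciprocal quantization discussed in the next paragraph. The random input block and the flat quantization table are placeholders, and the sketch is written for this description only; it is not the Xilinx hardware design or the code of Appendix A.

    % Separable 2-D DCT of one 8x8 block, Y = C*X*C', computed as two 1-D passes,
    % followed by quantization realized as multiplication by precomputed reciprocals.
    N = 8;
    [u, x] = ndgrid(0:N-1, 0:N-1);
    C = sqrt(2/N) * cos((2*x + 1) .* u * pi / (2*N));
    C(1, :) = sqrt(1/N);                       % DC basis row

    X  = double(randi([0 255], N, N)) - 128;   % level-shifted pixel block (test data)
    Y1 = C * X;                                % first 1-D DCT, applied to the columns
    Y  = Y1 * C';                              % second 1-D DCT, applied to the rows

    Q    = 16 * ones(N);                       % placeholder quantization table
    Qinv = 1 ./ Q;                             % reciprocals that would be stored in a ROM
    Yq   = round(Y .* Qinv);                   % quantized DCT coefficients

In hardware, only the Qinv values are stored, so the division step of the standard reduces to a multiplication per coefficient.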

It is much easier to implement multiplication in HW than division, especially in FPGA devices, where designated multipliers are often available. It is therefore possible to predetermine the inverses of all 256 possible quantization values and store them in a ROM. Multiplying the DCT value by the inverse quantization value gives the desired quantized DCT coefficient.

The entropy encoding modules were not necessary for the purposes of this work. The proposed watermark embedder processes the quantized output of the DCT transform module. The entropy encoding stage merely changes the way the data is represented but does not lose any more information in the process. Therefore, encoding the watermarked


data only to decode it back to the exact same form does not provide any additional insight.

3.3 Watermark embedding module

3.3.1 The novel watermark embedding algorithm
Presented here is a novel watermarking algorithm that allows a very simple and efficient implementation of the watermark embedder in hardware. The algorithm modulates N cells in each DCT block. The values of the processed DCT block are considered along with the values of its neighbour to the left in order to choose which cells are to be modulated and in what way. A secret pseudo-random sequence, with the same length as the image, serves as the watermark data. It is used to mask the operation of the algorithm and resist attacks.

As shown in Figure 3.5, the original image is divided into 8x8 blocks indexed I1-IMxN. After DCT transformation, the DCT blocks J1-JMxN are reorganized into blocks of size 1x64, according to the zigzag order shown in Figure 3.6. Let us consider the watermarking procedure for the block I3 of the example image I. Figure 3.7a-b present J3 and J2, which are the DCT transforms of the blocks I3 and I2, respectively. Figure 3.7c is the secret watermark data W3 generated for I3. The binary matrix P3 in Figure 3.7d is the logical AND between J3, J2 and W3, thus enabling dependency of the procedure on the neighbouring block J2, masked by the secret watermark data W3.


Figure 3.5: 8x8 Block DCT conversion and 1x64 Zigzag reordering

Figure 3.6: Reorganization of the DCT data in the Zigzag order

The matrix P3 is used to embed the watermark in J3. Considering N=2 (this will be assumed in all the examples from now on) and following the zigzag order, the first two non-zero cells in the matrix P3 (marked black in Figure 3.7d) indicate the two cells that are going to be modulated in J3; in the example, cells 47 and 43. The remaining cells in P3 determine the LSB of the indicated cells. Still following the zigzag order, the cells are alternately divided into two groups, such that the first cell belongs to the first group, the next cell to the second group and so forth, until all cells have been assigned. In the example, the two groups are marked by different backgrounds. The bits in each group are XORed to

determine the corresponding LSB value for the designated two cells. In the example, the results of XORing the cells of each group both yield a value of 1. It is obvious, then, that embedding the mark would only have a slight effect on the hosting block data; the embedder only needs to change the value of cell 43 from 2 to 3. A block that produces less than two non-zero cells in a matrix P is considered un-markable and is therefore disregarded. Only blocks that are distinctively homogeneous and have very low values for mid and high frequency DCT coefficients are problematic. Simulations show that even for N as low as 2, performance is satisfactory. In principle, as N is increased, so does the robustness of the algorithm, while at the same time image quality is reduced.

The detection procedure is very similar to the embedding procedure. The input is the watermarked image in compressed format. The detector first decodes the data to receive the quantized DCT data. The matrix P3 is created in the exact same manner as for the embedding process and the modulated cells are identified, only that instead of modifying the cells, a comparison is made to verify that they are indeed equal to the expected modulation value. Taking advantage of the software platform, additional processing can aid detection. In particular, it is reasonable to assume that an attacker would attempt to cover continuous surfaces rather than isolated spots, and so morphological closure on the output detection image can improve detection ratios.
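To summarize the embedding rule, the following Matlab© sketch marks a single block for N = 2. It is a simplified software model written for this description, not the code of Appendix A: the forward zigzag scan direction, the grouping of all remaining cells, and the handling of coefficient signs and of magnitude-one coefficients are assumptions of the sketch.

    % Simplified per-block embedding model (N = 2). Jcur and Jprev are the
    % quantized DCT coefficients of the current and previous (left-hand) blocks,
    % already reordered as 1x64 zigzag vectors; W holds the 64 secret watermark
    % bits for this block.
    function Jmarked = embed_block_sketch(Jcur, Jprev, W)
        P   = (Jcur ~= 0) & (Jprev ~= 0) & (W ~= 0);  % binary matrix P: AND of the three inputs
        idx = find(P);
        Jmarked = Jcur;
        if numel(idx) < 2
            return;                                   % un-markable block, left untouched
        end
        targets = idx(1:2);                           % first two non-zero cells of P
        rest    = setdiff(1:64, targets);             % remaining cells, assigned alternately
        lsb     = [mod(sum(P(rest(1:2:end))), 2), ... % XOR of the bits in group 1
                   mod(sum(P(rest(2:2:end))), 2)];    % XOR of the bits in group 2
        for g = 1:2
            c = targets(g);
            m = abs(Jmarked(c));
            Jmarked(c) = (m - mod(m, 2) + lsb(g)) * sign(Jmarked(c));  % force the LSB
        end
    end

For a whole image the routine would be applied block after block, each block's Jcur becoming the next block's Jprev, which is exactly the block dependency exploited by the algorithm.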

Figure 3.7: Example DCT data for blocks J3, J2

3.3.2 Implementation of the embedding module in HW
Let us examine how the watermarking of the example block J3 is done in hardware. Figure 3.8 presents a schematic view of the hardware implementation of the watermark embedder. Each clock, the module receives a 12 bit quantized DCT coefficient, Jb(i), from the Zigzag buffer and a watermark bit, Wb(i), from the watermark generator as inputs.

Here b is the number of the block within the current frame and i is the number of the DCT cell within the block b. The value of Pb(i) = AND(Jb(i), Jb-1(i), Wb(i)) is calculated. Jb(i) is stored in the DCT data buffer to be used in the watermarking of the next block, Jb+1. Pb(i) is forwarded on to the Current "Ind" and "Val" registers. The value of the index i will be recorded by the Ind register for the first two non-zero occurrences of Pb(i). Recall that it is now required to divide the remaining Pb values into two groups. This is done by alternately referring Pb(i) into the cells of the Val register, not before XORing it with the current value of the register, thus using the associative nature of the XOR operation to progressively calculate the value of the XOR between all the bits in the group. When i turns to zero, i.e. all of the block Jb has been read, the Ind register holds the indexes of the cells where the mark is to be embedded, while the Val register holds the values for the LSB of these cells. These values are then copied to the registers marked Previous to make room for new calculations. The modulator reads the data registered in the Previous Ind and Val registers. If the value of the index i is found in the Ind register, then the corresponding bit in the Val register is written to the LSB of Jb-1(i), giving JWMb-1(i).

Figure 3.8: Schematic description of the watermarking module implementation in HW

3.4 Watermark generation

3.4.1 RNG based watermark generation
Pseudorandom sequences are sequences that have similar statistical properties as true random sequences but still allow regeneration. In general, a pseudorandom sequence has low cross-correlation values between different samples, no repeating patterns and, when security is a requirement, the prediction of future samples or otherwise regeneration of the sequence based on observation should not be possible. In hardware, a pseudorandom sequence may be generated using a Finite State Machine (FSM). Any sequence generated by an FSM will eventually be periodic, i.e. S_n = S_(n+t) for all n ≥ n0, where S_n is the n-th bit of the sequence and t is the length of the period. For a pseudo Random Number Generator (RNG), the initial state will determine the whole sequence. The initial state will be

determined according to a secret key. An RNG can be used as a watermark generator. Identical RNGs are to be implemented on both the embedding side and the receiving side. By sharing the knowledge of the secret key (which is much shorter than sharing knowledge of the whole watermark sequence), both sides are able to generate the same watermark data sequence. As watermarking is a security oriented application, it is important to have a secure RNG design and size. A secure RNG is designed in such a way that a potential attacker would have to consider all the possible secret keys in order to regenerate the sequence. The size of the RNG (the number of bits in the shift register) is in proportion to the key range; the maximal proportion between the shift register size and the key range is 2^n − 1, where n is the number of bits in the shift register.

The vast majority of proposed RNGs are based on the use of feedback shift registers (SR), where the input bit is a function of the current shift register state. Different feedback functions can be implemented, such as the Linear Feedback Shift Register (LFSR), with a linear feedback function, and the Feedback with Carry Shift Register (FCSR), which has a non-linear feedback function. The linearity or non-linearity of the RNG will determine the mathematical tools used to analyze the output of the RNG. Sequences from both the LFSR and the FCSR can be easily recovered from their outputs using cryptanalysis [49]. In order to increase the complexity, one of the common methods is to combine different RNG architectures to get one that is more secure [49]. In this case, the output will be much more robust to cryptanalysis. A Filtered-FCSR (F-FCSR), where the non-linear output of the FCSR core is linearly filtered, is a good example of such a combination [7].

3.4.2 Existing RNG structures

3.4.2.1 The LFSR

In an LFSR the input bit is a linear function of its previous state. The shift register is driven by the sum modulo 2 (XOR) of some bits of the overall shift register state. Figure 3.9 shows the generalized Fibonacci implementation of an LFSR. This function is determined by the connection polynomial q(x) = Σ_{i=1}^{k} q_i x^i - 1, q_i ∈ {0,1}. In order to be able to analyze the properties of the output sequence, the mathematical tools of finite binary fields are used. The polynomial S(x) = Σ_{n=0}^{∞} s_n x^n ∈ GF(2)[[x]], with s_n ∈ {0,1}, is the generating function for the output sequence. It can be shown that S(x) = u(x)/q(x) [5], where u(x) is a polynomial defined by the initial state. It is clear that the sequence s_n is periodic, since the LFSR has a maximum of 2^k different states. An m-sequence is a sequence with the maximal period of 2^k - 1. If q(x) is a primitive polynomial then s_n is an m-sequence. As mentioned in [5], m-sequences have very good statistical properties and are distributed uniformly. However, they are not very secure: given 2k bits of the sequence, the Berlekamp-Massey algorithm can be used to regenerate the whole sequence.
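A minimal MATLAB sketch of a Fibonacci LFSR may help make these definitions concrete. The 4-bit register and the primitive connection polynomial x^4 + x^3 + 1 are chosen purely for illustration; with any non-zero initial state the output is an m-sequence with period 2^4 - 1 = 15.

state = [1 0 0 1];                        % non-zero initial state (the secret key)
seq   = zeros(1,15);
for n = 1:15
    seq(n)   = state(end);                % output bit
    feedback = xor(state(4), state(3));   % sum modulo 2 of the tapped bits
    state    = [feedback state(1:end-1)]; % shift, feeding the XOR back in
end
% after 15 steps the register returns to [1 0 0 1], so the sequence repeats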

Figure 3.9: A generalized Fibonacci implementation of an LFSR circuit

3.4.2.2 The FCSR

The Galois mode implementation of an FCSR, shown in Figure 3.10, is an expansion of the LFSR where, instead of a sum modulo 2, a carry from the last summation is added. This introduces non-linearity and enhances the security of the output sequence. The sequence can no longer be analyzed using finite fields; the related structure is the 2-adic integer numbers [5], [47]. A 2-adic integer is formally a power series s = Σ_{n=0}^{∞} s_n 2^n, s_n ∈ {0,1}. The set of 2-adic integers is denoted by Z2. A 2-adic integer can be associated to any binary sequence. Another intuitive association is to digital arithmetic, where Z2 can be associated to an infinitely large 2's complement system. An important observation is that -1 = Σ_{i=0}^{∞} 2^i, which can be understood if we consider the result of 1 + Σ_{i=0}^{∞} 2^i = 0.

Figure 3.10: Galois implemented FCSR

The FCSR basically performs the 2-adic division p/q. The 2-adic integer q is the connection integer of the FCSR. That is, it determines where an addition device will be added between two cells and where the bit would simply be shifted. The parameter p is related to the initial state of the generator; the integer p is a function of the initial state. To receive a strictly (without any transient phase) periodic sequence s using the FCSR shown above, we need to consider two co-prime integers p and q, where q must be odd and negative and p < -q. If q is odd and p and q are co-prime, the period of s is the order of 2 modulo q, i.e. the smallest integer t such that 2^t ≡ 1 mod q (note that 2^(|q|-1) - 1 is divisible by q). The period and statistical properties of the output sequence depend only upon q. In order to achieve the maximal period, q must be a prime number with 2 as a primitive root. In that case, the period T would be equal to |q| - 1.

The FCSR is not a secure RNG: similar to the LFSR, available cryptanalysis methods may be used to recover the secret key by observing a short minimal output sequence.
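The 2-adic division p/q that the FCSR performs can be reproduced directly in MATLAB. The values p = 1 and q = -11 below are illustrative only; since 2 is a primitive root modulo 11, the resulting digit stream has the maximal period |q| - 1 = 10.

p = 1;  q = -11;                 % illustrative co-prime pair with q odd, negative, p < -q
s = zeros(1,20);
for i = 1:20
    s(i) = mod(p, 2);            % next 2-adic digit of p/q
    p    = (p - s(i)*q) / 2;     % standard 2-adic long-division step
end
% s = [1 0 1 1 1 0 1 0 0 0  1 0 1 1 1 0 1 0 0 0]: the 10-bit pattern repeats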

3.4.2.3 The Filtered FCSR (F-FCSR)

As its name suggests, the F-FCSR utilizes an FCSR in addition to a linear filter. The linear filter is a XOR gate with selected bits of the FCSR serving as inputs. It is suggested that if q_i equals one then M_{i-1} is connected to the filter. In other words, if the output sequence is denoted by s = ⊕_{i=0}^{N-1} f_i M_i, then f_i = q_{i+1}, where f_i indicates whether M_i is connected to the filter or not. The addition of the linear filter breaks the 2-adic nature of the FCSR and introduces a new mathematical structure which is neither 2-adic nor linear, as shown in [7]. This undefined structure makes the F-FCSR robust to any known cryptanalysis methods [7].

3.4.3 RNG based watermark generator design method and implementation

A novel design technique involving a Gollmann cascade with F-FCSR cores was proposed and implemented. Utilizing cascades offers a straightforward and simple approach to enhance the performance of many systems. Originally, cascading is used to complicate the structure of the RNG and make it more secure. Here, we proposed to use cascading to achieve a high level of modularity for the designer. By creating an initial pool of relatively small-sized core RNGs it is possible to construct a custom sized generator without any significant design effort. The implemented RNG is comprised of several fundamental F-FCSR building blocks connected in series. Shift register based RNGs are cascaded by making the registers be clock controlled by their predecessors. A Gollmann cascade RNG, introduced in [6], is depicted in Figure 3.11. The important feature of this method of cascading is the use of the XOR function for the coupling of the register clock. Earlier cascades utilizing AND functions resulted in a very low clock rate for the registers further down the cascade. The period of a cascaded generator is the product of the periods of the different cores.

When identical cores are used, T = (T')^l, where T is the period of the cascade, T' is the period of one core and l is the number of the cores in the cascade. Using F-FCSR cores in a Gollmann cascade structure makes the best out of both concepts: each core is inherently secure and the design complexity remains simple.

Figure 3.11: A Gollmann cascade RNG

This structure was found to provide a modular tool for designing secure RNGs that may be suitable for a wide variety of applications. In watermarking applications the modularity of the design method allows for worry-free adjustments of the RNG size according to the specific implementation requirements. The tool was utilized to design a 22-bit RNG composed of two 11-bit cores connected as a Gollmann cascade [50]. In this implementation the periodic binary sequence is used directly as the secret watermark. The implemented RNG outputs a binary sequence with a period of 3,568,321 bits. Considering a 256x256 sensor array as an example, the watermark will repeat itself every 54.45 frames. The detector would need to have knowledge of the starting frame and the secret key in order to extract the correct watermark.
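The repeat interval quoted above can be checked with a line of arithmetic. The per-core period of 1889 bits used below is an assumption inferred from the quoted cascade period (its square root); the frame size follows the 256x256 sensor example, with one watermark bit consumed per pixel (DCT cell).

T_core    = 1889;                    % assumed period of one 11-bit core
T_cascade = T_core^2                 % T = (T')^l with l = 2  -> 3,568,321 bits
frames    = T_cascade / (256*256)    % -> about 54.45 frames between repetitions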

CHAPTER 4: Implementation, testing and results

4.1 Software implementation and algorithm functionality verification

First, the algorithm performance was evaluated using a Matlab© simulation. A sample image is embedded with a watermark according to the proposed algorithm. The cover-up attack is then applied to the watermarked sample image. The attacked image is analyzed by the watermark detector, which outputs a detection map. The detection map is used to indicate which blocks of the image are suspected as inauthentic.

The results of a simulation of the proposed algorithm are presented in Figure 4.1. Figure 4.1(a) shows the original 128x128 example image. Figure 4.1(b) is the image after it was compressed and embedded with the watermark. With only 35% of the DCT data being non-zero, the PSNR is still very high at 43.4 dB and the difference between the images is practically un-noticeable. Two areas of the image have been modified after the cover-up attack has been applied to the watermarked sample image. The tampered image is shown in Figure 4.1(c). On the upper right corner, the airplane in the original image is removed by copying the contents of adjacent blocks onto the blocks where the airplane is supposed to appear. This is facilitated by the homogeneous nature of the surrounding neighbourhood. To an innocent observer the original existence of the airplane in the image is visually un-detectable. A more easily noticeable example of such modifications is shown on the lower left corner of the image, where the reflection of the sun is partially removed. Both modifications would be easily noticed using the detection map created by the WM detector, presented in Figure 4.1(d).

This example shows that the watermark is as effective on homogeneous surfaces (where only a small portion of the DCT data is non-zero) as it is on high spatial frequency surfaces.

Figure 4.1: Algorithm Matlab© simulation results
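The simulation flow above can be reproduced with the Appendix A functions (embed_hardware, dct2im, copyblock and detect_hardware), assuming the appendix test image is available; the attack coordinates below are illustrative and not necessarily the exact blocks used to produce Figure 4.1.

I        = double(imread('testimage.bmp'));
[J, H]   = embed_hardware(I, 4, 2);                % embed with q = 4 and N = 2
Iwm      = dct2im(J, 4);                           % compressed and watermarked image
nonzero  = mean(sum(J ~= 0))/64                    % average ratio of non-zero DCT cells
err      = I/255 - double(uint8(Iwm))/255;
psnr_dB  = 20*log10(1/sqrt(mean(err(:).^2)))       % PSNR of the watermarked image
Iatt     = copyblock(Iwm, [27 13], [27 1], [6 3]); % cover-up attack on one area
[C, LOC] = detect_hardware(Iatt, H, 4, 2);         % run the watermark detector
map      = reshape(LOC, 32, 32);                   % 32x32 block detection map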

4.2 Algorithm performance evaluation

The results shown above are achieved while using N=2. That means only two cells in each block of DCT data were modulated. Several experiments have been conducted to examine the optimal number of cells to modulate. The results, shown in Table 4.1, summarize the effect of changing N along with the level of quantization. The level of quantization is indicated by the average ratio of non-zero cells in a block after quantization. N=4 exhibits slightly better detection ratios, becoming more significant as the quantization level increases. The cost of increasing N is additional hardware and a reduction in image quality. In terms of hardware costs the difference is reasonable; it sums up to extra registers and larger multiplexers. As to image quality, in terms of PSNR, Table 4.1 shows that the difference is less than 0.5 dB, which is mostly negligible. Therefore, a larger N should be considered when an aggressive level of quantization is desired.

4.2.1 Fragile watermarking and benchmarking

It is important to emphasize that benchmarking for a fragile watermarking algorithm is a tricky issue. As mentioned before, in robust algorithms it is possible to utilize a third party benchmark with relative ease. There, the objective of the benchmark is to apply known attacks on a marked image such that the mark would be undetectable. The user merely needs to embed a mark, then run the marked image through the benchmark and then try to detect the mark.

In fragile algorithms, however, this flow is not practical. The objective of a potential attacker is to make modifications to the marked image without damaging the embedded mark. Therefore, running a marked image through any of the available benchmarks would simply damage the watermark such that the detector thinks (rightfully) that the image has been tampered with. An attack on a fragile watermark must consider the specific watermarking algorithm, as it needs to try and imitate what the authentic embedder is doing in order to be able to fool the detector.

Figure 4.2: Additional sample images (Monkey and Lena): Original Image, WM Image, Tampered Image, Detection Zones

The evaluation strategy taken in this thesis includes a combination of experimental results from the tamper detection on sample images, such as the ones presented in Figures 4.1 and 4.2.

8 43.7 90. The schematic of the test setup on the evaluation board is given in Figure 4.77 Detection Ratio [%] 82.0 28.3.7 76.3 Hardware design and verification The watermark embedder block was described using Verilog HDL.85 45. Quantization-Level Tradeoffs Nonzero Cells [%] 43. First.42 Figure 4. as well as the necessary peripheral blocks.2. an image is copied onto the onboard frame memory. The DCT and inverse DCT transform blocks were borrowed from [46].6 94.3 4. quantized and arranged in the zigzag order according to the procedure described above.0 N 2 4 2 4 2 4 PSNR [dB] 45.0 89. The data output from the memory is treated as if an image sensor generated it.24 39.22 38. A preset watermark sequence is 42 . In addition. The onboard Cyclone FPGA is used to implement the proposed watermark embedding architecture.59 41.0 21. The memory is used to emulate a digital image sensor.7 90. Table 4.64 41. A detailed proof of the sufficiency of these measures is given in [19].1: N vs.8 28. The data is then DCT transformed.0 21. An evaluation board was employed to assess the performance of the algorithm when synthesized to an Altera Cyclone FPGA device. the algorithm is inherently resilient to known attacks against fragile watermarking thanks to inter block dependency and a non-deterministic choice of the watermarked cells within each block.

used by the watermark embedder module to embed the watermark in the DCT coefficients of the test image. Finally, the DCT data is de-quantized and rearranged before it is inversely transformed back to the spatial domain. The output is the original image that now contains the watermark.

Figure 4.3: Test setup schematic

4.3.1 Hardware Experimental Results

The Verilog description has been verified through HDL simulation and experimented on the evaluation board. The DCT data of a sample image was pre-calculated in software and read to a virtual buffer. The Verilog module reads the DCT data from the virtual buffer the same way it would do with the output of the zigzag buffer. The DCT data was embedded with the watermark. Finally, the marked DCT data was stored in a file and analyzed to verify it was marked correctly. Figure 4.4(a) shows the input image to the watermark hardware test setup and (b) presents the watermarked image at the output. The design was synthesized to an Altera Cyclone EP1C20 FPGA device using the Altera Quartus II design software. In addition, the design was also mapped to an Altera FLEX10KE FPGA such that it is possible to compare the results with the results of Agostini et al. [51] for a complete JPEG compressor system.
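A possible way to produce the pre-calculated DCT stimulus in MATLAB is sketched below, using the Appendix A function im2dct. The file name and the one-value-per-line text format are assumptions for illustration, not taken from the actual testbench.

I   = double(imread('testimage.bmp'));
J   = im2dct(I, 1);                    % 8x8 DCT, zigzag ordering and quantization (Appendix A)
fid = fopen('dct_virtual_buffer.txt', 'w');
fprintf(fid, '%d\n', round(J(:)));     % one quantized coefficient per line
fclose(fid);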

Table 4.2 summarizes the performance of the three mappings in terms of hardware usage, throughput and latency. The results clearly show that the watermark embedder can easily be added to an existing JPEG compressor, even when that compressor is oriented at low-cost, high-throughput applications. The addition does not affect the desired throughput and requires a negligible addition of hardware resources and power compared to the complete system to which it is added.

Figure 4.4: Hardware watermarked image

Table 4.2: FPGA Synthesis Results

Design                                        Logic Cells   Memory Bits   Frequency/Throughput   Latency
WM Embedder (FLEX)                            132           744           187.93 MHz             64
WM Embedder (Cyclone)                         113           744           209.16 MHz             64
Agostini et al. [16] JPEG compressor (FLEX)   4844          7436          39.84 MHz              238

The hardware embedder would add 132 more logic cells, which is a negligible addition of 2.73% to the hardware of the JPEG compressor. The combined system would easily fit in the original FPGA device. In general, any device that is large enough for the implementation of the JPEG compressor would be enough to accommodate the additional hardware required for the watermark embedder block.

4.4 Physical proof of concept implementation

As part of a commercialization effort and to provide further validation, the proposed imaging system was physically implemented. The evaluation board shown in Figure 4.6 was utilized as the implementation platform. Figure 4.5 presents the necessary elements required for the physical implementation. A CMOS imager with an internal ADC and analog biasing circuitry is employed as the test imager. The onboard FPGA device provides memory and a platform for control signal generation and digital image processing. The evaluation board has an LVDS I/O port for fast communication with a digital frame grabber.

Figure 4.5: A general implementation of an imaging system

Figure 4.6: Mixed signal SoC fast prototyping custom development board

4.4.1 The CMOS Image sensor

Following is a brief description of the properties and mode of operation of the test CMOS image sensor. A 256x256 pixel rolling shutter CMOS image sensor with an internal 12-bit pipelined ADC was borrowed from [52]. There are two important attributes to consider.

The rolling shutter operation introduces a non-continuous readout sequence with a setup phase for performing row reset and analog pixel data readout operations. The 12-bit pipelined ADC introduces a latency of 6 clock cycles. These two attributes, while requiring special attention, offer insight on the applicability of the proposed system to a common imaging system setup.

4.4.2 Digital signal processing and control

All the digital signal processing and control was performed on the onboard FPGA device. The FPGA device handles I/O communication with multiple devices on the board. A schematic diagram describing the internal design structure of the FPGA is given in Figure 4.7.

Figure 4.7: Internal structure of the FPGA digital design

The CPU interface is responsible for communications with the onboard microcontroller, which in turn handles communications with a PC. The microcontroller handles the CPU interface in a similar manner to an external memory. This allows the user to change internal FPGA register values online.

The imager interface is in charge of generating control signals to the CMOS imager and of receiving and synchronizing its output pixel data. The control signal generation is based on a fundamental line sequence that is repeated periodically. A line setup phase takes place before the readout of every line. The analog data of every pixel in the line is then sampled by the ADC one pixel at a time. A sample clock and a decoder that controls an analog multiplexer handle the sampling. The imager interface accounts for the sample clock, the decoder input and the latency of the ADC, and generates synchronization signals including pixel clock, line enable and frame enable. The image data can be transmitted to a digital frame grabber without further processing.

In this implementation the encoder/embedder receives as input the pixel data and synchronization signals generated by the imager interface. According to the imager interface operation, the data is written one line at a time with a pause between every two lines. The timing and length of the pause are determined according to the synchronization signals. JPEG requires that the pixel data be input in 8x8 blocks. An input memory buffer is used to reorder the pixels. Its size is 256x16 words; it has a capacity of sixteen lines, allowing alternating read/write operation where eight lines are being written to the memory while the other eight lines are being processed.
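A behavioural MATLAB model of the reordering job done by the input memory buffer is sketched below. The pixel values are stand-ins and the 256-pixel line width follows the sensor described above: eight buffered lines are re-read as thirty-two 8x8 blocks for the JPEG/watermark encoder.

lines8 = reshape(0:8*256-1, 256, 8)';               % stand-in for eight buffered lines (8 x 256)
blocks = zeros(8, 8, 32);
for b = 1:32
    blocks(:, :, b) = lines8(:, (b-1)*8 + (1:8));   % b-th 8x8 block handed to the encoder
end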

Table 4. followed by a pause to wait for the next eight lines to be completely written into the input memory buffer. the data from the input memory buffer is read eight lines at a time. To comply with the pipelined nature of the encoder.9% of the LCs used by the encoding module. input enable has been added to all the registers in the design such that it is possible to freeze the module state at any point without loss of information. The 49 . Data is read out in chunks of eight lines at a time with suitable synchronization lines generated for the transmission of the watermarked data to the frame grabber. an output memory buffer is used to reverse the 8x8 blocks order back to standard rolling shutter. When valid watermarked data is output from the inverse DCT module.49 While the general architecture of the encoder is meant to function as a pipeline. Instead. The encoding module and its interface make the main demand for hardware resources. a data output enable for the entire encoder is turned on. The inclusion of the watermark-embedding module including the RNG adds only 113 logics cells (LCs). Every element in the encoder has a data output enable signal to facilitate synchronization between different elements in the design. it is impossible to achieve a completely continuous operation. Finally. which is less than 0. after the watermark has been embedded and the watermarked DCT data has been inverse transformed back into spatial pixel data.3 summarizes the resource utilization in the final implementation of the design. The data is then written to the output memory buffer that operates in a similar manner to that of the input memory buffer except it is in the reverse direction. it is obvious that due to the imager mode of operation.

An example image. Table 4.8: Sample output image from the physically implemented system 4.3: Resource utilization by modules in the overall design Module Overall design Imager interface CPU interface Encoder + buffers WM embedder RNG Logic Cells 13471 198 282 12573 113 30 Registers 8307 103 154 7905 78 28 Memory Bits 66280 0 0 66280 744 0 (a) output image w/ watermark (b) output image w/o watermark Figure 4.50 watermark-embedding module also utilizes 744 memory bits to store the DCT values of one block.4. about 1. taken by the implemented 50 .3 Output image capture A National Instruments (NI) PCI-1428 digital frame grabber card is used to capture the image output from the evaluation board. NI LabVIEW© software is used for the analysis and presentation of the captured data.1% of the memory used in the overall encoding module.

An example image taken by the implemented system is given in Figure 4.8. The system offers the option to produce a watermarked image and a reference image that only goes through DCT and IDCT, for the purpose of comparison.

CHAPTER 5: Conclusion

5.1 Thesis summary

This thesis has been conducted as part of an ongoing hardware watermarking I2I project. At the starting point of the thesis work, a prototype had already been designed. This prototype implemented an elementary watermarking algorithm in the spatial domain [3]. However, in order to provide a more commercially appealing implementation, it has been determined that a more sophisticated design must be realized. The objective was to come up with a concept that would allow the addition of a secure watermark in hardware without resulting in performance degradation and/or increasing costs significantly.

Watermark embedding in hardware introduces an opportunity to enhance the security of a real-time imaging system, but it is also a design challenge. The watermark algorithm must incorporate security features but have low complexity. An extensive literature survey has been conducted to explore existing watermarking methods and applications. It was found that while much work has been done in the field of watermarking in software, watermarking in hardware was still an emerging field of research. Watermarking in the DCT domain was identified as having the potential to accommodate these requirements. In most existing watermarking algorithms in the DCT domain, the DCT transform and quantization are the most hardware intensive elements. Therefore, the addition of a watermark embedding module based on a low complexity algorithm would require very little overhead.

A novel watermarking algorithm for the DCT domain was designed, tested and implemented. As expected, the implementation of the algorithm resulted in a hardware increase of a mere 1%. As DCT compression is widely used in most common imaging systems, it is possible to apply the proposed design with little to no additional cost. As illustrated in the proof of concept physical implementation, the design is expected to fit either discrete systems with separate chips for the imager and digital processing, or systems integrating the complete design in an ASIC.

The accomplishments achieved during the course of this thesis have contributed to the publication of four papers, with the most recent results summarized in a fifth paper awaiting review. The findings of the extensive literature survey were presented in a paper that appeared in the 2008 IJ ITK [53]. The novel watermarking algorithm was presented at the IEEE ICECS 2008 conference along with our proposed RNG design technique [54], [55]. Hardware synthesis and experimental results were described in a paper on the application of the proposed system to publicly spread surveillance cameras. The paper was submitted to the IEEE Transactions on Information Forensics and Security.

5.2 Issues that still need attention and future work

To allow a more in-depth view of the proposed system it is important to achieve a more reliable implementation of the DCT and IDCT hardware modules. Currently, the implemented modules have only limited accuracy. The result of this limited accuracy is a significant distortion of the image due only to the operation of the DCT and IDCT modules. An effort has been made to achieve better performance, however to this point with no satisfactory results.

Because hardware implementation of the compression modules has only been approached as a subsidiary assignment in this thesis (these are readily available for purchase and hold no novelty), it was decided to make use of the imperfect modules in order to get basic initial results. Presentable results were obtained by applying input images with reduced quality. Hence, it is still possible to demonstrate the functionality of the physically implemented system.

After the DCT and IDCT modules have been improved, further testing can be performed. In particular, it is recommended that the detection performance be evaluated under real hardware conditions, as well as to examine multiple quantization tables and levels. Further analysis should consider the effects of data transmission over the communication channel, wired or wireless. As a semi-fragile watermark technique that addresses mainly tamper detection, robustness against potential attacks is mainly based on mathematical analysis of the nature of the algorithm. While this analysis is a significant indication of the algorithm's robustness, it may be useful to custom design a test bench with known attacks on semi-fragile watermarking for further validation.

5.3 Possible future directions for development

One of the important features of the proposed technology is its compatibility with a wide range of implementation platforms. Many real-time applications use digital signal processing (DSP) dedicated processors for the implementation of digital processing algorithms. These processors are microcontrollers with powerful arithmetic logic units

However. MJPEG. Future work will include the implementation of an imaging system employing DCT based video compression. making the main challenge be the integration of the algorithm on the DSP platform along with the other existing components of the compression module.26x. At the prototype stage work was concentrated on still imaging compression. 55 . A video standard based system will also provide a chance to introduce temporal security features in the watermark generation unit.55 (ALU) specific for speeding signal processing related computations. Open source modules are available for both still and video DCT based compression standards. MPEG-x or h. It is expected that the proposed watermarking system be accommodated in a DSP based platform while allowing it to maintain performance.g. The proposed algorithm has been designed to be compatible with DCT based compression standards. it has also been determined that the algorithm is compatible with DCT based video compression standards. e.

"Digital Watermarking: A Tutorial Review". van Schyndel. Kobe. J. Lecture Notes in Computer Science Berlin. P. pp 22-33. J.. 2000. 219–222. 1994. 0216-0220 F. 1999 pp. 3rd IEEE International Conference on Industrial Informatics (INDIN '05).. 460-466. vol. Computers. pp. Memon. Osborne. F. vol." in Proc. 1998 K. pp.2000. 3. Gollmann. Barni. Nov. Z. Tanaka. G." In Advances in Cryptology . Image Processing. 2002. pp.716 Yadid-Pecht and R. Conf. Wang. "Attacks. Y. May 2005. . Nov. May. J." in Proc. pp. Switzerland. 2001. "Watermark embedding and extracting method and embedding hardware structure used in image compression system". Braudaway. Petitcolas. Han. 93-98. Nakamura and K. "Design and properties of a new pseudorandom generator based on a filtered FCSR automaton. Conf. Springer-Verlag. pp. Cot. Matsui. pp. Kuhn. Delp. R. vol. Image Process. 1996. International Standard 10918-1. pp. Ed. "CMOS image sensor with watermarking capabilities". no. Tirkel and C. Proc. Yadid-Pecht. Conf. "A watermark for digital images. 151. N. 2. 4.11. "Pseudo-random properties of cascade connections of clock-controlled shift registers." IEEE Trans. Imaging.csee. F.usf.pdf F. E. Conf. v4675 F. vol. F. Li. New York City. S. T. pp." in Advances in Cryptology: Proceedings of Eurocrypt ‘84. Chen. P.0-76952296-3/05. 1997.edu/~smohanty/research/Reports/WMSurvey1999Mohanty.August 2. "A DCT Domain Visible Watermarking Technique for Images". Lu. P. IEEE Int. Image Signal Processing. M. D. “Wavelet Transform Based Digital Watermarking for Image Authentication. 2002. Z. Berger and A. Potdar." in Proc.. 2006. Lecture Notes in Computer Science. Austin.” in Proc. Chang. pp. Etienne-Cummings.A Survey" Proc. et al. 2004." IEEE Trans. 694-697. P. Kuhn. Japan. 432–411.INDOCRYPT 2002. Kluwer Academic Publishers G." in IEE Proc. S. Images Processing. T. no.” in IEEE Trans. Aug. Anderson. Security and Watermarking of Multimedia Contents IV. Conf. 1992.56 References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] V. vol. G. "A survey of digital image watermarking techniques". Workshop. P. Li. Nelson. Proc. J. Hilton New York & Towers. III. Lausanne. R. 1985. IEEE Int. 1062-1078 R. “Effective and ineffective digital watermarks. Mintzer. pp. IEEE Int. Arnault. Arnault and T. Berlin: Springer-Verlag. Lu. R. Canada Sep. pp. Y. “Attacks applications and evaluation of known watermarking algorithms with Checkmark”.. D. vol. no. C. Berger. Acoustics. Holliman and N. 1525. M. Piva. Vancouver. B. Eds. 54. Image Process. Jullien and O. D. 2000. and M. vol. 86–90. 783-791. Speech and Sig. SPIE Electron. Mohanty. Meerwald S. Necer. "A digital watermark. IEEE 87(7) Jul. 5326-5329 ISO/IEC JTC1/SC2/WG10 Digital Compression and Coding of Continuous-Tone Still Images Draft. Tsai and H. 9–12. G.. H. 10. NY. 2005. 9. A. on Image Processing. in Proc. IEEE Int. "A new class of stream ciphers combining LFSR and FCSR architectures. Yeung. "Information Hiding . Symp. 2005. Pereira. “Attacks on copyright marking systems. T. vol. Germany: SpringerVerlag. on Circuits and Systems (ISCAS '05). 2002. G. vol. Electronic Imaging. Jan. M. US Patent 6 993 151. 3313-3316. Anderson and M. Image Proc. 209. 56 . Pereira. of SPIE. URL: http://www. 5. Proc. pp. "Improved Wavelet-Based Watermarking Through Pixel-Wise Masking. 218-238.” in Proc. T.” IEEE Proceedings of the Fourth Annual ACIS International Conference on Computer and Information Science. C. pp. Meerwald and S. H. C. IEEE Int. 
"Embedding Secret Information into a Dithered Multi-level Image" IEEE Military Communications Conference 1990 pp. TX. Vis. A. ser.Aucsmith. Bartolini and A. 5. of the IEEE International Conference on Multimedia and Expo.vol. vol. C. “Why is image quality assessment so difficult?. number 2551 in Lecture Notes in Computer Science. Beth. USA. no. Y. " CMOS imagers: from phototransduction to image processing". C. “Image authenticity and integrity verification via content-based watermarks and a public key cryptosystem”.” Information Hiding: 2nd Int.. Wolfgang and E. 6. 10. July 30. F. A. T. Tsai. “Counterfeiting attacks on oblivious block-wise independent invisible watermarking schemes. S. 1374-1383. Bovik and L. R. Ingemarsson. Lou and T. and M. IEEE Int. P. applications and evaluation of known watermarking algorithms with checkmark.2005. "Digital fragile watermarking scheme for authentication of JPEG images. M. Petitcolas. 3. and I. G. Mohanty. 709. A.

V. V. IEEE Int. Hawkes. and Sys. L. “ASIC for Digital Color Image Watermarking. P.. Proc. pp. 1998. 303–304." in Proc. Image Proc. D. G. Circ. Ed. Murugesh. H. Yadid-Pecht. vol. 2006 Handbook of Image and Video Processing By Alan Conrad Bovik Published by Academic Press.” in Proc. 53 (5) (2006) 394– 398.xilinx. Kundur. Eng. P.L.” in Proc. Ranganathan. Ranganathan and R. pp. A. pp. Symp. W. Video Compression Using DCT. S. 2005. "An Implementation of Configurable Digital Watermarking Systems in MPEG Video Encoder. IEEE Int.. Proc." in Proc. 11th IEEE Dig. 772–775. Kougianos. D. H. Memon.. pp. 2000. Balakrishnan. P. H. vol. Haitsma.Aucsmith. San Jose." in Proc. Y. no. ASIACCS ’06. Talstra. Tai. Multimedia and Expo (Vol. Fish. F. CA: IS&T and SPIE. Mahapatra. Maes. Eng.M. pp. Chang. Computer and Communications Security. Eng. “Periodicity. M. Conf. “VLSI Design of an Efficient Embedded Zerotree Wavelet Coder with Function of Digital Watermarking. Ranganathan. 775–779. Germany: Springer-Verlag. 46 (3) (2000) 628–636. Tomii. IEE Int. 2005. Mohanty. Eng. A. (Vol.” IEEE Transactions on Circuits and Systems II (TCAS-II). 3971. Taipei." in Proc. Sept 2008. pp. M. J. Sys. P. "Security Issue and Collusion Attacks in Video Watermarking. "VLSI Implementation of a Real-Time Video Watermark Embedder and Detector. “Attacks on copyright marking systems. Y." in Proc. T. ser." in Proc. 1999.). A. P. 292–295. Nicolai. S. IEEE Int.57 [25] F. Kalker. J.” IEEE Sig. P. Consumer Electron. S. S. 2). T. Li." in IEE Proc. Hsiao. 57-62. G." New York: Wiley. Soc." in Proc. Depovere. G. F. 216–217. P. 35. S. SPIE Int. Opt. P. . Opt. A. J. "Real-time Video Watermarking on Programmable Graphics Hardware. A.” IET Comp. and Elect. IEEE Int. Zhao.” IEEE Trans. IEEE Int.htm S. N. C. 1st Workshop on Embedded Sys. France.” Elsevier IJ Comp. G. pp. Ramanan. Lecture Notes in Computer Science. J. [online]. Dugelay. G. X. Canadian Conf. complementarity and complexity of 2-adic FCSR combiner generators” In Proceedings of the ACM Symposium on Information. Memon. A.D. Lu. C. A. C. pp. Workshop. 2nd Ed.. Depovere. P. 2003. Wong and N. Berlin. Techniques (CDT) 1 (5) (2007) 600–611. 1312–1315. pp.Y. Proc. 2. Rijmen.W. Tsai. 1). Electron. Niranjan. 2000. Vision Image Signal Process. Conf. Petitcolas. “VLSI architecture and chip for combined invisible robust and fragile watermarking. J. 2). Garimella. Apr. Kim. P. M. 2004.” in Information [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] Hiding: 2nd Int. Y. 218-238. Conf. V." in Proc. Sheikholeslami. Goljan and N. on Electronics. Soc. “Digital Watamarking for DVD Video Copyright Protection. Symp. Inc. Seo. pp. IEEE Workshop Signal Processing Systems. M. and Comp. S. Doerr. Mag. Strycker. G. (2008. K. pp. "Applied Cryptography. Haitsma. “A Simplified Approach for Designing Secure Random Number Generators in HW”. "A fair benchmark for image watermarking systems. “A Dual Voltage-Frequency VLSI Chip for Image Watermarking in DCT Domain. R. "Toward secure publickey blockwise fragile authentication watermarking. Linnartz. "Further attack on Yeung-Mintzer watermarking scheme. E. K. Circuits and Systems." Ph. L. 11th Int. 428-437. M. Mohanty. 2009. N. 1999. 1996. Satyanarayana. Barreto. Y. SPIE Int. Termont. 2. H. 2001. D. 2002. Wu. Jullien and O. and Its Applications (Vol. 2003. J. 1525. pp. Sig. C. dissertation. vol. 2003. Rey. H.com/support/documentation/topicaudiovideoimageprocess_compression. Consumer Electron. on Elec. 
no. Tsai. vol. P. Xilinx. J. Vural. Mohanty. M. Y. H. G. 17 (5) (2000) 47–57. Namballa.. "Secret and public key authentication watermarking schemes that resist vector quantization attack. Kougianos. J. D. Universite de Nice Sophia-Antipolis. G. “A Systems Level Design for Embedded Watermark Technique using DSC Systems. Kutter and F.. pp. G. Shoshan. Conf. J. E. “Real-Time Blind Watermarking Algorithm and its Hardware Implementation for Motion JPEG2000 Image Codec. 57 . Available: http://www. Kalker. R. “Hardware assisted watermarking for multimedia. K. Workshop on Intelligent Sig. vol. 148. "VLSI implementation of invisible digital watermarking algorithms towards the developement of a secure JPEG encoder. 2005 B. N. "Towards Real-time Video Watermarking for SystemsOn-Chip. & Dig.” in Proc. Malta. pp. Anderson and M. J. T. 339-358. "Video Watermarking For Digital Cinema Contents. 2003. Petitjean. 597–600. U. and Comm. Petitcolas. S. S. G. Schneier. 13th European Sig. Workshop. 3657. A. Mohanty. 183-188." in Proc. vol. Proc. Mathai. Vandewege. Fridrich. Conf. Kim and V. 3971. for Real-Time Multimedia. Gabriele. H. N. C. "An Implementation of a Real-time Digital Watermarking Process for Broadcast Monitoring on a Trimedia VLIW Processor. 88–93. Taiwan. 417-427." in Proc.. J. Kuhn. 2002. Imaging.S. Brunton. in proc. Maes. N. Yamauchi. Anand. 2005.

Available: http://dx. Shoshan. Shoshan. February).org/10. S.” in Proc. Conf.2. pp.3. G. pp.micpro. Shoshan. 1. Elec.doi. Yadid-Pecht. 2008. Yadid-Pecht. Fleshel and O. 31(8). A: Physical. Conf.. A. IEEE Int. Microprocessors and Microsystems [online]. V. 134. 368-371.1016/j. Agostini.3 MS/s Pipelines ADC. I. Malta. “A simplified approach for designing secure Random Number Generators in HW. Vol. Fish. pp. vol. A. 487-497.2006. IEEE Int. Y. Yadid-Pecht.58 [51] L. Y. Hamami. G. 372-375. A. Li. 119-125. 58 .” in Sensors and Actuators. "VLSI Watermark Implemetations and Applications. Fish.002 S.” in Proc. Circ. Elec. Y. and O. A. Jullien. Multiplierless and fully pipelined JPEG compression [52] [53] [54] [55] soft IP targeting FPGAs. Fish. and Syst. pp. Li... no. L. X. Jullien and O. A.02.V 12 bit 6. Silva and S.. X. “CMOS Image Sensor Employing 3. Yadid-Pecht. G. 2007. 2008. Jullien and O." in IJ Information and Knowledge Technologies. 2008. Malta. A. Circ and Syst. Bampi (2006. “Hardware Implementation of a DCT Watermark for CMOS Image Sensors.

I2 = dct2im(J.nc). [C LOC] = detect_hardware(I3. % quantization factor % Step 2 load WM_data.'NumberTitle'. q = 4.1).1. Simulation Testbench and peripherals A. % % LOC = imclose(LOC.'off') %watermarked image imshow(uint8(I2)).[6 2]). I3 = copyblock(I3.[6 3]).['Compressed Image'].[27 13]. Simulation Envelope NEW_HW_Sim.'off') imshow(uint8(I3)).H.se). I3 = copyblock(I3.[9 26].['Detection Zones']. I3 = copyblock(I3.A. J2 = im2dct(I.q.32). figure('Name'.q). %% simulate a cover-up attack I3 = copyblock(I2.'NumberTitle'. I3 = copyblock(I3.[6 1]).'NumberTitle'.'off') imshow(LOC).[6 3]). %optional closing stage to enhance detection figure('Name'.[27 4].[27 7].'off') %original after % decompression imshow(uint8(I21)).1.[27 1].[27 10]. %number of coefficients marked I = double(imread('testimage.[2 4]). se = strel('diamond'.['WM Image'].[27 18]. LOC = reshape(LOC.[27 13]. figure('Name'.32. figure('Name'. I3 = copyblock(I3.[6 3]).[7 26].q).ma MATLAB CODE % This code simulates the proposed watermarking system nc = 2.A.1.['Tampered Image'].'NumberTitle'.[27 12].q.[27 16].nc). 59 .59 APPENDIX A: A.bmp')). imshow(uint8(I)).[27 13]. I21 = dct2im(J.q). %load saved wm data sequence [J H] = embed_hardware(I.

Bt.60 PSNR(I/255. Iout(mht:mlt. PSNR = 20 * log10(b/rms) where b is the largest possible value of the signal (typically 255 or 1). nls = (Bs(2)-1)*8 + 1. 60 .nls:nrs). PSNR. mlt = (Bt(1)+Bsize(1)-1)*8.Bs.m function PSNR(A.B) % % % % % % if A == B error('Images are identical: PSNR has infinite value') end max2_A = max(max(A)). nlt = (Bt(2)-1)*8 + 1.double(uint8(I2))/255). CompRatio = mean(sum(J ~= 0))/64 end copyblock. The PSNR is given in decibel units (dB).nlt:nrt) = Iin(mhs:mls. Iout = Iin.Bsize) % calculate coordinates mhs = (Bs(1)-1)*8 + 1. and rms is the root mean square difference between two images. which measure the ratio of the peak signal and the difference between two images. mls = (Bs(1)+Bsize(1)-1)*8.m % this function simulates the cover up attack function Iout = copyblock(Iin. nrs = (Bs(2)+Bsize(2)-1)*8. nrt = (Bt(2)+Bsize(2)-1)*8. mht = (Bt(1)-1)*8 + 1.

decibels = 20*log10(1/(sqrt(mean(mean(err.2f dB'.61 max2_B = max(max(B)). % quantization table Q=[1 1 1 1 1 1 2 1 1 1 1 1 2 3 1 1 1 1 3 3 4 1 1 1 1 4 3 4 1 1 2 2 4 4 5 2 3 3 4 6 5 1 3 3 3 4 6 1 1 3 3 3 3 1 1 1 61 . Compression/Decompression im2dct.2. if max2_A > 1 | max2_B > 1 | min2_A < 0 | min2_B < 0 error('input matrices must have values in the interval [0. min2_A = min(min(A)). end % Compute DCTs of 8x8 blocks and quantize the coefficients. disp(sprintf('PSNR = +%5.decibels)) A.2.nargin)).^2))))). quality=1. if nargin<2.m function y=im2dct(x .1]') end err = A .quality) % im2dct receives an image x and a quality factor quality as inputs % the output is the 8x8 DCT transformed image % the coefficients are reorganized in the zigzag order and quantized % with a pre-set quantization table error(nargchk(1.B. min2_B = min(min(B)).

fun = @dct2./P1)'. %perform 8x8 block dct J = blkproc(J.quality) % dct2im receives the transformed and quantized image J and % performs inverse quantization and transform to output the % reconstructed image error(nargchk(1. quality=1.m function y=dct2im(J .62 5 5 5 5 1 1 1 1] * quality. [8 8].[8 8]. 'distinct'). : ).nargin)).2. order = [1 9 2 3 10 17 25 18 11 4 5 12 19 26 33 41 34 27 20 13 6 7 14 21 28 35 42 49 57 50 43 36 29 22 15 8 16 23 30 37 44 51 58 59 52 45 38 31 24 32 39 46 53 60 61 54 47 40 48 55 62 63 56 64]. %reorder column elements in zigzag format 62 . J = blkproc(x.'round(x. dct2im.fun). y = J. %organize 8x8 blocks in 1x64 columns J = J(order. %quantize the result %reorder the coefficients with the zigzag pattern J = im2col(J.Q). end % quantization table Q=[1 1 1 1 1 1 1 1 1 1 1 2 2 3 5 5 1 1 1 1 3 3 4 5 1 1 1 1 4 3 4 5 1 1 2 2 4 4 5 1 2 3 3 4 6 5 1 1 3 3 3 4 6 1 1 1 3 3 3 3 1 1 1 1] * quality. if nargin<2. [8 8].

1. Embedding embed_hardware.2:end)). 63 . % perform logical and between adjacent blocks Jfun = J.'x.'distinct').3.2:end) = and(J(:. H = H . % find the location of two higest indexes of non-zero cells H = blkproc(Jfun. %de-quantize the % coefficients fun = @idct2.nc) J = im2dct(I.A.60/NumOfC).q.3.1:end-1). I = blkproc(J. A.q).[64 1]. M = zeros(NumOfC.[8 8]. Jfun(:.60/NumOfC).*P1'. J = J(inv_order. fun = @findh. %dct.63 inv_order = [1 3 4 10 11 21 22 36 2 5 9 12 20 23 35 37 6 8 13 19 24 34 38 49 7 14 18 25 33 39 48 50 15 17 26 32 40 47 51 58 16 27 31 41 46 52 57 59 28 30 42 45 53 56 60 63 29 43 44 54 55 61 62 64]. % arrange 64 % columns into 8x8 blocks J = blkproc(J. Neg = J < 0. S = zeros(NumOfC. Jfun(:.[8 8].fun).Q).[256 256]. : ).1) = 1. zigzag NumOfC = nc.m function [y H] = embed_hardware(I. % zig-zag format % reorder column elements back from J = col2im(J.J(:. Algorithm Implementation A. quantization. %inverse transform y = I.1.fun). [8 8].

Sm = 0:NumOfC:60-NumOfC.b). CheckLSB.64 P = zeros(1. end if x == 0 x = 2.1:16).b) + 1.b)).1.1)*32. 64 .m function y=CheckLSB(x) C = bitget(uint16(abs(x)).b). toggle2. for s = 1:NumOfC S(s.b).60).b)) = Jfun(1:H(NumOfC. end % LSB check end % s loop end end y = J.:) = P(Sm+s). P(1:end) = 0. %embedding sequence for m = 2:32 b = m. for n = 2:32 b = m + (n .b) + 1.2) x = x .b) + 1. if CheckLSB(J(H(s. else x = x + 1. P(1:H(NumOfC.2) J(H(s. y = C(1).b) = toggle2(J(H(s.b).b)) ~= mod(sum(S(s.m function j = toggle2(x) if mod(x.*A(1:H(NumOfC.:)).

fun = @findh. H = H .3. Detection Detect_hardware.fun).nc) J = im2dct(I. h = 0:n-1. H = blkproc(Jfun.1. Jfun(:.[64 1]. Jfun(:.m % this function is responsible for finding the two highest indexes of % non-zero cells in the block function h = findh(J) n = 2. count = 1. % set the number of marked coefficients NumOfC = nc. Jfun = J. count = count + 1.q). 65 . findh.1) = 1.H.65 end j = x. h = n .1:end-1).2:end)).q.m function [C LOC] = detect_hardware(I. for i = 1:64 if J(65-i) h(count) = 65-i.h'.2:end) = and(J(:.2.A. if count == (n + 1) return end end end A.J(:.

b). P = zeros(1.b). Sm = 0:NumOfC:60-NumOfC.2). P(1:end) = 0.60). for s = 1:NumOfC S(s.*A(1:H(NumOfC. 66 .b) = CheckLSB(J(H(s.:)).:) = P(Sm+s).b)+1. P(1:H(NumOfC. LOC = sum(HD)>0.b)) = Jfun(1:H(NumOfC.1)*32.b).b).60/NumOfC).60/NumOfC). for m = 2:32 b = m. S = zeros(NumOfC. M = zeros(NumOfC.b)) ~= mod(sum(S(s. end % s loop end end C = 1 .sum(sum(HD))/4096. HD(s.66 HD = zeros(size(H)). for n = 2:32 b = m + (n .

Cset. inout [7:0] CPU_AD. FADC_AOV.fg_dout). ExtCtrPipe. FADC_ADV. input CPU_ALE. FADC_AOE. SADC_BRDY. SRAM_D. CntExt_. TARGET_CLK. SADC_BRD. CPU_RD. SRAM_CS. CPU_A. LVDS_PD. output TARGET_CLK. CPU_WR.v //--------------------------------------------------------------------// Description : This is the top level module for the integrated system //--------------------------------------------------------------------`timescale 1ns / 100ps module littel (CLK. SADC_CLK. FADC_BOE. SADC_APD. SADC_DOUT. ExtCtr. // cpu interface input CPU_RD. RST. TARGET_OUT. input [7:0] CPU_A. SRAM_RD. 67 . FADC_CLK. SHS. RowDecColDec. SRAM_WR. fg_pen. RRst. SADC_ARD. FADC_BDV. SADC_AOTR. SADC_BOTR. Top Level and Peripheral Modules little_top. FADC_BOV. GEN2. ExtBias. fg_len. D. Rset. SRAM_ZZ. CPU_ALE. RST. PixelRst. aps_ext_in. input CPU_WR. SADC_BPD. ROW. PI_EN. fg_fen . // target interfce input [11:0] TARGET_OUT. output SHS. FADC_A. SRAM_A. LED. // debug output [7:0] COL. // globals input CLK. COL. output [15:0] aps_ext_in. GEN1. SDAC_RST. ROW. SADC_ARDY. PI_CLK. PI_IN.67 APPENDIX B: VERILOG CODE B. output LED. CPU_AD.1. SHR.

output ExtCtrPipe. output CntExt_. SADC_ARDY. 68 . output Rset. SADC_APD. inout [17:0] SRAM_D. SRAM_ZZ. SRAM_WR. output [1:0] SRAM_CS.68 output RRst. input FADC_ADV. output FADC_BOE. output ExtBias. output [17:0] SADC_DOUT. SADC_ARD. SADC_BRD. // frame grabber (LVDS camera link interface) output LVDS_PD. FADC_AOV. output SADC_AOTR. // ZBT sram intrerface output [19:0] SRAM_A. output Cset. output RowDecColDec. input [12:1] FADC_A. // slow AD convertor output SADC_CLK. // waveform generator output [15:0] GEN1. output SADC_BOTR. output SRAM_RD. // fast AD convertor output FADC_CLK.FADC_BOV. output ExtCtr. SADC_BPD. output FADC_AOE. output SHR. SADC_BRDY. GEN2. output SDAC_RST. output PixelRst. input FADC_BDV. output [38:23] D.

wire APS_TARGET_CLK. wire [20:0] cpu_add. /////////////////////////////////////begin////////////////////////////// //////// led led_inst 69 . output fg_len. output [11:0] fg_dout. output fg_fen. wire [7:0] aps_controls. wire cpu_zbt_we_n. wire [7:0] wm_fra_out. wire [1:0] select_zbt_user. //top level buses wire cpu_zbt_rd_n. wire aps_len. wire clk_pulse. wire [7:0] aps_clk_counter. wire aps_fen. input [14:0] PI_IN.69 output fg_pen. wire [7:0] aps_time_ref. wire [15:0] CPU_GEN1. input PI_CLK. wire [7:0] aps_fra_dat. wire pic_module_active. wire zbt_dout_v_n. wire [17:0] cpu_din. // PI interface output PI_EN. wire [17:0] zbt_dout. wire [11:0] user_data_fg. wire aps_pen. wire [11:0] aps_dout.

.rst(RST).clk(CLK).sram1_ce1_n(SRAM_CS[0]).cpu_din(cpu_din).70 ( . . . .cpu_rd_n(cpu_zbt_rd_n). .CLK(CLK).select_zbt_user(select_zbt_user). . . . . zbt_mux U2 ( .sram_zz(SRAM_ZZ).sram2_ce1_n(SRAM_CS[1]).cpu_add(cpu_add). .rst(RST). . . .sram_data(SRAM_D).sram_wr_n(SRAM_WR). .sram_add(SRAM_A).CLK(CLK). . . 70 .cpu_add(cpu_add).ale_n(CPU_ALE).pic_module_active(pic_module_active). . .aps_time_ref(aps_time_ref).gen2value().zbt_dout(zbt_dout).zbt_dout_v_n(zbt_dout_v_n) ). cpu_int cpu_int ( .gen1value(CPU_GEN1). .cpu_we_n(cpu_zbt_we_n). .sram_oe_n(SRAM_RD).led_out(LED) ). .aps_controls(aps_controls). . . .

fg_pen(aps_pen). . .RRst(RRst).port0(CPU_AD).rst(RST). . . .cpu_zbt_rd_n(cpu_zbt_rd_n). .fg_len(aps_len). .select_zbt_user(select_zbt_user). .zbt_dout_v_n(zbt_dout_v_n).SHR(SHR).pic_module_active(pic_module_active).Rset(Rset).cpu_zbt_we_n(cpu_zbt_we_n). . . .zbt_dout(zbt_dout). . .CLK(CLK). .aps_ext_in(aps_ext_in). .zbt_dout_v_n(zbt_dout_v_n).cpu_din(cpu_din).cpu_wr_n(CPU_WR). . . .user_data_fg(user_data_fg) ). .Cset(Cset). . . 71 .zbt_dout(zbt_dout).rst(RST). aps_int aps_int ( .71 .aps_controls(aps_controls).CntExt_(CntExt_).RowDecColDec(RowDecColDec). . . . .ExtCtr(ExtCtr).port2(CPU_A). .ExtCtrPipe(ExtCtrPipe).cpu_rd_n(CPU_RD). .target_out(TARGET_OUT).ExtBias(ExtBias). .PixelRst(PixelRst). . . .SHS(SHS).

aps_clk(APS_TARGET_CLK).user_data_fg(user_data_fg) ). . .aps_clk_counter(aps_clk_counter).clk(CLK). . . . . .fg_dout(aps_dout).time_ref(aps_time_ref). .aps_dout(aps_dout).target_clk(APS_TARGET_CLK). assign GEN2 = 16'h7FFF. assign GEN1 = 16'h7FFF.rst(!RST). . .72 .aps_clk_pulse(aps_clk_pulse). .aps_clk_counter(aps_clk_counter). assign LVDS_PD = 1'b1.fg_pen(fg_pen). assign FADC_AOE = 1'b0.aps_clk_pulse(aps_clk_pulse).fg_fen(fg_fen). . assign TARGET_CLK = aps_controls[7] ? aps_controls[2] : APS_TARGET_CLK. .adc_dv(FADC_ADV). .fg_fen(aps_fen). .aps_len(aps_len).fg_len(fg_len).fg_dout(fg_dout) ). . wm_int wm_int ( .aps_pen(aps_pen). endmodule 72 . . . .aps_fen(aps_fen).

cpu_din .rst .CLK . input [7:0] port2 . 73 . input cpu_wr_n .v // Title : cpu_int // Description : This is the cpu interface for the TestBoard Altera // chip.v" input rst .cpu_zbt_rd_n . output cpu_zbt_we_n .73 cpu_int. input [17:0] zbt_dout .pic_module_active . output [17:0] cpu_din .user_data_fg . // gen interface output [15:0] gen1value. // cpu bus input cpu_rd_n .port2 . // zbt interface output [1:0] select_zbt_user . inout [7:0] port0 .port0 . output [20:0] cpu_add .ale_n . input CLK .aps_controls .cpu_wr_n .cpu_zbt_we_n .cpu_rd_n . input zbt_dout_v_n . `include "chip_def. gen2value.select_zbt_user . output cpu_zbt_rd_n .gen1value . input ale_n .cpu_add . It is responsible for communications with an external PC // It contains registers which can be written by the onboard CPU `timescale 1ns / 100ps module cpu_int ( zbt_dout_v_n .zbt_dout .gen2value).aps_time_ref .

Aps_controls. reg [11:0] user_data_fg. // simple async //latch for address always @(posedge CLK) ADD = ADD_l[4:0]. reg cpu_we_d. cpu_zbt_rd_n . output [7:0] aps_time_ref. reg [15:0] Gen1_reg. always @(negedge ale_n) ADD_l = {port2. Gen2_reg. wire altera_busy. reg [7:0] Aps_time_ref. reg [4:0] ADD. reg [2:0] Mux_control. Dat.74 //aps_int output pic_module_active.port0}. ///////////////////////////////////////////////////////////////// // Chip registers ///////////////////////////////////////////////////////////////// always @(posedge CLK or negedge rst) begin if (~rst) begin data_in_s <= 0. reg inst_go. cpu_we_dd <= 1. cpu_we_d <= 1. cpu_we_dd. cpu_we_ddd. data_out <= 0. output [11:0] user_data_fg. reg [31:0] Add. // regs reg [15:0] ADD_l. reg cpu_zbt_we_n. data_out. // sample latch 74 . output [7:0] aps_controls. reg [7:0] data_in_s. reg [3:0] Cmd.

Gen2_reg <= 0. Mux_control <= 0. user_data_fg <=0. end 75 . Dat[31:18] <= 0. cpu_we_ddd <= cpu_we_dd. inst_go <= 0. cpu_we_dd <= cpu_we_d. inst_go <= 1.75 cpu_we_ddd <= 1. Cmd <= 0. if (~zbt_dout_v_n) begin Dat[17:0] <= zbt_dout. end // write if ((cpu_we_ddd == 1) && (cpu_we_dd == 0)) // transition to low case (ADD) CMD_REG_ADD : begin Cmd <= data_in_s[3:0]. Gen1_reg <= 0. Add <= 0. Dat <= 0. Aps_controls <= 0. // sampled signals data_in_s <= port0. Aps_time_ref <= 0. cpu_we_d <= cpu_wr_n. end else begin // defaults inst_go <= 0.

ADD3_REG_ADD : data_out <=Add[31:24]. ADD1_REG_ADD : data_out <=Add[15:8]. MUXC_REG_ADD : data_out <={5'h0.Cmd}. ADD2_REG_ADD : data_out <=Add[23:16]. <= data_in_s. but by // the time the cpu will get to read data_out it sould be stable case (ADD) TEST_REG_ADD : data_out <= 8'h5a. <= data_in_s. <= data_in_s. DAT1_REG_ADD : data_out <=Dat[15:8]. ALTERA_BUSY_ADD : data_out <={7'h0. 76 . default endcase end end : data_out <= 0.76 ADD3_REG_ADD : Add[31:24] ADD2_REG_ADD : Add[23:16] ADD1_REG_ADD : Add[15:8] ADD0_REG_ADD : Add[7:0] DAT3_REG_ADD : Dat[31:24] DAT2_REG_ADD : Dat[23:16] DAT1_REG_ADD : Dat[15:8] DAT0_REG_ADD : Dat[7:0] MUXC_REG_ADD : Mux_control APS_RATE_ADD : Aps_time_ref APS_CONTROLS_ADD : Aps_controls endcase <= data_in_s. <= data_in_s. APS_CONTROLS_ADD : data_out <=Aps_controls.Mux_control}. // RO CMD_REG_ADD : data_out <= {4'b0.altera_busy}. DAT0_REG_ADD : data_out <=Dat[7:0]. ADD0_REG_ADD : data_out <=Add[7:0]. // read mux // note : ADD is a synchronous signal sampling an async latch. <= data_in_s. APS_RATE_ADD : data_out <=Aps_time_ref. DAT3_REG_ADD : data_out <=Dat[31:24]. <= data_in_s[2:0]. <= data_in_s. DAT2_REG_ADD : data_out <=Dat[23:16]. <= data_in_s. <= data_in_s. <= data_in_s.

if (inst_go) inst_running_n <= 0. pic_module_active <= 0. // NOTE : TBUF is open in every read cycle. assign aps_controls = Aps_controls. assign port0 = (~cpu_rd_n) ? data_out : 8'hZZ. reg pic_module_active. //////////////////////////////////////////////////////////////// // Command State Machine ///////////////////////////////////////////////////////////////// reg inst_running_n. // connect reg outputs assign select_zbt_user = Mux_control[1:0]. cpu_zbt_rd_n <= 1. // if active if (~inst_running_n) case (Cmd[3:0]) 77 . always @(posedge CLK or negedge rst) begin if (~rst) begin inst_running_n <= 1. end else begin // signals default values cpu_zbt_we_n <= 1. cpu_zbt_we_n <= 1. stage <= 0. reg [1:0] stage. cpu_zbt_rd_n <= 1. assign aps_time_ref = Aps_time_ref.77 // CPU output tbuf.

end CMD_PIC_STOP : begin pic_module_active <= 0. end end endcase CMD_SRAM_WRITE_REG : // ZBT write begin cpu_zbt_we_n <= 0. 78 . stage <= 1. end CMD_PIC_GO : begin pic_module_active <= 1. end 1 : begin if (zbt_dout_v_n==0) begin inst_running_n <= 1. inst_running_n <= 1. inst_running_n <= 1. stage <= 0.78 CMD_SRAM_READ_REG : // ZBT read case (stage) 0: begin cpu_zbt_rd_n <= 0. end default : // return to inactive state inst_running_n <= 1. inst_running_n <= 1.

5'h4 . 5'ha . endmodule chip_def. 5'h8 . assign cpu_din = Dat[17:0]. 5'h5 .79 endcase end end // ZBT assign cpu_add = Add[20:0]. = 4'h9 . = 4'hb . 5'h9 . 5'h16. 5'h15. 5'h6 . 5'h1 . 5'h7 . 79 . 5'h3 . = 4'hc . assign altera_busy = ~inst_running_n. parameter APS_CONTROLS_ADD = parameter ALTERA_BUSY_ADD = // command coding parameter CMD_SRAM_READ_REG parameter CMD_SRAM_WRITE_REG parameter CMD_PIC_GO parameter CMD_PIC_STOP = 4'h8 .v // altera def file // reg address parameter TEST_REG_ADD parameter CMD_REG_ADD parameter MUXC_REG_ADD parameter ADD3_REG_ADD parameter ADD2_REG_ADD parameter ADD1_REG_ADD parameter ADD0_REG_ADD parameter DAT3_REG_ADD parameter DAT2_REG_ADD parameter DAT1_REG_ADD parameter DAT0_REG_ADD parameter APS_RATE_ADD = = = = = = = = = = = = 5'h0 . 5'h2 . 5'h14.

time_ref .Cset .target_clk . adc_dv). SHS. output ExtCtrPipe. CMOS Imager Control Logic and Interface aps_int.aps_clk_counter .v // Description: This module functions as a top level for the imager // control logic `timescale 1ns / 100ps module aps_int (CLK. aps_ext_in. zbt_dout_v_n. //zbt_mux interface input [17:0] zbt_dout. output target_clk. SHR. fg_len. output SHS. pic_module_active.PixelRst . output [7:0] aps_clk_counter. input rst. RowDecColDec. input zbt_dout_v_n. B.aps_controls. fg_pen. target_out. input pic_module_active.aps_clk_pulse . //globlas input CLK. rst. CntExt_.ExtBias . Rset. fg_fen. // cpu_int interface input [7:0] aps_controls.2.ExtCtrPipe . fg_dout. 80 .RRst . output [15:0] aps_ext_in.80 // SRAM SOURCE MUX states parameter SRAM_MUX_CPU = 2'b00 . input [7:0] time_ref. output aps_clk_pulse. zbt_dout. //aps interface input [11:0] target_out.ExtCtr . user_data_fg. output RRst.

wire APS_SHS. // frame grabber output fg_pen. wire target_clk. . . wire [7:0] aps_clk_counter. output PixelRst. wire aps_clk_pulse. output CntExt_. wire APS_RowDecColDec.81 output SHR.time_ref(time_ref).clk_pulse(aps_clk_pulse).target_clk(target_clk). wire APS_ExtCtrPipe. output ExtCtr. clk_gen U1 ( . output Rset. wire [11:0] fg_dout_board. //fadc int input adc_dv. output Cset. wire APS_ExtCtr_in. output ExtBias. .clk_counter(aps_clk_counter) 81 . wire [15:0] cdsIn_level. wire APS_RRst. output RowDecColDec. output [11:0] fg_dout. . .CLK(CLK).rst(rst). output fg_len. wire APS_ExtCtr_out. output fg_fen.

. assign CntExt_ assign Cset assign ExtBias assign Rset assign ExtCtrPipe assign RowDecColDec assign SHS assign RRst assign ExtCtr = ~aps_controls[0]. pic_extract U4 ( . .aps_ext_in(aps_ext_in). . . endmodule 82 . = APS_ExtCtr_out. .aps_rst_cnt(APS_RRst).aps_clk_counter(aps_clk_counter) ).fg_fen(fg_fen). .clk(CLK).aps_clk(target_clk). . = aps_controls[3].module_active(pic_module_active).ext_in_en(CntExt_).rst(rst).ext_ctr_en(APS_ExtCtr_in). .aps_rst(PixelRst). = ExtCtr ? 1'b1 : APS_RowDecColDec.fg_pen(fg_pen). = APS_RRst.aps_out(target_out).aps_rowdeccoldec(APS_RowDecColDec). .fg_len(fg_len). .82 ). . . assign APS_ExtCtrPipe = 0.fg_dout(fg_dout). .aps_shs(APS_SHS).aps_ext_ctr(APS_ExtCtr_out). = aps_controls[6] ? 1'b1 : APS_ExtCtrPipe. = APS_SHS. assign APS_ExtCtr_in = aps_controls[4]. = aps_controls[5]. .aps_clk_pulse(aps_clk_pulse). . = aps_controls[1].aps_shr(SHR). . . . .

pic_extract.v
// this is the core for control signal generation for the CMOS image
// sensor. The second structure handles reception of pixel data from
// the imager and generation of synchronization signals to an external
// frame grabber
`timescale 1 ns / 100 ps
module pic_extract (clk, rst, module_active, aps_clk, aps_clk_pulse,
                    aps_clk_counter, ext_in_en, ext_ctr_en,
                    aps_shs, aps_shr, aps_rst, aps_rst_cnt,
                    aps_rowdeccoldec, aps_ext_ctr, aps_ext_in, aps_out,
                    fg_pen, fg_len, fg_fen, fg_dout);

/* Input/Output Declarations */
input clk;                       // main clock
input rst;                       // main reset
// controls
input module_active;             // running while '1'
input aps_clk;
input aps_clk_pulse;             // sync to APS clk
input [7:0] aps_clk_counter;
input ext_in_en;                 // enables ext in output
input ext_ctr_en;                // enables cdsin control
input [11:0] aps_out;
// Output lines to the sensor
output aps_shs;                  // shs signal to the sensor (Active High)
output aps_shr;                  // shr signal to the sensor (Active High)
output aps_rst;                  // reset signal to the sensor (Active High)
output aps_rowdeccoldec;         // row - col enable
output aps_ext_ctr;              // counter output
output aps_rst_cnt;              // APS Counter Reset (Active High)
output [15:0] aps_ext_in;
// frame grabber
output fg_pen;                   // pixelclk to output lvds
output fg_len;                   // lineclk to output lvds
output fg_fen;                   // frameclk to output lvds
output [11:0] fg_dout;           // fg data lines

//*************************** Wires and registers ******************//
// output regs
reg aps_shs, aps_shr, aps_rst, aps_rst_cnt, aps_rowdeccoldec;
reg aps_ext_ctr;
reg fg_len, fg_fen;
reg [11:0] fg_dout;
// internal regs
reg [3:0] control_state;
reg [9:0] delay_cnt;
reg [7:0] row_cnt;
reg [7:0] col_cnt;
reg [3:0] ext_ctr_cnt;
reg [1:0] wait_cnt;
reg [11:0] din_d;
reg aps_clk_d;
reg first_run;
reg fg_len_int, fg_fen_int;
reg aquire;
// internal wires
wire aps_clk;
wire fg_pen;

//*************************** constants def ****************//
parameter STATE_IDLE        = 4'h0;
parameter STATE_IDLE_TO_SHS = 4'h1;
parameter STATE_SHS_HIGH    = 4'h2;
parameter STATE_SHS_TO_RST  = 4'h3;
parameter STATE_RST_HIGH    = 4'h4;
parameter STATE_RST_TO_SHR  = 4'h5;
parameter STATE_SHR_HIGH    = 4'h6;
parameter STATE_SHR_TO_READ = 4'h7;
parameter STATE_READ        = 4'h8;
parameter STATE_WAIT        = 4'h9;

// Note : 0 is one clk delay, clk is 40MHz
// (that is 25ns for each clock cycle)
// (add 1)
parameter IDLE_TO_SHS_TIME = 7;   //200ns
parameter SHS_HIGH_TIME    = 39;  //1us
parameter SHS_TO_RST_TIME  = 7;   //200ns
parameter RST_HIGH_TIME    = 39;  //1us
parameter RST_TO_SHR_TIME  = 19;  //0.5us
parameter SHR_HIGH_TIME    = 7;   //200ns
parameter SHR_TO_READ_TIME = 7;   //200ns
parameter PIPE_DELAY       = 7;

reg [PIPE_DELAY:0] fg_fen_d;
reg [PIPE_DELAY:0] fg_len_d;

// main_clock
always @(posedge clk or negedge rst) begin
    if (~rst) begin
        delay_cnt        <= 0;
        aps_shs          <= 1'b0;
        aps_shr          <= 1'b0;
        aps_rst          <= 1'b0;
        aps_rst_cnt      <= 0;
        aps_rowdeccoldec <= 0;
        aps_ext_ctr      <= 0;
        ext_ctr_cnt      <= 0;
        fg_len_int       <= 0;
        fg_fen_int       <= 0;
        wait_cnt         <= 0;
        row_cnt          <= 0;
        col_cnt          <= 0;
        first_run        <= 1;
        control_state    <= STATE_IDLE;
    end
    else begin // rising_edge(clk)
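// Worked example of the timing constants above (comment added for clarity,
// not part of the original source): the clock is 40 MHz, i.e. 25 ns per
// cycle, and a delay counter that starts at 0 expires after (value + 1)
// cycles. Hence 200 ns requires 200/25 = 8 cycles, i.e. a terminal count of
// 7; 0.5 us requires 20 cycles (terminal count 19); and 1 us requires 40
// cycles (terminal count 39).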

        case (control_state)
            STATE_IDLE : begin // reset
                aps_shs          <= 1'b0;
                aps_shr          <= 1'b0;
                aps_rst          <= 1'b0;
                aps_rowdeccoldec <= 0;
                aps_rst_cnt      <= 0;
                if (first_run) aps_rst_cnt <= 1;
                fg_len_int <= 0;
                fg_fen_int <= 0;
                wait_cnt   <= 0;
                first_run  <= 1;
                delay_cnt  <= 0;
                row_cnt    <= 0;   // changed to provide smaller ROI
                col_cnt    <= 0;   // changed to provide smaller ROI
                if (module_active) control_state <= STATE_IDLE_TO_SHS;
            end
            STATE_IDLE_TO_SHS : begin
                aps_rowdeccoldec <= 1;
                if (delay_cnt == IDLE_TO_SHS_TIME) begin
                    control_state <= STATE_SHS_HIGH;
                    delay_cnt <= 0;
                end
                else begin
                    delay_cnt <= delay_cnt + 1'b1;
                end
            end
            STATE_SHS_HIGH : begin
                aps_shs <= 1'b1;
                if (delay_cnt == SHS_HIGH_TIME) begin
                    control_state <= STATE_SHS_TO_RST;

end else begin delay_cnt <= delay_cnt + 1'b1. if (delay_cnt == RST_TO_SHR_TIME) begin control_state <= STATE_SHR_HIGH. if (delay_cnt == SHS_TO_RST_TIME) begin control_state <= STATE_RST_HIGH. fg_fen_int <= 1. end end STATE_RST_HIGH : begin aps_rst <= 1'b1. delay_cnt <= 0.87 delay_cnt <= 0. end else begin delay_cnt <= delay_cnt + 1'b1. if (delay_cnt == RST_HIGH_TIME) begin control_state <= STATE_RST_TO_SHR. delay_cnt <= 0. end end STATE_RST_TO_SHR : begin aps_rst <= 1'b0. delay_cnt <= 0. end end 87 . end end STATE_SHS_TO_RST : begin aps_shs <= 1'b0. end else begin delay_cnt <= delay_cnt + 1'b1. end else begin delay_cnt <= delay_cnt + 1'b1. aps_rst_cnt <= 0.

aps_rowdeccoldec <= 0. delay_cnt <= 0. ext_ctr_cnt <= ext_ctr_cnt + 1'b1. if (delay_cnt == SHR_HIGH_TIME) begin control_state <= STATE_SHR_TO_READ. fg_len_int <= 1. if (delay_cnt == 3) col_cnt <= col_cnt + 1'b1. end else begin delay_cnt <= delay_cnt + 1'b1. if (aps_clk_pulse) begin // clk en delay_cnt <= 0. row_cnt <= 0. if (row_cnt == 255) begin fg_fen_int <= 1'b0.88 STATE_SHR_HIGH : begin aps_shr <= 1'b1. if (col_cnt == 255) begin col_cnt <= col_cnt + 1'b1. delay_cnt <= 4. control_state <= STATE_WAIT. if (delay_cnt == SHR_TO_READ_TIME) begin control_state <= STATE_READ. end end STATE_READ : begin delay_cnt <= delay_cnt + 1'b1. end else begin delay_cnt <= delay_cnt + 1'b1. 88 . end end STATE_SHR_TO_READ : begin aps_shr <= 1'b0.

89 end else begin fg_fen_int <= fg_fen_int. end default : control_state <= STATE_IDLE. ext_ctr_cnt <= 4'b0. control_state <= STATE_IDLE_TO_SHS. if (ext_ctr_cnt == 2) aps_ext_ctr <= 1'b1. control_state <= STATE_IDLE_TO_SHS. row_cnt <= row_cnt + 1'b1. if (ext_ctr_cnt == 8) begin aps_ext_ctr <= 1'b0. end end if (ext_ctr_cnt > 0) ext_ctr_cnt <= ext_ctr_cnt + 1'b1. endcase // case // stop override <= 0. end STATE_WAIT : begin if (&wait_cnt) begin fg_fen_int <= 0. 89 . end fg_len_int first_run <= 0. end if (~ext_ctr_en) aps_ext_ctr <= 1'b0. end wait_cnt <= wait_cnt + 1'b1.

fg_dout <= din_d. fg_len_d <= fg_len_d. <= fg_len. // main_clock 90 . fg_len <= 0.90 if (~module_active) control_state <= STATE_IDLE. fg_fen_d <= {fg_fen_d[PIPE_DELAY . fg_fen_d <= 0. fg_fen_d <= fg_fen_d. fg_dout <= 0. <= fg_fen_d[PIPE_DELAY]. fg_fen_int}. fg_fen fg_len end end //note that fg_pen has an offset of ~2ns from the aps_clk <= fg_fen. fg_fen fg_len end else begin din_d <= din_d.1 : 0]. end else if ((aps_clk_counter == 8'h9) & ~aps_clk) begin din_d <= aps_out. fg_dout <= fg_dout. din_d <= 0. fg_fen <= 0. <= fg_len_d[PIPE_DELAY]. fg_len_d <= {fg_len_d[PIPE_DELAY . fg_len_int}. end // Else end // Always //Output delay and sync lines to the FG always @(posedge clk or negedge rst) begin if (~rst) begin fg_len_d <= 0.1 : 0].

assign fg_pen = fg_len & aps_clk_pulse;
assign aps_ext_in = ext_in_en ? 16'h0 : {row_cnt, col_cnt};

endmodule

clk_gen.v
//generates target clock
`timescale 1ps / 1ps
module clk_gen (CLK, rst, time_ref, target_clk, clk_pulse, clk_counter);
//globals
input CLK;
input rst;
//cpu bus
input [7:0] time_ref;
//outputs
output target_clk;
output clk_pulse;
output [7:0] clk_counter;

reg target_clk;
reg clk_pulse;
reg [7:0] clk_counter;

always @(posedge CLK or negedge rst) begin
    if (~rst) begin
        target_clk  <= 0;
        clk_pulse   <= 0;
        clk_counter <= 0;
    end
    else begin
        //defaults
        clk_counter <= clk_counter + 1'b1;
        clk_pulse   <= 0;
        if (clk_counter == time_ref) begin
            if (~target_clk)
                clk_pulse <= 1'b1;
            target_clk  <= ~target_clk;
            clk_counter <= 0;
        end
        else
            target_clk <= target_clk;
    end
end

endmodule

B.3. JPEG Encoding and Watermark Embedding

`resetall
`timescale 1ns / 100ps
// this module is responsible for interfacing the encoder\embedder and
// the imager controller
// there are two pixel data buffers to allow reordering from row scan to
// 8x8 blocks and vice versa. The module includes state machines that
// synchronize the imager data output with the encoder\embedder modules
module wm_int (clk, rst, aps_clk, aps_clk_pulse, aps_clk_counter, aps_pen,
               aps_len, aps_fen, aps_dout, fg_pen, fg_len, fg_fen, fg_dout,
               wm_ena);

input clk;
input rst;
input aps_clk;
input aps_clk_pulse;
input [7:0] aps_clk_counter;
input aps_pen;

//write enable sync state machine reg [1:0] state. input aps_fen. in_pic_add_rd_cnt. output [11:0] fg_dout. always @(posedge clk or posedge rst) if (rst) state <= IDLE. parameter READ = 2'b01. //regs and wires reg [11:0] in_pic_add_wr_cnt. input wm_ena.93 input aps_len. next_state. output fg_pen. in_pic_add_rd. //next state logic always @(state or in_pic_add_wr_cnt or in_pic_add_rd_cnt) begin case (state) IDLE : 93 . input [11:0] aps_dout. else state <= state. output fg_len. else if ((aps_clk_counter == 8'h9) & ~aps_clk) state <= next_state. //state parameters parameter IDLE = 2'b00. wire [7:0] in_pic_dat_wr. wire in_pic_wren = aps_len. wire [11:0] in_pic_add_wr. in_pic_dat_rd. output fg_fen.

else next_state = READ.94 if (&in_pic_add_wr_cnt[10:0]) next_state = READ. else next_state = IDLE. else in_pic_rden_d <= in_pic_rden. endcase end wire in_pic_rden. READ : if (&in_pic_add_rd_cnt[10:0]) next_state = IDLE. default: next_state = IDLE. else if ((aps_clk_counter == 8'h9) & ~aps_clk) begin if (aps_len) 94 . reg in_pic_rden_d. always @ (posedge clk or posedge rst) if (rst) in_pic_add_wr_cnt <= 12'h0. always @ (posedge clk or posedge rst) if (rst) in_pic_rden_d <= 0. //output logic assign in_pic_rden = (state == READ).

else if ((aps_clk_counter == 8'h9) & ~aps_clk) out_state <= out_next_state. assign in_pic_dat_wr = aps_dout[3] ? aps_dout[11:4] + 1 : aps_dout[11:4]. out_next_state.in_pic_add_rd_cnt[10:6]. end assign in_pic_add_wr = in_pic_add_wr_cnt. //state machine to control output memory rd/wr reg [1:0] out_state.in _pic_add_rd_cnt[2:0]}. end //address manipulation to account for the 8x8 block readout order assign in_pic_add_rd = {in_pic_add_rd_cnt[11]. else in_pic_add_rd_cnt <= in_pic_add_rd_cnt. always @(posedge clk or posedge rst) if (rst) out_state <= IDLE. always @ (posedge clk or posedge rst) if (rst) in_pic_add_rd_cnt <= 12'h0. else if (&in_pic_add_rd_cnt[10:0]) in_pic_add_rd_cnt <= in_pic_add_rd_cnt + 1.in_pic_add_rd_cnt[5:3]. else if ((aps_clk_counter == 8'h9) & ~aps_clk) begin if (in_pic_rden) in_pic_add_rd_cnt <= in_pic_add_rd_cnt + 1. 95 . else out_state <= out_state. else in_pic_add_wr_cnt <= in_pic_add_wr_cnt.95 in_pic_add_wr_cnt <= in_pic_add_wr_cnt + 1.
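// Note on the address manipulation above (comment added for clarity, not part
// of the original source): the write-address counter fills the pixel buffer in
// plain raster (row-scan) order, while the read address regroups the low-order
// row and column bits of that counter, so that 64 consecutive reads return one
// 8x8 block of pixels before moving on to the next block. This is what lets
// the downstream DCT/embedder consume the image block by block. Interpreting
// the top counter bit as a half-buffer (ping-pong) select is an assumption.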

96 reg [11:0] out_pic_add_wr_cnt. reg [15:0] out_pic_add_rd_cnt. //next state logic always @(out_state or out_pic_add_wr_cnt or out_pic_add_rd_cnt) begin case (out_state) IDLE : if (&out_pic_add_wr_cnt[10:0]) out_next_state = READ. else out_next_state = READ. endcase end wire out_pic_rden. default: out_next_state = IDLE. else out_next_state = IDLE. out_pic_add_rd. always @ (posedge clk or posedge rst) if (rst) out_pic_add_wr_cnt <= 12'h0. wire out_pic_wren. READ : if (&out_pic_add_rd_cnt[10:0]) out_next_state = IDLE. wire [7:0] out_pic_dat_wr. wire [11:0] out_pic_add_wr. //output logic assign out_pic_rden = (out_state == READ). else if ((aps_clk_counter == 8'h9) & ~aps_clk) 96 .

else if ((aps_clk_counter == 8'h8) & ~aps_clk) begin if (out_pic_rden) out_pic_add_rd_cnt <= out_pic_add_rd_cnt + 1. end assign out_pic_add_rd = out_pic_add_rd_cnt[11:0]. fg_fen <= 1'b0.out_pic_add_wr_cnt[2:0]}. always @ (posedge clk or posedge rst) if (rst) out_pic_add_rd_cnt <= 16'h0. always @ (posedge clk or posedge rst) begin if (rst) begin fg_len <= 1'b0. wire [11:0] fg_dout. reg fg_len. fg_fen. else if (&out_pic_add_rd_cnt[10:0]) out_pic_add_rd_cnt <= out_pic_add_rd_cnt + 1. else out_pic_add_rd_cnt <= out_pic_add_rd_cnt. wire fg_pen. end assign out_pic_add_wr = {out_pic_add_wr_cnt[11].out_pic_add_wr_cnt[5:3]. end else 97 . else out_pic_add_wr_cnt <= out_pic_add_wr_cnt.97 begin if (out_pic_wren) out_pic_add_wr_cnt <= out_pic_add_wr_cnt + 1.out_pic_add_wr_cnt[10:6] .

end end wire [7:0] out_pic_dat_rd.address_a(in_pic_add_wr). else if (&out_pic_add_wr_cnt[10:0] & (aps_clk_counter == 8'h8) & (!aps_clk)) fg_len <= 1'b1. else fg_len <= fg_len. pixel_mem_buffer in_mem_buffer ( .98 begin if ((&out_pic_add_rd_cnt[7:0]) & aps_clk) fg_len <= 1'b0. . . assign fg_pen = aps_clk_pulse & fg_len. . else fg_fen <= fg_fen. wire wm_ena. 98 .clock(clk).out_pic_dat_rd}.data_b(). assign fg_dout = {4'h0. else if (&out_pic_add_wr_cnt[10:0] & (aps_clk_counter == 8'h8)) fg_fen <= 1'b1. else if (out_pic_rden) fg_fen <= 1'b1. else if (out_pic_rden) fg_len <= 1'b1. . if (&out_pic_add_rd_cnt & aps_clk) fg_fen <= 1'b0.address_b(in_pic_add_rd).data_a(in_pic_dat_wr).

    .wren_a(in_pic_wren),
    .wren_b(1'b0),
    .q_a(),
    .q_b(in_pic_dat_rd));

pixel_mem_buffer out_mem_buffer (
    .clock(clk),
    .address_a(out_pic_add_wr),
    .wren_a(out_pic_wren),
    .data_a(out_pic_dat_wr),
    .address_b(out_pic_add_rd),
    .data_b(),
    .wren_b(1'b0),
    .q_a(),
    .q_b(out_pic_dat_rd));

encoder encoder_embedder_decoder (
    .clk(aps_clk),
    .rst(rst),
    .ena(in_pic_rden_d),
    .din(in_pic_dat_rd),
    .dout(out_pic_dat_wr),
    .douten(out_pic_wren),
    .wm_ena(wm_ena)
);

endmodule

B.3.1. Watermark Embedding

WM_plus_RNG_top.v
// This module is the top level that connects the watermark embedding
// module with the watermark generator (RNG) module
`resetall
`timescale 1ns / 100ps
module top (clk, rst, ena, serial_data_out, WM_out, douten);

parameter MKEY_VAL   = 3;
parameter CKEY_VAL   = 4;
parameter N          = 22;
parameter COEFF_SIZE = 12;

input clk;
input rst;
input ena;
input  [COEFF_SIZE-1:0] serial_data_out;  //unmarked DCT data from Zigzag buffer
output [COEFF_SIZE-1:0] WM_out;           //watermarked DCT data
output douten;

wire WM_data_in;
wire s;
wire douten;
reg  douten_reg;
reg  ddata_valid;
reg  shift;
reg  [5:0] cntr64;

WM_top #(COEFF_SIZE) WM_embedder (
    .clk(clk),
    .rst(rst),
    .shift(shift),
    .serial_data_out(serial_data_out),
    .WM_data_in(WM_data_in),
    .WM_out(WM_out)
);

ffcsr22 #(.N(N), .MKEY_VAL(MKEY_VAL), .CKEY_VAL(CKEY_VAL)) WM_RNG (
    .clk(clk),
    .rst(rst),
    .shift(shift),
    .s(s)
);

//control and sync signals
always @(posedge clk or posedge rst) begin
    if (rst)
        cntr64 <= #1 0;
    else if (shift) begin
        if (cntr64 < 6'h3f)
            cntr64 <= #1 cntr64 + 1'b1;
        else
            cntr64 <= #1 cntr64;
    end
    else
        cntr64 <= #1 cntr64;
end

always @(posedge clk or posedge rst)
    if (rst) shift <= 1'b0;
    else     shift <= ena;

always @(posedge clk or posedge rst)
    if (rst) douten_reg <= 0;
    else     douten_reg <= (&cntr64);

assign douten = douten_reg & shift;
assign WM_data_in = s;

endmodule

WM_top.v
`resetall
`timescale 1ns/10ps
// this module is the top level for the watermark embedder and connects
// the DCT data buffer and the watermarking logic
module WM_top (clk, rst, shift, serial_data_out, WM_data_in, WM_out);

parameter COEFF_SIZE = 12;

input clk;
input rst;
input shift;
input WM_data_in;
input  [COEFF_SIZE-1:0] serial_data_out;
output [COEFF_SIZE-1:0] WM_out;

wire p_i;
wire pointer_full;
wire shift;
wire [COEFF_SIZE-1:0] d_in;
wire [COEFF_SIZE-1:0] d_out;
wire [COEFF_SIZE-1:0] B_P_out;
wire [COEFF_SIZE-1:0] WM_out;

// Modules instantiation
// DCT data buffer
ram_sr B_P_reg (
    .clock(clk),
    .clken(shift),
    .shiftin(d_in),
    .shiftout(d_out),
    .taps()
);

// watermarking logic

WM_point_logic #(COEFF_SIZE) WM_point_logic1 (
    .clk(clk),
    .rst(rst),
    .shift(shift),
    .p_i(p_i),
    .B_P_out(B_P_out),
    .pointer_full(pointer_full),
    .WM_out(WM_out)
);

//assignments
// d_in is the i-th coefficient from the current block out of the DCT
// block
// B_P_out is the i-th coefficient stored in B_P_reg from the previous
// block
assign d_in = serial_data_out;
assign B_P_out = d_out;
assign p_i = (|(d_in)) & (|(B_P_out)) & (pointer_full ? WM_data_in : 1'b1);

endmodule

WM_point_logic_ver2.v
/* This module does two simultaneous assignments:
   1. Identifying the first (representing highest spatial frequency) N cells
      (after anding 2 neighbors) that are non-zero. The results are stored in
      the next_pointer register.
   2. When all N cells of next_pointer are full, the module starts to
      calculate the value to be embedded in the corresponding coefficients.
      The computation simply stores the XOR between the current value in
      next_p_reg[n] and the output of p_i.
   Finally, when all coefficients from a DCT block are rcvd, the values from
   the 'next' registers are copied to the current registers, and the 'next'
   regs are reset. The 'current' regs are used to embed the WM in the selected
   coeff's. That way, the pointer reg points at the indexes where the
   embedding takes place, and the p_reg reg holds the value to embed.
   For each block there are two phases:

   Each clock cycle, one coefficient is output from the DCT block and stored
   in the reg_stack, while the coef of the same index from the previous block
   is being output from the stack. During that stage, both coeff's are used to
   calculate the value of p_i, and the coefficient of the previous block is
   embedded with the WM and sent out as secured image data */
`resetall
`timescale 1ns/10ps
module WM_point_logic (clk, rst, shift, p_i, B_P_out, pointer_full, WM_out);

parameter COEFF_SIZE = 12;

// Internal Declarations
input                    clk;
input                    rst;
input                    shift;
input                    p_i;
input  [COEFF_SIZE-1:0]  B_P_out;
output                   pointer_full;
output [COEFF_SIZE-1:0]  WM_out;

wire                     p_i;
wire  [COEFF_SIZE-1:0]   B_P_out;
wire  [COEFF_SIZE-1:0]   WM_out;
wire  [5:0]              inc;
reg                      pointer_cnt;

// pointer reg declarations and assignments
parameter N = 2;                        //N is the number of coefficients to embed
reg [5:0] pointer      [N-1:0];
reg [5:0] next_pointer [N-1:0];         //mux for shift enabled SR
wire pointer_full = !(&next_pointer[0]);
reg [N-1:0] p_reg, next_p_reg;          // this register stores the LSB values
                                        // to embed
integer j;
reg p_rst;
wire shift_condition = (p_i && !pointer_full && shift);

//This shift register has async rst and synchronous p_rst
//It shifts only when the conditions for shift are met:
//The register is still not full, the new input is non-zero and a data
//valid signal is on (shift)
always @(posedge clk or posedge rst) begin : synchronous_sr
    if (rst)
        for (j = 0; j < N; j = j + 1)
            next_pointer[j] <= #1 6'h3f;
    else if (p_rst)
        for (j = 0; j < N; j = j + 1)
            next_pointer[j] <= #1 6'h3f;
    else if (shift_condition) begin
        for (j = 0; j < N-1; j = j + 1)
            next_pointer[j] <= #1 next_pointer[j+1];
        next_pointer[N-1] <= #1 inc;
    end
    else
        for (j = 0; j < N; j = j + 1)
            next_pointer[j] <= #1 next_pointer[j];
end
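// Worked example (comment added for illustration; the index values are
// hypothetical and not from the original source): with N = 2, suppose that
// while a block is scanned the first two zig-zag positions at which both the
// current coefficient (d_in) and the stored coefficient of the previous block
// (B_P_out) are non-zero are positions 12 and 27. Those two indexes are
// shifted into next_pointer. Once next_pointer is full, p_i switches to
// carrying the RNG watermark bit, which is XOR-accumulated into next_p_reg.
// At the block boundary (&inc) the pointers and payload are copied into
// pointer/p_reg, so during the following block the coefficients at indexes
// 12 and 27 leave the module with their LSB replaced by the watermark bits.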

always @ (posedge clk or posedge rst) begin
    if (rst) begin
        pointer_cnt <= #1 1'b0;
        p_reg       <= #1 0;
        next_p_reg  <= #1 0;
        p_rst       <= #1 0;
    end
    else if (shift) begin
        if (pointer[pointer_cnt] == inc)
            pointer_cnt <= #1 1'b1;
        else
            pointer_cnt <= #1 pointer_cnt;
        if (&inc) begin
            p_reg       <= #1 next_p_reg;
            next_p_reg  <= #1 0;
            p_rst       <= #1 1;
            pointer_cnt <= #1 1'b0;
            for (j = 0; j < N; j = j + 1)
                pointer[j] <= #1 next_pointer[j];
        end
        else begin
            p_reg <= #1 p_reg;
            p_rst <= #1 0;
        end
        if (pointer_full) begin
            case (inc[0])
                1'b0 :
                    next_p_reg[0] <= #1 next_p_reg[0]^p_i;
                1'b1 :

                    next_p_reg[1] <= #1 next_p_reg[1]^p_i;
//              2'b10 : next_p_reg[2] <= next_p_reg[2]^p_i;
//              2'b11 : next_p_reg[3] <= next_p_reg[3]^p_i;
            endcase
        end
    end //shift operations
    else
        p_rst <= #1 0;
end //sync always

// two options to embed the WM in the LSB - just set\rst, or make the
// number's parity be equal to the WM bit
// wire parity;
// assign parity = ^(B_P_out);
// assign WM_out = (pointer[pointer_cnt] != inc) ? B_P_out :
//                 (parity == p_reg[pointer_cnt]) ? B_P_out :
//                 {B_P_out[COEFF_SIZE-1:1], ~B_P_out[0]};

assign WM_out = (pointer[pointer_cnt] == inc) ?
                {B_P_out[COEFF_SIZE-1:1], p_reg[pointer_cnt]} : B_P_out;

incr pointer_gen (
    .clk(clk),
    .rst(rst),
    .en_cnt(shift),
    .inc_out(inc)
);

endmodule

module incr (clk, rst, en_cnt, inc_out);

input clk;

108 input rst.0 // MODULE: altshift_taps // ============================================================ // File Name: ram_sr. input en_cnt. else if (en_cnt) inc_out <= inc_out + 1'b1.v // Megafunction Name(s): // altshift_taps // ============================================================ // ************************************************************ // THIS IS A WIZARD-GENERATED FILE. end endmodule ram_sr. else inc_out <= inc_out. DO NOT EDIT THIS FILE! // // 5.v // megafunction wizard: %Shift register (RAM-based)% // GENERATION: STANDARD // VERSION: WM1. always @ (posedge clk or posedge rst) begin if (rst) inc_out <= 6'h00.0 Build 148 04/26/2005 SJ Full Version // ************************************************************ 108 . reg [5:0] inc_out. output [5:0] inc_out.

wire [11:0] taps = sub_wire0[11:0]. and any //associated documentation or information are expressly subject //to the terms and conditions of the Altera Program License //Subscription Agreement.clken (clken). that your use is for the sole purpose of //programming logic devices manufactured by Altera and sold by //Altera or its authorized distributors.109 //Copyright (C) 1991-2005 Altera Corporation //Your use of Altera Corporation's design tools. Altera MegaCore Function License //Agreement. //without limitation. // synopsys translate_off `timescale 1 ns / 10 ps // synopsys translate_on module ram_sr ( shiftin. including. Please refer to the //applicable agreement for further details. or other applicable license agreement. altshift_taps altshift_taps_component ( . wire [11:0] shiftout = sub_wire1[11:0]. clock. 109 . taps). clken. logic functions //and other software and tools. clock. taps. clken. and any output files any of the foregoing //(including device programming or simulation files). wire [11:0] sub_wire0. wire [11:0] sub_wire1. [11:0] [11:0] shiftout. shiftout. input [11:0] input input output output shiftin. and its AMPP partner logic //functions.

    .clock (clock),
    .shiftin (shiftin),
    .taps (sub_wire0),
    .shiftout (sub_wire1));

defparam
    altshift_taps_component.lpm_type        = "altshift_taps",
    altshift_taps_component.number_of_taps  = 1,
    altshift_taps_component.tap_distance    = 64,
    altshift_taps_component.width           = 12;

endmodule

ffcsr22.v
// this module implements a 22 bits ffcsr RNG
`resetall
`timescale 1ns/10ps
module ffcsr22 (clk, rst, shift, s);

parameter N        = 22;
parameter MKEY_VAL = 3;
parameter CKEY_VAL = 4;
parameter q        = 4194793;
parameter d        = 22'b1000000000000011110101;

input clk;
input rst;
input shift;
output s;

wire s;
reg  [N-1:0] mstate;
reg  [5:0]   cstate;
wire [N-1:0] mstate_N;
wire [5:0]   cstate_N;
wire [N-1:0] mkey = MKEY_VAL;

assign mstate_N[17]=mstate[18]. assign mstate_N[12]=mstate[13]. assign mstate_N[13]=mstate[14]. assign mstate_N[15]=mstate[16]. assign mstate_N[8]=mstate[9]. assign mstate_N[9]=mstate[10]. assign cstate_N[4]=mstate[7]&cstate[4]^cstate[4]&mstate[0]^mstate[7]&mstate[0]. assign mstate_N[16]=mstate[17]. assign cstate_N[1]=mstate[3]&cstate[1]^cstate[1]&mstate[0]^mstate[3]&mstate[0]. // Define the FCSR and Filter function assign mstate_N[0]=mstate[1]^d[0]&cstate[0]^d[0]&mstate[0]. assign mstate_N[18]=mstate[19]. assign mstate_N[14]=mstate[15]. assign mstate_N[5]=mstate[6]^d[5]&cstate[3]^d[5]&mstate[0]. assign mstate_N[21]=mstate[0]. assign mstate_N[11]=mstate[12]. assign mstate_N[20]=mstate[21]. assign mstate_N[7]=mstate[8]^d[7]&cstate[5]^d[7]&mstate[0].111 wire [5:0] ckey = CKEY_VAL. assign mstate_N[19]=mstate[20]. 111 . assign cstate_N[2]=mstate[5]&cstate[2]^cstate[2]&mstate[0]^mstate[5]&mstate[0]. assign mstate_N[3]=mstate[4]. assign mstate_N[4]=mstate[5]^d[4]&cstate[2]^d[4]&mstate[0]. assign mstate_N[10]=mstate[11]. assign mstate_N[6]=mstate[7]^d[6]&cstate[4]^d[6]&mstate[0]. assign cstate_N[5]=mstate[8]&cstate[5]^cstate[5]&mstate[0]^mstate[8]&mstate[0]. assign cstate_N[3]=mstate[6]&cstate[3]^cstate[3]&mstate[0]^mstate[6]&mstate[0]. assign cstate_N[0]=mstate[1]&cstate[0]^cstate[0]&mstate[0]^mstate[1]&mstate[0]. assign mstate_N[1]=mstate[2]. assign mstate_N[2]=mstate[3]^d[2]&cstate[1]^d[2]&mstate[0].


// Calculate the output sequence
always @(posedge clk or posedge rst) begin
    if (rst) begin
        mstate <= #1 mkey;
        cstate <= #1 ckey;
    end
    else if (shift) begin
        mstate <= #1 mstate_N;
        cstate <= #1 cstate_N;
    end
    else begin
        mstate <= #1 mstate;
        cstate <= #1 cstate;
    end
end

assign s = (mstate[0]^mstate[2])^(mstate[4]^mstate[5])^(mstate[6]^mstate[7]^mstate[21]);
//the parentheses will hopefully minimize delay

endmodule
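To illustrate how the RNG is used, the following testbench sketch is added here (it is not part of the original design; the clock period, reset timing and display format are illustrative assumptions). It instantiates ffcsr22 with its default keys and, once shift is asserted, prints one filtered keystream bit s per clock cycle; the cstate carry registers updated above hold the majority (carry) terms of the FCSR feedback.

`timescale 1ns/10ps
module tb_ffcsr22;
    reg  clk, rst, shift;
    wire s;

    // device under test with the default keys
    ffcsr22 #(.N(22), .MKEY_VAL(3), .CKEY_VAL(4)) dut (
        .clk(clk), .rst(rst), .shift(shift), .s(s));

    // 40 MHz-style clock (25 ns period, chosen only for illustration)
    initial clk = 1'b0;
    always #12.5 clk = ~clk;

    initial begin
        rst = 1'b1; shift = 1'b0;
        #40 rst = 1'b0;          // release the active-high reset
        #15 shift = 1'b1;        // start producing the keystream
        repeat (64) @(posedge clk)
            $display("keystream bit = %b", s);
        $finish;
    end
endmodule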

B.3.2. DCT IDCT Modules

The DCT modules were borrowed from [ref] where the source code is also available.
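For reference (this note is added for readability; the fixed-point architecture of the borrowed cores is not reproduced here), the transform applied to each 8x8 pixel block before zig-zag ordering and watermark embedding is the standard JPEG 2-D DCT-II:

F(u,v) = (1/4) C(u) C(v) sum_{x=0..7} sum_{y=0..7} f(x,y) cos[(2x+1)u*pi/16] cos[(2y+1)v*pi/16],
where C(k) = 1/sqrt(2) for k = 0 and C(k) = 1 otherwise,

and the IDCT module implements the corresponding inverse transform.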
B.3.3. Zigzag Modules
Zigzag.v
/////////////////////////////////////////////////////////////////////
////                                                             ////
////  Zig-Zag Unit                                               ////
////  Performs zigzag-ing, as used by many DCT based encoders    ////
////                                                             ////
////  Author: Richard Herveille                                  ////
////          richard@asics.ws                                   ////
////          www.asics.ws                                       ////
////                                                             ////
/////////////////////////////////////////////////////////////////////
////                                                             ////
//// Copyright (C) 2002 Richard Herveille                        ////
////                    richard@asics.ws                         ////
////                                                             ////
//// This source file may be used and distributed without        ////
//// restriction provided that this copyright statement is not   ////
//// removed from the file and that any derivative work contains ////
//// the original copyright notice and the associated disclaimer.////
////                                                             ////
/////////////////////////////////////////////////////////////////////

`timescale 1ns/10ps module zigzag( clk, rst, ena, dct_2d, dout, douten ); parameter do_width = 12; // // inputs & outputs // input clk; input rst; input ena; // clk enable // system clock



input [do_width-1:0] dct_2d; output [do_width-1:0] dout; output // // variables // wire block_rdy; reg ld_zigzag; reg douten_reg; wire douten; reg [do_width-1:0] sresult_in [63:0]; // store results for zig// zagging reg [do_width-1:0] sresult_out[63:0]; reg [5:0] sample_cnt; // // module body // always @ (posedge clk or posedge rst) if (rst) sample_cnt <= 6'h0; else if (ena) sample_cnt <= sample_cnt + 1'b1; else sample_cnt <= sample_cnt; assign block_rdy = &sample_cnt; douten; // data-out enable

always @ (posedge clk) ld_zigzag <= block_rdy; always @ (posedge clk or posedge rst) if (rst) douten_reg <= 1'b0;


else douten_reg <= douten_reg. end if(ld_zigzag) begin // reload results-register file 0: 1: 2: 3: 4: 5: 6: 7: 3f 3e 3a 39 31 30 24 23 3d 3b 38 32 2f 25 22 15 3c 37 33 2e 26 21 16 14 36 34 2d 27 20 17 13 0a 35 2c 28 1f 18 12 0b 09 2b 29 1e 19 11 0c 08 03 2a 1d 1a 10 0d 07 04 02 1c 1b 0f 0e 06 05 01 00 115 . n=n+1) // sresult_in[0] gets the new input begin sresult_in[n] <= #1 sresult_in[n -1]. always @(posedge clk) if (ena) begin for (n=1. n<=63.115 else if (block_rdy) douten_reg <= 1'b1. assign douten = douten_reg & ena. sresult_in[0] <= #1 dct_2d. // // 0: 1: 2: 3: 4: 5: 6: 7: // 0: 63 62 58 57 49 48 36 35 // 1: 61 59 56 50 47 37 34 21 // 2: 60 55 51 46 38 33 22 20 // 3: 54 52 45 39 32 23 19 10 // 4: 53 44 40 31 24 18 11 09 // 5: 43 41 30 25 17 12 08 03 // 6: 42 29 26 16 13 07 04 02 // 7: 28 27 15 14 06 05 01 00 // // zig-zag the DCT results integer n. // // Generate zig-zag structure // // This implicates that the quantization step be performed after // the zig-zagging.

sresult_out[14] <= #1 sresult_in[32]. sresult_out[08] <= #1 sresult_in[10]. sresult_out[25] <= #1 sresult_in[34]. sresult_out[28] <= #1 sresult_in[56]. sresult_out[37] <= #1 sresult_in[22]. sresult_out[20] <= #1 sresult_in[05].116 sresult_out[00] <= #1 sresult_in[00]. sresult_out[15] <= #1 sresult_in[40]. sresult_out[03] <= #1 sresult_in[02]. sresult_out[36] <= #1 sresult_in[15]. sresult_out[11] <= #1 sresult_in[11]. sresult_out[05] <= #1 sresult_in[16]. sresult_out[29] <= #1 sresult_in[49]. sresult_out[33] <= #1 sresult_in[21]. sresult_out[04] <= #1 sresult_in[09]. sresult_out[19] <= #1 sresult_in[12]. 116 . sresult_out[21] <= #1 sresult_in[06]. sresult_out[35] <= #1 sresult_in[07]. sresult_out[13] <= #1 sresult_in[25]. sresult_out[31] <= #1 sresult_in[35]. sresult_out[24] <= #1 sresult_in[27]. sresult_out[12] <= #1 sresult_in[18]. sresult_out[32] <= #1 sresult_in[28]. sresult_out[06] <= #1 sresult_in[24]. sresult_out[27] <= #1 sresult_in[48]. sresult_out[18] <= #1 sresult_in[19]. sresult_out[34] <= #1 sresult_in[14]. sresult_out[26] <= #1 sresult_in[41]. sresult_out[07] <= #1 sresult_in[17]. sresult_out[23] <= #1 sresult_in[20]. sresult_out[30] <= #1 sresult_in[42]. sresult_out[22] <= #1 sresult_in[13]. sresult_out[10] <= #1 sresult_in[04]. sresult_out[16] <= #1 sresult_in[33]. sresult_out[09] <= #1 sresult_in[03]. sresult_out[01] <= #1 sresult_in[08]. sresult_out[17] <= #1 sresult_in[26]. sresult_out[02] <= #1 sresult_in[01].

sresult_out[52] <= #1 sresult_in[52]. sresult_out[58] <= #1 sresult_in[47]. endmodule 117 . sresult_out[61] <= #1 sresult_in[62]. sresult_out[39] <= #1 sresult_in[36]. sresult_out[50] <= #1 sresult_in[38]. sresult_out[62] <= #1 sresult_in[55]. sresult_out[49] <= #1 sresult_in[31]. sresult_out[48] <= #1 sresult_in[23]. sresult_out[60] <= #1 sresult_in[61]. n=n+1) // do not change sresult[0] sresult_out[n] <= #1 sresult_out[n +1]. end else begin for (n=0. sresult_out[59] <= #1 sresult_in[54]. sresult_out[45] <= #1 sresult_in[44]. sresult_out[41] <= #1 sresult_in[50]. sresult_out[40] <= #1 sresult_in[43]. end end assign dout = sresult_out[0]. sresult_out[46] <= #1 sresult_in[37]. n<63. sresult_out[57] <= #1 sresult_in[39]. sresult_out[44] <= #1 sresult_in[51]. sresult_out[56] <= #1 sresult_in[46].117 sresult_out[38] <= #1 sresult_in[29]. sresult_out[51] <= #1 sresult_in[45]. sresult_out[43] <= #1 sresult_in[58]. sresult_out[42] <= #1 sresult_in[57]. sresult_out[53] <= #1 sresult_in[59]. sresult_out[55] <= #1 sresult_in[53]. sresult_out[47] <= #1 sresult_in[30]. sresult_out[63] <= #1 sresult_in[63]. sresult_out[54] <= #1 sresult_in[60].
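The following small simulation sketch (an addition for illustration, not part of the original appendix; the stimulus values are arbitrary) drives the zigzag unit with one 8x8 block whose samples simply equal their raster index, so the values seen on dout while douten is high show directly the zig-zag readout order described by the table above.

`timescale 1ns/10ps
module tb_zigzag;
    parameter do_width = 12;
    reg  clk, rst, ena;
    reg  [do_width-1:0] dct_2d;
    wire [do_width-1:0] dout;
    wire douten;
    integer i;

    zigzag #(.do_width(do_width)) dut (
        .clk(clk), .rst(rst), .ena(ena), .dct_2d(dct_2d),
        .dout(dout), .douten(douten));

    initial clk = 1'b0;
    always #5 clk = ~clk;

    initial begin
        rst = 1'b1; ena = 1'b0; dct_2d = 0;
        #20 rst = 1'b0;
        // feed one block: sample value = raster index 0..63
        for (i = 0; i < 64; i = i + 1) begin
            @(negedge clk);
            ena    = 1'b1;
            dct_2d = i;
        end
        // keep ena high while the reordered block is shifted out
        for (i = 0; i < 64; i = i + 1) begin
            @(negedge clk);
            dct_2d = 0;
            if (douten) $display("zigzag out = %0d", dout);
        end
        $finish;
    end
endmodule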

wire douten. input clk. input [do_width-1:0] din. 118 . din. input ena. always @ (posedge clk or posedge rst) if (rst) sample_cnt <= 6'h0.v `timescale 1ns/10ps module reverse_zigzag( clk. output // system clock // clk ena output [do_width-1:0] dct_2d.118 reverse_zigzag. reg [5:0] sample_cnt. reg [do_width-1:0] sresult_in [63:0]. dct_2d. douten. parameter do_width = 12. douten ). // data-out enable reg ld_zigzag. else sample_cnt <= sample_cnt. reg douten_reg. else if (ena) sample_cnt <= sample_cnt + 1'b1. ena. input rst. rst. // store results for zig// zagging reg [do_width-1:0] sresult_out[63:0].

always @(posedge clk) if (ena) begin for (n=1. n=n+1) // sresult_in[0] gets the new input begin sresult_in[n] <= #1 sresult_in[n -1]. sresult_in[0] <= #1 din. // // Generate zig-zag structure // // // 0: 1: 2: 3: 4: 5: 6: 7: // 0: 63 62 58 57 49 48 36 35 // 1: 61 59 56 50 47 37 34 21 // 2: 60 55 51 46 38 33 22 20 // 3: 54 52 45 39 32 23 19 10 // 4: 53 44 40 31 24 18 11 09 // 5: 43 41 30 25 17 12 08 03 // 6: 42 29 26 16 13 07 04 02 // 7: 28 27 15 14 06 05 01 00 // // zig-zag the DCT results integer n.119 always @ (posedge clk) ld_zigzag <= &sample_cnt. else if (ld_zigzag) douten_reg <= 1'b1. end 0: 1: 2: 3: 4: 5: 6: 7: 3f 3e 3a 39 31 30 24 23 3d 3b 38 32 2f 25 22 15 3c 37 33 2e 26 21 16 14 36 34 2d 27 20 17 13 0a 35 2c 28 1f 18 12 0b 09 2b 29 1e 19 11 0c 08 03 2a 1d 1a 10 0d 07 04 02 1c 1b 0f 0e 06 05 01 00 119 . always @ (posedge clk or posedge rst) if (rst) douten_reg <= 1'b0. n<=63. assign douten = douten_reg & ena. else douten_reg <= douten_reg.

sresult_out[17] <= #1 sresult_in[07]. sresult_out[42] <= #1 sresult_in[30]. sresult_out[49] <= #1 sresult_in[29]. sresult_out[19] <= #1 sresult_in[18]. sresult_out[13] <= #1 sresult_in[22]. sresult_out[05] <= #1 sresult_in[20]. sresult_out[26] <= #1 sresult_in[17]. sresult_out[33] <= #1 sresult_in[16]. sresult_out[06] <= #1 sresult_in[21]. sresult_out[21] <= #1 sresult_in[33]. sresult_out[34] <= #1 sresult_in[25]. sresult_out[48] <= #1 sresult_in[27]. sresult_out[02] <= #1 sresult_in[03]. sresult_out[35] <= #1 sresult_in[31]. sresult_out[09] <= #1 sresult_in[04]. sresult_out[32] <= #1 sresult_in[14]. sresult_out[03] <= #1 sresult_in[09]. sresult_out[01] <= #1 sresult_in[02]. sresult_out[56] <= #1 sresult_in[28]. sresult_out[11] <= #1 sresult_in[11]. sresult_out[08] <= #1 sresult_in[01]. sresult_out[12] <= #1 sresult_in[19]. sresult_out[18] <= #1 sresult_in[12]. sresult_out[40] <= #1 sresult_in[15]. sresult_out[27] <= #1 sresult_in[24]. sresult_out[10] <= #1 sresult_in[08]. sresult_out[20] <= #1 sresult_in[23]. sresult_out[41] <= #1 sresult_in[26]. sresult_out[14] <= #1 sresult_in[34]. sresult_out[04] <= #1 sresult_in[10]. sresult_out[16] <= #1 sresult_in[05]. 120 .120 if(ld_zigzag) // reload results-register file begin sresult_out[00] <= #1 sresult_in[00]. sresult_out[24] <= #1 sresult_in[06]. sresult_out[28] <= #1 sresult_in[32]. sresult_out[25] <= #1 sresult_in[13].

sresult_out[46] <= #1 sresult_in[56]. sresult_out[36] <= #1 sresult_in[39]. sresult_out[50] <= #1 sresult_in[41]. sresult_out[54] <= #1 sresult_in[59]. end else begin for (n=0. n=n+1) // do not change sresult[63] sresult_out[n] <= #1 sresult_out[n +1]. sresult_out[31] <= #1 sresult_in[49]. sresult_out[37] <= #1 sresult_in[46]. sresult_out[58] <= #1 sresult_in[43]. n<63. sresult_out[29] <= #1 sresult_in[38]. sresult_out[52] <= #1 sresult_in[52]. sresult_out[30] <= #1 sresult_in[47]. sresult_out[53] <= #1 sresult_in[55]. sresult_out[62] <= #1 sresult_in[61]. sresult_out[38] <= #1 sresult_in[50]. sresult_out[55] <= #1 sresult_in[62]. sresult_out[60] <= #1 sresult_in[54]. sresult_out[61] <= #1 sresult_in[60]. endmodule 121 . sresult_out[43] <= #1 sresult_in[40]. sresult_out[23] <= #1 sresult_in[48]. sresult_out[59] <= #1 sresult_in[53]. sresult_out[63] <= #1 sresult_in[63]. sresult_out[47] <= #1 sresult_in[58]. sresult_out[51] <= #1 sresult_in[44]. sresult_out[22] <= #1 sresult_in[37]. sresult_out[44] <= #1 sresult_in[45]. sresult_out[39] <= #1 sresult_in[57]. sresult_out[15] <= #1 sresult_in[36]. end end assign dct_2d = sresult_out[00].121 sresult_out[07] <= #1 sresult_in[35]. sresult_out[57] <= #1 sresult_in[42]. sresult_out[45] <= #1 sresult_in[51].
