
UNIVERSITY OF CALGARY

An Imaging System With Watermarking And Compression Capabilities

by

Yonatan Shoshan

A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING CALGARY, ALBERTA SEPTEMBER 2009

© Yonatan Shoshan 2009

Library and Archives Canada, Published Heritage Branch, 395 Wellington Street, Ottawa ON K1A 0N4, Canada

Your file / Votre référence ISBN: 978-0-494-54570-6. Our file / Notre référence ISBN: 978-0-494-54570-6.

NOTICE: The author has granted a nonexclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or noncommercial purposes, in microform, paper, electronic and/or any other formats. The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission. In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis. While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.


Abstract

This thesis presents an imaging system with a novel watermarking embedder and JPEG compression capabilities. The proposed system enhances data security in surveillance camera networks and similar systems, thus improving the reliability of the received images for security and/or evidence use. A novel watermarking algorithm was developed for watermarking images in the DCT domain. The algorithm was optimized for an efficient implementation in hardware while still maintaining a high level of security. The imaging system was physically implemented on an evaluation board including a CMOS image sensor, an FPGA for digital control and processing, and a frame grabber for image presentation and analysis. The digital circuitry implemented on the FPGA included the proposed watermarking logic as well as all the required peripheral modules and control signals. The accomplishments of this work have been published in four scientific papers. This work was part of a commercialization effort based on the proposed novel watermarking algorithm.


Acknowledgements

Several people have made important contributions to my work on this thesis project. I hold a great deal of appreciation for my supervisor Dr. Orly Yadid-Pecht and my co-supervisor Dr. Graham Jullien. Under their support and supervision I have had the chance to experience a very liberal research atmosphere in an excellent environment. I would also like to thank Dr. Alexander Fish for his help in each and every part of my work on this thesis project. He has offered me his exceptional knowledge and experience in academic research through endless discussions and mutual work. Among my fellow students in the ISL lab, I thank Mr. Xin Li, with whom I have worked closely on this project and shared assignments, thoughts and ideas, and Ms. Marianna Beiderman, who provided her support and advice. In addition, I would also like to thank Mr. Denis Onen for the valuable suggestions and productive discussions we held from time to time.


Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures and Illustrations
List of Symbols, Abbreviations and Nomenclature
CHAPTER 1: INTRODUCTION
CHAPTER 2: CONSIDERATIONS IN THE DEVELOPMENT OF AN IMAGING SYSTEM WITH WATERMARKING CAPABILITIES
  2.1 Theory and implementation of watermark algorithms
    2.1.1 Watermark Classifications
    2.1.2 Watermark Design Considerations
      2.1.2.1 Robustness to Attacks
      2.1.2.2 Image quality
      2.1.2.3 Computational complexity
    2.1.3 Figures of Merit for Watermarking Systems
  2.2 Watermark implementations: Software vs. Hardware
  2.3 State of the art in hardware watermarking
CHAPTER 3: THE PROPOSED IMAGING SYSTEM
  3.1 Image acquisition and reordering
  3.2 Compression module
    3.2.1 DCT based compression
    3.2.2 Implementation of the compression module in the proposed system
  3.3 Watermark embedding module
    3.3.1 The novel watermark embedding algorithm
    3.3.2 Implementation of the embedding module in HW
  3.4 Watermark generation
    3.4.1 RNG based watermark generation
    3.4.2 Existing RNG structures
      3.4.2.1 The LFSR
      3.4.2.2 The FCSR
      3.4.2.3 The Filtered FCSR (F-FCSR)
    3.4.3 RNG based watermark generator design method and implementation
CHAPTER 4: IMPLEMENTATION, TESTING AND RESULTS
  4.1 Software implementation and algorithm functionality verification
  4.2 Algorithm performance evaluation
    4.2.1 Fragile watermarking and benchmarking
  4.3 Hardware design and verification
    4.3.1 Hardware Experimental Results
  4.4 Physical proof of concept implementation
    4.4.1 The CMOS image sensor
    4.4.2 Digital signal processing and control
    4.4.3 Output image capture
CHAPTER 5: CONCLUSION
  5.1 Thesis summary
  5.2 Issues that still need attention and future work
  5.3 Possible future directions for development
REFERENCES
APPENDIX A: MATLAB CODE
  A.1. Simulation Testbench and peripherals
    A.1.1. Simulation Envelope
  A.2. Compression/Decompression
  A.3. Algorithm Implementation
    A.3.1. Embedding
    A.3.2. Detection
APPENDIX B: VERILOG CODE
  B.1. Top Level and Peripheral Modules
  B.2. CMOS Imager Control Logic and Interface
  B.3. JPEG Encoding and Watermark Embedding
    B.3.1. Watermark Embedding
    B.3.2. DCT IDCT Modules
    B.3.3. Zigzag Modules

List of Tables

Table 2.1: Existing work in hardware digital watermarking research
Table 4.1: N vs. Quantization-Level Tradeoffs
Table 4.2: FPGA Synthesis Results
Table 4.3: Resource utilization by modules in the overall design


List of Figures and Illustrations

Figure 2.1: General classification of existing watermarking algorithms
Figure 2.2: Examples of (a) the original image and (b) the image with a visible watermark
Figure 2.3: Scheme of general watermark system
Figure 3.1: An imaging system with watermarking capabilities
Figure 3.2: Example quantization table given in the JPEG standard [4]
Figure 3.3: Schematic of a DCT based compression encoder
Figure 3.4: Schematic of a HW DCT transform module
Figure 3.5: 8x8 Block DCT conversion and 1x64 Zigzag reordering
Figure 3.6: Reorganization of the DCT data in the Zigzag order
Figure 3.7: Example DCT data for blocks J3, J2
Figure 3.8: Schematic description of the watermarking module implementation in HW
Figure 3.9: A generalized Fibonacci implementation of an LFSR circuit
Figure 3.10: Galois implemented FCSR
Figure 3.11: A Gollman cascade RNG
Figure 4.1: Algorithm Matlab simulation results
Figure 4.2: Additional sample images
Figure 4.3: Test setup schematic
Figure 4.4: Hardware watermarked image
Figure 4.5: A general implementation of an imaging system
Figure 4.6: Mixed signal SoC fast prototyping custom development board
Figure 4.7: Internal structure of the FPGA digital design
Figure 4.8: Sample output image from the physically implemented system


List of Symbols, Abbreviations and Nomenclature

Symbol   Definition
CMOS     Complementary Metal-Oxide Semiconductor
FPN      Fixed Pattern Noise
RNG      Random Number Generator
LFSR     Linear Feedback Shift Register
DCT      Discrete Cosine Transform
IDCT     Inverse DCT
FCSR     Feedback with Carry Shift Register
FPGA     Field Programmable Gate Array
LSB      Least Significant Bit
DWT      Discrete Wavelet Transform
PSNR     Peak Signal to Noise Ratio
PC       Personal Computer
JPEG     Joint Photographic Experts Group
RAM      Random Access Memory
KLT      Karhunen-Loeve Transform
HVS      Human Visual System
ROM      Read Only Memory
XOR      Exclusive Or
FSM      Finite State Machine
ADC      Analog to Digital Converter
I/O      Input/Output
CPU      Central Processing Unit
LC       Logic Cell
ASIC     Application Specific Integrated Circuit


CHAPTER 1: Introduction

Whether or not a digital image is authentic has become a highly nontrivial question. The field of digital imaging and its subsidiaries has been going through continuous and rapid growth during the last decade. Research activity has been extensive in both the academic and commercial communities, and significant advances and breakthroughs are constantly being published [1]. Digital imaging is taking over from traditional analog imaging in almost all imaging applications, from professional photography and broadcasting to the everyday consumer digital camera. The ease of integrating CMOS imagers with supporting peripheral elements, together with a significant reduction in power consumption, has introduced a variety of new portable products such as imagers on cell phones and network based surveillance and public cameras [2]. Since digital images are very susceptible to manipulations and alterations, a variety of security problems are introduced. There is a need to establish digital media as an acceptable, authentic information source.

Digital watermarking has shown the potential for solving many digital imaging problems, including image authentication, copyright control, and broadcast monitoring. A watermark is an additional, identifying message, covered under the more significant image raw data without perceptually changing it. By adding a transparent watermark to the image, it becomes possible to detect alterations inflicted upon the image, such as cropping, scaling, covering, blurring and many more.

At the onset of this thesis work, a basic implementation of a CMOS image sensor with watermarking capabilities was suggested [3]. In this implementation, analog noise was used as a seed to the algorithm. The Fixed Pattern Noise (FPN) [2] in CMOS imagers was considered as an imager-specific analog noise to provide a unique seed to a Random Number Generator (RNG). The system added a pseudo-random digital noise to the pixel data. The pseudo-random noise was generated by a Linear Feedback Shift Register (LFSR) that used the FPN-based key as a seed.

The objective of the thesis was to come up with a commercially attractive watermarking system in hardware, with potential applications in, for example, video surveillance, criminal investigation, biometric authentication, and ownership disputes: arenas where evidence of an indisputable nature is essential. The performance of the original concept needed to be tested and verified. Issues of great concern were the effect that the embedding of the watermark had on the original image quality, the viability of the watermark under compression, and the robustness of the watermark against common attacks.

Upon further investigation of the proposed watermarking technique, it became apparent that a more sophisticated version needed to be developed. In order to be more practical, the new version would target a specific range of applications. Different applications require the utilization of different watermarking techniques, and no universal watermarking algorithm that can satisfy the requirements of all kinds of applications has been presented in the literature. A watermarking technique that incorporates a high level of robustness with low image quality degradation was found to require a high level of complexity. Increased complexity in hardware implementations means increased area and power consumption. Therefore, the appropriate application to target would be one in which the hardware implementation introduces a major advantage on the one hand, but robustness requirements are liberal on the other. A portable device that works in real time would obviously make the best of efficient hardware-based processing. At the same time, techniques that utilize fragile watermarking are inherently non-robust. The authentication of images taken by remotely spread image sensors is an application that fits the aforementioned conditions.

Portable devices deliver captured data over communication channels. These channels have limited bandwidth, and therefore the raw image data must first be compressed. The vast majority of available compression standards, for both video and still images, utilize the Discrete Cosine Transform (DCT) as part of the compression algorithm. In the DCT form the image is represented in the frequency domain, which is a more compact representation of the image. The zero frequency (DC) component is the most significant, as it holds the average intensity level of the transformed pixel data. As a general trend, it is expected that the remainder of the data will also lie in the lower frequencies; the higher the frequency, the less significant it is in the description of the image. Compression is achieved by quantizing the DCT data, thus reducing its size while suffering a certain loss of accuracy. The compression algorithm does not quantize the DCT frequencies evenly but rather uses quantization tables to define a different level of quantization for each frequency [4]. Modifying the quantization table changes the tradeoff between visual quality and compression ratio. It is possible to embed a watermark in the quantized DCT data. Hardware-based watermarking is the most attractive approach for combination with real time compression, which is also implemented in hardware. Moreover, the watermark embedder can be naturally merged as an integral part of the compression module.

A novel watermarking algorithm was developed in the frame of this thesis. The watermark is embedded in the DCT domain and is intended to be implemented as a part of a secure compression module. In general, when compressing an image it is divided into blocks (the standard size is 8x8) before DCT and quantization. The watermark will be robust to any level of quantization that is less than or equal to the level of quantization utilized by the encoder [4]. Another built-in advantage of this approach is the fact that the image is first divided into 8x8 blocks: by uniquely embedding the watermark into each 8x8 block, tamper localization and better detection ratios are achieved. A semi-fragile algorithm is suitable in image authentication applications, as it is robust to legitimate compression but sensitive to other, malicious modifications. A fast, efficient and low cost hardware implementation allows real-time, on-the-scene security enhancement of the data collected by any system of remotely spread sensors.

Further development has also been done towards a more secure watermark generation technique. The original RNG was an LFSR, which has very good statistical properties and is most simple to implement in hardware. However, it has long been shown that the LFSR can be easily cryptanalyzed and its seed recovered by observing a short sample of the output sequence [5]. A new approach for designing secure RNGs in hardware was developed, employing Gollman cascaded Filtered Feedback with Carry Shift Register (F-FCSR) cores [6], [7]. This structure was found to provide a modular tool for designing secure RNGs that may be suitable for a wide variety of watermarking applications. In addition to the RNG, the generation unit may include a module to embed significant data in the watermark. Embedding identifying features such as date, time and location in each frame enhances the security of watermarking a series of frames or a video stream, as well as inherently providing authentic information about the circumstances involving the capturing of the image.

The end goal of the development efforts in this thesis was to provide a proof of concept prototype demonstrating the feasibility of the proposed system. An evaluation board combining digital and analog front ends was utilized as a platform for the implementation of the prototype. A CMOS image sensor may be employed as the input image data source, while the onboard Field Programmable Gate Array (FPGA) is used to implement the imager control signals and digital data processing, including the compression and watermarking modules. The design process included algorithmic testing and verification in software using Matlab. Once the properties of the algorithm were established, the system was described in hardware using Verilog HDL and simulated in Modelsim. Finally, after the hardware description had been verified, the design was synthesized to the onboard FPGA and the whole system was physically tested.
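The weakness of the plain LFSR noted above [5] is easy to demonstrate. The toy Python sketch below (the tap positions and the 8-bit length are arbitrary, illustrative choices, not the thesis design) shows that an observer who knows the feedback taps of a Fibonacci LFSR recovers the secret seed from just eight output bits and can then predict the entire keystream:

```python
def lfsr_stream(seed, taps, nbits):
    # Fibonacci LFSR: the output is the last stage; the feedback bit
    # (XOR of the tapped stages) is shifted in at the front.
    state = list(seed)
    out = []
    for _ in range(nbits):
        out.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t]
        state = [fb] + state[:-1]
    return out

taps = (0, 2, 3, 5)              # hypothetical tap set
seed = [1, 0, 1, 1, 0, 0, 1, 0]  # 8-bit "secret" state, illustration only

stream = lfsr_stream(seed, taps, 64)

# Attack: the first 8 output bits are simply the register contents in
# reverse shift order, so the seed falls out directly.
recovered = list(reversed(stream[:8]))
print(recovered == seed)                           # -> True
print(lfsr_stream(recovered, taps, 64) == stream)  # -> True
```

Even when the taps are unknown, the Berlekamp-Massey algorithm recovers the feedback polynomial of an n-bit LFSR from roughly 2n output bits, which is what motivates the move to the Gollman cascaded F-FCSR structure adopted in this work.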

The thesis is organized as follows: Chapter 2 provides background on the process of developing an imaging system with compression and watermarking capabilities. The proposed imaging system is described in detail in Chapter 3. Chapter 4 presents all stages of the design process along with measured and simulated results. Conclusions and future work are discussed in Chapter 5.
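As a concrete preview of the quantized-DCT embedding approach detailed in Chapter 3, the following Python fragment is a hedged stand-in for the Matlab-stage verification code. The flat quantization value Q = 16, the coefficient position (3, 2) and the parity (LSB-forcing) rule are illustrative assumptions, not the thesis parameters.

```python
import math

def dct2(block):
    # Orthonormal 2-D DCT-II of an n x n block, computed directly from the
    # definition (slow but dependency-free, fine for a sketch).
    n = len(block)
    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [[c(u) * c(v) * sum(block[i][j]
                * math.cos(math.pi * (2 * i + 1) * u / (2 * n))
                * math.cos(math.pi * (2 * j + 1) * v / (2 * n))
                for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]

Q = 16  # flat stand-in for one entry of a JPEG quantization table

def embed_bit(block, bit, pos=(3, 2)):
    shifted = [[p - 128 for p in row] for row in block]   # JPEG level shift
    d = dct2(shifted)
    q = [[round(x / Q) for x in row] for row in d]        # quantization
    u, v = pos
    q[u][v] = (q[u][v] & ~1) | bit    # force the watermark bit into the LSB
    return q

block = [[(8 * r + c) % 256 for c in range(8)] for r in range(8)]  # toy data
q = embed_bit(block, 1)
print(q[3][2] & 1)   # -> 1: a detector reads the bit back from the parity
```

A real embedder would pick the position and bit value from the watermark generator output rather than hard-coding them, and the matching detector would simply read the parity of the same quantized coefficient.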

CHAPTER 2: Considerations in the development of an imaging system with watermarking capabilities

2.1 Theory and implementation of watermark algorithms

2.1.1 Watermark Classifications

Different applications require utilization of watermarking with different properties, and no universal watermarking algorithm that can satisfy the requirements of all kinds of applications has been presented in the literature. Digital watermarking can be classified into different categories according to various criteria. Figure 2.1 shows a general classification of existing watermarking algorithms.

First, a watermark is either intended to be visible or invisible. The invisibility of a watermark is determined by how it affects the image perceptually. Sometimes a watermark is intentionally visible, in which case the identifying image is embedded into the host image and both are visually noticeable; for example, adding the network logo in the corner of videos in broadcast TV. Figure 2.2 shows examples of the original image and the image with a visible watermark. Generally, most watermarking algorithms aim for the watermark to be as invisible as possible. Invisible watermarking has the considerable advantage of not degrading the host data and not reducing its commercial value. Therefore, more research attention has been drawn to this field, while visible watermarking has received substantially less attention [8]-[16].

Watermarking can also be classified according to the level of robustness to image changes and alterations. Three main categories of watermarking can be identified: fragile, semi-fragile and robust, though no standard definition exists to explicitly determine which is which. Different applications have different requirements; while one may need the algorithm to be as robust as possible, another may be designed to detect even the slightest modification made to an image. Such a watermark is defined as fragile. A fragile watermark is practical when a user wishes to directly authenticate that the image he is observing has not been altered in any way since it was watermarked.

Figure 2.1: General classification of existing watermarking algorithms

This might be the case in applications where raw data is used. However, in most existing applications, modifications such as lossy compression and mild geometric changes are inherently applied to the image. For those applications it is most efficient to use a semi-fragile algorithm that is designed to withstand certain legitimate modifications, but to detect malicious ones. Finally, some applications, such as copyright protection, require that the watermark be detectable even after an image undergoes severe modifications and degradation, including digital-to-analog and analog-to-digital conversions, cropping, scaling, segment removal and others [13],[14]. A watermark that answers these requirements is called robust.

Figure 2.2: Examples of (a) the original image and (b) the image with a visible watermark

The dependency of the watermark on the original content is another important distinction. Making the algorithm depend upon the content of the image is good against counterfeiting attacks; however, it complicates the algorithm implementation and therefore the embedding and extracting processes.

An additional classification relates to the domain in which the watermarking is performed. The most straightforward and simple approach is a watermarking implementation in the spatial domain, which applies the watermark directly to the original image, for example by replacing the least significant bit (LSB) plane with an encoded one [17],[18]. Two other common representations are the discrete cosine transform (DCT) and the discrete wavelet transform (DWT) [19],[21], in which the image first goes through a certain transformation, the watermark is embedded in the transform domain, and the result is then inversely transformed to receive the watermarked image.

2.1.2 Watermark Design Considerations

Let us introduce a number of watermarking properties that affect design considerations. (1) Capacity (the term is adopted from the communications field [22]): in a watermarking system the cover image can be thought of as a channel used to deliver the identifying data (the watermark). The capacity of the system is defined as the amount of identifying data contained in the cover image. (2) False detection ratio: this ratio is characterized according to the probability of issuing the wrong decision. It is comprised of the probability of falsely detecting an unauthentic watermark (false positive) and the probability of missing a legitimate one (false negative). It is possible to manipulate the detection algorithm in order to minimize one or the other, according to the application. The value of this ratio is usually determined experimentally. (3) Image quality degradation: the embedding of foreign content in the image has a degrading effect on image quality. This parameter is relatively hard to quantify, and different measures such as peak signal-to-noise ratio (PSNR) or a subjective human perception measure may be applied [23].
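Property (3) is commonly reported as PSNR. The short sketch below is a hedged Python illustration of the standard definition, PSNR = 10·log10(MAX²/MSE) with MAX = 255 for 8-bit data; the flat 8x8 test block is a toy input, not data from this thesis.

```python
import math

def psnr(original, marked, peak=255.0):
    # Mean squared error between two pixel sequences, then the standard
    # 10*log10(peak^2 / MSE) formula; identical images give infinite PSNR.
    diffs = [(a - b) ** 2 for a, b in zip(original, marked)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

img = [100] * 64           # flat 8x8 block, flattened to a list
marked = list(img)
marked[0] += 1             # a single LSB flip, e.g. one embedded bit
print(round(psnr(img, marked), 1))   # -> 66.2
```

Even this single-bit change yields a finite PSNR; heavier embedding (more bits, larger amplitudes) drives the value down, which is exactly the capacity-versus-quality trade-off discussed in this section.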


These properties are elementary in every watermarking system and need to be carefully considered. The following subsections show how they are weighed from different design points of view and indicate several trade-offs between them.

2.1.2.1 Robustness to Attacks

An effective attack on fragile and semi-fragile watermarking will attempt to modify the perceptual content of the image without affecting the watermark data embedded in it. Knowledge of the embedding and extracting methods is assumed. There are two approaches to an attack; the first requires the decryption of the encoded mark in order to produce a suitable watermark on an unauthentic image, while the second aims to maintain the original mark on a modified image without knowing the mark itself.

Decrypting the original watermark is a cryptographic computational problem and is directly related to the capacity of the watermarking system. In watermarking, however, the potential for such an attack is even greater (compared with the cryptographic case), as the attacker does not have to find the exact key, but only one that is close enough to pass the detector's threshold. Still, if the capacity of the watermark is large enough (using a key of several hundred bits), this attack may not be computationally tractable.

The second approach is to imitate the embedded mark and fool the watermark detector into believing that the integrity of the image it is inspecting is intact. Known attacks are cover-up [28], counterfeiting [29] and transplantation [28]. In the cover-up attack, the attacker simply replaces parts of the original watermarked image with parts from other watermarked images or with other parts of the original image. For example, if the image contains homogeneous areas such as a wall or a floor and the attacker wishes to hide a smaller object, he may do so by copying other blocks in such a way that the change would be perceptually unnoticeable, but the detector would still recognize a valid watermark on the copied block. The vector quantization and counterfeiting attacks use multiple images that are marked using the same watermark data in order to synthesize fake marked images. These attacks are only possible when directed at block-wise independent watermarking algorithms [30]. The transplantation attack makes even watermarking algorithms with block dependencies vulnerable. It is shown in [31] that deterministic block dependency is not sufficient against a transplantation attack. The proposed algorithm incorporates non-deterministic block dependency, as well as an image-long watermark data sequence. This combination makes the attacks mentioned above ineffective.

Attacks on watermarking for copyright protection are designed to damage the embedded watermark such that it will be undetectable, while still maintaining reasonable image quality. Such attacks may include one or more of the following: (1) a geometric attack such as cropping, rotation, scaling, etc.; (2) a digital-to-analog conversion, such as printing, followed by analog-to-digital conversion by scanning (can also be done by resampling); (3) lossy compression; and (4) duplicating small segments of the picture and deleting others (jitter attack) [25].
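To make the cover-up scenario concrete, the deliberately weak toy scheme below marks every 4x4 block independently with the same parity rule. It is a hypothetical stand-in, not the proposed algorithm (which defeats this attack through non-deterministic block dependency): swapping whole watermarked blocks leaves the verifier satisfied.

```python
KEY = 1  # toy mark: the LSB of each block's top-left pixel must equal KEY

def embed(img, bs=4):
    # Stamp the same mark into every bs x bs block, independently.
    out = [row[:] for row in img]
    for r in range(0, len(out), bs):
        for c in range(0, len(out[0]), bs):
            out[r][c] = (out[r][c] & ~1) | KEY
    return out

def verify(img, bs=4):
    # Block-wise independent check: every block is validated on its own.
    return all((img[r][c] & 1) == KEY
               for r in range(0, len(img), bs)
               for c in range(0, len(img[0]), bs))

img = embed([[50] * 8 for _ in range(8)])

tampered = [row[:] for row in img]
for r in range(4):                        # cover up the top-left block with
    tampered[r][0:4] = img[r + 4][4:8]    # a copy of the bottom-right block

print(verify(tampered))   # -> True: the cover-up passes verification
```

Because every block carries an independently valid mark, a block-granular copy-paste preserves validity; tying each block's mark to its position and to neighboring blocks is what breaks this class of attack.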


It has been shown, then, that several parameters must be considered for each application in order to optimize the use of counter-measures. The goal is to maintain the image quality required for each application while still being robust to potential attacks. That trade-off is discussed in the next two subsections.

2.1.2.2 Image quality

As mentioned, an important objective of a good invisible watermark is minimizing image quality degradation. In the early stages of the project [3], we have shown that for a blind, content-independent algorithm, the trade-off between the security (capacity) of the mark and the negative effect on image quality is straightforward. The watermark is embedded by adding pseudo-random noise to each pixel. Increasing the bit size of the mark increases the variance of the noise, which is the measure of the capacity of that algorithm and relates directly to a better false-detection ratio. However, it also adds significant high-frequency values to the original image, effectively degrading its quality, especially in homogeneous parts of the picture. To avoid such significant degradation, it is possible to increase the security of the mark by making it content dependent [19],[26]. In a content-dependent watermarking system, the embedded data and/or the embedding location are also a function of the cover image. Unlike in content-independent watermarking, where the detection algorithm disregards the cover image data, here all the data is relevant. Thus, faking an unauthentic image using watermark data from a different authentic image would not work. This introduces higher computational complexity, but features a more secure mark without influencing the cover data severely.

2.1.2.3 Computational complexity

Intuitively, it is obvious that in order to apply a more complicated algorithm, more complex embedding and detecting blocks are required. The motivation to keep the computational complexity low depends on the application and on the method of implementation. In real-time applications, computations must be done in a very short time period. The speed and processing power of the computational platform at hand limit the level of algorithm complexity that can be computed in a given time frame. When implementing in hardware, higher complexity requires additional hardware, which means more area and additional cost. Depending on the intended application, complex schemes can be implemented to withstand expected attacks. If, for instance, tamper localization is important, a partition of the image into blocks may be of use. If the marked image is expected to go through lossy compression, one may consider embedding the watermark in the frequency domain, as will be described in the next subsection. Other algorithms employ global and local mean values, temporal dependencies (in video watermarking) and a variety of extra features to enhance their performance [15],[16]. However, each additional feature added to the algorithm increases the computational effort and the hardware resources (such as memory and adders/multipliers) used. Therefore, an optimized scheme will be comprised of the


minimum number of features needed to satisfy the needs of the application it is designed for. 2.1.3 Figures of Merit for Watermarking Systems As is with all security and cryptography related applications, unless mathematically proven otherwise, an attacker could always potentially come up with a way to break the system. Two common ways to characterize the quality of a watermarking system is to test its robustness for known attacks and measure performance using third party benchmarks. A designer can use the system under evaluation to embed a watermark in a series of test images, and then run them through the benchmark and evaluate the performance by observing the quality of detection. The StirMark code, which is used for evaluating the robustness of watermarking algorithms designed for copyright protection applications, applies a series of attacks on a marked image [27]. In addition, it is possible to evaluate robustness to specific attacks by manually adding them to this benchmark. The Checkmark benchmark provides another framework for application-oriented evaluation of watermarking schemes [22]. The use of such independent, third party, evaluation tools provides a good perspective on how well a watermarking system performs with respect to both known attacks and in comparison with other available systems. The evaluation of fragile and semi-fragile watermarking algorithms is more implicit however. An attack on a fragile or semi-fragile system must specifically address the particular algorithm. While an attack on robust algorithms doesnt have to tune the parameters of the attack to the specific algorithm, for fragile and semi-fragile, the 15

16

detector is designed to be highly sensitive to modifications. As a result the attacker must be aware of the specific watermarking procedure in order to avoid modifications that may alert the detector. Therefore, custom designed attacks and theoretical analysis are required and it is not practical to consider a commonly used benchmark. 2.2 Watermark implementations Software vs. Hardware Figure 2.3 shows a scheme of a general watermarking system. The system consists of watermark generation, embedding and detection algorithms. The identifying data (W in Figure 2.3) can be meaningful, like a logo, or it can simply be a known stream of bits. First the identifying data is encoded using a secret key, K. Then the encoded identifying data is embedded into the original image (I in Figure 2.3). The result is the watermarked image. As previously mentioned, the watermark can be visible or invisible, as shown in Figure 2.3. The detector part is at the receiving end. The objective is to extract the identifying data embedded in the received image, using the secret key and an inverse algorithm. Finally, correlating the extracted mark with the original and applying a chosen threshold make the decision. The system can be implemented on either software or hardware platforms, or some combination of the two. A pure software-watermarking scheme may be implemented in a PC environment. Such an implementation is relatively slow, as it shares computational resources and its performance is limited by the operating system. It is unsuitable for realtime applications, for it would be too slow, and it cannot be implemented on portable imaging devices that have limited processing power. On the other hand it can be easily 16


programmed to realize any algorithm of any level of complexity, and can be used on everyday consumer PCs.

Figure 2.3: Scheme of a general watermark system.

A good example of a software watermarking solution was presented by Li [19]. In this work he proposes a software-implemented fragile watermark, embedded in the coefficients of the block DCT. The algorithm is designed for authentication and content integrity verification of JPEG images. It embeds the watermark only in a few selected DCT coefficients of every block in order to minimize the effect on the image. The author directly addresses known issues in similar previous works [20],[28],[30], introducing additional complexity to overcome security gaps. The system utilizes the advantages of software implementation by using resources needed to store image data, transform coefficients and watermark mappings. Using a combination of different security resources, including a non-deterministic mapping of the location of coefficient modulation and block dependencies, the system withstands several attacks without changing the effect on image quality when compared to similar works. Moreover, the computations involved in the embedding process are kept relatively basic, suggesting suitability for future hardware implementation as well.

In contrast to software solutions, hardware implementations offer an optimized, application-specific design incorporating a small, fast and potentially cheap watermark embedder. This is most suitable for real-time applications, where computation time has to be deterministic (unlike software running on a Windows system, for example) and short. Optimizing the marking system hardware enables it to be added into various portable imaging devices. In a full imaging system that includes both the imager and the watermark embedder, the system security is improved, as it is certain that the data entering the system is untouched by any external party. However, hardware implementations usually limit the algorithm complexity and are harder to upgrade. The algorithm must be carefully designed, minimizing any unexpected deficiencies.

2.3 State of the art in hardware watermarking

In the last few years we have seen a significant effort in the field of digital watermarking in hardware. This effort is mainly concentrated on implementing invisible robust watermarking algorithms in hardware. While earlier implementations used more simplistic watermarking techniques, such as LSB replacement [33], more recent


publications have also incorporated more sophisticated embedding procedures [34]. One of the most recent surveys of hardware digital watermarking [32] gives a comprehensive review of hardware watermarking related topics. Table 2.1 is based on this survey and presents many of the studies published in the field. The table shows the variety of different watermarking applications and possible research directions. Still, most of the work is concentrated on the implementation of robust watermarking algorithms for copyright applications and its subsidiaries, such as broadcast monitoring. It also shows that most of the work has been done on spatial domain watermarking. Fragile watermarking, on the other hand, has been given much less attention. The presented implementations mainly focus on porting algorithms previously presented in software to a real-time platform. However, it seems that potential attacks on fragile and semi-fragile algorithms, such as counterfeiting and collusion, are not covered well [33],[42].

The algorithm and system proposed in this thesis offer improved properties in terms of hardware utilization, robustness against known attacks and tolerance to legitimate compression. The reader is referred to the chapters below for a detailed presentation and explanation of these improved properties. This implementation addresses a field of applications not yet treated in previous work. It is demonstrated how the watermark embedder can be naturally integrated as a part of a JPEG compressor. The proposed watermark algorithm is intended to be integrated with real-time, pipelined JPEG encoders. Thus, the combined system could serve both still-image sensors and M-JPEG video recorders. The HW implementation of the watermark embedder consists of simple memory and logic elements. It features minimal image quality degradation and good detection ratios.

Table 2.1: Existing work in hardware digital watermarking research
Research Work            Platform         WM Type         Application  Domain
Garimella et al. [33]    ASIC             Fragile         Image        Spatial
Mohanty et al. [34]      ASIC             Robust-Fragile  Image        Spatial
Tsai and Lu [35]         ASIC             Robust          Image        DCT
Mohanty et al. [36]      ASIC             Robust          Image        DCT
Hsiao et al. [37]        ASIC             Robust          Image        Wavelet
Seo and Kim [38]         FPGA             Robust          Image        Wavelet
Strycker et al. [39]     DSP              Robust          Video        Spatial
Maes et al. [40]         FPGA/ASIC        Robust          Video        Spatial
Tsai and Wu [41]         ASIC             Robust          Video        Spatial
Brunton and Zhao [42]    GPU              Fragile         Video        Spatial
Mathai et al. [43]       ASIC             Robust          Video        Spatial
Vural et al. [44]        Not implemented  Robust          Video        Wavelet
Petitjean et al. [45]    FPGA/DSP         Robust          Video        Fractal


CHAPTER 3: The proposed imaging system

This chapter provides a general description of the proposed imaging system, followed by a more detailed discussion of the different building blocks. Figure 3.1 presents an overview of the system. Naturally, the system is designed as a pipeline, where each stage adds latency but does not compromise the overall throughput. Every stage has an initial transition phase, after which it issues a new valid output on every clock. The latency of the system is the sum of the transition-phase lengths of all the stages in the pipeline.

3.1 Image acquisition and reordering

Image acquisition can be done using any digital imaging device. However, employing a CMOS image sensor provides an opportunity for a higher level of integration. In the most general form, a raster-scan digital pixel output is considered. A dual-port memory buffer, capable of storing 16 rows of raw image data, is used to reorder the pixels into 8x8 blocks. Reordering is performed by outputting the pixels in a different order than that in which they were input. Data is sent forward to the DCT module 55 clocks before 7 complete rows have been stored in the memory buffer. This is done to ensure that valid data for the DCT module is present in the memory buffer on every clock from this point on. The DCT module processes the data that is stored in rows 1-8 of the buffer memory, block after block, while new data is stored in rows 9-16. Once all the data in rows 1-8 has been processed, new data is again written there, while the data in rows 9-16 is being processed.


The buffer memory size and latency depend on the size of the image sensor. For example, for a 1M pixel array having 1024 rows and 1024 columns with 8 bits per pixel, the size of the memory buffer is 128Kbits and the latency would be 8137 clocks.
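The arithmetic behind these figures can be sketched as follows; the variable names are ours, and the reading of the 8137-clock figure as eight row-times minus the 55-clock head start is our interpretation of the quoted numbers.

```python
# Sketch of the buffer-size and latency arithmetic quoted above, for a
# 1024x1024, 8-bit sensor with a 16-row reordering buffer.
ROWS = COLS = 1024
BITS_PER_PIXEL = 8
BUFFER_ROWS = 16          # the dual-port buffer holds 16 rows of raw data
HEAD_START = 55           # clocks by which the DCT module is fed early

buffer_bits = BUFFER_ROWS * COLS * BITS_PER_PIXEL
# The quoted 8137-clock latency corresponds to eight row-times of buffering
# minus the 55-clock head start.
latency_clocks = 8 * COLS - HEAD_START

print(buffer_bits // 1024, "Kbits")   # -> 128 Kbits
print(latency_clocks, "clocks")       # -> 8137 clocks
```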

Figure 3.1: An imaging system with watermarking capabilities

3.2 Compression module

3.2.1 DCT based compression

The following is a brief overview of DCT based compression; for a detailed discussion, the reader is referred to the literature [46],[48]. It has been shown that transforming a spatial image into the DCT domain provides a more compact representation of the information contained in the image [48]. In fact, the DCT representation of natural images is considered to be a good approximation of the Karhunen-Loeve Transform (KLT), which is the most compact representation. Furthermore, the image is represented by its spatial frequency components. The Human Visual System (HVS) was found to be less sensitive to changes in higher frequency components [48]. Therefore, representing the image in the DCT domain enables compression by concentrating the data in fewer coefficients and further identifying those portions of the data that are more visually significant.

Figure 3.2: Example quantization table given in the JPEG standard [4]

Compression is achieved by applying different levels of quantization to different DCT coefficients. Figure 3.2 presents the quantization table suggested in the JPEG standard. Each of the 64 DCT coefficients is assigned a specific quantization value. Quantization is performed by dividing the value of the coefficient by the quantization value assigned to it. Division by a larger quantization value results in a coarser representation of the coefficient (more information is lost in the process). In addition, many of the low-value, higher-frequency coefficients get zeroed out in the process, thus increasing the compression ratio. Different applications use different quantization tables. Of course, if a higher level of compression is required, higher quantization values are used.

In the scope of this work, a very important property of DCT based compression is that once an image has been compressed with a certain quantization level, it can be recompressed with a smaller level of quantization without incurring any further loss of information. Therefore, embedding a watermark in the quantized coefficients ensures that the watermark is robust to DCT compression with a quantization level equal to or less than the quantization level used during the watermarking process [19]. To complete the compression and translate the reduced amount of data into a representation using fewer bits, entropy encoding is applied. In JPEG, entropy encoding involves run-length encoding followed by Huffman coding or arithmetic encoding [4]. Figure 3.3 presents a schematic view of a DCT based compression module in hardware.
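The re-quantization property described above can be illustrated numerically. The values below are ours and, for a clean demonstration, we use the case where the finer quantization step divides the coarser one, which is what makes the second pass exactly lossless.

```python
# Numerical sketch of the re-quantization property discussed above: a
# coefficient quantized with step q loses no further information when
# re-quantized with a finer step q2 that divides q.
def quantize(c, q):
    return round(c / q)        # quantized (integer) coefficient

def dequantize(cq, q):
    return cq * q              # reconstructed coefficient value

coeff = 157
q, q2 = 16, 8                  # q2 divides q

first_pass = dequantize(quantize(coeff, q), q)         # lossy: 157 -> 160
second_pass = dequantize(quantize(first_pass, q2), q2)

print(first_pass, second_pass)   # -> 160 160 (no further loss)
```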

Figure 3.3: Schematic of a DCT based compression encoder

3.2.2 Implementation of the compression module in the proposed system

A DCT transform module was implemented following a design available from Xilinx Inc. [46], based on the architecture described in Figure 3.4. This implementation takes advantage of the separable property of the DCT transform, i.e. the 2-D DCT transform can be calculated as a series of two 1-D DCT transforms, where the first transform is applied in one direction and the second is applied in the orthogonal direction. In matrix form, the 8x8 DCT transform Y of an input block X is given by Y = CXC^T, where C is the matrix of cosine coefficients and C^T is its transpose. In hardware this is realized by storing the output of the first 1-D DCT in a transpose memory buffer line after line, then applying a second 1-D DCT transform on the columns of the result.
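The separability identity Y = CXC^T can be checked with a short sketch: the row-then-column two-pass computation is compared against the direct 2-D definition. The input block is an arbitrary illustrative example of ours.

```python
import math

# Sketch of the separable 8x8 DCT: Y = C X C^T computed as two 1-D passes
# (rows, then columns), checked against the direct 2-D definition.
N = 8
C = [[(math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(A):
    return [list(col) for col in zip(*A)]

X = [[(3 * i + 5 * j) % 17 for j in range(N)] for i in range(N)]

# Two-pass (separable) transform: a 1-D DCT on the rows, then on the columns.
Y_sep = matmul(matmul(C, X), transpose(C))

# Direct 2-D definition for comparison.
Y_dir = [[sum(C[u][x] * C[v][y] * X[x][y]
              for x in range(N) for y in range(N))
          for v in range(N)] for u in range(N)]

err = max(abs(Y_sep[u][v] - Y_dir[u][v]) for u in range(N) for v in range(N))
print(err < 1e-9)   # -> True
```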

Figure 3.4: Schematic of a HW DCT transform module

It is much easier to implement multiplication in HW than division, especially in FPGA devices, where designated multipliers are often available. It is therefore possible to precompute the inverses of all 256 possible quantization values and store them in a ROM. Multiplying the DCT value by the inverse quantization value gives the desired quantized DCT coefficient.

The entropy encoding modules were not necessary for the purposes of this work. The proposed watermark embedder processes the quantized output of the DCT transform module. The entropy encoding stage merely changes the way the data is represented but does not lose any more information in the process. Therefore encoding the watermarked


data only to decode it back to the exact same form does not provide any additional insight.

3.3 Watermark embedding module

3.3.1 The novel watermark embedding algorithm

Presented here is a novel watermarking algorithm that allows a very simple and efficient implementation of the watermark embedder in hardware. The algorithm modulates N cells in each DCT block. The values of the processed DCT block are considered along with the values of its neighbour to the left in order to choose which cells are to be modulated and in what way. A secret pseudo-random sequence, with the same length as the image, serves as the watermark data. It is used to mask the operation of the algorithm and resist attacks. As shown in Figure 3.5, the original image is divided into 8x8 blocks indexed I1-IMxN. After DCT transformation, the DCT blocks J1-JMxN are reorganized into blocks of size 1x64, according to the zigzag order shown in Figure 3.6. Let us consider the watermarking procedure for block I3 of the example image I. Figures 3.7a-b present J3 and J2, the DCT transforms of the blocks I3 and I2 respectively. Figure 3.7c is the secret watermark data W3 generated for I3. The binary matrix P3 in Figure 3.7d is the logical AND between J3, J2 and W3, making the procedure dependent on the neighbouring block J2 and masking it with the secret watermark data W3.


Figure 3.5: 8x8 Block DCT conversion and 1x64 Zigzag reordering

Figure 3.6: Reorganization of the DCT data in the Zigzag order

The matrix P3 is used to embed the watermark in J3. Considering N=2 (this will be assumed in all the examples from now on) and following the zigzag order, the first two non-zero cells in the matrix P3 (marked black in Figure 3.7d), cells 47 and 43, indicate the two cells that are going to be modulated in J3. The remaining cells in P3 determine the LSB of the indicated cells. Still following the zigzag order, the cells are alternately divided into two groups, such that the first cell belongs to the first group, the next cell to the second group, and so forth until all cells have been assigned. In the example, the two groups are marked by different backgrounds. The bits in each group are XORed to determine the corresponding LSB value for the designated two cells. In the example, XORing the cells of each group yields a value of 1 for both. The embedder only needs to change the value of cell 43 from 2 to 3. It is obvious, then, that embedding the mark has only a slight effect on the hosting block data.

The detection procedure is very similar to the embedding procedure. The input is the watermarked image in compressed format. The detector first decodes the data to recover the quantized DCT data. The matrix P3 is created in the exact same manner as in the embedding process. The modulated cells are identified, but instead of modifying them, a comparison is made to verify that they are indeed equal to the expected modulation value. Taking advantage of the software platform, additional processing can aid detection. In particular, it is reasonable to assume that an attacker would attempt to cover continuous surfaces rather than isolated spots, so morphological closure on the output detection image can improve detection ratios.

In principle, increasing N increases the robustness of the algorithm, while at the same time reducing image quality. Simulations show that even for N as low as 2, performance is satisfactory. A block that produces less than two non-zero cells in a matrix P is considered un-markable and is therefore disregarded. Only blocks that are distinctively homogeneous and have very low values for mid and high frequency DCT coefficients are problematic.
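The embedding and detection steps described above can be sketched in software for N=2. This is our own reading of the procedure, with two interpretation choices flagged in the comments: the AND between the DCT blocks and W is read as an AND of non-zero indicators with the watermark bit, and the cell-to-group alternation is taken over all non-target cells in zigzag order. All names and data are illustrative, not the thesis example.

```python
# Software sketch of the embedding and detection steps for N = 2. Two
# interpretation choices (ours): (1) the AND between the two DCT blocks
# and W is read as an AND of non-zero indicators with the watermark bit;
# (2) group alternation runs over all non-target cells in zigzag order.
def expected_marks(J, J_prev, W):
    """First two non-zero cells of P and the two group-XOR LSB values."""
    P = [int(J[i] != 0 and J_prev[i] != 0 and W[i]) for i in range(64)]
    nz = [i for i in range(64) if P[i]]
    if len(nz) < 2:
        return None                      # un-markable block, disregarded
    targets = nz[:2]                     # the two cells to be modulated
    lsb, group = [0, 0], 0
    for i in range(64):
        if i in targets:
            continue                     # target cells carry the mark
        lsb[group] ^= P[i]               # fold P(i) into its group's XOR
        group ^= 1                       # alternate the groups cell by cell
    return targets, lsb

def embed(J, J_prev, W):
    marks = expected_marks(J, J_prev, W)
    if marks:
        for t, b in zip(*marks):
            J[t] = (J[t] & ~1) | b       # write the computed LSB
    return J

def detect(J, J_prev, W):
    marks = expected_marks(J, J_prev, W)
    return marks is None or all((J[t] & 1) == b for t, b in zip(*marks))

# Illustrative data: magnitudes are kept >= 2 so that writing an LSB never
# zeroes a cell (which would alter P at the detector).
J_prev = [3] * 64
W = [i % 2 for i in range(64)]
original = [2 + (i % 5) for i in range(64)]

marked = embed(list(original), J_prev, W)
print(detect(marked, J_prev, W))         # -> True
marked[1] ^= 1                           # tamper with a modulated cell
print(detect(marked, J_prev, W))         # -> False
```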


Figure 3.7: Example DCT data for blocks J3, J2

3.3.2 Implementation of the embedding module in HW

Let us examine how the watermarking of the example block J3 is done in hardware. Figure 3.8 presents a schematic view of the hardware implementation of the watermark embedder. On each clock, the module receives as inputs a 12-bit quantized DCT coefficient, Jb(i), from the Zigzag buffer and a watermark bit Wb(i) from the watermark generator.


Here b is the number of the block within the current frame and i is the number of the DCT cell within the block b. The value of Pb(i)=AND(Jb(i),Jb-1(i),Wb(i)) is calculated. Jb(i) is stored in the DCT data buffer to be used in the watermarking of the next block, Jb+1. Pb(i) is forwarded on to the Current Ind and Val registers. The value of the index i is recorded by the Ind register for the first two non-zero occurrences of Pb(i). Recall that it is now required to divide the remaining Pb values into two groups. This is done by alternately steering Pb(i) into the cells of the Val register, first XORing it with the current value of the register, thus using the associative nature of the XOR operation to progressively calculate the XOR of all the bits in each group. When i wraps around to zero, i.e., all of the block Jb has been read, the Ind register holds the indexes of the cells where the mark is to be embedded, while the Val register holds the LSB values for these cells. These values are then copied to the registers marked Previous to make room for new calculations. The modulator reads the data registered in the Previous Ind and Val registers. If the value of the index i is found in the Ind register, then the corresponding bit in the Val register is written to the LSB of Jb-1(i), yielding JWMb-1(i).
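The per-clock register behaviour just described can be modelled in software as follows. This is a sketch of the dataflow, not the RTL; the alternation of Pb(i) between the two Val cells is one plausible reading of the description, and all data below is illustrative.

```python
# Per-clock model of the Ind/Val register dataflow described above (a
# software sketch, not the RTL). One quantized coefficient J_cur[i] and one
# watermark bit W[i] arrive per clock; the previous block's coefficients
# are assumed to be sitting in the DCT data buffer.
def stream_block(J_cur, J_prev_buffer, W):
    ind = []              # "Ind" register: indices of the cells to modulate
    val = [0, 0]          # "Val" register: running group XORs
    group = 0
    for i in range(64):   # one iteration per clock
        p = int(J_cur[i] != 0 and J_prev_buffer[i] != 0 and W[i])
        if p and len(ind) < 2:
            ind.append(i)      # first two non-zero P(i): record the index
        else:
            val[group] ^= p    # XOR P(i) into the current group's bit
            group ^= 1         # steer the next bit to the other group
    return ind, val            # copied to the "Previous" registers next block

# Illustrative block data (all coefficients non-zero, watermark on odd cells):
J_prev = [3] * 64
W = [i % 2 for i in range(64)]
J_cur = [2 + (i % 5) for i in range(64)]
print(stream_block(J_cur, J_prev, W))    # -> ([1, 3], [0, 0])
```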


Figure 3.8: Schematic description of the watermarking module implementation in HW

3.4 Watermark generation

3.4.1 RNG based watermark generation

Pseudorandom sequences are sequences that have statistical properties similar to true random sequences but still allow regeneration. In general, a pseudorandom sequence has low cross-correlation values between different samples and no repeating patterns; when security is a requirement, the prediction of future samples, or otherwise regeneration of the sequence based on observation, should not be possible. In hardware, a pseudorandom sequence may be generated using a Finite State Machine (FSM). Any sequence generated by an FSM will eventually be periodic, i.e., S_n = S_{n+t} for all n ≥ n0, where S_n is the n-th bit of the sequence and t is the length of the period. For a pseudo Random Number Generator (RNG), the initial state determines the whole sequence. The initial state is determined according to a secret key. A secure RNG is designed in such a way that a potential attacker would have to consider all the possible secret keys in order to regenerate the sequence.

The vast majority of proposed RNGs are based on the use of feedback shift registers (SR), where the input bit is a function of the current shift register state. Different feedback functions can be implemented, such as the Linear Feedback Shift Register (LFSR), with a linear feedback function, and the Feedback with Carry Shift Register (FCSR), which has a non-linear feedback function. The linearity or non-linearity of the RNG determines the mathematical tools used to analyze its output. Sequences from both LFSRs and FCSRs can be easily recovered from their outputs using cryptanalysis [49]. In order to increase the complexity, a common method is to combine different RNG architectures to obtain one that is more secure [49]. A Filtered FCSR (F-FCSR), where the non-linear output of the FCSR core is linearly filtered, is a good example of such a combination [7]. In this case, the output is much more robust to cryptanalysis.

An RNG can be used as a watermark generator. Identical RNGs are implemented on both the embedding side and the receiving side. By sharing the knowledge of the secret key (which is much shorter than sharing knowledge of the whole watermark sequence), both sides are able to generate the same watermark data sequence. As watermarking is a security-oriented application, it is important to have a secure RNG design and size. The size of the RNG (the number of bits in the shift register) is in proportion to the key range. The


maximal proportion between the shift register size and the key range is 2^n - 1, where n is the number of bits in the shift register.

3.4.2 Existing RNG structures

3.4.2.1 The LFSR

In an LFSR the input bit is a linear function of its previous state. In order to analyze the properties of the output sequence, the mathematical tools of finite binary fields are used. Figure 3.9 shows the generalized Fibonacci implementation of an LFSR. The shift register is driven by the sum modulo 2 (XOR) of some bits of the overall shift register state. The polynomial S(x) = Σ_{n≥0} s_n x^n ∈ GF(2)[[x]], where s_n ∈ {0,1}, is the generating function for the output sequence. This function is determined by the connection polynomial q(x) = Σ_{i=1}^{k} q_i x^i - 1, with q_i ∈ {0,1}, and by a polynomial u(x), defined by the initial state. It can be shown that S(x) = u(x)/q(x) [5]. It is clear that the sequence s_n is periodic, since the LFSR has a maximum of 2^k different states. An m-sequence is a sequence with the maximal period of 2^k - 1. If q(x) is a primitive polynomial, then s_n is an m-sequence. As mentioned in [5], m-sequences have very good statistical properties and are distributed uniformly. However, they are not very secure: given 2k bits of the sequence, the Berlekamp-Massey algorithm can be used to regenerate the whole sequence.


Figure 3.9: A generalized Fibonacci implementation of an LFSR circuit
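A minimal software model of the structure in Figure 3.9 can make the m-sequence property concrete. The 4-bit size and tap choice (corresponding to the primitive polynomial x^4 + x^3 + 1) are ours for illustration, giving the maximal period 2^4 - 1 = 15.

```python
# Minimal model of a 4-bit Fibonacci LFSR. With taps corresponding to the
# primitive polynomial x^4 + x^3 + 1, the output is an m-sequence with the
# maximal period 2^4 - 1 = 15 (tap choice is our illustration).
def lfsr_stream(state, taps, nbits):
    """Yield output bits; the feedback is the XOR (sum mod 2) of the taps."""
    mask = (1 << nbits) - 1
    while True:
        out = state & 1
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = ((state >> 1) | (fb << (nbits - 1))) & mask
        yield out

gen = lfsr_stream(state=0b1001, taps=(3, 0), nbits=4)
seq = [next(gen) for _ in range(45)]          # three full periods
print(seq[:15] == seq[15:30] == seq[30:45])   # -> True
```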

3.4.2.2 The FCSR

The Galois mode implementation of an FCSR, shown in Figure 3.10, is an expansion of the LFSR where, instead of a sum modulo 2, a carry from the last summation is added. This introduces non-linearity and enhances the security of the output sequence. The sequence can no longer be analyzed using finite fields; the related structure is the 2-adic integer numbers [5],[47]. A 2-adic integer is formally a power series s = Σ_{n≥0} s_n 2^n, s_n ∈ {0,1}. The set of 2-adic integers is denoted by Z2. A 2-adic integer can be associated to any binary sequence. An important observation is that -1 = Σ_{i≥0} 2^i, which can be understood by considering the result of 1 + Σ_{i≥0} 2^i = 0. Another intuitive association is to digital arithmetic, where Z2 can be associated to an infinitely large 2's complement system. To obtain a strictly (without any transient phase) periodic sequence s, using the FCSR shown above,


we need to consider two co-prime integers p and q, where q must be odd and negative and p<-q.
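An FCSR with connection integer q and an initial state encoding p outputs the 2-adic expansion of p/q, and that expansion can be sketched directly in software. The values p = 1 and q = -11 are our illustration of the conditions just listed (q odd, negative and prime, p and q co-prime, p < -q); the sketch checks that the period equals the multiplicative order of 2 modulo |q|.

```python
# Direct computation of the 2-adic expansion of p/q (what an FCSR with
# connection integer q produces), plus a check that the period equals the
# multiplicative order of 2 modulo |q|. p = 1, q = -11 are illustrative.
def two_adic_bits(p, q, n):
    """First n bits of the 2-adic expansion of p/q (q must be odd)."""
    bits = []
    for _ in range(n):
        s = p & 1              # q odd forces this bit: p = s (mod 2)
        bits.append(s)
        p = (p - s * q) // 2   # peel off the bit and shift
    return bits

def order_of_2_mod(m):
    t, v = 1, 2 % m
    while v != 1:
        v = (v * 2) % m
        t += 1
    return t

p, q = 1, -11
T = order_of_2_mod(-q)
seq = two_adic_bits(p, q, 3 * T)
print(T, seq[:T] == seq[T:2 * T])   # -> 10 True
```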

Figure 3.10: Galois implemented FCSR

The FCSR basically performs the 2-adic division p/q to produce the sequence s. The 2-adic integer q is the connection integer of the FCSR; it determines where an addition device is placed between two cells and where the bit is simply shifted. The integer p is a function of the initial state. The period and statistical properties of the output sequence depend only upon q. If q is odd and p and q are co-prime, the period of s is the order of 2 modulo q, i.e., the smallest integer t such that 2^t ≡ 1 (mod q). In order to achieve the maximal period, q must be a prime number with 2 as a primitive root; that is, 2^(|q|-1) - 1 is divisible by q. In that case, the period T would be equal to (|q|-1)/2. The parameter p is related to the initial state of the generator. The FCSR is not a secure RNG and, similar to the LFSR, available cryptanalysis methods may be used to recover the secret key by observing a short minimal output sequence.

3.4.2.3 The Filtered FCSR (F-FCSR)

As its name suggests, the F-FCSR utilizes an FCSR in addition to a linear filter. The linear filter is a XOR gate with selected bits of the FCSR serving as inputs. It is suggested


that if q_i equals one, then M_{i-1} is connected to the filter, as shown in [7]. In other words, if the output sequence is denoted by s = Σ_{i=0}^{N-1} f_i M_i, then f_i = q_{i+1}, where f_i indicates whether M_i is connected to the filter or not. The addition of the linear filter breaks the 2-adic nature of the FCSR and introduces a new mathematical structure which is neither 2-adic nor linear. This undefined structure makes the F-FCSR robust to all known cryptanalysis methods [7].
3.4.3 RNG based watermark generator design method and implementation

A novel design technique involving a Gollmann cascade with F-FCSR cores was proposed and implemented. The implemented RNG is comprised of several fundamental F-FCSR building blocks connected in series. Utilizing cascades offers a straightforward and simple approach to enhancing the performance of many systems. Shift-register-based RNGs are cascaded by making each register clock-controlled by its predecessor. A Gollmann cascade RNG, introduced in [6], is depicted in Figure 3.11. The important feature of this method of cascading is the use of the XOR function for the coupling of the register clock. Earlier cascades utilizing AND functions resulted in a very low clock rate for the registers further down the cascade.

Originally, cascading was used to complicate the structure of the RNG and make it more secure. Here, we proposed to use cascading to achieve a high level of modularity for the designer. By creating an initial pool of relatively small-sized core RNGs, it is possible to construct a custom-sized generator without any significant design effort. The period of a cascaded generator is the product of the periods of the different cores. When identical


cores are used, T = (T')^l, where T is the period of the cascade, T' is the period of one core and l is the number of cores in the cascade. Using F-FCSR cores in a Gollmann cascade structure makes the best of both concepts: each core is inherently secure, and the design complexity remains simple.

Figure 3.11: A Gollmann cascade RNG

This structure was found to provide a modular tool for designing secure RNGs that may be suitable for a wide variety of applications. In watermarking applications, the modularity of the design method allows for worry-free adjustment of the RNG size according to the specific implementation requirements. The tool was utilized to design a 22-bit RNG composed of two 11-bit cores connected as a Gollmann cascade [50]. In this implementation the periodic binary sequence is used directly as the secret watermark. The detector needs to know the starting frame and the secret key in order to extract the correct watermark. The implemented RNG outputs a binary sequence with a period of 3,568,321 bits. Considering a 256x256 sensor array as an example, the watermark will repeat itself every 54.45 frames.
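The quoted figures can be cross-checked against the period formula T = (T')^l. Note that the per-core period 1889 is our inference from the quoted numbers (1889^2 = 3,568,321), not a value stated in the text.

```python
# Cross-check of the quoted cascade figures using the period formula
# T = (T')^l for l identical cores. The per-core period 1889 is inferred
# from the quoted numbers (1889^2 = 3,568,321), not stated in the text.
core_period = 1889
num_cores = 2
cascade_period = core_period ** num_cores

frame_bits = 256 * 256          # one watermark bit per pixel, 256x256 array
print(cascade_period)                          # -> 3568321
print(round(cascade_period / frame_bits, 2))   # -> 54.45
```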


CHAPTER 4: Implementation, testing and results

4.1 Software implementation and algorithm functionality verification

First, the algorithm performance was evaluated using a Matlab simulation. A sample image is embedded with a watermark according to the proposed algorithm. The cover-up attack is applied to the watermarked sample image. The attacked image is analyzed by the watermark detector, which outputs a detection map. The detection map indicates which blocks of the image are suspected to be inauthentic. The results of a simulation of the proposed algorithm are presented in Figure 4.1. Figure 4.1(a) shows the original 128x128 example image; Figure 4.1(b) is the image after it was compressed and embedded with the watermark. With only 35% of the DCT data being non-zero, the PSNR is still very high at 43.4dB and the difference between the images is practically unnoticeable. The tampered image is shown in Figure 4.1(c). Two areas of the image have been modified after the cover-up attack has been applied to the watermarked sample image. In the upper right corner, the airplane in the original image is removed by copying the contents of adjacent blocks onto the blocks where the airplane is supposed to appear. To an innocent observer the original existence of the airplane in the image is visually undetectable. This is facilitated by the homogeneous nature of the surrounding neighbourhood. A more easily noticeable example of such modifications is shown in the lower left corner of the image, where the reflection of the sun is partially removed. Both modifications would be easily noticed using the detection map created by the WM detector, presented in Figure 4.1(d). This example shows that the watermark is as effective on homogeneous surfaces (where only a small portion of the DCT data is non-zero) as it is on high spatial frequency surfaces.

Figure 4.1: Algorithm Matlab simulation results
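The PSNR figures quoted in this chapter follow the standard definition, PSNR = 20·log10(b/rms), with b the peak signal value and rms the root-mean-square difference between the two images (the same definition used by the MATLAB PSNR helper in Appendix A). A direct Python transcription:

```python
import math

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images,
    given here as flat lists of samples in [0, peak]."""
    if len(a) != len(b):
        raise ValueError("images must have the same size")
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        raise ValueError("images are identical: PSNR is infinite")
    return 20 * math.log10(peak / math.sqrt(mse))

# A uniform error of 0.1 on a [0,1] signal gives rms = 0.1, i.e. 20 dB.
ref = [0.5] * 64
noisy = [0.6] * 64
print(f"{psnr(ref, noisy):.2f} dB")
```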


4.2 Algorithm performance evaluation

The results shown above were achieved using N=2. That means only two cells in each block of DCT data were modulated. Several experiments were conducted to determine the optimal number of cells to modulate. The results, shown in Table 4.1, summarize the effect of changing N along with the level of quantization. The level of quantization is indicated by the average ratio of non-zero cells in a block after quantization. N=4 exhibits slightly better detection ratios, becoming more significant as the quantization level increases. Therefore, a larger N should be considered when an aggressive level of quantization is desired. The cost of increasing N is additional hardware and a reduction in image quality. In terms of hardware the difference is reasonable: it amounts to extra registers and larger multiplexers. As for image quality, Table 4.1 shows that the PSNR difference is less than 0.5dB, which is mostly negligible.
4.2.1 Fragile watermarking and benchmarking

It is important to emphasize that benchmarking a fragile watermarking algorithm is a tricky issue. As mentioned before, for robust algorithms it is possible to utilize a third-party benchmark with relative ease. There, the objective of the benchmark is to apply known attacks on a marked image such that the mark becomes undetectable. The user merely needs to embed a mark, run the marked image through the benchmark and then try to detect the mark.


In fragile algorithms, however, this flow is not practical. The objective of a potential attacker is to make modifications to the marked image without damaging the embedded mark. Therefore, running a marked image through any of the available benchmarks would simply damage the watermark such that the detector concludes (rightfully) that the image has been tampered with. An attack on a fragile watermark must consider the specific watermarking algorithm, as it needs to imitate what the authentic embedder is doing in order to fool the detector.

Figure 4.2: Additional sample images (Monkey and Lena), each showing the original image, the WM image, the tampered image and the detection zones

The evaluation strategy taken in this thesis includes a combination of experimental results from the tamper detection on sample images such as the ones presented in Figure 4.1 and Figure 4.2. In addition, the algorithm is inherently resilient to known attacks against fragile watermarking thanks to inter-block dependency and a non-deterministic choice of the watermarked cells within each block. A detailed proof of the sufficiency of these measures is given in [19].
Table 4.1: N vs. Quantization-Level Tradeoffs

Nonzero Cells [%]   N   PSNR [dB]   Detection Ratio [%]
43.8                2   45.85       82.6
43.8                4   45.59       94.7
28.0                2   41.64       90.7
28.0                4   41.24       90.7
21.0                2   39.22       76.0
21.0                4   38.77       89.3

4.3 Hardware design and verification

The watermark embedder block was described using Verilog HDL. An evaluation board was employed to assess the performance of the algorithm when synthesized to an Altera Cyclone FPGA device. The schematic of the test setup on the evaluation board is given in Figure 4.3. The onboard Cyclone FPGA is used to implement the proposed watermark embedding architecture, as well as the necessary peripheral blocks. The DCT and inverse DCT transform blocks were borrowed from [46]. First, an image is copied onto the onboard frame memory. The memory is used to emulate a digital image sensor: the data output from the memory is treated as if an image sensor had generated it. The data is then DCT transformed, quantized and arranged in the zigzag order according to the procedure described above. A preset watermark sequence is used by the watermark embedder module to embed the watermark in the DCT coefficients of the test image. Finally, the DCT data is de-quantized and rearranged before it is inversely transformed back to the spatial domain. The output is the original image, now containing the watermark.
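The data path just described (DCT, quantize, embed, de-quantize, inverse DCT) can be prototyped in a few dozen lines. In the sketch below, the orthonormal 8x8 DCT, the flat quantization step q=4, and marking the LSBs of two fixed mid-frequency coefficients are all simplifying assumptions for illustration; the thesis embedder selects cells non-deterministically from the RNG and uses a JPEG-style quantization table.

```python
import math, random

N = 8  # block size

def dct_1d(v):
    """Orthonormal DCT-II of an 8-sample vector."""
    return [
        (math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
        * sum(v[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        for k in range(N)
    ]

def idct_1d(c):
    """Inverse (DCT-III) of the orthonormal DCT-II above."""
    return [
        sum(
            (math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
            * c[k] * math.cos(math.pi * (n + 0.5) * k / N)
            for k in range(N)
        )
        for n in range(N)
    ]

def apply_2d(block, f):
    """Apply a 1-D transform separably: rows first, then columns."""
    rows = [f(r) for r in block]
    cols = [f(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def pipeline(block, q=4, marked_cells=((1, 2), (2, 1))):
    """DCT -> quantize -> embed (force LSB of chosen cells) -> de-quantize -> IDCT."""
    coeffs = apply_2d(block, dct_1d)
    quant = [[round(c / q) for c in row] for row in coeffs]
    for (i, j) in marked_cells:          # toy embedder: force LSB to 1
        quant[i][j] |= 1
    dequant = [[c * q for c in row] for row in quant]
    return apply_2d(dequant, idct_1d)

random.seed(1)
block = [[random.randint(0, 255) for _ in range(N)] for _ in range(N)]
out = pipeline(block)
err = max(abs(out[i][j] - block[i][j]) for i in range(N) for j in range(N))
print(f"max pixel error after quantize+embed round trip: {err:.2f}")
```

Re-running the forward DCT and quantization on the output recovers the marked coefficients exactly, which is what lets a detector check the embedded LSBs.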

Figure 4.3: Test setup schematic

4.3.1 Hardware Experimental Results

The Verilog description was verified through HDL simulation and experiments on the evaluation board. The DCT data of a sample image was pre-calculated in software and read into a virtual buffer. The Verilog module reads the DCT data from the virtual buffer the same way it would read the output of the zigzag buffer. The DCT data was embedded with the watermark. Finally, the marked DCT data was stored in a file and analyzed to verify it was marked correctly. Figure 4.4(a) shows the input image to the watermark hardware test setup and Figure 4.4(b) presents the watermarked image at the output. The design was synthesized to an Altera Cyclone EP1C20 FPGA device using the Altera Quartus II design software. In addition, the design was also mapped to an Altera FLEX10KE FPGA so that the results could be compared with those of Agostini et al. [51] for a complete JPEG compressor system.


Table 4.2 summarizes the performance of the three mappings in terms of hardware usage, throughput and latency. The results clearly show that the watermark embedder can easily be added to an existing JPEG compressor, even when that compressor is aimed at low-cost, high-throughput applications. The addition does not affect the desired throughput and requires a negligible amount of additional hardware resources and power compared to the complete system to which it is added.

Figure 4.4: Hardware watermarked image


Table 4.2: FPGA Synthesis Results

Design                                        Logic Cells   Memory Bits   Frequency/Throughput   Latency
WM Embedder (FLEX)                            132           744           187.93 MHz             64
WM Embedder (Cyclone)                         113           744           209.16 MHz             64
Agostini et al. [51] JPEG compressor (FLEX)   4844          7436          39.84 MHz              238

The hardware embedder would add 132 more logic cells, which is a negligible addition of 2.73% to the hardware of the JPEG compressor. The combined system would easily fit in the original FPGA device. In general, any device that is large enough for the implementation of the JPEG compressor would be enough to accommodate the additional hardware required for the watermark embedder block.
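The 2.73% figure follows directly from the synthesis results in Table 4.2:

```python
# Overhead of the watermark embedder relative to the JPEG compressor,
# both in their FLEX mappings (numbers from Table 4.2).
embedder_lcs = 132     # WM Embedder (FLEX)
compressor_lcs = 4844  # Agostini et al. JPEG compressor (FLEX)

overhead_pct = 100 * embedder_lcs / compressor_lcs
print(f"logic-cell overhead: {overhead_pct:.2f}%")
```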
4.4 Physical proof of concept implementation

As part of a commercialization effort and to provide further validation, the proposed imaging system was physically implemented. Figure 4.5 presents the necessary elements required for the physical implementation. The evaluation board shown in Figure 4.6 was utilized as the implementation platform. A CMOS imager with an internal ADC and analog biasing circuitry is employed as the test imager. The onboard FPGA device provides memory and a platform for control signal generation and digital image processing. The evaluation board has an LVDS I/O port for fast communication with a digital frame grabber.


Figure 4.5: A general implementation of an imaging system

Figure 4.6: Mixed signal SoC fast prototyping custom development board

4.4.1 The CMOS Image sensor

Following is a brief description of the properties and mode of operation of the test CMOS image sensor. A 256x256-pixel rolling shutter CMOS image sensor with an internal 12-bit pipelined ADC was borrowed from [52]. There are two important attributes to consider: the rolling shutter operation introduces a non-continuous readout sequence, with a setup phase for performing row reset and analog pixel data readout operations, and the 12-bit pipelined ADC introduces a latency of 6 clock cycles. These two attributes, while requiring special attention, offer insight into the applicability of the proposed system to a common imaging system setup.
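The 6-cycle ADC latency means the readout control must be skewed relative to the sampling sequence. A minimal model of that alignment (the 6-cycle figure is from the text; the alignment scheme itself is an illustrative assumption, not the thesis design):

```python
from collections import deque

ADC_LATENCY = 6  # pipeline stages in the 12-bit ADC (from the text)

def adc_pipeline(samples, latency=ADC_LATENCY, fill=None):
    """Model a pipelined ADC: each input appears at the output `latency`
    clocks later; the first `latency` outputs are invalid (None)."""
    pipe = deque([fill] * latency)
    out = []
    for s in samples:
        pipe.append(s)
        out.append(pipe.popleft())
    return out

# The imager interface must discard the first `latency` outputs and delay
# its pixel-clock / line-enable synchronization signals by the same amount.
line = list(range(256))          # one line of pixel samples
converted = adc_pipeline(line)
valid = converted[ADC_LATENCY:]  # aligned data once the pipeline has filled
```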
4.4.2 Digital signal processing and control

All the digital signal processing and control was performed on the onboard FPGA device. The FPGA device handles I/O communication with multiple devices on the board. A schematic diagram describing the internal design structure of the FPGA is given in Figure 4.7.

Figure 4.7: Internal structure of the FPGA digital design


The CPU interface is responsible for communications with the onboard microcontroller, which in turn handles communications with a PC. The microcontroller accesses the CPU interface in a similar manner to an external memory. This allows the user to change internal FPGA register values online.

The imager interface is in charge of generating control signals for the CMOS imager and of receiving and synchronizing its output pixel data. The control signal generation is based on a fundamental line sequence that is repeated periodically. A line setup phase takes place before the readout of every line. The analog data of every pixel in the line is then sampled by the ADC one pixel at a time. A sample clock and a decoder that controls an analog multiplexer handle the sampling. The imager interface accounts for the sample clock, the decoder input and the latency of the ADC, and generates synchronization signals including pixel clock, line enable and frame enable. The image data can be transmitted to a digital frame grabber without further processing.

In this implementation the encoder/embedder receives as input the pixel data and synchronization signals generated by the imager interface. JPEG requires that the pixel data be input in 8x8 blocks. An input memory buffer is used to reorder the pixels. Its size is 256x16 words. It has a capacity of sixteen lines, allowing alternating read/write operation where eight lines are being written to the memory while the other eight lines are being processed. According to the imager interface operation, the data is written one line at a time with a pause between every two lines. The timing and length of the pause are determined according to the synchronization signals.
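The alternating read/write behaviour of the 16-line input buffer amounts to a ping-pong buffer: while eight lines are written into one half, the other half is read out as 8x8 blocks. The sketch below models only the reordering; word widths, clocks and the pauses between lines are abstracted away, and the row-major block order is an assumption for illustration.

```python
# Ping-pong model of the 256x16 input line buffer: one 8-line half fills
# with raster data while the other half is read out as 8x8 blocks.

WIDTH, HALF = 256, 8

def reorder_to_blocks(lines):
    """Consume raster lines; whenever an 8-line half fills, emit its 8x8 blocks."""
    half = []
    for line in lines:
        assert len(line) == WIDTH
        half.append(line)
        if len(half) == HALF:             # one half of the buffer is now full:
            for x in range(0, WIDTH, 8):  # it is read out while the other fills
                yield [row[x:x + 8] for row in half]
            half = []                     # swap halves

# 16 raster lines (tagged with their (y, x) coordinates) -> 64 blocks.
frame = [[(y, x) for x in range(WIDTH)] for y in range(16)]
blocks = list(reorder_to_blocks(frame))
```

The output memory buffer described below performs the mirror image of this operation, turning 8x8 blocks back into raster lines.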


While the general architecture of the encoder is meant to function as a pipeline, due to the imager mode of operation it is impossible to achieve a completely continuous operation. Instead, the data from the input memory buffer is read eight lines at a time, followed by a pause to wait for the next eight lines to be completely written into the input memory buffer. To comply with the pipelined nature of the encoder, an input enable has been added to all the registers in the design such that it is possible to freeze the module state at any point without loss of information. Finally, after the watermark has been embedded and the watermarked DCT data has been inverse transformed back into spatial pixel data, an output memory buffer is used to reverse the 8x8 block order back to standard rolling shutter. Every element in the encoder has a data output enable signal to facilitate synchronization between different elements in the design. When valid watermarked data is output from the inverse DCT module, a data output enable for the entire encoder is turned on. The data is then written to the output memory buffer, which operates in a similar manner to the input memory buffer except in the reverse direction. Data is read out in chunks of eight lines at a time, with suitable synchronization lines generated for the transmission of the watermarked data to the frame grabber.

Table 4.3 summarizes the resource utilization in the final implementation of the design. The encoding module and its interface account for the main demand for hardware resources. The inclusion of the watermark-embedding module, including the RNG, adds only 113 logic cells (LCs), which is less than 0.9% of the LCs used by the encoding module. The watermark-embedding module also utilizes 744 memory bits to store the DCT values of one block, about 1.1% of the memory used in the overall encoding module.
Table 4.3: Resource utilization by modules in the overall design

Module              Logic Cells   Registers   Memory Bits
Overall design      13471         8307        66280
Imager interface    198           103         0
CPU interface       282           154         0
Encoder + buffers   12573         7905        66280
WM embedder         113           78          744
RNG                 30            28          0
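The sub-1% logic and roughly 1.1% memory overheads quoted in the text follow directly from Table 4.3:

```python
# Watermark-embedder overhead relative to the encoding module (Table 4.3).
encoder = {"lcs": 12573, "mem_bits": 66280}
wm_embedder = {"lcs": 113, "mem_bits": 744}

lc_pct = 100 * wm_embedder["lcs"] / encoder["lcs"]
mem_pct = 100 * wm_embedder["mem_bits"] / encoder["mem_bits"]
print(f"logic cells: {lc_pct:.2f}% of encoder, memory bits: {mem_pct:.2f}%")
```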

Figure 4.8: Sample output images from the physically implemented system: (a) output image with watermark; (b) output image without watermark

4.4.3 Output image capture

A National Instruments (NI) PCI-1428 digital frame grabber card is used to capture the image output from the evaluation board. NI LabVIEW software is used for the analysis and presentation of the captured data. An example image taken by the implemented system is given in Figure 4.8. The system offers the option to produce a watermarked image and a reference image that only goes through DCT and IDCT for the purpose of comparison.


CHAPTER 5: Conclusion

5.1 Thesis summary

This thesis was conducted as part of an ongoing hardware watermarking I2I project. At the starting point of the thesis work, a prototype had already been designed. This prototype implemented an elementary watermarking algorithm in the spatial domain [3]. However, in order to provide a more commercially appealing implementation it was determined that a more sophisticated design must be realized. An extensive literature survey was conducted to explore existing watermarking methods and applications. It was found that while much work has been done in the field of watermarking in software, watermarking in hardware is still an emerging field of research. Watermark embedding in hardware introduces an opportunity to enhance the security of a real-time imaging system, but is also a design challenge: the watermark algorithm must incorporate security features yet have low complexity. The objective was to come up with a concept that would allow the addition of a secure watermark in hardware without resulting in performance degradation and/or significantly increased costs. Watermarking in the DCT domain was identified as having the potential to accommodate these requirements. In most existing watermarking algorithms in the DCT domain, the DCT transform and quantization are the most hardware-intensive elements. Therefore, the addition of a watermark embedding module based on a low complexity algorithm would require very little overhead.


A novel watermarking algorithm for the DCT domain was designed, tested and implemented. As expected, the implementation of the algorithm resulted in a hardware increase of a mere 1%. As DCT compression is widely used in most common imaging systems, it is possible to apply the proposed design with little to no additional cost. As illustrated in the proof-of-concept physical implementation, the design is expected to fit either discrete systems with separate chips for the imager and digital processing or systems integrating the complete design in an ASIC. The work conducted during the course of this thesis has contributed to the publication of four papers, with the most recent results summarized in a fifth paper awaiting review. The findings of the extensive literature survey were presented in a paper that appeared in the 2008 IJ ITK [53]. The novel watermarking algorithm was presented at the IEEE ICECS 2008 conference along with our proposed RNG design technique [54], [55]. Hardware synthesis and experimental results were described in a paper on the application of the proposed system to publicly deployed surveillance cameras; the paper was submitted to the IEEE Transactions on Information Forensics and Security.
5.2 Issues that still need attention and future work

To allow a more in-depth evaluation of the proposed system it is important to achieve a more reliable implementation of the DCT and IDCT hardware modules. Currently, the implemented modules have only limited accuracy. The result of this limited accuracy is a significant distortion of the image due solely to the operation of the DCT and IDCT modules. An effort has been made to achieve better performance, however to this point with no satisfactory results. Because hardware implementation of the compression modules was only approached as a subsidiary assignment in this thesis (these are readily available for purchase and hold no novelty), it was decided to make use of the imperfect modules in order to get basic initial results. Presentable results were obtained by applying input images with reduced quality; hence it is still possible to demonstrate the functionality of the physically implemented system. Once the DCT and IDCT modules have been improved, further testing can be performed. In particular, it is recommended that the detection performance be evaluated under real hardware conditions. Further analysis should consider the effects of data transmission over the communication channel, wired or wireless, as well as examine multiple quantization tables and levels. As a semi-fragile watermark technique that addresses mainly tamper detection, robustness against potential attacks is mainly based on mathematical analysis of the nature of the algorithm. While this analysis is a significant indication of the algorithm's robustness, it may be useful to custom design a test bench with known attacks on semi-fragile watermarking for further validation.
5.3 Possible future directions for development

One of the important features of the proposed technology is its compatibility with a wide range of implementation platforms. Many real-time applications use dedicated digital signal processing (DSP) processors for the implementation of digital processing algorithms. These processors are microcontrollers with powerful arithmetic logic units (ALUs) specialized for speeding up signal-processing-related computations. It is expected that the proposed watermarking system can be accommodated on a DSP-based platform while maintaining performance. Open source modules are available for both still and video DCT-based compression standards, making the main challenge the integration of the algorithm on the DSP platform along with the other existing components of the compression module. The proposed algorithm has been designed to be compatible with DCT-based compression standards. At the prototype stage, work was concentrated on still-image compression. However, it has also been determined that the algorithm is compatible with DCT-based video compression standards. Future work will include the implementation of an imaging system employing DCT-based video compression, e.g. MJPEG, MPEG-x or H.26x. A video-standard-based system will also provide a chance to introduce temporal security features in the watermark generation unit.


References

[1] V. M. Potdar, S. Han and E. Chang, "A survey of digital image watermarking techniques," in Proc. 3rd IEEE Int. Conf. on Industrial Informatics (INDIN '05), Aug. 2005, pp. 709-716.
[2] O. Yadid-Pecht and R. Etienne-Cummings, CMOS Imagers: From Phototransduction to Image Processing, Kluwer Academic Publishers.
[3] G. R. Nelson, G. A. Jullien and O. Yadid-Pecht, "CMOS image sensor with watermarking capabilities," in Proc. IEEE Int. Symp. on Circuits and Systems (ISCAS '05), vol. 5, Kobe, Japan, May 2005, pp. 5326-5329.
[4] ISO/IEC JTC1/SC2/WG10, Digital Compression and Coding of Continuous-Tone Still Images Draft, International Standard 10918-1, Jan. 10, 1992.
[5] F. Arnault, T. Berger and A. Necer, "A new class of stream ciphers combining LFSR and FCSR architectures," in Advances in Cryptology - INDOCRYPT 2002, Lecture Notes in Computer Science, no. 2551, Springer-Verlag, 2002, pp. 22-33.
[6] D. Gollmann, "Pseudo-random properties of cascade connections of clock-controlled shift registers," in Advances in Cryptology: Proceedings of Eurocrypt 84, Lecture Notes in Computer Science, vol. 209, Berlin: Springer-Verlag, 1985, pp. 93-98.
[7] F. Arnault and T. P. Berger, "Design and properties of a new pseudorandom generator based on a filtered FCSR automaton," IEEE Trans. Computers, vol. 54, no. 11, pp. 1374-1383, Nov. 2005.
[8] S. P. Mohanty, "Digital Watermarking: A Tutorial Review," URL: http://www.csee.usf.edu/~smohanty/research/Reports/WMSurvey1999Mohanty.pdf
[9] F. Mintzer, G. Braudaway and M. Yeung, "Effective and ineffective digital watermarks," in Proc. IEEE Int. Conf. Image Process., vol. 3, 1997, pp. 9-12.
[10] C. T. Li, D. C. Lou and T. H. Chen, "Image authenticity and integrity verification via content-based watermarks and a public key cryptosystem," in Proc. IEEE Int. Conf. on Image Processing, Vancouver, Canada, Sep. 2000, vol. III, pp. 694-697.
[11] S. P. Mohanty et al., "A DCT Domain Visible Watermarking Technique for Images," in Proc. IEEE Int. Conf. on Multimedia and Expo, New York City, NY, USA, July 30 - August 2, 2000.
[12] M. J. Tsai and H. Y., "Wavelet Transform Based Digital Watermarking for Image Authentication," in Proc. Fourth Annual ACIS Int. Conf. on Computer and Information Science, 2005.
[13] P. Meerwald and S. Pereira, "Attacks, applications and evaluation of known watermarking algorithms with Checkmark," SPIE Electronic Imaging, vol. 4675.
[14] F. A. P. Petitcolas, R. J. Anderson and M. G. Kuhn, "Attacks on copyright marking systems," in Information Hiding: 2nd Int. Workshop, Lecture Notes in Computer Science, vol. 1525, Berlin: Springer-Verlag, 1998, pp. 218-238.
[15] K. Tanaka, Y. Nakamura and K. Matsui, "Embedding Secret Information into a Dithered Multi-level Image," in Proc. IEEE Military Communications Conference, 1990, pp. 216-220.
[16] F. Petitcolas, R. J. Anderson and M. G. Kuhn, "Information Hiding - A Survey," Proc. IEEE, vol. 87, no. 7, pp. 1062-1078, Jul. 1999.
[17] R. G. van Schyndel, A. Z. Tirkel and C. F. Osborne, "A digital watermark," in Proc. IEEE Int. Conf. Image Processing, Austin, TX, 1994, vol. 2, pp. 86-90.
[18] R. B. Wolfgang and E. J. Delp, "A watermark for digital images," in Proc. IEEE Int. Conf. Image Processing, Lausanne, Switzerland, 1996, pp. 219-222.
[19] C. T. Li, "Digital fragile watermarking scheme for authentication of JPEG images," IEE Proc. Vis. Image Signal Processing, vol. 151, no. 6, pp. 460-466, 2004.
[20] M. Holliman and N. Memon, "Counterfeiting attacks on oblivious block-wise independent invisible watermarking schemes," IEEE Trans. Image Process., vol. 9, no. 3, pp. 432-441, 2000.
[21] M. Barni, F. Bartolini and A. Piva, "Improved Wavelet-Based Watermarking Through Pixel-Wise Masking," IEEE Trans. Image Proc., vol. 10, no. 5, pp. 783-791, May 2001.
[22] P. Meerwald and S. Pereira, "Attacks, applications and evaluation of known watermarking algorithms with checkmark," in Proc. of SPIE, Electronic Imaging, Security and Watermarking of Multimedia Contents IV, 2002.
[23] Z. Wang, A. C. Bovik and L. Lu, "Why is image quality assessment so difficult?," in Proc. IEEE Int. Conf. Acoustics, Speech and Sig. Proc., 2002, vol. 4, pp. 3313-3316.
[24] T. H. Tsai and C. Y. Lu, "Watermark embedding and extracting method and embedding hardware structure used in image compression system," US Patent 6 993 151, 2006.
[25] F. A. P. Petitcolas, R. J. Anderson and M. G. Kuhn, "Attacks on copyright marking systems," in Information Hiding: 2nd Int. Workshop, Lecture Notes in Computer Science, vol. 1525, Berlin: Springer-Verlag, 1998, pp. 218-238.
[26] S. P. Mohanty, N. Ranganathan and R. K. Namballa, "VLSI implementation of invisible digital watermarking algorithms towards the development of a secure JPEG encoder," in Proc. IEEE Workshop Signal Processing Systems, 2003, pp. 183-188.
[27] M. Kutter and F. A. P. Petitcolas, "A fair benchmark for image watermarking systems," in Proc. 11th Int. Symp. Electron. Imaging, San Jose, CA: IS&T and SPIE, 1999, vol. 3657.
[28] P. S. L. M. Barreto, H. Y. Kim and V. Rijmen, "Toward secure public-key blockwise fragile authentication watermarking," IEE Proc. Vision Image Signal Process., vol. 148, no. 2, pp. 57-62, 2002.
[29] P. W. Wong and N. Memon, "Secret and public key authentication watermarking schemes that resist vector quantization attack," in Proc. SPIE Int. Soc. Opt. Eng., 2000, vol. 3971, pp. 417-427.
[30] J. Fridrich, M. Goljan and N. Memon, "Further attack on Yeung-Mintzer watermarking scheme," in Proc. SPIE Int. Soc. Opt. Eng., 2000, vol. 3971, pp. 428-437.
[31] G. Doerr, "Security Issue and Collusion Attacks in Video Watermarking," Ph.D. dissertation, Universite de Nice Sophia-Antipolis, France, 2005.
[32] E. Kougianos, S. P. Mohanty and R. N. Mahapatra, "Hardware assisted watermarking for multimedia," Elsevier IJ Comp. and Elect. Eng., vol. 35, no. 2, pp. 339-358, 2009.
[33] A. Garimella, M. V. V. Satyanarayana, P. S. Murugesh and U. C. Niranjan, "ASIC for Digital Color Image Watermarking," in Proc. 11th IEEE Dig. Sig. Proc. Workshop, 2004, pp. 292-295.
[34] S. P. Mohanty, E. Kougianos and N. Ranganathan, "VLSI architecture and chip for combined invisible robust and fragile watermarking," IET Comp. & Dig. Techniques (CDT), vol. 1, no. 5, pp. 600-611, 2007.
[35] T. H. Tsai and C. Y. Lu, "A Systems Level Design for Embedded Watermark Technique using DSC Systems," in Proc. IEEE Int. Workshop on Intelligent Sig. Proc. and Comm. Sys., 2001.
[36] S. P. Mohanty, N. Ranganathan and K. Balakrishnan, "A Dual Voltage-Frequency VLSI Chip for Image Watermarking in DCT Domain," IEEE Trans. Circuits and Systems II (TCAS-II), vol. 53, no. 5, pp. 394-398, 2006.
[37] S. F. Hsiao, Y. C. Tai and K. H. Chang, "VLSI Design of an Efficient Embedded Zerotree Wavelet Coder with Function of Digital Watermarking," IEEE Trans. Consumer Electron., vol. 46, no. 3, pp. 628-636, 2000.
[38] Y. H. Seo and D. W. Kim, "Real-Time Blind Watermarking Algorithm and its Hardware Implementation for Motion JPEG2000 Image Codec," in Proc. 1st Workshop on Embedded Sys. for Real-Time Multimedia, 2003, pp. 88-93.
[39] L. D. Strycker, P. Termont, J. Vandewege, J. Haitsma, A. Kalker, M. Maes and G. Depovere, "An Implementation of a Real-time Digital Watermarking Process for Broadcast Monitoring on a Trimedia VLIW Processor," in Proc. IEE Int. Conf. Image Proc. and Its Applications, vol. 2, 1999, pp. 775-779.
[40] M. Maes, T. Kalker, J. P. M. G. Linnartz, J. Talstra, G. F. G. Depovere and J. Haitsma, "Digital Watermarking for DVD Video Copyright Protection," IEEE Sig. Proc. Mag., vol. 17, no. 5, pp. 47-57, 2000.
[41] T. H. Tsai and C. Y. Wu, "An Implementation of Configurable Digital Watermarking Systems in MPEG Video Encoder," in Proc. IEEE Int. Conf. Consumer Electron., 2003, pp. 216-217.
[42] A. Brunton and J. Zhao, "Real-time Video Watermarking on Programmable Graphics Hardware," in Proc. Canadian Conf. on Elec. and Comp. Eng., 2005, pp. 1312-1315.
[43] N. J. Mathai, A. Sheikholeslami and D. Kundur, "VLSI Implementation of a Real-Time Video Watermark Embedder and Detector," in Proc. IEEE Int. Symp. Circ. and Sys., vol. 2, 2003, pp. 772-775.
[44] S. Vural, H. Tomii and H. Yamauchi, "Video Watermarking For Digital Cinema Contents," in Proc. 13th European Sig. Proc. Conf., 2005, pp. 303-304.
[45] G. Petitjean, J. L. Dugelay, S. Gabriele, C. Rey and J. Nicolai, "Towards Real-time Video Watermarking for Systems-On-Chip," in Proc. IEEE Int. Conf. Multimedia and Expo, vol. 1, 2002, pp. 597-600.
[46] G. Hawkes, "Video Compression Using DCT," Xilinx, Inc., Apr. 2008. [Online]. Available: http://www.xilinx.com/support/documentation/topicaudiovideoimageprocess_compression.htm
[47] S. Anand and G. V. Ramanan, "Periodicity, complementarity and complexity of 2-adic FCSR combiner generators," in Proc. ACM Symp. on Information, Computer and Communications Security (ASIACCS '06), Taipei, Taiwan, 2006.
[48] A. C. Bovik (Ed.), Handbook of Image and Video Processing, Academic Press, 2005.
[49] B. Schneier, Applied Cryptography, 2nd ed., New York: Wiley, 1996.
[50] X. Li, Y. Shoshan, A. Fish, G. A. Jullien and O. Yadid-Pecht, "A Simplified Approach for Designing Secure Random Number Generators in HW," in Proc. IEEE Int. Conf. on Electronics, Circuits and Systems, Malta, Sept. 2008.
[51] L. V. Agostini, I. S. Silva and S. Bampi, "Multiplierless and fully pipelined JPEG compression soft IP targeting FPGAs," Microprocessors and Microsystems, vol. 31, no. 8, pp. 487-497, Feb. 2006. [Online]. Available: http://dx.doi.org/10.1016/j.micpro.2006.02.002
[52] S. Hamami, L. Fleshel and O. Yadid-Pecht, "CMOS Image Sensor Employing 3.3 V 12-bit 6.3 MS/s Pipelined ADC," Sensors and Actuators A: Physical, vol. 134, no. 1, pp. 119-125, 2007.
[53] Y. Shoshan, A. Fish, X. Li, G. A. Jullien and O. Yadid-Pecht, "VLSI Watermark Implementations and Applications," IJ Information and Knowledge Technologies, vol. 2, 2008.
[54] Y. Shoshan, A. Fish, G. A. Jullien and O. Yadid-Pecht, "Hardware Implementation of a DCT Watermark for CMOS Image Sensors," in Proc. IEEE Int. Conf. Elec., Circ. and Syst., Malta, 2008, pp. 368-371.
[55] X. Li, Y. Shoshan, A. Fish, G. A. Jullien and O. Yadid-Pecht, "A simplified approach for designing secure Random Number Generators in HW," in Proc. IEEE Int. Conf. Elec., Circ. and Syst., Malta, 2008, pp. 372-375.


APPENDIX A: A.1. Simulation Testbench and peripherals A.1.1. Simulation Envelope


NEW_HW_Sim.ma

MATLAB CODE

% This code simulates the proposed watermarking system nc = 2; %number of coefficients marked I = double(imread('testimage.bmp')); imshow(uint8(I)); q = 4; % quantization factor % Step 2 load WM_data; %load saved wm data sequence [J H] = embed_hardware(I,A,q,nc); I2 = dct2im(J,q); figure('Name',['WM Image'],'NumberTitle','off') %watermarked image imshow(uint8(I2)); J2 = im2dct(I,q); I21 = dct2im(J,q); figure('Name',['Compressed Image'],'NumberTitle','off') %original after % decompression imshow(uint8(I21)); %% simulate a cover-up attack I3 = copyblock(I2,[9 26],[7 26],[2 4]); I3 = copyblock(I3,[27 16],[27 4],[6 3]); I3 = copyblock(I3,[27 13],[27 7],[6 3]); I3 = copyblock(I3,[27 18],[27 10],[6 2]); I3 = copyblock(I3,[27 13],[27 12],[6 1]); I3 = copyblock(I3,[27 13],[27 1],[6 3]); figure('Name',['Tampered Image'],'NumberTitle','off') imshow(uint8(I3)); [C LOC] = detect_hardware(I3,A,q,H,nc); LOC = reshape(LOC,32,32); se = strel('diamond',1); % % LOC = imclose(LOC,se); %optional closing stage to enhance detection figure('Name',['Detection Zones'],'NumberTitle','off') imshow(LOC);

59

60

PSNR(I/255,double(uint8(I2))/255); CompRatio = mean(sum(J ~= 0))/64 end copyblock.m % this function simulates the cover up attack function Iout = copyblock(Iin,Bs,Bt,Bsize) % calculate coordinates mhs = (Bs(1)-1)*8 + 1; mls = (Bs(1)+Bsize(1)-1)*8; nls = (Bs(2)-1)*8 + 1; nrs = (Bs(2)+Bsize(2)-1)*8; mht = (Bt(1)-1)*8 + 1; mlt = (Bt(1)+Bsize(1)-1)*8; nlt = (Bt(2)-1)*8 + 1; nrt = (Bt(2)+Bsize(2)-1)*8; Iout = Iin; Iout(mht:mlt,nlt:nrt) = Iin(mhs:mls,nls:nrs); PSNR.m function PSNR(A,B) % % % % % % if A == B error('Images are identical: PSNR has infinite value') end max2_A = max(max(A)); PSNR = 20 * log10(b/rms) where b is the largest possible value of the signal (typically 255 or 1), and rms is the root mean square difference between two images. The PSNR is given in decibel units (dB), which measure the ratio of the peak signal and the difference between two images.


max2_B = max(max(B));
min2_A = min(min(A));
min2_B = min(min(B));
if max2_A > 1 | max2_B > 1 | min2_A < 0 | min2_B < 0
    error('input matrices must have values in the interval [0,1]')
end
err = A - B;
decibels = 20*log10(1/(sqrt(mean(mean(err.^2)))));
disp(sprintf('PSNR = +%5.2f dB',decibels))
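For reference, the PSNR computation above can be sketched in Python. This is a minimal re-implementation of the MATLAB function (not the thesis code itself), assuming images are nested lists of floats normalized to [0,1] so that the peak value b is 1:

```python
import math

def psnr(a, b):
    """Peak signal-to-noise ratio (dB) between two equal-size images.

    a, b are nested lists of floats in [0, 1]; the peak value is 1,
    matching the MATLAB function above.
    """
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    if flat_a == flat_b:
        raise ValueError("Images are identical: PSNR is infinite")
    # mean squared error, then PSNR = 20*log10(peak / rms)
    mse = sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)
    return 20 * math.log10(1 / math.sqrt(mse))
```

For example, two images differing by 0.1 everywhere give an RMS error of 0.1 and therefore a PSNR of 20 dB.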

A.2. Compression/Decompression
im2dct.m

function y=im2dct(x,quality)
% im2dct receives an image x and a quality factor quality as inputs.
% The output is the 8x8 DCT transformed image; the coefficients are
% reorganized in the zigzag order and quantized with a pre-set
% quantization table.
error(nargchk(1,2,nargin));
if nargin<2; quality=1; end
% Compute DCTs of 8x8 blocks and quantize the coefficients.

% quantization table (rows reconstructed from the matching table in
% dct2im.m, since the encoder and decoder must share the same table)
Q = [1 1 1 1 1 1 1 1;
     1 1 1 2 2 3 5 5;
     1 1 1 1 3 3 4 5;
     1 1 1 1 4 3 4 5;
     1 1 2 2 4 4 5 1;
     2 3 3 4 6 5 1 1;
     3 3 3 4 6 1 1 1;
     3 3 3 3 1 1 1 1] * quality;

order = [1  9  2  3 10 17 25 18 11  4  5 12 19 26 33 41 ...
         34 27 20 13  6  7 14 21 28 35 42 49 57 50 43 36 ...
         29 22 15  8 16 23 30 37 44 51 58 59 52 45 38 31 ...
         24 32 39 46 53 60 61 54 47 40 48 55 62 63 56 64];
fun = @dct2;
J = blkproc(x,[8 8],fun);              % perform 8x8 block dct
J = blkproc(J,[8 8],'round(x./P1)',Q); % quantize the result
% reorder the coefficients with the zigzag pattern
J = im2col(J,[8 8],'distinct');        % organize 8x8 blocks in 1x64 columns
J = J(order,:);
y = J;

dct2im.m

function y=dct2im(J,quality)
% dct2im receives the transformed and quantized image J and performs
% inverse quantization and transform to output the reconstructed image
error(nargchk(1,2,nargin));
if nargin<2; quality=1; end
% quantization table
Q = [1 1 1 1 1 1 1 1;
     1 1 1 2 2 3 5 5;
     1 1 1 1 3 3 4 5;
     1 1 1 1 4 3 4 5;
     1 1 2 2 4 4 5 1;
     2 3 3 4 6 5 1 1;
     3 3 3 4 6 1 1 1;
     3 3 3 3 1 1 1 1] * quality;
% reorder column elements back from zigzag format


inv_order = [1  3  4 10 11 21 22 36  2  5  9 12 20 23 35 37 ...
             6  8 13 19 24 34 38 49  7 14 18 25 33 39 48 50 ...
             15 17 26 32 40 47 51 58 16 27 31 41 46 52 57 59 ...
             28 30 42 45 53 56 60 63 29 43 44 54 55 61 62 64];
J = J(inv_order,:);
J = col2im(J,[8 8],[256 256],'distinct'); % arrange 64-element columns into 8x8 blocks
J = blkproc(J,[8 8],'x.*P1',Q);           % de-quantize the coefficients
fun = @idct2;
I = blkproc(J,[8 8],fun);                 % inverse transform
y = I;
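The zigzag `order` and `inv_order` tables are fixed permutations of 1..64, and the inverse table can be derived from the forward one rather than typed by hand. A Python sketch (0-indexed; the helper and constant names `inverse_permutation`, `ORDER`, `INV_ORDER` are mine, not from the thesis):

```python
def inverse_permutation(order):
    """Given order[k] = source index of the k-th zigzag element,
    return inv such that order[inv[i]] == i (all 0-indexed)."""
    inv = [0] * len(order)
    for k, src in enumerate(order):
        inv[src] = k
    return inv

# forward zigzag scan of an 8x8 block stored column-major (MATLAB layout),
# converted to 0-indexed from the table in im2dct.m
ORDER = [x - 1 for x in
         [1, 9, 2, 3, 10, 17, 25, 18, 11, 4, 5, 12, 19, 26, 33, 41,
          34, 27, 20, 13, 6, 7, 14, 21, 28, 35, 42, 49, 57, 50, 43, 36,
          29, 22, 15, 8, 16, 23, 30, 37, 44, 51, 58, 59, 52, 45, 38, 31,
          24, 32, 39, 46, 53, 60, 61, 54, 47, 40, 48, 55, 62, 63, 56, 64]]

INV_ORDER = inverse_permutation(ORDER)
```

Applying `ORDER` and then `INV_ORDER` returns every coefficient to its original position, which is exactly the round trip performed by im2dct.m followed by dct2im.m.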

A.3. Algorithm Implementation

A.3.1. Embedding


embed_hardware.m

function [y H] = embed_hardware(I,A,q,nc)
J = im2dct(I,q);   % dct, quantization, zigzag
NumOfC = nc;
fun = @findh;
Neg = J < 0;
% perform logical and between adjacent blocks
Jfun = J;
Jfun(:,2:end) = and(J(:,1:end-1),J(:,2:end));
Jfun(:,1) = 1;
% find the location of the two highest indexes of non-zero cells
H = blkproc(Jfun,[64 1],fun);
H = H - 1;
S = zeros(NumOfC,60/NumOfC);
M = zeros(NumOfC,60/NumOfC);


P = zeros(1,60);
Sm = 0:NumOfC:60-NumOfC;
% embedding sequence
for m = 2:32
    b = m;
    for n = 2:32
        b = m + (n - 1)*32;
        P(1:end) = 0;
        P(1:H(NumOfC,b)) = Jfun(1:H(NumOfC,b),b).*A(1:H(NumOfC,b),b);
        for s = 1:NumOfC
            S(s,:) = P(Sm+s);
            if CheckLSB(J(H(s,b) + 1,b)) ~= mod(sum(S(s,:)),2)
                J(H(s,b) + 1,b) = toggle2(J(H(s,b) + 1,b));
            end % LSB check
        end % s loop
    end
end
y = J;

CheckLSB.m

function y=CheckLSB(x)
C = bitget(uint16(abs(x)),1:16);
y = C(1);

toggle2.m

function j = toggle2(x)
if mod(x,2)
    x = x - 1;
else
    x = x + 1;
end
if x == 0
    x = 2;


end
j = x;

findh.m

% this function is responsible for finding the two highest indexes of
% non-zero cells in the block
function h = findh(J)
n = 2;
h = 0:n-1;
h = n - h';
count = 1;
for i = 1:64
    if J(65-i)
        h(count) = 65-i;
        count = count + 1;
        if count == (n + 1)
            return
        end
    end
end
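Together, CheckLSB and toggle2 force the LSB of a quantized coefficient to a target parity without ever producing a zero coefficient (a zero would be indistinguishable from an unmarked cell). A Python sketch of the same logic, assuming integer coefficients (the function names `check_lsb`, `toggle`, `embed_bit` are mine):

```python
def check_lsb(x):
    # LSB of the magnitude of a coefficient (mirrors CheckLSB.m)
    return abs(int(x)) & 1

def toggle(x):
    # Flip the LSB by +/-1, avoiding a zero result (mirrors toggle2.m):
    # odd values step down, even values step up, and 0 is replaced by 2
    x = x - 1 if x % 2 else x + 1
    return 2 if x == 0 else x

def embed_bit(coeff, target_parity):
    # Force check_lsb(coeff) == target_parity, as the embedding loop does
    return coeff if check_lsb(coeff) == target_parity else toggle(coeff)
```

Note that Python's `%` matches MATLAB's `mod` for negative numbers (`-1 % 2 == 1`), so negative coefficients are toggled the same way in both versions.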

A.3.2. Detection
detect_hardware.m

function [C LOC] = detect_hardware(I,A,q,H,nc)
J = im2dct(I,q);
% set the number of marked coefficients
NumOfC = nc;
fun = @findh;
Jfun = J;
Jfun(:,2:end) = and(J(:,1:end-1),J(:,2:end));
Jfun(:,1) = 1;
H = blkproc(Jfun,[64 1],fun);
H = H - 1;


HD = zeros(size(H));
S = zeros(NumOfC,60/NumOfC);
M = zeros(NumOfC,60/NumOfC);
P = zeros(1,60);
Sm = 0:NumOfC:60-NumOfC;
for m = 2:32
    b = m;
    for n = 2:32
        b = m + (n - 1)*32;
        P(1:end) = 0;
        P(1:H(NumOfC,b)) = Jfun(1:H(NumOfC,b),b).*A(1:H(NumOfC,b),b);
        for s = 1:NumOfC
            S(s,:) = P(Sm+s);
            HD(s,b) = CheckLSB(J(H(s,b)+1,b)) ~= mod(sum(S(s,:)),2);
        end % s loop
    end
end
C = 1 - sum(sum(HD))/4096;
LOC = sum(HD)>0;
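detect_hardware recomputes the same parity that embed_hardware inserted; a coefficient whose stored LSB disagrees with the recomputed parity flags its block as tampered. A toy Python sketch of this round trip on a single block — the helpers below are simplified stand-ins for illustration, not the thesis code:

```python
def parity_of(selected_bits):
    # parity of the watermark bits selected for one coefficient
    return sum(selected_bits) % 2

def embed_block(coeffs, wm_bits, slot):
    # store the watermark parity in the LSB of coeffs[slot]
    out = list(coeffs)
    p = parity_of(wm_bits)
    if (abs(out[slot]) & 1) != p:
        out[slot] += 1 if out[slot] % 2 == 0 else -1
        if out[slot] == 0:       # never leave a zero coefficient
            out[slot] = 2
    return out

def detect_block(coeffs, wm_bits, slot):
    # True if the stored parity matches the expected watermark parity
    return (abs(coeffs[slot]) & 1) == parity_of(wm_bits)
```

Any modification that changes the marked coefficient's LSB (such as the cover-up attack simulated in NEW_HW_Sim.m) makes the detection check fail for that block.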


APPENDIX B: VERILOG CODE

B.1. Top Level and Peripheral Modules


little_top.v //--------------------------------------------------------------------// Description : This is the top level module for the integrated system //--------------------------------------------------------------------`timescale 1ns / 100ps module littel (CLK, RST, LED, COL, ROW, CPU_RD, CPU_WR, CPU_ALE, CPU_AD, CPU_A, TARGET_OUT, TARGET_CLK, FADC_CLK, FADC_A, FADC_ADV, FADC_AOE, FADC_AOV, FADC_BDV, FADC_BOE, FADC_BOV, SADC_CLK, SADC_AOTR, SADC_APD, SADC_ARD, SADC_ARDY, SADC_BOTR, SADC_BPD, SADC_BRD, SADC_BRDY, SADC_DOUT, SDAC_RST, SRAM_A, SRAM_CS, SRAM_D, SRAM_RD, SRAM_WR, SRAM_ZZ, GEN1, GEN2, LVDS_PD, PI_EN, PI_IN, PI_CLK, CntExt_, SHS, SHR, Rset, RowDecColDec, PixelRst, RRst, Cset, ExtBias, ExtCtr, aps_ext_in, ExtCtrPipe, D, fg_pen, fg_len, fg_fen ,fg_dout);

// globals
input CLK, RST;
output LED;              // debug
output [7:0] COL, ROW;
// cpu interface
input CPU_RD;
input CPU_WR;
input CPU_ALE;
inout [7:0] CPU_AD;
input [7:0] CPU_A;
// target interface
input [11:0] TARGET_OUT;
output TARGET_CLK;
output [15:0] aps_ext_in;
output SHS;


output RRst;
output SHR;
output RowDecColDec;
output Rset;
output PixelRst;
output Cset;
output CntExt_;
output ExtBias;
output ExtCtr;
output ExtCtrPipe;
output [38:23] D;
// fast AD converter
output FADC_CLK;
input [12:1] FADC_A;
input FADC_ADV, FADC_AOV;
output FADC_AOE;
input FADC_BDV, FADC_BOV;
output FADC_BOE;
// slow AD converter
output SADC_CLK;
output SADC_AOTR, SADC_APD, SADC_ARD, SADC_ARDY;
output SADC_BOTR, SADC_BPD, SADC_BRD, SADC_BRDY;
output [17:0] SADC_DOUT;
output SDAC_RST;
// ZBT sram interface
output [19:0] SRAM_A;
output [1:0] SRAM_CS;
inout [17:0] SRAM_D;
output SRAM_RD, SRAM_WR, SRAM_ZZ;
// waveform generator
output [15:0] GEN1, GEN2;
// frame grabber (LVDS camera link interface)
output LVDS_PD;


output fg_pen; output fg_len; output fg_fen; output [11:0] fg_dout; // PI interface output PI_EN; input [14:0] PI_IN; input PI_CLK;

//top level buses wire cpu_zbt_rd_n; wire cpu_zbt_we_n; wire zbt_dout_v_n; wire clk_pulse; wire [20:0] cpu_add; wire [17:0] cpu_din; wire [1:0] select_zbt_user; wire [17:0] zbt_dout; wire [15:0] CPU_GEN1; wire [7:0] aps_time_ref; wire [7:0] aps_controls; wire pic_module_active; wire APS_TARGET_CLK; wire [7:0] aps_clk_counter; wire [11:0] user_data_fg; wire aps_pen; wire aps_len; wire aps_fen; wire [11:0] aps_dout; wire [7:0] aps_fra_dat; wire [7:0] wm_fra_out; /////////////////////////////////////begin////////////////////////////// //////// led led_inst


( .clk(CLK), .rst(RST), .led_out(LED) ); zbt_mux U2 ( .CLK(CLK), .cpu_add(cpu_add), .cpu_din(cpu_din), .cpu_rd_n(cpu_zbt_rd_n), .cpu_we_n(cpu_zbt_we_n), .rst(RST), .select_zbt_user(select_zbt_user), .sram1_ce1_n(SRAM_CS[0]), .sram2_ce1_n(SRAM_CS[1]), .sram_add(SRAM_A), .sram_data(SRAM_D), .sram_oe_n(SRAM_RD), .sram_wr_n(SRAM_WR), .sram_zz(SRAM_ZZ), .zbt_dout(zbt_dout), .zbt_dout_v_n(zbt_dout_v_n) );

cpu_int cpu_int ( .CLK(CLK), .aps_time_ref(aps_time_ref), .aps_controls(aps_controls), .pic_module_active(pic_module_active), .gen1value(CPU_GEN1), .gen2value(), .ale_n(CPU_ALE), .cpu_add(cpu_add),


.cpu_din(cpu_din), .cpu_zbt_rd_n(cpu_zbt_rd_n), .cpu_rd_n(CPU_RD), .cpu_zbt_we_n(cpu_zbt_we_n), .cpu_wr_n(CPU_WR), .port0(CPU_AD), .port2(CPU_A), .rst(RST), .select_zbt_user(select_zbt_user), .zbt_dout(zbt_dout), .zbt_dout_v_n(zbt_dout_v_n), .user_data_fg(user_data_fg) );

aps_int aps_int ( .rst(RST), .CLK(CLK), .zbt_dout(zbt_dout), .target_out(TARGET_OUT), .zbt_dout_v_n(zbt_dout_v_n), .pic_module_active(pic_module_active), .ExtBias(ExtBias), .ExtCtr(ExtCtr), .aps_ext_in(aps_ext_in), .aps_controls(aps_controls), .CntExt_(CntExt_), .PixelRst(PixelRst), .RRst(RRst), .Cset(Cset), .SHS(SHS), .SHR(SHR), .RowDecColDec(RowDecColDec), .Rset(Rset), .ExtCtrPipe(ExtCtrPipe), .fg_pen(aps_pen), .fg_len(aps_len),


.fg_fen(aps_fen), .fg_dout(aps_dout), .time_ref(aps_time_ref), .target_clk(APS_TARGET_CLK), .aps_clk_pulse(aps_clk_pulse), .aps_clk_counter(aps_clk_counter), .adc_dv(FADC_ADV), .user_data_fg(user_data_fg) ); wm_int wm_int ( .clk(CLK), .rst(!RST), .aps_clk(APS_TARGET_CLK), .aps_clk_pulse(aps_clk_pulse), .aps_clk_counter(aps_clk_counter), .aps_pen(aps_pen), .aps_len(aps_len), .aps_fen(aps_fen), .aps_dout(aps_dout), .fg_pen(fg_pen), .fg_len(fg_len), .fg_fen(fg_fen), .fg_dout(fg_dout) ); assign FADC_AOE = 1'b0; assign TARGET_CLK = aps_controls[7] ? aps_controls[2] : APS_TARGET_CLK; assign LVDS_PD = 1'b1; assign GEN1 = 16'h7FFF; assign GEN2 = 16'h7FFF; endmodule


cpu_int.v // Title : cpu_int // Description : This is the cpu interface for the TestBoard Altera // chip. It is responsible for communications with an external PC // It contains registers which can be written by the onboard CPU `timescale 1ns / 100ps module cpu_int ( zbt_dout_v_n ,zbt_dout ,user_data_fg ,select_zbt_user ,port0 ,port2 ,CLK ,cpu_wr_n ,ale_n ,cpu_din ,cpu_zbt_we_n ,cpu_rd_n ,cpu_add ,rst ,cpu_zbt_rd_n ,aps_time_ref ,pic_module_active ,aps_controls ,gen1value ,gen2value); `include "chip_def.v" input rst ; input CLK ; // cpu bus input cpu_rd_n ; input cpu_wr_n ; input ale_n ; inout [7:0] port0 ; input [7:0] port2 ; // zbt interface output [1:0] select_zbt_user ; input zbt_dout_v_n ; output [20:0] cpu_add ; input [17:0] zbt_dout ; output [17:0] cpu_din ; output cpu_zbt_we_n ; output cpu_zbt_rd_n ; // gen interface output [15:0] gen1value, gen2value;


//aps_int output pic_module_active; output [7:0] aps_controls; output [7:0] aps_time_ref; output [11:0] user_data_fg; // regs reg [15:0] ADD_l; reg [4:0] ADD; always @(negedge ale_n) ADD_l = {port2,port0}; // simple async //latch for address always @(posedge CLK) ADD = ADD_l[4:0]; reg [7:0] data_in_s, data_out; reg cpu_we_d, cpu_we_dd, cpu_we_ddd; reg [3:0] Cmd; reg [31:0] Add, Dat; reg [15:0] Gen1_reg, Gen2_reg; reg [7:0] Aps_time_ref, Aps_controls; reg [2:0] Mux_control; reg cpu_zbt_we_n, cpu_zbt_rd_n ; reg inst_go; reg [11:0] user_data_fg; wire altera_busy; ///////////////////////////////////////////////////////////////// // Chip registers ///////////////////////////////////////////////////////////////// always @(posedge CLK or negedge rst) begin if (~rst) begin data_in_s <= 0; data_out <= 0; cpu_we_d <= 1; cpu_we_dd <= 1; // sample latch


cpu_we_ddd <= 1; Cmd <= 0; Add <= 0; Dat <= 0; Mux_control <= 0; inst_go <= 0; Aps_time_ref <= 0; Aps_controls <= 0; Gen1_reg <= 0; Gen2_reg <= 0; user_data_fg <=0; end else begin // defaults inst_go <= 0; // sampled signals data_in_s <= port0; cpu_we_d <= cpu_wr_n; cpu_we_dd <= cpu_we_d; cpu_we_ddd <= cpu_we_dd;

if (~zbt_dout_v_n) begin Dat[17:0] <= zbt_dout; Dat[31:18] <= 0; end // write if ((cpu_we_ddd == 1) && (cpu_we_dd == 0)) // transition to low case (ADD) CMD_REG_ADD : begin Cmd <= data_in_s[3:0]; inst_go <= 1; end


ADD3_REG_ADD     : Add[31:24]   <= data_in_s;
ADD2_REG_ADD     : Add[23:16]   <= data_in_s;
ADD1_REG_ADD     : Add[15:8]    <= data_in_s;
ADD0_REG_ADD     : Add[7:0]     <= data_in_s;
DAT3_REG_ADD     : Dat[31:24]   <= data_in_s;
DAT2_REG_ADD     : Dat[23:16]   <= data_in_s;
DAT1_REG_ADD     : Dat[15:8]    <= data_in_s;
DAT0_REG_ADD     : Dat[7:0]     <= data_in_s;
MUXC_REG_ADD     : Mux_control  <= data_in_s[2:0];
APS_RATE_ADD     : Aps_time_ref <= data_in_s;
APS_CONTROLS_ADD : Aps_controls <= data_in_s;
endcase

// read mux
// note : ADD is a synchronous signal sampling an async latch, but by
// the time the cpu gets to read data_out it should be stable
case (ADD)
    TEST_REG_ADD     : data_out <= 8'h5a; // RO
    CMD_REG_ADD      : data_out <= {4'b0,Cmd};
    ADD3_REG_ADD     : data_out <= Add[31:24];
    ADD2_REG_ADD     : data_out <= Add[23:16];
    ADD1_REG_ADD     : data_out <= Add[15:8];
    ADD0_REG_ADD     : data_out <= Add[7:0];
    DAT3_REG_ADD     : data_out <= Dat[31:24];
    DAT2_REG_ADD     : data_out <= Dat[23:16];
    DAT1_REG_ADD     : data_out <= Dat[15:8];
    DAT0_REG_ADD     : data_out <= Dat[7:0];
    MUXC_REG_ADD     : data_out <= {5'h0,Mux_control};
    APS_RATE_ADD     : data_out <= Aps_time_ref;
    APS_CONTROLS_ADD : data_out <= Aps_controls;
    ALTERA_BUSY_ADD  : data_out <= {7'h0,altera_busy};
    default          : data_out <= 0;
endcase
end
end


// CPU output tbuf. // NOTE : TBUF is open in every read cycle. assign port0 = (~cpu_rd_n) ? data_out : 8'hZZ; // connect reg outputs assign select_zbt_user = Mux_control[1:0]; assign aps_time_ref = Aps_time_ref; assign aps_controls = Aps_controls; //////////////////////////////////////////////////////////////// // Command State Machine ///////////////////////////////////////////////////////////////// reg inst_running_n; reg [1:0] stage; reg pic_module_active; always @(posedge CLK or negedge rst) begin if (~rst) begin inst_running_n <= 1; stage <= 0; cpu_zbt_we_n <= 1; cpu_zbt_rd_n <= 1; pic_module_active <= 0; end else begin // signals default values cpu_zbt_we_n <= 1; cpu_zbt_rd_n <= 1; if (inst_go) inst_running_n <= 0; // if active if (~inst_running_n) case (Cmd[3:0])


CMD_SRAM_READ_REG : // ZBT read case (stage) 0: begin cpu_zbt_rd_n <= 0; stage <= 1; end 1 : begin if (zbt_dout_v_n==0) begin inst_running_n <= 1; stage <= 0; end end endcase CMD_SRAM_WRITE_REG : // ZBT write begin cpu_zbt_we_n <= 0; inst_running_n <= 1; end

CMD_PIC_GO : begin pic_module_active <= 1; inst_running_n <= 1; end CMD_PIC_STOP : begin pic_module_active <= 0; inst_running_n <= 1; end default : // return to inactive state

inst_running_n <= 1;


endcase
end
end
// ZBT
assign cpu_add = Add[20:0];
assign cpu_din = Dat[17:0];
assign altera_busy = ~inst_running_n;
endmodule

chip_def.v

// altera def file
// reg address
parameter TEST_REG_ADD       = 5'h0 ;
parameter CMD_REG_ADD        = 5'h1 ;
parameter MUXC_REG_ADD       = 5'h2 ;
parameter ADD3_REG_ADD       = 5'h3 ;
parameter ADD2_REG_ADD       = 5'h4 ;
parameter ADD1_REG_ADD       = 5'h5 ;
parameter ADD0_REG_ADD       = 5'h6 ;
parameter DAT3_REG_ADD       = 5'h7 ;
parameter DAT2_REG_ADD       = 5'h8 ;
parameter DAT1_REG_ADD       = 5'h9 ;
parameter DAT0_REG_ADD       = 5'ha ;
parameter APS_RATE_ADD       = 5'h14;
parameter APS_CONTROLS_ADD   = 5'h15;
parameter ALTERA_BUSY_ADD    = 5'h16;
// command coding
parameter CMD_SRAM_READ_REG  = 4'h8 ;
parameter CMD_SRAM_WRITE_REG = 4'h9 ;
parameter CMD_PIC_GO         = 4'hb ;
parameter CMD_PIC_STOP       = 4'hc ;


// SRAM SOURCE MUX states parameter SRAM_MUX_CPU = 2'b00 ;

B.2. CMOS Imager Control Logic and Interface


aps_int.v

// Description: This module functions as a top level for the imager
// control logic
`timescale 1ns / 100ps
module aps_int (CLK, rst, pic_module_active, aps_ext_in, CntExt_, SHS,
    SHR, Rset, RowDecColDec, fg_pen, fg_len, fg_fen, fg_dout, zbt_dout,
    target_out, user_data_fg, zbt_dout_v_n, time_ref, target_clk,
    aps_clk_pulse, aps_clk_counter, ExtCtrPipe, PixelRst, RRst, Cset,
    ExtBias, ExtCtr, aps_controls, adc_dv);
// globals
input CLK;
input rst;
// cpu_int interface
input [7:0] aps_controls;
input [7:0] time_ref;
input pic_module_active;
// zbt_mux interface
input [17:0] zbt_dout;
input zbt_dout_v_n;
// aps interface
input [11:0] target_out;
output [15:0] aps_ext_in;
output ExtCtrPipe;
output target_clk;
output aps_clk_pulse;
output [7:0] aps_clk_counter;
output SHS;
output RRst;


output SHR; output RowDecColDec; output Rset; output PixelRst; output Cset; output CntExt_; output ExtBias; output ExtCtr; //fadc int input adc_dv; // frame grabber output fg_pen; output fg_len; output fg_fen; output [11:0] fg_dout; wire [15:0] cdsIn_level; wire aps_clk_pulse; wire APS_RowDecColDec; wire APS_RRst; wire APS_SHS; wire APS_ExtCtrPipe; wire APS_ExtCtr_out; wire APS_ExtCtr_in; wire target_clk; wire [7:0] aps_clk_counter; wire [11:0] fg_dout_board; clk_gen U1 ( .CLK(CLK), .rst(rst), .time_ref(time_ref), .target_clk(target_clk), .clk_pulse(aps_clk_pulse), .clk_counter(aps_clk_counter)


);
pic_extract U4 (
    .clk(CLK),
    .rst(rst),
    .module_active(pic_module_active),
    .aps_ext_in(aps_ext_in),
    .ext_in_en(CntExt_),
    .ext_ctr_en(APS_ExtCtr_in),
    .aps_ext_ctr(APS_ExtCtr_out),
    .aps_shs(APS_SHS),
    .aps_rst(PixelRst),
    .aps_shr(SHR),
    .aps_clk_pulse(aps_clk_pulse),
    .aps_rowdeccoldec(APS_RowDecColDec),
    .aps_rst_cnt(APS_RRst),
    .aps_clk(target_clk),
    .fg_pen(fg_pen),
    .fg_len(fg_len),
    .fg_fen(fg_fen),
    .fg_dout(fg_dout),
    .aps_out(target_out),
    .aps_clk_counter(aps_clk_counter)
);
assign CntExt_      = ~aps_controls[0];
assign Cset         = aps_controls[1];
assign ExtBias      = aps_controls[3];
assign Rset         = aps_controls[5];
assign ExtCtrPipe   = aps_controls[6] ? 1'b1 : APS_ExtCtrPipe;
assign RowDecColDec = ExtCtr ? 1'b1 : APS_RowDecColDec;
assign SHS          = APS_SHS;
assign RRst         = APS_RRst;
assign ExtCtr       = APS_ExtCtr_out;

assign APS_ExtCtr_in = aps_controls[4];

assign APS_ExtCtrPipe = 0; endmodule


pic_extract.v

// this is the core for control signal generation for the CMOS image
// sensor. The second structure handles reception of pixel data from
// the imager and generation of synchronization signals to an external
// frame grabber
`timescale 1 ns / 100 ps
module pic_extract (clk, rst, module_active, aps_ext_in, ext_in_en,
    ext_ctr_en, aps_shs, aps_rst, aps_shr, aps_clk_pulse,
    aps_rowdeccoldec, aps_rst_cnt, aps_ext_ctr, aps_clk,
    aps_clk_counter, fg_pen, fg_len, fg_fen, fg_dout, aps_out);
/*           Input/Output Declarations           */
input clk;    // main clock
input rst;    // main reset

// controls
input module_active;      // running while '1'
input aps_clk_pulse;
input aps_clk;            // sync to APS clk

input [7:0] aps_clk_counter;
// Output lines to the sensor
output aps_shs;            // shs signal to the sensor (Active High)
output aps_rst;            // reset signal to the sensor (Active High)
output aps_shr;            // shr signal to the sensor (Active High)
output aps_rowdeccoldec;   // row - col enable
output [15:0] aps_ext_in;  // counter output
input ext_in_en;           // enables ext in output
output aps_rst_cnt;        // APS Counter Reset (Active High)
input [11:0] aps_out;
output aps_ext_ctr;
input ext_ctr_en;          // enables cdsin control

// frame grabber
output fg_pen;             // pixelclk to output lvds


output fg_len;             // lineclk to output lvds
output fg_fen;             // frameclk to output lvds
output [11:0] fg_dout;     // fg data lines

//*************************** Wires and registers ******************//
// output regs
reg aps_shs, aps_shr, aps_rst, aps_rowdeccoldec, aps_rst_cnt;
reg aps_ext_ctr;
reg fg_len, fg_fen;
wire fg_pen;
reg [11:0] fg_dout;
// internal regs
reg [3:0] control_state;
reg [9:0] delay_cnt;
reg [1:0] wait_cnt;
reg [7:0] col_cnt;
reg [7:0] row_cnt;
reg [3:0] ext_ctr_cnt;
reg first_run, aquire;
reg fg_len_int, fg_fen_int;
reg [11:0] din_d;
reg aps_clk_d;
// internal wires
wire aps_clk;
//*************************** constants def ****************//
parameter STATE_IDLE        = 4'h0;
parameter STATE_IDLE_TO_SHS = 4'h1;
parameter STATE_SHS_HIGH    = 4'h2;
parameter STATE_SHS_TO_RST  = 4'h3;
parameter STATE_RST_HIGH    = 4'h4;
parameter STATE_RST_TO_SHR  = 4'h5;
parameter STATE_SHR_HIGH    = 4'h6;
parameter STATE_SHR_TO_READ = 4'h7;
parameter STATE_READ        = 4'h8;
parameter STATE_WAIT        = 4'h9;


// Note : 0 is one clk delay; clk is 40MHz
// (that is 25ns for each clock cycle)
// (add 1)
parameter IDLE_TO_SHS_TIME = 7;  // 200ns
parameter SHS_HIGH_TIME    = 39; // 1us
parameter SHS_TO_RST_TIME  = 7;  // 200ns
parameter RST_HIGH_TIME    = 19; // 0.5us
parameter RST_TO_SHR_TIME  = 7;  // 200ns
parameter SHR_HIGH_TIME    = 39; // 1us
parameter SHR_TO_READ_TIME = 7;  // 200ns
parameter PIPE_DELAY       = 7;
reg [PIPE_DELAY:0] fg_fen_d;
reg [PIPE_DELAY:0] fg_len_d;

always @(posedge clk or negedge rst)   // main_clock
begin
    if (~rst)
    begin
        delay_cnt <= 0;
        aps_shs <= 1'b0;
        aps_shr <= 1'b0;
        aps_rst <= 1'b0;
        aps_rst_cnt <= 0;
        fg_len_int <= 0;
        fg_fen_int <= 0;
        control_state <= STATE_IDLE;
        col_cnt <= 0;
        row_cnt <= 0;
        aps_rowdeccoldec <= 0;
        aps_rst_cnt <= 0;
        first_run <= 1;
        ext_ctr_cnt <= 0;
        aps_ext_ctr <= 0;
        wait_cnt <= 0;
    end
    else
    begin // rising_edge(clk)
        case (control_state)
            STATE_IDLE :
            begin
                // reset
                aps_shs <= 1'b0;
                aps_shr <= 1'b0;
                aps_rst <= 1'b0;
                aps_rst_cnt <= 0;
                aps_rowdeccoldec <= 0;
                col_cnt <= 0;   // changed to provide smaller ROI
                row_cnt <= 0;   // changed to provide smaller ROI
                delay_cnt <= 0;
                first_run <= 1;
                fg_len_int <= 0;
                fg_fen_int <= 0;
                wait_cnt <= 0;
                if (module_active)
                    control_state <= STATE_IDLE_TO_SHS;
            end
            STATE_IDLE_TO_SHS :
            begin
                aps_rowdeccoldec <= 1;
                if (delay_cnt == IDLE_TO_SHS_TIME)
                begin
                    control_state <= STATE_SHS_HIGH;
                    delay_cnt <= 0;
                end
                else
                    delay_cnt <= delay_cnt + 1'b1;
            end
            STATE_SHS_HIGH :
            begin
                aps_shs <= 1'b1;
                if (first_run)
                    aps_rst_cnt <= 1;
                if (delay_cnt == SHS_HIGH_TIME)
                begin
                    control_state <= STATE_SHS_TO_RST;


delay_cnt <= 0; end else begin delay_cnt <= delay_cnt + 1'b1; end end STATE_SHS_TO_RST : begin aps_shs <= 1'b0; aps_rst_cnt <= 0; if (delay_cnt == SHS_TO_RST_TIME) begin control_state <= STATE_RST_HIGH; delay_cnt <= 0; end else begin delay_cnt <= delay_cnt + 1'b1; end end

STATE_RST_HIGH : begin aps_rst <= 1'b1; if (delay_cnt == RST_HIGH_TIME) begin control_state <= STATE_RST_TO_SHR; delay_cnt <= 0; end else begin delay_cnt <= delay_cnt + 1'b1; end end STATE_RST_TO_SHR : begin aps_rst <= 1'b0; fg_fen_int <= 1; if (delay_cnt == RST_TO_SHR_TIME) begin control_state <= STATE_SHR_HIGH; delay_cnt <= 0; end else begin delay_cnt <= delay_cnt + 1'b1; end end


STATE_SHR_HIGH : begin aps_shr <= 1'b1; if (delay_cnt == SHR_HIGH_TIME) begin control_state <= STATE_SHR_TO_READ; delay_cnt <= 0; end else begin delay_cnt <= delay_cnt + 1'b1; end end STATE_SHR_TO_READ : begin aps_shr <= 1'b0; if (delay_cnt == SHR_TO_READ_TIME) begin control_state <= STATE_READ; aps_rowdeccoldec <= 0; fg_len_int <= 1; delay_cnt <= 4; end else begin delay_cnt <= delay_cnt + 1'b1; end end STATE_READ : begin delay_cnt <= delay_cnt + 1'b1; if (delay_cnt == 3) col_cnt <= col_cnt + 1'b1; if (aps_clk_pulse) begin // clk en delay_cnt <= 0; ext_ctr_cnt <= ext_ctr_cnt + 1'b1; if (col_cnt == 255) begin col_cnt <= col_cnt + 1'b1; if (row_cnt == 255) begin fg_fen_int <= 1'b0; row_cnt <= 0; control_state <= STATE_WAIT;


end else begin fg_fen_int <= fg_fen_int; row_cnt <= row_cnt + 1'b1; control_state <= STATE_IDLE_TO_SHS; end fg_len_int first_run <= 0; end end if (ext_ctr_cnt > 0) ext_ctr_cnt <= ext_ctr_cnt + 1'b1; if (ext_ctr_cnt == 2) aps_ext_ctr <= 1'b1; if (ext_ctr_cnt == 8) begin aps_ext_ctr <= 1'b0; ext_ctr_cnt <= 4'b0; end if (~ext_ctr_en) aps_ext_ctr <= 1'b0; end STATE_WAIT : begin if (&wait_cnt) begin fg_fen_int <= 0; control_state <= STATE_IDLE_TO_SHS; end wait_cnt <= wait_cnt + 1'b1; end default : control_state <= STATE_IDLE; endcase // case // stop override <= 0;


if (~module_active) control_state <= STATE_IDLE; end // Else end // Always //Output delay and sync lines to the FG always @(posedge clk or negedge rst) begin if (~rst) begin fg_len_d <= 0; fg_fen_d <= 0; fg_len <= 0; fg_fen <= 0; din_d <= 0; fg_dout <= 0; end else if ((aps_clk_counter == 8'h9) & ~aps_clk) begin din_d <= aps_out; fg_dout <= din_d; fg_fen_d <= {fg_fen_d[PIPE_DELAY - 1 : 0], fg_fen_int}; fg_len_d <= {fg_len_d[PIPE_DELAY - 1 : 0], fg_len_int}; fg_fen fg_len end else begin din_d <= din_d; fg_dout <= fg_dout; fg_fen_d <= fg_fen_d; fg_len_d <= fg_len_d; fg_fen fg_len end end //note that fg_pen has an offset of ~2ns from the aps_clk <= fg_fen; <= fg_len; <= fg_fen_d[PIPE_DELAY]; <= fg_len_d[PIPE_DELAY]; // main_clock


assign fg_pen = fg_len & aps_clk_pulse; assign aps_ext_in = ext_in_en ? 16'h0 : {row_cnt, col_cnt}; endmodule clk_gen.v //generates target clock `timescale 1ps / 1ps module clk_gen (CLK ,rst ,time_ref ,target_clk ,clk_pulse, clk_counter); //globals input CLK; input rst; //cpu bus input [7:0] time_ref; //outputs output target_clk; output clk_pulse; output [7:0] clk_counter; reg target_clk; reg clk_pulse; reg [7:0] clk_counter; always @(posedge CLK or negedge rst) begin if (~rst) begin target_clk <= 0; clk_pulse <= 0; clk_counter <= 0; end else


begin //defaults clk_counter <= clk_counter + 1'b1; clk_pulse <= 0; if (clk_counter == time_ref) begin if (~target_clk) clk_pulse <= 1'b1; target_clk <= ~target_clk; clk_counter <= 0; end else target_clk <= target_clk; end end endmodule
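clk_gen divides the system clock by a programmable factor: the counter runs from 0 to time_ref and the target clock toggles each time it wraps, so one full target period spans 2*(time_ref+1) system cycles. A small cycle-level Python model of that behaviour (the function name and return format are mine; it models the target_clk waveform only, not clk_pulse):

```python
def simulate_clk_gen(time_ref, n_cycles):
    """Cycle-level model of clk_gen.v; returns target_clk sampled
    after each rising edge of the system clock."""
    target_clk, counter = 0, 0
    wave = []
    for _ in range(n_cycles):
        # the RTL compares the counter's OLD value against time_ref,
        # then either resets it (and toggles) or increments it
        if counter == time_ref:
            target_clk ^= 1
            counter = 0
        else:
            counter += 1
        wave.append(target_clk)
    return wave
```

With time_ref = 0 the output toggles every cycle (divide by 2); with time_ref = 1 it toggles every second cycle (divide by 4), matching the 2*(time_ref+1) period above.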

B.3. JPEG Encoding and Watermark Embedding


wm_int.v

`resetall
`timescale 1ns / 100ps
// this module is responsible for interfacing the encoder\embedder and
// the imager controller.
// there are two pixel data buffers to allow reordering from row scan to
// 8x8 blocks and vice versa. The module includes state machines that
// synchronize the imager data output with the encoder\embedder modules
module wm_int (clk, rst, aps_clk, aps_clk_pulse, aps_clk_counter,
    aps_pen, aps_len, aps_fen, aps_dout, fg_pen, fg_len, fg_fen,
    fg_dout, wm_ena);
input clk;
input rst;
input aps_clk;
input aps_clk_pulse;
input [7:0] aps_clk_counter;
input aps_pen;


input aps_len; input aps_fen; input [11:0] aps_dout; input wm_ena; output fg_pen; output fg_len; output fg_fen; output [11:0] fg_dout; //regs and wires reg [11:0] in_pic_add_wr_cnt, in_pic_add_rd_cnt; wire [7:0] in_pic_dat_wr, in_pic_dat_rd; wire [11:0] in_pic_add_wr, in_pic_add_rd; wire in_pic_wren = aps_len; //state parameters parameter IDLE = 2'b00; parameter READ = 2'b01; //write enable sync state machine reg [1:0] state, next_state; always @(posedge clk or posedge rst) if (rst) state <= IDLE; else if ((aps_clk_counter == 8'h9) & ~aps_clk) state <= next_state; else state <= state; //next state logic always @(state or in_pic_add_wr_cnt or in_pic_add_rd_cnt) begin case (state) IDLE :


if (&in_pic_add_wr_cnt[10:0]) next_state = READ; else next_state = IDLE; READ : if (&in_pic_add_rd_cnt[10:0]) next_state = IDLE; else next_state = READ; default: next_state = IDLE; endcase end wire in_pic_rden; //output logic assign in_pic_rden = (state == READ);

reg in_pic_rden_d; always @ (posedge clk or posedge rst) if (rst) in_pic_rden_d <= 0; else in_pic_rden_d <= in_pic_rden;

always @ (posedge clk or posedge rst) if (rst) in_pic_add_wr_cnt <= 12'h0; else if ((aps_clk_counter == 8'h9) & ~aps_clk) begin if (aps_len)


in_pic_add_wr_cnt <= in_pic_add_wr_cnt + 1; else in_pic_add_wr_cnt <= in_pic_add_wr_cnt; end assign in_pic_add_wr = in_pic_add_wr_cnt;

always @ (posedge clk or posedge rst)
    if (rst)
        in_pic_add_rd_cnt <= 12'h0;
    else if ((aps_clk_counter == 8'h9) & ~aps_clk)
    begin
        if (in_pic_rden)
            in_pic_add_rd_cnt <= in_pic_add_rd_cnt + 1;
        else if (&in_pic_add_rd_cnt[10:0])
            in_pic_add_rd_cnt <= in_pic_add_rd_cnt + 1;
        else
            in_pic_add_rd_cnt <= in_pic_add_rd_cnt;
    end
// address manipulation to account for the 8x8 block readout order
assign in_pic_add_rd = {in_pic_add_rd_cnt[11], in_pic_add_rd_cnt[5:3],
                        in_pic_add_rd_cnt[10:6], in_pic_add_rd_cnt[2:0]};
assign in_pic_dat_wr = aps_dout[3] ? aps_dout[11:4] + 1 : aps_dout[11:4];
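The in_pic_add_rd assignment rearranges the read counter's bits so that pixels written into the buffer in raster (row-scan) order are read back one 8x8 block at a time: the sequential counter's low bits select the column and row inside a block and its middle bits select which block, while the physical address keeps raster layout. A Python sketch of the same bit swizzle (function name and variable names are mine):

```python
def block_read_address(read_count):
    """Map a sequential read counter to the raster-order buffer address,
    mirroring the in_pic_add_rd bit swizzle in wm_int.v.

    read counter layout: [11]=buffer half, [10:6]=8x8 block column,
    [5:3]=row inside the block, [2:0]=column inside the block.
    raster layout: addr = half*2048 + row*256 + block*8 + column.
    """
    half  = (read_count >> 11) & 0x1
    block = (read_count >> 6) & 0x1F
    row   = (read_count >> 3) & 0x7
    col   = read_count & 0x7
    # Verilog: {cnt[11], cnt[5:3], cnt[10:6], cnt[2:0]}
    return (half << 11) | (row << 8) | (block << 3) | col
```

Reading the counter sequentially therefore walks 8 pixels of one block row, then the next row of the same block, then the next block; since the mapping only permutes address bits, it is a bijection over the buffer.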

//state machine to control output memory rd/wr reg [1:0] out_state, out_next_state; always @(posedge clk or posedge rst) if (rst) out_state <= IDLE; else if ((aps_clk_counter == 8'h9) & ~aps_clk) out_state <= out_next_state; else out_state <= out_state;


reg [11:0] out_pic_add_wr_cnt; reg [15:0] out_pic_add_rd_cnt; //next state logic always @(out_state or out_pic_add_wr_cnt or out_pic_add_rd_cnt) begin case (out_state) IDLE : if (&out_pic_add_wr_cnt[10:0]) out_next_state = READ; else out_next_state = IDLE; READ : if (&out_pic_add_rd_cnt[10:0]) out_next_state = IDLE; else out_next_state = READ; default: out_next_state = IDLE; endcase end wire out_pic_rden; //output logic assign out_pic_rden = (out_state == READ); wire [7:0] out_pic_dat_wr; wire [11:0] out_pic_add_wr, out_pic_add_rd; wire out_pic_wren; always @ (posedge clk or posedge rst) if (rst) out_pic_add_wr_cnt <= 12'h0; else if ((aps_clk_counter == 8'h9) & ~aps_clk)


begin if (out_pic_wren) out_pic_add_wr_cnt <= out_pic_add_wr_cnt + 1; else out_pic_add_wr_cnt <= out_pic_add_wr_cnt; end assign out_pic_add_wr = {out_pic_add_wr_cnt[11],out_pic_add_wr_cnt[5:3],out_pic_add_wr_cnt[10:6] ,out_pic_add_wr_cnt[2:0]}; always @ (posedge clk or posedge rst) if (rst) out_pic_add_rd_cnt <= 16'h0; else if ((aps_clk_counter == 8'h8) & ~aps_clk) begin if (out_pic_rden) out_pic_add_rd_cnt <= out_pic_add_rd_cnt + 1; else if (&out_pic_add_rd_cnt[10:0]) out_pic_add_rd_cnt <= out_pic_add_rd_cnt + 1; else out_pic_add_rd_cnt <= out_pic_add_rd_cnt; end assign out_pic_add_rd = out_pic_add_rd_cnt[11:0]; wire fg_pen; wire [11:0] fg_dout; reg fg_len, fg_fen; always @ (posedge clk or posedge rst) begin if (rst) begin fg_len <= 1'b0; fg_fen <= 1'b0; end else


begin if ((&out_pic_add_rd_cnt[7:0]) & aps_clk) fg_len <= 1'b0; else if (&out_pic_add_wr_cnt[10:0] & (aps_clk_counter == 8'h8) & (!aps_clk)) fg_len <= 1'b1; else if (out_pic_rden) fg_len <= 1'b1; else fg_len <= fg_len; if (&out_pic_add_rd_cnt & aps_clk) fg_fen <= 1'b0; else if (&out_pic_add_wr_cnt[10:0] & (aps_clk_counter == 8'h8)) fg_fen <= 1'b1; else if (out_pic_rden) fg_fen <= 1'b1; else fg_fen <= fg_fen; end end wire [7:0] out_pic_dat_rd; assign fg_pen = aps_clk_pulse & fg_len; assign fg_dout = {4'h0,out_pic_dat_rd}; wire wm_ena; pixel_mem_buffer in_mem_buffer ( .address_a(in_pic_add_wr), .address_b(in_pic_add_rd), .clock(clk), .data_a(in_pic_dat_wr), .data_b(),


.wren_a(in_pic_wren), .wren_b(1'b0), .q_a(), .q_b(in_pic_dat_rd)); pixel_mem_buffer out_mem_buffer ( .address_a(out_pic_add_wr), .address_b(out_pic_add_rd), .clock(clk), .data_a(out_pic_dat_wr), .data_b(), .wren_a(out_pic_wren), .wren_b(1'b0), .q_a(), .q_b(out_pic_dat_rd)); encoder encoder_embedder_decoder ( .clk(aps_clk), .ena(in_pic_rden_d), .rst(rst), .din(in_pic_dat_rd), .dout(out_pic_dat_wr), .douten(out_pic_wren), .wm_ena(wm_ena) );

endmodule

B.3.1. Watermark Embedding


WM_plus_RNG_top.v // This module is the top level that connects the watermark embedding // module with the watermark generator (RNG) module `resetall `timescale 1ns / 100ps


module top (clk, rst, ena, serial_data_out, WM_out, douten); parameter COEFF_SIZE = 12; parameter N = 22; parameter MKEY_VAL = 3; parameter CKEY_VAL = 4; input clk; input rst; input ena; input [COEFF_SIZE -1:0] serial_data_out; //unmarked DCT data from Zigzag buffer output [COEFF_SIZE-1:0] WM_out; //watermarked DCT data output douten; wire WM_data_in; wire s; reg shift; wire douten; reg douten_reg; reg [5:0] cntr64; reg ddata_valid; WM_top #(COEFF_SIZE) WM_embedder ( .clk(clk), .rst(rst), .shift(shift), .serial_data_out(serial_data_out), .WM_data_in(WM_data_in), .WM_out(WM_out) ); ffcsr22 #(.N(N), .MKEY_VAL(MKEY_VAL), .CKEY_VAL(CKEY_VAL)) WM_RNG (


.clk(clk), .rst(rst), .shift(shift), .s(s) ); //control and sync signals always @(posedge clk or posedge rst) begin if (rst) cntr64 <= #1 0; else if (shift) begin if (cntr64 < 6'h3f) cntr64 <= #1 cntr64 + 1'b1; else cntr64 <= #1 cntr64; end else cntr64 <= #1 cntr64; end always @ (posedge clk or posedge rst) if (rst) shift <= 1'b0; else shift <= ena; assign WM_data_in = s; assign douten = douten_reg & shift; always @ (posedge clk or posedge rst) if (rst) douten_reg <= 0; else douten_reg <= (&cntr64);


endmodule WM_top.v `resetall `timescale 1ns/10ps // this module is the top level for the watermark embedder and connects // the DCT data buffer and the watermarking logic module WM_top (clk, rst, shift, serial_data_out, WM_data_in, WM_out); parameter COEFF_SIZE = 12; input clk; input rst; input shift; input WM_data_in; input [COEFF_SIZE -1:0] serial_data_out; output [COEFF_SIZE-1:0] WM_out; wire [COEFF_SIZE-1:0] WM_out; wire [COEFF_SIZE-1:0] d_out, B_P_out; wire [COEFF_SIZE-1:0] d_in; wire shift; wire pointer_full; wire p_i; // Modules instantiation // DCT data buffer ram_sr B_P_reg ( .shiftin(d_in), .clock(clk), .clken(shift), .shiftout(d_out), .taps() ); // watermarking logic


WM_point_logic #(COEFF_SIZE) WM_point_logic1 ( .clk(clk), .rst(rst), .B_P_out(B_P_out), .p_i(p_i), .shift(shift), .pointer_full(pointer_full), .WM_out(WM_out) ); //assignments // d_in is the i-th coefficient from the current block out of the DCT // block // B_P_out is the i-th coefficient stored in B_P_reg from the previous // block assign d_in = serial_data_out; assign B_P_out = d_out; assign p_i = (|(d_in))&(|(B_P_out))&(pointer_full ? WM_data_in : 1'b1); endmodule WM_point_logic_ver2.v /* This module does two simultaneous assignments 1. Identifying the first (representing highest spatial frequency) N cells (after anding 2 neighbors) that are non-zero. The results are stored in the next_pointer register. When all N cells of next_pointer are full, the module starts to calculate the value to be embedded in the corresponding coefficients. The computation simply stores the XOR between the current value in next_p_reg[n] and the output of p_i. Finally, when all coefficients from DCT block are rcvd, the values from the 'next' registers are copied to the current registers, and the 'next' regs are reset 2. The 'current' regs are used to embed the WM in the selected coeff's. The pointer reg points at the indexes where the embedding takes place, and the p_reg reg holds the value to embed. That way, for each block there are two phases:


Each clock cycle, one coefficient is output from the DCT block and stored
in the reg_stack, while the coefficient of the same index from the
previous block is output from the stack. During that stage, both
coefficients are used to calculate the value of p_i, and the coefficient
of the previous block is embedded with the WM and sent out as secured
image data.
*/

`resetall
`timescale 1ns/10ps

module WM_point_logic(
    clk,
    rst,
    B_P_out,
    p_i,
    shift,
    pointer_full,
    WM_out
);

parameter COEFF_SIZE = 12;

// Internal Declarations
input clk;
input rst;
input [COEFF_SIZE-1:0] B_P_out;
input p_i;
input shift;
output pointer_full;
output [COEFF_SIZE-1:0] WM_out;

wire [COEFF_SIZE-1:0] B_P_out;
wire p_i;
wire [COEFF_SIZE-1:0] WM_out;
wire [5:0] inc;
reg pointer_cnt;


// pointer reg declarations and assignments
parameter N = 2;   // N is the number of coefficients to embed
integer j;

// mux for shift enabled SR
reg [5:0] pointer      [N-1:0];
reg [5:0] next_pointer [N-1:0];

reg [N-1:0] p_reg, next_p_reg;   // this register stores the LSB values
                                 // to embed
wire pointer_full = !(&next_pointer[0]);
wire shift_condition = (p_i && !pointer_full && shift);
reg p_rst;

// This shift register has async rst and synchronous p_rst.
// It shifts only when the conditions for shift are met:
// the register is still not full, the new input is non-zero and a data
// valid signal is on (shift)

always @(posedge clk or posedge rst)
  begin : synchronous_sr
    if (rst)
      for (j = 0; j < N; j = j + 1)
        next_pointer[j] <= #1 6'h3f;
    else if (p_rst)
      for (j = 0; j < N; j = j + 1)
        next_pointer[j] <= #1 6'h3f;
    else if (shift_condition)
      begin
        for (j = 0; j < N-1; j = j + 1)
          next_pointer[j] <= #1 next_pointer[j+1];
        next_pointer[N-1] <= #1 inc;
      end
    else
      for (j = 0; j < N; j = j + 1)
        next_pointer[j] <= #1 next_pointer[j];
  end


always @ (posedge clk or posedge rst)
  begin
    if (rst)
      begin
        pointer_cnt <= #1 1'b0;
        p_reg       <= #1 0;
        next_p_reg  <= #1 0;
        p_rst       <= #1 0;
      end
    else if (shift)
      begin
        if (pointer[pointer_cnt] == inc)
          pointer_cnt <= #1 1'b1;
        else
          pointer_cnt <= #1 pointer_cnt;
        if (&inc)
          begin
            p_reg      <= #1 next_p_reg;
            next_p_reg <= #1 0;
            for (j = 0; j < N; j = j + 1)
              pointer[j] <= #1 next_pointer[j];
            p_rst       <= #1 1;
            pointer_cnt <= #1 1'b0;
          end
        else
          begin
            p_reg <= #1 p_reg;
            p_rst <= #1 0;
          end
        if (pointer_full)
          case (inc[0])
            1'b0 :
              next_p_reg[0] <= #1 next_p_reg[0]^p_i;
            1'b1 :


              next_p_reg[1] <= #1 next_p_reg[1]^p_i;
//          2'b10 :
//            next_p_reg[2] <= next_p_reg[2]^p_i;
//          2'b11 :
//            next_p_reg[3] <= next_p_reg[3]^p_i;
          endcase
      end   // shift operations
    else
      p_rst <= #1 0;
  end   // sync always

// two options to embed the WM in the LSB - just set/rst, or make the
// number's parity be equal to the WM bit
// wire parity;
// assign parity = ^(B_P_out);
// assign WM_out = (pointer[pointer_cnt] != inc) ? B_P_out :
//                 (parity == p_reg[pointer_cnt]) ? B_P_out :
//                 {B_P_out[COEFF_SIZE-1:1],~B_P_out[0]};
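The comments above mention two ways to place a watermark bit in a coefficient's LSB. As a rough software illustration (Python; function names and structure are mine, not part of the thesis sources), the two options behave as follows: the first forces the LSB to the watermark bit, the second flips the LSB only when the coefficient's parity disagrees with the watermark bit.

```python
def embed_lsb(coeff, bit):
    """Overwrite the coefficient's LSB with the watermark bit
    (what the active assign in the module does)."""
    return (coeff & ~1) | (bit & 1)

def embed_parity(coeff, bit, width=12):
    """Leave the coefficient alone if its parity already equals the
    watermark bit, otherwise flip the LSB (the commented-out option)."""
    parity = bin(coeff & ((1 << width) - 1)).count("1") & 1
    return coeff if parity == (bit & 1) else coeff ^ 1
```

Either way, at most the LSB changes, so the distortion per selected coefficient is bounded by one quantization step.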

assign WM_out = (pointer[pointer_cnt] == inc) ?
                {B_P_out[COEFF_SIZE-1:1], p_reg[pointer_cnt]} : B_P_out;

incr pointer_gen (
    .clk(clk),
    .rst(rst),
    .en_cnt(shift),
    .inc_out(inc)
);

endmodule

module incr (clk, rst, en_cnt, inc_out);

input clk;


input rst;
input en_cnt;
output [5:0] inc_out;
reg [5:0] inc_out;

always @ (posedge clk or posedge rst)
  begin
    if (rst)
      inc_out <= 6'h00;
    else if (en_cnt)
      inc_out <= inc_out + 1'b1;
    else
      inc_out <= inc_out;
  end

endmodule

ram_sr.v

// megafunction wizard: %Shift register (RAM-based)%
// GENERATION: STANDARD
// VERSION: WM1.0
// MODULE: altshift_taps
// ============================================================
// File Name: ram_sr.v
// Megafunction Name(s):
//     altshift_taps
// ============================================================
// ************************************************************
// THIS IS A WIZARD-GENERATED FILE. DO NOT EDIT THIS FILE!
//
// 5.0 Build 148 04/26/2005 SJ Full Version
// ************************************************************


//Copyright (C) 1991-2005 Altera Corporation
//Your use of Altera Corporation's design tools, logic functions
//and other software and tools, and its AMPP partner logic
//functions, and any output files from any of the foregoing
//(including device programming or simulation files), and any
//associated documentation or information are expressly subject
//to the terms and conditions of the Altera Program License
//Subscription Agreement, Altera MegaCore Function License
//Agreement, or other applicable license agreement, including,
//without limitation, that your use is for the sole purpose of
//programming logic devices manufactured by Altera and sold by
//Altera or its authorized distributors. Please refer to the
//applicable agreement for further details.

// synopsys translate_off
`timescale 1 ns / 10 ps
// synopsys translate_on

module ram_sr (
    shiftin,
    clock,
    clken,
    shiftout,
    taps);

input  [11:0] shiftin;
input         clock;
input         clken;
output [11:0] shiftout;
output [11:0] taps;

wire [11:0] sub_wire0;
wire [11:0] sub_wire1;
wire [11:0] taps = sub_wire0[11:0];
wire [11:0] shiftout = sub_wire1[11:0];

altshift_taps altshift_taps_component (
    .clken (clken),


    .clock (clock),
    .shiftin (shiftin),
    .taps (sub_wire0),
    .shiftout (sub_wire1));

defparam
    altshift_taps_component.width = 12,
    altshift_taps_component.number_of_taps = 1,
    altshift_taps_component.tap_distance = 64,
    altshift_taps_component.lpm_type = "altshift_taps";

endmodule

ffcsr22.v

// this module implements a 22-bit F-FCSR RNG
`resetall
`timescale 1ns/10ps

module ffcsr22 (clk, rst, shift, s);

parameter N = 22;
parameter q = 4194793;
parameter d = 22'b1000000000000011110101;
parameter MKEY_VAL = 3;
parameter CKEY_VAL = 4;

input clk;
input rst;
input shift;
output s;

reg  [N-1:0] mstate;
wire [N-1:0] mstate_N;
reg  [5:0]   cstate;
wire [5:0]   cstate_N;
wire         s;
wire [N-1:0] mkey = MKEY_VAL;
wire [5:0]   ckey = CKEY_VAL;

// Define the FCSR and Filter function
assign mstate_N[0] = mstate[1]^d[0]&cstate[0]^d[0]&mstate[0];
assign mstate_N[1] = mstate[2];
assign mstate_N[2] = mstate[3]^d[2]&cstate[1]^d[2]&mstate[0];
assign mstate_N[3] = mstate[4];
assign mstate_N[4] = mstate[5]^d[4]&cstate[2]^d[4]&mstate[0];
assign mstate_N[5] = mstate[6]^d[5]&cstate[3]^d[5]&mstate[0];
assign mstate_N[6] = mstate[7]^d[6]&cstate[4]^d[6]&mstate[0];
assign mstate_N[7] = mstate[8]^d[7]&cstate[5]^d[7]&mstate[0];
assign mstate_N[8] = mstate[9];
assign mstate_N[9] = mstate[10];
assign mstate_N[10] = mstate[11];
assign mstate_N[11] = mstate[12];
assign mstate_N[12] = mstate[13];
assign mstate_N[13] = mstate[14];
assign mstate_N[14] = mstate[15];
assign mstate_N[15] = mstate[16];
assign mstate_N[16] = mstate[17];
assign mstate_N[17] = mstate[18];
assign mstate_N[18] = mstate[19];
assign mstate_N[19] = mstate[20];
assign mstate_N[20] = mstate[21];
assign mstate_N[21] = mstate[0];

assign cstate_N[0] = mstate[1]&cstate[0]^cstate[0]&mstate[0]^mstate[1]&mstate[0];
assign cstate_N[1] = mstate[3]&cstate[1]^cstate[1]&mstate[0]^mstate[3]&mstate[0];
assign cstate_N[2] = mstate[5]&cstate[2]^cstate[2]&mstate[0]^mstate[5]&mstate[0];
assign cstate_N[3] = mstate[6]&cstate[3]^cstate[3]&mstate[0]^mstate[6]&mstate[0];
assign cstate_N[4] = mstate[7]&cstate[4]^cstate[4]&mstate[0]^mstate[7]&mstate[0];
assign cstate_N[5] = mstate[8]&cstate[5]^cstate[5]&mstate[0]^mstate[8]&mstate[0];


// Calculate the output sequence
always @(posedge clk or posedge rst)
  begin
    if (rst)
      begin
        mstate <= #1 mkey;
        cstate <= #1 ckey;
      end
    else if (shift)
      begin
        mstate <= #1 mstate_N;
        cstate <= #1 cstate_N;
      end
    else
      begin
        mstate <= #1 mstate;
        cstate <= #1 cstate;
      end
  end

assign s = (mstate[0]^mstate[2])^(mstate[4]^mstate[5])^(mstate[6]^mstate[7]^mstate[21]);
// the parentheses will hopefully minimize delay

endmodule
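For reference, the F-FCSR update can be mirrored in software. The sketch below (Python; names and structure are mine, for illustration only) translates the mstate/cstate assigns directly: non-tap cells simply shift, while the tap cells at d's non-zero positions add the carry and feedback bits, with the carry register holding the majority function.

```python
# Software model of the ffcsr22 keystream, translated from the Verilog
# assigns above (illustrative; not part of the thesis sources).
N = 22
TAPS = [0, 2, 4, 5, 6, 7]   # positions i where d[i] = 1 (besides the MSB)

def ffcsr22_stream(mkey=3, ckey=4, nbits=32):
    m = [(mkey >> i) & 1 for i in range(N)]   # main register (mstate)
    c = [(ckey >> i) & 1 for i in range(6)]   # carry register (cstate)
    out = []
    for _ in range(nbits):
        # filtered output s = XOR of selected mstate cells
        out.append(m[0] ^ m[2] ^ m[4] ^ m[5] ^ m[6] ^ m[7] ^ m[21])
        fb = m[0]                     # feedback bit mstate[0]
        m_next = m[1:] + [fb]         # plain shift for non-tap cells
        c_next = c[:]
        for j, i in enumerate(TAPS):
            m_next[i] = m[i + 1] ^ c[j] ^ fb                 # sum bit
            c_next[j] = (m[i + 1] & c[j]) ^ (c[j] & fb) ^ (m[i + 1] & fb)  # carry = majority
        m, c = m_next, c_next
    return out
```

The carry update is the majority of the three summands, exactly as in the cstate_N assigns.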

B.3.2. DCT and IDCT Modules

The DCT and IDCT modules were borrowed from [ref], where the source code is also available.
B.3.3. Zigzag Modules
Zigzag.v

/////////////////////////////////////////////////////////////////////
////                                                             ////
////  Zig-Zag Unit                                               ////
////  Performs zigzag-ing, as used by many DCT based encoders    ////
////                                                             ////
////  Author: Richard Herveille                                  ////
////          richard@asics.ws                                   ////
////          www.asics.ws                                       ////
////                                                             ////
/////////////////////////////////////////////////////////////////////
////                                                             ////
////  Copyright (C) 2002 Richard Herveille                       ////
////                     richard@asics.ws                        ////
////                                                             ////
////  This source file may be used and distributed without       ////
////  restriction provided that this copyright statement is not  ////
////  removed from the file and that any derivative work contains////
////  the original copyright notice and the associated disclaimer////
////                                                             ////
/////////////////////////////////////////////////////////////////////

`timescale 1ns/10ps

module zigzag(
    clk,
    rst,
    ena,
    dct_2d,
    dout,
    douten
);

parameter do_width = 12;

//
// inputs & outputs
//
input clk;   // system clock
input rst;
input ena;   // clk enable
input [do_width-1:0] dct_2d;
output [do_width-1:0] dout;
output douten;   // data-out enable

//
// variables
//
wire block_rdy;
reg ld_zigzag;
reg douten_reg;
wire douten;
reg [do_width-1:0] sresult_in [63:0];   // store results for zig-zagging
reg [do_width-1:0] sresult_out[63:0];
reg [5:0] sample_cnt;

//
// module body
//
always @ (posedge clk or posedge rst)
  if (rst)
    sample_cnt <= 6'h0;
  else if (ena)
    sample_cnt <= sample_cnt + 1'b1;
  else
    sample_cnt <= sample_cnt;

assign block_rdy = &sample_cnt;

always @ (posedge clk)
  ld_zigzag <= block_rdy;

always @ (posedge clk or posedge rst)
  if (rst)
    douten_reg <= 1'b0;


  else if (block_rdy)
    douten_reg <= 1'b1;
  else
    douten_reg <= douten_reg;

assign douten = douten_reg & ena;

//
// Generate zig-zag structure
//
// This implies that the quantization step is performed after
// the zig-zagging.
//
//        0:  1:  2:  3:  4:  5:  6:  7:
//  0:    63  62  58  57  49  48  36  35
//  1:    61  59  56  50  47  37  34  21
//  2:    60  55  51  46  38  33  22  20
//  3:    54  52  45  39  32  23  19  10
//  4:    53  44  40  31  24  18  11  09
//  5:    43  41  30  25  17  12  08  03
//  6:    42  29  26  16  13  07  04  02
//  7:    28  27  15  14  06  05  01  00
//
// (the same table in hex)
//        0:  1:  2:  3:  4:  5:  6:  7:
//  0:    3f  3e  3a  39  31  30  24  23
//  1:    3d  3b  38  32  2f  25  22  15
//  2:    3c  37  33  2e  26  21  16  14
//  3:    36  34  2d  27  20  17  13  0a
//  4:    35  2c  28  1f  18  12  0b  09
//  5:    2b  29  1e  19  11  0c  08  03
//  6:    2a  1d  1a  10  0d  07  04  02
//  7:    1c  1b  0f  0e  06  05  01  00

// zig-zag the DCT results
integer n;
always @(posedge clk)
  if (ena)
    begin
      for (n=1; n<=63; n=n+1)   // sresult_in[0] gets the new input
        begin
          sresult_in[n] <= #1 sresult_in[n-1];
          sresult_in[0] <= #1 dct_2d;
        end
      if (ld_zigzag)   // reload results-register file
        begin


          sresult_out[00] <= #1 sresult_in[00];
          sresult_out[01] <= #1 sresult_in[08];
          sresult_out[02] <= #1 sresult_in[01];
          sresult_out[03] <= #1 sresult_in[02];
          sresult_out[04] <= #1 sresult_in[09];
          sresult_out[05] <= #1 sresult_in[16];
          sresult_out[06] <= #1 sresult_in[24];
          sresult_out[07] <= #1 sresult_in[17];
          sresult_out[08] <= #1 sresult_in[10];
          sresult_out[09] <= #1 sresult_in[03];
          sresult_out[10] <= #1 sresult_in[04];
          sresult_out[11] <= #1 sresult_in[11];
          sresult_out[12] <= #1 sresult_in[18];
          sresult_out[13] <= #1 sresult_in[25];
          sresult_out[14] <= #1 sresult_in[32];
          sresult_out[15] <= #1 sresult_in[40];
          sresult_out[16] <= #1 sresult_in[33];
          sresult_out[17] <= #1 sresult_in[26];
          sresult_out[18] <= #1 sresult_in[19];
          sresult_out[19] <= #1 sresult_in[12];
          sresult_out[20] <= #1 sresult_in[05];
          sresult_out[21] <= #1 sresult_in[06];
          sresult_out[22] <= #1 sresult_in[13];
          sresult_out[23] <= #1 sresult_in[20];
          sresult_out[24] <= #1 sresult_in[27];
          sresult_out[25] <= #1 sresult_in[34];
          sresult_out[26] <= #1 sresult_in[41];
          sresult_out[27] <= #1 sresult_in[48];
          sresult_out[28] <= #1 sresult_in[56];
          sresult_out[29] <= #1 sresult_in[49];
          sresult_out[30] <= #1 sresult_in[42];
          sresult_out[31] <= #1 sresult_in[35];
          sresult_out[32] <= #1 sresult_in[28];
          sresult_out[33] <= #1 sresult_in[21];
          sresult_out[34] <= #1 sresult_in[14];
          sresult_out[35] <= #1 sresult_in[07];
          sresult_out[36] <= #1 sresult_in[15];
          sresult_out[37] <= #1 sresult_in[22];
          sresult_out[38] <= #1 sresult_in[29];
          sresult_out[39] <= #1 sresult_in[36];
          sresult_out[40] <= #1 sresult_in[43];
          sresult_out[41] <= #1 sresult_in[50];
          sresult_out[42] <= #1 sresult_in[57];
          sresult_out[43] <= #1 sresult_in[58];
          sresult_out[44] <= #1 sresult_in[51];
          sresult_out[45] <= #1 sresult_in[44];
          sresult_out[46] <= #1 sresult_in[37];
          sresult_out[47] <= #1 sresult_in[30];
          sresult_out[48] <= #1 sresult_in[23];
          sresult_out[49] <= #1 sresult_in[31];
          sresult_out[50] <= #1 sresult_in[38];
          sresult_out[51] <= #1 sresult_in[45];
          sresult_out[52] <= #1 sresult_in[52];
          sresult_out[53] <= #1 sresult_in[59];
          sresult_out[54] <= #1 sresult_in[60];
          sresult_out[55] <= #1 sresult_in[53];
          sresult_out[56] <= #1 sresult_in[46];
          sresult_out[57] <= #1 sresult_in[39];
          sresult_out[58] <= #1 sresult_in[47];
          sresult_out[59] <= #1 sresult_in[54];
          sresult_out[60] <= #1 sresult_in[61];
          sresult_out[61] <= #1 sresult_in[62];
          sresult_out[62] <= #1 sresult_in[55];
          sresult_out[63] <= #1 sresult_in[63];
        end
      else
        begin
          for (n=0; n<63; n=n+1)   // do not change sresult[0]
            sresult_out[n] <= #1 sresult_out[n+1];
        end
    end

assign dout = sresult_out[0];

endmodule
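The index tables in the zigzag comments can be reproduced programmatically. The sketch below (Python, illustrative only; names are mine) generates the scan order this module implements, a zig-zag that starts downward (the transpose of the textbook JPEG order), and matches the sresult_out assignments above.

```python
# Generate the zig-zag read order used by the zigzag module above
# (illustrative model, not part of the thesis sources).
def zigzag_order(n=8):
    order = []
    for d in range(2 * n - 1):                       # walk anti-diagonals r + c = d
        rs = list(range(max(0, d - n + 1), min(d, n - 1) + 1))
        if d % 2:                                    # odd diagonals run bottom-up
            rs.reverse()
        order.extend(r * n + (d - r) for r in rs)    # row-major index of each cell
    return order
```

For example, `zigzag_order()[:10]` gives `[0, 8, 1, 2, 9, 16, 24, 17, 10, 3]`, matching `sresult_out[0] <= sresult_in[00]`, `sresult_out[1] <= sresult_in[08]`, and so on.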


reverse_zigzag.v

`timescale 1ns/10ps

module reverse_zigzag(
    clk,
    rst,
    ena,
    din,
    dct_2d,
    douten
);

parameter do_width = 12;

input clk;   // system clock
input rst;
input ena;   // clk ena
input [do_width-1:0] din;
output [do_width-1:0] dct_2d;
output douten;   // data-out enable

reg ld_zigzag;
reg [do_width-1:0] sresult_in [63:0];   // store results for zig-zagging
reg [do_width-1:0] sresult_out[63:0];
reg [5:0] sample_cnt;
reg douten_reg;
wire douten;

always @ (posedge clk or posedge rst)
  if (rst)
    sample_cnt <= 6'h0;
  else if (ena)
    sample_cnt <= sample_cnt + 1'b1;
  else
    sample_cnt <= sample_cnt;


always @ (posedge clk)
  ld_zigzag <= &sample_cnt;

always @ (posedge clk or posedge rst)
  if (rst)
    douten_reg <= 1'b0;
  else if (ld_zigzag)
    douten_reg <= 1'b1;
  else
    douten_reg <= douten_reg;

assign douten = douten_reg & ena;

//
// Generate zig-zag structure
//
//        0:  1:  2:  3:  4:  5:  6:  7:
//  0:    63  62  58  57  49  48  36  35
//  1:    61  59  56  50  47  37  34  21
//  2:    60  55  51  46  38  33  22  20
//  3:    54  52  45  39  32  23  19  10
//  4:    53  44  40  31  24  18  11  09
//  5:    43  41  30  25  17  12  08  03
//  6:    42  29  26  16  13  07  04  02
//  7:    28  27  15  14  06  05  01  00
//
// (the same table in hex)
//        0:  1:  2:  3:  4:  5:  6:  7:
//  0:    3f  3e  3a  39  31  30  24  23
//  1:    3d  3b  38  32  2f  25  22  15
//  2:    3c  37  33  2e  26  21  16  14
//  3:    36  34  2d  27  20  17  13  0a
//  4:    35  2c  28  1f  18  12  0b  09
//  5:    2b  29  1e  19  11  0c  08  03
//  6:    2a  1d  1a  10  0d  07  04  02
//  7:    1c  1b  0f  0e  06  05  01  00

// zig-zag the DCT results
integer n;
always @(posedge clk)
  if (ena)
    begin
      for (n=1; n<=63; n=n+1)   // sresult_in[0] gets the new input
        begin
          sresult_in[n] <= #1 sresult_in[n-1];
          sresult_in[0] <= #1 din;
        end


      if (ld_zigzag)   // reload results-register file
        begin
          sresult_out[00] <= #1 sresult_in[00];
          sresult_out[08] <= #1 sresult_in[01];
          sresult_out[01] <= #1 sresult_in[02];
          sresult_out[02] <= #1 sresult_in[03];
          sresult_out[09] <= #1 sresult_in[04];
          sresult_out[16] <= #1 sresult_in[05];
          sresult_out[24] <= #1 sresult_in[06];
          sresult_out[17] <= #1 sresult_in[07];
          sresult_out[10] <= #1 sresult_in[08];
          sresult_out[03] <= #1 sresult_in[09];
          sresult_out[04] <= #1 sresult_in[10];
          sresult_out[11] <= #1 sresult_in[11];
          sresult_out[18] <= #1 sresult_in[12];
          sresult_out[25] <= #1 sresult_in[13];
          sresult_out[32] <= #1 sresult_in[14];
          sresult_out[40] <= #1 sresult_in[15];
          sresult_out[33] <= #1 sresult_in[16];
          sresult_out[26] <= #1 sresult_in[17];
          sresult_out[19] <= #1 sresult_in[18];
          sresult_out[12] <= #1 sresult_in[19];
          sresult_out[05] <= #1 sresult_in[20];
          sresult_out[06] <= #1 sresult_in[21];
          sresult_out[13] <= #1 sresult_in[22];
          sresult_out[20] <= #1 sresult_in[23];
          sresult_out[27] <= #1 sresult_in[24];
          sresult_out[34] <= #1 sresult_in[25];
          sresult_out[41] <= #1 sresult_in[26];
          sresult_out[48] <= #1 sresult_in[27];
          sresult_out[56] <= #1 sresult_in[28];
          sresult_out[49] <= #1 sresult_in[29];
          sresult_out[42] <= #1 sresult_in[30];
          sresult_out[35] <= #1 sresult_in[31];
          sresult_out[28] <= #1 sresult_in[32];
          sresult_out[21] <= #1 sresult_in[33];
          sresult_out[14] <= #1 sresult_in[34];
          sresult_out[07] <= #1 sresult_in[35];
          sresult_out[15] <= #1 sresult_in[36];
          sresult_out[22] <= #1 sresult_in[37];
          sresult_out[29] <= #1 sresult_in[38];
          sresult_out[36] <= #1 sresult_in[39];
          sresult_out[43] <= #1 sresult_in[40];
          sresult_out[50] <= #1 sresult_in[41];
          sresult_out[57] <= #1 sresult_in[42];
          sresult_out[58] <= #1 sresult_in[43];
          sresult_out[51] <= #1 sresult_in[44];
          sresult_out[44] <= #1 sresult_in[45];
          sresult_out[37] <= #1 sresult_in[46];
          sresult_out[30] <= #1 sresult_in[47];
          sresult_out[23] <= #1 sresult_in[48];
          sresult_out[31] <= #1 sresult_in[49];
          sresult_out[38] <= #1 sresult_in[50];
          sresult_out[45] <= #1 sresult_in[51];
          sresult_out[52] <= #1 sresult_in[52];
          sresult_out[59] <= #1 sresult_in[53];
          sresult_out[60] <= #1 sresult_in[54];
          sresult_out[53] <= #1 sresult_in[55];
          sresult_out[46] <= #1 sresult_in[56];
          sresult_out[39] <= #1 sresult_in[57];
          sresult_out[47] <= #1 sresult_in[58];
          sresult_out[54] <= #1 sresult_in[59];
          sresult_out[61] <= #1 sresult_in[60];
          sresult_out[62] <= #1 sresult_in[61];
          sresult_out[55] <= #1 sresult_in[62];
          sresult_out[63] <= #1 sresult_in[63];
        end
      else
        begin
          for (n=0; n<63; n=n+1)   // do not change sresult[63]
            sresult_out[n] <= #1 sresult_out[n+1];
        end
    end

assign dct_2d = sresult_out[00];

endmodule

