You are on page 1of 69



1.1 Introduction:
A standard movie, which is also known as motion picture, can be defined as a sequence of several scenes. A scene is then defined as a sequence of several seconds of motion recorded without interruption. A scene usually has at least three seconds. A movie in the cinema is shown as a sequence of still pictures, at a rate of 24 frames per second. Similarly, a TV broadcast consists of a transmission of 30 frames per second (NTSC, and some flavors of PAL, such as PAL-M), 25 frames per second (PAL, SECAM) or anything from 5 to 30 frames per second for typical videos in the Internet. The name motion picture comes from the fact that a video, once encoded, is nothing but a sequence of still pictures that are shown at a reasonably high frequency. That gives the viewer the illusion that it is in fact a continuous animation. Each frame is shown for one small fraction of a second, more precisely 1/ k seconds, where k is the number of frames per second. Coming back to the definition of a scene, where the frames are captured without interruption, one can expect consecutive frames to be quite similar to one another, as very little time is allowed until the next frame is to be captured. With all this in mind we can finally conclude that each scene is composed of at least 3 k frames (since a scene is at least 3 seconds long). In the NTSC case, for example, that means that a movie is composed of a sequence of various segments (scenes) each of which has at least 90 frames similar to one another.[4] Before going further with details on motion estimation we need to describe briefly how a video sequence is organized. As mentioned earlier a video is composed of a number of pictures. Each picture is composed of a number of pixels or peals (picture elements). A video frame has its pixels grouped in 88 blocks. The blocks are then grouped in macro blocks (MB), which are composed of 4 luminance blocks each (plus equivalent chrominance blocks). Macro blocks are then organized in groups of blocks (GOBs) which are grouped in pictures (or in layers and then pictures). Pictures are further grouped in scenes, as described above, and we can consider scenes grouped as movies. Motion estimation is often performed in the macro block domain. For simplicity sake well refer to the macro blocks as


blocks, but we shall remember that most often the macro block domain is the one in use for motion estimation.[4] For motion estimation the idea is that one block b of a current frame C is sought for in a previous (or future) frame R. If a block of pixels which is similar enough to block b is found in R, then instead of transmitting the whole block just a motion vector is transmitted. Ideally, a given macro blocks would be sought for in the whole reference frame; however, due to the computational complexity of the motion estimation stage the search is usually limited to a pre-defined region around the macro blocks. Most often such region includes 15 or 7 pixels to all four directions in a given reference frame. The search region is often denoted by the interval [-p, p] meaning that it includes p pixels in all directions.[4] With the growing need of real-time video applications, video compression plays a vital role in achieving bandwidth efficiency for both transmission and storage and efficient motion estimation is a key factor for achieving enhanced compression ratio. However, motion estimation involves high computational complexity, causing bottleneck in the realtime applications. To meet real-time processing needs, several motion vector search strategies and hardware designs have been proposed. These primarily focus on reducing the number of Sum-of-Absolute-Difference (SAD) operations at the cost of controller complexity. One of the main design goals is to reduce the computational complexity and power consumptions, without sacrificing image quality. Some algorithms and architectures succeeded in reducing power consumption and satisfied the required performance. The simplest and most effective method of motion estimation is to exhaustively compare each NxN macro block of the current frame with all the candidate blocks in the search window defined with in the previous processed frame and findthe best matching position with the lowest distortion. This is called Full Search Block Matching algorithm (FSBM)algorithm. A full search block matching process with a search range p has a search window of size (2p+N) x (2p+N) pixels and a total of (2p+1)2 candidate blocks in the reference frame for each block of the current frame. The distortion values are computed for each of the candidate blocks and its minimum value is found from the set of (2p+1)2


candidate blocks. The distortion measure is Sum of Absolute Difference for its simplicity, in which the candidate block with minimum amount of distortion is considered as the best-match. To achieve a best trade-off between the computational complexity of FSBM and degraded PSNR of motion compensated frame using faster algorithms, recently some researchers have investigated reduction of computational complexities of FSBM [1]-[5]. All these algorithms are not optimal in the sense that instead of exhaustive search, only some fixed positions are searched, based on the predictions of motion. Any error in motion prediction may lead to wrong motion vectors, resulting in poor peak signal-to-noise ratio (PSNR) of the motion-compensated frame. The computationally intensive nature of FSBM and the demand of real-time processing render the VLSI implementation of FSBM is a necessity. Due to the repetitive nature of the algorithm and demand for high throughput, several VLSI architectures of FSBM were implemented [6]-[10], but most of them are based on dataflow management and not suitable for good throughput/area efficiency and low power VLSI implementations. The main draw back of these architectures is that considerable extra hardware is required estimating the motion vector. Luc De Vos&M.Schobinger [11] and others [12]-[13], suggested processor based architectures, offering high flexibility and programmability, but compared to ASICs [14]; it offers higher power consumption and lower throughput. In this paper we propose efficient and low power VLSI architecture, which has been developed to meet the speed requirement of the current video coding system.

This chapter discusses design for testability (DFT) techniques for testing modern digital circuits. These DFT techniques are required in order to improve the quality and reduce the test cost of the digital circuit, while at the same time simplifying the test, debug and diagnose tasks. The purpose of this chapter is to provide readers with the knowledge to judge whether a design is implemented in a test-friendly manner and to recommend changes in order to improve the testability of the design for achieving the above-mentioned goals. More specifically, this chapter will allow readers to be able to identify and fix scan design rule violations and understand the


basics for successfully converting a design into a scan design.In this chapter, we first cover the basic DFT concepts and methods for performing testability analysis. Next, following a brief yet comprehensive summary of ad hoc DFT techniques, scan design, the most widely used structured DFT methodology, is discussed, including popular scan cell designs, scan architectures, scan design rules, scan design flow, and special-purpose scan designs. Finally, advanced DFT techniques for use at the register-transfer level (RTL) are presented in order to further reduce DFT design iterations and test development time. During the early stages of integrated circuit (IC) production history, design and test were regarded as separate functions, performed by separate and unrelated groups of engineers. During these early years, a design engineers job was to implement the required functionality based on design specifications, without giving any thought 38 VLSI Test Principles and Architectures to how the manufactured device was to be tested. Once the functionality was implemented, the design information was transferred to test engineers. A test engineers job was to determine how to efficiently test each manufactured device within a reasonable amount of time, in order to screen out the parts that may contain manufacturing defects and ship all defect-free devices to customers. The final quality of the test was determined by keeping track of the number of defective parts shipped to the customers, based on customer returns. This product quality, measured in terms of defective parts per million (PPM) shipped, was a final test score for quantifying the effectiveness of the developed test. While this approach worked well for small-scale integrated circuits that mainly consisted of combinational logic or simple finite-state machines, it was unable to keep up with the circuit complexity as designs moved from small-scale integration (SSI) to very-large-scale integration (VLSI). A common approach to test these VLSI devices during the 1980s relied heavily on fault simulation to measure the fault coverage of the supplied functional patterns. Functional patterns were developed to navigate through the long sequential depths of a design, with the goal of exercising all internal states and detecting all possible manufacturing defects. A fault simulation or fault grading tool was used to quantify the effectiveness of the functional patterns. If the supplied functional patterns did not reach the target fault coverage goal, additional functional patterns


were further added. Unfortunately, this approach typically failed to improve the circuits fault coverage beyond 80%, and the quality of the shipped products suffered. Gradually, it became clear that designing devices without paying much attention to test resulted in increased test cost and decreased test quality. Some designs that were otherwise best in class with regard to functionality and performance failed commercially due to prohibitively high test cost or poor product quality. These problems have since led to the development and deployment of DFT engineering in the industry. The first challenge facing DFT engineers was to find simpler ways of exercising all internal states of a design and reaching the target fault coverage goal. Various testability measures and ad hoc testability enhancement methods were proposed and used in the 1970s and 1980s to serve this purpose. These methods were mainly used to aid in the circuits testability or to increase the circuits controllability and observability [McCluskey 1986] [Abramovici 1994]. While attempts to use these methods have substantially improved the testability of a design and eased sequential automatic test pattern generation (ATPG), their end results at reaching the target fault coverage goal were far from being satisfactory; it was still the fact that, even with these testability aids, deriving functional patterns by hand or generating test patterns for a sequential circuit is a much more difficult problem than generating test patterns for a combinational circuit [Fujiwara 1982] [Bushnell 2000] [Jha 2002]. For combinational circuits, many innovative ATPG algorithms have been developed for automatically generating test patterns within a reasonable amount of time. Automatically generating test patterns for sequential circuits met with limited success, due to the existence of numerous internal states that are difficult to set Design for Testability and check from external pins. Difficulties in controlling and observing the internal states of sequential circuits led to the adoption of structured DFT approaches in which direct external access is provided for storage elements. These reconfigured storage elements with direct external access are commonly referred to as scan cells. Once the capability of controlling and observing the internal states of a design is added, the problem of testing the sequential circuit is transformed into a problem of testing the combinational logic, for which many solutions already existed. Scan design is currently the most popular structured DFT approach. It is implemented by connecting selected storage elements of a design into multiple shift registers, called scan chains, to provide them with


external access. Scan design accomplishes this task by replacing all selected storage elements with scan cells, each having one additional scan input (SI) port and one shared/additional scan output (SO) port. By connecting the SO port of one scan cell to the SI port of the next scan cell, one or more scan chains are created. Since the 1970s, numerous scan cell designs and scan architectures have been proposed [Fujiwara 1985] [McCluskey 1986]. A design where all storage elements are selected for scan insertion is called a full-scan design. A design where almost all (e_g_, more than 98%) storage elements are selected is called an almost fullscan design. A design where some storage elements are selected and sequential ATPG is applied is called a partialscan design. A partial-scan design where storage elements are selected in such a way as to break all sequential feedback loops [Cheng 1990] and to which combinational ATPG can be applied is further classified as a pipelined, feed-forward, or balanced partial-scan design. As silicon prices have continued to drop since the mid-1990s with the advent of deep submicron technology, the dominant scan architecture has shifted from partial-scan design to full-scan design. In order for a scan design to achieve the desired PPM goal, specific circuit structures and design practices that can affect fault coverage must be identified and fixed. This requires compiling a set of scan design rules that must be adhered to. Hence, a new role of DFT engineer emerged, with responsibilities including identifying and fixing scan design rule violations in the design, inserting or synthesizing scan chains into the design, generating test patterns for the scan design, and, finally, converting the test patterns to test programs for test engineers to perform manufacturing testing on automatic test equipment (ATE). Since then, most of these DFT tasks have been automated. In addition to being the dominant DFT architecture used for detecting manufacturing defects, scan design has become the basis of more advanced DFT techniques, such as logic built-in self-test (BIST) [Nadeau-Dostie 2000] [Stroud 2002] and test compression. Furthermore, as designs continue to move towards the nanometer scale, scan design is being utilized as a design feature, with uses varying from debug, diagnosis, and failure analysis to special applications, such as reliability enhancement against soft errors


[Mitra 2005]. A few of these special-purpose scan designs are included in this chapter for completeness. Recently, design for testability has started to migrate from the gate level to the register-transfer level (RTL). The motivation for this migration is to allow additional DFT features, such as logic BIST and test compression, to be integrated VLSI Test Principles and Architectures at the RTL, thereby reducing test development time and creating reusable and testable RTL cores. This further allows the integrated DFT design to go through synthesisbased optimization to reduce performance and area overhead. A more serious issue is that good testability for sequential circuits is difficult to achieve. Because many internal states exist, setting a sequential circuit to a required internal state can require a very large number of input events. Furthermore, identifying the exact internal state of a sequential circuit from the primary outputs might require a very long checking experiment. Hence, a more structured approach for testing designs that contain a large amount of sequential logic is required as part of a methodical design for testability (DFT) approach [Williams 1983]. Initially, many ad hoc techniques were proposed for improving testability. These techniques relied on making local modifications to a circuit in a manner that was considered to result in testability improvement. While ad hoc DFT techniques do result in some tangible testability improvement, their effects are local and not systematic. Furthermore, these techniques are not methodical, in the sense that they have to be repeated differently on new designs, often with unpredictable results. Due to the ad hoc nature, it is also difficult to predict how long it would take to implement the required DFT features. The structured approach for testability improvement was introduced to allow DFT engineers to follow a methodical process for improving the testability of a design. A structured DFT technique can be easily incorporated and budgeted for as part of the design flow and can yield the desired results. Furthermore, structured DFT techniques are much easier to automate. To date, electronic design automation (EDA) vendors have been able to provide sophisticated DFT tools to simplify and speed up DFT tasks. Scan design, which is the main topic in this chapter, hasbeen found to be one of the most effective structured DFT methodology ies for testability improvement. Not only can scan design achieve the targeted fault coverage goal, but it also makes DFT implementation in scan design manageable. In the


following two subsections, we briefly introduce a few typical ad hoc DFT techniques, followed by a detailed description of the structured DFT approach, focusing specifically on scan design.


LITERATURE SURVEY: PE Array architecture PE is a module that calculates the absolute difference between the pixel of the reference block and the pixel of the current block. Fig. 6 shows the architecture of PE Array 4x4. To enable the reference data shifting to top, bottom, right or left in PE Array 4x4, each PE is connected to the PE of top, bottom, right or left one. This structure generates SAD 4x1 by accumulating the absolute difference of each PE. Furthermore, SAD 4x4 is generated by accumulating generated SAD 4x1. However, because long delay will be induced by the multiple arithmetic accumulating modules which generates SAD 4x1 and SAD 4x4, registers are inserted to improve maximum clock frequency.



The PE module is designed to be able to shift to top,bottom, right or left. The proposed PE and the traditional one are shown in Fig. 7 and Fig. 8, respectively.As shown in Fig.

Fig. 6. PE Array 4x4 architecture.

Fig. 7. Traditional PE architecture.



traditional PE can shift pixel data in one direction by connecting PEs of the same direction. However, the proposed PE can shift reference pixel data to top, bottom, right or left. The current pixel data is always shifting into the PE Array from top direction. Reference pixel data need input from four directions as well as output to four directions. Therefore, four input ports and four output ports are prepared in each PE. The connection of PEs is shown in Fig. 9(a). As shown in Fig. 9(a), each PE is connected with surrounding PEs: from top connects to to bottom, from bottom connects to to top, from left connects to to right, and from right connects to to left. An example when from top is selected by the multiplexer is shown in Fig. 9(b). The reference data from the output port to bottom of upper PEs is shifted to bottom PEs and the search position is shifted. In this way, each I/O port of PE enables the shift of the reference pixel data by selecting the input of reference pixel data using multiplexer.

Fig. 8. Proposed PE architecture.



Blocks 1. Processing Element 2. Coder 3. Selector 4. Detector 5. Syndrome Decoder 6. Corrector

1. Processing Element: Processing element calculates the sum of absolute differences between current pixels and reference pixels. The 16 pixels absolute difference is given to adder unit to perform SAD.

Adder Unit: vnxcvncxn

M,Dnbvk.GSJckgASK The absolute differences of 16 pixels each is given to the adder unit which performs the SAD calculation. The architecture of adder tree is shown below, which consists of 3-2 compressor. The calculated SAD is 12 bit.

3-2 Compressor:

3-2 Compressor consists of full adders.



As the pixel size is 8 bits hence the number of full adders required is 8.

2. Coder: The coder generates residue code,two coders are used to generate biresidue codes. Bi-two,residue-rem,the residue code is calculated using modulus operator.the modulus values are selected in such a way that it satisfies the following conditions i.e.

a=3,b=4 hence A=7 and B=15 respectively. This diagram is for single coder.



If data n1 andn2, obtained from Cur. pixel and Ref.pixel, are sent to the Mod1 andMod1 circuits at the first clock, then values |n1|1 and |n2|1 can be created at the second clock. When the third clock is triggered, the summation of modulus values, =|n1| 1+ |n2|1, is generated by the adder. Additionally, the following two data, n3 and n4, are simultaneously passed into the Mod 11 and Mod 12 circuits for modulo operations. The Mod 13 circuit executes the operation of ||1=||n1| 1+|n2| 1| 1, and then stores the computational results in the register at the fourth clock. Moreover, the summation of the modulus value = |n3| 1+ |n4| 1can be captured by an adder at this time. Notably, the two selected signals S1 and S2 in multiplexers M1 and M2 are set to 0 to sum the modulus values at clocks 14. At the fifth clock, the next two data, n5and n6, are transmitted to the coder 1 circuit, and signals S1 and S2 are changed from 0 to 1 to deliver the values || 1 and || 1 to the adder for addition. Fig. 4 clearly demonstrates that the regularity processes will continue after the fifth clock. Since a 4 _ 4 macroblock in a specific PEi of the MECA contains 16 pixels, the accumulated input data from coder 1 and coder 2 circuits in the TCG are exported to the DAC and SAC circuits for error detection and error correction, respectively, after 50 clocks.



3. Selector: It takes the output of the processing element as an input. Another input to the selector is the output of the detector. If the Detector block detects any error in the PE output then the Selector block will give the PE output to Syndrome Decoder to detect in which bit position there is an error and also to the Corrector block to correct the single bit error.



4. Detector:
This module will detect whether there is an error in output of PE i.e. SAD. Here the Error is calculated. The output of PE and theoretically calculated SAD will be subtracted which is given as

If SAD> SAD then e= + 5. Syndrome Decoder:

else e=

This module decodes the syndrome values which specify the error in the bit position position of SAD.

The table shown below specifies the depending upon the calculated syndrome values the single bit error bit position for ex: (s1, s2) = (1, 1) or (6, 14) then error is in zero position.



The output of the syndrome decoder is given as the input to the corrector module.

6. Corrector: Input to the corrector module is the output of the selector module which is SAD that needs to be corrected; the bit position which needs correction is specified by the syndrome decoder. The corrector architecture consists of LUT and 12 multiplexers.



In the figure above, the output of the syndrome decoder is going as the input to the lut.the output of the lut will act as the select line to the multiplexer. The selector output will act as the input to all the multiplexer which takes single bit as an input. For ex: if the input to the lut is (1,1) then the output is 000000000001 which specifies that 0th bit of the select out requires the the multiplxer is 2*1 to which 1 input is the inverted value of the other,if the select line is zero input is given as it is to the output else inverted input is given as the output. Hence the correction is done.




All Verilog programs are composed of modules. These modules may be instantiated inside of other modules to create a hierarchy that represents the structure of the hardware system. Modules contain three types of concurrent process statements: initial blocks, always blocks, and ontinuous assignments. Initial blocks are executed once at the beginning of the simulation, while always blocks are executed repeatedly. Initial and always blocks consist of statements that are executed sequentially, and each can wait on a signal to change value using wait or @ statements. A continuous assignment is an assignment to a wire whose left hand side continuously reflects the current state of the variables on the right hand side. A Verilog simulator simulates a model sequentially by removing events from an event queue, executing the events, and placing new events on the queue as they become activated. VeriSUIF supports a subset of Verilog. We do not support tasks, functions, fork/join, or the disable statement. However, we do not foresee any difficulty in extending our system to handle these features.
There are two basic types of digital design methodologies: a top-down design methodology and a bottom-up design methodology. In a top-down design methodology, we define the top-level block and identify the sub-blocks necessary to build the top-level block. We further subdivide the subblocks until we come to leaf cells, which are the cells that cannot further be divided. Figure 2-1 shows the top-down design process

In a bottom-up design methodology, we first identify the building blocks that are available to us. We build bigger cells, using these building blocks. These cells are then used for higher-level blocks until we build the top-level block in the design. Figure 2-2 shows the bottom-up design process. Typically, a combination of top-down and bottom-up flows is used. Design architects define the specifications of the top-level block. Logic designers decide how the design should be structured by breaking up the functionality into blocks and sub-blocks. At the same time, circuit designers are designing optimized circuits for leaf-level cells. They build higher-level cells by using these leaf cells. The flow meets at an intermediate point where the switch-level circuit designers have created a library of leaf cells by using switches, and the logic level designers have designed from top-down until all modules are defined in terms of leaf cells. Modules vnxcvncxn

M,Dnbvk.GSJckgASK We now relate these hierarchical modeling concepts to Verilog. Verilog provides the concept of a module. A module is the basic building block in Verilog. A module can be an element or a collection of lower-level design blocks. Typically, elements are grouped into modules to provide common functionality that is used at many places in the design. A module provides the necessary functionality to the higher-level block through its port interface (inputs and outputs), but hides the internal implementation. This allows the designer to modify module internals without affecting the rest of the design.

In Figure 2-5, ripple carry counter, T_FF, D_FF are examples of modules. In Verilog, a module is declared by the keyword module. A corresponding keyword endmodule must appear at the end of the module definition. Each module must have a module_name, which is the identifier for the module, and a module_terminal_list, which describes the input and output terminals of the module.

module <module_name> (<module_terminal_list>);

... <module internals> ... ... endmodule

Specifically, the T-flipflop could be defined as a module as follows:

module T_FF (q, clock, reset); <functionality of T-flipflop


Verilog is both a behavioral and a structural language. Internals of each module can be defined at four levels of abstraction, depending on the needs of the design. The module behaves identically with the external environment irrespective of the level of abstraction at which the module is described. The vnxcvncxn

M,Dnbvk.GSJckgASK internals of the module are hidden from the environment. Thus, the level of abstraction to describe a module can be changed without any change in the environment. These levels will be studied in detail in separate chapters later in the book. The levels are defined below.

Behavioral or algorithmic level This is the highest level of abstraction provided by Verilog HDL. A module can be implemented in terms of the desired design algorithm without concern for the hardware implementation details. Designing at this level is very similar to C programming.

Dataflow level At this level, the module is designed by specifying the data flow. The designer is aware of how data flows between hardware registers and how the data is processed in the design. Gate level The module is implemented in terms of logic gates and interconnections between these gates. Design at this level is similar to describing a design in terms of a gate-level logic diagram. Switch level This is the lowest level of abstraction provided by Verilog. A module can be implemented in terms of switches, storage nodes, and the interconnections between them. Design at this level requires knowledge of switch-level implementation details.

Verilog allows the designer to mix and match all four levels of abstractions in a design. In the digital design community, the term register transfer level (RTL) is frequently used for a Verilog description that uses a combination of behavioral and dataflow constructs and is acceptable to logic synthesis tools.

If a design contains four modules, Verilog allows each of the modules to be written at a different level of abstraction. As the design matures, most modules are replaced with gate-level implementations.

Normally, the higher the level of abstraction, the more flexible and technology-independent the design. As one goes lower toward switch-level design, the design becomes technology-dependent and inflexible. A small modification can cause a significant number of changes in the design. Consider the analogy with C programming and assembly language programming. It is easier to program in a vnxcvncxn

M,Dnbvk.GSJckgASK higher-level language such as C. The program can be easily ported to any machine. However, if you design at the assembly level, the program is specific for that machine and cannot be easily ported to another machine.

A module provides a template from which you can create actual objects. When a module is invoked, Verilog creates a unique object from the template. Each object has its own name, variables, parameters, and I/O interface. The process of creating objects from a module template is called instantiation, and the objects are called instances. In the top-level block creates four instances from the T-flipflop (T_FF) template. Each T_FF instantiates a D_FF and an inverter gate. Each instance must be given a unique name. Note that // is used to denote single-line comments.
Example 2-1 Module Instantiation // Define the top-level module called ripple carry // counter. It instantiates 4 T-flipflops. Interconnections are // shown in Section 2.2, 4-bit Ripple Carry Counter. module ripple_carry_counter(q, clk, reset); output [3:0] q; //I/O signals and vector declarations //will be explained later. input clk, reset; //I/O signals will be explained later. //Four instances of the module T_FF are created. Each has a unique //name.Each instance is passed a set of signals. Notice, that //each instance is a copy of the module T_FF. T_FF tff0(q[0],clk, reset); T_FF tff1(q[1],q[0], reset); T_FF tff2(q[2],q[1], reset); T_FF tff3(q[3],q[2], reset); endmodule // Define the module T_FF. It instantiates a D-flipflop. We assumed // that module D-flipflop is defined elsewhere in the design. Refer // to Figure 2-4 for interconnections. module T_FF(q, clk, reset); //Declarations to be explained later output q; input clk, reset; wire d; D_FF dff0(q, d, clk, reset); // Instantiate D_FF. Call it dff0. not n1(d, q); // not gate is a Verilog primitive. Explained later. endmodule

In Verilog, it is illegal to nest modules. One module definition cannot contain another module definition within the module and endmodule statements. Instead, a module definition can


incorporate copies of other modules by instantiating them. It is important not to confuse module definitions and instances of a module. Module definitions simply specify how the module will work, its internals, and its interface. Modules must be instantiated for use in the design. Lexical Conventions The basic lexical conventions used by Verilog HDL are similar to those in the C programming language. Verilog contains a stream of tokens. Tokens can be comments, delimiters, numbers, strings, identifiers, and keywords. Verilog HDL is a case-sensitive language. All keywords are in lowercase.

Whitespace Blank spaces (\b) , tabs (\t) and newlines (\n) comprise the whitespace. Whitespace is ignored by Verilog except when it separates tokens. Whitespace is not ignored in strings.

Comments Comments can be inserted in the code for readability and documentation. There are two ways to write comments. A one-line comment starts with "//". Verilog skips from that point to the end of line. A multiple-line comment starts with "/*" and ends with "*/". Multiple-line comments cannot be nested. However, one-line comments can be embedded in multiple-line comments.

a = b && c; // This is a one-line comment

/* This is a multiple line comment */

/* This is /* an illegal */ comment */

/* This is //a legal comment */



Operators Operators are of three types: unary, binary, and ternary. Unary operators precede the operand. Binary operators appear between two operands. Ternary operators have two separate operators that separate three operands.

a = ~ b; // ~ is a unary operator. b is the operand a = b && c; // && is a binary operator. b and c are operands a = b ? c : d; // ?: is a ternary operator. b, c and d are operands

Number Specification There are two types of number specification in Verilog: sized and unsized. Sized numbers Sized numbers are represented as <size> '<base format> <number>.

<size> is written only in decimal and specifies the number of bits in the number. Legal base formats are decimal ('d or 'D), hexadecimal ('h or 'H), binary ('b or 'B) and octal ('o or 'O). The number is specified as consecutive digits from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f. Only a subset of these digits is legal for a particular base. Uppercase letters are legal for number specification.

4'b1111 // This is a 4-bit binary number 12'habc // This is a 12-bit hexadecimal number 16'd255 // This is a 16-bit decimal number.



Unsized numbers Numbers that are specified without a <base format> specification are decimal numbers by default. Numbers that are written without a <size> specification have a default number of bits that is simulator- and machine-specific (must be at least 32).

23456 // This is a 32-bit decimal number by default 'hc3 // This is a 32-bit hexadecimal number 'o21 // This is a 32-bit octal number

X or Z values Verilog has two symbols for unknown and high impedance values. These values are very important for modeling real circuits. An unknown value is denoted by an x. A high impedance value is denoted by z.

12'h13x // This is a 12-bit hex number; 4 least significant bits unknown 6'hx // This is a 6-bit hex number 32'bz // This is a 32-bit high impedance number

An x or z sets four bits for a number in the hexadecimal base, three bits for a number in the octal base, and one bit for a number in the binary base. If the most significant bit of a number is 0, x, or z, the number is automatically extended to fill the most significant bits, respectively, with 0, x, or z. This makes it easy to assign x or z to whole vector. If the most significant digit is 1, then it is also zero extended.

Negative numbers Negative numbers can be specified by putting a minus sign before the size for a constant number. Size constants are always positive. It is illegal to have a minus sign between <base format> and <number>. An optional signed specifier can be added for signed arithmetic.



-6'd3 // 8-bit negative number stored as 2's complement of 3 -6'sd3 // Used for performing signed integer math 4'd-2 // Illegal specification

Underscore characters and question marks An underscore character "_" is allowed anywhere in a number except the first character. Underscore characters are allowed only to improve readability of numbers and are ignored by Verilog.

A question mark "?" is the Verilog HDL alternative for z in the context of numbers. The ? is used to enhance readability in the casex and casez statements discussed in Chapter 7, where the high impedance value is a don't care condition. (Note that ? has a different meaning in the context of user-defined primitives, which are discussed in Chapter 12, User-Defined Primitives.)

12'b1111_0000_1010 // Use of underline characters for readability 4'b10?? // Equivalent of a 4'b10zz

Strings A string is a sequence of characters that are enclosed by double quotes. The restriction on a string is that it must be contained on a single line, that is, without a carriage return. It cannot be on multiple lines. Strings are treated as a sequence of one-byte ASCII values.

"Hello Verilog World" // is a string "a / b" // is a string

3.1.6 Identifiers and Keywords



Keywords are special identifiers reserved to define the language constructs. Keywords are in lowercase. A list of all keywords in Verilog is contained in Appendix C, List of Keywords, System Tasks, and Compiler Directives.

Identifiers are names given to objects so that they can be referenced in the design. Identifiers are made up of alphanumeric characters, the underscore ( _ ), or the dollar sign ( $ ). Identifiers are case sensitive. Identifiers start with an alphabetic character or an underscore. They cannot start with a digit or a $ sign (The $ sign as the first character is reserved for system tasks, which are explained later in the book).

reg value; // reg is a keyword; value is an identifier input clk; // input is a keyword, clk is an identifier

Escaped Identifiers Escaped identifiers begin with the backslash ( \ ) character and end with whitespace (space, tab, or newline). All characters between backslash and whitespace are processed literally. Any printable ASCII character can be included in escaped identifiers. Neither the backslash nor the terminating whitespace is considered to be a part of the identifier.

\a+b-c \**my_name**



Data Types
This section discusses the data types used in Verilog.
Value Set

Verilog supports four values and eight strengths to model the functionality of real hardware. The four value levels are listed in Table 3-1.
Table 3-1. Value Levels

Value Level 0 1 x z

Condition in Hardware Circuits Logic zero, false condition Logic one, true condition Unknown logic value High impedance, floating state

In addition to logic values, strength levels are often used to resolve conflicts between drivers of different strengths in digital circuits. Value levels 0 and 1 can have the strength levels listed in Table 3-2.
Table 3-2. Strength Levels

Strength Level supply strong pull large weak medium small highz Driving Driving Driving Storage Driving Storage Storage


Degree strongest

High Impedance


If two signals of unequal strengths are driven on a wire, the stronger signal prevails. For example, if two signals of strength strong1 and weak0 contend, the result is resolved as a strong1. If two signals of equal strengths are driven on a wire, the result is unknown. If two signals of strength strong1 and strong0 conflict, the result is an x. Strength levels are particularly useful for accurate modeling of signal contention, MOS devices, dynamic MOS, and other low-level devices. Only trireg nets can have storage strengths large, medium,


and small. Detailed information about strength modeling is provided in Appendix A, Strength Modeling and Advanced Net Definitions.

Nets represent connections between hardware elements. Just as in real circuits, nets have values continuously driven on them by the outputs of devices that they are connected to. In Figure 3-1 net a is connected to the output of and gate g1. Net a will continuously assume the value computed at the output of gate g1, which is b & c.

Nets are declared primarily with the keyword wire. Nets are one-bit values by default unless they are declared explicitly as vectors. The terms wire and net are often used interchangeably. The default value of a net is z (except the trireg net, which defaults to x ). Nets get the output value of their drivers. If a net has no driver, it gets the value z.
wire a; // Declare net a for the above circuit wire b,c; // Declare two wires b,c for the above circuit wire d = 1'b0; // Net d is fixed to logic value 0 at declaration.

Note that net is not a keyword but represents a class of data types such as wire, wand, wor, tri, triand, trior, trireg, etc. The wire declaration is used most frequently. Other net declarations are discussed in Appendix A, Strength Modeling and Advanced Net Definitions.

Registers represent data storage elements. Registers retain value until another value is placed onto them. Do not confuse the term registers in Verilog with hardware registers built from edge-triggered flipflops in real circuits. In Verilog, the term register merely means a variable that can hold a value. Unlike a net, a register does not need a driver. Verilog registers do not need a clock as hardware registers do. Values of registers can be changed anytime in a simulation by assigning a new value to the register. Register data types are commonly declared by the keyword reg. The default value for a reg data type is x. An example of how registers are used is shown Example 3-1.
Example 3-1 Example of Register reg reset; // declare a variable reset that can hold its value initial // this construct will be discussed later begin reset = 1'b1; //initialize reset to 1 to reset the digital circuit. #100 reset = 1'b0; // after 100 time units reset is deasserted. end



Registers can also be declared as signed variables. Such registers can be used for signed arithmetic. Example 3-2 shows the declaration of a signed register.
Example 3-2 Signed Register Declaration reg signed [63:0] m; // 64 bit signed value integer i; // 32 bit signed value


Nets or reg data types can be declared as vectors (multiple bit widths). If bit width is not specified, the default is scalar (1-bit).
wire a; // scalar net variable, default wire [7:0] bus; // 8-bit bus wire [31:0] busA,busB,busC; // 3 buses of 32-bit width. reg clock; // scalar register, default reg [0:40] virtual_addr; // Vector register, virtual address 41 bits wide

Vectors can be declared at [high# : low#] or [low# : high#], but the left number in the squared brackets is always the most significant bit of the vector. In the example shown above, bit 0 is the most significant bit of vector virtual_addr.
Vector Part Select

For the vector declarations shown above, it is possible to address bits or parts of vectors.
busA[7] // bit # 7 of vector busA bus[2:0] // Three least significant bits of vector bus, // using bus[0:2] is illegal because the significant bit should // always be on the left of a range specification virtual_addr[0:1] // Two most significant bits of vector virtual_addr Variable Vector Part Select

Another ability provided in Verilog HDl is to have variable part selects of a vector. This allows part selects to be put in for loops to select various parts of the vector. There are two special part-select operators: [<starting_bit>+:width] - part-select increments from starting bit [<starting_bit>-:width] - part-select decrements from starting bit The starting bit of the part select can be varied, but the width has to be constant. The following example shows the use of variable vector part select:
reg [255:0] data1; //Little endian notation reg [0:255] data2; //Big endian notation reg [7:0] byte;



//Using a variable part select, one can choose parts byte = data1[31-:8]; //starting bit = 31, width =8 => data[31:24] byte = data1[24+:8]; //starting bit = 24, width =8 => data[31:24] byte = data2[31-:8]; //starting bit = 31, width =8 => data[24:31] byte = data2[24+:8]; //starting bit = 24, width =8 => data[24:31] //The starting bit can also be a variable. The width has //to be constant. Therefore, one can use the variable part select //in a loop to select all bytes of the vector. for (j=0; j<=31; j=j+1) byte = data1[(j*8)+:8]; //Sequence is [7:0], [15:8]... [255:248] //Can initialize a part of the vector data1[(byteNum*8)+:8] = 8'b0; //If byteNum = 1, clear 8 bits [15:8]

Integer , Real, and Time Register Data Types

Integer, real, Integer

and time register data types are supported in Verilog.

An integer is a general purpose register data type used for manipulating quantities. Integers are declared by the keyword integer. Although it is possible to use reg as a general-purpose variable, it is more convenient to declare an integer variable for purposes such as counting. The default width for an integer is the host-machine word size, which is implementationspecific but is at least 32 bits. Registers declared as data type reg store values as unsigned quantities, whereas integers store values as signed quantities.
integer counter; // general purpose variable used as a counter. initial counter = -1; // A negative one is stored in the counter Real

Real number constants and real register data types are declared with the keyword real. They can be specified in decimal notation (e.g., 3.14) or in scientific notation (e.g., 3e6, which is 3 x 106 ). Real numbers cannot have a range declaration, and their default value is 0. When a real value is assigned to an integer, the real number is rounded off to the nearest integer.
real delta; // Define a real variable called delta initial begin delta = 4e10; // delta is assigned in scientific notation delta = 2.13; // delta is assigned a value 2.13 end integer i; // Define an integer i initial i = delta; // i gets the value 2 (rounded value of 2.13) Time



Verilog simulation is done with respect to simulation time. A special time register data type is used in Verilog to store simulation time. A time variable is declared with the keyword time. The width for time register data types is implementation-specific but is at least 64 bits.The system function $time is invoked to get the current simulation time.
time save_sim_time; // Define a time variable save_sim_time initial save_sim_time = $time; // Save the current simulation time

Simulation time is measured in terms of simulation seconds. The unit is denoted by s, the same as real time. However, the relationship between real time in the digital circuit and simulation time is left to the user. This is discussed in detail in Section 9.4, Time Scales.

Arrays are allowed in Verilog for reg, integer, time, real, realtime and vector register data types. Multi-dimensional arrays can also be declared with any number of dimensions. Arrays of nets can also be used to connect ports of generated instances. Each element of the array can be used in the same fashion as a scalar or vector net. Arrays are accessed by <array_name>[<subscript>]. For multi-dimensional arrays, indexes need to be provided for each dimension.
integer count[0:7]; // An array of 8 count variables reg bool[31:0]; // Array of 32 one-bit boolean register variables time chk_point[1:100]; // Array of 100 time checkpoint variables reg [4:0] port_id[0:7]; // Array of 8 port_ids; each port_id is 5 bits wide integer matrix[4:0][0:255]; // Two dimensional array of integers reg [63:0] array_4d [15:0][7:0][7:0][255:0]; //Four dimensional array wire [7:0] w_array2 [5:0]; // Declare an array of 8 bit vector wire wire w_array1[7:0][5:0]; // Declare an array of single bit wires

It is important not to confuse arrays with net or register vectors. A vector is a single element that is n-bits wide. On the other hand, arrays are multiple elements that are 1-bit or n-bits wide. Examples of assignments to elements of arrays discussed above are shown below:
count[5] = 0; // Reset 5th element of array of count variables chk_point[100] = 0; // Reset 100th time check point value port_id[3] = 0; // Reset 3rd element (a 5-bit value) of port_id array. matrix[1][0] = 33559; // Set value of element indexed by [1][0] to 33559 array_4d[0][0][0][0][15:0] = 0; //Clear bits 15:0 of the register


//accessed by indices [0][0][0][0] port_id = 0; // Illegal syntax - Attempt to write the entire array matrix [1] = 0; // Illegal syntax - Attempt to write [1][0]..[1][255]


In digital simulation, one often needs to model register files, RAMs, and ROMs. Memories are modeled in Verilog simply as a one-dimensional array of registers. Each element of the array is known as an element or word and is addressed by a single array index. Each word can be one or more bits. It is important to differentiate between n 1-bit registers and one n-bit register. A particular word in memory is obtained by using the address as a memory array subscript.
reg mem1bit[0:1023]; // Memory mem1bit with 1K 1-bit words reg [7:0] membyte[0:1023]; // Memory membyte with 1K 8-bit words(bytes) membyte[511] // Fetches 1 byte word whose address is 511.


Verilog allows constants to be defined in a module by the keyword parameter. Parameters cannot be used as variables. Parameter values for each module instance can be overridden individually at compile time. This allows the module instances to be customized. This aspect is discussed later. Parameter types and sizes can also be defined.
parameter port_id = 5; // Defines a constant port_id parameter cache_line_width = 256; // Constant defines width of cache line parameter signed [15:0] WIDTH; // Fixed sign and range for parameter // WIDTH

Module definitions may be written in terms of parameters. Hardcoded numbers should be avoided. Parameters values can be changed at module instantiation or by using the defparam statement, which is discussed in detail in Useful Modeling Techniques. Thus, the use of parameters makes the module definition flexible. Module behavior can be altered simply by changing the value of a parameter. Verilog HDL local parameters (defined using keyword localparam -) are identical to parameters except that they cannot be directly modified with the defparam statement or by the ordered or named parameter value assignment. The localparam keyword is used to define parameters when their values should not be changed. For example, the state encoding for a state machine can be defined using localparam. The state encoding cannot be changed. This provides protection against inadvertent parameter redefinition.
localparam state1 = 4'b0001, state2 = 4'b0010, state3 = 4'b0100, state4 = 4'b1000;


M,Dnbvk.GSJckgASK Strings

Strings can be stored in reg. The width of the register variables must be large enough to hold the string. Each character in the string takes up 8 bits (1 byte). If the width of the register is greater than the size of the string, Verilog fills bits to the left of the string with zeros. If the register width is smaller than the string width, Verilog truncates the leftmost bits of the string. It is always safe to declare a string that is slightly wider than necessary.
reg [8*18:1] string_value; // Declare a variable that is 18 bytes wide initial string_value = "Hello Verilog World"; // String can be stored // in variable

Special characters serve a special purpose in displaying strings, such as newline, tabs, and displaying argument values. Special characters can be displayed in strings only when they are preceded by escape characters, as shown in
Table 3-3. Special Characters

Escaped Characters \n \t %% \\ \" \ooo newline tab % \ "

Character Displayed

Character written in 13 octal digits

Modules We discussed how a module is a basic building block, Hierarchical Modeling Concepts. We ignored the internals of modules and concentrated on how modules are defined and instantiated. In this section, we analyze the internals of the module in greater detail. A module definition always begins with the keyword module. The module name, port list, port declarations, and optional parameters must come first in a module definition. Port list and port declarations are present only if the module has any ports to interact with the external environment.The five components within a module are: variable declarations, dataflow statements, instantiation of lower modules, behavioral blocks, and tasks or functions. These components can be in any order and at any place in the module definition. The endmodule


statement must always come last in a module definition. All components except module, module name, and endmodule are optional and can be mixed and matched as per design needs. Verilog allows multiple modules to be defined in a single file. The modules can be defined in any order in the file. Ports Ports provide the interface by which a module can communicate with its environment. For example, the input/output pins of an IC chip are its ports. The environment can interact with the module only through its ports. The internals of the module are not visible to the environment. This provides a very powerful flexibility to the designer. The internals of the module can be changed without affecting the environment as long as the interface is not modified. Ports are also referred to as terminals.
Port Declaration

All ports in the list of ports must be declared in the module. Ports can be declared as follows: Verilog Keyword
input output inout

Type of Port Input port Output port Bidirectional port

Inputs Internally, input ports must always be of the type net. Externally, the inputs can be connected to a variable which is a reg or a net.

Outputs Internally, outputs ports can be of the type reg or net. Externally, outputs must always be connected to a net. They cannot be connected to a reg.

Inouts Internally, inout ports must always be of the type net. Externally, inout ports must always be connected to a net.



Width matching It is legal to connect internal and external items of different sizes when making inter-module port connections. However, a warning is typically issued that the widths do not match.

Unconnected ports Verilog allows ports to remain unconnected. For example, certain output ports might be simply for debugging, and you might not be interested in connecting them to the external signals. You can let a port remain unconnected by instantiating a module as shown below.

fulladd4 fa0(SUM, , A, B, C_IN); // Output port c_out is unconnected

Expressions, Operators, and Operands

Dataflow modeling describes the design in terms of expressions instead of primitive gates. Expressions, operators, and operands form the basis of dataflow modeling.

Expressions are constructs that combine operators and operands to produce a result.
// Examples of expressions. Combines operands and operators a^b addr1[20:17] + addr2[20:17] in1 | in2


Operands can be any one of the data types defined, Data Types. Some constructs will take only certain types of operands. Operands can be constants, integers, real numbers, nets, registers, times, bit-select (one bit of vector net or a vector register), part-select (selected bits of the vector net or register vector), and memories or function calls (functions are discussed later).
integer count, final_count; final_count = count + 1;//count is an integer operand real a, b, c; c = a - b; //a and b are real operands reg [15:0] reg1, reg2;


reg [3:0] reg_out; reg_out = reg1[3:0] ^ reg2[3:0];//reg1[3:0] and reg2[3:0] are //part-select register operands reg ret_value; ret_value = calculate_parity(A, B);//calculate_parity is a //function type operand


Operators act on the operands to produce desired results. Verilog provides various types of operators. Operator Types.
d1 && d2 // && is an operator on operands d1 and d2 !a[0] // ! is an operator on operand a[0] B >> 1 // >> is an operator on operands B and 1

Structured Procedures
There are two structured procedure statements in Verilog: always and initial. These statements are the two most basic statements in behavioral modeling. All other behavioral statements can appear only inside these structured procedure statements. Verilog is a concurrent programming language unlike the C programming language, which is sequential in nature. Activity flows in Verilog run in parallel rather than in sequence. Each always and initial statement represents a separate activity flow in Verilog. Each activity flow starts at simulation time 0. The statements always and initial cannot be nested. The fundamental difference between the two statements is explained in the following sections.
initial Statement

All statements inside an initial statement constitute an initial block. An initial block starts at time 0, executes exactly once during a simulation, and then does not execute again. If there are multiple initial blocks, each block starts to execute concurrently at time 0. Each block finishes execution independently of other blocks. Multiple behavioral statements must be grouped, typically using the keywords begin and end. If there is only one behavioral statement, grouping is not necessary. This is similar to the begin-end blocks in Pascal programming language or the { } grouping in the C programming language. Example 7-1 illustrates the use of the initial statement.
Example 7-1 initial Statement module stimulus; reg x,y, a,b, m; initial m = 1'b0; //single statement; does not need to be grouped


initial begin #5 a = 1'b1; //multiple statements; need to be grouped #25 b = 1'b0; end initial begin #10 x = 1'b0; #25 y = 1'b1; end initial #50 $finish; endmodule

In the above example, the three initial statements start to execute in parallel at time 0. If a delay #<delay> is seen before a statement, the statement is executed <delay> time units after the current simulation time. Thus, the execution sequence of the statements inside the initial blocks will be as follows.
time 0 5 10 30 35 50 statement executed m = 1'b0; a = 1'b1; x = 1'b0; b = 1'b0; y = 1'b1; $finish;

The initial blocks are typically used for initialization, monitoring, waveforms and other processes that must be executed only once during the entire simulation run. The following subsections discussion how to initialize values using alternate shorthand syntax. The use of such shorthand syntax has the same effect as an initial block combined with a variable declaration.
Combined Variable Declaration and Initialization

Variables can be initialized when they are declared. Example 7-2 shows such a declaration.
Example 7-2 Initial Value Assignment //The clock variable is defined first reg clock; //The value of clock is set to 0 initial clock = 0; //Instead of the above method, clock variable //can be initialized at the time of declaration //This is allowed only for variables declared //at module level. reg clock = 0;


Combined Port/Data Declaration and Initialization

The combined port/data declaration can also be combined with an initialization shows such a declaration.
Example 7-3 Combined Port/Data Declaration and Variable Initialization module adder (sum, co, a, b, ci); output reg [7:0] sum = 0; //Initialize 8 bit output sum output reg co = 0; //Initialize 1 bit output co input [7:0] a, b; input ci; --endmodule Combined ANSI C Style Port Declaration and Initialization

ANSI C style port declaration can also be combined with an initialization.

Example 7-4 Combined ANSI C Port Declaration and Variable Initialization module adder (output reg [7:0] sum = 0, //Initialize 8 bit output output reg co = 0, //Initialize 1 bit output co input [7:0] a, b, input ci ); --endmodule

always Statement

All behavioral statements inside an always statement constitute an always block. The always statement starts at time 0 and executes the statements in the always block continuously in a looping fashion. This statement is used to model a block of activity that is repeated continuously in a digital circuit. An example is a clock generator module that toggles the clock signal every half cycle. In real circuits, the clock generator is active from time 0 to as long as the circuit is powered on. illustrates one method to model a clock generator in Verilog.
Example 7-5 always Statement module clock_gen (output reg clock); //Initialize clock at time zero initial clock = 1'b0; //Toggle clock every half-cycle (time period = 20)


always #10 clock = ~clock; initial #1000 $finish; endmodule

In, the always statement starts at time 0 and executes the statement clock = ~clock every 10 time units. Notice that the initialization of clock has to be done inside a separate initial statement. If we put the initialization of clock inside the always block, clock will be initialized every time the always is entered. Also, the simulation must be halted inside an initial statement. If there is no $stop or $finish statement to halt the simulation, the clock generator will run forever.

Timing and Delays

Functional verification of hardware is used to verify functionality of the designed circuit. However, blocks in real hardware have delays associated with the logic elements and paths in them. Therefore, we must also check whether the circuit meets the timing requirements, given the delay specifications for the blocks. Checking timing requirements has become increasingly important as circuits have become smaller and faster. One of the ways to check timing is to do a timing simulation that accounts for the delays associated with the block during the simulation. Techniques other than timing simulation to verify timing have also emerged in design automation industry. The most popular technique is static timing verification. Designers first do a pure functional verification and then verify timing separately with a static timing verification tool. The main advantage of static verification is that it can verify timing in orders of magnitude more quickly than timing simulation.

Delay Back-Annotation
is an important and vast topic in timing simulation. An entire book could be devoted to that subject. However, in this section, we introduce the designer to the concept of back-annotation of delays in a simulation. Detailed coverage of this topic is outside the scope of this book. For details, refer to the IEEE Standard Verilog Hardware Description Language document.
Delay back-annotation

The various steps in the flow that use delay back-annotation are as follows: 1. The designer writes the RTL description and then performs functional simulation. 2. The RTL description is converted to a gate-level netlist by a logic synthesis tool. 3. The designer obtains pre-layout estimates of delays in the chip by using a delay calculator and information about the IC fabrication process. Then, the designer does timing simulation or static timing verification of the gate-level netlist, using these preliminary values to check that the gate-level netlist meets timing constraints.


4. The gate-level netlist is then converted to layout by a place and route tool. The postlayout delay values are computed from the resistance (R) and capacitance (C) information in the layout. The R and C information is extracted from factors such as geometry and IC fabrication process. 5. The post-layout delay values are back-annotated to modify the delay estimates for the gate-level netlist. Timing simulation or static timing verification is run again on the gate-level netlist to check if timing constraints are still satisfied. 6. If design changes are required to meet the timing constraints, the designer has to go back to the RTL level, optimize the design for timing, and then repeat Step 2 through Step 5.

User-Defined Primitives
Verilog provides a standard set of primitives, such as and, nand, or, nor, and not, as a part of the language. These are also commonly known as built-in primitives. However, designers occasionally like to use their own custom-built primitives when developing a design. Verilog provides the ability to define User-Defined Primitives (UDP). These primitives are selfcontained and do not instantiate other modules or primitives. UDPs are instantiated exactly like gate-level primitives. There are two types of UDPs: combinational and sequential.

Combinational UDPs are defined where the output is solely determined by a logical combination of the inputs. A good example is a 4-to-1 multiplexer. Sequential UDPs take the value of the current inputs and the current output to determine the value of the next output. The value of the output is also the internal state of the UDP. Good examples of sequential UDPs are latches and flipflops.

Learning Objectives

Sequential UDPs
Sequential UDPs differ from combinational UDPs in their definition and behavior. Sequential UDPs have the following differences:

The output of a sequential UDP is always declared as a reg. An initial statement can be used to initialize output of sequential UDPs. The format of a state table entry is slightly different.
<input1> <input2> ..... <inputN> : <current_state> : <next_state>;

There are three sections in a state table entry: inputs, current state, and next state. The three sections are separated by a colon (:) symbol. The input specification of state table entries can be in terms of input levels or edge transitions. The current state is the current value of the output register.



The next state is computed based on inputs and the current state. The next state becomes the new value of the output register. All possible combinations of inputs must be specified to avoid unknown output values.

If a sequential UDP is sensitive to input levels, it is called a level-sensitive sequential UDP. If a sequential UDP is sensitive to edge transitions on inputs, it is called an edge-sensitive sequential UDP.

Logic Synthesis with Verilog HDL

Advances in logic synthesis have pushed HDLs into the forefront of digital design technology. Logic synthesis tools have cut design cycle times significantly. Designers can design at a high level of abstraction and thus reduce design time. In this chapter, we discuss logic synthesis with Verilog HDL. Synopsys synthesis products were used for the examples in this chapter, and results for individual examples may vary with synthesis tools. However, the concepts discussed in this chapter are general enough to be applied to any logic synthesis tool.[1] This chapter is intended to give the reader a basic understanding of the mechanics and issues involved in logic synthesis. It is not intended to be comprehensive material on logic synthesis. Detailed knowledge of logic synthesis can be obtained from reference manuals, logic synthesis books, and by attending training classes.

What Is Logic Synthesis?

Simply speaking, logic synthesis is the process of converting a high-level description of the design into an optimized gate-level representation, given a standard cell library and certain design constraints. A standard cell library can have simple cells, such as basic logic gates like and, or, and nor, or macro cells, such as adders, muxes, and special flip-flops. A standard cell library is also known as the technology library. It is discussed in detail later in this chapter. Logic synthesis always existed even in the days of schematic gate-level design, but it was always done inside the designer's mind. The designer would first understand the architectural description. Then he would consider design constraints such as timing, area, testability, and power. The designer would partition the design into high-level blocks, draw them on a piece of paper or a computer terminal, and describe the functionality of the circuit. This was the high-level description. Finally, each block would be implemented on a hand-drawn schematic, using the cells available in the standard cell library. The last step was the most complex process in the design flow and required several time-consuming design iterations before an optimized gate-level representation that met all design constraints was obtained.

Impact of Logic Synthesis

Logic synthesis has revolutionized the digital design industry by significantly improving productivity and by reducing design cycle time. Before the days of automated logic synthesis, when designs were converted to gates manually, the design process had the following limitations:


For large designs, manual conversion was prone to human error. A small gate missed somewhere could mean redesign of entire blocks. The designer could never be sure that the design constraints were going to be met until the gate-level implementation was completed and tested. A significant portion of the design cycle was dominated by the time taken to convert a high-level design into gates. If the gate-level design did not meet requirements, the turnaround time for redesign of blocks was very high. What-if scenarios were hard to verify. For example, the designer designed a block in gates that could run at a cycle time of 20 ns. If the designer wanted to find out whether the circuit could be optimized to run faster at 15 ns, the entire block had to be redesigned. Thus, redesign was needed to verify what-if scenarios. Each designer would implement design blocks differently. There was little consistency in design styles. For large designs, this could mean that smaller blocks were optimized, but the overall design was not optimal. If a bug was found in the final, gate-level design, this would sometimes require redesign of thousands of gates. Timing, area, and power dissipation in library cells are fabrication-technology specific. Thus if the company changed the IC fabrication vendor after the gate-level design was complete, this would mean redesign of the entire circuit and a possible change in design methodology. Design reuse was not possible. Designs were technology-specific, hard to port, and very difficult to reuse.

Automated logic synthesis tools addressed these problems as follows:

High-level design is less prone to human error because designs are described at a higher level of abstraction. High-level design is done without significant concern about design constraints. Logic synthesis will convert a high-level design to a gate-level netlist and ensure that all constraints have been met. If not, the designer goes back, modifies the high-level design and repeats the process until a gate-level netlist that satisfies timing, area, and power constraints is obtained. Conversion from high-level design to gates is fast. With this improvement, design cycle times are shortened considerably. What took months before can now be done in hours or days. Turnaround time for redesign of blocks is shorter because changes are required only at the register-transfer level; then, the design is simply resynthesized to obtain the gate-level netlist. What-if scenarios are easy to verify. The high-level description does not change. The designer has merely to change the timing constraint from 20 ns to 15 ns and resynthesize the design to get the new gate-level netlist that is optimized to achieve a cycle time of 15 ns. Logic synthesis tools optimize the design as a whole. This removes the problem with varied designer styles for the different blocks in the design and suboptimal designs.



If a bug is found in the gate-level design, the designer goes back and changes the high-level description to eliminate the bug. Then, the high-level description is again read into the logic synthesis tool to automatically generate a new gate-level description. Logic synthesis tools allow technology-independent design. A high-level description may be written without the IC fabrication technology in mind. Logic synthesis tools convert the design to gates, using cells in the standard cell library provided by an IC fabrication vendor. If the technology changes or the IC fabrication vendor changes, designers simply use logic synthesis to retarget the design to gates, using the standard cell library for the new technology. Design reuse is possible for technology-independent descriptions. For example, if the functionality of the I/O block in a microprocessor does not change, the RTL description of the I/O block can be reused in the design of derivative microprocessors. If the technology changes, the synthesis tool simply maps to the desired technology.


This tutorial shows how to create, implement, simulate, and synthesize VHDL designs for implementation in FPGA chips using Xilinx ISE 9.2i and ModelSim: Xilinx Edition III v6.2g.


Launch Xilinx ISE from either the shortcut on your desktop or from your start menu under Programs Xilinx ISE 9.2i Project Navigator. Start a new project by clicking File New Project.



M,Dnbvk.GSJckgASK 3. In the resulting window, verify the Top-Level Source Type is VHDL. Change the Project Location to a suitable directory and give it what ever name you choose, e.g. lab3.


The next window shows the details of the project and the target chip. We will be synthesizing designs into real chips so it important to match the target chip with the particular board/chip you will be using. Beginning labs will be done in a Spartan 2E XC2S200E chip that comes in a PQ208 package with a speed grade of 6 as shown below.


M,Dnbvk.GSJckgASK 5. 6. Since we are starting a new design the next couple of pop-up windows arent relevant, just click Next and Next and Finish . You should now be in the main Project Navigator window. Select Project New Source from the menu.


In the resulting pop-up window specify a VHDL Module source and give the file a name. I tend to just use the same name as the project itself, e.g. Lab 3. Click Next.

The next pop-up window allows you to specify your inputs and outputs through the Wizard if you so desire. In this tutorial we will build a 2 x 1 multiplexer so we can specify the inputs and outputs as shown below. Here, the default entity and architecture names have also been changed. Once all inputs and outputs are entered click Next and click Finish. vnxcvncxn




The project will usually open with the design summary tab active in the right hand side of your window. We want to go to the VHDL code so you need to click the *.vhd tab for your design.


You can see that the Wizard has used STD_LOGIC as the default type for your signals and also filled in the basic entity and architecture details for you.




Now you can fill in the rest of your code for your design. In this case, we can do the multiplexer as shown below. Make sure to frequently save your code.


Once the code is entered we can proceed with a simulation of the design or we can synthesize the code for implementation and download onto an FPGA. Let us proceed with the simulation first. In the upper left-hand side of the ISE environment there is a Sources subwindow which has a drop down box as shown below. Note that the drop down box currently shows Synthesis/Implementation. Change this to Behavioral Simulation.



13. Highlight your *.vhd file in the Sources subwindow and then expand the ModelSim Simulator selection in the Processes subwindow as shown below. Click on Simulate Behavioral Model to launch the ModelSim simulator.


ModelSim should successfully launch and will open several subwindows by default. For now we just need the Wave and Transcript subwindows, so close the other subwindows and you should see the following:




To conduct the simulation you basically only need to know two commands, force and run. Force is used to set the value of any input variable. Then Run the simulation for a specific amount of time. To use a Force command, in the transcript window simply type Force, space, logic variable you wish to set, space, the value you wish to assign (0 or 1). For example, force b 0. Then type run 1000 to run the simulation for 1000 picoseconds. Alternatively, you could type run 1 ns. Change the various inputs one or more at a time and do additional run steps.


If you would like to clean things up or begin the simulation over, type restart at the transcript prompt and then click Restart in the resulting pop-up window. You can print the waveforms (Timing Diagram) to any printer or you can do a Shift + Print Screen to copy the waves to the Windows Clipboard so that you can paste and edit the image in a


M,Dnbvk.GSJckgASK graphics program if you desire. 17. It is convenient to automatically toggle your inputs high and low when testing your circuits. Use can use the force command to do this as well, for example force a 0 1 ns, 1 2 ns -r 2 ns says to assign a value of 0 to input a from the current moment to 1 ns from now (force a 0 1 ns). Then from 1 ns to 2 ns from the current moment input a will have a value of 1 (, 1 2 ns). Finally, this pattern will repeat itself every 2 ns (-r 2 ns). For this tutorial circuit, for instance, then you could establish repeating waveforms of different periods that will automatically cycle through every combination of input values. One possibility would be to type force i0 0 1 ns, 1 2 ns -r 2 ns force i1 0 2 ns, 1 4 ns -r 4 ns force a 0 4 ns, 1 8 ns -r 8 ns run 48 ns which would result in the timing diagram shown below.









SYNTHESIS REPORT: Release 9.2i - xst J.36 Copyright (c) 1995-2007 Xilinx, Inc. All rights reserved. --> Parameter TMPDIR set to ./xst/projnav.tmp CPU : 0.00 / 0.45 s | Elapsed : 0.00 / 1.00 s

--> Parameter xsthdpdir set to ./xst CPU : 0.00 / 0.45 s | Elapsed : 0.00 / 1.00 s

--> Reading design: ultimate_meca.prj

TABLE OF CONTENTS 1) Synthesis Options Summary 2) HDL Compilation 3) Design Hierarchy Analysis 4) HDL Analysis 5) HDL Synthesis 5.1) HDL Synthesis Report 6) Advanced HDL Synthesis vnxcvncxn

M,Dnbvk.GSJckgASK 6.1) Advanced HDL Synthesis Report 7) Low Level Synthesis 8) Partition Report 9) Final Report 9.1) Device utilization summary 9.2) Partition Resource Summary 9.3) TIMING REPORT

======================================================================== = * Synthesis Options Summary *

======================================================================== = ---- Source Parameters Input File Name Input Format : "ultimate_meca.prj" : mixed

Ignore Synthesis Constraint File : NO

---- Target Parameters Output File Name Output Format Target Device : "ultimate_meca" : NGC : xc3s500e-4-fg320

---- Source Options Top Module Name Automatic FSM Extraction vnxcvncxn : ultimate_meca : YES

M,Dnbvk.GSJckgASK FSM Encoding Algorithm Safe Implementation FSM Style RAM Extraction RAM Style ROM Extraction Mux Style Decoder Extraction Priority Encoder Extraction Shift Register Extraction Logical Shifter Extraction XOR Collapsing ROM Style Mux Extraction Resource Sharing : Auto : No : lut : Yes : Auto : Yes : Auto : YES : YES : YES : YES : YES : Auto : YES : YES : NO

Asynchronous To Synchronous Multiplier Style : auto

Automatic Register Balancing

: No

---- Target Options Add IO Buffers Global Maximum Fanout : YES : 500 : 24

Add Generic Clock Buffer(BUFG) Register Duplication Slice Packing : YES : YES

Optimize Instantiated Primitives : NO vnxcvncxn

M,Dnbvk.GSJckgASK Use Clock Enable Use Synchronous Set Use Synchronous Reset Pack IO Registers into IOBs Equivalent register Removal : Yes : Yes : Yes : auto : YES

---- General Options Optimization Goal Optimization Effort Library Search Order Keep Hierarchy RTL Output Global Optimization Read Cores Write Timing Constraints Cross Clock Analysis Hierarchy Separator Bus Delimiter Case Specifier Slice Utilization Ratio BRAM Utilization Ratio Verilog 2001 Auto BRAM Packing Slice Utilization Ratio Delta : Speed :1 : ultimate_meca.lso : NO : Yes : AllClockNets : YES : NO : NO :/ : <> : maintain : 100 : 100 : YES : NO :5

======================================================================== = vnxcvncxn


======================================================================== = * HDL Compilation *

======================================================================== = Compiling verilog file "../../../My_MECA/My_MECA/full_adder.v" in library work Compiling verilog file "../../../My_MECA/My_MECA/modulo.v" in library work Module <full_adder> compiled Compiling verilog file "../../../My_MECA/My_MECA/compressor3_2.v" in library work Module <modulo> compiled Compiling verilog file "../../../My_MECA/My_MECA/abs_diff.v" in library work Module <compressor3_2> compiled Compiling verilog file "../../../My_MECA/My_MECA/selector.v" in library work Module <abs_diff> compiled Compiling verilog file "../../../My_MECA/My_MECA/processing_element.v" in library work Module <selector> compiled Compiling verilog file "../../../My_MECA/My_MECA/corrector.v" in library work Module <processing_element> compiled Compiling verilog file "../../../My_MECA/My_MECA/coder_A_n_B.v" in library work Module <corrector> compiled Compiling verilog file "ultimate_meca.v" in library work Module <coder_A_n_B> compiled Module <ultimate_meca> compiled No errors in compilation Analysis of file <"ultimate_meca.prj"> succeeded. vnxcvncxn


======================================================================== = * Design Hierarchy Analysis *

======================================================================== = Analyzing hierarchy for module <ultimate_meca> in library <work> with parameters. s0 = "00000" s1 = "00001" s10 = "01010" s11 = "01011" s12 = "01100" s13 = "01101" s14 = "01110" s15 = "01111" s16 = "10000" s2 = "00010" s3 = "00011" s4 = "00100" s5 = "00101" s6 = "00110" s7 = "00111" s8 = "01000" s9 = "01001"

Analyzing hierarchy for module <processing_element> in library <work>. vnxcvncxn


Analyzing hierarchy for module <selector> in library <work>.

Analyzing hierarchy for module <corrector> in library <work>.

Analyzing hierarchy for module <coder_A_n_B> in library <work> with parameters.

Analyzing hierarchy for module <abs_diff> in library <work>.

Analyzing hierarchy for module <modulo> in library <work>.

Analyzing hierarchy for module <compressor3_2> in library <work>.

Analyzing hierarchy for module <full_adder> in library <work>. ======================================================================== = * HDL Synthesis *

======================================================================== =

Performing bidirectional port resolution...

Synthesizing Unit <selector>. Related source file is "../../../My_MECA/My_MECA/selector.v". Found 12-bit register for signal <select_out>. Found 12-bit register for signal <error_free>. Summary: vnxcvncxn

M,Dnbvk.GSJckgASK inferred 24 D-type flip-flop(s). Unit <selector> synthesized. ======================================================================== HDL Synthesis Report

Macro Statistics # Adders/Subtractors 10-bit adder 10-bit adder carry out 11-bit adder 11-bit adder carry out 12-bit adder 24-bit subtractor 32-bit adder 4-bit adder carry out 8-bit adder 8-bit adder carry out 8-bit subtractor 9-bit adder carry out # Registers 1-bit register 12-bit register 3-bit register 32-bit register 8-bit register # Latches vnxcvncxn :6 :2 : 50 : 120 : 18 :4 : 18 :2 : 17 :2 : 68 : 18 : 13 :1 :1 : 35 : 836 :2 :2 : 243

M,Dnbvk.GSJckgASK 12-bit latch 32-bit latch 8-bit latch # Comparators 12-bit comparator greatequal 33-bit comparator lessequal 8-bit comparator greatequal 8-bit comparator less # Tristates 1-bit tristate buffer 12-bit tristate buffer # Xors 1-bit xor3 :3 :2 :1 : 1792 : 1792 :1 :3 : 17 : 816 : 47 : 10 : 18 : 18

======================================================================== = Advanced HDL Synthesis Report

Macro Statistics # FSMs # Adders/Subtractors 10-bit adder 10-bit adder carry out 11-bit adder 11-bit adder carry out 12-bit adder 13-bit subtractor vnxcvncxn :6 :2 : 50 : 10 :2 :2 :1 : 243

M,Dnbvk.GSJckgASK 14-bit subtractor 15-bit subtractor 24-bit subtractor 32-bit adder 4-bit adder carry out 8-bit adder 8-bit adder carry out 8-bit subtractor 9-bit adder carry out # Registers Flip-Flops # Latches 12-bit latch 32-bit latch 8-bit latch # Comparators 12-bit comparator greatequal 33-bit comparator lessequal 8-bit comparator greatequal 8-bit comparator less # Xors 1-bit xor3 :1 : 1792 : 1792 : 10 : 10 : 90 : 18 :4 : 18 :2 : 17 :2 : 456 : 456 : 836 :3 : 17 : 816 : 47 : 10 : 18 : 18

======================================================================== Final Register Report

Macro Statistics vnxcvncxn

M,Dnbvk.GSJckgASK # Registers Flip-Flops : 308 : 308

======================================================================== =

======================================================================== = * Partition Report *

======================================================================== =

Partition Implementation Status -------------------------------

No Partitions were found in this design.


======================================================================== = * Final Report *

======================================================================== = Final Results RTL Top Level Output File Name Top Level Output File Name Output Format vnxcvncxn : ultimate_meca.ngr : ultimate_meca


M,Dnbvk.GSJckgASK Optimization Goal Keep Hierarchy : Speed : NO

Design Statistics # IOs : 19

Cell Usage : # BELS # # # # # # # # # # # # # # # GND INV LUT1 LUT2 LUT2_L LUT3 LUT3_D LUT3_L LUT4 LUT4_D LUT4_L MUXCY MUXF5 VCC XORCY : 7206 :1 : 48 : 546 : 360 :6 : 2719 :7 : 14 : 1240 :9 : 42 : 1292 : 39 :1 : 882 : 3040 : 299 :1

# FlipFlops/Latches # # FDE FDR


M,Dnbvk.GSJckgASK # # # # # FDRE LD LDC LDE LDE_1 :8 : 12 : 544 : 2040 : 136 : 18 : 17 :1 : 18 : 17 :1

# Clock Buffers # # BUFG BUFGP

# IO Buffers # # IBUF OBUF

======================================================================== =

Device utilization summary: ---------------------------

Selected Device : 3s500efg320-4

Number of Slices: Number of Slice Flip Flops: Number of 4 input LUTs: Number of IOs: Number of bonded IOBs: Number of GCLKs:

2754 out of 4656

59% 32% 53%

3040 out of 9312 4991 out of 9312 19 19 out of 18 out of 232 24

8% 75%


M,Dnbvk.GSJckgASK Partition Resource Summary: ---------------------------

No Partitions were found in this design.


======================================================================== = TIMING REPORT


Minimum period: 12.129ns (Maximum Frequency: 82.447MHz) Minimum input arrival time before clock: 8.893ns Maximum output required time after clock: 6.881ns Maximum combinational path delay: No path found ======================================================================== =

Device utilization summary: vnxcvncxn

M,Dnbvk.GSJckgASK ---------------------------

Selected Device : 3s500efg320-4

Number of Slices: Number of Slice Flip Flops: Number of 4 input LUTs: Number of IOs: Number of bonded IOBs: Number of GCLKs:

2754 out of 4656

59% 32% 53%

3040 out of 9312 4991 out of 9312 19 19 out of 18 out of 232 24

8% 75%

--------------------------Partition Resource Summary: ---------------------------

No Partitions were found in this design. CPU : 63.53 / 64.06 s | Elapsed : 63.00 / 64.00 s


Total memory usage is 253500 kilobytes

Number of errors : 0 ( 0 filtered) Number of warnings : 882 ( 0 filtered) Number of infos : 15 ( 0 filtered)


M,Dnbvk.GSJckgASK ---------------------------


This project proposes BISDC architecture for self-detection and self-correction of errors of PEs in an MECA. Based on the error detection correction concepts of bi residue codes, this paper presents the corresponding definitions used in designing the BISD and BISC circuits to achieve self-detection and self-correction operations. Performance evaluation reveals that the proposed BISDC architecture effectively achieves self-detection and self-correction capabilities with minimal area overhead and a small timing penalty.


Z. L. He, C. Y. Tsui, K. K. Chan, and M. L. Liou, Low-power VLSI design for motion estimation using adaptive pixel truncation, IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 5, pp. 669678, Aug. 2000. [2] C. G. Peng, D. S. Yu, X. X. Cao, and S. M. Sheng, Efficient VLSI design and implementation of integer motion estimation for H.264 SDTV encoder, in Proc. IEEE Int. Conf. Solid-State Integr. Circuits, 2006, pp. 20192021. [3] P. Gallagher, V. Chickermane, S. Gregor, and T. S. Pierre, A building vnxcvncxn

M,Dnbvk.GSJckgASK block BIST methodology for SOC designs: A case study, in Proc. Int. Test Conf., Oct. 2001, pp. 111120. [4] T. H. Wu, Y. L. Tsai, and S. J. Chang, An efficient design-for-testability scheme for motion estimation in H.264/AVC, in Proc. Int. Symp. VLSI Des. Autom. Test, Apr. 2007, pp. 2527. [5] J. C. Yeh, K. L. Cheng, Y. F. Chou, and C. W. Wu, Flash memory testing and built-in self-diagnosis with march-like test algorithms, IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 6, pp. 11011113, Jun. 2007. [6] X. Xiong, Y. L. Wu, and W. B. Jone, Reliability analysis of self-repairable MEMS accelerometer, in Proc. IEEE Int. Symp., Defect Fault Tolerance VLSI Syst., Oct. 2006, pp. 236244. [7] R. J. Higgs and J. F. Humphreys, Two-error-location for quadratic residue codes, Proc. Inst. Electr. Eng. Commun., vol. 149, no. 3, pp. 129131, Jun. 2002. [8] J. C. Tuan, T. S. Chang, and C. W. Jen, On the data reuse and memory bandwidth analysis for full-search block-matching VLSI architecture, IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 1, pp. 6172, Jan. 2002.