In recent years, with the growing use of electronic mail, electronic funds transfer, I-phone, etc., the security of transmitting and storing information has become more and more important. Although software encryption is becoming more prevalent today, specialized hardware such as an encryption chip is still the embodiment of choice for many applications because of factors that include higher speed, physical security, and lower power consumption.
1.2 Literature Survey
The most widely accepted private-key block cipher, the Data Encryption Standard (DES), introduced in 1977, is vulnerable to many attacks, such as brute-force attack, differential cryptanalysis, and linear cryptanalysis, and it runs inefficiently on general-purpose processors. The cryptography community therefore needs to provide the world with a new encryption standard. B. Schneier presented a new variable-length key, 64-bit block cipher (Blowfish) in 1993. It is faster and safer than DES. D. Honig introduced a hardware implementation of Blowfish in 1997, but the last two XOR steps, designed to resist differential cryptanalysis [3], were not included in his design.
1.3 Aim of the Thesis
In this thesis, we present designs of Blowfish Encryption (BE), which are microprocessor peripheral devices designed to encrypt and decrypt 64-bit blocks of data based on the Blowfish algorithm. The algorithm operates on a 64-bit plaintext block using a 32-bit user-specified key to produce 64-bit ciphertext. We use VHDL to describe the chips, Xilinx tools for synthesis, and ModelSim XE-III 6.3g for timing simulation. We then implement the chips with FPGAs (Spartan 3S500E).
1.4 Statement of Problem
Our Blowfish Encryption design has a higher speed than a similar implementation of DES and a lower area consumption than the implementation of Blowfish designed by D. Honig. The Blowfish Encryption design might be useful for IA and is well suited for use as a coprocessor with a 32-bit microprocessor.
1.5 Organization
The thesis is organized as follows. Chapter 2 describes the Blowfish algorithm. Chapter 3 analyzes the main components of Blowfish encryption. Chapter 4 introduces the designs of the round circuits, interface, and control logic of the chips. Chapter 5 presents the timing simulations of the Blowfish Encryption Algorithm together with a discussion. Finally, a general conclusion and the future scope of the thesis are presented in Chapter 6.
2.1 VLSI DESIGN FLOW
2.1.1 INTRODUCTION
The word digital has made a dramatic impact on our society. More significant is a continuous trend towards digital solutions in all areas, from electronic instrumentation, control, data manipulation, signal processing, telecommunications, etc., to consumer electronics. Development of such solutions has been possible due to good digital system design and modeling techniques.
2.2 CONVENTIONAL APPROACH TO DIGITAL DESIGN
Digital ICs of SSI and MSI types have become universally standardized and have been accepted for use. Whenever a designer has to realize a digital function, he uses a standard set of ICs along with a minimal set of additional discrete circuitry. Consider a simple example of realizing a function as
$Q_{n+1} = Q_n + (A \cdot B)$
Here $Q_n$, $A$, and $B$ are Boolean variables, with $Q_n$ being the value of $Q$ at the nth time step. Here $A \cdot B$ signifies the logical AND of A and B; the '+' symbol signifies the logical OR of the logic variables on either side. A circuit to realize the function is shown in Figure 1.1. The circuit can be realized in terms of two ICs: an A-O-I gate and a flip-flop. It can be directly wired up, tested, and used.
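The behavior of the example function above can be checked in software before any wiring is done. The following is a minimal sketch, assuming one call per time step; the names `next_q` and `run` are illustrative and do not come from the thesis.

```python
def next_q(q, a, b):
    """One time step of Q(n+1) = Q(n) OR (A AND B)."""
    return bool(q) or (bool(a) and bool(b))


def run(inputs, q0=False):
    """Apply a sequence of (A, B) pairs and return the value of Q after each step.

    q0 is the assumed initial flip-flop state.
    """
    q = bool(q0)
    trace = []
    for a, b in inputs:
        q = next_q(q, a, b)
        trace.append(q)
    return trace


if __name__ == "__main__":
    # Q stays low until A and B are high together, then latches high.
    print(run([(False, True), (True, True), (False, False)]))
```

Because the OR term feeds Q back into itself, the function behaves as a latch: once Q goes high it stays high for every later input.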
With comparatively larger circuits, the task mostly reduces to one of identifying the set of ICs necessary for the job and interconnecting them; rarely does one have to resort to a micro-level design. The accepted approach to digital design here is a mix of the top-down and bottom-up approaches, as follows:
• Decide the requirements at the system level and translate them to circuit requirements.
• Identify the major functional blocks required, like timer, DMA unit, register file, etc., say as in the design of a processor.
• Whenever a function can be realized using a standard IC, use the same, for example a programmable counter, mux, demux, etc.
• Whenever the above is not possible, form the circuit to carry out the block functions using standard SSI, for example gates, flip-flops, etc.
• Use additional components like transistor, diode, resistor, capacitor, etc., wherever essential.
Once the above steps are gone through, a paper design is ready. Starting with the paper design, one has to do a circuit layout. The physical location of all the components is tentatively decided; they are interconnected and the 'circuit-on-paper' is made ready. Once a paper design is done, a layout is carried out and a net-list prepared. Based on this, the PCB is fabricated and populated, and all the populated cards are tested and debugged. The procedure is shown as a process flowchart in Figure 1.2. At the debugging stage one may encounter three types of problems:
• Functional mismatch: The realized and expected functions are different. One may have to go through the relevant functional block carefully and locate any error logically. Finally the necessary correction has to be carried out in hardware.
• Timing mismatch: The problem can manifest in different forms. One possibility is due to the signal going through different propagation delays in two paths and arriving at a point with a timing mismatch. This can cause faulty operation. Another possibility is a race condition in a circuit involving asynchronous feedback. This kind of problem may call for elaborate debugging. The preferred practice is to do debugging at smaller module stages and to ensure that feedback through larger loops is avoided; it becomes essential to check for the existence of long asynchronous loops.
• Overload: Some signals may be overloaded to such an extent that the signal transition may be unduly delayed or even suppressed. The problem manifests as reflections and erratic behavior in some cases (the signal has to be suitably buffered here). In fact, overload on a signal can lead to timing mismatches.
The above have to be carried out after completion of the prototype PCB manufacturing; it involves cost, time, and also a redesigning process to develop a bug-free design.
2.3 VLSI DESIGN
The complexity of the VLSI circuits being designed and used today makes the manual approach to design impractical. Design automation is the order of the day. With the rapid technological developments in the last two decades, the status of VLSI technology is characterized by the following:
• A steady increase in the size and hence the functionality of the ICs.
• A steady reduction in feature size and hence increase in the speed of operation as well as gate or transistor density.
• A steady improvement in the predictability of circuit behavior.
• A steady increase in the variety and size of software tools for VLSI design.
The above developments have resulted in a proliferation of approaches to VLSI design. We briefly describe the procedure of automated design flow; the aim is more to bring out the role of a Hardware Description Language (HDL) in the design process. An abstraction-based model is the basis of the automated design.
2.3.1 Abstraction Model
The model divides the whole design cycle into various domains. With such an abstraction through a division process, the design is carried out in different layers. The designer at one layer can function without bothering about the layers above or below. The thick horizontal lines separating the layers in the figure signify the compartmentalization. As an example, let us consider design at the gate level. The circuit to be designed would be described in terms of truth tables and state tables. With these as available inputs, he has to express them as Boolean logic equations and realize them in terms of gates and flip-flops. In turn, these form the inputs to the layer immediately below. Compartmentalization of the approach to design in the manner described here is the essence of abstraction; it is the basis for development and use of CAD tools in VLSI design at various levels.
The design methods at different levels use the respective aids such as Boolean equations, truth tables, state transition tables, etc. But the aids play only a small role in the process. To complete a design, one may have to switch from one tool to another, raising the issues of tool compatibility and learning new environments.
2.4 ASIC DESIGN FLOW
As with any other technical activity, development of an ASIC starts with an idea and takes tangible shape through the stages of development as shown in Figure 1.4 and shown in detail in Figure 1.5. The first step in the process is to expand the idea in terms of behavior of the target circuit. Through stages of programming, the same is fully developed into a design description, in terms of well-defined standard constructs and conventions.
The design is tested through a simulation process; it is to check, verify, and ensure that what is wanted is what is described. Simulation is carried out through dedicated tools. With every simulation run, the simulation results are studied to identify errors in the design description. The errors are corrected and another simulation run carried out. Simulation and changes to the design description together form a cyclic iterative process, repeated until an error-free design is evolved.
Design description is an activity independent of the target technology or manufacturer. It results in a description of the digital circuit. To translate it into a tangible circuit, one goes through the physical design process. The same constitutes a set of activities closely linked to the manufacturer and the target technology.
2.4.1 Design Description
The design is carried out in stages. The process of transforming the idea into a detailed circuit description in terms of the elementary circuit components constitutes design description. The final circuit of such an IC can have up to a billion such components; it is arrived at in a step-by-step manner. The first step in evolving the design description is to describe the circuit in terms of its behavior. The description looks like a program in a high-level language like C. Once the behavioral level design description is ready, it is tested extensively with the help of a simulation tool; it checks and confirms that all the expected functions are carried out satisfactorily. If necessary, this behavioral level routine is edited, modified, and rerun, all done manually. Finally, one has a design for the expected system, described at the behavioral level. The behavioral design forms the input to the synthesis tools, for circuit synthesis. The behavioral constructs not supported by the synthesis tools are replaced by data flow and gate level constructs. To summarize, the designer has to develop synthesizable code for his design.
The design at the behavioral level is to be elaborated in terms of known and acknowledged functional blocks. It forms the next detailed level of design description. Once again the design is to be tested through simulation and iteratively corrected for errors. The elaboration can be continued one or two steps further. It leads to a detailed design description in terms of logic gates and transistor switches.
2.4.2 Optimization
The circuit at the gate level, in terms of the gates and flip-flops, can be redundant in nature. The same can be minimized with the help of minimization tools. The step is not shown separately in the figure. The minimized logical design is converted to a circuit in terms of the switch level cells from standard libraries provided by the foundries. The cell-based design generated by the tool is the last step in the logical design process; it forms the input to the first level of physical design.
2.4.3 Simulation
The design descriptions are tested for their functionality at every level: behavioral, data flow, and gate. One has to check here whether all the functions are carried out as expected and rectify them. All such activities are carried out by the simulation tool. The tool also has an editor to carry out any corrections to the source code. Simulation involves testing the design for all its functions, functional sequences, timing constraints, and specifications. Normally testing and simulation at all the levels, behavioral to switch level, are carried out by a single tool.
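As a small illustration of the optimization and simulation steps described in this section, the sketch below compares a redundant gate-level expression with a minimized form over every input combination, which is what an exhaustive functional simulation of a small combinational block amounts to. The two expressions and all function names are hypothetical examples chosen for illustration; they are not part of the Blowfish design.

```python
from itertools import product


def redundant(a, b, c):
    """Redundant gate-level form: AB + ABC + AB'C."""
    return (a and b) or (a and b and c) or (a and (not b) and c)


def minimized(a, b, c):
    """Minimized form of the same function: A(B + C)."""
    return a and (b or c)


def equivalent(f, g, n_inputs):
    """Exhaustively compare two Boolean functions over all 2**n_inputs vectors."""
    return all(
        bool(f(*vector)) == bool(g(*vector))
        for vector in product([False, True], repeat=n_inputs)
    )


if __name__ == "__main__":
    print(equivalent(redundant, minimized, 3))
```

Exhaustive comparison is only practical for a handful of inputs; for larger blocks a simulation tool relies on selected test vectors instead.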
This single simulation tool is identified as the "scope of simulation tool" in Figure 1.5.
2.4.4 Synthesis
With the availability of design at the gate (switch) level, the logical design is complete. The corresponding circuit hardware realization is carried out by a synthesis tool. Two common approaches are as follows:
• The circuit is realized through an FPGA. The gate level design description is the starting point for the synthesis here. The FPGA vendors provide an interface to the synthesis tool. Through the interface the gate level design is realized as a final circuit. With many synthesis tools, one can directly use the design description at the data flow level itself to realize the final circuit through an FPGA. The FPGA route is attractive for limited volume production or a fast development cycle.
• The circuit is realized as an ASIC. A typical ASIC vendor will have his own library of basic components like elementary gates and flip-flops. Eventually the circuit is to be realized by selecting such components and interconnecting them conforming to the required design. This constitutes the physical design. Being an elaborate and costly process, a physical design may call for an intermediate functional verification through the FPGA route. The circuit realized through the FPGA is tested as a prototype. It provides another opportunity for testing the design closer to the final circuit.
2.4.5 Physical Design
A fully tested and error-free design at the switch level can be the starting point for a physical design [Baker & Boyce, Wolf]. It is to be realized as the final circuit using (typically) a million components in the foundry's library. The step-by-step activities in the process are described briefly as follows:
• System partitioning: The design is partitioned into convenient compartments or functional blocks. Often it would have been done at an earlier stage itself and the software design prepared in terms of such blocks. Interconnection of the blocks is part of the partition process.
• Floor planning: The positions of the partitioned blocks are planned and the blocks are arranged accordingly. The procedure is analogous to the planning and arrangement of domestic furniture in a residence. Blocks with I/O pins are kept close to the periphery; those which interact frequently or through a large number of interconnections are kept close together, and so on. Partitioning and floor planning may have to be carried out and refined iteratively to yield best results.
• Placement: The selected components from the ASIC library are placed in position on the "silicon floor." It is done with each of the blocks above.
• Routing: The components placed as described above are to be interconnected to the rest of the block. It is done with each of the blocks by suitably routing the interconnects. Once the routing is complete, the physical design can be taken as complete. The final mask for the design can be made at this stage and the ASIC manufactured in the foundry.
2.4.6 Post Layout Simulation
Once the placement and routing are completed, the performance specifications like silicon area, power consumed, path delays, etc., can be computed. An equivalent circuit can be extracted at the component level and performance analysis carried out. This constitutes the final stage called "verification." One may have to go through the placement and routing activity once again to improve performance.
2.4.7 Critical Subsystems
The design may have critical subsystems. Their performance may be crucial to the overall performance; in other words, to improve the system performance substantially, one may have to design such subsystems afresh. The design here may imply redefinition of the basic feature size of the component, component design, placement of components, or routing done separately and specifically for the subsystem. A set of masks used in the foundry may have to be done afresh for the purpose.
2.5 ROLE OF HDL
An HDL provides the framework for the complete logical design of the ASIC. All the activities coming under the purview of an HDL are shown enclosed in bold dotted lines in Figure 1.4. Verilog and VHDL are the two most commonly used HDLs today. Both have constructs with which the design can be fully described at all the levels. There are additional constructs available to facilitate setting up of the test bench, spelling out test vectors for them, and "observing" the outputs from the designed unit.
IEEE has brought out standards for the HDLs, and the software tools conform to them. Verilog as an HDL was introduced by Cadence Design Systems; they placed it into the public domain in 1990. It was established as a formal IEEE Standard in 1995. The revised version was brought out in 2001. However, most of the simulation tools available today conform only to the 1995 version of the standard.
VHDL, used by a substantial number of VLSI designers today, is the HDL used in this project for modeling the design.
2.6 FIELD PROGRAMMABLE GATE ARRAYS (FPGA)
A Field-Programmable Gate Array (FPGA) is a semiconductor device that can be configured by the customer or designer after manufacturing, hence the name "field-programmable". FPGAs are programmed using a logic circuit diagram or source code in a hardware description language (HDL) to specify how the chip will work. They can be used to implement any logical function that an application-specific integrated circuit (ASIC) could perform, but the ability to update the functionality after shipping offers advantages for many applications.
FPGAs contain programmable logic components called "logic blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together", somewhat like a one-chip programmable breadboard. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.
2.6.1 History
The FPGA industry sprouted from programmable read-only memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs both had the option of being programmed in batches in a factory or in the field (field programmable); however, programmable logic was hard-wired between logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the first commercially viable field programmable gate array in 1985, the XC2064. The XC2064 had programmable gates and programmable interconnects between gates, the beginnings of a new technology and market. The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two 3-input lookup tables (LUTs). More than 20 years later, Freeman was entered into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for programmable logic arrays, gates, and logic blocks are founded in patents awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment proposed by Steve Casselman to develop a computer that would implement 600,000 reprogrammable gates. Casselman was successful and the system was awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the mid-1990s, when competitors sprouted up, eroding significant market share. By 1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication and the volume of production. In the early 1990s, FPGAs were primarily used in telecommunications and networking. By the end of the decade, FPGAs found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic algorithm technology and FPGAs to create a sound recognition device. Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip to decide the configuration needed to accomplish a sound recognition task.
2.6.2 Modern developments
A recent trend has been to take the coarse-grained architectural approach a step further by combining the logic blocks and interconnects of traditional FPGAs with embedded microprocessors and related peripherals to form a complete "system on a programmable chip". This work mirrors the architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems Group, which combined a reconfigurable CPU architecture on a single chip called the SB24. That work was done in 1982. Examples of such hybrid technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices, which include one or more PowerPC processors embedded within the FPGA's logic fabric. The Atmel FPSLIC is another such device, which uses an AVR processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft" processor cores that are implemented within the FPGA logic.
As previously mentioned, many modern FPGAs have the ability to be reprogrammed at "run time," and this is leading to the idea of reconfigurable computing or reconfigurable systems: CPUs that reconfigure themselves to suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an example of a reconfigurable soft processor, implemented on FPGAs. However, it does not support dynamic reconfiguration at runtime, but instead adapts itself to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge. Software-configurable microprocessors such as the Stretch S5000 adopt a hybrid approach by providing an array of processor cores and FPGA-like programmable cores on the same chip.
2.6.3 Gates
1987: 9,000 gates, Xilinx
1992: 600,000 gates, Naval Surface Warfare Department
Early 2000s: Millions
2.6.4 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally achieved less functionality than their fixed ASIC counterparts. A combination of volume, fabrication improvements, research and development, and the I/O capabilities of new supercomputers have largely closed the performance gap between ASICs and FPGAs.
Advantages include a shorter time to market, ability to re-program in the field to fix bugs, and lower non-recurring engineering costs. Vendors can also take a middle road by developing their hardware on ordinary FPGAs, but manufacture their final version so it can no longer be modified after the design has been committed.
Xilinx claims that several market and technology dynamics are changing the ASIC/FPGA paradigm:
• IC costs are rising aggressively
• ASIC complexity has bolstered development time and costs
• R&D resources and headcount are decreasing
• Revenue losses for slow time-to-market are increasing
• Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing number of higher-volume applications than they have been historically used for, which the company blames for the growing number of FPGA design starts.
Some FPGAs have the capability of partial re-configuration that lets one portion of the device be re-programmed while other portions continue running.
The primary differences between CPLDs and FPGAs are architectural. A CPLD has a somewhat restrictive structure consisting of one or more programmable sum-of-products logic arrays feeding a relatively small number of clocked registers. The result of this is less flexibility, with the advantage of more predictable timing delays and a higher logic-to-interconnect ratio. The FPGA architectures, on the other hand, are dominated by interconnect. This makes them far more flexible (in terms of the range of designs that are practical for implementation within them) but also far more complex to design for.
Another notable difference between CPLDs and FPGAs is the presence in most FPGAs of higher-level embedded functions (such as adders and multipliers) and embedded memories, as well as logic blocks that implement decoders or mathematical functions.
2.7 Applications of FPGA
Applications of FPGAs include digital signal processing, software-defined radio, aerospace and defense systems, ASIC prototyping, medical imaging, computer vision, speech recognition, cryptography, bioinformatics, computer hardware emulation, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar space, that of glue logic for PCBs. As their size, capabilities, and speed increased, they began to take over larger and larger functions to the state where some are now marketed as full systems on chips (SoC). Particularly with the introduction of dedicated multipliers into FPGA architectures in the late 1990s, applications which had traditionally been the sole reserve of DSPs began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use of the massive parallelism offered by their architecture. One such area is code breaking, in particular brute-force attack, of cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing applications where computational kernels such as FFT or convolution are performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for considerable compute throughput even at low MHz clock rates. The flexibility of the FPGA allows for even higher performance by trading off precision and range in the number format for an increased number of parallel arithmetic units. This has driven a new type of processing called reconfigurable computing, where time-intensive tasks are offloaded from software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by the complexity of FPGA design compared to conventional software and the extremely long turn-around times of current design tools, where a 4-8 hour wait is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications where the volume of production is small. For these low-volume applications, the premium that companies pay in hardware costs per unit for a programmable chip is more affordable than the development resources spent on creating an ASIC for a low-volume application. Today, new cost and performance dynamics have broadened the range of viable applications.
2.8 Architecture of FPGA
The most common FPGA architecture consists of an array of configurable logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing channels have the same width (number of wires). Multiple I/O pads may fit into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources. While the number of CLBs and I/Os required is easily determined from the design, the number of routing tracks needed may vary considerably even among designs with the same amount of logic. (For example, a crossbar switch requires much more routing than a systolic array with the same gate count.) Since unused routing tracks increase the cost (and decrease the performance) of the part without providing any benefit, FPGA manufacturers try to provide just enough tracks so that most designs that will fit in terms of LUTs and I/Os can be routed. This is determined by estimates such as those derived from Rent's rule or by experiments with existing designs.
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a flip-flop, as shown below. In recent years, manufacturers have started moving to 6-input LUTs in their high performance parts, claiming increased performance.
2.8.1 Typical logic block
There is only one output, which can be either the registered or the unregistered LUT output. The logic block has four inputs for the LUT and a clock input. Since clock signals (and often other high-fanout signals) are normally routed via special-purpose dedicated routing networks in commercial FPGAs, they and other signals are separately managed.
For this example architecture, the locations of the FPGA logic block pins are shown below.
2.8.2 Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin can connect to routing wires in both the channel to the right and the channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the channel adjacent to it. For example, an I/O pad at the top of the chip can connect to any of the W wires (where W is the channel width) in the horizontal channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment spans only one logic block before it terminates in a switch box. By turning on some of the programmable switches within a switch box, longer paths can be constructed. For higher speed interconnect, some FPGA architectures use longer routing lines that span multiple logic blocks.
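The classic logic block described in Section 2.8 (a 4-input LUT whose single output is taken either directly or through a flip-flop) can be modeled in a few lines. This is a behavioral sketch only: the class name `LogicBlock`, the bit ordering of the LUT index, and the reset state of the flip-flop are assumptions made for illustration and do not describe any vendor's actual cell.

```python
class LogicBlock:
    """Hypothetical model of a 4-input LUT followed by one flip-flop."""

    def __init__(self, truth_table, registered=False):
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs exactly 16 entries")
        self.table = [int(bool(bit)) for bit in truth_table]
        self.registered = registered  # selects registered or unregistered output
        self.flip_flop = 0            # assumed reset state

    def lut(self, a, b, c, d):
        """Look up the configured function; input a is the least significant index bit."""
        index = (
            (int(bool(d)) << 3)
            | (int(bool(c)) << 2)
            | (int(bool(b)) << 1)
            | int(bool(a))
        )
        return self.table[index]

    def clock_edge(self, a, b, c, d):
        """Model an active clock edge: the flip-flop captures the LUT output."""
        self.flip_flop = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        """The single block output: registered or combinational, as configured."""
        return self.flip_flop if self.registered else self.lut(a, b, c, d)


# Example configuration: 4-input XOR (odd parity of the index bits).
XOR4 = [bin(i).count("1") % 2 for i in range(16)]
```

Configuring the block amounts to choosing the 16 table entries and the output select bit, which is why the same cell can implement any function of four inputs.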
Whenever a vertical and a horizontal channel intersect, there is a switch box. In this architecture, when a wire enters a switch box, there are three programmable switches that allow it to connect to three other wires in adjacent channel segments. The pattern, or topology, of switches used in this architecture is the planar or domain-based switch box topology. In this switch box topology, a wire in track number one connects only to wires in track number one in adjacent channel segments, wires in track number 2 connect only to other wires in track number 2, and so on. The figure below illustrates the connections in a switch box.
2.8.3 Switch box topology
Modern FPGA families expand upon the above capabilities to include higher level functionality fixed into the silicon. Having these common functions embedded into the silicon reduces the area required and gives those functions increased speed compared to building them from primitives. Examples of these include multipliers, generic DSP blocks, embedded processors, high speed I/O logic, and embedded memories.
FPGAs are also widely used for systems validation including pre-silicon validation, post-silicon validation, and firmware development. This allows chip companies to validate their design before the chip is produced in the factory, reducing the time to market.
2.9 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description language (HDL) or a schematic design. The HDL form might be easier to work with when handling large structures because it is possible to just specify them numerically rather than having to draw every piece by hand. On the other hand, schematic entry can allow for easier visualization of a design.
Then, using an electronic design automation tool, a technology-mapped netlist is generated. The netlist can then be fitted to the actual FPGA architecture using a process called place-and-route, usually performed by the FPGA company's proprietary place-and-route software. The user will validate the map, place, and route results via timing analysis, simulation, and other verification methodologies. Once the design and validation process is complete, the binary file generated (also using the FPGA company's proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: The source files are fed to a software suite from the FPGA/CPLD vendor that through different steps will produce a file. This file is then transferred to the FPGA/CPLD via a serial interface (JTAG) or to an external memory device like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to reduce the complexity of designing in HDLs, which have been compared to the equivalent of assembly languages, there are moves to raise the abstraction level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of predefined complex functions and circuits that have been tested and optimized to speed up the design process. These predefined circuits are commonly called IP cores, and are available from FPGA vendors and third-party IP suppliers (rarely free and typically released under proprietary licenses). Other predefined circuits are available from developer communities such as OpenCores.
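Returning briefly to the planar (domain-based) switch box topology described in Section 2.8.3, its defining rule is that a wire can only reach wires with the same track number on the other three sides. The sketch below expresses that rule directly. The side names, the zero-based track numbering, and the function names are assumptions made for illustration; the switch count follows from the three-switches-per-wire description in the text.

```python
SIDES = ("north", "east", "south", "west")


def planar_neighbors(side, track, channel_width):
    """Return the wires reachable from (side, track) through a planar switch box.

    A wire entering on one side can connect, via three programmable switches,
    to the wire with the same track number on each of the other three sides.
    """
    if side not in SIDES:
        raise ValueError("unknown side: %r" % (side,))
    if not 0 <= track < channel_width:
        raise ValueError("track out of range for this channel width")
    return [(other, track) for other in SIDES if other != side]


def switch_count(channel_width):
    """Total programmable switches in one planar switch box.

    Each track has four wire ends, and every pair of ends shares one switch,
    giving 6 switches per track.
    """
    return 6 * channel_width


if __name__ == "__main__":
    print(planar_neighbors("north", 2, channel_width=4))
    print(switch_count(4))
```

Because tracks never mix, a signal that enters the routing fabric on a given track stays on that track number for its whole route, which keeps the switch box small at the cost of some routing flexibility.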
In a typical design flow, an FPGA application developer will simulate the design at multiple stages throughout the design process. Initially the RTL description in VHDL or Verilog is simulated by creating test benches to simulate the system and observe results. Then, after the synthesis engine has mapped the design to a netlist, the netlist is translated to a gate level description where simulation is repeated to confirm the synthesis proceeded without errors. Finally the design is laid out in the FPGA, at which point propagation delays can be added and the simulation run again with these values back-annotated onto the netlist.

2.9.1 Basic Process Technology Types

SRAM: based on static memory technology. In-system programmable and re-programmable. Requires external boot devices. CMOS.
Antifuse: One-time programmable. CMOS.
EPROM: Erasable Programmable Read-Only Memory technology. Usually one-time programmable in production because of plastic packaging. Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM: Electrically Erasable Programmable Read-Only Memory technology. Can be erased, even in plastic packages. Some, but not all, EEPROM devices can be in-system programmed. CMOS.
Flash: Flash-erase EPROM technology. Can be erased, even in plastic packages. Some, but not all, flash devices can be in-system programmed. Usually, a flash cell is smaller than an equivalent EEPROM cell and is therefore less expensive to manufacture. CMOS.
Fuse: One-time programmable.

2.9.2 Major Manufacturers

Xilinx and Altera are the current FPGA market leaders and long-time industry rivals. Together, they control over 80 percent of the market, with Xilinx alone representing over 50 percent. Xilinx also provides free Windows and Linux design software, while Altera provides free Windows tools; the Solaris and Linux tools are only available via a rental scheme. Other competitors include Lattice Semiconductor (flash, SRAM), Actel (antifuse, flash-based, mixed-signal), SiliconBlue Technologies (ultra low power FPGA), Achronix (RAM based, 1.5 GHz fabric speed FPGA), and QuickLogic (handheld focused CSSP, no general purpose FPGAs).

2.10 SPARTAN FPGA

2.10.1 Spartan-3E FPGA Features and Embedded Processing Functions

The Spartan-3E Starter Kit board highlights the unique features of the Spartan-3E FPGA family and provides a convenient development board for embedded processing applications.

Spartan-3E specific features:
• Parallel NOR Flash configuration
• MultiBoot FPGA configuration from Parallel NOR Flash PROM
• SPI serial Flash configuration

Embedded development:
• MicroBlaze 32-bit embedded RISC processor
• PicoBlaze 8-bit embedded controller
• DDR memory interfaces

2.10.2 Key Components and Features

The key features of the Spartan-3E Starter Kit board are:
• Xilinx XC3S500E Spartan-3E FPGA
  - Up to 232 user-I/O pins
  - 320-pin FBGA package
  - Over 10,000 logic cells
• Xilinx 4 Mbit Platform Flash configuration PROM
• Xilinx 64-macrocell XC2C64A CoolRunner CPLD
• 64 MByte (512 Mbit) of DDR SDRAM, x16 data interface, 100+ MHz
• 16 MByte (128 Mbit) of parallel NOR Flash (Intel StrataFlash)
  - FPGA configuration storage
  - MicroBlaze code storage/shadowing
• 16 Mbits of SPI serial Flash (STMicro)
  - FPGA configuration storage
  - MicroBlaze code shadowing
• 2-line, 16-character LCD screen
• PS/2 mouse or keyboard port
• VGA display port
• 10/100 Ethernet PHY (requires Ethernet MAC in FPGA)
• Two 9-pin RS-232 ports (DTE- and DCE-style)
• On-board USB-based FPGA/CPLD download/debug interface
• 50 MHz clock oscillator
• SHA-1 1-wire serial EEPROM for bitstream copy protection
• Hirose FX2 expansion connector
• Three Digilent 6-pin expansion connectors
• Four-output, SPI-based Digital-to-Analog Converter (DAC)
• Two-input, SPI-based Analog-to-Digital Converter (ADC) with programmable-gain pre-amplifier
• ChipScope SoftTouch debugging port
• Rotary-encoder with push-button shaft
• Eight discrete LEDs
• Four slide switches

3.1 General Description

We can generally divide cryptosystems into secret-key systems and public-key systems. In a secret-key system, the legitimate users must use the same secret key, which is unknown to unauthorized persons. In a public-key system, in contrast to secret-key systems, the sender and the receiver each use a different key. Although public-key systems seem to be ideal for many cryptographic applications, their low speed and high cost restrict their use in areas such as real-time applications. Secret-key systems can be further divided into block ciphers and stream ciphers.
Block ciphers operate on blocks of plaintext and ciphertext (usually of 64 bits but sometimes longer). Stream ciphers operate on streams of plaintext and ciphertext one bit or byte (sometimes even one 32-bit word) at a time. With a block cipher, the same plaintext block will always encrypt to the same ciphertext block, using the same key. With a stream cipher, the same plaintext bit or byte will encrypt to a different bit or byte every time it is encrypted.

Block ciphers involve encrypting and decrypting messages in blocks of information bits. For example, if M is a plaintext message, a block cipher breaks M into successive blocks M1, M2, … and encrypts each Mi with the same key K, that is

$$E_k(M) = E_k(M_1)\,E_k(M_2)\,\ldots$$

Compared with stream ciphers, block ciphers have the following advantages:
(1) Block ciphers can be easily standardized because information is usually processed and transmitted in the form of blocks.
(2) Owing to the ease of synchronization, losing one ciphertext block has no influence on the correctness of the decryption of the following blocks.

The level of secrecy in message transmission with block ciphering can be increased by (1) partitioning the plaintext into blocks, and then encrypting and decrypting them respectively, (2) repeating or iterating the block encryption a number of times, or (3) using a combination of (1) and (2). The basic concept of block ciphering with partitioning and iteration is shown in Fig. 3.1. A block of message to be transformed repeatedly i = 1, 2, …, r times is divided equally into left and right halves denoted by Li and Ri. If the block is n bits long, then Li and Ri each are n/2 bits long. Encryption and decryption are carried out by means of the set of iteration-dependent keys Ki+1 and a transformation function f, which depends on Ri and Ki+1 for encryption and on Li+1 and Ki+1 for decryption. As shown in Fig. 3.1, for the (i+1)th iteration, the encryption yields

$$L_{i+1} = R_i, \qquad R_{i+1} = L_i \oplus f(R_i, K_{i+1})$$

where $\oplus$ represents modulo-2 addition. For decryption the order of Ki+1 is reversed, that is

$$R_i = L_{i+1}, \qquad L_i = R_{i+1} \oplus f(L_{i+1}, K_{i+1})$$

Fig. 3.1 Basic concept of block encrypting with partitioning and iteration.

3.2 The Blowfish Block Cipher

Blowfish is a block cipher, iterating a simple encryption function 16 times. The block size is 64 bits, and the key can be any length up to 448 bits. Although there is a complex initialization phase required before any encryption can take place, the actual encryption of data is very efficient on large microprocessors.

The algorithm consists of two parts: a key-expansion part and a data-encryption part. Key expansion converts a variable-length key of at most 448 bits into several subkey arrays totaling 4168 bytes. Blowfish has 16 rounds. Each round consists of a key-dependent permutation, and a key- and data-dependent substitution. All operations are XORs and additions on 32-bit words. The only additional operations are four indexed array data lookups per round.

3.2.1 Key-expansion Process

Blowfish uses a large number of subkeys. These keys must be precomputed before any data encryption or decryption.

Subkeys:
(1) The P-array consists of 18 32-bit subkeys: P1, P2, …, P18.
(2) There are four 32-bit S-boxes with 256 entries each:
S1,0, S1,1, …, S1,255
S2,0, S2,1, …, S2,255
S3,0, S3,1, …, S3,255
S4,0, S4,1, …, S4,255

Generating the subkeys: The subkeys are calculated using the Blowfish algorithm. The exact method is as follows:
(1) Initialize first the P-array and then the four S-boxes, in order, with a fixed string. This string consists of the hexadecimal digits of pi (less the initial 3). For example:
P1 = 0x243f6a88
P2 = 0x85a308d3
P3 = 0x13198a2e
P4 = 0x03707344
(2) XOR P1 with the first 32 bits of the key, XOR P2 with the second 32 bits of the key, and so on for all bits of the key (possibly up to P14). Repeatedly cycle through the key bits until the entire P-array has been XORed with key bits. (For every short key, there is at least one equivalent longer key; for example, if A is a 64-bit key, then AA, AAA, etc., are equivalent keys.)
(3) Encrypt the all-zero string with the Blowfish algorithm, using the subkeys described in steps (1) and (2).
(4) Replace P1 and P2 with the output of step (3).
(5) Encrypt the output of step (3) using the Blowfish algorithm with the modified subkeys.
(6) Replace P3 and P4 with the output of step (5).
(7) Continue the process, replacing all entries of the P-array, and then all four S-boxes in order, with the output of the continuously-changing Blowfish algorithm.

In total, 521 iterations are required to generate all required subkeys. Applications can store the subkeys rather than execute this derivation process multiple times.
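The control flow of the key-expansion steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the design used in this thesis: the names `mix_key_into_p`, `expand_key`, and `toy_cipher` are hypothetical, the key bytes are assumed to be packed most-significant byte first, and the stand-in `toy_cipher` is not Blowfish. Only the four P-array values quoted in step (1) are taken from the text; the remaining initial values are replaced by zero-filled placeholders.

```python
from typing import Callable, List, Sequence, Tuple

MASK32 = 0xFFFFFFFF

# First four P-array initial values quoted in step (1). The other 14 P entries
# and the S-box contents are NOT reproduced in this sketch.
P_FIRST4 = [0x243F6A88, 0x85A308D3, 0x13198A2E, 0x03707344]

# Hypothetical signature for "one block encryption": (XL, XR, P, S) -> (XL, XR).
Cipher = Callable[[int, int, List[int], List[List[int]]], Tuple[int, int]]


def mix_key_into_p(p: Sequence[int], key: bytes) -> List[int]:
    """Step (2): XOR each P entry with the next 32 key bits, cycling through the key."""
    if not key:
        raise ValueError("key must not be empty")
    mixed, pos = [], 0
    for entry in p:
        word = 0
        for _ in range(4):  # assumption: most-significant key byte first
            word = (word << 8) | key[pos % len(key)]
            pos += 1
        mixed.append(entry ^ word)
    return mixed


def expand_key(p: Sequence[int], sboxes: Sequence[Sequence[int]],
               key: bytes, encrypt_block: Cipher):
    """Steps (2)-(7); returns (P, S-boxes, number of encryptions performed)."""
    p = mix_key_into_p(p, key)
    sboxes = [list(box) for box in sboxes]
    left = right = 0  # step (3) starts from the all-zero block
    calls = 0
    for table in [p] + sboxes:  # the P-array first, then the four S-boxes in order
        for i in range(0, len(table), 2):
            # Each output replaces two entries and is the next input (steps 4-7).
            left, right = encrypt_block(left, right, p, sboxes)
            table[i], table[i + 1] = left, right
            calls += 1
    return p, sboxes, calls


def toy_cipher(left, right, p, sboxes):
    """Placeholder mixing step, NOT Blowfish; it only lets the loop run."""
    return (right + p[0]) & MASK32, left ^ sboxes[0][0] ^ p[17]


key = bytes.fromhex("11223344")  # the 32-bit key used in the simulations of this thesis
zero_p = [0] * 18                       # placeholder initial values
zero_s = [[0] * 256 for _ in range(4)]  # placeholder initial values
_, _, iterations = expand_key(zero_p, zero_s, key, toy_cipher)
print(iterations)                              # 9 + 4 * 128 = 521
print(hex(mix_key_into_p(P_FIRST4, key)[0]))   # 0x351d59cc
```

The iteration count follows from the table sizes: 18 P entries plus 4 x 256 S-box entries give 1042 words (4168 bytes), and each encryption produces two of them.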
3.2.2 Data-encryption Process

Encryption: Blowfish is a Feistel network consisting of 16 rounds (see Fig. 3.2). The input is a 64-bit data element, X. The steps of encryption are as follows:

Divide X into two 32-bit halves: XL, XR
For i = 1 to 16:
    XL = XL XOR Pi
    XR = F(XL) XOR XR
    Swap XL and XR
Swap XL and XR (Undo the last swap)
XR = XR XOR P17
XL = XL XOR P18
Recombine XL and XR

Fig. 3.2 Blowfish encryption algorithm

Function F (see Fig. 3.4): Divide XL into four eight-bit quarters: a, b, c, and d

F(XL) = ((S1,a + S2,b mod 2^32) XOR S3,c) + S4,d mod 2^32

Fig. 3.4 Function F

Decryption: Decryption is exactly the same as encryption, except that P1, P2, …, P18 are used in the reverse order (see Fig. 3.3). The steps of decryption are as follows:

Divide X into two 32-bit halves: XL, XR
For i = 18 downto 3:
    XL = XL XOR Pi
    XR = F(XL) XOR XR
    Swap XL and XR
Swap XL and XR (Undo the last swap)
XR = XR XOR P2
XL = XL XOR P1
Recombine XL and XR

Fig. 3.3 Blowfish decryption algorithm

3.2.3 Possible Simplifications

Blowfish has several possible simplifications, aimed at decreasing memory requirements and execution time. These are outlined below:
(1) Fewer and smaller S-boxes. It may be possible to reduce the number of S-boxes from four to one. Additionally, it may be possible to overlap entries in a single S-box: entry 0 would consist of bytes 0 through 3, entry 1 would consist of bytes 1 through 4, etc. The former simplification would reduce the memory requirements for the four S-boxes from 4096 bytes to 1024 bytes; the latter would reduce the requirements for a single S-box from 1024 bytes to 256 bytes.
(2) Fewer iterations. It is probably safe to reduce the number of iterations from 16 to 8 without compromising security. The number of iterations required for security may be dependent on the length of the key. Note that with the current subkey generation procedure, an 8-iteration algorithm cannot accept a key longer than 192 bits.
(3) On-the-fly subkey calculation. The current method of subkey calculation requires all subkeys to be calculated in advance of any data encryption. In fact, it is impossible to calculate the last subkey of the last S-box without calculating every subkey that comes before. An alternate method of subkey calculation would be preferable: one where every subkey can be calculated independently of any other. High-end implementations could still precompute the subkeys for increased speed, but low-end applications could compute only the required subkeys when needed.

3.2.4 Operation Modes of Blowfish

Blowfish is a symmetric block cipher that can be used as a drop-in replacement for DES or IDEA, so it can be used in the same four standard operation modes as DES and IDEA. The four modes are defined as follows:

1. Electronic Codebook Mode: Electronic codebook (ECB) mode is the most obvious way to use a block cipher: a block of plaintext encrypts into a block of ciphertext. Fig. 3.5 shows the ECB mode. Since the same block of plaintext always encrypts to the same block of ciphertext, it is theoretically possible to create a code book of plaintexts and corresponding ciphertexts. The potentially serious problem with this mode is that an adversary could modify an encrypted message without knowing the key, so as to cheat the receiver. This disadvantage can be overcome by introducing a small amount of memory in the encryption process. The three modes below can counter this kind of attack, called block replay.

Fig. 3.5 ECB mode

2. Cipher Block Chaining Mode: In cipher block chaining (CBC) mode, an initial value is added modulo 2 (XORed) to the first plaintext block to form the Blowfish input block. The Blowfish output is the ciphertext. This output is fed back and added modulo 2 to the next plaintext block, forming the new Blowfish input block. This mode produces a ciphertext dependent on the previous plaintext blocks, which is useful for encoding long blocks of input. Fig. 3.6a shows the CBC encryption mode. Fig. 3.6b shows the CBC decryption mode.

$$C_i = E_k(P_i \oplus C_{i-1}), \qquad P_i = C_{i-1} \oplus D_k(C_i)$$

Fig. 3.6 CBC mode

3. Cipher-feedback Mode: Block ciphers can also be implemented as a self-synchronizing stream cipher; this is called cipher-feedback (CFB) mode. In this mode, input is processed j bits at a time. Preceding ciphertext is used as input to the encryption algorithm to produce pseudorandom output, which is XORed with plaintext to produce the next unit of ciphertext. Fig. 3.7 shows the CFB mode.

Fig. 3.7 CFB mode

4. Output-feedback Mode: The output-feedback (OFB) mode is a method of running a block cipher as a synchronous stream cipher. It is similar to CFB mode, except that j bits of the previous output block are moved into the right-most positions of the queue (see Fig. 3.8). Decryption is the reverse of this process.

Fig. 3.8 OFB mode

3.2.5 Security of Blowfish

Because the variable key length of Blowfish is from 32 bits to 448 bits, Blowfish is much safer than DES and IDEA, by a factor of at most $2^{384}$. Serge Vaudenay examined Blowfish with known S-boxes and r rounds; a differential attack can recover the P-array with $2^{8r+1}$ chosen plaintexts. For certain weak keys that generate bad S-boxes (the odds of getting them randomly are 1 in $2^{14}$), the same attack requires only $2^{4r+1}$ chosen plaintexts to recover the P-array. With unknown S-boxes this attack can detect whether a weak key is being used, but cannot determine what it is (neither the S-boxes nor the P-array). This attack only works against reduced-round variants; it is completely ineffective against 16-round Blowfish.

3.3 Comparisons of Block Ciphers

According to Table 3.1, Blowfish is much faster than DES, IDEA and other block ciphers that have the same block length.

Table 3.1: Speed comparisons of block ciphers on a Pentium (Bruce Schneier 1996)

The analysis presented in this chapter concerns the ability of implementing Blowfish using FPGA devices. In the following sections, all basic functions used in Blowfish, and the method of implementing them in FPGAs, will be discussed. These basic functions include:
- additions modulo 2 (XOR)
- additions modulo 2^32 on 32-bit long words
- key dependent P-array and S-boxes
- the F function

4.1 Additions Modulo 2 (XOR)

This is a very inexpensive and fast operation. Each output bit depends on the value of only two input bits. The timing and FPGA resource reports for the synthesized 32-bit XOR reveal that the maximum delay for data to be available at the output of the 32-bit XOR is equal to 8.8 ns and the total number of logic cells used is equal to 32.
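The bit-independence of the XOR operation can be illustrated with a short Python sketch. This is only a software model written for illustration (the function name `xor32_bitwise` is hypothetical); it computes each output bit from exactly two input bits, which is why a 32-bit XOR needs no connection between bit positions.

```python
def xor32_bitwise(a: int, b: int) -> int:
    """32-bit XOR built one output bit at a time."""
    result = 0
    for i in range(32):
        # Output bit i needs only bit i of each operand; no carry links the positions.
        bit = ((a >> i) & 1) ^ ((b >> i) & 1)
        result |= bit << i
    return result


sample = xor32_bitwise(0x243F6A88, 0x11223344)
print(hex(sample))  # 0x351d59cc
```

Applying the same operand a second time restores the original word, which is the property the Feistel structure relies on when the subkeys are XORed in again during decryption.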
4.2 Additions Modulo 2^32 on 32-bit Long Words

The simplest way to perform this operation is to compute the sum position by position, and use a carry chain to propagate intermediate results from the least significant to the most significant position. Such a structure is shown in Fig. 4.1. It is obvious that the time of signal propagation through the entire carry chain limits the speed of addition. The longer the words being added, the longer it takes to compute the result. This implementation will use about (33 x 2) + 32 = 98 LEs.

Fig. 4.1 Additions modulo 2^32 using carry chain

4.3 Key Dependent P-array and S-boxes

We can implement the P-array and S-boxes as fast static RAMs. This design would require one 32 x 32 and four 256 x 32 memories. Large blocks of RAM can be created by combining multiple EABs. For example, two 256 x 16 RAM blocks can be combined to form a 256 x 32 RAM block. So we can realize the P-array and S-boxes using ten EABs. In the present implementation, we have not yet completed the key scheduling process, so all subkeys have to be computed off-chip and then written to the appropriate on-chip static RAMs and registers.

4.4 The F Function

The F function is the most important component of the Blowfish cipher. It accomplishes the bytewise substitution using key-dependent substitution boxes. We can combine the addition modulo 2 and the addition modulo 2^32 designed above to implement the F function as in Fig. 4.4.

F( ) = ( ( ( S0 + S1 ) modulo 2^32 ) XOR S2 ) + S3 modulo 2^32

Fig. 4.4 The F function

5.1 Blowfish Implementation

Our project has one important part: the Blowfish round circuit (Feistel network). The round circuit describes exactly the Blowfish cipher, including the encryption and decryption processes. There are two kinds of designs: one is a one-round implementation including the encryption, decryption and key scheduling processes, and the other is also a one-round ALU but does not include the key scheduling process. The one-round circuit can be run repeatedly 16 times to perform the entire encryption or decryption process. In the present implementation, all subkeys have to be computed off-chip and then written to the appropriate on-chip memories and registers, because we assumed that the speed of key scheduling has a secondary influence on the results of comparison of different designs.

5.1.1 Design: One-round Implementation

Fig. 5.2 shows the interface architecture used by the one-round implementation. In this design, the one-round implementation runs repeatedly 16 times to accomplish the encryption, decryption, and key scheduling processes. To reduce the area, we also simplify the four S-boxes to one, including for the key scheduling process.

Fig. 5.2 One-round implementation

5.2 Data and Key Flow

The following section describes the architecture of our design.

5.2.1 Signals Used

All signals available are shown in Fig. 5.2. Signal meanings:
data_bibus[31…0]: 32-bit wide bi-directional data bus. The 31st bit is the most significant one.
address[10…0]: 11-bit wide address bus that allows accessing all registers defined within the circuit. The 10th bit is the most significant one.
clock: Clocking signal for the entire circuit.
rst: High state resets the chip.
encr_decr: With this signal the host system can choose which operation to perform: encryption or decryption. High state chooses encryption.
key_data: Chooses data or key to be written to registers. High state chooses data.

5.2.2 The Transmission Protocol

Fig. 5.3.1 Writing keys and data to the circuit

Fig. 5.3.2 Reading data from the circuit

5.2.3 Writing cycle

The address bus should first be set to one of the addresses within the input register address space. The key_data signal should be set to the appropriate state, depending on what is going to be written to the register: data or key. In the case of data, the encr_decr signal should be set in advance to the proper state, depending on the intended encryption or decryption process. After the address is stable, the data bus should be set to the appropriate data or key value, and stay stable at least until the falling edge of the write signal. This falling edge indicates latching of the data in the selected input register. Once the keys and data are written to the circuit, they cannot be read, but can always be overwritten. Writing to the register under address 1H is a signal for the cipher that a new data block is ready for encryption.

5.2.4 Reading cycle

Reading data is very similar to writing. Instead of write, a read signal is used to indicate the reading cycle. Of course, this time the cipher drives data on the data_bibus, therefore the host system should release the bus. Data are driven on the data_bibus as long as the read signal stays high.

5.2.5 Operation Sequence

To use the cipher efficiently, one needs to understand what operations are performed after start signal activation. This sequence is simple:
1. Read data from the input registers and store them inside the round circuit, set busy = 1.
2. Perform encryption or decryption.
3. Write encrypted or decrypted data to the output registers and set busy = 0.

Fig. 5.4.1 Operation sequence performed by our cipher after start

5.3 The One-round ALU Circuit

Almost all elements of the Blowfish one-round circuit follow directly from the algorithm description, and most of the Blowfish building blocks are already described in the previous chapter. Therefore, we focus only on parts that are specific to the hardware implementation. The implementation of the Blowfish one-round ALU is shown in Fig. 5.5. The 64-bit wide register stores intermediate results of encryption and decryption. There are four 32-bit registers, A, B, C and D, which are used to store in turn the values transmitted from the single S-box. The entire circuit has a feedback path that permits data to circulate by repeating a single round computation 16 times. The last two XORs with P[i] are performed after loop 16.

Fig. 5.5 Blowfish one-round implementation

5.5 Implementation

We designed two finite state machines (FSMs). One is for the Feistel network and has five states; the second FSM is for the complete encryption and decryption process. These two FSMs are shown in Fig. 5.6 and Fig. 5.7.

A. Fig. 5.6 FSM of Feistel Network

Where
S0: In this state the 32-bit Feistel input is divided into four 8-bit numbers, which are converted into integers and assigned to a, b, c and d; then go to state S1.
S1: The 8-bit numbers a, b, c and d are given to the four S-boxes as addresses, and each S-box produces the content of the addressed memory location. Each content is a 32-bit number; they are named m1, m2, m3 and m4. Go to the next state S2.
S2: In this state m1 and m2 are added and the result is assigned to y1.
S3: Here y1 and m3 (two 32-bit numbers) are XORed and the result is assigned to y2.
S4: State S4 adds y2 to the fourth S-box output m4 and produces F_out, which is the final Feistel network output.

5.5.1 Example for Feistel Network

F_in = 57 4B 18 E1

Byte (hex)   Decimal    S-box   Output (hex)
57           87  (a)    S1      7EFB2A98 (m1)
4B           75  (b)    S2      BDCF3F2E (m2)
18           24  (c)    S3      96EB27BB (m3)
E1           225 (d)    S4      62A80F00 (m4)

F_out = 2FC95D75
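The datapath described above can be modelled in a few lines of Python. The sketch below is a software illustration under stated assumptions, not the VHDL of this design: `add32_ripple`, `f_function`, `run_rounds`, `TOY_P` and `TOY_S` are hypothetical names, the subkey tables are arbitrary placeholder values (not the pi-derived or key-dependent tables), and four separate S-box tables are used, so the single-S-box area optimisation is not modelled. It follows the carry-chain addition, the states S0 to S4 of the Feistel FSM, and the loop of 16 rounds followed by the last two XORs.

```python
MASK32 = 0xFFFFFFFF


def add32_ripple(x: int, y: int) -> int:
    """Addition modulo 2**32, bit by bit, with the carry moving from LSB to MSB."""
    total, carry = 0, 0
    for i in range(32):
        a = (x >> i) & 1
        b = (y >> i) & 1
        total |= (a ^ b ^ carry) << i
        carry = (a & b) | (carry & (a ^ b))
    return total  # the carry out of bit 31 is dropped, giving the mod 2**32 result


def f_function(word: int, sboxes) -> int:
    """F function, following the states S0..S4 of the Feistel FSM."""
    a, b = (word >> 24) & 0xFF, (word >> 16) & 0xFF               # S0: split into bytes
    c, d = (word >> 8) & 0xFF, word & 0xFF
    m1, m2 = sboxes[0][a], sboxes[1][b]                           # S1: table lookups
    m3, m4 = sboxes[2][c], sboxes[3][d]
    y1 = add32_ripple(m1, m2)                                     # S2
    y2 = y1 ^ m3                                                  # S3
    return add32_ripple(y2, m4)                                   # S4: F_out


def run_rounds(left: int, right: int, p, sboxes):
    """Sixteen passes through the single round, then the last two XORs."""
    for i in range(16):
        left ^= p[i]
        right ^= f_function(left, sboxes)
        left, right = right, left
    left, right = right, left  # undo the last swap
    right ^= p[16]
    left ^= p[17]
    return left, right


def encrypt_block(left, right, p, sboxes):
    return run_rounds(left, right, p, sboxes)


def decrypt_block(left, right, p, sboxes):
    # Same datapath; only the order of the P entries is reversed.
    return run_rounds(left, right, p[::-1], sboxes)


# Placeholder tables for the demonstration only (NOT real Blowfish subkeys).
TOY_P = [(0x01234567 * (i + 1)) & MASK32 for i in range(18)]
TOY_S = [[(0x9E3779B9 * (256 * k + i + 1)) & MASK32 for i in range(256)]
         for k in range(4)]

block = (0x01234567, 0x89ABCDEF)
cipher = encrypt_block(*block, TOY_P, TOY_S)
plain = decrypt_block(*cipher, TOY_P, TOY_S)
print(plain == block)  # True
```

Because decryption reuses `run_rounds` with only the subkey order changed, the sketch reflects why one round circuit with feedback is sufficient for both directions.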
B. Fig. 5.7 FSM of Blowfish chip

Where
S0: In this state the Blowfish output data is initialized to zero (the output is cleared).
S1: Here tx_in is assigned the input data. The input data is encrypted according to the Blowfish encryption algorithm and the result is assigned to tx_out.
S2: The transmitter output data (encrypted data) is assigned to the receiver input as rx_in. This is decrypted according to the Blowfish algorithm and produces rx_out.
S3: The decrypted output is assigned to data_out, which is the output of the chip and is the same as data_in.

A new development in integrated circuits offers a hardware implementation choice that is much more flexible than an Application Specific Integrated Circuit (ASIC): large, fast, reconfigurable gate arrays, popularly known as Field Programmable Gate Arrays (FPGAs). This device consists of arrays of configurable logic blocks that implement logical functions of gates and are easily reconfigurable. In contrast, an ASIC provides only the functionality needed for a specific task. A well-designed ASIC chip will support the particular application for which it is designed, but not a slightly modified version of the same application introduced after the ASIC design is completed. Furthermore, even if a modified ASIC can be developed, the original hardware is too highly customized to be reused in successive generations. In contrast, the configuration of an FPGA can be easily reprogrammed to accommodate a design modification. Therefore, we have chosen the FPGA as the target technology for implementing the Blowfish cryptographic algorithm.

VHDL was chosen as the language used to describe the Blowfish implementation. It is a standard language for hardware description and is supported by the CAD software of all major FPGA device vendors. It is very easy to write the VHDL code because VHDL allows describing the circuit function without the need to specify the circuit structure. However, only a good coding style and an in-depth knowledge of how VHDL is interpreted by a synthesizer can produce the optimal circuit. Probably, the best way to write the most efficient code is to use vendor supported libraries. However, this way of coding would make our design specific to a particular device family. Therefore, we have described almost the entire system circuit in pure VHDL'87, except for using library parts from the Altera LPM library to describe the static RAMs used to store subkeys. Any other way we have tried of implementing these memories would dramatically increase the amount of required area. We use an RTL-level VHDL coding style to describe the entire design.

The functional VHDL simulation of the design is carried out using ModelSim XE III 6.3g to verify the correct operation of the cryptographic algorithm. Gate-level synthesis and logic optimization of the design utilizing Synplicity Xilinx 9.2i was used for timing simulation. As the target device we have chosen the SPARTAN 3S500E family.

5.2 Implementation of Blowfish Components

This section focuses on the implementation of the F function, the one-round ALU and the two-round ALU only. The main purpose of implementing them separately was to find out how much area they require and how fast they are.

5.2.1 One-round ALU

The one-round ALU is a core part of the entire one-round implementation circuit, and it determines the encryption speed. Fig. 5.2 shows the RTL view of the one-round ALU after being compiled by Xilinx 9.2i. The synthesizer realized it using 447 logic cells, and the minimum clock period is 21.3 ns.
Fig 5.1 shows the Feistel network with F_in = 57 4B 18 E1 and F_out = 2F C9 5D 75 and some intermediate stages (m1, m2, m3, and m4).

Fig 5.2 shows the encrypted waveforms with input data tx_in = 0123456789ABCDEF and key = 11223344, which produces tx_out = F4C3D3956569D2E7 (encrypted output).

Fig 5.3 shows the decrypted waveforms with rx_in = F4C3D3956569D2E7 and key = 11223344, which produces rx_out = 0123456789ABCDEF (same as the input data).

Fig 5.4 shows both transmitter and receiver inputs and outputs for blf_in_data (key = 11223344, tx_in = 0123456789ABCDEF), blf_tx_data (tx_out = F4C3D3956569D2E7), and blf_out_data (rx_out = 0123456789ABCDEF).

Fig 5.5 shows the ChipScope results with key_gen = 11223344, tx_in_data = 0123456789ABCDEF, tx_out_data = B70AFA36ECCAA5F5, and rx_out_data = 0123456789ABCDEF.

Fig 5.6 shows the routing diagram of Blowfish implemented on the FPGA.
Fig 5.7 shows the placing diagram of Blowfish implemented on the FPGA.

Fig 5.8 shows the synthesis report for the Blowfish encryption algorithm implemented on the FPGA.

In this thesis, we have designed a Blowfish encryption chip (BEC). Blowfish is a symmetric-key block cipher with a 64-bit input/output block, and its key length is up to 448 bits. Special assumptions regarding our implementation are: the key size is limited to 32 bits; subkeys are generated off-chip and loaded into the static RAMs inside the FPGA before the encryption or decryption starts. The BEC design required 753 4-input LUTs and 954 slice registers used for the storage of internal subkeys. The BECs can be used in real-time applications and a variety of Electronic Funds Transfer applications, as well as other electronic banking and data handling applications. If higher speed is needed, we can use a high-end FPGA device, or a full custom design method and a pipeline structure. In the future, we can integrate BECs with IA or mobiles and apply other private-key block ciphers, like Twofish, RC6 or CAST-256 etc., the leading candidates for the new Advanced Encryption Standard (AES). However, these implementations need more complicated hardware and more details of those algorithms.
7.1 Future Scope

Since function F (Feistel) plays an important role in the algorithm, it was decided to modify function F and determine whether the modified function F saves time; this can be analyzed with the help of the avalanche effect. A change in one bit of the plain text or one bit of the key should produce a change in many bits of the cipher text; this property is called the avalanche effect. By taking advantage of the avalanche effect we can modify the Feistel network structure by introducing a second XOR operation in the first stage and adding the outputs of the two XORs. This will remove one addition operation per round. That is, for the complete algorithm we can remove 16 addition operations, which reduces the hardware complexity while keeping the data secure.

BIBLIOGRAPHY

"National Bureau of Standards - Data Encryption Standard," FIPS Publication 46, 1977.
B. Schneier, "Description of a New Variable-Length Key, 64-Bit Block Cipher (Blowfish)," Fast Software Encryption, Cambridge Security Workshop Proceedings (December 1993), Springer-Verlag, 1994, pp. 191-204.
B. Schneier, "The Blowfish Encryption Algorithm - One Year Later," Dr. Dobb's Journal, September 1995.
B. Schneier, "Applied Cryptography," New York, John Wiley & Sons, 1996.
S. Vaudenay, "Differential Cryptanalysis of Blowfish," unpublished manuscript, 1995.
"National Bureau of Standards - DES Modes of Operation," FIPS Publication 81, 1980.
J.-P. Kaps and C. Paar, "Fast DES Implementation for FPGAs and its Application to a Universal Key-search Machine," presented at Workshop in Selected Areas of Cryptography (SAC'98), Kingston, Ont., Aug. 1998.
H. Heys and S. E. Tavares, "Substitution-Permutation Networks Resistant to Differential and Linear Cryptanalysis," Journal of Cryptology, v. 9, n. 1, pp. 1-19, 1996.
H. Heys and S. E. Tavares, "On the Design of Secure Block Ciphers," Proceedings of Queen's 17th Biennial Symposium on Communications, Kingston, Ontario, May 1994.
D. Honig, "Blowfish Chip Design," unpublished manuscript, 1997.
Liang-Yu Chang, "Design and Implementation of Data Encryption Processor," Master Thesis, Tatung University, Oct. 1997.
M. Riaz and H. Heys, "The FPGA Implementation of the RC6 and CAST-256 Encryption Algorithms," in Proceedings of IEEE Canadian Conference on Electrical and Computer Engineering, Edmonton, Alberta, May 1999.
P. Chodowiec and K. Gaj, "Implementation of the Twofish Cipher Using FPGA Devices," Technical Report, July 1999.
Altera, "Data Book," 1999.