ECE545 Lecture2 Project 6

ECE 545 Digital System Design with VHDL
Course web page:
ECE web page > Courses > Course web pages > ECE 545
http://ece.gmu.edu/coursewebpages/ECE/ECE545/F10/
Kris Gaj
Research and teaching interests: reconfigurable computing, computer arithmetic, cryptography, network security
Contact: The Engineering Building, room 3225, kgaj@gmu.edu
Office hours: Monday, 7:30-8:30 PM, Wednesday, 6:00-7:00 PM, and by appointment
ECE 545 is part of:
MS in Computer Engineering
- One of five core courses (must be passed with B or better)
- Strongly suggested for two concentration areas: Digital Systems Design; Microprocessor and Embedded Systems
- Elective course in the remaining concentration areas
MS in Electrical Engineering
- Elective
Design levels: algorithmic, register-transfer, gate, transistor, layout, devices
Related courses: ECE 545 Digital System Design with VHDL, ECE 645 Computer Arithmetic, ECE 681 VLSI Design for ASICs, ECE 682 VLSI Test Concepts, ECE 586 Digital Integrated Circuits, ECE 680 Physical VLSI Design, ECE 584 Semiconductor Device Fundamentals, ECE 684 MOS Device Electronics
DIGITAL SYSTEMS DESIGN
Concentration advisors: Kris Gaj, Jens-Peter Kaps, Ken Hintz
1. ECE 545 Digital System Design with VHDL: K. Gaj; project; FPGA design with VHDL; Aldec/Mentor Graphics, Xilinx/Altera
2. ECE 645 Computer Arithmetic: K. Gaj; project; FPGA design with VHDL or Verilog; Aldec/Mentor Graphics, Xilinx/Altera
3. ECE 681 VLSI Design for ASICs: N. Klimavicz; project/lab; back-end ASIC design with Synopsys tools
4. ECE 586 Digital Integrated Circuits: D. Ioannou, R. Mulpuri
5. ECE 682 VLSI Test Concepts: T. Storey
Grading Scheme
Homework: 10%
Project: 40%
Midterm Exam: 20%
Final Exam: 30%
Midterm exam
- 2 hours 30 minutes, in class
- design-oriented
- open-books, open-notes
- practice exams will be available on the web
- tentative date: Monday, November 1st
Final exam
- 2 hours 45 minutes, in class
- design-oriented
- open-books, open-notes
- practice exams will be available on the web
- date: Monday, December 20, 7:30-10:15 pm
Project
- individual, semester-long
- related to the research project conducted by the Cryptographic Engineering Research Group (CERG) at GMU supporting NIST (National Institute of Standards and Technology) in the evaluation of candidates for a new cryptographic standard
Background
Hash Function
message m (arbitrary length) -> hash function -> hash value h(m) (fixed length)
It is computationally infeasible to find two different messages m and m' such that h(m) = h(m').
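As a quick illustration of the fixed-length property, the sketch below uses Python's standard hashlib module with SHA-256, chosen here only as one example of a hash function (the slide does not name a specific one). Inputs of very different lengths all produce a 256-bit (32-byte) hash value, and changing a single character of the message changes the hash value.

```python
import hashlib

def hash_value(message: bytes) -> str:
    """Return the SHA-256 hash value of an arbitrary-length message as hex."""
    return hashlib.sha256(message).hexdigest()

for m in (b"", b"abc", b"x" * 1_000_000):
    h = hash_value(m)
    # Every digest is 64 hex characters = 256 bits, regardless of len(m)
    print(len(m), len(h) * 4, h[:16])
```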
Main Application: Digital Signature
Signature: handwritten vs. digital
Example digital signature: A6E3891F2939E38C745B 25289896CA345BEF5349 245CBA653448E349EA47
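Typical Digital Signature Scheme
Alice (signer): Message -> Hash function -> Hash value -> Public key cipher with Alice's private key -> Signature. Alice sends the Message together with the Signature.
Bob (verifier): Message -> Hash function -> Hash value 1; Signature -> Public key cipher with Alice's public key -> Hash value 2. Is Hash value 1 = Hash value 2? yes: the signature is accepted; no: it is rejected.
Main Goals:
- unique identification
- proof of agreement to the contents of the document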
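The scheme above can be sketched in a few lines of Python. The slide does not name the public key cipher; this sketch assumes textbook RSA with a hypothetical, tiny key (p = 61, q = 53) and SHA-256 as the hash function, and reduces the hash value modulo N only so that it fits the toy key. It is insecure and is meant to show the data flow only: Alice applies her private key to the hash value, and Bob compares hash value 1 (recomputed from the message) with hash value 2 (recovered from the signature with Alice's public key).

```python
import hashlib

# Toy "textbook RSA" key pair (p = 61, q = 53). Hypothetical and far too small
# to be secure; it only shows the hash-then-sign data flow.
N = 61 * 53   # modulus, 3233
E = 17        # Alice's public exponent
D = 2753      # Alice's private exponent: E * D = 1 mod (60 * 52)

def hash_value(message: bytes) -> int:
    # SHA-256 hash reduced mod N so it fits the toy modulus (illustration only)
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    # Alice: hash the message, then apply the private-key operation
    return pow(hash_value(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    # Bob: hash value 1 is recomputed, hash value 2 is recovered with Alice's public key
    return hash_value(message) == pow(signature, E, N)

msg = b"I agree to the contents of this document."
sig = sign(msg)
print(verify(msg, sig))            # True
print(verify(msg, (sig + 1) % N))  # False: altered signature
```

Because x -> x^E mod N is a permutation for this key, any change to the signature changes hash value 2, so the altered signature is always rejected.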
Handwritten and Digital Signatures: Common Features
Both handwritten and digital signatures are:
1. Unique
2. Impossible to be forged
3. Impossible to be denied by the author
4. Easy to verify by an independent judge
5. Easy to generate
Handwritten and Digital Signatures: Differences
6. Handwritten: associated physically with the document. Digital: can be stored and transmitted independently of the document.
7. Handwritten: almost identical for all documents. Digital: a function of the document.
8. Handwritten: usually at the last page. Digital: covers the entire document.
Hash function algorithms
Customized (dedicated):
- MD2 (Rivest, 1988), MD4 (Rivest, 1990), MD5 (Rivest, 1991)
- SHA-0 (NSA, 1992), SHA-1 (NSA, 1995), SHA-256, SHA-384, SHA-512 (NSA, 2000)
- RIPEMD (European RACE Integrity Primitives Evaluation Project, 1992), RIPEMD-160
Based on block ciphers: MDC-2, MDC-4 (IBM, Brachtl, Meyer, Schilling, 1988)
Based on modular arithmetic: MASH-1 (1988-1996)

Attacks against dedicated hash functions known by 2004
- MD2: partially broken
- MD4: broken, H. Dobbertin, 1995 (one hour on a PC, 20 free bytes at the start of the message)
- MD5: partially broken, collisions for the compression function, Dobbertin, 1996 (10 hours on a PC)
- SHA-0: weakness discovered (NSA, 1995; France, 1998)
- RIPEMD: reduced-round version broken, Dobbertin, 1995
- SHA-1, RIPEMD-160, SHA-256, SHA-384, SHA-512: no attacks known
What was discovered in 2004-2005?
- MD4: broken; Wang, Feng, Lai, Yu, Crypto 2004 (manually, without using a computer)
- MD5: broken; Wang, Feng, Lai, Yu, Crypto 2004 (1 hour on a PC)
- RIPEMD: broken; Wang, Feng, Lai, Yu, Crypto 2004 (manually, without using a computer)
- SHA-0: attack with 2^40 operations, Crypto 2004
- SHA-1: attack with 2^63 operations; Wang, Yin, Yu, Aug 2005
- RIPEMD-160, SHA-256, SHA-384, SHA-512: no attacks
Cost of a 2^63-operation attack (Schneier, 2005):
- In hardware, with a machine similar to the one used to break DES: cost = $50,000-$70,000 with time = 18 days, or cost = $0.9-$1.26M with time = 24 hours
- In software, with a computer network similar to distributed.net used to break DES (~331,252 computers): cost = ~$0, time = 7 months
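To put the software estimate in perspective, a back-of-the-envelope calculation shows how many operations per second each machine would have to sustain to complete 2^63 operations in 7 months on ~331,252 computers. This is only a sketch: it assumes 30-day months and perfectly parallel work, and an "operation" is whatever unit the estimate counts.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day months

def required_rate_per_machine(log2_operations: int, machines: int, months: float) -> float:
    """Operations per second each machine must sustain, assuming perfectly parallel work."""
    total_seconds = months * SECONDS_PER_MONTH
    return 2**log2_operations / (machines * total_seconds)

rate = required_rate_per_machine(63, 331_252, 7)
print(f"{rate:.2e} operations/s per machine")  # roughly 1.5 million
```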
Cryptographic Standards
National Security Agency (NSA), also known as No Such Agency or Never Say Anything
- Created in 1952 by President Truman
- Goals: designing strong ciphers (to protect U.S. communications); breaking ciphers (to listen to non-U.S. communications)
- Budget and number of employees kept secret
- Largest employer of mathematicians in the world
- Largest purchaser of computer hardware
So how have cryptographic standards been created so far?
NSA-developed Cryptographic Standards (timeline 1970-2010)
- Block ciphers: DES (Data Encryption Standard), 1977; Triple DES
- Hash functions: SHA-0, 1993; SHA-1, 1995; SHA-2, 2003 (SHA = Secure Hash Algorithm)
Cryptographic Standard Contests (timeline 1996-2012)
- AES: IX.1997 - X.2000, 15 block ciphers, 1 winner
- NESSIE: I.2000 - XII.2002
- CRYPTREC
- eSTREAM: XI.2004 - V.2008, 34 stream ciphers, 4 SW + 4 HW winners
- SHA-3: X.2007 - XII.2012, 51 hash functions, 1 winner
SHA-3 Contest - NIST Evaluation Criteria
- Security
- Software Efficiency
- Hardware Efficiency: ASICs, FPGAs
- Flexibility
- Simplicity
- Licensing
Software or hardware?
Common goal: security of data during transmission
SOFTWARE: low cost; flexibility (new cryptoalgorithms, protection against new attacks)
HARDWARE: speed; random key generation; access control to keys; resistance to side-channel attacks; tamper resistance
Primary efficiency indicators
Efficiency parameters:
- Software: speed, memory
- Hardware: speed, area, power consumption
Speed of encryption/decryption is characterized by two quantities:
- Latency: time to encrypt/decrypt a single block of data (block Mi in, block Ci out)
- Throughput (= Speed): number of bits encrypted/decrypted in a unit of time (blocks Mi, Mi+1, Mi+2, ... processed as a stream)
$$\text{Throughput} = \frac{\text{Block\_size} \times \text{Number\_of\_blocks\_processed\_simultaneously}}{\text{Latency}}$$
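The throughput formula can be checked numerically. The sketch below is a direct transcription of it; the block size and latency in the usage lines are hypothetical. Working in bits and nanoseconds, bits per ns equals Gbit/s, so the result is scaled by 1000 to report Mbit/s.

```python
def throughput_mbps(block_size_bits: int, blocks_in_parallel: int, latency_ns: float) -> float:
    """Throughput = Block_size * Number_of_blocks_processed_simultaneously / Latency.

    bits per ns equals Gbit/s, so multiply by 1000 to report Mbit/s.
    """
    return block_size_bits * blocks_in_parallel * 1000 / latency_ns

# Hypothetical 128-bit block cipher with 100 ns latency per block
print(throughput_mbps(128, 1, 100.0))  # 1280.0: one block at a time
print(throughput_mbps(128, 4, 100.0))  # 5120.0: four blocks in flight (e.g., pipelined)
```

Processing several blocks simultaneously raises throughput without changing latency, which is why the two indicators are reported separately.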
Advanced Encryption Standard (AES) Contest 1997-2001
June 1998: 15 candidates from USA, Canada, Belgium, France, Germany, Norway, UK, Israel, Korea, Japan, Australia, Costa Rica
Round 1 (criteria: security, software efficiency, flexibility)
August 1999: 5 final candidates: Mars, RC6, Rijndael, Serpent, Twofish
Round 2 (criteria: security, hardware efficiency)
October 2000: 1 winner: Rijndael (Belgium)
Chart: Speed of the final AES candidates in Xilinx FPGAs [Mbit/s] (K. Gaj, P. Chodowiec, AES3, April 2000): Serpent, Rijndael, Twofish, RC6, Mars
Chart: # votes for Rijndael, Serpent, Twofish, RC6, Mars (survey filled by 167 participants of the Third AES Conference, April 2000)
Results of the NSA group (ASICs) compared with GMU (FPGAs), AES3, April 2000, speed [Mbit/s]:
Algorithm   NSA ASIC   GMU FPGA
Rijndael       606        414
Serpent        202        431
Twofish        105        177
RC6            103        143
Mars            57         61
Chart: Efficiency in software on the NIST-specified platform (200 MHz Pentium Pro, Borland C++), speed [Mbit/s] for 128-bit, 192-bit, and 256-bit keys: Rijndael, RC6, Twofish, Mars, Serpent
NIST Report: Security (AES Final Report, October 2000)
Chart: security (adequate to high) vs. complexity (simple to complex): Serpent, MARS, and Twofish rated high; Rijndael and RC6 rated adequate
NIST SHA-3 Contest - Timeline
- October 2008: 51 candidates enter Round 1
- July 2009: 14 candidates advance to Round 2
- End of 2010: 5-6 candidates advance to Round 3
- Mid 2012: 1-2 winners
GMU Team Goals
- Fair and comprehensive methodology for evaluation of hardware performance in FPGAs
- High-speed, fully autonomous implementations of all 14 SHA-3 candidates and SHA-2, in 256-bit and 512-bit variants, optimized for the maximum throughput to area ratio
- Open-source benchmarking tool supporting optimization of tool options and efficient generation of results for multiple FPGA families
Primary Designers of GMU Codes
Ekawat Homsirikamol (a.k.a. Ice), Marcin Rogawski
Methodology
Developed optimized VHDL implementations of 14 Round 2 SHA-3 candidates + SHA-2 in two variants each (256 & 512-bit output), for some functions using several alternative architectures
Comprehensive Evaluation
- two major vendors: Altera and Xilinx (~90% of the market)
- multiple high-performance and low-cost families:
Technology   Xilinx low-cost   Xilinx high-performance   Altera low-cost   Altera high-performance
90 nm        Spartan 3         Virtex 4                  Cyclone II        Stratix II
65 nm        -                 Virtex 5                  Cyclone III       Stratix III
Uniform Evaluation
- Language: VHDL
- Tools: FPGA vendor tools
- Interface
- Performance Metrics
- Design Methodology
- Benchmarking
Why Interface Matters?
- Pin limit: total number of i/o ports ≤ total number of FPGA i/o pins
- Support for the maximum throughput: time to load the next message block ≤ time to process the previous block
Interface: Two possible solutions
1. Length of the message communicated at the beginning (msg_bitlen, followed by the message):
   + easy to implement passive source circuit
   - area overhead for the counter of message bits
2. Dedicated end-of-message port (end_of_msg):
   - more intelligent source circuit required
   + no need for an internal message bit counter
(Signals in the diagrams: msg_bitlen, message, zero_word, end_of_msg; SHA core.)
SHA Core: Interface & Typical Configuration
Figure: ext_idata (w bits) -> Input FIFO -> SHA core -> Output FIFO -> ext_odata (w bits). Input FIFO ports: din, full, write on the external side (foin_full, foin_write) and dout, empty, read on the core side, connected to the core's din, src_ready, src_read (foin_empty, foin_read). Output FIFO ports: din, full, write on the core side, connected to the core's dout, dst_ready, dst_write (foout_full, foout_write), and dout, empty, read on the external side (foout_empty, foout_read). Two configurations are shown: a single clock clk, or a separate io_clk for the external side of the FIFOs and clk for the SHA core; all blocks share rst.
- The SHA core is an active component; the surrounding FIFOs are passive and widely available.
- The input interface is separate from the output interface.
- Processing the current block, reading the next block, and storing the result for the previous message can all be done in parallel.
- Some functions may require a faster input/output clock in order to load input data at a faster rate.
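The condition for supporting the maximum throughput (time to load the next block ≤ time to process the previous block) can be checked with a few lines of arithmetic. This sketch assumes, hypothetically, that the input FIFO delivers one w-bit word per io_clk cycle and that the core needs a fixed number of clk cycles per block; real cores may add cycles for initialization or finalization.

```python
from math import ceil

def load_time_ns(block_size_bits: int, w: int, io_clk_period_ns: float) -> float:
    # Assumption: one w-bit word enters the core per io_clk cycle
    return ceil(block_size_bits / w) * io_clk_period_ns

def process_time_ns(cycles_per_block: int, clk_period_ns: float) -> float:
    return cycles_per_block * clk_period_ns

def supports_max_throughput(block_size_bits: int, w: int, io_clk_period_ns: float,
                            cycles_per_block: int, clk_period_ns: float) -> bool:
    """True if loading the next block never stalls processing of the current one."""
    return (load_time_ns(block_size_bits, w, io_clk_period_ns)
            <= process_time_ns(cycles_per_block, clk_period_ns))

# Hypothetical 512-bit block and 64-bit interface
print(supports_max_throughput(512, 64, 5.0, 10, 5.0))  # True: 40 ns load vs 50 ns processing
print(supports_max_throughput(512, 64, 5.0, 4, 5.0))   # False: 40 ns load vs 20 ns processing
print(supports_max_throughput(512, 64, 2.5, 4, 5.0))   # True: faster io_clk, 20 ns vs 20 ns
```

The last two lines mirror the remark above: a core that processes a block in few cycles may need a faster io_clk to keep its input fed.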
Performance Metrics: Primary
1. Throughput (single long message)
2. Area
3. Throughput / Area
4. Hash Time for Short Messages (up to 1000 bits)
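For short messages, the number of blocks a message occupies depends on the padding rule, so hash time is not simply proportional to message length. The sketch below uses the SHA-256 padding rule (append a '1' bit, zeros, and a 64-bit length field, rounding up to a multiple of 512 bits) as an example only; each SHA-3 candidate has its own padding and block size, and cycles_per_block and overhead_cycles are hypothetical placeholders for a design's own timing formulas.

```python
from math import ceil

def sha256_padded_blocks(msg_bits: int) -> int:
    """Number of 512-bit blocks after SHA-256 padding: message + '1' bit + 64-bit length."""
    return ceil((msg_bits + 1 + 64) / 512)

def hash_time_ns(msg_bits: int, cycles_per_block: int, clk_period_ns: float,
                 overhead_cycles: int = 0) -> float:
    """Hash time = (blocks * cycles per block + fixed overhead) * clock period."""
    return (sha256_padded_blocks(msg_bits) * cycles_per_block + overhead_cycles) * clk_period_ns

for bits in (0, 447, 448, 1000):
    print(bits, sha256_padded_blocks(bits))  # 1, 1, 2, 3 blocks
```

A 448-bit message already needs two blocks, so the hash time jumps in steps rather than growing smoothly with message length.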
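Performance Metrics - Area
Area is a vector of resources: the main logic resources (e.g., CLB slices in Xilinx FPGAs) plus secondary resources (embedded memories, multipliers/DSP units). Through the synthesis and implementation options, we force these vectors to look as follows:
Area = (logic resources, 0, 0)
i.e., the secondary resources are not used.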
Choice of Optimization Target
Primary optimization target: Throughput to Area Ratio
Features:
- practical: good balance between speed and cost
- very reliable guide through the entire design process, facilitating the choice of the high-level architecture, the implementation of basic components, and the choice of tool options
- leads to high-speed, close-to-maximum-throughput designs
Our Design Flow
Specification, Interface, Controller Template, Library of Basic Components
-> Datapath (block diagram) and Controller (ASM chart)
-> VHDL code, and formulas for throughput & hash time
-> Max. clock frequency, resource utilization
-> Throughput, Area, Throughput/Area, Hash Time for Short Messages
Basic Operations of 14 SHA-3 Candidates
Table: basic operations used by each candidate. Abbreviations: NTT - Number Theoretic Transform; GF MUL - Galois Field multiplication; MUL - integer multiplication; mADDn - multioperand addition with n operands.
ATHENa - Automated Tool for Hardware EvaluatioN
http://cryptography.gmu.edu/athena
Benchmarking open-source tool, written in Perl, aimed at the AUTOMATED generation of OPTIMIZED results for MULTIPLE FPGA platforms. Under development at George Mason University.
Basic Dataflow of ATHENa
Figure: interfaces and testbenches are provided to the designer; the user supplies synthesizable source files, a testbench, constraint files, and configuration files (HDL + scripts + configuration files), with scripts and configuration files downloaded from the ATHENa server; ATHENa runs FPGA synthesis and implementation using the FPGA tools and produces a result summary (user-friendly) and database entries (machine-friendly); database queries support the ranking of designs.
ATHENa Major Features (1)
- synthesis, implementation, and timing analysis in batch mode
- support for devices and tools of multiple FPGA vendors
- generation of results for multiple families of FPGAs of a given vendor
- automated choice of a best-matching device within a given family
ATHENa Major Features (2)
- automated verification of designs through simulation in batch mode
- support for multi-core processing
- automated extraction and tabulation of results
- several optimization strategies aimed at finding the optimum options of tools, the best target clock frequency, and the best starting point of placement
Generation of Results Facilitated by ATHENa
- batch mode of FPGA tools
- ease of extraction and tabulation of results: Excel, CSV (available), LaTeX (coming soon)
- optimized choice of tool options
Chart: Relative Improvement of Results from Using ATHENa, Virtex 5, 256-bit variants of hash functions. Ratios of results obtained using ATHENa-suggested options vs. default options of FPGA tools, for Area, Thr, and Thr/Area (y-axis 0 to 2.5); algorithms: Groestl, SHAvite-3, Luffa, Keccak, Hamsi, ECHO, Skein, Fugue, SHA-2, BMW, CubeHash, BLAKE, Shabal, SIMD, JH
Results
Chart: Throughput [Mbit/s], Virtex 5, 256-bit variants of algorithms (y-axis 0 to 16000)
Chart: Throughput [Mbit/s], Virtex 5, 512-bit variants of algorithms (y-axis 0 to 14000)
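Normalization & Compression of Results
Absolute result: e.g., throughput in Mbit/s, area in CLB slices
Normalized result:
$$\text{normalized\_result} = \frac{\text{result\_for\_SHA-3\_candidate}}{\text{result\_for\_SHA-2}}$$
Overall normalized result: geometric mean of the normalized results for all investigated FPGA families
$$\text{overall\_normalized\_result} = \left(\prod_{i=1}^{k} \text{normalized\_result}_i\right)^{1/k}$$
where k is the number of investigated FPGA families.
Normalized Throughput & Overall Normalized Throughput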
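The normalization and compression steps can be sketched directly from the two formulas. The per-family throughputs below are hypothetical numbers chosen only to make the arithmetic easy to follow, not GMU results.

```python
from math import prod

def normalized(candidate_result: float, sha2_result: float) -> float:
    """Result for the SHA-3 candidate divided by the result for SHA-2 on the same FPGA family."""
    return candidate_result / sha2_result

def overall_normalized(normalized_results: list[float]) -> float:
    """Geometric mean of the normalized results over all investigated FPGA families."""
    return prod(normalized_results) ** (1 / len(normalized_results))

# Hypothetical throughputs [Mbit/s] on two FPGA families: (candidate, SHA-2)
per_family = {"family A": (2000.0, 1000.0), "family B": (800.0, 100.0)}
ratios = [normalized(c, s) for c, s in per_family.values()]
print(ratios)                      # [2.0, 8.0]
print(overall_normalized(ratios))  # 4.0
```

The geometric mean suits ratios: a candidate that is 2x better on one family and 2x worse on another averages to exactly 1, whereas the arithmetic mean would report 1.25.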
Chart: Overall Normalized Throughput, 256-bit variants of algorithms, normalized to SHA-256 and averaged over 7 FPGA families (y-axis 0 to 8)
Chart: Overall Normalized Throughput, 512-bit variants of algorithms (y-axis 0 to 4)
Chart: Area [CLB slices], Virtex 5, 256-bit variants of algorithms (y-axis 0 to 10000)
Chart: Area [CLB slices], Virtex 5, 512-bit variants of algorithms (y-axis 0 to 18000)
Chart: Overall Normalized Area, 256-bit variants of algorithms (y-axis 0 to 30)
Chart: Overall Normalized Area, 512-bit variants of algorithms (y-axis 0 to 30)
Chart: Overall Normalized Throughput/Area, 256-bit variants (y-axis 0 to 2)
Chart: Overall Normalized Throughput/Area, 512-bit variants (y-axis 0 to 1.4)
Chart: Throughput vs. Area normalized to results for SHA-256 and averaged over 7 FPGA families, 256-bit variants (best and worst regions marked)
Chart: Throughput vs. Area normalized to results for SHA-512 and averaged over 7 FPGA families, 512-bit variants (best and worst regions marked)
Chart: Execution Time for Short Messages (up to 1000 bits), Virtex 5, 256-bit variants of algorithms
Chart: Execution Time for Short Messages (up to 1000 bits), Virtex 5, 512-bit variants of algorithms
Summary of Results
Table: ranking of BLAKE, BMW, CubeHash, ECHO, Fugue, Groestl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, and Skein for 256-bit and 512-bit variants, in terms of Thr/Area, Thr, Area, and short-message hash time
- Throughput/Area and Throughput are the most crucial metrics for high-speed implementations
- Area cannot be easily traded for Throughput
- Best performers so far: 1-2. Keccak & Luffa; 3. Groestl
- Worst performers so far: 14. SIMD; 13. ECHO; 12. BMW
More About our Designs & Tools
- Cryptology ePrint Archive 2010/445 (100+ pages): detailed hierarchical block diagrams; corresponding formulas for execution time and throughput
- FPL 2010 paper: ATHENa features; case studies
- ATHENa web site: most recent results; comparisons with results from other groups; optimum options of tools
Comparison with Other Groups
Comparison with Best Results Reported by Other Groups, Virtex 5, 256-bit variants of algorithms
(Area in CLB slices, Thr in Mbit/s)
                    Other groups                                   GMU
Algorithm           Area   Thr     Thr/Area  Source               Area   Thr     Thr/Area
BLAKE               1660   2676     1.61     Kobayashi et al.     1871   2854    1.53
CubeHash             590   2960     5.02     Kobayashi et al.      707   3445    4.87
ECHO                9333  14860     1.59     Lu et al.            5445  13875    2.55
Groestl             1722  10276     5.97     Gauravaram et al.    1884   8677    4.61
Hamsi                718   1680     2.34     Kobayashi et al.      946   2646    2.80
Keccak              1412   6900     4.89     Bertoni et al.       1229  10807    8.79
Luffa               1048   6343     6.05     Kobayashi et al.     1154   8008    6.94
Shabal               153   2051    13.41     Detrey et al.        1266   2624    2.07
Skein (estimated)   1632   3535     2.17     Tillich et al.       1463   2812    1.92

Best Overall Reported Results as of Aug. 6, 2010, Virtex 5, 256-bit variants of algorithms
Algorithm    Area   Thr     Thr/Area  Source
BLAKE        1660   2676     1.61     Kobayashi et al.
BMW          4400   5577     1.27     GMU
CubeHash      590   2960     5.02     Kobayashi et al.
ECHO         5445  13875     2.55     GMU
Fugue         956   3151     3.30     GMU
Groestl      1722  10276     5.97     Gauravaram et al.
Hamsi         946   2646     2.80     GMU
JH           1108   3955     3.57     GMU
Keccak       1229  10807     8.79     GMU
Luffa        1154   8008     6.94     GMU
Shabal        153   2051    13.41     Detrey et al.
SHAvite-3    1130   2887     2.55     GMU
SIMD         9288   2326     0.25     GMU
Skein        1632   3535     2.17     Tillich et al.
Chart: Throughput vs. Area, best reported results, Virtex 5, 256-bit variants of algorithms (best and worst regions marked)
Your Project
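Analysis of Alternative Architectures - Unrolled
- Basic architecture: round unit used r times per block
- Unrolled architecture (x2): two rounds per iteration, used r/2 times per block
Analysis of Alternative Architectures - Folded
- Basic architecture: used r times per block
- Folded Vertically-2x (fv2): used 2r times per block
- Folded Horizontally-2x (fh2): used 2r times per block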
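The iteration counts above translate directly into throughput once a clock period is known. This sketch assumes, hypothetically, that throughput = block_size / (iterations_per_block x clock_period), ignores initialization and finalization cycles, and assumes r is even; the block size, round count, and clock periods are illustrative values, not measured results. It shows why unrolling pays off only if the clock period grows by less than the unrolling factor.

```python
ITERATIONS = {
    "basic": lambda r: r,       # one round unit, used r times per block
    "x2":    lambda r: r // 2,  # unrolled by 2: two rounds per iteration (r assumed even)
    "fv2":   lambda r: 2 * r,   # folded vertically by 2
    "fh2":   lambda r: 2 * r,   # folded horizontally by 2
}

def throughput_mbps(arch: str, block_size_bits: int, rounds: int, clk_period_ns: float) -> float:
    """Throughput in Mbit/s; bits per ns is Gbit/s, hence the factor of 1000."""
    cycles = ITERATIONS[arch](rounds)
    return block_size_bits * 1000 / (cycles * clk_period_ns)

# Hypothetical function: 512-bit block, r = 16 rounds
print(throughput_mbps("basic", 512, 16, 5.0))  # 6400.0
print(throughput_mbps("x2", 512, 16, 8.0))     # 8000.0: slower clock, but half the iterations
print(throughput_mbps("fv2", 512, 16, 3.0))    # about 5333: faster clock, twice the iterations
```

Whether a folded design is worthwhile then depends on how much area it saves relative to this throughput loss, i.e., on the throughput to area ratio.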
Chart: Preliminary results for CubeHash, Groestl, Keccak & Luffa in Virtex 5: normalized throughput vs. normalized area for the basic (x1), unrolled (x2, x4), and folded (fv2, fv3, fv4) architectures
Your Project
14 SHA-3 candidates left in the contest
Given:
- specification of the function
- reference implementation in C
- interface
- testbench and test vectors
- GMU implementation of the basic version, including block diagrams, ASM charts, a short description, formulas for execution time & throughput, source codes, and results for Xilinx and Altera FPGAs
Your Project
Develop:
- block diagram
- ASM chart
- formulas for execution time & throughput
- synthesizable code in VHDL
- results for multiple families of FPGAs from Xilinx and Altera
for at least one architecture from each of the following three classes of architectures:
1. unrolled architecture
2. folded architecture
3. architecture based on the use of embedded FPGA resources (BRAMs, multipliers, DSP units, etc.)
[256-bit only, 512-bit only, or both]
What is an FPGA?
Configurable Logic Blocks, I/O Blocks, Block RAMs & Embedded Multipliers
(Figure source: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright 2004 Mentor Graphics Corp. (www.mentor.com))
RAM Blocks and Multipliers in Xilinx FPGAs
Using Embedded FPGA Resources
Resource vectors (logic, embedded memory, multipliers), example 1: basic design (1536, 0, 0); your design (768, 2, 4)
Resource vectors, example 2: basic design (3010, 0, 0); your design (1505, 32 kbit, 4)
Block RAM
Spartan-3 Dual-Port Block RAM
Most efficient memory implementation:
- dedicated blocks of memory
Ideal for most memory requirements:
- 4 to 104 memory blocks
- 18 kbits = 18,432 bits per block (16 k without parity bits)
- use multiple blocks for larger memories
Builds both single-port and true dual-port RAMs
Synchronous write and read (different from distributed RAM)
Dual-Port Bus Flexibility
- Port A: WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[17:0], DOA[17:0] (shown in 1K depth, 18-bit width)
- Port B: WEB, ENB, RSTB, CLKB, ADDRB[9:0], DIB[17:0], DOB[17:0] (shown in 1K depth, 18-bit width)
Block RAM can have various configurations (port aspect ratios):
- 16k x 1 (addresses 0 to 16,383)
- 8k x 2 (addresses 0 to 8,191)
- 4k x 4 (addresses 0 to 4,095)
- 2k x (8+1) (addresses 0 to 2047)
- 1024 x (16+2) (addresses 0 to 1023)
Embedded Multipliers in Spartan 3
- 18x18 bit signed multipliers with optional input/output registers
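The port aspect ratios follow from a fixed 16 Kbit of data per block plus one parity bit per byte for widths of 8 bits and more. The sketch below derives them and estimates how many block RAMs a memory of a given depth and width needs. It is a simplified model: it uses only the five aspect ratios listed above and does not store data in the parity bits; actual tool mapping may differ.

```python
from math import ceil

DATA_BITS_PER_BLOCK = 16 * 1024  # 16 Kbit of data; parity adds 2 Kbit -> 18,432 bits total

def aspect_ratios() -> list[tuple[int, int]]:
    """(depth, total width) for each port aspect ratio listed for Spartan-3 block RAM."""
    configs = []
    for data_width in (1, 2, 4, 8, 16):
        depth = DATA_BITS_PER_BLOCK // data_width
        parity_bits = data_width // 8  # one parity bit per byte, only for widths >= 8
        configs.append((depth, data_width + parity_bits))
    return configs

def blocks_needed(depth: int, data_width: int) -> int:
    """Fewest block RAMs for a depth x data_width memory under the simplified model above."""
    return min(
        ceil(data_width / (1 << i)) * ceil(depth / (DATA_BITS_PER_BLOCK >> i))
        for i in range(5)  # data widths 1, 2, 4, 8, 16
    )

print(aspect_ratios())         # [(16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18)]
print(blocks_needed(256, 32))  # 2
```

For example, a 256 x 32-bit table fits in two blocks configured as 1024 x (16+2), each holding one 16-bit half of every word.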
Multiplier-Accumulator - MAC
Xilinx XtremeDSP
- Starting with the Virtex 4 family, Xilinx introduced the DSP48 block for high-speed DSP on FPGAs
- Essentially a multiply-accumulate core with many other features
- Now also Spartan-3A DSP and Virtex 5 have DSP blocks
(Figure source: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright 2004 Mentor Graphics Corp. (www.mentor.com))
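A multiply-accumulate core repeatedly computes acc = acc + a x b. The sketch below models that behavior in software with 18-bit signed operands, matching the 18x18 signed multipliers described above, and a 48-bit two's-complement accumulator that wraps on overflow; the 48-bit width mirrors the DSP48 output width and is a parameter of this model. Pipeline registers and the block's many other features are deliberately left out.

```python
A_BITS = 18    # signed operand width of the embedded multipliers
ACC_BITS = 48  # accumulator width assumed for the DSP48-style MAC

def wrap_signed(value: int, bits: int) -> int:
    """Reduce value to a bits-wide two's-complement number, like a hardware register."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def fits_signed(value: int, bits: int) -> bool:
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def mac(pairs) -> int:
    """Accumulate the sum of a*b over (a, b) pairs, wrapping at ACC_BITS."""
    acc = 0
    for a, b in pairs:
        if not (fits_signed(a, A_BITS) and fits_signed(b, A_BITS)):
            raise ValueError("operands must fit in 18-bit two's complement")
        acc = wrap_signed(acc + a * b, ACC_BITS)
    return acc

print(mac([(3, 4), (-5, 6)]))  # -18, i.e., 12 - 30
```

Because every 18x18 product fits in 36 bits, a 48-bit accumulator can absorb thousands of products before it wraps around.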
DSP48 Slice: Virtex 4
Simplified Form of DSP48
Xilinx FPGA Devices
Technology   Low-cost    High-performance
120/150 nm   -           Virtex 2, 2 Pro
90 nm        Spartan 3   Virtex 4
65 nm        -           Virtex 5
45 nm        Spartan 6   -
40 nm        -           Virtex 6
Altera FPGA Devices (technologies: 130 nm, 90 nm, 65 nm, 40 nm)
- Low-cost: Cyclone, Cyclone II, Cyclone III, Cyclone IV
- Mid-range: Arria I, Arria II
- High-performance: Stratix, Stratix II, Stratix III, Stratix IV
All Projects - Organization
- Projects divided into phases
- Deliverables for each phase submitted through Blackboard at selected checkpoints and evaluated by the instructor and/or TA
- Feedback provided to students on a best-effort basis
- Final report and codes submitted using Blackboard at the end of the semester
Honor Code Rules
- All students are expected to write and debug their codes individually
- Students are encouraged to help and support each other in all problems related to:
  - the operation of the CAD tools
  - the basic understanding of the problem
Course Objectives
At the end of this course you should be able to:
- Code in VHDL for synthesis
- Decompose a digital system into a controller (FSM) and a datapath, and code accordingly
- Write VHDL testbenches
- Synthesize and implement digital systems on FPGAs
- Effectively code digital systems for cryptography, signal processing, and microprocessor applications
This knowledge will come about through homework, exams, and an extensive project. The project in particular will help you know VHDL and the FPGA design flow from beginning to end.
Additional Skills Learned in the Project
- Reading & understanding the specification of a complex algorithm
- Design of new hardware architectures based on existing architectures (datapath & controller)
- Reading, understanding, and modifying existing VHDL code
- Using embedded resources of modern FPGAs
- Characterizing performance of your codes for multiple FPGA families
Project Task 1
Read the following chapters from the GMU technical report published at http://eprint.iacr.org/2010/445:
- Chapter 1: Introduction & Motivation
- Chapter 2: Methodology
- Chapter 3: Comprehensive Designs of SHA-3 Candidates: 3.1, 3.2 + the subsection concerning your algorithm
- Chapter 4: Design Summary and Results
Project Task 1 cont.
- In one week: meeting with the instructor devoted to fully understanding the GMU report, specification, block diagrams, interface, and timing formulas.
- In two weeks: draft block diagrams of the selected unrolled architecture and the selected folded architecture, with the corresponding timing formulas for execution time & throughput.
Download and get familiar with the package of the hash function assigned to you:
http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/submissions_rnd2.html
Read carefully the specification of your algorithm.
