Contact:
The Engineering Building, room 3225
kgaj@gmu.edu
Office hours: Monday, 7:30-8:30 PM, Wednesday, 6:00-7:00 PM, and by appointment
[Figure: courses arranged by design level (algorithmic, register-transfer, gate, transistor, layout, devices). Course labels: Digital System Design with VHDL, Computer Arithmetic, VLSI Design for ASICs, VLSI Test Concepts; ECE 545, ECE 645, ECE 682. Program label: MS in Electrical Engineering, Elective.]
Grading Scheme
Homework
Midterm exam 1
2 hours 30 minutes
in class
design-oriented
open-books, open-notes
practice exams will be available on the web
Tentative date: Monday, November 1st
Final exam
2 hours 45 minutes
in class
design-oriented
open-books, open-notes
practice exams will be available on the web
Date: Monday, December 20, 7:30-10:15 PM
Project
individual, semester-long
Project
related to the research project conducted by the Cryptographic Engineering Research Group (CERG) at GMU, supporting NIST (National Institute of Standards and Technology) in the evaluation of candidates for a new cryptographic standard
Background
Hash Function
[Figure: a hash function h maps a message m of arbitrary length to a hash value h(m) of fixed length.]
It is computationally infeasible to find two distinct messages m and m' such that h(m) = h(m')
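The "arbitrary length in, fixed length out" behavior of a hash function can be sketched in a few lines of Python. SHA-256 from the standard hashlib module is used here only as a stand-in for h, and the sample messages are made up for illustration.

```python
import hashlib


def h(message: bytes) -> str:
    """Stand-in hash function: SHA-256, returned as a hex string."""
    return hashlib.sha256(message).hexdigest()


# Messages of very different lengths (illustrative values only).
messages = [b"", b"abc", b"a" * 1_000_000]
digests = [h(m) for m in messages]

for m, d in zip(messages, digests):
    # Each hex character encodes 4 bits, so 64 characters = 256 bits.
    print(f"{len(m):>8} bytes -> {len(d) * 4} bits: {d[:16]}...")

# Changing a single character yields a different hash value.
print(h(b"abc") != h(b"abd"))
```

Finding two distinct inputs for which the comparison on the last line fails is exactly the collision problem that is meant to be computationally infeasible.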
[Figure: genealogy of hash functions. Labels: MD5; Rivest, 1990; SHA-0; SHA-1; NSA, 1992; NSA, 1995; NSA, 2000; RIPEMD, European RACE Integrity Primitives Evaluation Project, 1992; RIPEMD-160; weakness.]
partially broken, collisions for the compression function, Dobbertin, 1996 (10 hours on PC)
MD5: broken; Wang, Feng, Lai, Yu, Crypto 2004 (1 hr on a PC)
SHA-0: attack with 2^40 operations, Crypto 2004
SHA-1: attack with 2^63 operations; Wang, Yin, Yu, Aug 2005
RIPEMD
broken; Wang, Feng, Lai, Yu, Crypto 2004 (manually, without using a computer)
RIPEMD-160
2^63 operations (Schneier, 2005)
In hardware: machine similar to the one used to break DES:
Computer network similar to distributed.net used to break DES (~331,252 computers): Cost = ~$0, Time: 7 months
Cryptographic Standards
National Security Agency (also known as No Such Agency or Never Say Anything)
Created in 1952 by President Truman
Goals:
designing strong ciphers (to protect U.S. communications)
breaking ciphers (to listen to non-U.S. communications)
Budget and number of employees kept secret
Largest employer of mathematicians in the world
Largest purchaser of computer hardware
[Figure: timeline of cryptographic standards and contests, with a time axis from 1970 to 2010 and a detailed axis from 1996 to 2012. Labels: AES, NESSIE, CRYPTREC, eSTREAM, SHA-3; Hash Functions: SHA-0, SHA-2. Year labels: 1993, 1995, 1999, I.2000, 2003, 2005.]
Software or hardware?
SOFTWARE: security of data during transmission
HARDWARE: speed, random key generation
Security
Hardware Efficiency: ASICs, FPGAs
Simplicity
Licensing
Efficiency parameters
[Figure: efficiency parameters. Labels: Speed, Memory; Speed, Area, Power consumption. Throughput = Speed: message blocks M_i, M_{i+1}, M_{i+2} enter an encryption/decryption unit that outputs C_i.]
Throughput =
Round 1: Security, Software efficiency, Flexibility
Round 2: Security, Hardware efficiency
[Figure: number of votes per AES finalist (axis 0-100). Survey filled by 167 participants of the Third AES Conference, April 2000.]
[Figure: Speed [Mbit/s] of AES finalists in hardware, NSA ASIC vs. GMU FPGA results (axis 0-700), AES3, April 2000. Data labels: 105, 202, 177, 103, 57, 414, 431, 606, 143, 61.]
[Figure: Efficiency in software on the NIST-specified platform (200 MHz Pentium Pro, Borland C++). Speed [Mbit/s] (axis 0-30) of Rijndael, RC6, Twofish, Mars, and Serpent for 128-bit, 192-bit, and 256-bit keys.]
[Figure labels: Adequate; High; Serpent, MARS, Twofish.]
[Figure: SHA-3 contest timeline. Labels: 51 candidates; Round 2; 5-6; End of 2010.]
High-speed fully autonomous implementations of all 14 SHA-3 candidates & SHA-2, 256-bit & 512-bit variants, optimized for the maximum throughput to area ratio
Open-source benchmarking tool supporting optimization of tool options and efficient generation of results for multiple FPGA families
Methodology
Developed optimized VHDL implementations of 14 Round 2 SHA-3 candidates + SHA-2 in two variants each (256 & 512-bit output), for some functions using several alternative architectures
Comprehensive Evaluation
two major vendors: Altera and Xilinx (~90% of the market)
multiple high-performance and low-cost families
[Table: FPGA families of Altera and Xilinx by technology (90 nm, 65 nm) and class (low-cost, high-performance). Xilinx entries: Spartan 3, Virtex 4, Virtex 5.]
Uniform Evaluation
Language: VHDL
Tools:
Interface
Performance Metrics
Design Methodology
Benchmarking
Length of the message communicated at the beginning
+ easy to implement passive source circuit
- area overhead for the counter of message bits
Dedicated end of message port (end_of_msg input of the SHA core)
- more intelligent source circuit required
+ no need for internal message bit counter
[Figure: SHA core interface, with a common clk. Input FIFO ports: din, full, write, dout, empty, read. Signals between the Input FIFO and the SHA core: idata (w bits), fifoin_empty, fifoin_read. SHA core ports: din, src_ready, src_read, dout, dst_ready, dst_write. Signals between the SHA core and the Output FIFO: odata (w bits), fifoout_full, fifoout_write. Output FIFO ports: din, dout. External output signals: ext_odata (w bits), fifoout_empty, fifoout_read.]
SHA core is an active component; surrounding FIFOs are passive and widely available
Input interface is separate from an output interface
Processing a current block, reading the next block, and storing a result for the previous message can be all done in parallel
Some functions may require a faster input/output clock in order to load input data at a faster rate
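The split between an active core and passive FIFOs can be modeled behaviorally in Python. This is only a sketch under stated assumptions: the `Fifo` class, the `run_core` function, the FIFO depths, and the 8-byte word width are all hypothetical; SHA-256 from hashlib stands in for the hash function; the message length is assumed to be known at the beginning; and signal polarity and timing of the real interface are not modeled.

```python
import hashlib
from collections import deque


class Fifo:
    """Passive FIFO: reports empty/full and obeys read/write requests."""

    def __init__(self, depth: int):
        self.depth = depth
        self.items = deque()

    @property
    def empty(self) -> bool:
        return len(self.items) == 0

    @property
    def full(self) -> bool:
        return len(self.items) >= self.depth

    def write(self, word: bytes) -> None:
        if self.full:
            raise RuntimeError("write to a full FIFO")
        self.items.append(word)

    def read(self) -> bytes:
        if self.empty:
            raise RuntimeError("read from an empty FIFO")
        return self.items.popleft()


def run_core(fifo_in: Fifo, fifo_out: Fifo, n_words: int, max_cycles: int = 1000) -> int:
    """Active core: polls the FIFO flags itself and drives read/write.

    Returns the number of simulated clock cycles. n_words models a message
    length communicated at the beginning (an assumption of this sketch).
    """
    hasher = hashlib.sha256()  # stand-in for the SHA core datapath
    words_read = 0
    for cycle in range(1, max_cycles + 1):
        if words_read < n_words:
            if not fifo_in.empty:              # input data available
                hasher.update(fifo_in.read())  # core requests a read
                words_read += 1
        elif not fifo_out.full:                # room for the result
            fifo_out.write(hasher.digest())    # core requests a write
            return cycle
    raise TimeoutError("core stalled waiting for a FIFO")


# Usage: a testbench preloads the input FIFO, then lets the core run.
fifo_in, fifo_out = Fifo(depth=8), Fifo(depth=2)
words = [bytes([i]) * 8 for i in range(4)]  # four hypothetical 64-bit words
for word in words:
    fifo_in.write(word)
cycles = run_core(fifo_in, fifo_out, n_words=len(words))
digest = fifo_out.read()
print(cycles, digest.hex()[:16])
```

The FIFOs never initiate a transfer; only the core does, which is why off-the-shelf FIFOs can be used on both sides.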
Performance metrics: Max. Clock Freq., Resource Utilization, Throughput, Area, Throughput/Area, Hash Time for Short Messages
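How these metrics relate to each other can be sketched as follows. The formulas are a commonly used approximation for long messages and are an assumption here, not taken from the slides: throughput is the block size divided by the time needed to process one block. The `Design` class and all numbers in the example are hypothetical, not measured results.

```python
from dataclasses import dataclass


@dataclass
class Design:
    block_bits: int        # message bits processed per block
    cycles_per_block: int  # clock cycles needed to process one block
    fmax_mhz: float        # maximum clock frequency after place & route
    area: float            # resource utilization (e.g. slices)

    def throughput_mbps(self) -> float:
        # bits per block * (cycles per microsecond) / cycles per block
        return self.block_bits * self.fmax_mhz / self.cycles_per_block

    def throughput_per_area(self) -> float:
        return self.throughput_mbps() / self.area

    def hash_time_ns(self, n_blocks: int, overhead_cycles: int = 0) -> float:
        # overhead_cycles covers loading and finalization (assumed constant)
        cycles = overhead_cycles + n_blocks * self.cycles_per_block
        return cycles * 1000.0 / self.fmax_mhz


# Hypothetical design point, for illustration only.
d = Design(block_bits=512, cycles_per_block=65, fmax_mhz=200.0, area=400.0)
print(f"Throughput      : {d.throughput_mbps():.1f} Mbit/s")
print(f"Throughput/Area : {d.throughput_per_area():.2f}")
print(f"Time, 2 blocks  : {d.hash_time_ns(2):.0f} ns")
```

For short messages the fixed overhead matters more than the steady-state throughput, which is why hash time for short messages is reported as a separate metric.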
Benchmarking open-source tool, written in Perl, aimed at an AUTOMATED generation of OPTIMIZED results for MULTIPLE FPGA platforms
Under development at George Mason University.
NTT: Number Theoretic Transform
GF MUL: Galois Field multiplication
MUL: integer multiplication
mADDn: multioperand addition with n operands
[Figure: ATHENa flow. Labels: configuration files, database query, ranking of designs, ATHENa Server, database entries.]
optimum options of tools
best target clock frequency
best starting point of placement
[Figure: Relative Improvement of Results from Using ATHENa, Virtex 5, 256-bit Variants of Hash Functions. Bars for Area, Thr, and Thr/Area (axis 0-2.5). Ratios of results obtained using ATHENa suggested options vs. default options of FPGA tools.]
Results
Normalized result:
$$\text{normalized\_result} = \frac{\text{result\_for\_SHA-3\_candidate}}{\text{result\_for\_SHA-2}}$$
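A minimal sketch of this normalization in Python. The candidate names and all numbers below are hypothetical placeholders, not results from the slides; the function name `normalized_result` simply mirrors the formula.

```python
def normalized_result(candidate_result: float, sha2_result: float) -> float:
    """Result of a SHA-3 candidate divided by the SHA-2 result for the same metric."""
    if sha2_result <= 0:
        raise ValueError("SHA-2 reference result must be positive")
    return candidate_result / sha2_result


# Hypothetical throughput values on one FPGA family (illustration only).
sha2_throughput = 1000.0
candidate_throughput = {"candidate_A": 2500.0, "candidate_B": 800.0}

normalized = {
    name: normalized_result(value, sha2_throughput)
    for name, value in candidate_throughput.items()
}
for name, ratio in normalized.items():
    print(f"{name}: {ratio:.2f}x the SHA-2 throughput")
```

For throughput, a normalized value above 1 means the candidate outperforms SHA-2; for area, a value below 1 means the candidate is smaller.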
[Figure: Throughput vs. Area, Normalized to Results for SHA-256 and Averaged over 7 FPGA Families, 256-bit variants. Regions marked "best" and "worst".]
[Figure: Throughput vs. Area, Normalized to Results for SHA-512 and Averaged over 7 FPGA Families, 512-bit variants. Regions marked "best" and "worst".]
[Figure: Execution Time for Short Messages up to 1000 bits, Virtex 5, 256-bit variants of algorithms.]
[Figure: Execution Time for Short Messages up to 1000 bits, Virtex 5, 512-bit variants of algorithms.]
[Table: summary for 256-bit variants and 512-bit variants. Columns: Thr/Area, Thr, Area, Short msg.]
Summary of Results
Throughput/Area & Throughput most crucial for high-speed implementations
Area cannot be easily traded for Throughput
Best performers so far:
1-2. Keccak & Luffa
3. Groestl
Worst performers so far:
14. SIMD
13. ECHO
12. BMW
Candidates: BLAKE, BMW, CubeHash, ECHO, Fugue, Groestl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, Skein
ATHENa web site
Most recent results
Comparisons with results from other groups
Optimum options of tools
Comparison with Best Results Reported by Other Groups
Virtex 5, 256-bit variants of algorithms

                    OTHER GROUPS                GMU
Algorithm           Source              Thr     Area    Thr     Thr/Area
BLAKE               Kobayashi et al.    2676    1871    2854    1.53
CubeHash            Kobayashi et al.    2960    707     3445    4.87
ECHO                Lu et al.           14860   5445    13875   2.55
Groestl             Gauravaram et al.   10276   1884    8677    4.61
Hamsi               Kobayashi et al.    1680    946     2646    2.80
Keccak                                  6900    1229    10807   8.79
Luffa                                   6343    1154    8008    6.94
Shabal                                  2051    1266    2624    2.07
Skein (estimated)                       3535    1463    2812    1.92

Best Overall Reported Results as of Aug. 6, 2010
Virtex 5, 256-bit variants of algorithms

BEST REPORTED RESULTS
Algorithm    Area    Thr     Thr/Area   Source
BLAKE        1660    2676    1.61       Kobayashi et al.
BMW          4400    5577    1.27       GMU
CubeHash     590     2960    5.02       Kobayashi et al.
ECHO         5445    13875   2.55       GMU
Fugue        956     3151    3.30       GMU
Groestl      1722    10276   5.97       Gauravaram et al.
Hamsi        946     2646    2.80       GMU
JH           1108    3955    3.57       GMU
Keccak       1229    10807   8.79       GMU
Luffa        1154    8008    6.94       GMU
Shabal       153     2051    13.41      Detrey et al.
SHAvite-3    1130    2887    2.55       GMU
SIMD         9288    2326    0.25       GMU
Skein        1632    3535    2.17       Tillich et al.
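The Thr/Area column of the best reported results can be recomputed from the Area and Thr values listed above. The sketch below copies those two columns into a dictionary (the variable names are illustrative) and ranks the candidates by the ratio.

```python
# (Area, Thr) pairs copied from the "Best Overall Reported Results" listing
# for Virtex 5, 256-bit variants.
best_reported = {
    "BLAKE": (1660, 2676),
    "BMW": (4400, 5577),
    "CubeHash": (590, 2960),
    "ECHO": (5445, 13875),
    "Fugue": (956, 3151),
    "Groestl": (1722, 10276),
    "Hamsi": (946, 2646),
    "JH": (1108, 3955),
    "Keccak": (1229, 10807),
    "Luffa": (1154, 8008),
    "Shabal": (153, 2051),
    "SHAvite-3": (1130, 2887),
    "SIMD": (9288, 2326),
    "Skein": (1632, 3535),
}

# Throughput to area ratio for every candidate.
ratios = {name: thr / area for name, (area, thr) in best_reported.items()}

# Rank from the highest ratio to the lowest.
ranking = sorted(ratios, key=ratios.get, reverse=True)
for position, name in enumerate(ranking, start=1):
    print(f"{position:>2}. {name:<10} {ratios[name]:.2f}")
```

Ranking by this single ratio is only one possible view; the slides also look at throughput and area separately.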
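[Figure: Throughput vs. Area: Best reported results, Virtex 5, 256-bit variants of algorithms. Regions marked "best" and "worst".]
Your Project
[Figure: Basic architecture, with the round logic iterated r times, and derived architectures iterating r/2 times and 2r times.]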
Your Project
14 SHA-3 candidates left in the contest
Given:
specification of the function
reference implementation in C
interface
testbench and test vectors
GMU implementation of the basic version including
  block diagrams
  ASM charts
  short description
  formulas for execution time & throughput
  source codes
  results for Xilinx and Altera FPGAs
[Figure: Normalized Throughput vs. Normalized Area (axes 0-5) for alternative architectures (x1, x2, x4, fv2, fv3, fv4) of Keccak, Luffa, Groestl, and CubeHash.]
Your Project
Develop:
Block diagram
ASM chart
Formulas for execution time & throughput
Synthesizable code in VHDL
Results for multiple families of FPGAs from Xilinx and Altera
for at least one architecture from each of the following three classes of architectures:
Unrolled architecture
Folded architecture
Architecture based on the use of embedded FPGA resources (BRAMs, multipliers, DSP units, etc.)
[256-bit only, 512-bit only, or both]
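As a rough guide to how unrolling and folding change the iteration count, the sketch below uses an idealized model that is an assumption, not part of the slides: a basic iterative architecture executes r rounds in r clock cycles, unrolling by a factor k executes k rounds per cycle, and folding by a factor k spreads one round over k cycles. The function name and the value r = 24 are hypothetical. Clock period and area also change with the architecture, so throughput does not scale by these factors in practice.

```python
def cycles_per_block(r: int, unroll: int = 1, fold: int = 1) -> int:
    """Clock cycles per block in an idealized iterative datapath.

    r      : number of rounds of the hash function
    unroll : rounds executed per clock cycle (unrolled architecture)
    fold   : clock cycles spent on one round (folded architecture)
    """
    if unroll < 1 or fold < 1:
        raise ValueError("unroll and fold factors must be at least 1")
    if r % unroll != 0:
        raise ValueError("unroll factor must divide the number of rounds")
    return (r // unroll) * fold


r = 24  # hypothetical number of rounds
print("basic      :", cycles_per_block(r))             # r cycles
print("unrolled x2:", cycles_per_block(r, unroll=2))   # r/2 cycles
print("folded x2  :", cycles_per_block(r, fold=2))     # 2r cycles
```

These counts are the starting point for the execution time and throughput formulas requested in the project; the measured clock frequency and area of each architecture come from the FPGA tools.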
What is an FPGA?
[Figure: FPGA structure: Configurable Logic Blocks, I/O Blocks, Block RAMs & Embedded Multipliers.]
The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright 2004 Mentor Graphics Corp. (www.mentor.com)
[Figure: resource vectors for a Basic design and Your design: (1536, 0, 0) and (768, 2, 4); (3010, 0, 0) and (1505, 32 kbit, 4).]
Block RAM
Spartan-3 Dual-Port Block RAM
Builds both single and true dual-port RAMs
Synchronous write and read (different from distributed RAM)
[Figure: dual-port block RAM with Port A and Port B. Signal labels: ENA, RSTA, CLKA, ADDRA[9:0], DIA[17:0], DOA[17:0], WEB, ENB, DOB[17:0].]
Block RAM aspect ratios:
16k x 1 (addresses 0 to 16,383)
8k x 2 (addresses 0 to 8,191)
4k x 4 (addresses 0 to 4,095)
2k x (8+1) (addresses 0 to 2047)
1024 x (16+2) (addresses 0 to 1023)
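The aspect ratios listed for the block RAM all describe the same amount of data storage, only organized differently. The short check below assumes that the "+1" and "+2" terms are extra parity bits on top of 16 kbit of data; the variable names are illustrative.

```python
DATA_BITS = 16 * 1024  # data capacity implied by the 16k x 1 organization

# (depth, data width, extra parity width) for each listed aspect ratio
configs = [
    (16 * 1024, 1, 0),
    (8 * 1024, 2, 0),
    (4 * 1024, 4, 0),
    (2 * 1024, 8, 1),
    (1024, 16, 2),
]

summary = []
for depth, data_width, parity_width in configs:
    address_bits = depth.bit_length() - 1  # depth is a power of two
    total_bits = depth * (data_width + parity_width)
    summary.append((depth, data_width + parity_width, address_bits, total_bits))
    print(f"{depth:>6} x {data_width + parity_width:<2}  "
          f"address bits: {address_bits:>2}  total bits: {total_bits}")
```

The 1024 x (16+2) organization gives a 10-bit address and an 18-bit data word, which matches the ADDRA[9:0] and DIA[17:0] port labels.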
Multiplier-Accumulator - MAC
Xilinx XtremeDSP
Starting with Virtex 4 family, Xilinx introduced DSP48 block for high-speed DSP on FPGAs
Essentially a multiply-accumulate core with many other features
Now also Spartan-3A and Virtex 5 have DSP blocks
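The multiply-accumulate operation at the heart of such a block can be modeled in a few lines. This is a behavioral sketch only: the operand and accumulator widths (18-bit signed inputs, 48-bit accumulator) are assumptions chosen to resemble a DSP48-style block, the function names are hypothetical, and the many other features of the real block are not modeled.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(acc: int, a: int, b: int, in_bits: int = 18, acc_bits: int = 48) -> int:
    """One multiply-accumulate step: acc + a * b, wrapped to acc_bits."""
    product = to_signed(a, in_bits) * to_signed(b, in_bits)
    return to_signed(acc + product, acc_bits)


# Dot product of two short vectors, one MAC step per "clock cycle".
acc = 0
for a, b in zip([1, 2, 3], [4, 5, 6]):
    acc = mac(acc, a, b)
print(acc)  # 1*4 + 2*5 + 3*6
```

A wide accumulator lets many products be summed before overflow becomes a concern, which is what makes this structure useful for filters and similar DSP kernels.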
Course Objectives
At the end of this course you should be able to:
Code in VHDL for synthesis
Decompose a digital system into a controller (FSM) and datapath, and code accordingly
Write VHDL testbenches
Synthesize and implement digital systems on FPGAs
Effectively code digital systems for cryptography, signal processing, and microprocessor applications
This knowledge will come about through homework, exams, and an extensive project
The project in particular will help you know VHDL and the FPGA design flow from beginning to end
Project Task 1
Read the following chapters from the GMU technical report published at http://eprint.iacr.org/2010/445
Chapter 1 Introduction & Motivation
Chapter 2 Methodology
Chapter 3 Comprehensive Designs of SHA-3 Candidates: 3.1, 3.2 + subsection concerning your algorithm
Chapter 4 Design Summary and Results
Download and get familiar with the package of a hash function assigned to you
http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/submissions_rnd2.html