
IEEE - 45670

Modified MD5 algorithm for low end IoT Edge devices

Viraj Khatri
Electronics and Telecommunication Engineering
College of Engineering
Shivajinagar, Pune 411005
Email: khatrivt15.extc@coep.ac.in

Ms. Vanita Agarwal
Electronics and Telecommunication Engineering
College of Engineering
Shivajinagar, Pune 411005
Email: vsa.extc@coep.ac.in

Abstract—The advent of the Information Age and the Internet of Things has led to an unprecedented increase in the flow of data. This increase in data-flow has been matched by an increase in computational power. However, low-end devices, usually deployed as a cost-saving alternative, have not been able to keep up with the increasing security and error detection & correction demands of the MD5 or SHA-1/2/3 algorithms. We present the performance characteristics of modified versions of the MD5 message-digest algorithm, altered using two different approaches: reducing lateral register widths or reducing computation cycles. The metrics for comparison are collisions, synthesis reports and the time required for message-digest calculation.

Index Terms—Internet of Things, IoT Edge Devices Security, Low End IoT Edge Devices, Modified MD5 Algorithm, Modified Message-Digest.

I. INTRODUCTION

For wireless communication between devices, cryptographic algorithms are widely used to monitor or assist communication. They do so by playing a major role in verifying the integrity of data, which can get corrupted at any of the many stages of transmission, owing to internal faults or external interference. For Internet of Things (IoT) applications, where data is streamed continuously from various sources, the data is monitored in real time. This is accomplished by utilizing Application Specific Integrated Circuits (ASICs) or Graphics Processing Units (GPUs). All of these hardware options are used to generate a hash for a given message signal, while the hash itself assists with data analysis and decision making when data transmission or reception is erroneous.

Several algorithms have been made for this specific purpose, like the Secure Hashing Algorithm (SHA) and Message-Digest 4 (MD4). These were designed with 32-bit processors in focus, as they require 32-bit variables and 32-bit logical operations. However, as greater progress was made in computing, these algorithms increased in complexity, like SHA-256 and MD5, which ensure better integrity in transmission.

Depending on the particular IoT application, the tolerance for erroneous data changes, so the required degree of data-integrity verification is application dependent. When a non-tolerant IoT application has various sources of data, sources that do not have enough resources to commit to encryption along with integrity verification, a trade-off is made: instead of both, only cryptographic hash functions are used to ensure the integrity of data. Two algorithms are prevalent in usage for this, SHA and MD5 [1]. Several generations of the Secure Hashing Algorithms have been developed and are widely used. Internally, SHA-3 is quite different from SHA-1 and SHA-2, which makes it better suited to hardware implementation, although it takes more computation time in software [2]. However, due to the recent publication of SHA-3 (2015), there has not been widespread development around this algorithm. Thus, SHA-2 has been widely implemented as a software data-verification tool, owing to its improved software performance, which leaves IoT hardware applications to be handled largely by the MD5 algorithm.

To further justify the choice of the MD5 algorithm, we take into account the low-memory, low-power IoT edge devices of our use case. This leads to cryptographic hash algorithms being disregarded, since the target implementation is data-integrity verification rather than security. Furthermore, for the constant bit-rate data streams associated with an IoT edge device, a fixed-length-output hash algorithm is appropriate. The MD5 algorithm satisfies both of these conditions, as it has been cryptographically broken [3] and should no longer be considered a cryptographic hash algorithm, while still producing a fixed-length digest.

The MD5 message-digest algorithm was primarily made to differentiate between two messages using their respective message-digests. This was achieved by making it improbable for the same message-digest output to be produced by two slightly differing messages [4]. This property has been used as an error-detecting and security mechanism in various forms of communication. At its core, MD5 has 4 stages with 64 iterations in total [4], plus further simple functions to achieve the properties the message-digest possesses. By modifying the MD5 architecture, we aim to make it more feasible to implement as hardware in low-power, low-memory devices like IoT edge devices, to add the functionality of data-integrity verification.

We will first discuss the message-digest algorithms, their variations and operation, followed by how the algorithm is implemented in Verilog, subsequently showing how this architecture can be scaled down. Ultimately, performance comparison will be done on the metrics of collisions, synthesis reports and time required.
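As a small illustration of the fixed-length-output property discussed above (an illustrative snippet using Python's standard hashlib, not the hardware design presented later):

```python
import hashlib

# MD5 always produces a 128-bit (16-byte) digest, regardless of input
# length -- convenient for tagging constant bit-rate data streams.
for chunk in (b"", b"sensor-frame-0001", b"x" * 10_000):
    assert len(hashlib.md5(chunk).digest()) == 16

print(hashlib.md5(b"sensor-frame-0001").hexdigest())
```

The frame name here is a made-up placeholder; only the fixed digest width matters.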

10th ICCCNT 2019, July 6-8, 2019, IIT Kanpur, Kanpur, India

Fig. 1. Data Flow Graph of one of the 64 iterations of the MD5 algorithm.

II. RELATED WORKS

Overview and survey articles for hardware designs of security in IoT edge devices have been published [5], outlining authentication and encryption using cryptography in ultra-low-power processors. They also discuss the possible problems that might be encountered while designing such a system; however, for our implementation of data-integrity verification, only the pertinent principles are taken, as much of the information is specific to processors and does not apply here.

There have been forays into measuring the performance and energy consumption of AES, and into optimizing AES for IoT edge devices, namely in Wireless Sensor Networks. One of the approaches to optimizing AES taken in [6] is in the same vein as the approach we take, that is, reducing data-type sizes. There, however, it is done with the intention of optimizing compiler processing by eliminating inefficient storage due to a fixed platform bit size. Other optimizations, like specialization of code, are not applicable to us, as they choose between two modes of AES operation (128 or 256-bit).

These publications have covered optimizing AES and similar encryption standards for low-power processors, which has been well documented and pursued. However, the cost concerns of any deployment can remove the luxury of a processor, which leaves only the option of custom hardware-based solutions that are power- and cost-efficient. The core motivation behind this paper is to develop such a system and comment on its reliability.

III. MESSAGE-DIGEST ALGORITHMS

A. MD4

This algorithm was developed by Ronald Rivest in 1990, although it was published in 1992 [7]. The main contribution of the MD4 algorithm in this context is as a significant influence on later designs like SHA-1 and MD5. For this reason, the MD4 algorithm is not discussed further here.

B. MD5

MD5 was designed as a replacement for the MD4 algorithm by Ronald Rivest [4], producing a 128-bit hash from a 512-bit message block. A message longer than 448 bits is broken into pieces of 448-bit length. If the message length is not a multiple of 448, it is padded with a 1 followed by zeros until it reaches a multiple of 448. After this, it is padded with another 64-bit value representing the length of the original message, modulo 2^64, completing a 512-bit block.

The main operation of MD5 happens on a 128-bit block, whose values are affected by the 512-bit message (448-bit message + 64-bit length representation). The blocks shown in Fig.1, namely A, B, C & D, are each 32-bit in size, with predetermined pseudo-random starting values, while the functions F, G, H & I are given in Eq.1, Eq.2, Eq.3 & Eq.4 respectively.

F(B, C, D) = (B ∧ C) ∨ (¬B ∧ D)    1 ≤ i ≤ 16    (1)

G(B, C, D) = (B ∧ D) ∨ (C ∧ ¬D)    17 ≤ i ≤ 32    (2)

H(B, C, D) = B ⊕ C ⊕ D    33 ≤ i ≤ 48    (3)

I(B, C, D) = C ⊕ (B ∨ ¬D)    49 ≤ i ≤ 64    (4)

After computing the above functions — F for the first 16 iterations, followed by G, H, I respectively — the result is added to A. The functions M, k & S from Fig.1 depend on an iterating variable i, which is calculated uniquely for each of the 16 iterations of each of the 4 functions. While the original 512-bit message is broken into 16 32-bit chunks, each stored in an M function, the function k is made of predetermined pseudo-random values of 32-bit size. After M and k are added to A, it is shifted according to the value provided by S, which ranges from 4 to 23. After this, the 4 blocks are rotated by one, i is incremented, and the process repeats 63 more times. After these 64 iterations, the values in blocks A, B, C & D together represent the hash output. For k & S the iteration variable is not changed; whatever the value of i, the same value is used as the iteration variable for both kLUT & sLUT. For M, the iterating variable is not taken as i directly; it is taken from iLUT, which can be calculated using Eq.5.

iLUT = i                    1 ≤ i ≤ 16
       (5 × i + 1) mod 16   17 ≤ i ≤ 32
       (3 × i + 5) mod 16   33 ≤ i ≤ 48
       (7 × i) mod 16       49 ≤ i ≤ 64    (5)

Thus, the look-up table for i has rearranged values of i, which are fed to M to choose an input message chunk, while the normal iterating values of i are used in the look-up tables for k & S to find the predetermined pseudo-random additions and the shift values to be used in the four MD5 blocks, namely A, B, C & D.

IV. VERILOG IMPLEMENTATION

A. Original

The unedited verilog model [8] consists of multiple tiers of blocks, all of which contribute individually to the functioning of the MD5 algorithm. These blocks are md5_ctl,


md5_fsm, md5_core & sram, whose hierarchy is shown in Fig.2, and each handles its respective tasks.

Fig. 2. Block Diagram for Verilog implementation of MD5.

The block
md5_core houses the 4 functions (F, G, H & I) and handles the computation of the values in these blocks during a computation iteration, though it cannot decide on its own which function to use; that signal needs to be fed from outside the block. The next block, md5_fsm, is responsible for reordering A, B, C & D as shown in Fig.1 for all 64 iterations, while also feeding the correct values of these 4 blocks to the block md5_core. The sole purpose of the block sram is to store the input message in memory, to be later given in the required order to the block md5_fsm. This order is decided by the block md5_ctl, as it maintains a Look-Up Table (LUT) to find the value of the iteration variable i (iLUT). Along with iLUT, the block md5_ctl is also responsible for the LUT for finding the shift value S (sLUT) and the LUT for finding the random addition value k (kLUT). Further, the block md5_ctl also feeds the message to the block sram for storage and transfers the output of the block sram to the block md5_fsm.

In each iteration, the block md5_ctl feeds values for i, k, S & M from iLUT, kLUT, sLUT & sram respectively. The value of i helps the block md5_fsm decide the ordering of A, B, C & D, and thus the values of A, B, C, D, M, k & S are fed to the block md5_core. The block md5_core then applies the required one of the 4 functions to the fed inputs. The output values are put back into the A, B, C & D registers, completing one iteration.

After 64 such iterations, the output is available as (A,B,C,D). In this implementation, the 512-bit message has to be held at the input for 16 iterations so that the sram can store all 16 chunks, after which the hash computation is performed. However, since in the first 16 iterations the values of i are in ascending order, the block md5_core can take through inputs along with sram. Thus, instead of taking input for 16 iterations and then spending another 64 iterations calculating the hash, the first 16 iterations are also utilized by giving a through input from the message to both the block md5_core and sram, with the message taken from sram only afterwards. In this way, the number of iterations is reduced from 80 (16 catching + 64 computation iterations) to 64, which shaves some time off the hash calculation.

B. Edited

Two techniques can be used to reduce the hardware requirements of the MD5 algorithm as described in Fig.2: reducing the message size from 512-bits to 128-bits or 64-bits, or reducing the number of computation iterations from 64 to 16. These values, 512-bits to 128-bits & 64 iterations to 16 iterations, are determined by a scaling factor of 4, which has been chosen as the test scaling factor to try to provide the best trade-off between computational complexity & integrity verification of data.

Fig. 3. All possible combinations of hardware reductions

Fig.3 shows the 4 possible combinations that are implemented to be compared against each other on the metrics of collisions, synthesis reports and time required for hash calculation. The values ×64 or ×16 below the MD5 block in Fig.3 signify the number of iterations for each of the different combinations of possible hardware reductions.

In the original verilog implementation, all registers and values (iLUT, kLUT & sLUT) are defined for a block size of 32-bit each (128-bit total) [8]. To scale down by a factor of 4, the blocks (A,B,C,D) are given a lateral width of 8-bits. Thus the message must be broken into 16 × 8-bit chunks, reducing the input message size to 128-bit. Simultaneously, these 4 blocks together also form an output hash of 32-bits (4 × 8-bit blocks). The values of iLUT depend on the number of iterations, while the values in kLUT & sLUT depend on the block size. Thus, when scaling down the architecture size by a factor of 4, only the latter two are relevant, while scaling down the number of iterations makes iLUT relevant.

V. SCALING DOWN

The two methodologies to scale down the original MD5 hash algorithm, namely by Architecture and by Iterations, are elaborated here.
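As a software baseline for the scaling discussion, the 64-iteration round structure of Eqs. 1-5 can be sketched in a few lines. The sketch below is illustrative Python, not the Verilog design; it follows the RFC 1321 zero-based index convention (the paper's Eq.5 is written one-based) and cross-checks itself against Python's standard hashlib.

```python
import math
import hashlib  # used only to cross-check this sketch

# Shift schedule (sLUT) and sine-derived constants (kLUT) per RFC 1321.
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 \
  + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]

def rotl(x, n):
    """32-bit left rotation."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def md5(message: bytes) -> bytes:
    # Pad with a 1 bit, zeros to 448 mod 512 bits, then the 64-bit
    # original length (little-endian), completing 512-bit blocks.
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    message += b"\x80"
    message += b"\x00" * ((56 - len(message) % 64) % 64)
    message += bit_len.to_bytes(8, "little")

    A, B, C, D = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    for off in range(0, len(message), 64):
        M = [int.from_bytes(message[off + 4*j: off + 4*j + 4], "little")
             for j in range(16)]
        a, b, c, d = A, B, C, D
        for i in range(64):                 # the 4 x 16 iterations
            if i < 16:                      # F, message fed in order
                f, g = (b & c) | (~b & d), i
            elif i < 32:                    # G, iLUT = (5i + 1) mod 16
                f, g = (b & d) | (c & ~d), (5 * i + 1) % 16
            elif i < 48:                    # H, iLUT = (3i + 5) mod 16
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:                           # I, iLUT = 7i mod 16
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + K[i] + M[g]) & 0xFFFFFFFF
            # rotate the four blocks by one, B updated via the shift
            a, b, c, d = d, (b + rotl(f, S[i])) & 0xFFFFFFFF, b, c
        A, B = (A + a) & 0xFFFFFFFF, (B + b) & 0xFFFFFFFF
        C, D = (C + c) & 0xFFFFFFFF, (D + d) & 0xFFFFFFFF
    return b"".join(x.to_bytes(4, "little") for x in (A, B, C, D))

# Cross-check against the standard library implementation.
assert md5(b"abc") == hashlib.md5(b"abc").digest()

# The reduced x16 index map (Eq. 9, one-based) still visits all 16 chunks.
ilut16 = [i if i <= 4 else (5 * i) % 12 + 5 for i in range(1, 17)]
assert sorted(ilut16) == list(range(1, 17))
```

The scaled-down variants described below change only S, K and the index map within this structure, so the sketch doubles as a reference point for their behavior.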


A. Architecture

The relevant kLUT and sLUT are changed accordingly. The values for kLUT are calculated by

K[i] = 2^32 × |sin(i + 1)|    (6)

In Eq.6, all i are the normal iterating constants, taken as radians for sin, thus giving predetermined pseudo-random values. K[i] here is a 32-bit variable; changing its definition to 8-bit transforms all random addition values to 8-bit without problems. Further, for sLUT, the shift values lie in the range 4 to 23, which cannot be mapped straight to the range of 1 to 8 without compromising the properties possessed by the hash algorithm as a whole. Initially, the following equation maps the shift values for 32-bit blocks to the range 0 to 7.

S8[i] = S32[i] mod 8    (7)

In Eq.7, S8[i] signifies the sLUT values for 8-bit blocks, while S32[i] signifies the shift values for 32-bit blocks. Due to the non-ideal mapping in Eq.7, some shift values come out as 0 (equivalent to a shift of 8 in an 8-bit block), which means no shift at all, and some shift values come out the same for the previous & next cycle, which disqualifies this method from being applied. To devise the shift values for sLUT, the original shift values are studied to find their relation to the next shift values and their effect on the blocks. After understanding the relationships between shift values and their impact, values for S8[i] emulating similar relationships and having similar impact can be determined.

The original sLUT has 4 values which repeat 4 times for each of the 4 functions (F, G, H & I). Thus each of the 4 functions has a set of 4 values that repeat 4 times to complete its 16 iterations. These 4 values (for each of the 4 functions) are assigned from one quarter of the range 1 to 32, except the last value, which belongs to the same range as the third shift value. The first shift value belongs to 1 to 8, the second belongs to 9 to 16, and the third & fourth belong to 17 to 23. There is a further constraint that the fourth shift value be greater than the third. Using similar relationships, the sLUT values for 8-bit blocks are taken from 1 or 2 for the first value, 3 or 4 for the second, and 5 to 7 for the third and fourth. For the third and fourth shift values, to ensure that the fourth value can be greater than the third while maintaining some variation between the 4 groups according to functions, an extra value from the fourth quarter of the 1 to 8 range is also included.

Similar techniques are used for scaling down the 8-bit block architecture to a 4-bit block architecture. At 4-bit block architecture, the message size becomes 64-bit (16 × 4-bit), while the hash becomes 16-bit (4 × 4-bit). The values for kLUT can be found similarly to Eq.6, by taking the register lateral width as 4-bit. However, while scaling down sLUT to shift values in the range 1 to 4, scaling down from 32 is not feasible, thus the shift values are found by scaling down from 8.

S4[i] = S8[i] mod 4    (8)

In Eq.8, due to the very small range (1 to 4), redundancies in shifting are unavoidable, even after conforming these shift values to the relationships and impact of the original shift values (S32[i]). Thus, a greater negative impact on the message-digest is expected here (32-bit block to 4-bit block) than from 32-bit block to 8-bit block.

B. Iteration

The relevant look-up table, iLUT, is discussed here, along with the feeding and computation characteristics, as they are also affected by the iteration scale-down.

In the 64-iteration cycle (×64), the four functions F, G, H & I each get 16 iterations, into which the 16 message chunks are fed in the order outlined in iLUT. In iLUT, the values are given according to Eq.5. For scaling these values down to 16 iterations (×16), we cannot simply use iLUT×16[i] = iLUT×64[i] mod 4, as the number of values present in iLUT×64 is 64, while for iLUT×16 it is 16. Moreover, as both have to choose from 16 message chunks, they must contain all values ranging from 1 to 16. iLUT×16 must have all values from 1 to 16, only in an order other than ascending. This order can be similar to the one described for iLUT×64, where instead of 4 groups of 16 values each, we take 2 groups, one of 4 values and the other of 12 values. This can be achieved in the following manner.

iLUT×16 = i                       1 ≤ i ≤ 4
          (5 × i) mod 12 + 5      5 ≤ i ≤ 16    (9)

In Eq.9, any one of the 3 remaining equations from Eq.5 could have been used; any of these gives satisfactory randomness to the values while exhausting all the values ranging from 5 to 16. Although this ×16 equation will not give the same message-digest characteristics as ×64, it is used for lack of other options.

Moreover, as stated earlier, for ×64 the first 16 iterations are fed straight to the block md5_core while simultaneously being stored in sram. This cannot be done for ×16: only the first 4 iterations have ascending-order message chunks due to the normal iLUT values, and the remaining are out of order, which means the chunks must first be stored in sram and then fed to the block md5_core. This results in ×16 needing 16 iterations for storing and then another 16 iterations to compute the message-digest, unlike ×64, which requires 64 iterations for both storage of message chunks (in the first 16) and computing the message-digest.

VI. PERFORMANCE

The measurements of the performance of these different combinations of architecture need to be standardized. This is achieved by taking random values as input to the message-digest algorithm, and then comparing the collision rates of the message-digests output by the algorithm with the different architectures. At the same time, an output signal of the message must be available for comparison and distinction of different message-digests. A through output from sram could be established, however this is complicated to implement; thus the input message must


persist till the message-digest is found; that is, it must be registered.

Fig. 4. 16-bit Fibonacci LFSR

A. Linear Feedback Shift Register

Random inputs are nearly impossible to simulate as hardware without significantly impacting performance, which leads to the usage of a pseudo-random pattern generator here. The pseudo-random pattern generator is implemented as a Linear Feedback Shift Register (LFSR), whose size is equal to the input message size; that is, it has two sizes, 128-bit & 64-bit. Here Fibonacci LFSRs, shown in Fig.4 [9], are used instead of Galois LFSRs due to their simplicity of implementation, and also because in a Galois LFSR the output bit needs to be examined individually [10]. To create a 128-bit LFSR, the feedback taps are at the 128th, 126th, 101st & 99th bits, while for a 64-bit LFSR, the feedback taps are at the 64th, 63rd, 61st & 60th bits [11].

The output from the LFSR is fed to the block md5_ctl as 16 message chunk parts. This is best achieved by using a Multiplexer and a counter. After the message is fed, the LFSR must not be clocked to increment until the output message-digest and the original message are both saved in the log file for simulation. The pseudo-random patterns are achieved by starting the LFSR with a pre-loaded value, and then pushing the LFSR to its next state. Due to the shifting nature of the LFSR, the pseudo-random patterns are available.

B. Measurements

Due to processing limitations, only 5 million message-digests are simulated, which is a skewed data set. To ensure fair testing, 1 million random message-hash pairs are chosen for the data set. Fig.5 shows the collisions of all recorded message-digests, out of 1,048,575 total data sets for each of the 4 implementations.

Fig. 5. Message-Digest Collisions

As can be seen in the data, 32-bit message-digests have minimal collisions compared to 16-bit message-digests. In the 32-bit architecture, ×64 has 1,164 collisions; that is, 2,328 message-digests are the same for different message inputs. This is not comparable to the theoretical limit of 1 collision for 2^32 unique messages. However, in the context of error checking in an IoT edge device bit-stream, only 0.222% of message-digests collide, making the collision rate acceptable. Similarly, in ×16, there are 1,187 collisions arising from 2,374 message-digests, which account for only 0.2264% of the total message-digests recorded. For the 16-bit message-digest, not a single message-digest was found that did not collide with the message-digest from a different input message. Given the randomly chosen input messages, it can be assumed that there are no non-colliding message-digests in the 16-bit hash algorithm.

Because every message-digest collides with at least one other for the 16-bit message-digest size, the relation between the number of message-digests and the number of occurrences is shown in Fig.6.

Fig. 6. Collision Distribution for 16-bit Hash

From this graph it can be seen that not all message-digests have the same distribution of collisions; it varies with the number of occurrences. In Fig.6, the x-axis shows the number of collisions (that is, 100 input messages having the same message-digest gives 100 collisions), while the y-axis denotes the number of message-digests participating in the given number of collisions. As can be seen, for ×64 more message-digests are present at lower numbers of collisions, while for ×16 higher numbers of collisions also house some message-digests. This means that in ×64 the probability of a small collision count is higher than in ×16; therefore, for such a high collision record, all options to reduce collisions must be taken. Also, the highest number of collisions in ×64 is 233, while for ×16 it is 339.

Fig.7 shows the important parameters from the synthesis reports of all 4 possible combinations of the MD5 algorithm's avenues for scaling down. From the graph, it can be seen that the 32-bit architecture utilizes more Registers, LUTs & Flipflops than the 16-bit architecture, while using fewer Multiplexers.
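The reported collision rates can be verified with a few lines of arithmetic (an illustrative sanity check using the counts given above, not new measurements; the expected per-digest load also explains why every 16-bit digest collides):

```python
# Sanity check of the collision figures reported in Section VI-B.
TOTAL = 1_048_575  # recorded message-digests per implementation (2**20 - 1)

x64_colliding = 2 * 1164   # 1,164 collisions -> 2,328 colliding digests
x16_colliding = 2 * 1187   # 1,187 collisions -> 2,374 colliding digests

print(round(100 * x64_colliding / TOTAL, 3))   # matches the 0.222% figure
print(round(100 * x16_colliding / TOTAL, 4))   # matches the 0.2264% figure

# A 16-bit digest has only 2**16 = 65,536 possible values, so ~2**20
# random messages average about 16 messages per digest value --
# consistent with every recorded 16-bit digest colliding at least once.
print(TOTAL / 2**16)
```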


Fig. 7. Synthesis Report

This can be attributed to the 32-bit architecture needing more storage, while the 16-bit architecture, having less storage, requires more signals controlling actions and thus more Multiplexers than the 32-bit architecture. Similarly, ×64 iterations require more Registers, LUTs and Flipflops than ×16, while ×16 iterations require more Multiplexers than ×64 iterations. This can be attributed to ×64 requiring architecture to manage the through inputs of the message to sram, while ×16 requires no such architecture.

Fig.8 shows the average time taken to calculate the message-digest for one input message over the data-set of 1,048,575 messages. Architecture size has no effect on the time taken; only the number of iterations does. ×64 iterations take 680ns (6.8 × 10⁻⁷ seconds), while ×16 iterations take 380ns (3.8 × 10⁻⁷ seconds). For ×64, one iteration takes 10ns, bringing the computation cycle to 640ns, with 40ns between two instances of computation to record results, iterate the LFSR, and reset the architecture. Similarly, for ×16, the computation cycle is 160ns long, and the data capture is just as long (160ns), but to ensure flawless data capture another 20ns are given before the computation cycle starts. The reset and LFSR iterate cycles take the same time (40ns), bringing the total average cycle time to 380ns.

Fig. 8. Time taken per Message-Digest

VII. CONCLUSION

This paper presents the different possibilities of a modified MD5 algorithm aimed at implementation on low-end IoT edge devices. We have also shown the relative performance characteristics in terms of message-digest collisions, synthesis of components and time taken for hashing for the different scaled-down architectures. Reducing the message-digest size has a severe impact on the integrity-verification property of hashing and must be avoided; meanwhile, reducing the number of iterations has minimal effect while greatly reducing architecture and time costs, and should be implemented.

However, it must be noted that all conclusions are drawn from simulation alone, not from FPGA board implementation, which can introduce unforeseen variables. Also, fuzzy hashing can more efficiently and effectively help in identifying files that contain a high percentage of similarities; its hardware implementation is the future work of this publication.

REFERENCES

[1] Rogaway, Phillip, and Thomas Shrimpton. "Cryptographic hash-function basics: Definitions, implications, and separations for preimage resistance, second-preimage resistance, and collision resistance." International Workshop on Fast Software Encryption. Springer, Berlin, Heidelberg, 2004.
[2] Bertoni, Guido, et al. "The Keccak SHA-3 submission." Submission to NIST (Round 3) 6.7 (2011): 16.
[3] Wang, Xiaoyun, and Hongbo Yu. "How to break MD5 and other hash functions." Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, Berlin, Heidelberg, 2005.
[4] Rivest, Ronald. "The MD5 message-digest algorithm." RFC 1321, 1992.
[5] Yang, Kaiyuan, David Blaauw, and Dennis Sylvester. "Hardware designs for security in Ultra-Low-Power IoT systems: an overview and survey." IEEE Micro 37.6 (2017): 72-89.
[6] Didla, Shammi, Aaron Ault, and Saurabh Bagchi. "Optimizing AES for embedded devices and wireless sensor networks." Proceedings of the 4th International Conference on Testbeds and Research Infrastructures for the Development of Networks & Communities. ICST, 2008.
[7] Rivest, Ronald. "The MD4 message-digest algorithm." RFC 1320, 1992.
[8] Sedov, Stanislav. "MD5 core in verilog." GitHub repository, https://github.com/stass/md5_core, 2011.
[9] "Linear-feedback shift register." Wikipedia, The Free Encyclopedia, 19 Feb. 2019.
[10] Goresky, Mark, and Andrew M. Klapper. "Fibonacci and Galois representations of feedback-with-carry shift registers." IEEE Transactions on Information Theory 48.11 (2002): 2826-2836.
[11] George, Maria, and Peter Alfke. "Linear feedback shift registers in Virtex devices." Xilinx application note XAPP210, 2007.
