
Cryptographic Hash Functions
Security properties:
• Property 1: Deterministic
• No matter how many times you pass a particular input through a hash function, you will always get the same result.
• Property 2: Quick Computation
• A hash function should return the hash of an input quickly.
• Property 3: Pre-Image Resistance
• Given H(x), it is infeasible to determine x, where x is the input and H(x) is the output hash.
• Property 4: Small Changes in the Input Change the Hash
• Even a small change in the input produces a very large change in the hash (the avalanche effect).
• Property 5: Collision Resistant
• It is infeasible to find x and y such that x != y and H(x)=H(y)

Property 6: Puzzle Friendly
For every output Y, if k is chosen from a distribution with high min-entropy, it is infeasible to find an input x such that H(k|x) = Y, where k|x denotes the concatenation of k and x.
Property 1: The same input always produces the same hash (of the same size, 256 bits for SHA-256), and slightly different inputs produce very different hashes (of the same size, 256 bits).
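Properties 1 and 4 can be observed directly. The following is a minimal sketch, assuming Python's standard hashlib module with SHA-256 as the hash function; the helper names sha256_hex and differing_bits are illustrative and not part of the slides.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a_hex: str, b_hex: str) -> int:
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")

h1 = sha256_hex(b"hello world")
h2 = sha256_hex(b"hello world")   # the same input again
h3 = sha256_hex(b"hello worle")   # last character changed

print(h1 == h2)                   # deterministic: same input, same hash
print(len(h1) * 4)                # digest size in bits
print(differing_bits(h1, h3))     # typically close to half of the 256 bits
```

A one-character change flips roughly half of the output bits, which is what the avalanche effect predicts.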
Property 3: Pre-image resistance
• Means that it should be computationally hard to reverse a hash function.
• If a hash function h produces a hash value z, then it should be difficult to find any input value x that hashes to z.
• This property protects against an attacker who has only a hash value and is trying to find the input.
Property 4:
Property 5: Collision-free
Nobody can find x and y such that
x != y and H(x)=H(y)

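[Figure: two different inputs x and y from the large set of possible inputs map to the same value H(x) = H(y) in the smaller set of possible outputs.]
Collisions do exist ... but can anyone find them?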


Collision-Resistance
• Collision resistance means it should be hard to find two different inputs of any length that produce the same hash. A hash function with this property is also called collision-free.
• For a hash function h, it is hard to find any two different inputs x and y such that h(x) = h(y).
• Since a hash function compresses arbitrary-length inputs into a fixed-length hash, it is impossible for a hash function not to have collisions. The collision-free property only guarantees that these collisions are hard to find.
• This property makes it very difficult for an attacker to find two input values with the same hash.
• How can we find a collision in a secure hash function with a 256-bit output?
• Strategy 1: Brute force. Keep picking random inputs and computing their hashes until two inputs collide.
• How long does this take?
• Worst case: 2^256 + 1 inputs (by the pigeonhole principle, a collision is then guaranteed)
• On average: about 2^128 inputs, by the birthday paradox (roughly a 39% chance of a collision after 2^128 inputs, and more than 50% after about 1.2 × 2^128)
• More than 99.8% chance of a collision after 2^130 randomly chosen inputs
• Brute force always finds a collision, no matter what H is. However, it takes too long to matter (2^128 is a lot of tries!)
• Strategy 2: Find cryptographic or other weaknesses in the hash function
• Is the function H(x) = x mod 23 cryptographically secure? (No: any two inputs that differ by a multiple of the modulus collide.)
• Hash functions once believed to be secure have later been broken. For example, MD5 was considered secure until, after many years of research, collisions were found. SHA-256 (a secure hash function in current use) has no known practical collision attacks, but it has not been proven secure!
No hash function has been proven to be collision-resistant!
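The weakness of a function like H(x) = x mod 23 can be made concrete. This is a small sketch that takes the modulus 23 exactly as printed on the slide; weak_hash and collide_with are hypothetical helper names.

```python
MODULUS = 23  # modulus as printed on the slide

def weak_hash(x: int) -> int:
    """Toy hash from the slide: H(x) = x mod 23."""
    return x % MODULUS

def collide_with(x: int) -> int:
    """Return a different input with the same hash as x (no search needed)."""
    return x + MODULUS

x = 1000
y = collide_with(x)
print(x != y and weak_hash(x) == weak_hash(y))
```

Because a colliding input can be computed in one step, this function fails collision resistance even though it is deterministic and fast.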
• How to find a collision?
• A collision usually appears after about sqrt(N) tries, where N is the total number of possible outputs
• For example, for a 256-bit output, N = 2^256 and sqrt(N) = 2^128
• Try 2^130 randomly chosen inputs
• There is a 99.8% chance that two of them will collide
• This works no matter what H is … but it takes too long to matter
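The generic search is far out of reach for a 256-bit output, but the same strategy can be tried on a deliberately weakened hash. The sketch below assumes a toy hash made of the first 24 bits of SHA-256 (so N = 2^24 and sqrt(N) = 2^12), and it hashes the counter values 0, 1, 2, ... rather than truly random inputs; both choices are illustrative assumptions.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """First `bits` bits of SHA-256(data) as an integer (a deliberately weak toy hash)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int):
    """Hash distinct inputs until two of them share a truncated hash."""
    seen = {}  # truncated hash -> input that produced it
    for i in count():
        msg = str(i).encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1   # the two inputs and the number of tries
        seen[h] = msg

x, y, tries = find_collision(bits=24)
print(x, y, tries)
```

The number of tries is typically a few thousand, on the order of sqrt(2^24) = 4096, far fewer than the 2^24 + 1 needed in the worst case.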
• How big is 2^256?
• 2^256 is about 10^77
• At 60 million hashes per second, with an expected 2^256 tries needed to find a solution, the search takes 2^256 / (60 × 10^6) s ≈ 1.9 × 10^69 s ≈ 6 × 10^61 years
• Even if we had 1 trillion computers running concurrently, it would take about 6 × 10^49 years
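The size estimate can be checked with a few lines of arithmetic. This sketch uses the hash rate of 60 million hashes per second from the slide and assumes a year of 365.25 days; the variable names are illustrative.

```python
N = 2 ** 256                          # number of possible 256-bit outputs
HASHES_PER_SECOND = 60 * 10 ** 6      # rate quoted on the slide
SECONDS_PER_YEAR = 365.25 * 24 * 3600 # assumed length of a year
COMPUTERS = 10 ** 12                  # one trillion machines

seconds = N / HASHES_PER_SECOND
years = seconds / SECONDS_PER_YEAR
years_with_many_computers = years / COMPUTERS

print(f"2^256 is about {N:.2e}")
print(f"one computer: {seconds:.1e} s = {years:.1e} years")
print(f"one trillion computers: {years_with_many_computers:.1e} years")
```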
What is the Birthday Paradox?
• Assuming every day of the year is equally likely to be a birthday, the chance that one other person shares your birthday is 1/365, which is about 0.27%.
• However, if you gather 20-30 people in one room, the odds that some two of them share a birthday rise sharply.
• In fact, with 23 people there is about a 50-50 chance that two of them share a birthday!
• Simple rule in probability:
• If an event has N equally likely outcomes, you need roughly sqrt(N) random items (more precisely, about 1.18 × sqrt(N)) for a 50% chance of a collision.
• Applying this to birthdays, there are 365 possible birthdays, so you need only about 1.18 × sqrt(365), which is ~23, randomly chosen people for a 50% chance that two of them share a birthday.
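The birthday numbers can be verified numerically. The sketch below computes the exact collision probability for k uniform picks from n values, plus the standard approximation 1 - exp(-k(k-1)/(2n)), which is useful when n is as large as 2^256; the function names are illustrative.

```python
import math

def collision_probability(k: int, n: int) -> float:
    """Exact probability that k uniform random picks from n values contain a repeat."""
    p_all_distinct = 1.0
    for i in range(k):
        p_all_distinct *= (n - i) / n
    return 1.0 - p_all_distinct

def approx_collision_probability(k: int, n: int) -> float:
    """Approximation 1 - exp(-k(k-1)/(2n)), usable for very large n."""
    return 1.0 - math.exp(-k * (k - 1) / (2 * n))

print(collision_probability(22, 365))                    # just under 0.5
print(collision_probability(23, 365))                    # just over 0.5
print(approx_collision_probability(2 ** 128, 2 ** 256))  # 256-bit hash, 2^128 tries
print(approx_collision_probability(2 ** 130, 2 ** 256))  # 256-bit hash, 2^130 tries
```

With 23 people the probability first exceeds one half, and for a 256-bit hash the probability after 2^130 tries is above 99.8%.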
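Property 6: Puzzle-friendliness
• Puzzle-friendliness property: For every possible output value y, if k is chosen from a distribution with high min-entropy, then it is infeasible to find x such that H(k|x) = y.
• This is distinct from second-preimage resistance, which says that given an input x, it is infeasible to find a different input x' such that H(x') = H(x).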
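Puzzle-friendliness means there is no better strategy than trying values of x one by one. The sketch below illustrates this with a relaxed puzzle: instead of a single output y, it accepts any output below a target, so that a solution can be found in about 2^12 tries. The target-set relaxation, the 8-byte encoding of x, and the function names are assumptions made for illustration.

```python
import hashlib
import secrets
from itertools import count

def puzzle_hash(k: bytes, x: int) -> int:
    """H(k | x) as an integer, using SHA-256 and an 8-byte big-endian encoding of x."""
    return int.from_bytes(hashlib.sha256(k + x.to_bytes(8, "big")).digest(), "big")

def solve(k: bytes, difficulty_bits: int) -> int:
    """Try x = 0, 1, 2, ... until H(k | x) has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for x in count():
        if puzzle_hash(k, x) < target:
            return x

k = secrets.token_bytes(32)        # k drawn from a high min-entropy distribution
x = solve(k, difficulty_bits=12)   # about 2^12 tries expected
print(x, hex(puzzle_hash(k, x)))
```

Each additional difficulty bit doubles the expected work, while checking a proposed x still takes a single hash.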
Pictorial representations of properties of Hash Function
Secure Hash Algorithm
• SHA was originally designed by the NSA and published by NIST in 1993
• Revised in 1995 as SHA-1
• Later versions of SHA (the SHA-2 family)
• SHA-224, SHA-256, SHA-384, SHA-512
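SHA-256 hash function
[Figure: the message is padded (10* | length) and split into 512-bit blocks (block 1, block 2, ..., block n). Starting from the 256-bit IV, the compression function c combines each 256-bit chaining value with the next 512-bit block; the final 256-bit output is the hash.]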
SHA-256 Operation
• Pad the message you are hashing so that its length is a multiple of 512 bits (append a 1 bit, then a certain number of 0 bits, then the message length), and break it up into blocks of 512 bits each
• Start with the 256-bit value called the IV, specified in the standards document, and the first block. This 768-bit string goes through a special function c (the compression function) that outputs a 256-bit string
• Then the compression function is applied to the concatenation of the first output and the second block (this chaining is the Merkle-Damgård transform)
• The process is repeated until the last block; the hash is the final 256-bit output
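The padding and chaining steps can be sketched as follows. The padding follows the SHA-256 rule (a 1 bit, zero bits, then the 64-bit message length), but toy_compress is only a stand-in for the compression function c, and the all-zero IV is a placeholder rather than the value from the standard, so the result is not a real SHA-256 digest.

```python
import hashlib

BLOCK_BYTES = 64  # 512-bit blocks

def sha256_pad(message: bytes) -> bytes:
    """Append a 1 bit, zero bits, and the 64-bit big-endian message length in bits."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                            # a 1 bit followed by seven 0 bits
    padded += b"\x00" * ((56 - len(padded)) % BLOCK_BYTES)
    return padded + bit_length.to_bytes(8, "big")

def toy_compress(chaining_value: bytes, block: bytes) -> bytes:
    """Stand-in for c: maps 256 + 512 bits to 256 bits. NOT the real SHA-256 compression function."""
    return hashlib.sha256(chaining_value + block).digest()

def merkle_damgard(message: bytes, iv: bytes = b"\x00" * 32) -> bytes:
    """Chain the compression function over the padded blocks, starting from the IV."""
    state = iv                                            # placeholder IV (assumption)
    padded = sha256_pad(message)
    for i in range(0, len(padded), BLOCK_BYTES):
        state = toy_compress(state, padded[i:i + BLOCK_BYTES])
    return state

print(len(sha256_pad(b"abc")) * 8)      # padded length in bits
print(merkle_damgard(b"abc").hex())     # 256-bit output of the toy construction
```

Note that padding is always added, so a message that already fills a block exactly still gains one extra block.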
One Compression function in SHA-256
• One compression function in SHA-256 comprises a 256-bit block cipher with 64 rounds.
Secure Hash Algorithm in Bitcoin
• SHA-256 is used in several different parts of the Bitcoin network:
• Mining uses SHA-256 as the proof-of-work algorithm.
• SHA-256 is used in the creation of bitcoin addresses to improve security and
privacy.
Pointers and Linked Lists
• Pointers
• A pointer is a variable that stores the address of another variable.
• Linked Lists
• A linked list is a sequence of blocks (nodes), each containing data and a pointer variable that holds the address of the next node; these pointers link the sequence together.
• In a blockchain, the first block is called the “genesis block”.
Linked List
Hash Pointer
• A hash pointer is:
• a pointer to where some info is stored, together with the (cryptographic) hash of that info.
• If we have a hash pointer, we can
• get the info back, and
• verify that it hasn’t changed
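A hash pointer can be sketched as a pair of a location and a digest. In the example below, the in-memory dictionary `storage` stands in for wherever the info is kept, and the names HashPointer, store, fetch and is_unchanged are hypothetical.

```python
import hashlib
from dataclasses import dataclass

storage = {}  # hypothetical storage: location -> stored bytes

@dataclass(frozen=True)
class HashPointer:
    location: str   # where the info is stored
    digest: str     # SHA-256 hash of the info at the time the pointer was made

def store(location: str, data: bytes) -> HashPointer:
    """Store data and return a hash pointer to it."""
    storage[location] = data
    return HashPointer(location, hashlib.sha256(data).hexdigest())

def fetch(ptr: HashPointer) -> bytes:
    """Get the info back."""
    return storage[ptr.location]

def is_unchanged(ptr: HashPointer) -> bool:
    """Verify that the stored info still matches the digest in the pointer."""
    return hashlib.sha256(storage[ptr.location]).hexdigest() == ptr.digest

ptr = store("block-0", b"some info")
print(fetch(ptr), is_unchanged(ptr))
storage["block-0"] = b"tampered info"   # someone modifies the stored data
print(is_unchanged(ptr))
```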
Hash Pointers

What can you do with a hash pointer?


• Retrieve or get back the info/data
• Verify that the info/data hasn’t changed
• What else?
Use hash pointers to build data structures
Blockchain
• A blockchain is a linked list with hash pointers
• A series of blocks; each block has data as well as a hash pointer to the previous block in the list
• Benefit: each hash pointer gives us both the location of the previous block and a digest of its value, which lets us verify that the value hasn’t changed

• This achieves the tamper-evident property (often described as immutability) because of the hash pointers

• Suppose the adversary changes the data of some block k. Since the data has been changed, the hash stored in block k + 1, which is a hash of the entire block k, no longer matches; by collision resistance, the adversary cannot find different data with the same hash
linked list with hash pointers = “block chain”
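The linked list with hash pointers can be sketched in a few lines. The block layout (a data string plus the previous block's hash), the "|" separator, and the function names below are illustrative assumptions; the head hash is assumed to be kept somewhere the adversary cannot modify.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Block:
    data: str
    prev_hash: str  # hash pointer to the previous block ("" for the first block)

def block_hash(block: Block) -> str:
    """Hash of the entire block: its data and its pointer to the previous block."""
    return hashlib.sha256((block.prev_hash + "|" + block.data).encode()).hexdigest()

def build_chain(items: List[str]) -> Tuple[List[Block], str]:
    """Build the chain and return it with the head hash pointer."""
    chain, prev = [], ""
    for data in items:
        block = Block(data, prev)
        chain.append(block)
        prev = block_hash(block)
    return chain, prev

def first_inconsistency(chain: List[Block], head_hash: str) -> Optional[int]:
    """Index of the first block whose hash no longer matches the pointer to it, or None."""
    for i, block in enumerate(chain):
        expected = chain[i + 1].prev_hash if i + 1 < len(chain) else head_hash
        if block_hash(block) != expected:
            return i
    return None

chain, head = build_chain(["tx a", "tx b", "tx c"])
print(first_inconsistency(chain, head))      # no tampering yet
chain[1].data = "tx b (forged)"              # adversary changes block 1
print(first_inconsistency(chain, head))
chain[2].prev_hash = block_hash(chain[1])    # adversary patches the next pointer
print(first_inconsistency(chain, head))      # mismatch moves toward the head
```

Patching each pointer only pushes the mismatch one block further, until it reaches the stored head hash, which the adversary cannot change.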
Tamper-evident Log

• An attacker wants to tamper with one block of the chain, let’s say block 1.
• The attacker changes the content of block 1. Because of the “collision-free” property of the hash function, he cannot find other data with the same hash as the old data, so the hash of the modified block also changes.
• To keep others from noticing the inconsistency, he must also change the hash pointer to that block, which is stored in the next block, block 2.
• Now the content of block 2 has changed, so to keep the story consistent, the hash pointer in block 3 must be changed as well.
• Finally, the attacker reaches the hash pointer to the last block of the blockchain, which is stored where he cannot modify it, so the tampering is detected.
Merkle Tree
• Binary tree with hash pointers = “Merkle tree”
• A Merkle tree is a data structure used for efficiently verifying the integrity of large sets of data.
• In a Merkle tree, data blocks are grouped in pairs, and the hashes of the two blocks in each pair are stored in a parent node.
• The parent nodes are in turn grouped in pairs, and their hashes are stored one level up the tree.
• This continues all the way up the tree until we reach the root node.
• If an adversary tampers with some data block at the bottom of the tree, the hash pointer one level up will no longer match. Even if he goes on to tamper with that hash pointer too, the change propagates up to the top of the tree, where he cannot tamper with the root hash pointer that we have stored.
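The bottom-up construction can be sketched as a function that computes the root hash. This minimal version assumes SHA-256 and a number of data blocks that is a power of two; real systems handle other sizes with conventions not covered in these slides.

```python
import hashlib
from typing import List

def h(data: bytes) -> bytes:
    """SHA-256 digest of data."""
    return hashlib.sha256(data).digest()

def merkle_root(blocks: List[bytes]) -> bytes:
    """Root hash of a binary Merkle tree over `blocks` (count must be a power of two)."""
    n = len(blocks)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("this sketch assumes a power-of-two number of blocks")
    level = [h(b) for b in blocks]              # hashes of the data blocks
    while len(level) > 1:
        # Each parent stores the hash of its two children, taken in pairs.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

blocks = [f"data {i}".encode() for i in range(8)]
root = merkle_root(blocks)
tampered = list(blocks)
tampered[3] = b"data 3 (forged)"
print(root.hex())
print(merkle_root(tampered) == root)   # tampering with one leaf changes the root
```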
binary tree with hash pointers = “Merkle tree”
[Figure: eight data blocks at the bottom of a binary tree; above them are rows of 8, 4, and 2 hash pointers H( ), each pointing to a node or data block one level down.]


Features of Merkle Tree
• Tamper evident
Just like a blockchain, we only need to remember the hash pointer at the root (top-level node); we can then traverse down to any leaf data block to check whether it is in the tree and whether it has been tampered with.
• Proof of membership
To verify a data block, we only need the hashes along the path from the root to the leaf where the data is. So the complexity is O(log n), which is much more efficient than the O(n) needed for a linked-list blockchain.
• Non-membership proof
If the Merkle tree is sorted, we can prove that a given item is not in the tree: if the items that would come immediately before and after it are both in the tree and are consecutive, there is no space between them, which proves that the given item is not in the tree.
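A proof of membership can be sketched as the list of sibling hashes along the path from a leaf to the root. The code below assumes SHA-256 and a power-of-two number of blocks; build_levels, membership_proof and verify are hypothetical names.

```python
import hashlib
from typing import List, Tuple

def h(data: bytes) -> bytes:
    """SHA-256 digest of data."""
    return hashlib.sha256(data).digest()

def build_levels(blocks: List[bytes]) -> List[List[bytes]]:
    """All levels of the tree, from leaf hashes up to the single root hash."""
    levels = [[h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def membership_proof(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes on the path from leaf `index` to the root: O(log n) entries."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1                               # the other node in the pair
        proof.append((level[sibling], sibling < index))   # (hash, sibling is on the left?)
        index //= 2
    return proof

def verify(block: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path from the block up to the root and compare."""
    current = h(block)
    for sibling, sibling_is_left in proof:
        current = h(sibling + current) if sibling_is_left else h(current + sibling)
    return current == root

blocks = [f"data {i}".encode() for i in range(8)]
levels = build_levels(blocks)
root = levels[-1][0]
proof = membership_proof(levels, 5)
print(len(proof), verify(blocks[5], proof, root), verify(b"forged", proof, root))
```

For 8 blocks the proof holds 3 hashes, matching the O(log n) cost; the verifier needs only the block, the proof, and the stored root.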
