
Elements of Cryptography
• Hash functions,
• Properties of a hash function,
• Puzzle-friendly hashes,
• Collision-resistant hashes,
• Digital signatures,
• Public key cryptography,
• Verifiable random functions
Cryptography
• The goal of cryptography is to transform any data from its
original form, called the plaintext, into an obscure form
known as the ciphertext. This process is called encryption.
• The reverse process of recovering the plaintext from the
ciphertext is called decryption.
• It is essential to understand that the plaintext does not have
to be a textual message. It can be a computer file representing
binary or other types of data, an image, a database, and so on.
• The process of encrypting plaintext using a particular
algorithm depends on a code, commonly known as the secret
key. A secret key is nothing but a large number.
• Example
• Message (plaintext): THIS IS BLOCK CHAIN CLASS
• Key = 3 (each letter is shifted forward by three positions)
• A → D, B → E, C → F, and so on
• ABCDEFGHIJKLMNOPQRSTUVWXYZ
• Ciphertext: WKLV LV EORFN FKDLQ FODVV
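A minimal Python sketch of this shift (Caesar) cipher, for illustration only; real systems use standardized algorithms such as AES:

```python
def caesar_encrypt(plaintext: str, key: int) -> str:
    """Shift each letter forward by `key` positions; leave other characters as-is."""
    out = []
    for ch in plaintext.upper():
        if ch.isalpha():
            out.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

def caesar_decrypt(ciphertext: str, key: int) -> str:
    """Decryption is simply the inverse shift."""
    return caesar_encrypt(ciphertext, 26 - key)

print(caesar_encrypt("This is Block chain class", 3))  # WKLV LV EORFN FKDLQ FODVV
```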


Cryptography

Figure 1: A generic encryption and decryption model
Services provided by cryptography
• Confidentiality is the assurance that information is only
available to authorized entities.
• Integrity is the assurance that information is modifiable only
by authorized entities.
• Authentication provides assurance about the identity of an
entity or the validity of a message.
• Non-repudiation is the assurance that an entity cannot
deny a previous commitment or action, backed by
incontrovertible cryptographic evidence.
• Accountability is the assurance that actions affecting
security can be traced back to the responsible party.
Cryptography
• Cryptography is primarily used to provide a
confidentiality service.
• Securing a blockchain ecosystem requires
many different cryptographic primitives, such
as hash functions, symmetric key
cryptography, digital signatures, and public
key cryptography.
Taxonomy of cryptographic primitives

Figure 2: Cryptographic primitives


Hash functions
• Hash functions are used to create fixed-length digests of
arbitrarily long input strings.
• Hash functions are keyless, and they provide a data
integrity service.
• They are usually built using iterated and dedicated hash
function construction techniques.
• Various families of hash functions are available, such as
MD, SHA-1, SHA-2, SHA-3, RIPEMD, and Whirlpool.
• Hash functions are commonly used for digital signatures
and message authentication codes (MACs), such as
HMACs.
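A quick sketch using Python's standard-library hashlib and hmac modules, showing fixed-length digests and an HMAC:

```python
import hashlib
import hmac

# Inputs of very different lengths...
short_msg = b"hi"
long_msg = b"x" * 1_000_000

# ...produce digests of the same fixed length (32 bytes for SHA-256).
print(hashlib.sha256(short_msg).hexdigest())
print(hashlib.sha256(long_msg).hexdigest())

# An HMAC binds the digest to a secret key, providing message authentication.
key = b"shared-secret"
tag = hmac.new(key, long_msg, hashlib.sha256).hexdigest()
print(tag)
```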
Hash functions
• Hash functions are also typically used to
provide data integrity services.
• These can be used both as one-way functions
and to construct other cryptographic
primitives, such as MACs and digital
signatures.
• Some applications use hash functions as a building
block for pseudorandom number generators (PRNGs).
Properties of Hash functions
• There are two practical properties of hash functions that
must be met depending on the level of integrity required:
• Compression of arbitrary messages into fixed-length
digests:
– This property relates to the fact that a hash function must be
able to take an input text of any length and output a fixed-
length compressed message.
– Hash functions produce a compressed output in various bit
sizes, usually between 128 and 512 bits.
• Easy to compute:
– Hash functions are efficient and fast one-way functions.
– It is required that hash functions be very quick to compute
regardless of the message size.
– The efficiency may decrease if the message is very big, but the
function should still be fast enough for practical use.
Hash Function: Three security properties
• Three security properties depending on the level of
integrity:
• Pre-image resistance: (Difficult to find message x)
• This property states that given a value y, it is
computationally infeasible (almost impossible) to find a
value x such that h(x) = y.
• Here, h is the hash function, x is the input, and y is the hash.
• The first security property requires that y cannot be reversed
to recover x. x is considered a pre-image of y, hence the
name pre-image resistance. This is also called the one-way
property.
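To build intuition, the sketch below brute-forces a pre-image, but only for a digest truncated to 16 bits; for the full 256-bit output of SHA-256 this search is computationally infeasible:

```python
import hashlib
from itertools import count

def truncated_sha256(data: bytes, bits: int = 16) -> int:
    """SHA-256 digest reduced to its first `bits` bits (toy setting only)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

y = truncated_sha256(b"secret message")
for i in count():
    x = str(i).encode()
    if truncated_sha256(x) == y:
        print(f"pre-image found after {i + 1} tries: {x!r}")
        break
```

Each extra bit of digest length doubles the expected work, which is why full 256-bit outputs are treated as one-way in practice.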
Hash Function: Three security properties
• Second pre-image resistance:
• The second pre-image resistance property states that
given x, it is computationally infeasible to find another
value x' such that x' ≠ x and h(x') = h(x).
• This property is also known as weak collision resistance.
• In other words, given a message x, it is difficult to produce
a different message x' with the same hash value.
Hash Function: Three security properties
Collision resistance:
• The collision resistance property states that it is
computationally infeasible to find two distinct values x’ and x
such that h(x’) = h(x).
• In other words, two different input messages should not hash
to the same output.
• This property is also known as strong collision resistance.

• Difficult to find any two messages that hash to the same value
i.e difficult to find x and x’ with same hash value.
• Weak and strong collision resistance are not the same:
• Weak collision resistance is bound to one particular message,
whereas strong collision resistance applies to any pair of messages.
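A toy collision search on a digest truncated to 16 bits; by the birthday paradox a collision appears after only a few hundred attempts, whereas for full 256-bit digests this is infeasible:

```python
import hashlib
from itertools import count

def truncated_sha256(data: bytes, bits: int = 16) -> int:
    """SHA-256 digest reduced to its first `bits` bits (toy setting only)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

seen = {}
for i in count():
    msg = str(i).encode()
    h = truncated_sha256(msg)
    if h in seen:
        print(f"collision: {seen[h]!r} and {msg!r} share hash {h:#06x}")
        break
    seen[h] = msg
```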
Hash Function: Three security properties

Figure 4: Three security properties of hash functions


Message digest functions
• Message digest (MD) functions were prevalent
in the early 1990s.
• MD4 and MD5 fall into this category.
• Both MD functions were found to be insecure after
practical collisions were discovered, and they are no
longer recommended for use.
• MD5 is a 128-bit hash function that was
commonly used for file integrity checks.
Secure Hash Algorithms (SHA)
• The most common secure hash algorithms (SHAs):
• SHA-0: This is a 160-bit function introduced by the U.S.
National Institute of Standards and Technology (NIST) in 1993.
• SHA-1: SHA-1 was introduced in 1995 by NIST as a
replacement for SHA-0.
• This is also a 160-bit hash function.
• SHA-1 is used commonly in SSL and TLS implementations.
• It should be noted that SHA-1 is now considered insecure,
and it is being deprecated by certificate authorities.
• Its usage is discouraged in any new implementations.
Secure Hash Algorithms (SHA)
• SHA-2:
– This category includes four functions defined by the
number of bits of the hash: SHA-224, SHA-256, SHA-384,
and SHA-512.

• SHA-3:
– This is the latest family of SHA functions. SHA3-224,
SHA3-256, SHA3-384, and SHA3-512 are members
of this family.
– SHA-3 is a NIST-standardized version of Keccak.
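Python's hashlib exposes several of these families; a quick sketch comparing digest sizes:

```python
import hashlib

msg = b"blockchain"
for name in ["sha1", "sha224", "sha256", "sha384", "sha512", "sha3_256", "sha3_512"]:
    h = hashlib.new(name, msg)
    print(f"{name:8s} -> {h.digest_size * 8:3d}-bit digest: {h.hexdigest()[:16]}...")
```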
Secure Hash Algorithms (SHA)
• RIPEMD:
– RIPEMD is the acronym for RACE Integrity Primitives Evaluation
Message Digest.
– It is based on the design ideas used to build MD4.
– There are multiple versions of RIPEMD, including 128-bit, 160-bit,
256-bit, and 320-bit.

• Whirlpool:
– This is based on a modified version of the Rijndael cipher known as
W.
– It uses the Miyaguchi-Preneel compression function, which is a type
of one-way function used for the compression of two fixed-length
inputs into a single fixed-length output.
– It is a single-block length compression function.
Applications of cryptographic hash
functions
• Cryptographic primitives are used in blockchains
to provide various protocol-specific services.
• For example, hash functions are used to build
Merkle trees, which are used to efficiently and
securely verify large amounts of data in
distributed systems.
• Some other applications of hash functions in
blockchains are to provide several security
services
Applications of cryptographic hash
functions
1. Hash functions are used in cryptographic puzzles such
as the proof of work (PoW) mechanism in Bitcoin.
– Bitcoin's PoW makes use of the SHA-256 cryptographic
hash function.
2. The generation of addresses in blockchains.
– For example, in Ethereum, blockchain accounts are
represented as addresses.
– These addresses are obtained by hashing the public key with
the Keccak-256 hash algorithm and then using the last 20
bytes of this hashed value.
– Toy sketches of both uses follow below.
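First, a minimal proof-of-work sketch in Python; a toy version of the Bitcoin-style puzzle in item 1 (Bitcoin's real difficulty target encoding and header format differ):

```python
import hashlib

def proof_of_work(block_data: bytes, difficulty_bits: int = 16) -> int:
    """Find a nonce so that SHA-256(SHA-256(data || nonce)) has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        payload = block_data + nonce.to_bytes(8, "big")
        digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

print("valid nonce:", proof_of_work(b"block header bytes"))
```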
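Second, a hedged sketch of Ethereum-style address derivation (item 2). It assumes the pycryptodome package for Keccak-256, because hashlib's sha3_256 is the NIST-standardized SHA-3, not the original Keccak variant Ethereum uses; the dummy key below stands in for a real secp256k1 public key:

```python
from Crypto.Hash import keccak  # pip install pycryptodome

def eth_address(public_key: bytes) -> str:
    """Keccak-256 over the 64-byte uncompressed public key; keep the last 20 bytes."""
    digest = keccak.new(digest_bits=256, data=public_key).digest()
    return "0x" + digest[-20:].hex()

dummy_public_key = b"\x01" * 64  # placeholder; a real key comes from secp256k1
print(eth_address(dummy_public_key))
```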
Applications of cryptographic hash
functions
3. Message digests in digital signatures.
4. The creation of Merkle trees to guarantee the
integrity of the transaction structure in the
blockchain (see the sketch below).
– Specifically, this structure is used to quickly verify
whether a transaction is included in a block or not.
– However, note that Merkle trees on their own are
not a new idea; they have just been made more popular
with the advent of blockchain technology.
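A compact Merkle-root sketch, pairing SHA-256 digests level by level; real blockchains differ in details such as double hashing and serialization:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(transactions: list[bytes]) -> bytes:
    """Hash the leaves, then repeatedly hash concatenated pairs up to a single root."""
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels (Bitcoin's convention)
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

print(merkle_root([b"tx1", b"tx2", b"tx3"]).hex())
```

Changing any single transaction changes the root, which is how a block header commits to every transaction in the block.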
Hadoop Distributed File System (HDFS)

• The Apache™ Hadoop® project develops open-source
software for reliable, scalable, distributed computing.
• The Apache Hadoop software library is a
framework that allows for the distributed
processing of large data sets across clusters of
computers using simple programming models.
• It is designed to scale up from single servers to
thousands of machines, each offering local
computation and storage.
Hadoop Distributed File System (HDFS)
• HDFS, a distributed file system for parallel big data
processing, uses the master-slave architecture to schedule
the tasks and manage the data transfer among different
computing nodes.
• In HDFS, the master node is called NameNode and the
slave node is called DataNode.
• Multiple NameNodes can be used to maintain high
availability using an active-passive relationship.
• NameNodes are responsible for performing block
operations, and they store metadata related to the file
system, such as where each chunk of a file is located.
Hadoop Distributed File System (HDFS)

• HDFS has been widely used by distributed data
processing frameworks such as MapReduce to
perform efficient big data analytics jobs.
• A MapReduce job consists of map tasks that perform
the same map operation on different data chunks in
parallel, and reduce tasks that use shuffling and reducing
functions to generate the desired outputs. Different MapReduce
jobs can share the same cluster of machines and are
coordinated by a job tracker.
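A self-contained Python simulation of the MapReduce pattern (word count). This is not the Hadoop API, just the map-shuffle-reduce flow in miniature:

```python
from collections import defaultdict

def map_phase(chunk: str):
    """Map task: emit (word, 1) pairs for one data chunk."""
    for word in chunk.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce task: sum the counts for one word."""
    return key, sum(values)

chunks = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```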
Components of HDFS
There are two types of machines in an HDFS cluster:
• NameNode: the heart of an HDFS filesystem;
it maintains and manages the file system metadata.
– For example, the NameNode knows which DataNode contains
which blocks and where the DataNodes reside within
the machine cluster.
– The NameNode also manages access to the files,
including reads, writes, creates, deletes, and data
block replication across the DataNodes.
• DataNode: where HDFS stores the actual data;
there are usually quite a few of these.
HDFS Architecture

Unique features of HDFS
HDFS also has a bunch of unique features that make it ideal for distributed
systems:

• Failure tolerant - data is duplicated across multiple DataNodes to
protect against machine failures. The default is a replication factor of 3
(every block is stored on three machines).
• Scalability - data transfers happen directly with the DataNodes, so your
read/write capacity scales fairly well with the number of DataNodes.
• Space - need more disk space? Just add more DataNodes and re-balance.
• Industry standard - other distributed applications are built on top of
HDFS (HBase, MapReduce).

HDFS is designed to process large data sets with write-once-read-many
semantics; it is not for low-latency access.
HDFS – Data Organization
• Each file written into HDFS is split into data
blocks
• Each block is stored on one or more nodes
• Each copy of the block is called replica
• Block placement policy
– First replica is placed on the local node
– Second replica is placed in a different rack
– Third replica is placed in the same rack as the second
replica
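A toy Python sketch of this default placement policy; the node and rack names are made up, and real HDFS also weighs factors such as disk usage and node load:

```python
import random

def place_replicas(writer_node: str, topology: dict[str, list[str]]) -> list[str]:
    """Toy HDFS placement: writer's local node, then two nodes on one other rack."""
    local_rack = next(r for r, nodes in topology.items() if writer_node in nodes)
    remote_rack = random.choice([r for r in topology if r != local_rack])
    second = random.choice(topology[remote_rack])
    third = random.choice([n for n in topology[remote_rack] if n != second])
    return [writer_node, second, third]

topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"], "rack3": ["n5", "n6"]}
print(place_replicas("n1", topology))  # e.g. ['n1', 'n4', 'n3']
```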
Read Operation in HDFS
Write Operation in HDFS
Hadoop Distributed File System (HDFS)

• Need for HDFS
• Need to process huge datasets on large clusters of computers
• Very expensive to build reliability into each application
• Nodes fail every day
– Failure is expected, rather than exceptional
– The number of nodes in a cluster is not constant
• Need a common infrastructure
– Efficient, reliable, easy to use
– Open source, Apache License
Blockchain with HDFS
• Blockchain with HDFS is created to make HDFS
more secure by logging the relevant metadata into
the blockchain.
• Accessing HDFS through its API is normal practice, but it
can be attacked in many ways, such as reverse
engineering, man-in-the-middle attacks, user
spoofing, and session replays.
• One concrete attack is DemonBot, which gives
access to Remote Code Execution (RCE) in HDFS.
Blockchain with HDFS
• Let us assume that a hacker has gained access to an
HDFS cluster and modified the files in it.
• In order to investigate what has happened to the files
in HDFS, the administrator can currently only get
file metadata, such as the access time and modification
time of the most recent activity or the last change to
the file, but not a trusted, complete log.
• This is neither sufficient nor feasible for monitoring
rapid file changes over a span of time.
Blockchain with HDFS
• Furthermore, such metadata is also located in the HDFS,
which may have already been compromised, thus
making it completely unreliable.
• To overcome these problems, Viraaji Mothukuri et al.
propose a solution that offers a transparent, reliable, and
low-cost approach to capturing the historical information
of HDFS files and recording it in the blockchain.
• As the data on the blockchain is immutable, these
changes are recorded permanently in a non-repudiable
manner, providing an excellent chain of custody.
Distributed hash tables
• A hash table is a data structure that is used to map
keys to values.
• Internally, a hash function is used to calculate an
index into an array of buckets from which the required
value can be found.
• Buckets have records stored in them using a hash key
and are organized into a particular order.
• A distributed hash table (DHT) is a data structure where
data is spread across various nodes; the nodes are
equivalent to buckets in a P2P network.
Distributed hash tables

Figure 3: Distributed hash table


Distributed hash tables
• In Figure 3, data is passed through a hash function,
which then generates a compact key. This key is
then linked with the data (values) on the P2P
network.
• When users on the network request the data (via the
filename), the filename can be hashed again to
produce the same key, and any node on the network
can then be requested to find the corresponding data.
• A DHT provides decentralization, fault tolerance,
and scalability.
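A toy Python sketch of DHT-style key placement using a consistent-hashing ring: every node can compute which node should hold a given key, with no central index (real DHTs such as Kademlia or Chord add routing tables and fault tolerance):

```python
import hashlib

NUM_SLOTS = 2**16

def slot(name: str) -> int:
    """Map a name (filename or node ID) into the shared hash space."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big") % NUM_SLOTS

class ToyDHT:
    def __init__(self, node_ids):
        # Place each node on the ring at the slot of its hashed ID.
        self.ring = sorted((slot(n), n) for n in node_ids)

    def node_for(self, key: str) -> str:
        """A key belongs to the first node at or after its slot (wrapping around)."""
        s = slot(key)
        for node_slot, node in self.ring:
            if node_slot >= s:
                return node
        return self.ring[0][1]

dht = ToyDHT(["node-a", "node-b", "node-c"])
print(dht.node_for("movie.mp4"))  # any node computes the same owner for this key
```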
