

Hadoop has been wildly successful
because it scales extraordinarily well, can be configured to
handle a wide variety of use cases, and is incredibly
inexpensive compared to older — mostly proprietary — data
warehouse alternatives. Which is all another way of saying
Hadoop is cheap, fast, and flexible. To see why and how it
scales, take a look at a Hadoop cluster architecture,
illustrated in the above diagram. This architecture promotes
scaling and performance. It supports parallel processing, with
additional nodes providing ‘horizontal’ scalability. This
architecture is also inherently multi-tenant, supporting
multiple client applications across one or more file groups.
There are several things to note here from a security
perspective. There are many moving parts — each node
communicates with its peers to ensure that data is properly
replicated, nodes are on-line and functional, storage is
optimized, and application requests are being processed.
Each line in the diagram is a trust relationship utilizing a
communication protocol.

The Hadoop Framework

To appreciate Hadoop's flexibility you need to understand that
clusters can be fully customized. For those of you new to
Hadoop it may help to think of the Hadoop framework as a
'stack', much like a LAMP stack, only on steroids. With a
jetpack and night-vision goggles. The extent to which you can
mix and add components is incredible.

Several types of security controls are available in Hadoop:


• Authentication and authorization: Identity and
authentication are central to any security effort — without
them we cannot determine who should have access to data.
Fortunately the greatest gains in Hadoop security have been
in identity and access management.
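The core of those gains is Kerberos support. As a sketch, enabling strong authentication cluster-wide comes down to two standard core-site.xml properties (the property names are documented Hadoop settings; this is not a complete secure-mode configuration):

```xml
<!-- core-site.xml: switch from the default "simple" (no) authentication -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<!-- enforce service-level authorization checks on cluster RPCs -->
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```

In practice this also requires keytabs and principals for every service, which is where most of the deployment effort goes.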

• Administrative data access: Most organizations have
platform (i.e., OS) administrators and Hadoop administrators,
both with access to the cluster’s files.

• Audit and logging: If you suspect someone has breached
your cluster, can you detect it, or trace back to the root
cause? You need an activity record, usually from an event
log. A variety of add-on logging capabilities are available,
both open source and commercial. Scribe and LogStash are
open source tools which integrate into most big data
environments, as do a number of commercial products.
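To illustrate what such an activity record gives you, HDFS writes per-request audit entries as key=value pairs. A minimal parsing sketch follows; the sample line mirrors the common FSNamesystem.audit format, but exact fields vary by Hadoop version, so treat it as illustrative:

```python
# Parse an HDFS-style audit log line into a dict of key=value fields.
# Illustrative only: field names and layout vary across Hadoop versions.
def parse_audit_line(line):
    # Everything after "audit:" is a space-separated key=value list.
    _, _, payload = line.partition("audit:")
    fields = {}
    for token in payload.split():
        if "=" in token:
            key, _, value = token.partition("=")
            fields[key] = value
    return fields

sample = ("2016-03-01 10:15:02,991 INFO FSNamesystem.audit: "
          "allowed=true ugi=alice ip=/10.1.2.3 cmd=open "
          "src=/data/sales.csv dst=null perm=null")
record = parse_audit_line(sample)
print(record["cmd"], record["src"])  # open /data/sales.csv
```

Feeding records like these into a log aggregator is exactly the integration point tools such as Scribe and LogStash provide.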
• API security: Big data cluster APIs need to be protected
from code and command injection, buffer overflow attacks,
and all the other usual web service attacks. This responsibility
typically lands on the applications using the cluster, but not
always.

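As a minimal sketch of the kind of input validation an application layer might perform before forwarding a user-supplied path to a cluster API (the allow-list rule shown is illustrative, not a complete defense):

```python
import re

# Reject path parameters that could smuggle traversal sequences or
# shell metacharacters into a downstream cluster call.
# Illustrative allow-list validation, not a complete defense.
SAFE_PATH = re.compile(r"^/[A-Za-z0-9_\-./]+$")

def validate_hdfs_path(path):
    if ".." in path or not SAFE_PATH.match(path):
        raise ValueError("rejected suspicious path: %r" % path)
    return path

validate_hdfs_path("/data/sales/2016.csv")   # accepted
# validate_hdfs_path("/data/../etc/passwd") # would raise ValueError
```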
• Data access & ownership: Role-based access is central to
most RDBMS and data warehouse security schemes, and
Hadoop is no different. Relational and quasi-relational
platforms include roles, groups, schemas, label security, and
various other facilities for limiting user access to subsets of
available data.
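The idea reduces to a simple pattern: users map to roles, and access checks consult the role rather than the individual user. A toy sketch (all names hypothetical):

```python
# Toy role-based access check: roles grant access to data subsets
# (schemas); users inherit whatever their role grants.
# All role, user, and schema names here are hypothetical.
ROLE_GRANTS = {
    "analyst": {"sales", "marketing"},
    "auditor": {"sales", "marketing", "finance"},
}
USER_ROLES = {"alice": "analyst", "bob": "auditor"}

def can_read(user, schema):
    role = USER_ROLES.get(user)
    return role is not None and schema in ROLE_GRANTS.get(role, set())

print(can_read("alice", "finance"))  # False: analysts lack finance
print(can_read("bob", "finance"))    # True: auditors are granted it
```

Tools in the Hadoop ecosystem that provide this kind of policy layer work on the same principle, just with richer policy models.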
• Data at rest protection: The standard for protecting data at
rest is encryption, which protects against attempts to access
data outside established application interfaces.
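In HDFS this is typically done with transparent encryption zones. Assuming a key management server (KMS) is already configured, setup looks roughly like the following; the key name and path are hypothetical:

```
# Create a key in the configured KMS, then mark a directory as an
# encryption zone; files written under it are encrypted at rest.
hadoop key create sales-key
hdfs dfs -mkdir /secure/sales
hdfs crypto -createZone -keyName sales-key -path /secure/sales
```

Encryption and decryption then happen transparently for authorized clients, while raw reads of the underlying blocks yield ciphertext.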
• Multi-tenancy: Hadoop is commonly used to serve multiple
applications and 'tenants', each of which may be from
different groups within one firm, or from altogether different
organizations.
Cluster Security

Unlike relational databases, which function like black boxes,
Hadoop exposes its innards to the network.
Inter-node communication, replication and other cluster
functions take place between many machines, using different
types of services. For the most effective protection, building
security into cluster operations is critical. This approach
leverages security tools built into – or third-party products
integrated into – the NoSQL cluster. This security is systemic
and built to be part of the base cluster architecture.
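For example, the inter-node traffic described above can be protected with built-in settings. The two properties below are standard Hadoop configuration; shown here as an illustrative fragment, not a complete hardening checklist:

```xml
<!-- core-site.xml: protect RPC between cluster nodes
     (authentication | integrity | privacy) -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>
<!-- hdfs-site.xml: encrypt the DataNode block-transfer channel -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
```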
Data Centric Security

Big data systems typically share data
from dozens of sources. Firms do not always know where
their data is going, or what security controls are in place once
it is stored, so they take steps to protect their data
regardless of where it is used. This model is called data-
centric security because the controls are part of the data, or
in some cases of the query-processing layer.
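A common data-centric control is tokenizing sensitive fields before the data ever reaches the cluster, so the control travels with the record. A minimal sketch, assuming a keyed hash is an acceptable token scheme for the use case (the secret and field names are hypothetical):

```python
import hashlib

# Tokenize a sensitive field with a keyed hash so the raw value never
# reaches the cluster, while equal inputs still join on equal tokens.
# The secret and the field names below are hypothetical.
SECRET = b"replace-with-a-managed-secret"

def tokenize(value):
    return hashlib.sha256(SECRET + value.encode()).hexdigest()[:16]

record = {"name": "Alice Smith", "ssn": "123-45-6789", "region": "EU"}
protected = dict(record, ssn=tokenize(record["ssn"]))
print(protected["ssn"] != record["ssn"])  # True: raw SSN is not stored
```

Because the same input always yields the same token, analytics such as joins and counts still work on the protected copy.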

Embedded Security

The number of security solutions
compatible with or even designed for Hadoop has been the
biggest change since 2012. All the major security pillars —
authentication, authorization, encryption, key management
and configuration management — are available and viable; in
many cases suitable open source options exist alongside
commercial ones.

These are some of the main security considerations for Hadoop environments.