WHAT IS DATA INGESTION?

Data ingestion is the process of obtaining and importing data for immediate use
or storage in a database. To ingest something is to "take something in or absorb
something."

Data can be streamed in real time or ingested in batches. When data is ingested in
real time, each data item is imported as soon as it is emitted by the source. When
data is ingested in batches, data items are imported in discrete chunks at periodic
intervals. An effective data ingestion process begins by prioritizing data sources,
validating individual files, and routing data items to the correct destination.
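
To make the two modes concrete, here is a minimal Python sketch of one ingestion
loop per mode, with per-record validation and routing. The record shape, the
validation rule, and the destination names are illustrative assumptions, not
features of any particular product.

    import json

    def validate(record):
        # Illustrative rule: every record must carry an id and a payload.
        return isinstance(record, dict) and "id" in record and "payload" in record

    def route(record):
        # Hypothetical routing: sensor records go to one store, everything
        # else to a default store.
        return "sensor_store" if record.get("type") == "sensor" else "default_store"

    def ingest_realtime(source):
        # Real-time mode: import each data item as the source emits it.
        for line in source:
            record = json.loads(line)
            if validate(record):
                yield route(record), record

    def ingest_batch(source, batch_size=100):
        # Batch mode: accumulate items and import them in discrete chunks;
        # in practice a scheduler would trigger this at periodic intervals.
        batch = []
        for line in source:
            record = json.loads(line)
            if validate(record):
                batch.append((route(record), record))
            if len(batch) >= batch_size:
                yield batch
                batch = []
        if batch:
            yield batch  # flush any remaining partial batch

    # Example: two JSON lines ingested in real time.
    lines = ['{"id": 1, "type": "sensor", "payload": 21.5}',
             '{"id": 2, "payload": "log entry"}']
    for destination, record in ingest_realtime(lines):
        print(destination, record)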

Big data ingestion is about moving data - especially unstructured data - from where
it originates into a system where it can be stored and analyzed, such as Hadoop.

Data ingestion may be continuous or asynchronous, real-time or batched, or both
(lambda architecture), depending upon the characteristics of the source and the
destination. In many scenarios, the source and the destination do not have the
same data timing, format, or protocol, and some type of transformation or
conversion is required before the data can be used by the destination system.
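
As an illustration of such a conversion step, the hypothetical Python sketch
below reshapes a CSV-style source record into the JSON form an assumed
destination expects, normalizing the timestamp along the way. The column and
field names are invented for the example.

    import csv
    import json
    from datetime import datetime, timezone

    def convert(row):
        # Assumed source columns: device, epoch_ms, value (CSV strings).
        # Assumed destination shape: JSON with an ISO-8601 UTC timestamp.
        return {
            "device_id": row["device"],
            "observed_at": datetime.fromtimestamp(
                int(row["epoch_ms"]) / 1000, tz=timezone.utc
            ).isoformat(),
            "value": float(row["value"]),
        }

    def transform(csv_lines):
        # Read CSV rows from the source and emit destination-ready JSON.
        for row in csv.DictReader(csv_lines):
            yield json.dumps(convert(row))

    # Example: one source row becomes one destination record.
    source = ["device,epoch_ms,value", "sensor-7,1700000000000,21.5"]
    for record in transform(source):
        print(record)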

As the number of IoT devices grows, both the volume and the variance of data
sources are expanding rapidly, and these sources now need to be accommodated,
often in real time.

TYPICAL PROBLEMS OF DATA INGESTION

Complex, Slow and Expensive

Purpose-built and over-engineered tools make big data ingestion complex,
time-consuming, and expensive.

Writing customized scripts and combining multiple products to acquire and ingest
data, as current big data ingest solutions demand, takes too long and prevents the
on-time decision making required of today’s business environment.

Command line interfaces for existing streaming data processing tools create
dependencies on developers and fetter access to data and decision making.

Security and Trust of Data

The need to share discrete bits of data is incompatible with current transport
layer data security capabilities, which limit access at the group or role level.

Adherence to compliance and data security regulations is difficult, complex, and
costly.

Verification of data access and usage is difficult and time-consuming, and often
involves a manual process of piecing together different systems and reports to
verify where data is sourced from, how it is used, who has used it, and how often.

Problems of Data Ingestion for IoT

It is difficult to balance the limited resources of power, computing, and
bandwidth against the volume of data signals being generated by big data
streaming sources.

Unreliable connectivity causes communication outages and data loss.

Lack of security on most of the world’s deployed sensors puts businesses and
safety at risk.
