( HGrid247 )
By Solechoel Arifin
Agenda
- The Importance of Data Engineering
- What is HGrid247?
- HGrid247 Features
- HGrid247 Implementation
The Importance of Data Engineering
1. Data engineers design and build pipelines that transform and transport data into a format wherein, by the time
it reaches the Data Scientists or other end users, it is in a highly usable state. These pipelines must take data
from many disparate sources and collect them into a single warehouse that represents the data uniformly as a
single source of truth. (Nathan Black - QuantHub)
2. Without data engineering, there would be no data as such, which would bring machine learning and AI to an
end, because these technologies use algorithms that require a lot of data to build. (DataEngi)
3. Data Engineering is the Backbone of Data Science. Data engineers are on the front lines of data strategy.
They are the first people to tackle the influx of structured and unstructured data that enters a company’s
systems. They are the foundation of any data strategy. Without Lego blocks, after all, you can’t build a Lego
castle. (DataQuest)
4. In F1, the driver would be useless without a whole range of engineers and mechanics. If your business only
has BI (Business Intelligence) and MI (Management Information) analysts or Data Scientists, you are asking
the driver to win an F1 race with a Morris Minor – you need a Data Engineer. (Holly Rourke - LinkedIn)
5. Data Engineering Is Critical to Drive Data and Analytics Success. Organizations have heavily invested in hiring
data scientists and business analysts, but without data engineers they struggle to curate a data pipeline or
move data to production. Data engineers make the appropriate data accessible and available to the right users
at the right time. (Gartner report 10/2020)
What is HGrid247?
HGrid247 is a multi-platform, drag-and-drop big data engineering
ETL tool for batch and stream processing.
HGrid247 helps users design a data engineering pipeline
(workflow) through a drag-and-drop interface.
From a workflow, HGrid247 can generate code that runs on the
following distributed massive data processing frameworks:
- Processing Components
- Data Sinks
- Other Features
HGrid247 Features
Data Sources
- File
- RDBMS
- Hive
- HBase
- Solr
- Kafka
Data Source : File
- Inner Join
- Left Join
- Right Join
- Outer Join
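The four join types listed above differ only in which unmatched rows they keep. The following is a minimal sketch in plain Python, not code generated by HGrid247; the function name `join_rows`, the row layout (dictionaries), and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Pair = Tuple[Optional[Row], Optional[Row]]


def join_rows(left: List[Row], right: List[Row], key: str, how: str = "inner") -> List[Pair]:
    """Join two lists of rows on `key` (hypothetical helper, for illustration only).

    Returns (left_row, right_row) pairs; the missing side of an unmatched
    row is None. `how` is one of "inner", "left", "right", "outer".
    """
    if how not in ("inner", "left", "right", "outer"):
        raise ValueError(f"unsupported join type: {how}")

    # Index the right side by key so each left row is matched in one lookup.
    right_index: Dict[Any, List[Row]] = defaultdict(list)
    for r in right:
        right_index[r[key]].append(r)

    matched_keys = set()
    result: List[Pair] = []
    for l in left:
        matches = right_index.get(l[key], [])
        if matches:
            matched_keys.add(l[key])
            result.extend((l, r) for r in matches)
        elif how in ("left", "outer"):
            # Left and outer joins keep left rows that have no match.
            result.append((l, None))

    if how in ("right", "outer"):
        # Right and outer joins keep right rows that have no match.
        for r in right:
            if r[key] not in matched_keys:
                result.append((None, r))
    return result


# Made-up sample data: customer 1 has no order, order 3 has no customer.
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Budi"}]
orders = [{"id": 2, "total": 50}, {"id": 3, "total": 75}]

for how in ("inner", "left", "right", "outer"):
    print(how, len(join_rows(customers, orders, "id", how)))
```

With this sample data the inner join yields one pair, the left and right joins yield two pairs each, and the outer join yields three.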
Processing Component : Reference Join
(join without shuffling)
- Inner Join
- Left Join
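One common way to join without shuffling is to copy a small reference table to every partition of the large dataset and match rows locally, so no row of the large dataset moves between partitions. The sketch below assumes that interpretation; the slides do not describe how HGrid247 implements Reference Join. The function `reference_join`, the partition layout, and the sample field names are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def reference_join(
    partitions: List[List[Row]],
    reference: List[Row],
    key: str,
    how: str = "inner",
) -> List[List[Row]]:
    """Join every partition against a small reference table, partition by partition.

    Assumes the reference table fits in memory and has unique keys.
    Only "inner" and "left" are supported, matching the two join types listed.
    """
    if how not in ("inner", "left"):
        raise ValueError(f"unsupported join type: {how}")

    # Build the lookup once; in a distributed engine this is the part
    # that would be sent to every worker instead of shuffling the big dataset.
    lookup = {ref[key]: ref for ref in reference}

    joined: List[List[Row]] = []
    for part in partitions:
        out: List[Row] = []
        for row in part:
            ref = lookup.get(row[key])
            if ref is not None:
                # Merge reference columns into the row; row values win on conflicts.
                out.append({**ref, **row})
            elif how == "left":
                # Left join keeps unmatched rows without reference columns.
                out.append(dict(row))
        joined.append(out)
    return joined


# Made-up sample data: two partitions of call records and a small cell lookup table.
call_partitions = [
    [{"cell_id": "A", "calls": 3}],
    [{"cell_id": "B", "calls": 5}, {"cell_id": "Z", "calls": 1}],
]
cells = [{"cell_id": "A", "city": "Jakarta"}, {"cell_id": "B", "city": "Bandung"}]

print(reference_join(call_partitions, cells, "cell_id", how="left"))
```

The output keeps the same number of partitions as the input, which is the point of the technique: each partition is processed where it already lives.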
Processing Component : Merging
HGrid247 Implementation
- Telecommunication Industry
- Government
- Banking
- Educational Institutions
Thank You