Research Sponsored by Equalum
Eckerson Group provides research, consulting, and education services to help organizations
get more value from their data. Our experts each have 25 years of experience in the field,
specializing in business intelligence, data architecture, data governance, analytics, and data
management. We provide organizations with expert guidance during every step of their data
and analytics journey. Get more value from your data. Put an expert on your side. Learn what
Eckerson Group can do for you!
Table of Contents
Executive Summary
Introduction
Why Customize?
Why Standardize?
Adoption Patterns
About Equalum
Executive Summary
Enterprises use both standard and custom data pipelines to process raw data into
analytics-ready data sets. They must apply the right mix, creating efficiency and scale via
standardization and automation where possible, but still accommodating customization in
order to innovate. But many get the balance wrong and standardize too slowly, limiting the
value they derive from analytics.
This report examines the impact of standardization and customization on data pipelines,
with a focus on design, building, testing, rollout, operations, and adaptation. It seeks to help
architects, data engineers, application developers, and data scientists strike the right balance
in their environments. Enterprises can start by standardizing overly customized data pipelines
and demonstrating clear ROI with bite-size projects. As they standardize more and regain
the right balance, they will drive data democratization, increased productivity, and reduced
risk. As they standardize, enterprises can also free up and redirect resources to custom work,
fostering innovation and increasing analytics value.
Introduction
To visualize the strain on modern data pipelines, stand in the middle of a seesaw and try
to balance it while friends pile on either side. You’ll need to make some fast and tricky
calculations to avoid hitting the dirt.
Enterprise data teams need to strike a similar balance with the pipelines that process data for
analytics. On one hand, they must standardize where possible, using automation to handle
high data volumes and varieties. On the other hand, they must use hand-coded programs and
scripts to customize and support the specialized demands of advanced analytics. Swing too
far one way, and you cannot scale. Swing the other way, and you cannot innovate.
This report seeks to help data analytics leaders and their teams strike the right balance,
standardizing where efficiency matters and customizing where new ideas matter. One of the
key lessons: just like balancing a seesaw, these opposing forces need each other.
Data pipeline. A data pipeline comprises the tasks and processes, including ingestion, transformation, and preparation, that convert raw data into analytics-ready data sets.
Enterprises create and manage many pipelines, each comprising a distinct combination of
data sources, transformation tasks, and targets in order to support various analytics use cases
required by the business.
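To make the concept concrete, the minimal sketch below expresses one such pipeline as three functions, assuming a relational source, a warehouse target, and pandas for transformation; the connection strings, table names, and columns are illustrative only, not a prescribed implementation.

```python
# A minimal sketch of a single data pipeline: ingest raw data, transform it,
# and load an analytics-ready data set. All names shown are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

def ingest(source_url: str, query: str) -> pd.DataFrame:
    """Pull raw records from a source system."""
    return pd.read_sql(query, create_engine(source_url))

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Standardize types, drop duplicates, and derive an analytics-ready view."""
    clean = raw.drop_duplicates(subset=["order_id"])
    clean["order_date"] = pd.to_datetime(clean["order_date"])
    clean["revenue"] = clean["quantity"] * clean["unit_price"]
    return clean

def load(prepared: pd.DataFrame, target_url: str, table: str) -> None:
    """Deliver the prepared data set to the analytics target."""
    prepared.to_sql(table, create_engine(target_url), if_exists="append", index=False)

# One run of the pipeline: raw data in, analytics-ready data out.
load(transform(ingest("postgresql://src/sales", "SELECT * FROM orders")),
     "postgresql://dw/analytics", "fact_orders")
```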
Custom approach. “Custom” tasks and processes apply to one or a few specialized pipelines.
Data engineers, data scientists, and application developers can spend significant time creating
and refining custom pipeline components—related to transformation in particular—to address
specialized requirements. They develop and test custom program code, as well as scripts to
manage that program’s tasks. They insert those programs into production data pipelines, then
manage, monitor, and update the resulting custom data pipeline to fix errors, address new
requirements, and so on.
Standard approach. At the other end of the spectrum, “standard” tasks and processes
apply to many data pipelines with little or no adaptation. For example, many pipeline tools
provide an automated graphical interface for users to configure, execute, and monitor basic
extraction, transformation, and loading (ETL) tasks. They apply roughly the same sequence of
drag-and-drop steps to multiple pipelines without requiring the underlying software code to
be manually rewritten or changed.
Hybrid approach. Open source communities and commercial vendors alike now offer
extensive libraries of modular code, including machine learning and other advanced
algorithms that enterprises can plug into their data pipelines. Depending on the amount
of adaptation and customization involved, these modular libraries can help data analytics
teams address specialized requirements with less work than would otherwise be necessary.
The amount of work required determines whether such hybrid approaches lean toward the
“standard” or “custom” end of the spectrum. Templates constitute another hybrid approach,
enabling data teams to reuse custom code and effectively standardize it across multiple
pipelines.
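The sketch below illustrates the template idea under simple assumptions: a piece of custom transformation logic is written once, wrapped in a parameterized template, and reused across several pipelines. The class, sources, and targets shown are hypothetical.

```python
# A minimal sketch of a pipeline "template": custom logic written once,
# then parameterized and reused across many pipelines.
from dataclasses import dataclass
from typing import Callable
import pandas as pd

@dataclass
class PipelineTemplate:
    source: str                                         # connection string or topic
    target: str                                         # warehouse table or lake path
    transform: Callable[[pd.DataFrame], pd.DataFrame]   # reusable custom step

    def run(self, extract: Callable[[str], pd.DataFrame],
            load: Callable[[pd.DataFrame, str], None]) -> None:
        load(self.transform(extract(self.source)), self.target)

# The same template instantiated for two different pipelines.
dedupe = lambda df: df.drop_duplicates()
orders_pipeline  = PipelineTemplate("postgresql://src/orders",  "dw.fact_orders",  dedupe)
returns_pipeline = PipelineTemplate("postgresql://src/returns", "dw.fact_returns", dedupe)
```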
Why Customize?
Enterprises must frequently take a custom approach to data pipelines in order to support
innovative use cases, which have only multiplied in the wake of the COVID-19 pandemic. They
must answer creative questions, often on a real-time basis. For example:
• How can we continually monitor the resulting revenue and use those findings to
re-calibrate prices?
Architects, data engineers, and application developers collaborate with data scientists to help
the business address such innovative questions with custom ingestion, transformation, and
analytics programs.
Why Standardize?
Enterprises standardize to improve the operational efficiency of data pipelines so they can
deliver timely and well-structured data to BI analysts, business managers, and data scientists.
Data teams identify modular tasks and processes that can be applied to many pipelines using
GUI tools and templates. Standard data delivery increases the productivity of IT and business
stakeholders throughout the enterprise.
Well-run enterprises standardize some existing tasks over time to manage rising data
supply and rising data demand at scale. Hybrid approaches help drive the evolution toward
standardization for many data pipelines. Data teams develop and deploy custom code, then
standardize the repeated, scalable, and tactical aspects later with GUI-based tools and
templates.
Figure 1 compares these approaches and illustrates the evolution of processes over time. Standardizing repeated tasks frees up time to build custom solutions. Those custom solutions, meanwhile,
often result in pipeline tasks and processes that can be standardized over time to improve
efficiency and scale. Table 1 compares the goals, methods, and mutual benefits of these two
approaches to data pipelines.
• Architects, as their name implies, architect data pipelines, creating guidelines to achieve quality and consistency in design and development.
• Architects and data engineers design and build pipelines, often with the help of
data scientists. They identify users, define use cases, and inventory data sources.
They apply design patterns such as ETL or streaming to manage the ingestion,
processing, and delivery of data to support those use cases. Application
developers might also help design and build pipelines that integrate with
operational applications.
• Data engineers operate and adapt pipelines. They maintain, monitor, and tune
various pipeline components, including sources, targets, and processors, as well
as their interconnections. In addition, they add and remove these components to adapt pipelines as requirements change.
• Analysts and data scientists consume the data emanating from the data
pipelines and provide feedback on data quality, service-level agreements, and
so on to assist architects and engineers with ongoing architecture, design, and
adaptations.
• Architects and data engineers also help data stewards and other governance
managers create policies and controls, which data engineers then implement.
Data stewards monitor data usage and enforce those policies to maintain data
quality.
Figure 2 illustrates the stages and contributions of various roles to this iterative and
collaborative process.
Enterprise data teams can most easily standardize the design/build and operate/adapt stages
of the lifecycle.
Siloed data. Although this is hardly new or surprising, the problem of data silos persists in
enterprise environments thanks to years of accumulated one-off architectural decisions. Data
teams in IT departments or lines of business steadily adopt new data sources and analytics
targets to address custom use cases. This creates a sprawling patchwork of poorly integrated
data types, formats, and platforms that delay or prevent standardization efforts.
Enterprise data teams also accumulate multiple data pipeline and analytics tools, many
of which address only narrow parts of their environments. Technical limitations on tool
compatibility, and the preferences of organizational fiefdoms, limit the ability of enterprise
data teams to centralize tools. Standardization suffers.
Resistance to change. Data teams and enterprises resist change for many familiar reasons.
Specialized data engineers and application developers might view their backlog of manual
programming and scripting requests, however tedious, as a form of job security. Stakeholders
of all types—data engineers, analysts, and data scientists in particular—do not have much
spare time to learn new processes or tools. The status quo remains a comfort zone.
Siloed data and tools, budget constraints for new tools, and staff resistance to change cause enterprise data
teams to favor existing custom tasks and processes over standardization.
Adoption Patterns
Enterprises pursue numerous strategic initiatives to democratize data usage and drive digital
transformation. These inter-related initiatives, including real-time streaming, cloud migration,
data modernization, self-service, and advanced analytics, drive demand for both standard
and custom data pipelines.
Real-time streaming. Enterprises replace legacy batch data ingestion and processing with
real-time streaming mechanisms such as change data capture (CDC) and message streaming.
Real-time streaming enables new insights and improves efficiency by eliminating repeated
batch copies of unchanged data.
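As a rough illustration of the pattern, the sketch below consumes CDC events from a message stream so that only changed rows reach the analytics target; the Kafka topic, broker address, event fields, and loader helpers are hypothetical, not tied to any particular product.

```python
# A minimal sketch of change data capture (CDC) consumption, assuming change
# events are published to a Kafka topic as JSON documents.
import json
from kafka import KafkaConsumer  # kafka-python

def upsert_into_target(table: str, row: dict) -> None:
    """Hypothetical loader that inserts or updates a row in the analytics target."""
    ...

def delete_from_target(table: str, row: dict) -> None:
    """Hypothetical loader that removes a row from the analytics target."""
    ...

consumer = KafkaConsumer(
    "orders.cdc",
    bootstrap_servers="broker:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

# Only changed rows arrive, so the target stays current without re-copying
# unchanged data in nightly batch loads.
for event in consumer:
    change = event.value
    if change["op"] in ("insert", "update"):
        upsert_into_target(change["table"], change["after"])
    elif change["op"] == "delete":
        delete_from_target(change["table"], change["before"])
```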
Cloud migration. Data teams complement and replace on-premises platforms such as
databases, data warehouses, and data lakes with infrastructure-as-a-service (IaaS) offerings
from cloud service providers. IaaS platforms create financial and operational flexibility.
Data scientists help design pipelines that support advanced analytics use cases such as machine learning. Application developers also help integrate operationally focused data pipelines with business applications where appropriate.
Let’s examine how custom, standard, and hybrid approaches shape each of these steps,
as well as the resulting impact on operating costs in the form of time, effort, and software
licensing.
1. Design. Once architects and data engineers identify target users and use cases,
they define the product—i.e., the output of their pipelines—in terms of data
types, formats, latency, and so on. Then they inventory the necessary data
inputs spanning both internal and external sources. Outputs and inputs, in
turn, drive the design of the “middle”—including data acquisition, ingestion,
transformation, and preparation—which becomes the beating heart of the
data pipeline. Architects and data engineers must select their design pattern
carefully—batch ETL, streaming, virtualization, and so on—to ensure they deliver
correctly structured data in a timely fashion. They must also carefully design
the transformation tasks required to improve, enrich, and format data. Many
enterprises adopt data streaming in order to improve efficiency and enable real-
time analytics.
Data teams should seek to standardize where possible during the design phase.
Do they really need to hand-craft data ingestion and transformation with custom
programs and scripts? In many cases, the answer is no. Commercial, automated
pipeline tools address the most common data types, sources, targets, and
transformation tasks, creating efficiency benefits that justify their licensing costs
over time. Data teams should seek budget approval and start evaluating pipeline
tools. Key evaluation criteria include ease of use, breadth of source and target
support, and support for real-time data streaming.
You can’t standardize everything, of course. Unusual data sources might require
custom integration, for example, and advanced algorithms such as machine
learning often require specialized transformation processes. Data engineers
should scope the custom programs and scripts they will need to develop, and
find “hybrid” tools such as templates and code libraries that can help them along
the way.
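One way to capture these design decisions is as declarative configuration, as in the hypothetical sketch below: every step a standard, GUI- or config-driven tool can handle is simply declared, and the one genuinely specialized step is flagged for custom code. The pipeline name, sources, and options are illustrative.

```python
# A minimal sketch of a design-phase decision record expressed as configuration.
# Standard connectors and transformations are declared; only the specialized
# step requires hand-written code. All names shown are hypothetical.
pipeline_design = {
    "name": "customer_360_daily",
    "pattern": "batch_etl",            # alternatives: "streaming", "virtualization"
    "sources": [
        {"type": "postgres", "object": "crm.customers"},       # standard connector
        {"type": "s3_csv",   "object": "weblogs/*.csv"},        # standard connector
    ],
    "transformations": [
        {"step": "deduplicate", "keys": ["customer_id"]},                 # standard
        {"step": "type_cast",   "column": "signup_date", "to": "date"},  # standard
        {"step": "custom",      "module": "churn_features"},   # needs custom code
    ],
    "target": {"type": "warehouse", "table": "analytics.customer_360"},
    "latency": "daily",
}
```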
2. Build. Next, data engineers tackle the hard work of building data pipelines. These
pipelines must connect to source databases, IoT sensors, social media feeds,
and so on, then ingest, transform, and deliver the data on a periodic, scheduled,
or real-time basis to an analytics target such as a data warehouse, data lake,
or NoSQL platform. They might need to incorporate often-changing source
schemas into their data targets, gather metadata, and monitor data lineage.
They might also need to schedule jobs, execute workloads, coordinate task
interdependencies, and monitor execution status on a real-time basis.
When custom code is required, data engineers, data scientists, and application
developers need to ensure their custom components integrate with the rest
of the pipeline. For example, they might use open APIs that maintain future
integration options.
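The sketch below shows one way such integration might look: the custom logic implements a small, documented interface so a standard runner can execute it alongside built-in steps and swap it out later. The interface, class, and column names are hypothetical.

```python
# A minimal sketch of a custom component plugging into an otherwise standard
# pipeline through a small, stable interface.
from abc import ABC, abstractmethod
import pandas as pd

class Transform(ABC):
    """Contract the standard pipeline runner expects every step to honor."""
    @abstractmethod
    def apply(self, df: pd.DataFrame) -> pd.DataFrame: ...

class SessionizeClickstream(Transform):
    """Custom, hand-written logic for a specialized analytics requirement."""
    def __init__(self, gap_minutes: int = 30):
        self.gap = pd.Timedelta(minutes=gap_minutes)

    def apply(self, df: pd.DataFrame) -> pd.DataFrame:
        df = df.sort_values(["user_id", "event_time"])
        new_session = df.groupby("user_id")["event_time"].diff() > self.gap
        df["session_id"] = new_session.groupby(df["user_id"]).cumsum()
        return df

def run_steps(df: pd.DataFrame, steps: list[Transform]) -> pd.DataFrame:
    """The standard runner treats custom and built-in steps identically."""
    for step in steps:
        df = step.apply(df)
    return df
```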
4. Roll out. With upfront testing complete, data engineers, data scientists, and
application developers can put their data pipelines into production. They
package, stage, and release custom code, all while keeping a close eye on their
full environments to spot unexpected cascading changes or surprises. They
should employ continuous delivery best practices, meaning that code should
be “releasable” throughout its development in order to remove risk from the
rollout. They also might employ DevOps best practices to streamline rollouts with
improved communication and collaboration between parties.
Standard pipelines that use automated, GUI-driven tools accelerate rollouts and
reduce risk by reducing configuration and execution errors. Hybrid tools such as
templates and code libraries similarly reduce errors and therefore help streamline
rollouts, although to a lesser degree.
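As a rough illustration, the sketch below shows the kind of automated check that keeps pipeline code "releasable" before every rollout; it runs under pytest in any continuous delivery step. The transformation under test and its columns are hypothetical.

```python
# A minimal sketch of a pre-rollout check for a pipeline transformation.
import pandas as pd

def dedupe_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical transformation: keep only the latest record per order."""
    return df.drop_duplicates(subset=["order_id"], keep="last")

def test_dedupe_orders_keeps_latest_record():
    raw = pd.DataFrame({
        "order_id": [1, 1, 2],
        "status":   ["created", "shipped", "created"],
    })
    result = dedupe_orders(raw)
    assert len(result) == 2
    assert result.loc[result["order_id"] == 1, "status"].item() == "shipped"
```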
5. Operate and adapt. Like living organisms, data pipelines need ongoing support
in the form of continuous monitoring and enhancement. Cross-functional teams of data engineers, analysts, and data scientists share responsibility for this ongoing work.
Once again, automated pipeline tools greatly improve speed and flexibility.
Data teams can change many data sources or targets, or reconfigure basic
transformation tasks, with a few clicks. Given the interdependencies involved
in modern environments, they should carefully model, predict, and assess
the cascading impacts of such changes on standard and especially custom
components.
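The sketch below illustrates one simple way to reason about those cascading impacts: walk a dependency graph of pipeline components to list everything downstream of a proposed change. The graph contents are hypothetical.

```python
# A minimal sketch of impact analysis before changing a shared pipeline component.
from collections import deque

# component -> components that consume its output (illustrative entries)
dependencies = {
    "crm_source":         ["customer_ingest"],
    "customer_ingest":    ["dedupe_step", "cdc_stream"],
    "dedupe_step":        ["customer_360_table"],
    "cdc_stream":         ["realtime_dashboard"],
    "customer_360_table": ["churn_model", "bi_reports"],
}

def downstream_impact(changed: str) -> set[str]:
    """Return every component that could be affected by changing `changed`."""
    impacted, queue = set(), deque([changed])
    while queue:
        for consumer in dependencies.get(queue.popleft(), []):
            if consumer not in impacted:
                impacted.add(consumer)
                queue.append(consumer)
    return impacted

# Example: what does swapping the CRM source connector put at risk?
print(sorted(downstream_impact("crm_source")))
```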
Figure 4 compares the cumulative operating costs of standard, custom, and hybrid
approaches to data pipelines.
Although standard data pipeline approaches carry higher design and building costs due to software licensing,
their efficiency benefits drive costs lower than custom and hybrid approaches during subsequent stages in the
data pipeline lifecycle.
Data democratization. Standardized, automated tools for accessing and analyzing data
enable more business-oriented managers and analysts to make data-driven decisions. These
standard tools, meanwhile, free up data engineers to build custom data pipelines and support
more requests for advanced analytics.
Productivity. Standard and hybrid data pipeline approaches improve output per unit of input
on several dimensions. Data engineers build and operate more data pipelines in less time.
Developers integrate pipelines with business applications more easily. Business managers and
analysts answer their questions faster.
Reduced risk. Balanced data pipeline approaches also reduce operational risk. Business
managers make better decisions because they have more data at their disposal. Data teams
commit to analytics projects with new confidence they will meet deadlines, budgets, and
SLAs.
Higher analytics value. Business managers and leaders gain a higher return on investment
in their data assets. They create more and better insights, faster, improving both the top and
bottom line.
Effectively balancing standard, custom, and hybrid approaches to data pipelines yields higher analytics value,
innovation, data democratization, productivity, and lower risk.
• Focus on ROI. Start by standardizing pipelines that offer low risk and high return
in a short time frame, even if they are smaller in scale. Your initiative will need
to demonstrate a clear ROI out of the gate in order to gain approval for larger
projects. You can achieve this by selecting a cost-effective, highly automated,
easily learned tool that can replace one or more existing tools and their
associated maintenance fees.
And finally … plan for growth! Expect demand for your team’s services to continue to rise,
driving the need for continued rebalancing as we enter the post-COVID-19 world.
• Eckerson Research publishes insights so you and your team can stay abreast
of the latest tools, techniques, and technologies in the field.
• Eckerson Education keeps your data analytics team current on the latest
developments in the field through three- and six-hour workshops and public
seminars.
Unlike other firms, Eckerson Group focuses solely on data analytics. Our veteran
practitioners each have more than 25 years of experience in the field. They specialize in
every facet of data analytics—from data architecture and data governance to business
intelligence and artificial intelligence. Their primary mission is to share their hard-won
lessons with you.
Our clients say we are hard-working, insightful, and humble. We take the compliment! It all
stems from our love of data and desire to serve—we see ourselves as a bunch of continuous
learners, interpreting the world of data for you and others.
About Equalum
Develop and operationalize your batch and streaming pipelines with infinite
scalability and speed with Equalum
Traditional Change Data Capture and ETL processes and tools cannot adequately perform
under the pressure of modern data volumes and velocities. The strain on legacy data
systems leads to data latency, broken pipelines, and stale data used for business analytics
and daily operations. Equalum built the industry’s most scalable and comprehensive data
ingestion platform, combining streaming Change Data Capture with modern data
transformation capabilities. While real-time ingestion and integration are a core strength of
the platform, Equalum also supports high scale batch processing.
Equalum supports both structured and semi-structured data formats, and can run on premises, in public clouds, or in hybrid environments. Equalum's library of optimized CDC connectors is one of the largest in the world, and more are developed and rolled out on a continuous basis, largely based on customer demand. Equalum's multi-modal approach to data ingestion can power a multitude of use cases, including CDC data replication, CDC ETL ingestion, batch ingestion, and more.
Equalum also leverages open source data frameworks by orchestrating Apache Spark, Kafka, and others under the hood. The platform's easy-to-use, drag-and-drop UI eliminates IT productivity bottlenecks with rapid deployment and simple data pipeline setup. The platform's comprehensive data monitoring eliminates the need for endless DIY patch fixes to broken pipelines and for challenging management of open source frameworks, empowering the user with immediate system diagnostics, solution options, and visibility into data integrity.
Headquartered in Silicon Valley and Tel Aviv, Equalum is proud to work with some of the world's top industrial, financial, and media enterprises.