Guide to Modern
Analytics
A step-by-step process to faster insights,
democratizing data and data-driven innovation
Table of Contents
Introduction
Step 2: Grow your data team and establish data governance standards
We live in a world saturated with data. The three Vs of big data – variety, volume and velocity
– continue to grow as business operations increasingly take place in and depend on the cloud.
In a highly competitive, globalized and interconnected world, it is increasingly essential for
organizations to be able to turn data into opportunities.
Broadly speaking, organizations can use insights from data to improve the following:
1. Customer experiences
2. Internal processes and operations
3. Products and features
A modern data operation can make valuable insights available to everyone in your organization.
Data-literate leadership means data-driven improvements become important keystones of
your strategy and set the high-level direction for your organization. For individual contributors,
broad-based data literacy means an opportunity to personally make impactful decisions.
Data lakes are a more general, usually cheaper data storage solution that stores all data as
files. As a result, data lakes can accommodate both unstructured and structured data but run a
danger of becoming cluttered, poorly documented and “murky,” i.e. data swamps.
For the purposes of this guide, we assume that data warehouses are the destination. Very
complex use cases, such as those involving machine learning applications that are trained using
documents and media files, may combine the functionality of data lakes and data warehouses.
Without the help of modern off-the-shelf solutions, the conventional approach to data
operations is engineering-heavy and involves roles with highly specialized competencies that are
difficult to fill. Off-the-shelf tools allow you to circumvent the cost and complication of building
a data integration platform from scratch.
“Working with terabytes of data, making sense of it and putting in the correct
context is one of the most important things we can do. This allows employees
from across the organization to create valuable insights into how we work,
engage with partners and deliver services across our customer base. Every
decision we make is driven by data.”
– Lucianne Millan, Senior Manager of Data Engineering, WeWork
Data governance is a wide-reaching term that includes ensuring data quality and integrity,
providing a taxonomy to define key business metrics and standardizing metrics and practices
across different business units. It also concerns properly assigning permissions to ensure
consistent and controlled access to data, including for the purposes of regulatory compliance.
An essential part of data governance is to take inventory of all data assets. The best time to do
so is when you’re moving to a new data stack for the first time, as you should be reviewing all
of your data assets anyway.
Treat reports, dashboards and visualizations as discrete products in their own right. Analytics
projects benefit from product thinking, especially in the guise of agile methodology. A keen
understanding of users and quick project cycles to produce minimum viable products (MVP) will
enable your analytics team to produce results quickly, pivot as needed and reduce the risk of
wasted work. It is usually a good idea to begin with a relatively low effort, high impact use case.
Sales, marketing and product analytics are obvious candidates.
Starter KPIs
The following are important early metrics:
1. Interpret visualizations, dashboards and reports in order to make decisions. Senior (especially
C-suite) leadership and junior contributors alike should be able to read and intelligently act
on what analysts create.
2. Construct new data models from existing raw data. This is the bread and butter of analysts
as well as data literate individual contributors.
3. Explore new sources; construct predictive models. This is the purview of analysts and
data scientists.
Training will be essential to evangelizing the importance and benefits of data literacy as well as
creating a common language for learning.
As your organization continues to operationalize its data, ensure that as many data operations
as possible are automated. Make sure the technologies you use to support your data
operations can support automated reporting, quality control of data and the ability to push data
back into operational systems.
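Automated quality control of data can be as simple as flagging fields whose null rate exceeds a threshold before the data reaches reports. The following is a minimal sketch; the field names, records and threshold are hypothetical.

```python
# A minimal sketch of automated quality control on a loaded batch of
# records. Field names, records and the threshold are hypothetical.

def check_quality(rows, required_fields, max_null_rate=0.05):
    """Flag required fields whose null rate exceeds the threshold."""
    issues = []
    for field in required_fields:
        nulls = sum(1 for r in rows if r.get(field) is None)
        rate = nulls / len(rows) if rows else 0.0
        if rate > max_null_rate:
            issues.append(f"{field}: {rate:.0%} null values")
    return issues

batch = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 25.00},
]
problems = check_quality(batch, ["order_id", "amount"])
```

A check like this can run on a schedule after each load, alerting the data team before a bad batch reaches dashboards.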
Business applications of machine learning include revenue and profit projections, predictive
modeling to describe likely outcomes and tradeoffs of major decisions, recommendation systems
for customers and every manner of business process automation. The most sophisticated
applications of machine learning are products in their own right, such as self-driving cars.
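As a toy illustration of the revenue-projection use case above, here is an ordinary least-squares trend line fit in pure Python; the quarterly revenue figures are invented for the example.

```python
# Fit a least-squares trend line y = a*x + b to past revenue and
# project one period ahead. The revenue figures are invented.

def fit_trend(values):
    """Return (slope, intercept) of the least-squares line."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

revenue = [100.0, 110.0, 120.0, 130.0]   # four past quarters
a, b = fit_trend(revenue)
next_quarter = a * len(revenue) + b       # project one period ahead
```

Real predictive models account for seasonality and uncertainty, but the principle is the same: learn a pattern from historical data and extrapolate it.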
Organizations aren’t born with mature data operations and a data-driven mandate from
leadership. So how do you know if you are ready to make a change and what do you do?
Manual reporting is time- and labor-intensive, and it becomes a recipe for your analysts to drop
the ball in countless ways.
You will need to consider a more sustainable, longer-term strategy to avoid unnecessary work.
“The Guardian has a huge number of different products with different data
assets, even in different clouds, all with different formats, different standards.
We realized that we really needed to build a holistic view of our consumer base.”
– Jonathan Rankin, Senior Product Manager, The Guardian
The downside is that, by including both extraction and transformation in the data pipeline, ETL
is fundamentally brittle. Any change to either upstream data sources or downstream analytics
models requires a rebuild of the entire data pipeline. It is a complicated, engineering-intensive
process with lengthy project turnaround.
By contrast, the extract-load-transform (ELT) approach to data integration moves all data from
source to destination first, allowing transformations to be performed within a data warehouse
environment. This means extraction and loading simply replicates all available data to the
destination, without consideration for downstream analytics models.
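The ELT pattern described above can be sketched in a few lines. This example uses SQLite as a stand-in for a cloud data warehouse; the source records, table and column names are hypothetical. Note that extraction and loading copy the raw data verbatim, and the transformation happens afterward, inside the warehouse, in SQL.

```python
# A minimal sketch of ELT: load raw data first, transform in the
# warehouse afterward. SQLite stands in for a cloud warehouse;
# records and table names are hypothetical.
import sqlite3

source_records = [  # "extract": rows pulled from an operational system
    ("2024-01-05", "widget", 3),
    ("2024-01-06", "widget", 5),
    ("2024-01-06", "gadget", 2),
]

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE raw_orders (order_date TEXT, product TEXT, qty INTEGER)"
)
warehouse.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)", source_records  # "load"
)

# "transform": build an analytics model from the raw table with SQL
warehouse.execute("""
    CREATE TABLE units_by_product AS
    SELECT product, SUM(qty) AS total_units
    FROM raw_orders GROUP BY product
""")
totals = dict(warehouse.execute("SELECT product, total_units FROM units_by_product"))
```

Because the raw table is loaded unchanged, a new analytics model is just another `CREATE TABLE ... AS SELECT`, with no change to the extraction and loading steps.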
Reporting is underutilized
Your analysts may produce plenty of data assets such as dashboards and reports, but people in
your organization might not use them regularly. This can mean that people who are responsible
for knowing important metrics might struggle to answer relatively simple questions about
trends and summary findings, especially at more granular levels.
This is usually the result of a lack of awareness, distrust in the data’s provenance, unclear
ownership or a general lack of data literacy.
Data utilization
The people in your organization should be using the aforementioned resources to inform and
justify decisions. This habit should be commonly practiced by leaders and individual contributors
alike. A good metric is the percentage of your organization that consults your BI tool at least
weekly. Aim for adoption rates over 50 percent.
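The weekly-adoption KPI above is straightforward to compute from BI tool usage logs. In this sketch, the usage log and headcount are invented; assume each entry names a user who opened the BI tool during the week.

```python
# Compute the share of the organization that consulted the BI tool
# in a given week. The usage log and headcount are invented.

def weekly_adoption_rate(weekly_active_users, total_employees):
    """Fraction of employees who used the BI tool this week."""
    return len(set(weekly_active_users)) / total_employees

active = ["ana", "ben", "carla", "ana", "dev", "emi", "frank"]
rate = weekly_adoption_rate(active, 10)
meets_target = rate > 0.5   # target from the guide: over 50 percent
```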
We will now cover, in detail, the steps required to ascend the hierarchy of data needs
outlined earlier.
As a general rule, you are looking for tools that are easy to use and have a lot of out-of-the-
box functionality. The tools you choose should almost obviate the need for your organization to
perform complicated data engineering work, at least at first. Each tool should also complement
the others in the ecosystem.
Recall that a modern data stack includes the following tools and technology:
• Data pipeline
• Destination
• Transformation tool
• Business intelligence platform
Select these tools and technologies carefully. You want them all to be cloud-based and
compatible with each other. The following are some more specific selection criteria for each
type of tool.
Modern tools tend to offer free trials. You should be able to rapidly assemble a proof of concept,
create an MVP and demonstrate its efficacy for stakeholders.
Data analysts are the bread and butter of any data team. Their main responsibilities are to
produce visualizations, dashboards and reports using business intelligence platforms. They are
usually well-versed in SQL and may also know a programming language like Python, R or Java. It’s
important to have intellectual and experiential diversity among analysts, especially people with
valuable domain knowledge in certain business functions, industries and other concerns.
With the help of a modern data stack and automation, your data team should be able to get by
with only analysts for quite some time.
Without the aid of a modern data stack, it is necessary to maintain a large team of data
engineers to build your own bespoke data stack. This can be extremely costly and time-intensive.
In addition to the main analytics roles, roles that emphasize coordination will become important
as your data operations mature. Data product managers perform the same role as other
product managers, but for data- and analytics-related assets. They are responsible for defining
what to build and guiding analysts through the creation of data assets. Last but certainly not
least, you will need to designate a data governor to own the task of data governance.
Organizational design
As your data needs grow, you will need organization and a clear division of labor. For
accountability purposes, data assets and workflows also need clear owners.
There are arguments in favor of both centralizing and decentralizing data teams, but a good
compromise is both, in the guise of a hub-and-spoke model.
To avert the problem of untracked, poorly documented data assets, consider a cloud data catalog tool. It will help you take the
following actions:
• Document all models, tables and fields. This may be impractical if you have many data
sources; an alternative is to carefully build a dimensional schema, which is a simplified data
model that encompasses all major operations.
• Determine what metrics you need and where they come from.
• Make note of how frequently you need to refresh the data.
• Plan to address any data integrity issues.
• Identify the true data owners for the various models within the organization.
• Assign ownership and create incentives to keep the system healthy.
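The inventory a data catalog maintains can be pictured as a structured record per data asset. The fields in this sketch mirror the checklist above (documentation, refresh cadence, ownership); the example table and owner are hypothetical.

```python
# A minimal sketch of a data catalog entry. Field choices mirror the
# governance checklist; the example table and owner are hypothetical.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    table: str
    description: str
    owner: str                 # accountable person or team
    refresh_frequency: str     # e.g. "hourly", "daily"
    source_system: str
    known_issues: list = field(default_factory=list)

entry = CatalogEntry(
    table="analytics.units_by_product",
    description="Total units sold per product, all channels.",
    owner="sales-analytics team",
    refresh_frequency="daily",
    source_system="orders database",
)
```

Commercial catalog tools add search, lineage and access control on top, but every entry ultimately answers the same questions: what is this data, where does it come from, how fresh is it and who owns it.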
The best time to make this effort is as you start fully implementing your modern data stack, as
you will need to take inventory of all data assets anyway.
Get data governance under control early in order to build trust. Without a clear provenance for
every data model, it will be difficult for the end users of your data to make sense of how metrics
are determined and resolve conflicting narratives.
Identify
• Understanding users
• Gathering requirements
Design
• Defining scope
• Managing expectations
Develop
• Rapid prototyping
• Productionizing
Launch
• Marketing and rolling out the product
• Training users via office hours and internal communications
• Driving adoption, including through self-service whenever possible
Assess
• Evaluating against expectations and KPIs
You will need to build a roadmap regarding how these data assets will improve decision-making
at your organization. List the milestones and end goals, as well as the information and insights
necessary to achieve them. Your roadmap is something you can bring to your leadership with the
possibility of being incorporated into your organization’s strategy.
It can be useful to think of data literacy in terms of the Diffusion of Innovations Theory.
Adoption follows an S-shaped curve: the leading 2.5 percent of people are innovators and
trailblazers; another 13.5 percent are early adopters; the next 34 percent are the early majority. The key
challenge to getting started is to find an enthusiastic and, ideally, technically-savvy person in
a position of influence to evangelize to their team on your behalf. As teams become more data
literate and more capable, your efforts should eventually snowball and gain momentum of
their own.
For instance, you could have a member of your data team ask an applicant to interpret a chart
like the following:

[Chart: monthly tourist visits to Bali over roughly a decade]

There are two obvious takeaways from this chart. One is that tourist visits to Bali are strongly
seasonal, with yearly peaks in January, which is not surprising given Bali’s proximity to the
equator. Another is that tourism has grown steadily over the course of a decade.
Programmatic control of your data pipeline also supports internal analytics: it can establish
reproducibility and trust in your data models through a system of automated alerts that
monitor the health of your data integration workflows.
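One common automated alert is a freshness check: if a table has not been refreshed within its expected window, flag it. In this sketch, the timestamps, table names and refresh windows are hypothetical.

```python
# A sketch of an automated freshness alert for data integration
# workflows. Timestamps, tables and windows are hypothetical.
from datetime import datetime, timedelta

def stale_tables(last_refreshed, max_age, now):
    """Return tables whose last refresh is older than max_age."""
    return sorted(t for t, ts in last_refreshed.items() if now - ts > max_age)

now = datetime(2024, 6, 1, 12, 0)
refreshed = {
    "raw_orders": now - timedelta(hours=2),
    "raw_events": now - timedelta(hours=30),   # missed its daily load
}
alerts = stale_tables(refreshed, timedelta(hours=24), now)
```

Run on a schedule, a check like this turns silent pipeline failures into visible, actionable alerts.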
Don’t prematurely pursue vanity projects with high costs and risks, especially if they involve
machine learning (and especially if it’s at the behest of a consultant!). Many organizations hire
data scientists well before they are ready to do any data science.
Don’t let data governance slide. Without clear ownership of data models and the utmost trust
in the provenance of data models, people in your organization will hesitate to use your data to
make decisions. It will also make compliance with laws, regulations and ethical standards
more difficult.
Don’t make extra work for yourself. The whole point is to do what you used to do but better. This
shouldn’t be difficult if you choose the right tools — the “right tools” being the operative phrase.
Choose tools that are noted for labor-saving features.
Don’t emphasize negative, dystopian narratives about using data. Yes, data can be used for
purposes that are manipulative and frankly creepy. That is far from the whole story; insights can
also genuinely improve people's lives.
Finally, don’t delay getting started. The initial outlay of resources required to modernize your
analytics is very modest. You can, in principle, do it with a single analyst and several free trials of
technology. Start now!
Fivetran can help you with the first step of your analytics modernization journey.