
Automate organization, provide consistent definitions and enable self-service management of data across today’s modern enterprise with the use of a data catalog.
Introduction
The importance of a modern data catalog
Setting up a business taxonomy
Data catalog use cases
– Enhance use of data
– Improve regulatory compliance
– Automate data governance for DataOps
– Support a governed data lake
– Enable AI governance
Get started with IBM Watson Knowledge Catalog

Businesses need to maximize the value of their data to drive monetization and improve what McKinsey & Company refers to as “the insights to value chain.”1 In many cases this includes leveraging artificial intelligence (AI) to fuel predictive insights and proactive outcomes. However, the growing volume of data spread across multiple deployments remains a challenge. So do internal obstacles such as traditional manual processes and data stewardship roles. Leaders are discovering that their current data processes don’t scale efficiently to meet today’s needs, let alone the needs they will face in the future. Finding a solution is therefore imperative.

Gartner estimates that by 2021, AI augmentation (a human-centered partnership model in which people and AI technologies work together) will create $2.9 trillion of business value and 6.2 billion hours of heightened worker productivity worldwide.2

As a solution, many organizations have begun to implement DataOps (data operations) practices to deliver a continuous supply of high-quality, trustworthy enterprise data. DataOps orchestrates people, process and technology to address inefficiencies in accessing, preparing and integrating data. This enables collaboration across an organization to drive agility, speed and new initiatives at scale.

At the heart of an effective DataOps practice is a data catalog: a metadata management tool designed to help organizations find and manage large amounts of data. It puts trusted data in the hands of the business by automating three things: the organization of a common, well-understood business vocabulary, the self-service management of data and the onboarding of data content. This ebook focuses on the importance of a modern data catalog and the benefits a business can reap from it when it is implemented correctly. From supporting multicloud adoption and integration to accelerating an organization’s journey to AI, the data catalog is the foundation.

Discover DataOps with an interactive guide

Introduction


Gartner originally defined a data catalog as a tool that “creates and maintains an inventory of data assets through the discovery, description and organization of distributed data sets.” The quantity of data available to organizations has grown exponentially over the last several years. Data catalogs have grown in importance along with it, and their definition and scope have expanded as well. Delivering business-ready data to feed analytics and AI projects begins with a data catalog that can automate organization, provide consistent definitions and enable self-service management of enterprise data.

A modern data catalog allows data analysts to find all the data available in each database or application their organization maintains. That data can be relational or unstructured, such as the contents of word-processing documents or spreadsheets. Analytic assets include Jupyter notebooks, trained models and dashboards. Because data catalogs make data sources more discoverable and manageable, they help organizations make more informed decisions about how to use their data. Each data asset should carry its own metadata. That metadata includes how to access the data, its format, its classification and its lineage. It also includes the list of collaborators who have access to it.
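To make the list above concrete, here is a minimal sketch of how a catalog entry might hold that metadata. It assumes a simple in-memory record. The `CatalogAsset` class, its field names and the `derive` helper are hypothetical illustrations. They are not the data model of Watson Knowledge Catalog or any other product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogAsset:
    """Hypothetical catalog entry carrying the metadata described above."""

    name: str
    access: str          # how to reach the data, e.g. a connection name
    data_format: str     # e.g. "relational table", "CSV", "Jupyter notebook"
    classification: str  # e.g. "Confidential"
    lineage: list[str] = field(default_factory=list)      # upstream asset names, oldest first
    collaborators: set[str] = field(default_factory=set)  # users with access to the asset

    def derive(self, name: str, access: str, data_format: str) -> CatalogAsset:
        """Register a downstream asset whose lineage records this asset as its source."""
        return CatalogAsset(
            name=name,
            access=access,
            data_format=data_format,
            classification=self.classification,  # assumption: derived data inherits the label
            lineage=[*self.lineage, self.name],
            collaborators=set(self.collaborators),  # copy so the two entries stay independent
        )


customers = CatalogAsset(
    name="customers",
    access="warehouse-connection",  # hypothetical connection name
    data_format="relational table",
    classification="Confidential",
    collaborators={"analyst_a"},
)
churn_features = customers.derive("churn_features", "object-storage-bucket", "CSV")
print(churn_features.lineage)         # ['customers']
print(churn_features.classification)  # Confidential
```

The derived entry gets its own copy of the collaborator set rather than sharing the source's set. This is a deliberate choice: granting someone access to a derived asset should not silently grant access to its source.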

The importance of a modern data catalog

Benefits of a data catalog:

Index and enrich assets

When looking for a data catalog, make sure it has a metadata repository that acts as an index for data and other assets. The index makes it easier to understand what kinds of data and analytic assets are in your catalog.

Here’s how a data catalog can simplify adding assets:

– Leaves your data where it is. Whether it is in the cloud or on premises, just add the connection information to your data catalog to access it.
– Automatically discovers and adds all tables from a connection to a relational data source as assets in the catalog.
– Uploads files to the dedicated, encrypted cloud object storage bucket associated with the catalog.
– Includes an object storage instance to store assets that are copied into the catalog.

After data assets are added to a catalog, they can be profiled to generate metadata about their contents. Catalog collaborators can also enrich assets by adding ratings and reviews and by creating tags that describe them. Meanwhile, data classes accurately define the type of data stored in each asset, and business terms describe data in a standard way across your enterprise.

Control access to data policies

Policies should apply to all catalogs within an enterprise. The corresponding policy tools should be available only to users who have special permissions within the catalog.

Policy tools should allow you to:

– Create business terms that describe your data for use in policies
– Write policies to deny access and protect sensitive data assets
– Write policies to mask data values in columns that contain sensitive data
– Monitor trends in policy enforcement over time
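The deny and mask policies listed above can be illustrated with a small rule evaluator. This is a minimal sketch under stated assumptions:

- Policies are keyed on hypothetical business terms and data classes ("Restricted", "Email address") assigned to assets and columns.
- A deny rule blocks the whole asset.
- Masking replaces values with a fixed placeholder.

It does not reflect how any specific catalog product evaluates policies.

```python
from collections import Counter
from dataclasses import dataclass

MASK = "XXXXXXXX"  # assumed placeholder for masked values


@dataclass(frozen=True)
class Policy:
    action: str  # "deny" or "mask"
    term: str    # business term or data class the rule targets (hypothetical names)


def enforce(rows, asset_terms, column_terms, policies, log):
    """Return the rows a user may see, or None if a deny rule blocks the asset.

    rows: list of dicts, one per record.
    asset_terms: set of terms assigned to the asset as a whole.
    column_terms: dict mapping column name -> set of terms assigned to that column.
    log: Counter of enforcement events, kept so trends can be monitored over time.
    """
    if any(p.action == "deny" and p.term in asset_terms for p in policies):
        log["deny"] += 1
        return None
    masked = {
        col
        for col, terms in column_terms.items()
        for p in policies
        if p.action == "mask" and p.term in terms
    }
    if masked:
        log["mask"] += 1
    # Build new dicts so the source rows are never modified.
    return [{col: MASK if col in masked else val for col, val in row.items()} for row in rows]


policies = [Policy("deny", "Restricted"), Policy("mask", "Email address")]
rows = [{"id": 1, "email": "ana@example.com"}, {"id": 2, "email": "li@example.org"}]
log = Counter()

visible = enforce(rows, {"Customer data"}, {"email": {"Email address"}}, policies, log)
blocked = enforce(rows, {"Restricted"}, {}, policies, log)
print(visible)  # [{'id': 1, 'email': 'XXXXXXXX'}, {'id': 2, 'email': 'XXXXXXXX'}]
print(blocked)  # None
print(log["mask"], log["deny"])  # 1 1
```

The evaluator checks deny rules before masking, so a blocked asset never reaches the masking step. The counter supplies the raw material for monitoring enforcement trends over time.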


Benefit from data discovery

A catalog helps alleviate manual processes and dependencies with advanced discovery capabilities, typically driven by machine learning and semantic context. This makes it easier to find relevant assets quickly and at scale.

Ways in which a catalog enables data discovery include:

– Keyword search and filters based on subject tags and other asset properties
– Preview capabilities to ensure that you are selecting the correct data asset
– Reviews of assets written by collaborators within the catalog to help identify the best assets to pull from
– Asset recommendations compiled automatically based on your usage history, similar assets and other factors

Dive deeper into the benefits of cataloging

Expedite data preparation

Large amounts of raw data need to become consumable, high-quality information that’s ready for analysis. To help with that, a data catalog should have self-service preparation features that support any data preparation solution your company already has in place.

Make sure your catalog includes the following features so that it is easy to explore, prepare and deliver data that can be trusted and used across your business:

– Powerful operations that clean, organize, fix and validate your data
– Scripting support for efficient and flexible manipulation of data
– Scheduling and monitoring of data preparation flows
– Profiles for validating your data
– Visualizations for gaining insight into your data
– Enforcement of policies that mask data
– Support for unstructured data

Collaborate across governed assets

Data catalogs must keep a record of the collaborators who need access to certain assets. They must also keep a record of the corresponding information in data sets from across the entire organization, without requiring separate credentials for every source. This creates a single platform where any member of an enterprise can locate their data. To ensure security, the data catalog assigns users the correct roles based on their needs and restricts what each user can and can’t do inside the catalog.

Types of collaborators and their functions:

– Authors: Subject-matter experts who pull and draft the appropriate information into the catalog
– Approvers: Once authors have completed their draft, approvers can review, comment on, approve or deny the delivered information
– Publishers: Authorized to publish the approved information and make the new business terms and data assets available to anyone with access to the business glossary
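The author, approver and publisher roles above describe a review workflow that can be modeled as a small state machine. This is a minimal sketch under assumptions. The state names and the `TRANSITIONS` table are illustrative, and so is the rule that a denial sends the item back to draft. None of them describe a specific catalog's workflow configuration.

```python
from __future__ import annotations

from enum import Enum


class State(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in review"
    APPROVED = "approved"
    PUBLISHED = "published"


# (role, action) -> (state the item must be in, state it moves to)
TRANSITIONS = {
    ("author", "submit"): (State.DRAFT, State.IN_REVIEW),
    ("approver", "approve"): (State.IN_REVIEW, State.APPROVED),
    ("approver", "deny"): (State.IN_REVIEW, State.DRAFT),  # assumption: denial returns it to the author
    ("publisher", "publish"): (State.APPROVED, State.PUBLISHED),
}


class GlossaryItem:
    """A hypothetical business term or asset moving through draft, review and publication."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.DRAFT
        self.comments: list[str] = []

    def act(self, role: str, action: str, comment: str = "") -> State:
        """Apply a role's action if that role may take it in the current state."""
        if (role, action) not in TRANSITIONS:
            raise PermissionError(f"{role} cannot {action}")
        required, nxt = TRANSITIONS[(role, action)]
        if self.state is not required:
            raise ValueError(f"cannot {action} an item that is {self.state.value}")
        if comment:
            self.comments.append(f"{role}: {comment}")
        self.state = nxt
        return nxt


term = GlossaryItem("Customer lifetime value")
term.act("author", "submit")
term.act("approver", "deny", "Add the time horizon to the definition")
term.act("author", "submit")
term.act("approver", "approve")
print(term.act("publisher", "publish"))  # State.PUBLISHED
```

Keeping the allowed moves in one table means that adding a step, such as a second approval, changes data rather than control flow.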


What it looks like when your business has a data catalog and it is implemented correctly:

– Decrease time to results, leaving more time to analyze data and put it to use
– Capture contextual asset knowledge and improve data’s utility
– Track data lineage and improve trust in your data quality
– Market information assets for broader consumption
– Assist with data governance and compliance

What it looks like when your business doesn’t have a data catalog or has implemented it incorrectly:

– Risk wasting time searching for and tagging your data
– Lose crucial knowledge when you locate data but can’t find colleagues who understand it
– Lose track of who has access to data
– Fail to meet compliance and governance requirements

Understanding the benefits of a modern data catalog is just the beginning. It’s equally important to understand how to start integrating a catalog into your business to realize value faster. Suppose your organization’s goal is to increase efficiency and collaboration across stakeholders. In that case, the first place to focus your improvements should be the company’s taxonomy. The taxonomy becomes the foundation for content categorization and data relationships. It also provides a guideline that improves the speed at which data can be found, accessed or reused.

Setting up a business taxonomy

Best practices when establishing a robust business taxonomy include:

Step one: Focus on a single high-value information area
Rather than trying to organize all of your assets at once, focus on the segment of the business that will drive the greatest impact. This is far more efficient. For instance, if compliance and regulatory processes, such as those for GDPR and CCPA, are a high priority for your organization, begin by establishing terms and classifying assets related to personally identifiable information.

Step two: Concentrate on the meaning of business definitions
Use the language of your industry, in the form of logical or business intelligence models, to power the terms and standards already in place. Take time to understand how certain concepts and definitions are currently applied throughout your organization. Then build your catalog around these key components, data types and common uses of data.

Step three: Establish benefit and gain interest
Adoption of a business taxonomy might not happen overnight. Even so, your organization needs to understand the advantage of having a single place where all information is stored. Within a specific sector of your business, champion a focused area in which to start integrating a data catalog with an established business taxonomy. That way, the organization’s data can be consolidated in one place.

Step four: Develop and commit to milestones
The final step is to establish official milestones that your organization will commit to. The milestones should cover implementing the business categories, the business terms and the correct assignment of user roles, and more broadly the data catalog process. You may have a mature DataOps culture in place, or this may be your first step. Either way, remember that each organization has unique needs and that stakeholders both in and out of IT need to add value to drive the success of data projects.


[Figure: roles and tasks for building a business taxonomy. Data governance officer: identify focus areas; identify sponsors and key stakeholders; identify the data stewardship team. Governance council: gather potential terms to bootstrap the new taxonomy; identify the category hierarchy; select the first set of terms to populate; agree on their definitions. Data steward: configure and define the workflow for the taxonomy in IBM Watson® Knowledge Catalog; assign roles (edit/review/approve); approve and publish categories and terms in IBM Watson® Knowledge Catalog.]

Figure 1: Data citizens must work together to build a business taxonomy that benefits their organization as a whole.
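The governance council's tasks in the figure map naturally onto a tree: first identify a category hierarchy, then select a first set of terms and agree on their definitions. The sketch below is a minimal, hypothetical model of that structure. The `Category` class and the sample terms and definitions are illustrative assumptions, not IBM Watson® Knowledge Catalog objects.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """Hypothetical taxonomy node: a category with business terms and subcategories."""

    name: str
    terms: dict[str, str] = field(default_factory=dict)  # term -> agreed definition
    children: list[Category] = field(default_factory=list)

    def add_category(self, name: str) -> Category:
        """Create a subcategory under this one and return it."""
        child = Category(name)
        self.children.append(child)
        return child

    def term_paths(self, prefix: str = ""):
        """Yield 'Category/Subcategory/Term' paths for every term, depth first."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        for term in sorted(self.terms):
            yield f"{path}/{term}"
        for child in self.children:
            yield from child.term_paths(path)


# A first, focused slice of the taxonomy, following the PII example in step one.
privacy = Category("Privacy")
pii = privacy.add_category("Personally identifiable information")
pii.terms["Email address"] = "An email address that can identify an individual."
pii.terms["Customer name"] = "The full legal name of an individual customer."

for path in privacy.term_paths():
    print(path)
# Privacy/Personally identifiable information/Customer name
# Privacy/Personally identifiable information/Email address
```

Starting with one small branch mirrors the advice to focus on a single high-value area. That branch can be populated and its definitions agreed on before the rest of the hierarchy exists.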

An organization can leverage a data catalog to achieve the levels of success that enterprise data leaders are experiencing today. A catalog can help your enterprise meet compliance regulations, facilitate data lake governance or cut the time-consuming labor it takes to govern your data. The following use cases show the data struggles that organizations were able to overcome by implementing their own data catalogs.

Data catalog use cases

Enhance use of data


A data catalog offers a single place for data analysts to view and easily find all data assets across different departments. This consolidated view enables team members to share insights that can improve the business. For example, team members might discover cross-sell and up-sell opportunities that can generate new revenue streams.

How Credito Valtellinese used cognitive analysis to find hidden opportunities

Seeking growth through customer-centric banking, Credito Valtellinese needed to reposition itself. To do so, the organization launched a plan that was predicted to increase revenue per customer by optimizing its cross-selling and up-selling marketing campaigns. However, the bank repeatedly encountered the same roadblock: its internal systems were not centered around client relationships, making it nearly impossible to market to existing customers.

Credito Valtellinese had to create an analytical foundation, including a data catalog, to understand its customers’ behaviors and needs at a new level of depth and granularity. By adapting cascading styling sheets (CSS), the organization was able to create just that.

Its comprehensive system and management solution delivered precisely targeted promotions to the customers most likely to convert. As a result, outbound marketing campaign conversion rates rose by 10%.


Improve regulatory compliance
Ungoverned sensitive data may lead to regulatory penalties. For instance, a business may fail to cure its violations of the California Consumer Privacy Act (CCPA). An attorney general could then impose a civil penalty of up to $2,500 per violation, or $7,500 per intentional violation.3 Under the GDPR, fines can reach 20 million euros or 4% of worldwide annual turnover, whichever is higher.4 Therefore, as organizations face growing data privacy regulations, they must look more holistically at how they store and use data.

A data catalog can automate the classification and profiling of data assets. It can also automatically enforce the data protection rules established to anonymize and restrict access to sensitive information. More importantly, if something goes wrong, controls allow the organization to respond rapidly. That response might mean flagging sensitive data, identifying and remediating issues, or collecting information in response to an audit.

The IBM Global Chief Data Office helps analyze and visualize business risks around sensitive data

Under the GDPR, companies in possession of personal data from European Union data subjects are legally obligated to understand several things about that data. They must know the types of data they store, where the data lives and its associated levels of risk.

IBM operates in more than 170 countries. For a company that large, it can be a daunting task to refresh its privacy practices and ensure that GDPR guidelines are met. It must do so while also enhancing the products and services that ultimately benefit its clients. To undertake this task, the Global Chief Data Office (GCDO) created a global program with numerous work streams. The program addressed the GDPR requirements and aimed to understand more comprehensively the types of personal data IBM controls.

The results of this effort were collected in a central data privacy catalog as a key first step in the journey to readiness. However, it was still unclear how to identify, evaluate and share the information discovered about data that needed to comply with the GDPR. As a result, IBM used its own cataloging technologies to create a central store for its privacy data. To complement the catalog, IBM also implemented IBM Data Risk Manager to provide a data risk control center. There, executives and their teams can easily view updated information from the privacy catalog in a central dashboard and ensure that ongoing data privacy requirements are met.

Learn More: Forrester names IBM a leader in Machine Learning Catalogs
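The penalty figures above are ceilings that can be written as simple formulas. This sketch assumes the statutory maximums:

- The CCPA allows up to $2,500 per violation or $7,500 per intentional violation.
- GDPR Article 83(5) allows up to €20 million or 4% of total worldwide annual turnover of the preceding financial year, whichever is higher.

The function names are hypothetical. The functions return upper bounds only, since actual penalties are set case by case.

```python
def ccpa_max_penalty(violations: int, intentional_violations: int = 0) -> int:
    """Ceiling in USD: up to $2,500 per violation or $7,500 per intentional violation."""
    return 2_500 * violations + 7_500 * intentional_violations


def gdpr_max_fine(worldwide_annual_turnover_eur: int) -> int:
    """Ceiling in EUR for the upper GDPR tier: EUR 20 million or 4% of turnover, whichever is higher."""
    # Integer arithmetic avoids floating-point rounding in the 4% calculation.
    return max(20_000_000, worldwide_annual_turnover_eur * 4 // 100)


print(ccpa_max_penalty(violations=1_000))  # 2500000
print(gdpr_max_fine(100_000_000))          # 20000000
print(gdpr_max_fine(2_000_000_000))        # 80000000
```

The `max` captures the "whichever is higher" rule. For smaller companies the fixed €20 million floor dominates, while for large ones the turnover-based figure does.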


“The job involved examining more than 6,500 applications across the company, about 3,400 of which are critical from a GDPR perspective.”
Neera Mathur, GCDO Senior Technical Staff Member in the Global Chief Data Office


Automate data governance for DataOps
An integrated quality and governance platform helps manage data and protect it from misuse. For effective governance, an enterprise data catalog must be in place. You can’t effectively apply governance if you don’t have organized data with proper metadata tags and lineage. Data organization includes detailing each data object in three ways:

– Documenting data properties, ownership, business context, origin and structure
– Evaluating data quality
– Properly classifying data so it can automatically be used to define and refine an organization’s DataOps practice

How Integra LifeSciences adopted an integrated approach to manage all parts of its business

Integra LifeSciences is a surgical and medical instrument manufacturing company. When implementing various new systems and processes, it found that governance was not a simple feat. The quantity of data it needed to track was multiplying quickly. The company was losing track of where the data was located and how it could use the data effectively to benefit the business. It turned to an integrated approach that collected, defined and managed its data in one platform. With it, Integra was able to eliminate 50% of its business systems and reduce the complexity of managing its systems and data. It also cut operational costs to maximize the organization’s growth.

Integra LifeSciences worked with IBM to implement IBM data cataloging technology. The technology creates consistent definitions of the company’s business data and helps it better understand what its data can do for it.

To learn more about what IBM Watson Knowledge Catalog can do for your business, take a guided tour to see how business users can quickly discover, curate, categorize and share data assets across an entire organization.
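Two of the organizing tasks described in this section, evaluating data quality and classifying data, are often automated by profiling column values. The sketch below assumes a simple approach. Completeness is the share of non-empty values, and data classes are detected with regular expressions and a match threshold. The class names, patterns and 0.8 threshold are illustrative assumptions, not the rules any particular catalog uses.

```python
import re

# Hypothetical data classes, each recognized by a regular expression.
DATA_CLASSES = {
    "Email address": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "ISO date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def profile_column(values, match_threshold=0.8):
    """Return a column's completeness and its best-matching data class.

    Completeness is the share of values that are not None or empty. A data class
    is assigned only if at least match_threshold of the non-empty values match
    its pattern; otherwise data_class is None.
    """
    present = [str(v) for v in values if v not in (None, "")]
    completeness = len(present) / len(values) if values else 0.0
    best_class, best_share = None, 0.0
    if present:
        for name, pattern in DATA_CLASSES.items():
            share = sum(1 for v in present if pattern.match(v)) / len(present)
            if share >= match_threshold and share > best_share:
                best_class, best_share = name, share
    return {"completeness": completeness, "data_class": best_class}


print(profile_column(["ana@example.com", "li@example.org", "", "bo@example.net"]))
# {'completeness': 0.75, 'data_class': 'Email address'}
print(profile_column(["2020-07-01", "July 2020", None]))
# {'completeness': 0.6666666666666666, 'data_class': None}
```

The threshold lets a column keep its class despite a few malformed entries. A column whose values are too inconsistent is left unclassified rather than mislabeled.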


“Integra has substantially reduced operational costs as a proportion of revenue–and we predict the solutions will unlock greater financial benefits as we move towards our USD 1 billion revenue goal.”
William Compton, Chief Information Officer, Integra LifeSciences


Support a governed data lake

Data lake governance takes discipline and good policy. It also takes collaboration between the people who manage data access and the people who access the data. Cataloging helps tag the data in the data lake and create an inventory of information assets. The catalog interface provides data lake users with information about the data, including its classification, its lineage and how it is governed. The catalog can serve multiple stakeholders in the organization, eliminating the inefficiencies associated with “lost in translation” issues.

Deliver clean, reliable data with data lake governance

Enable AI governance

A data catalog can help the enterprise governance program grow to support the maturing demands of AI governance. As AI takes root, you’ll need an organizational approach to developing policies. That approach should give you a framework to effectively design, deploy and monitor AI-powered models and algorithms. The framework should focus on fairness, accountability, transparency, safety and privacy to ensure fair outcomes.

Get started with IBM Watson Knowledge Catalog

The modern data catalog goes far beyond the legacy metadata repositories that businesses have used for decades. It surpasses simple metadata capture and management by including automation and discovery techniques such as visual recognition, natural language classification and machine learning. With these capabilities, a data catalog can organize data in near real time. It also eliminates the inefficient manual processes that older repositories required.

The new wave of intelligent data catalogs is changing the way business is run, through virtualization and multicloud deployment. It is also changing how organizations carve out new business models and prepare for the future of AI.

Businesses continue to transform digitally so they can build AI into their overall business strategies. As they do, data catalogs integrated with a data quality and governance platform become even more essential.

IBM Watson Knowledge Catalog is an open and intelligent data catalog for managing enterprise data and AI model governance, quality and collaboration. It provides an end-to-end experience rooted in metadata and active policy management. Data citizens can use it to quickly discover, curate, categorize and share data assets, data sets and analytical models, along with their relationships, with other members of your organization.

There’s a reason our customers named Watson Knowledge Catalog a 2020 Gartner Customer Choice Award Winner. Test drive the product to see why.

Talk to an expert to learn more about Watson Knowledge Catalog and explore its seamless integration with IBM DataOps services for IBM Cloud Pak® for Data.

© Copyright IBM Corporation 2020
IBM Corporation
Route 100
Somers, NY 10589
Produced in the United States of America
July 2020

IBM, the IBM logo, ibm.com, IBM Cloud Pak and Watson are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.

1 Holger Hürtgen and Niko Mohr. “Achieving business impact with data”, McKinsey & Company, April 2018.
2 “AI Augmentation Will Create $2.9 Trillion of Business Value in 2021”, Gartner, August 2019.
3 Nicholas Schmidt. “Top 5 Operational Impacts of CCPA: Part 5 - Penalties and enforcement mechanisms”, International Association of Privacy Professionals (IAPP), August 2018.
4 “IBM Pathways for GDPR readiness”, IBM White Paper, September 2017.

EWDPJZDQ

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

The performance data and client examples cited are presented for
illustrative purposes only. Actual performance results may vary
depending on specific configurations and operating conditions.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.
