
Data Catalog Value Index

Assumptions Guide
Document Purpose
This document is intended to help Alation’s customers and prospects understand the assumptions needed for
the Data Catalog Value Index model.

General Assumptions
The following assumptions are the inputs for our DCVI model. Together with a customer's Alation Analytics
data, they are the only inputs needed to generate a DCVI report. Prospects need only the applicable
assumptions, which they run through a simulator to project value for the first year of their program.

1. Customer Stage (only required for customers)


○ This is a customer’s perception of where they are in the maturity of adoption across the
enterprise. It is not currently used in the model and can be ignored. In the future it will be used
to drive benchmark comparisons.

2. Estimated Growth Rate (only required for prospects)


○ This is an estimate of the rate of adoption and therefore usage. The values Low, Med, and High
apply multipliers that are based on observed growth patterns within our customer base. A
simulation can be rerun for each growth rate in order to analyze ‘worst’, ‘best’, and ‘most likely’
scenarios.

3. Total Data Assets (only required for prospects)


○ This is the number of assets to be loaded into the catalog. An asset is any object in the
catalog, including data sources, schemas, tables, columns, queries, glossary terms, files,
BI reports, metrics, policies, etc. For most organizations, columns will be the largest count,
but all asset types should be considered and included.

4. Total Active Users (only required for prospects)


○ This is an estimate of the number of catalog users.

Alation DCVI Assumption Guide 1


5. Average Hourly Employee Rate
○ This is the fully loaded hourly cost of an employee. It varies by catalog user and role.
○ Customers should use an average that they can defend if challenged internally.
○ A prospect could rerun the simulator for each role type using the correct active user count and
hourly rate, then total the results. In most cases, it's easier to simply use an average hourly rate
and run the simulator once.
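One way to arrive at a defensible average is to weight each role's hourly rate by its active user count. The sketch below is illustrative only: the role names, user counts, and rates are hypothetical placeholders, and `blended_hourly_rate` is not part of the DCVI simulator.

```python
# Hypothetical role mix: role -> (active users, fully loaded hourly rate in USD).
# All figures are illustrative placeholders, not Alation guidance.
roles = {
    "data analyst": (120, 60.0),
    "data engineer": (40, 85.0),
    "business user": (240, 45.0),
}


def blended_hourly_rate(role_mix):
    """Return the average hourly rate weighted by active users per role."""
    total_users = sum(users for users, _ in role_mix.values())
    if total_users == 0:
        raise ValueError("at least one active user is required")
    total_cost = sum(users * rate for users, rate in role_mix.values())
    return total_cost / total_users


rate = blended_hourly_rate(roles)
print(f"Blended hourly rate: ${rate:.2f}")
```

Weighting by user count keeps a small group of highly paid users from dominating the average.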

6. Intrinsic Value per Cataloged Asset (only required for customers)


○ Alation's most fundamental value proposition is making data searchable. This assumption is the
monetary value of each asset cataloged.
○ This is the average value of all the assets cataloged in Alation, including database objects,
articles, BI objects, etc. Each piece of data (datum) has some value. For example, a BI report
written in RStudio might be worth two to three orders of magnitude more than a table's ID column,
but the value is non-zero for both. See the caveat that follows the general assumptions.
○ Since this number is skewed in favor of columns (they are the most numerous), our guidance
for this value is in the range of $0.02 to $0.05 per object.

7. Consumption Value per Asset View (only required for customers)


○ Each time a user consumes metadata exposed by Alation, there is a realization of some
baseline value. This assumption is the monetary value of each act of viewing an asset in
Alation. The concept is similar to cost-per-click in online advertising. Here it is value-per-click.
Our guidance for this value is in the range of $0.05 to $0.07.

8. Average Time Saved per Search (hr)


○ This value is the estimated time (in hours) an active user would spend looking for data without
the information cataloged in Alation.
○ There are two ways to obtain this estimate:
i. Survey the user population for the average time it took to find information before and
after Alation.
ii. Use heuristic knowledge of the organization to approximate this number.

9. Average Analysis/Comprehension Time Saved per Search (hr)
○ This value is the estimated time (in hours) an active user would spend trying to understand the
data without the metadata present in Alation.
○ There are two ways to obtain this estimate:
i. Survey the user population for the average time it took to comprehend the information
before and after Alation.
ii. Use heuristic knowledge of the organization to approximate this number.
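The survey option for assumptions 8 and 9 can be reduced to a single number by subtracting the average "after Alation" time from the average "before Alation" time. This is a minimal sketch under that assumption. The function name and the survey responses are hypothetical, and clamping a negative result to zero is a choice made here, not something the model prescribes.

```python
from statistics import mean


def average_time_saved(before_hours, after_hours):
    """Mean hours per search before Alation minus mean hours after."""
    if not before_hours or not after_hours:
        raise ValueError("both survey samples must be non-empty")
    saved = mean(before_hours) - mean(after_hours)
    # Assumption: a negative saving is reported as zero.
    return max(saved, 0.0)


# Hypothetical survey responses, in hours per search.
before = [2.0, 1.5, 4.0, 0.5, 2.0]
after = [0.5, 0.25, 1.0, 0.25, 0.5]
print(average_time_saved(before, after))
```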

10. Average Time Saved per Published Query Execution (hr)


○ This value is the estimated average time (in hours) saved by an active user for each execution
of a published query in Alation.
○ Published queries serve a number of purposes. Query forms, which target business and
non-technical users, may save dozens of hours, whereas a scheduled ETL job may save only
dozens of minutes.
○ The mixture of these two types of use-cases (i.e. published queries run by people vs published
queries scheduled for automatic runs) can be blended using a simple weighted average.
○ Therefore, a customer that primarily uses Compose as a scheduling engine will have a lower
value for this assumption and a customer leveraging query forms for their business teams will
have a higher value.
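The weighted average described above can be sketched as follows. The execution counts and per-execution savings are hypothetical, and `blended_time_saved` is an illustrative helper, not part of the DCVI model.

```python
def blended_time_saved(human_runs, human_hours, scheduled_runs, scheduled_hours):
    """Weighted average of hours saved per published query execution."""
    total_runs = human_runs + scheduled_runs
    if total_runs <= 0:
        raise ValueError("at least one execution is required")
    total_hours = human_runs * human_hours + scheduled_runs * scheduled_hours
    return total_hours / total_runs


# Hypothetical mix: 200 query-form executions saving 24 hours each and
# 1,800 scheduled executions saving 0.5 hours (30 minutes) each.
value = blended_time_saved(200, 24.0, 1800, 0.5)
print(f"Average time saved per execution: {value:.2f} hr")
```

With this mix the scheduled runs dominate, so the blended value sits much closer to the scheduled figure than to the query-form figure.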

Caveat: The assumptions try to reduce complex measures to a single number. The real-life data for assumptions
#8-10 follow a Generalized Pareto Distribution (GPD), commonly associated with the 80/20 or 90/10 rule: most
searches save time on the order of minutes or hours, and some searches save days of effort. Keep in mind that
these assumptions are the averages of their respective distributions of data.
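To illustrate why the average matters for skewed data, the sketch below compares the mean and the median of a small made-up sample in which most searches save minutes and one saves two working days. The numbers are hypothetical and are not drawn from any Alation data.

```python
from statistics import mean, median

# Hypothetical hours saved per search: eight searches save 15 minutes,
# one saves 30 minutes, and one saves 16 hours.
hours_saved = [0.25] * 8 + [0.5, 16.0]

avg = mean(hours_saved)
typical = median(hours_saved)
print(f"mean = {avg:.2f} hr, median = {typical:.2f} hr")
```

The mean is several times the median here, so an estimate based only on a "typical" search would understate the average the assumption asks for.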

Advanced Assumptions (applicable only to customers)


Advanced assumptions impact the data governance metrics of the DCVI model. They require that the
customer has implemented governance processes which ensure that curation standards are being used to
track the timeliness, completeness, and conformance of asset titles and descriptions.

Some of these metrics also require data to be supplied via a direct Alation Analytics connection and extract
instead of the Compose query extract method.

In most cases, the majority of the value generated by DCVI will be related to baseline and analyst productivity,
so these data governance-specific assumptions can be ignored until a customer has a very mature program.

1. Maximum Asset Age (days)
○ This value is the maximum number of days allowed since the last update to information in the
catalog. Information is never static, so this assumption is used to calculate the timeliness
of information in the catalog.
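As a rough illustration of how a maximum asset age could translate into a timeliness measure, the sketch below counts the share of assets updated within the limit. The source does not state how the DCVI model computes timeliness, so the function names, the dates, and the ratio itself are assumptions.

```python
from datetime import date


def is_timely(last_updated, today, max_asset_age_days):
    """True if the asset was updated within the allowed number of days."""
    return (today - last_updated).days <= max_asset_age_days


def timeliness_ratio(last_updated_dates, today, max_asset_age_days):
    """Share of assets whose last update falls within the age limit."""
    if not last_updated_dates:
        return 0.0
    timely = sum(is_timely(d, today, max_asset_age_days) for d in last_updated_dates)
    return timely / len(last_updated_dates)


# Hypothetical last-update dates for four assets.
updates = [date(2024, 6, 1), date(2024, 1, 15), date(2023, 12, 31), date(2024, 5, 1)]
ratio = timeliness_ratio(updates, today=date(2024, 6, 30), max_asset_age_days=90)
print(f"Timely assets: {ratio:.0%}")
```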

2. Title Characters Not Allowed


○ The characters that are not permitted in asset titles.

3. Title Case
○ The letter case allowed for titles: upper, lower, or mixed.

4. Title Can Start With Number


○ Indicates whether a title is allowed to start with a number (True/False).

5. Minimum Description Length


○ The minimum permissible length of a description (0 = no minimum).

6. Description Can Start With Number


○ Indicates whether a description is allowed to start with a number (True/False).
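The title and description assumptions above can be read as a set of conformance rules. The following sketch shows one possible interpretation. The class, field, and function names are hypothetical, description length is assumed to be measured in characters, and "mixed" is assumed to mean no case restriction. None of this is confirmed by the DCVI model itself.

```python
from dataclasses import dataclass


@dataclass
class CurationRules:
    """Hypothetical container for advanced assumptions 2 through 6."""
    title_chars_not_allowed: str = ""
    title_case: str = "mixed"  # "upper", "lower", or "mixed"
    title_can_start_with_number: bool = True
    min_description_length: int = 0  # assumed to be in characters; 0 = no minimum
    description_can_start_with_number: bool = True


def title_conforms(title, rules):
    """Check a title against the title rules."""
    if not title:
        return False
    if any(ch in rules.title_chars_not_allowed for ch in title):
        return False
    if rules.title_case == "upper" and title != title.upper():
        return False
    if rules.title_case == "lower" and title != title.lower():
        return False
    if not rules.title_can_start_with_number and title[0].isdigit():
        return False
    return True


def description_conforms(description, rules):
    """Check a description against the description rules."""
    if len(description) < rules.min_description_length:
        return False
    if (description and not rules.description_can_start_with_number
            and description[0].isdigit()):
        return False
    return True


rules = CurationRules(
    title_chars_not_allowed="_#",
    title_can_start_with_number=False,
    min_description_length=20,
    description_can_start_with_number=False,
)
print(title_conforms("Customer Orders", rules))
print(title_conforms("2023_orders", rules))
print(description_conforms("Daily order totals by region.", rules))
```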
