
Data Governance – KT - 1

Data and Information: -

 If data is the atom, information is the matter. Information is a set of data
that has been processed, analyzed, and structured in a meaningful way so that it
becomes useful. Once data is processed and gains relevance, it becomes
information that is reliable and useful.
 The fundamental difference between data and information is the meaning and
value attributed to each. Data is meaningless by itself, but once processed and
interpreted, it becomes information filled with meaning.
What is data governance in simple terms?

 Data governance is the process of managing data in enterprise systems based
on defined internal data standards and policies.
 Effective data governance ensures that data is consistent and trustworthy and
doesn't get misused.
 Most organizations are feeling increased pressure to drive not just success, but
also innovation through better use of data.
Goals of data management: -
1. Accurate
2. Accessible
3. Reliable
4. Consistent
5. Secure
6. Compliant
7. Reusable
Pillars of Information Standard: -
1. Data security – limiting access to authorized users.
2. Data quality – e.g., via Information Analyzer.
3. Master data management
4. Metadata management (and reference data management)
Data Governance – KT - 2

Dimensions of data quality

1. Integrity (Validity) – data remains valid and intact throughout its journey.

 Data integrity affects relationships. For example, a customer profile
includes the customer's name and one or more customer addresses.
 If one customer address loses its integrity at some stage in the data
journey, the related customer profile can become incomplete and invalid.
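As a rough illustration (plain Python, not any IBM tool; the customer and address records below are invented), an integrity check can look for child records whose parent no longer exists:

```python
# Minimal sketch: a referential-integrity check between a customer
# table and an address table, using plain Python dicts.
customers = [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ravi"}]
addresses = [
    {"customer_id": 1, "city": "Chennai"},
    {"customer_id": 3, "city": "Pune"},   # orphan: no customer 3 exists
]

def find_orphan_addresses(customers, addresses):
    """Return addresses whose customer_id has no matching customer."""
    known_ids = {c["id"] for c in customers}
    return [a for a in addresses if a["customer_id"] not in known_ids]

orphans = find_orphan_addresses(customers, addresses)
```

Each orphan found represents a relationship that has lost its integrity somewhere in the data journey.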

2. Consistency – the same information stored and used in multiple places
matches.

 If one enterprise system stores a customer phone number with the
international code in a separate field, and another system stores the
code prefixed to the number, such formatting inconsistencies can be
resolved quickly.
 However, if the underlying information itself is inconsistent,
resolving it may require verification against another source.
 For example, if one patient record puts the date of birth as May 1st
and another record shows it as June 1st, you may first need to assess
the accuracy of the data from both sources.
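A sketch of the formatting case (the phone values and the default country code are made-up examples): normalize both representations and see whether they differ only in formatting, which can be fixed quickly, or in substance, which needs verification:

```python
import re

# Minimal sketch: detect whether two phone-number values differ only
# in formatting or in the digits themselves.
def normalize_phone(raw, default_country_code="91"):
    """Strip punctuation and prefix a country code if missing."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:                # assume a local 10-digit number
        digits = default_country_code + digits
    return digits

a = normalize_phone("+91 98765 43210")   # code prefixed to the number
b = normalize_phone("98765-43210")       # code stored separately / omitted
formatting_only = (a == b)               # True: only the formatting differed
```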

3. Timeliness – data is available when it is expected and needed; for
example, data loads complete on schedule.
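A timeliness check can be sketched as a freshness test against an agreed load window (the 24-hour SLA and the timestamps below are assumed example values):

```python
from datetime import datetime, timedelta

# Minimal sketch: was the last load within the agreed SLA window?
def is_on_time(last_loaded, now, sla=timedelta(hours=24)):
    """True if the data was loaded within the SLA window."""
    return (now - last_loaded) <= sla

now = datetime(2024, 1, 2, 9, 0)
on_time = is_on_time(datetime(2024, 1, 2, 1, 0), now)    # 8 hours old
late = is_on_time(datetime(2023, 12, 30, 9, 0), now)     # 3 days old
```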

4. Completeness – whether a cell contains data or not; nulls should not be
allowed in mandatory fields.

 For products or services, completeness can cover the vital attributes
that help customers compare and choose.
 If a product description does not include a delivery estimate, it is
not complete.
 Financial products often include historical performance details so
customers can assess alignment with their requirements.
 Completeness measures whether the data is sufficient to support
meaningful inferences and decisions.
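Completeness is often scored as the share of non-null values per attribute. A minimal sketch (the product records are invented):

```python
# Minimal sketch: completeness as the fraction of records where a
# given field is present and not None.
products = [
    {"name": "Fund A", "delivery_estimate": "2 days"},
    {"name": "Fund B", "delivery_estimate": None},
    {"name": "Fund C", "delivery_estimate": "5 days"},
]

def completeness(records, field):
    """Fraction of records where `field` is present and not None."""
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)

name_score = completeness(products, "name")                    # fully filled
delivery_score = completeness(products, "delivery_estimate")   # one null
```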

5. Accuracy – providing the right, precise information, backed by a
verifiable source.

 Data accuracy is the degree to which data represents the real-world
scenario and conforms to a verifiable source.
 An accurate phone number helps ensure that an employee is reachable.
Inaccurate birth details, on the other hand, can deprive the employee
of certain benefits.
 Accuracy depends heavily on how data is preserved through its entire
journey, and successful data governance promotes this dimension of
data quality.
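Because accuracy is defined against a verifiable source, one way to score it is to compare system records to a trusted reference set. A sketch (both record sets are invented examples):

```python
# Minimal sketch: accuracy as the share of records that match a
# trusted, verifiable reference source exactly.
system_records = {"E001": "1990-05-01", "E002": "1991-06-01"}
verified_source = {"E001": "1990-05-01", "E002": "1991-06-02"}

def accuracy(records, reference):
    """Fraction of records whose value matches the reference source."""
    matches = sum(1 for k, v in records.items() if reference.get(k) == v)
    return matches / len(records)

score = accuracy(system_records, verified_source)  # one of two matches
```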

6. Uniqueness – no record should appear more than once.

 Uniqueness is a critical dimension for ensuring there is no
duplication or overlap.
 Data uniqueness is measured against all records within a data
set (table) or across data sets (multiple tables).
 A high uniqueness score indicates minimal duplicates or overlaps.
Ex: - Social Security Number, Aadhaar Number.
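A uniqueness score can be sketched as distinct values over total values for an identifier column (the SSN-like values below are made up):

```python
# Minimal sketch: uniqueness score for an identifier column.
ssns = ["111-11-1111", "222-22-2222", "111-11-1111", "333-33-3333"]

def uniqueness(values):
    """1.0 means no duplicates; lower scores indicate overlap."""
    return len(set(values)) / len(values)

score = uniqueness(ssns)                   # 3 distinct / 4 total
duplicates = len(ssns) - len(set(ssns))    # number of duplicated entries
```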

Data Governance – KT - 3
Data Hierarchy: -
From top to bottom

DATABASE

DATABASE SCHEMA

DATABASE TABLE

DATABASE COLUMN

DATA

Critical Data Element: -


 A column that has higher data significance.
 Critical data elements are key elements of information that are
used as criteria for processing, searching, and finding matching values.
 For example, in many large organizations, customer data may be
made up of hundreds of attributes. If our business goal is to
increase contactability, the critical data elements may include
elements such as Email Address and Mobile Telephone Number.
 This means we can focus our data governance and data quality
efforts on just these two elements until our business goal has been
achieved, and then extend our critical data elements based on the
next priority.
1.Person critical data elements: -
 Social Security Number
 Mobile Number
2.Organization critical data elements: -
 Name
 Tax Identification Number

Data Governance – KT - 4
Metadata - Information about data.

 Imagine you are looking at a report and the graph shows a new
division that you didn't know existed. You ask yourself where this
data comes from, and you have to analyze all the hops the data went
through before it landed in your report; this analysis takes a lot
of effort.
 If this analysis had to be done once a year, the organization could
perhaps live with that effort, but in a data-rich environment these
questions are asked many times.
 This is where metadata management comes into play. Data makes
several hops through various systems before it lands in your report,
and those systems keep a log of the data being handled; these logs
are metadata.
 By tying together different metadata sources, you can get a picture
of where the data is moving and how it is changing.
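A sketch of tying metadata logs together into a lineage chain (the system names and hops are invented, not a real metadata format):

```python
# Minimal sketch: each system logs what it consumed and produced;
# walking those logs backwards answers "where did this data come from?"
hops = [
    {"system": "CRM", "output": "customers_raw"},
    {"system": "ETL staging", "input": "customers_raw", "output": "customers_clean"},
    {"system": "Warehouse", "input": "customers_clean", "output": "sales_report"},
]

def trace_lineage(target, hops):
    """Walk backwards from a target data set to its origin."""
    chain = [target]
    current = target
    while True:
        producer = next((h for h in hops if h.get("output") == current), None)
        if producer is None or "input" not in producer:
            break
        current = producer["input"]
        chain.append(current)
    return list(reversed(chain))

lineage = trace_lineage("sales_report", hops)
```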

ETL - Extract, Transform, Load

 If we had to tie all the different sources of metadata together by
hand, it would be nearly impossible due to the complexity; this is
where ETL tools come into play.
 With industry-standard ETL tools, we can draw metadata from the
sources without massive effort.
 ETL tools extract data from a source, transform the data while in
transit, and then load the data into the target storage of choice.
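The three stages can be sketched in miniature (plain Python and SQLite, not a commercial ETL tool; the CSV data and table schema are invented):

```python
import csv
import io
import sqlite3

# Minimal sketch: extract from CSV text, transform (trim and
# upper-case names), load into an in-memory SQLite target.
raw_csv = "id,name\n1, asha \n2, ravi \n"

def extract(text):
    """Extract: parse the source into rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean values while in transit."""
    return [(int(r["id"]), r["name"].strip().upper()) for r in rows]

def load(rows, conn):
    """Load: write the transformed rows into the target storage."""
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(raw_csv)), conn)
names = [n for (n,) in conn.execute("SELECT name FROM customers ORDER BY id")]
```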

Information Steward

 Information stewards are dedicated to managing and integrating data
between software applications to achieve a business outcome. An
information steward may be a single individual or a team.
 The role is much more than data compliance or regulation: an
information steward owns responsibility for ensuring the business
value of data, which goes well beyond simply using or entering
information.

Subject Matter Expert


 This is an individual who has expertise in a particular domain.
"Expertise" is usually broken down into knowledge and skills: the SME
either knows about a particular topic or knows how to get something
done.
 Since we are talking about data, the SME is typically someone who
knows about a particular data topic in the enterprise or how to do a
particular thing with data.
 These people have spent years developing themselves within their
discipline and have built deep expertise in their field.

Terms
 Column names can be similar across various tables, so terms are
defined to identify each data element independently.
 Every column name should have a term.

Technical Term | Business Term | Short Description
BIRTH_DATE     | DATE OF BIRTH | Describes the date of birth of the person

TERM CREATION PROCESS

 Define the term
 Submit to the architect for evaluation
 A file is created by the Subject Matter Expert (SME)
 Reviewed by the Information Management Analyst (IMA)
 Publish the business terms
 Terms become available in the Information Governance Catalog (IGC)
A short description can contain up to 256 characters.
If the text exceeds 256 characters, it goes into the long description.
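The short/long description split can be sketched as follows (the field names are illustrative, not the actual IGC schema):

```python
# Minimal sketch: enforce the 256-character short-description limit
# when a business term is defined; longer text also goes into the
# long description.
SHORT_LIMIT = 256

def build_term(technical, business, description):
    """Assemble a term record with short (and, if needed, long) description."""
    term = {"technical_term": technical, "business_term": business}
    if len(description) <= SHORT_LIMIT:
        term["short_description"] = description
    else:
        term["short_description"] = description[:SHORT_LIMIT]
        term["long_description"] = description
    return term

t = build_term("BIRTH_DATE", "DATE OF BIRTH",
               "Describes the date of birth of the person")
```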

Data Governance – KT - 5
Data Profiling: -
We can't fix the quality issues in our data without understanding what the
problems are. The process of diagnosing your data to find out exactly what
the data quality issues are is called data profiling. It provides accurate
feedback on the quality of your data.
Information assets are critical in today's world. If you know how to
leverage information about your customers, your suppliers, and your
products, and how it is being used, it can help you create new products
based on customer behavior; but to trust that information, you have to
fully trust the underlying data.

WebSphere Information Analyzer: -

It is an IBM tool used to find issues with your data; data quality
execution is done in IA.
It can be used to perform column analysis, key and cross-domain analysis,
and baseline analysis.
Column analysis: - Analyze any column to see exactly what kinds of issues
the data has, such as invalid data, incorrect data, or null values. We can
also find the structure of our data with this analysis, including minimum,
maximum, and length.
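The spirit of a column analysis can be sketched in a few lines of plain Python (not the IA tool itself; the sample values are invented): count nulls and distinct values, and find the min/max values and length range for one column:

```python
# Minimal sketch: profile one column — nulls, distinct values,
# min/max values, and the range of value lengths.
def profile_column(values):
    """Return basic column-analysis statistics for a list of values."""
    present = [v for v in values if v is not None]
    return {
        "null_count": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present),
        "max": max(present),
        "min_length": min(len(str(v)) for v in present),
        "max_length": max(len(str(v)) for v in present),
    }

stats = profile_column(["IN", "US", None, "GBR", "IN"])
```

The length range alone already hints at a structure issue here: a three-letter code mixed into a two-letter column.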
Key analysis: - Identify which columns can serve as primary keys and which
attributes could be additional key candidates.
Cross-domain analysis: - Discover the relationships between different
tables.
Baseline analysis: - After fixing some data quality issues, compare the new
results against a baseline to see exactly how your data quality is
improving.
Binding: - Click the column and set it as a binding. A column name
mentioned in a data definition is not connected to the database unless it
is successfully bound. The process of connecting a Data Definition (DD)
with the database is known as binding.

Information Governance Catalog: -

 It is a central repository where all information assets are stored.
 Whatever we create in IA (data definitions, rule logic, rule sets, etc.)
will be available in IGC, but not everything created in IGC will be
available in IA.
 The customer uses the IGC to see the results of the execution.
 It is a searchable information glossary.

Steps In Data Quality Execution: -


1. Input from the customer
2. Column analysis
3. Rule logic - data rule creation
4. Rule set definition - executes the rule logic, binding all the
dimensions into a single set. Include the data definition in the rule set
definition, then click the Run button to execute.
5. View output - to see the result, click View Output. The result of the
DQ execution shows:
   Total met all rules
   Percentage met all rules
   Total not met all rules
   Percentage not met all rules
6. When the execution result's benchmark is greater than or equal to 99%,
the result is considered OK. If the benchmark is not met, we inform the
customer; the customer works on the particular issue and informs us, after
which we re-run the DQ execution until the benchmark is greater than or
equal to 99%.
7. Share the result with the customer.
8. Go to IGC and update the mandatory fields that meet the Data Quality
Index (DQI).
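The benchmark check in the steps above can be sketched as follows (the record counts are invented; "met" means a record passed every rule in the set):

```python
# Minimal sketch: compute the DQ execution result and apply the
# 99% benchmark.
def dq_result(met_all, total, benchmark=99.0):
    """Summarize a rule-set run and check it against the benchmark."""
    pct_met = 100.0 * met_all / total
    return {
        "met_all_rules": met_all,
        "pct_met": pct_met,
        "not_met": total - met_all,
        "pct_not_met": 100.0 - pct_met,
        "ok": pct_met >= benchmark,
    }

passing = dq_result(995, 1000)   # 99.5% - meets the benchmark
failing = dq_result(980, 1000)   # 98.0% - report back to the customer
```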
