
What is a data mart?

A data mart is a subset of a data warehouse focused on a particular line of business, department, or subject area. Data marts make specific data available to a defined group of users, which allows those users to quickly access critical insights without wasting time searching through an entire data warehouse. For example, many companies may have a data mart that aligns with a specific department in the business, such as finance, sales, or marketing.

Data mart vs. data warehouse


Data marts, data warehouses, and data lakes are crucial central data repositories, but they serve
different needs within an organization.

A data warehouse is a system that aggregates data from multiple sources into a single, central, consistent data store to support data mining, artificial intelligence (AI), and machine learning, which can ultimately enhance sophisticated analytics and business intelligence. Through this strategic collection process, data warehouse solutions consolidate data from different sources and make it available in one unified form.

A data mart (as noted above) is a focused version of a data warehouse that contains a smaller subset of data important to and needed by a single team or a select group of users within an organization. A data mart is built from an existing data warehouse (or other data sources) through a process that involves designing and constructing a physical database, populating it with data, and setting up access and management protocols.

While building a data mart is a challenging process, it enables a business line to discover more focused insights more quickly than working with the broader data warehouse data set. For example, a marketing team may benefit from creating a data mart from an existing warehouse, as its activities are usually performed independently from the rest of the business and the team therefore doesn't need access to all enterprise data.

Benefits of a data mart


Data marts are designed to meet the needs of specific groups by covering a comparatively narrow subset of data. And while a data mart can still contain millions of records, its objective is to provide business users with the most relevant data in the shortest amount of time.

With its smaller, focused design, a data mart has several benefits to the end user, including the
following:

 Cost-efficiency: There are many factors to consider when setting up a data mart, such as the scope, integrations, and the extract, transform, and load (ETL) process. However, a data mart typically incurs only a fraction of the cost of a data warehouse.
 Simplified data access: Data marts hold only a small subset of data, so users can quickly retrieve the data they need with less effort than when working with a broader data set from a data warehouse.
 Quicker access to insights: Insights gained from a data warehouse support strategic decision-making at the enterprise level, which impacts the entire business. A data mart fuels business intelligence and analytics that guide decisions at the department level. Teams can leverage focused data insights with their specific goals in mind. As teams identify and extract valuable data in a shorter space of time, the enterprise benefits from accelerated business processes and higher productivity.
 Simpler data maintenance: A data warehouse holds a wealth of business information, with scope for multiple lines of business. Data marts focus on a single line and typically house less than 100 GB of data, which leads to less clutter and easier maintenance.
 Easier and faster implementation: A data warehouse involves significant implementation time, especially in a large enterprise, as it collects data from a host of internal and external sources. A data mart, on the other hand, needs only a small subset of data, so implementation tends to be more efficient and involve less setup time.

Who uses a data mart (and how)?


Data marts guide important business decisions at a departmental level. For example, a marketing
team may use data marts to analyze consumer behaviors, while sales staff could use data marts to
compile quarterly sales reports. As these tasks happen within their respective departments, the
teams don't need access to all enterprise data.

Typically, a data mart is created and managed by the specific business department that intends to
use it. The process for designing a data mart usually comprises the following steps:

1. Document essential requirements to understand the business and technical needs of the data mart.
2. Identify the data sources your data mart will rely on for information.
3. Determine the data subset, whether it is all information on a topic or specific fields at a more granular level.
4. Design the logical layout for the data mart by picking a schema that correlates with the larger data warehouse.
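
To make step 4 concrete, here is a minimal sketch of what a star schema for a hypothetical marketing data mart might look like, expressed as SQL DDL executed through Python's built-in sqlite3 module. The table and column names (dim_customer, dim_campaign, fact_campaign_response) are illustrative assumptions rather than anything prescribed above; a real data mart would mirror the dimensions of the parent warehouse.

import sqlite3

# A minimal star schema for a hypothetical marketing data mart:
# two dimension tables and one fact table that references them.
DDL = """
CREATE TABLE dim_customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT,
    segment       TEXT,
    region        TEXT
);

CREATE TABLE dim_campaign (
    campaign_id   INTEGER PRIMARY KEY,
    campaign_name TEXT,
    channel       TEXT,          -- e.g. email, social, display
    start_date    TEXT,
    end_date      TEXT
);

CREATE TABLE fact_campaign_response (
    response_id   INTEGER PRIMARY KEY,
    customer_id   INTEGER REFERENCES dim_customer(customer_id),
    campaign_id   INTEGER REFERENCES dim_campaign(campaign_id),
    response_date TEXT,
    revenue       REAL
);
"""

conn = sqlite3.connect("marketing_mart.db")
conn.executescript(DDL)   # create the mart's physical tables
conn.close()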

With the groundwork done, you can get the most value from a data mart by using specialized business intelligence tools, such as Qlik or Sisense. These solutions include dashboards and visualizations that make it easy to discern insights from the data, which ultimately leads to smarter decisions that benefit the company.

What is ETL?

ETL, which stands for extract, transform, and load, is a data integration process that combines data from multiple data sources into a single, consistent data store that is loaded into a data warehouse or other target system.

As databases grew in popularity in the 1970s, ETL was introduced as a process for integrating and loading data for computation and analysis, eventually becoming the primary method of processing data for data warehousing projects.

ETL provides the foundation for data analytics and machine learning workstreams. Through a series of business rules, ETL cleanses and organizes data in a way that addresses specific business intelligence needs, such as monthly reporting, but it can also tackle more advanced analytics, which can improve back-end processes or end-user experiences. Organizations often use ETL to:

 Extract data from legacy systems
 Cleanse the data to improve data quality and establish consistency
 Load data into a target database

How ETL works


The easiest way to understand how ETL works is to understand what happens in each step of the
process.

Extract

During data extraction, raw data is copied or exported from source locations to a staging area.
Data management teams can extract data from a variety of data sources, which can be structured
or unstructured. Those sources include but are not limited to:

 SQL or NoSQL servers
 CRM and ERP systems
 Flat files
 Email
 Web pages
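
As a rough illustration of this phase, the sketch below copies raw data from two hypothetical sources (a SQL table and a flat CSV file) into a local staging directory using pandas. The connection string, table name, and file names are assumptions for the example, not references from the article.

from pathlib import Path

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical source systems: a relational database and a flat file.
engine = create_engine("sqlite:///crm_source.db")      # stand-in for a CRM/ERP database
orders = pd.read_sql("SELECT * FROM orders", engine)   # structured source
contacts = pd.read_csv("exported_contacts.csv")        # flat-file source

# Land the raw, untransformed copies in a staging area for the transform phase.
Path("staging").mkdir(exist_ok=True)
orders.to_csv("staging/orders_raw.csv", index=False)
contacts.to_csv("staging/contacts_raw.csv", index=False)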

Transform

In the staging area, the raw data undergoes data processing. Here, the data is transformed and
consolidated for its intended analytical use case. This phase can involve the following tasks:

 Filtering, cleansing, de-duplicating, validating, and authenticating the data
 Performing calculations, translations, or summarizations based on the raw data. This can include changing row and column headers for consistency, converting currencies or other units of measurement, editing text strings, and more
 Conducting audits to ensure data quality and compliance
 Removing, encrypting, or protecting data governed by industry or governmental regulators
 Formatting the data into tables or joined tables to match the schema of the target data warehouse
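
Continuing the hypothetical staging data from the extract sketch, a minimal transform step with pandas might look like the following. The column names and the fixed currency rate are illustrative assumptions; a real pipeline would apply the business rules agreed for the target warehouse.

import pandas as pd

orders = pd.read_csv("staging/orders_raw.csv")   # raw data landed by the extract step

# Filter, cleanse, and de-duplicate the raw records.
orders = orders.dropna(subset=["order_id", "amount"])    # drop incomplete rows
orders = orders.drop_duplicates(subset=["order_id"])     # remove duplicate orders

# Rename headers for consistency with the warehouse schema.
orders = orders.rename(columns={"amount": "amount_local", "cust": "customer_id"})

# Convert currency using an assumed flat rate (a real pipeline would look rates up).
EUR_TO_USD = 1.08
orders["amount_usd"] = orders["amount_local"] * EUR_TO_USD

# Summarize to the grain expected by the target fact table.
daily_sales = (
    orders.groupby(["customer_id", "order_date"], as_index=False)["amount_usd"].sum()
)
daily_sales.to_csv("staging/daily_sales_transformed.csv", index=False)
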
Load

In this last step, the transformed data is moved from the staging area into a target data
warehouse. Typically, this involves an initial loading of all data, followed by periodic loading of
incremental data changes and, less often, full refreshes to erase and replace data in the
warehouse. For most organizations that use ETL, the process is automated, well-defined,
continuous and batch-driven. Typically, ETL takes place during off-hours when traffic on the
source systems and the data warehouse is at its lowest.
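
A simple sketch of the load step, again using pandas with a SQLite database standing in for the warehouse, is shown below. The incremental pattern (append only rows newer than the last loaded date) is one common approach, assumed here for illustration rather than taken from the article.

import pandas as pd
from sqlalchemy import create_engine

warehouse = create_engine("sqlite:///warehouse.db")   # stand-in for the target warehouse
new_data = pd.read_csv("staging/daily_sales_transformed.csv")

# Incremental load: append only rows newer than what is already in the warehouse.
# If the target table does not exist yet, fall through to a full initial load.
try:
    last_loaded = pd.read_sql(
        "SELECT MAX(order_date) AS max_date FROM fact_daily_sales", warehouse
    )["max_date"].iloc[0]
    if last_loaded is not None:
        new_data = new_data[new_data["order_date"] > last_loaded]
except Exception:
    pass  # first run: no existing table, so load everything

new_data.to_sql("fact_daily_sales", warehouse, if_exists="append", index=False)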

https://www.ibm.com/cloud/learn/etl

Data stewardship is the collection of practices that ensure an organization’s data is accessible, usable, safe, and trusted. It includes overseeing every aspect of the data lifecycle: creating, preparing, using, storing, archiving, and deleting data, in accordance with an organization’s established data governance principles for promoting data quality and integrity.

Data stewardship encompasses:

 Knowing what data an organization possesses
 Understanding where that data is located
 Ensuring that the data is accessible, usable, safe, and trusted
 Safeguarding the transparency and accuracy of data lineage
 Enforcing rules and regulations on how data can be used
 Helping the organization make the most of its data for competitive advantage
 Driving toward a data-driven culture
 Being an advocate for trusted data

Data stewardship comes under the umbrella of data governance. But whereas data
governance establishes high-level policies for protecting data against loss, corruption,
theft, or misuse, data stewardship focuses on making sure those policies are actually
followed.

Data stewards are the people responsible for data stewardship. Some people are assigned “data steward” as a formal title. Others assume the role in addition to their regular jobs. Either way, the role is indispensable, as data stewards are essentially “data ambassadors” between the data team and the user community, with the ultimate goal of empowering users with trusted data.

Why is Data Stewardship Important?


Data is swiftly overtaking physical assets in terms of value to organizations. Keeping
data safe, private, consistent, and of high quality is as important to enterprises today as
maintaining factory machinery was in the industrial age. Without trusted data,
organizations end up with messy and unreliable heaps of information sitting in multiple
databases, platforms, and even individual spreadsheets.

When users don’t trust the data, they aren’t confident about leveraging it to make
business decisions or to drive operations. In worst-case scenarios, data of substandard
or inconsistent quality can steer organizations in the wrong strategic direction, with
disastrous business results. Data stewards prevent this from happening. By
establishing consistent data definitions, maintaining business and technical rules, and
monitoring and auditing the reliability of the data, they ensure high levels of data quality,
integrity, availability, trustworthiness, and privacy protection.

Managing data lineage is an especially important part of data stewardship. Data lineage is the lifecycle of a piece of data: where it originates, what happens to it, what is done to it, and where it moves over time. With visibility into data lineage, including the accompanying business context, data stewards can trace any errors or problems when using data—say, for analytics—back to their root causes.
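
As a loose illustration of what lineage tracking can look like in practice, the snippet below records a minimal lineage entry each time a data set is produced or moved. The fields chosen (dataset, source, operation, timestamp) are illustrative assumptions, not a standard drawn from the text.

from datetime import datetime, timezone

lineage_log = []

def record_lineage(dataset_name, source, operation):
    """Append a minimal lineage entry: where the data came from and what was done to it."""
    lineage_log.append({
        "dataset": dataset_name,
        "source": source,
        "operation": operation,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    })

record_lineage("daily_sales", source="orders_raw", operation="dedupe and currency conversion")
record_lineage("daily_sales", source="daily_sales", operation="loaded into warehouse")

for entry in lineage_log:
    print(entry)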

Because data stewardship is so important, data stewards occupy positions of trust. In fact, for data stewardship to succeed, both technical staff and business professionals must have the utmost confidence in their organization’s data stewards. Such people form a bridge between data professionals and the community of people who use the data. Because of this, data stewards must have both a big-picture view of how the organization works and a strong grasp of the down-to-earth details of how data is created, managed, manipulated, stored, and—most importantly—how it’s used.

It’s also important to note that there are two sides to data stewardship. One is
defensive: to guard against the regulatory and reputational risks that come with owning
data. To that end, data stewards are champions for information governance within their
organizations. They evangelize the reasons for protecting data, and deliver education,
training, and mentorship to the workforce.

At the same time, data stewards are the key drivers of the use of data for strategic advantage, and they promote improvements to the business processes that create and consume data. For this reason, data stewards must be experts in the business units
they serve. They constantly work to inspire users to make the most out of the data—
consistently, accurately, and safely—to make smarter business decisions each day.
Over time, with strong data stewards in place, employees perform better in their jobs.
They make fewer errors. They contact the right customers for upselling and cross-
selling. They prioritize the right business initiatives. And they do all this while
following data governance policies and processes.
WHAT IS A DATA QUALITY AUDIT? ENSURING YOUR
DATA INTEGRITY
Posted on November 26, 2019 by DataOpsZone

Companies maintain huge volumes of data. This data may contain codes, test data, and financial information, as well as a customer database. However, maintaining this data can be costly, because storing large volumes of data requires the services of a data center. In the worst case, you may be paying to store many pieces of unused data, such as information about former customers or archived audit reports that are no longer of any use. But how do you check what kind of data is useful and what isn’t? The answer is by carrying out a data quality audit.

In this post, we’ll discuss what a data quality audit is. We’ll also explore the process and
importance of a data quality audit. Additionally, we’ll learn about the tools you can use for this
job and how the process will ensure your data integrity. So, let’s get started.

WHAT IS A DATA QUALITY AUDIT?


Before discussing what a data quality audit is, we should know what auditing is. Generally,
auditing refers to inspecting an item or a process. The aim is to find out whether the properties of
that item fulfill the required criteria of the customer. For instance, suppose you’re executing a
data center security audit. During the audit, the team checks if the security protocols of the data
center match the processes written in the contract. Let’s move on to data quality audits.

As I said earlier, companies have a huge volume of data. Before you use any such data, auditing
is a must-have to ensure that the data set fulfills your goal. In layman’s terms, a data quality audit
involves checking some key metrics. The goal is to find out whether the data set has the required
quality. Once the audit team gives the green light, only then can you use the data for your task.
All clear with the definition? Let’s move on to how a data quality audit works.

WHY IS A DATA QUALITY AUDIT IMPORTANT TO ENSURE DATA INTEGRITY?

What is data integrity? It’s a measure of whether your data is accurate and consistent. It also reflects whether your data is safe and secure and complies with data privacy regulations. But how does a data quality audit ensure data integrity?

During a data quality audit, the audit team applies agreed business and technical rules to the data set. The audit reports identify silos and issues that make your data unsafe or irrelevant. Besides, the reports also point out areas for improvement. Suppose you’re not using the latest encryption technique. Or what if you’re not aware of the latest GDPR rules? The audit reports find that out and allow you to correct your data set and storage rules—thus improving the overall data quality and ensuring data integrity.

TOOLS USED FOR A DATA QUALITY AUDIT
Companies nowadays have huge volumes of data. If your goal is to improve the data quality,
manually analyzing the data is next to impossible. The only solution is to deploy a tool that can
check the data and analyze its quality. Let’s discuss what to look for when choosing a data
quality audit tool.

HOW TO JUDGE A GOOD DATA QUALITY AUDIT TOOL
So, we know why companies need a data quality audit tool. Now, let’s discuss how to judge
whether a tool is useful for your company. Apart from finding issues related to data quality, a
data quality audit tool must help in the following:

 Analyze the company’s data
 Assess data management and reporting
 Help in checking the system’s ability to report and collect data
 Verify data frequently and evaluate data quality
 Check whether proper data quality management systems and tools are applied in the system

Before choosing a data quality audit tool, check if the tool is able to perform the above-
mentioned tasks.
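
As a rough idea of what such a tool automates, the sketch below runs a few basic data quality checks (completeness, uniqueness, and validity) over a table with pandas. The file name, columns, and allowed values are hypothetical; a real audit tool would cover far more rules, plus reporting and scheduling.

import pandas as pd

df = pd.read_csv("customers.csv")   # hypothetical data set under audit

report = {
    # Completeness: share of missing values per column.
    "missing_ratio": df.isna().mean().round(3).to_dict(),
    # Uniqueness: duplicate rows on what should be a unique key.
    "duplicate_customer_ids": int(df["customer_id"].duplicated().sum()),
    # Validity: values outside an allowed domain.
    "invalid_countries": int((~df["country"].isin(["US", "CA", "GB", "DE"])).sum()),
}

for check, result in report.items():
    print(f"{check}: {result}")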

Currently, commercial data mining tools use several common techniques to extract knowledge. These include association rules, clustering, neural networks, sequencing, and statistical analysis.
◼ Also used are decision trees, which are a representation of the rules used in classification or clustering, and statistical analyses, which may include different types of regression and many other techniques.
◼ Some commercial products use advanced techniques such as genetic algorithms, case-based reasoning, Bayesian networks, nonlinear regression, combinatorial optimization, pattern matching, and fuzzy logic.
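
To illustrate one of these techniques, here is a minimal sketch of training a decision tree classifier with scikit-learn and printing the classification rules it represents. The iris sample data set is used purely for convenience and is not mentioned in the text above.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Load a small sample data set and fit a shallow decision tree.
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# A decision tree is a representation of classification rules, so the fitted
# model can be printed as human-readable if/else conditions.
print(export_text(tree, feature_names=list(iris.feature_names)))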
