A data warehouse is a system that aggregates data from multiple sources into a single, central,
consistent data store to support data mining, artificial intelligence (AI), and machine learning—
which, ultimately, can enhance sophisticated analytics and business intelligence. Through this
strategic collection process, data warehouse solutions consolidate data from the different sources
to make it available in one unified form.
A data mart is a focused version of a data warehouse that contains a smaller
subset of data relevant to a single team or a select group of users within an
organization. A data mart is built from an existing data warehouse (or other data sources)
through a procedure that involves multiple technologies and tools to design and
construct a physical database, populate it with data, and set up access and management
protocols.
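In relational terms, the construction step often amounts to materializing a department-specific slice of the warehouse. Here is a minimal sketch using Python's built-in sqlite3 module; the warehouse table, its columns, and the "marketing" filter are illustrative assumptions, not a prescribed design:

```python
import sqlite3

# In-memory database standing in for both the warehouse and the mart (illustrative).
conn = sqlite3.connect(":memory:")

# Hypothetical warehouse fact table covering every department.
conn.execute("CREATE TABLE warehouse_sales (id INTEGER, department TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [(1, "marketing", 120.0), (2, "finance", 75.5), (3, "marketing", 42.0)],
)

# The "data mart": a focused table holding only the marketing subset.
conn.execute(
    "CREATE TABLE mart_marketing_sales AS "
    "SELECT id, amount FROM warehouse_sales WHERE department = 'marketing'"
)

rows = conn.execute("SELECT COUNT(*) FROM mart_marketing_sales").fetchone()[0]
print(rows)  # 2 rows in the mart vs. 3 in the warehouse
```

In practice the mart would live in its own database with its own access controls, but the core idea is the same: a smaller, pre-filtered copy that one team can query directly.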
With its smaller, focused design, a data mart has several benefits to the end user, including the
following:
Cost-efficiency: There are many factors to consider when setting up a data mart, such as the
scope, integrations, and the extract, transform, and load (ETL) process. However, a data mart
typically incurs only a fraction of the cost of a data warehouse.
Simplified data access: Data marts only hold a small subset of data, so users can quickly
retrieve the data they need with less effort than when working with the broader data set
of a data warehouse.
Quicker access to insights: Insight gained from a data warehouse supports strategic
decision-making at the enterprise level, which impacts the entire business. A data mart fuels
business intelligence and analytics that guide decisions at the department level. Teams can
leverage focused data insights with their specific goals in mind. As teams identify and extract
valuable data in a shorter space of time, the enterprise benefits from accelerated business
processes and higher productivity.
Simpler data maintenance: A data warehouse holds a wealth of business information, with
scope for multiple lines of business. Data marts focus on a single line, typically housing under
100 GB of data, which leads to less clutter and easier maintenance.
Easier and faster implementation: A data warehouse involves significant implementation
time, especially in a large enterprise, as it collects data from a host of internal and external
sources. On the other hand, you only need a small subset of data when setting up a data mart, so
implementation tends to be more efficient and involve less setup time.
Typically, a data mart is created and managed by the specific business department that intends to
use it. With the groundwork of designing, building, and populating the data mart done, you can
get the most value from it by using specialist business intelligence tools, such as Qlik or
Sisense. These solutions include dashboards and visualizations that make it easy to discern
insights from the data, which ultimately leads to smarter decisions that benefit the company.
What is ETL?
ETL, which stands for extract, transform, and load, is a data integration process that combines
data from multiple data sources into a single, consistent data store that is loaded into a data
warehouse or other target system.
As databases grew in popularity in the 1970s, ETL was introduced as a process for
integrating and loading data for computation and analysis, eventually becoming the primary
method to process data for data warehousing projects.
ETL provides the foundation for data analytics and machine learning workstreams. Through a
series of business rules, ETL cleanses and organizes data in a way which addresses specific
business intelligence needs, like monthly reporting, but it can also tackle more advanced
analytics, which can improve back-end processes or end-user experiences. Organizations often
use ETL to extract data from legacy systems, cleanse it to improve quality and establish
consistency, and load it into a target database.
Extract
During data extraction, raw data is copied or exported from source locations to a staging area.
Data management teams can extract data from a variety of data sources, which can be structured
or unstructured. Those sources include but are not limited to:
SQL or NoSQL servers
CRM and ERP systems
Flat files
Email
Web pages
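The extract phase can be sketched with Python's standard library: copy raw records from heterogeneous sources into a shared staging area without transforming them yet. The in-memory "CRM" table and the flat-file contents are illustrative assumptions:

```python
import csv
import io
import sqlite3

# Stand-ins for two source systems (illustrative data).
flat_file = io.StringIO("id,email\n1,a@example.com\n2,b@example.com\n")
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE contacts (id INTEGER, email TEXT)")
crm.execute("INSERT INTO contacts VALUES (3, 'c@example.com')")

# Staging area: raw copies are appended as-is, untransformed.
staging = []

# Extract from the flat file...
staging.extend({"id": int(r["id"]), "email": r["email"]} for r in csv.DictReader(flat_file))
# ...and from the CRM database.
staging.extend({"id": i, "email": e} for i, e in crm.execute("SELECT id, email FROM contacts"))

print(len(staging))  # 3 raw records staged
```

A production pipeline would stage into durable storage (files, a staging schema, or an object store) rather than a Python list, but the shape of the step is the same: copy first, clean later.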
Transform
In the staging area, the raw data undergoes data processing. Here, the data is transformed and
consolidated for its intended analytical use case. This phase can involve tasks such as filtering,
cleansing, and de-duplicating the data; performing calculations or summarizations; removing or
encrypting sensitive fields; and formatting the data to match the target schema.
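A few of those transform tasks can be sketched in plain Python. The staged records below, the choice of `id` as the deduplication key, and the specific cleansing rules are illustrative assumptions:

```python
# Raw staged records (illustrative): note the duplicate id, inconsistent
# casing, and a missing amount.
staging = [
    {"id": 1, "email": "A@Example.com", "amount": "10.5"},
    {"id": 1, "email": "a@example.com", "amount": "10.5"},
    {"id": 2, "email": "b@example.com", "amount": None},
]

transformed = {}
for rec in staging:
    if rec["amount"] is None:           # cleansing: drop incomplete records
        continue
    key = rec["id"]                     # de-duplication on the business key
    transformed[key] = {
        "id": key,
        "email": rec["email"].lower(),  # standardize formatting
        "amount": float(rec["amount"]), # type conversion for calculations
    }

print(sorted(transformed))  # [1] -- one clean, de-duplicated record remains
```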
Load
In this last step, the transformed data is moved from the staging area into a target data
warehouse. Typically, this involves an initial loading of all data, followed by periodic loading of
incremental data changes and, less often, full refreshes to erase and replace data in the
warehouse. For most organizations that use ETL, the process is automated, well-defined,
continuous and batch-driven. Typically, ETL takes place during off-hours when traffic on the
source systems and the data warehouse is at its lowest.
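The initial-load-then-incremental pattern can be sketched as an upsert: new keys are inserted, existing keys are updated in place. This uses Python's sqlite3 as a stand-in target; the table, columns, and batch contents are illustrative assumptions:

```python
import sqlite3

# Stand-in for the target data warehouse (illustrative).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")

def load(records):
    """Upsert a batch of transformed records into the target table."""
    warehouse.executemany(
        "INSERT INTO sales VALUES (:id, :amount) "
        "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount",
        records,
    )

# Initial load of all data...
load([{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}])
# ...followed by a periodic incremental batch (one update, one new row).
load([{"id": 2, "amount": 25.0}, {"id": 3, "amount": 30.0}])

rows = warehouse.execute("SELECT id, amount FROM sales ORDER BY id").fetchall()
print(rows)  # [(1, 10.0), (2, 25.0), (3, 30.0)]
```

A full refresh would instead truncate the table and re-run the initial load; the incremental upsert is what keeps the nightly batch window small.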
https://www.ibm.com/cloud/learn/etl
Data stewardship comes under the umbrella of data governance. But whereas data
governance establishes high-level policies for protecting data against loss, corruption,
theft, or misuse, data stewardship focuses on making sure those policies are actually
followed.
Data stewards are the people responsible for data stewardship. Some of them
are assigned “data steward” as a formal title. Others assume the role in addition to their
regular jobs. Either way, the role is indispensable, as data stewards are basically “data
ambassadors” between the data team and the user community, with the ultimate goal
of empowering users with trusted data.
When users don’t trust the data, they aren’t confident about leveraging it to make
business decisions or to drive operations. In worst-case scenarios, data of substandard
or inconsistent quality can steer organizations in the wrong strategic direction, with
disastrous business results. Data stewards prevent this from happening. By
establishing consistent data definitions, maintaining business and technical rules, and
monitoring and auditing the reliability of the data, they ensure high levels of data quality,
integrity, availability, trustworthiness, and privacy protection.
It’s also important to note that there are two sides to data stewardship. One is
defensive: to guard against the regulatory and reputational risks that come with owning
data. To that end, data stewards are champions for information governance within their
organizations. They evangelize the reasons for protecting data, and deliver education,
training, and mentorship to the workforce.
At the same time, data stewards are the key drivers of the use of data for strategic
advantage, and they promote improvements to the business processes that create and
consume data. For this reason, data stewards must be experts in the business units
they serve. They constantly work to inspire users to make the most out of the data—
consistently, accurately, and safely—to make smarter business decisions each day.
Over time, with strong data stewards in place, employees perform better in their jobs.
They make fewer errors. They contact the right customers for upselling and cross-
selling. They prioritize the right business initiatives. And they do all this while
following data governance policies and processes.
WHAT IS A DATA QUALITY AUDIT? ENSURING YOUR
DATA INTEGRITY
Posted on November 26, 2019 by DataOpsZone
Companies maintain huge volumes of data. This data may contain code, test data, and financial
information, as well as customer databases. However, maintaining this data can be costly,
because storing it at scale requires the services of a data center. In the worst case, you may be
paying to store many pieces of unused data, such as former customer information or archived
audit reports that are no longer of any use. But how do you check what kind of data is useful
and what isn’t? The answer is by carrying out a data quality audit.
In this post, we’ll discuss what a data quality audit is. We’ll also explore the process and
importance of a data quality audit. Additionally, we’ll learn about the tools you can use for this
job and how the process will ensure your data integrity. So, let’s get started.
As I said earlier, companies have a huge volume of data. Before you use any such data, an audit
is essential to ensure that the data set fulfills your goal. In layman’s terms, a data quality audit
involves checking some key metrics. The goal is to find out whether the data set has the required
quality. Once the audit team gives the green light, only then can you use the data for your task.
All clear with the definition? Let’s move on to how a data quality audit works.
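The "key metrics" in such an audit typically include completeness, uniqueness, and validity. A minimal sketch of automated checks follows; the sample records, the email-format rule, and the 0.9 pass threshold are all illustrative assumptions:

```python
import re

# Sample records under audit (illustrative): one incomplete, one duplicate id.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},               # incomplete record
    {"id": 2, "email": "b@example.com"},  # duplicate id
]

total = len(records)
complete = sum(1 for r in records if r["email"])
unique_ids = len({r["id"] for r in records})
valid = sum(1 for r in records if re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", r["email"]))

metrics = {
    "completeness": complete / total,
    "uniqueness": unique_ids / total,
    "validity": valid / total,
}

# Green light only if every metric clears the (illustrative) 0.9 threshold.
passed = all(v >= 0.9 for v in metrics.values())
print(metrics, passed)
```

With two of three records complete and valid and a duplicated id, every metric lands at roughly 0.67, so this data set would not get the audit team's green light.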
In the previous section, we discussed the DM SME and UC SME rules that the audit team
implements during data quality audits. The audit reports identify silos and issues that make your
data unsafe and irrelevant. The reports also point out areas for improvement. Suppose
you’re not using the latest encryption technique. Or what if you’re not aware of the latest GDPR
rules? The audit reports find that out and allow you to correct your data set and storage rules,
thus improving the overall data quality and ensuring data integrity.
TOOLS USED FOR A DATA QUALITY AUDIT
Companies nowadays have huge volumes of data. If your goal is to improve the data quality,
manually analyzing the data is next to impossible. The only solution is to deploy a tool that can
check the data and analyze its quality. Let’s discuss what to look for when choosing a data
quality audit tool.
H O W TO JU D G E A GO OD D A TA QU A LITY AU D IT TO O L
So, we know why companies need a data quality audit tool. Now, let’s discuss how to judge
whether a tool is useful for your company. Apart from finding issues related to data quality, a
data quality audit tool must also support the rest of your audit workflow. Before choosing a
data quality audit tool, check whether it is able to perform these supporting tasks.