
Report Lab 4: SQL Server Data Quality Services

Realized by:
ANJAR Fatima
EL MECHRY Mariem

School Year 2023-2024


Objectives

The objective of this lab is to explore the capabilities of SQL Server Integration
Services (SSIS) for data quality by:

- Profiling data using SSIS to understand and analyze source data effectively.
- Cleansing data using SSIS and Data Quality Services (DQS) to improve data quality.

SSIS and Data Quality:
SSIS provides tools for performing ETL processes on data. In this lab, we focus on
two key SSIS components: the Data Profiling task and the DQS Cleansing
transformation. They help us understand the source data and automate its cleansing.

Data Profiling with SSIS


Data profiling with SSIS lets us analyze source data and understand it better. It
helps catch data quality problems before the data is integrated into the data
warehouse.

1. Setting Up the Data Profiling Task:

Configure the profiles to compute.

Save the profile output in XML format.

Set the OverwriteDestination property to True so that the package can be
re-executed multiple times and overwrite the existing output file.

2. Analyzing Profiles in the Data Profile Viewer:

After setting up the profiles and saving their output, we used the Data Profile
Viewer to analyze them.

The following steps were performed in SQL Server Data Tools (SSDT) to profile the
data in the DQS_STAGING_DATA database:

Connection to SQL Server Instance:


Windows authentication was used to establish a connection to the SQL Server
instance.

The DQS_STAGING_DATA database was selected as the target database for the
profiling task.

Return to Profiling Form:

After selecting the database, we returned to the Single Table Quick Profile Form
dialog box.

The "CustomersDirty" view was chosen from the "Table Or View" drop-down list.

Configuration of Profiling Options:

● The first four checkboxes were left selected (default settings).
● The "Candidate Key Profile" checkbox was cleared.
● The "Column Pattern Profile" checkbox was selected.

Configuration of Profile Types:


In the Data Profiling Task Editor window:

Different profiles were selected from the "Profile Type" list on the right.

For the "Column Value Distribution Profile Request," the column property was
changed from (*) to "Occupation" to profile this specific column.

The "ValueDistributionOption" property for this request was set to "AllValues" so
that every distinct value is reported, not only the most frequent ones.
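
As a rough illustration of what a value distribution over all values of a column
yields, the following Python sketch builds a frequency table and flags values that
fall below a frequency threshold. It is not how SSIS computes the profile. The
sample occupations, the misspelled value "Clercal", and the 1% threshold are
assumptions made for the example.

```python
from collections import Counter


def value_distribution(values):
    """Return (value, count, percentage) tuples, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, count, 100.0 * count / total)
            for value, count in counts.most_common()]


def rare_values(values, max_percent=1.0):
    """Return values whose share of all rows is below max_percent.

    The 1% default is an assumed threshold, not a setting taken from SSIS.
    """
    return [value for value, _, pct in value_distribution(values)
            if pct < max_percent]


# Hypothetical sample data, not the lab's actual rows.
occupations = (["Management"] * 300 + ["Clerical"] * 150
               + ["Manual"] * 149 + ["Clercal"] * 1)

for value, count, pct in value_distribution(occupations):
    print(f"{value:<12} {count:>4} {pct:6.2f}%")

print("Candidates for review:", rare_values(occupations))
```

A value that appears once among hundreds of rows, next to a near-identical frequent
value, is the kind of entry worth reviewing by hand before cleansing.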

The column property for the "Column Pattern Profile Request" was changed from (*)
to "EmailAddress."

This data profiling exercise aims to uncover anomalies, patterns, and potential data
quality issues in the dataset, which then guide the subsequent data cleansing work.
To do so, we performed the following tasks:

● Opened the profile output file in the Data Profile Viewer.

In the next step, we examined the distribution of values in the "Occupation"
column, focusing on values that occur with very low frequency, such as the
"Professional" value. The drill-down functionality showed the count of each value
and its frequency as a percentage, which we compared with the other values. A value
this rare is most likely a typo.

We then examined the Column Pattern Profile for EmailAddress: the regular
expression patterns displayed for this column match the ones defined earlier when
preparing the knowledge base in previous labs.
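
To make the idea of a column pattern concrete, the sketch below measures how many
values of a column fully match a regular expression and lists the ones that do not.
The pattern and the sample addresses are assumptions for illustration only; they are
not the expressions reported by the Data Profile Viewer in this lab.

```python
import re

# Assumed, simplified e-mail pattern: letters, optional digits, "@",
# a domain name that may contain hyphens, a dot, and a top-level domain.
EMAIL_PATTERN = re.compile(r"[A-Za-z]+\d*@[A-Za-z\-]+\.[A-Za-z]+")


def pattern_coverage(values, pattern):
    """Return (percentage of fully matching values, list of non-matching values)."""
    failing = [v for v in values if not pattern.fullmatch(v)]
    if not values:
        return 0.0, failing
    coverage = 100.0 * (len(values) - len(failing)) / len(values)
    return coverage, failing


# Hypothetical sample values, two of them deliberately malformed.
emails = [
    "jon24@adventure-works.com",
    "eugene10@adventure-works.com",
    "ruben35#adventure-works.com",   # "#" instead of "@"
    "christy12@adventure-works",     # missing top-level domain
]

coverage, failing = pattern_coverage(emails, EMAIL_PATTERN)
print(f"Coverage: {coverage:.1f}%")
print("Not matching:", failing)
```

Values outside the dominant patterns are the natural starting point for cleansing
rules.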

We then reviewed the other profile results before completing the analysis step.

The data profiling exercise provided insight into the distribution, patterns, and
potential anomalies within the dataset. These findings guide the subsequent data
cleansing work and help raise the quality of the customer data taken from
DimCustomer.

Data Cleansing with SSIS:
The objective of the following part is to demonstrate the process of data cleansing
using the SSIS DQS Cleansing transformation. This involves preparing data and
building an SSIS package to cleanse the dirty data.

Preparation of Clean and Dirty Data Tables:

Created a clean customer data table (CustomersCleanT) and populated it with every
tenth customer from the DimCustomer table.
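
The report does not show how every tenth customer was selected. One common approach
is a modulo test on the key, sketched below in Python. The integer key column
CustomerKey and the sample rows are assumptions for the example.

```python
def every_tenth(rows, key="CustomerKey"):
    """Keep rows whose key is divisible by 10, roughly one customer in ten."""
    return [row for row in rows if row[key] % 10 == 0]


# Hypothetical rows standing in for the DimCustomer table.
customers = [{"CustomerKey": k, "FullName": f"Customer {k}"}
             for k in range(11000, 11050)]

clean_sample = every_tenth(customers)
print([row["CustomerKey"] for row in clean_sample])
```

A modulo test only yields exactly one row in ten when the keys are consecutive; with
gaps in the key sequence the sample is approximate.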

Created a dirty customer data table (CustomersDirtyT) with additional columns for
tracking updates, and populated it with data from DimCustomer with modifications.

Checked the dirty data after the changes to confirm that approximately 40% of the
rows had been updated.
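
The check can be expressed as the share of rows whose tracking column records at
least one change. The sketch below assumes a single integer tracking column named
Updated, which is a hypothetical name; the report does not list the actual tracking
columns.

```python
def updated_share(rows, flag_column="Updated"):
    """Return the percentage of rows whose tracking flag marks a modification."""
    if not rows:
        return 0.0
    changed = sum(1 for row in rows if row[flag_column] > 0)
    return 100.0 * changed / len(rows)


# Hypothetical rows: two out of every five keys are marked as modified.
dirty_rows = [{"CustomerKey": k, "Updated": 1 if k % 5 in (0, 1) else 0}
              for k in range(1, 101)]

print(f"Updated rows: {updated_share(dirty_rows):.1f}%")
```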

Updated a specific row in the dirty data table (CustomersDirtyT) to create a
scenario for correction with the DQS Cleansing transformation.

Created two staging tables in the dbo schema to store the results of the cleansing
process: CustomersDirtyMatchT and CustomersDirtyNoMatchT, the latter using the same
structure as the former.

Developed an SSIS package (DQSCleansing) with a data flow task.

Configured an OLE DB connection manager to the DQS_STAGING_DATA database.


Configured an OLE DB source to read from the CustomersDirtyT table.

Added the DQS Cleansing transformation (CleanseStreetAddress) to correct the
StreetAddress column.
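
The sketch below is a toy stand-in for what a cleansing step does to one column: it
looks each value up in a small set of known values and corrections and emits the
source value, the output value, and a status. It does not call DQS. The lookup
tables and addresses are hypothetical, and the output column names and status labels
are modeled on the transformation's output as an assumption and may differ from what
the package actually produces.

```python
# Hypothetical knowledge: values known to be valid, and known corrections.
KNOWN_VALUES = {"1970 Napa Ct.", "2500 North Serene Blvd"}
CORRECTIONS = {"1970 Napa Court": "1970 Napa Ct."}


def cleanse_street_address(row):
    """Return a copy of the row with source, output, and status columns added."""
    source = row["StreetAddress"]
    if source in KNOWN_VALUES:
        output, status = source, "Correct"
    elif source in CORRECTIONS:
        output, status = CORRECTIONS[source], "Corrected"
    else:
        # Unknown values pass through unchanged and are flagged for review.
        output, status = source, "New"
    return {
        **row,
        "StreetAddress_Source": source,
        "StreetAddress_Output": output,
        "StreetAddress_Status": status,
    }


# Hypothetical input rows standing in for CustomersDirtyT.
dirty_rows = [
    {"CustomerKey": 11000, "StreetAddress": "1970 Napa Court"},
    {"CustomerKey": 11010, "StreetAddress": "2500 North Serene Blvd"},
    {"CustomerKey": 11020, "StreetAddress": "17 Unknown Lane"},
]

cleansed = [cleanse_street_address(row) for row in dirty_rows]
for row in cleansed:
    print(row["CustomerKey"], row["StreetAddress_Output"], row["StreetAddress_Status"])
```

Keeping the source value next to the output value makes every correction auditable
after the data flow has run.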

Summary
This lab demonstrated the practical use of SSIS for data profiling and cleansing. The
Data Profiling task helped identify potential data quality issues, while the DQS
Cleansing transformation improved the quality of data through corrections. The
exercises provided hands-on experience in configuring SSIS tasks, analyzing profiles,
and executing data cleansing processes.

The SSIS and DQS functionalities showcased in this lab offer valuable tools for
maintaining data quality and integrity in SQL Server environments.
