
Information Integration Blog: Create a QualityStage Match Specification ... http://informationintegrationblog.blogspot.com/2014/07/create-qualitysta...

Blog about the different features of Data Integration products and offerings from IBM Analytics. Disclaimer:
The postings on this site are those of the authors and don’t necessarily represent IBM’s positions, strategies
or opinions

Wednesday, 9 July 2014

Create a QualityStage Match Specification in 8 easy steps!!!

The QualityStage Match Wizard is a simple interactive tool that can be used to create template-based match specifications. We just need to answer a few guided questions and make simple selections, and we'll be all set! A basic match specification can be created quickly and easily with minimal knowledge of matching concepts, Match Designer functionality and its workflow.

The match specifications created using the Match Wizard serve as a starting point for many purposes. Customers can use them to learn the match specification creation process and the concept of matching. They show how to choose blocking columns, match commands, match thresholds, and reliability and chance agreement values (m probability and u probability) for given data, and how to configure the test environment. Sales executives can use them in demos instead of building match specifications from scratch, with minimal learning curve involved.
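The m probability (reliability) and u probability (chance agreement) mentioned above are usually turned into agreement and disagreement weights. The sketch below uses the standard Fellegi-Sunter log-ratio form. Treat it as an illustration of the concept, not QualityStage's exact internal calculation. The column names and probabilities are made up.

```python
import math

def agreement_weight(m: float, u: float) -> float:
    # Positive when a column agrees more often on true matches (m)
    # than it agrees by chance (u).
    return math.log2(m / u)

def disagreement_weight(m: float, u: float) -> float:
    # Usually negative: a disagreement is evidence against a match.
    return math.log2((1 - m) / (1 - u))

def composite_weight(agrees: dict[str, bool], probs: dict[str, tuple[float, float]]) -> float:
    """Sum the per-column weights for one record pair.

    agrees: column name -> True if the two records agree on that column
    probs:  column name -> (m, u) for that column
    """
    total = 0.0
    for column, agreed in agrees.items():
        m, u = probs[column]
        total += agreement_weight(m, u) if agreed else disagreement_weight(m, u)
    return total

# Hypothetical columns and probabilities, for illustration only.
probs = {"LastName": (0.9, 0.01), "HouseNumber": (0.8, 0.1)}
print(round(composite_weight({"LastName": True, "HouseNumber": True}, probs), 2))   # 9.49
print(round(composite_weight({"LastName": True, "HouseNumber": False}, probs), 2))  # 4.32
```

A column with a high m and a low u contributes the most evidence when it agrees. This is one reason the wizard's choice of match commands matters.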

From IIS version 8.7, the Match Wizard is available as an enhancement to the Match Designer. Note that it is not an alternative to, or a substitute for, the Match Designer. Once the Match Wizard steps are completed, the Match Designer is launched for any further development, refinement, saving and testing of the wizard-generated match specification, so that it can subsequently be used in a match job. Currently, we can use the Match Wizard to create match specifications for matching data standardized using the QualityStage US Name and US Address rule sets.

For the matching process, we need sample data and its frequency distribution information. It is always recommended to standardize the sample data before using it in a match specification, because the standardization process ensures uniformity in the data. The QualityStage Standardize stage and the rule sets can be used to achieve this. The frequency distribution information of the sample plays a very important role in the matching process. A more frequent value is less significant when matching, because the chance of it agreeing at random is very high, and vice versa. The frequency distributions of the sample data can be obtained by using the QualityStage Match Frequency stage.
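The sketch below shows why a frequent value carries less weight. It approximates the chance-agreement (u) probability of each specific value by that value's relative frequency in the sample. This is a common simplification of frequency-based weighting, and it is an assumption here; the Match Frequency stage output may not be used this way internally. The fixed m = 0.9 and the surname sample are hypothetical.

```python
import math
from collections import Counter

def value_agreement_weights(values: list[str], m: float = 0.9) -> dict[str, float]:
    """Frequency-adjusted agreement weight for each distinct value.

    Assumption: u for a value is its relative frequency in the sample,
    so common values get a small weight and rare values a large one.
    """
    counts = Counter(values)
    total = len(values)
    return {value: math.log2(m / (count / total)) for value, count in counts.items()}

# Hypothetical surname sample: SMITH is common, OKONKWO is rare.
sample = ["SMITH"] * 8 + ["OKONKWO"] * 2
weights = value_agreement_weights(sample)
print({name: round(w, 2) for name, w in weights.items()})  # {'SMITH': 0.17, 'OKONKWO': 2.17}
```

Agreement on the rare surname contributes about 2 bits more evidence than agreement on the common one. That is the "vice versa" in the paragraph above.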

The Match Designer expects the input sample data and frequency distributions to be DataStage data set files. We can use the sample data and the predefined jobs that come with the product to standardize the data and create the sample and frequency data sets. Sample data can be found at:

ISInstallationDirectory/Server/PXEngine/DataQuality/MatchTemplates/StandardizationInput (from IIS version 9.1)

A DataStage export (.dsx) file of the predefined jobs, which can be imported into any DataStage project, can be found at ISInstallationDirectory\Clients\Samples\DataQuality\MatchTemplates\Jobs\PredefinedJobs.dsx. This .dsx file also contains match jobs, which can be used to deploy the completed match specifications.

Steps to create a match specification using Match Wizard:

1. Launch the Match Wizard
2. Select the Match Form
3. Select the Match Type
4. Select the Match Threshold
5. Select the additional column(s)
6. Configure the Test Environment
a. Source data set
b. Frequency data set
c. Database connection
7. Summary
8. Save the Match Specification in the Match Designer

Let's look at each of these steps in detail:

Step # 1 - Launch the Match Wizard

In the DataStage Designer client, click File → New → Data Quality, and then select Match Specification (Fig 1).

In the 'Select Match Build Method' dialog, click the 'Help me get started' link (Fig 2). This launches the Match Specification Setup Wizard.


Let's get familiar with the Match Wizard layout (refer Fig 3).


The Match Specification Setup Wizard is a three-pane form:
Left pane: shows the steps that need to be completed.
Center pane: shows the options to choose from.
Right pane: shows examples and explanations to help us choose among the options in the center pane.
Next and Back buttons: navigate from one form to another.
Cancel button: exits the wizard at any step.
Finish button: launches the match specification in the Match Designer for further processing once all the required steps are completed.
Default selections are provided wherever possible, as in the form shown in Fig 3.


Step # 2 - Select the Match Form (refer Fig 3 above)


There are two kinds of matching available:
Unduplicate matching: the option 'Within a single source' creates an Unduplicate match specification, where matching is done within one data source (generally used to eliminate duplicates in a source file).
Reference matching: the option 'One source to another source' creates a Reference match specification, where a data source is matched against a reference source (generally used to enrich a source file from a reference file).
The appropriate Match Form should be selected according to the requirement. For now, let's continue with the default selection, 'Within a single source'.
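The two match forms differ in which record pairs get compared. The sketch below uses exact-value blocking on a hypothetical ZipCode column (blocking columns are mentioned earlier in this post). In an Unduplicate match, every pair inside a block is compared once. In a Reference match, each data record is compared only with reference records in the same block. This is a conceptual illustration, not how QualityStage schedules its comparisons.

```python
from collections import defaultdict
from itertools import combinations, product

def block(records: list[dict], key: list[str]) -> dict[tuple, list[dict]]:
    # Group records that share the same values in the blocking columns.
    blocks: dict[tuple, list[dict]] = defaultdict(list)
    for rec in records:
        blocks[tuple(rec[c] for c in key)].append(rec)
    return blocks

def unduplicate_pairs(records, key):
    # Within a single source: each unordered pair in a block, once.
    for recs in block(records, key).values():
        yield from combinations(recs, 2)

def reference_pairs(data, reference, key):
    # One source to another: data x reference, restricted to the same block.
    ref_blocks = block(reference, key)
    for k, recs in block(data, key).items():
        yield from product(recs, ref_blocks.get(k, []))

# Hypothetical records; ZipCode stands in for a blocking column.
people = [
    {"id": 1, "ZipCode": "10001"},
    {"id": 2, "ZipCode": "10001"},
    {"id": 3, "ZipCode": "10001"},
    {"id": 4, "ZipCode": "94105"},
]
incoming = [{"id": "a", "ZipCode": "10001"}]
print(len(list(unduplicate_pairs(people, ["ZipCode"]))))            # 3
print(len(list(reference_pairs(incoming, people, ["ZipCode"]))))    # 3
```

Record 4 is never paired in either form, because no other record shares its block.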

Step # 3 - Select the Match Type (refer Fig 4)


The Match Wizard provides four match types for each match form:
Individual Deduplication: identifies duplicate record entries for a person residing at an address.
Individual Householding: identifies duplicate record entries for people residing at an address.
Business Deduplication: identifies duplicate record entries for a business at an address.
Business Householding: identifies duplicate record entries for businesses at an address.
The match type should be determined based on the business goal for matching. In this form too, let's continue with the default selection, 'Individual Deduplication'.


Step # 4 - Select the Match Threshold (refer Fig 5)

The Match Tolerance, or Match Threshold, is chosen based on whether we want to be certain about the matched records or want to consider all possible or potential matches. Based on the match threshold selected, predetermined match and clerical cutoff values are assigned for each match pass.
Lower the Match Threshold: this results in more matches with lower certainty, and more false positives (a false positive is a record categorized as a match that is actually a non-match).
Raise the Match Threshold: this results in fewer matches with higher certainty, and more false negatives (a false negative is a record categorized as a non-match that is actually a match).
More information on this can be found at http://www-01.ibm.com/support/knowledgecenter/SSZJPZ_8.5.0/com.ibm.swg.im.iis.qs.ug.doc/topics/c_Defining_cutoff_values.html?lang=en
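The match and clerical cutoffs can be pictured as two thresholds on a record pair's composite weight. In this minimal sketch, the cutoff values and pair weights are invented. Treating a weight exactly equal to a cutoff as belonging to the higher category is an assumption. The wizard assigns its own predetermined cutoffs for each pass.

```python
from collections import Counter

def classify(weight: float, match_cutoff: float, clerical_cutoff: float) -> str:
    # At or above the match cutoff: match. Between the two cutoffs:
    # clerical review. Below the clerical cutoff: non-match.
    if clerical_cutoff > match_cutoff:
        raise ValueError("clerical cutoff must not be higher than match cutoff")
    if weight >= match_cutoff:
        return "match"
    if weight >= clerical_cutoff:
        return "clerical"
    return "nonmatch"

def summarize(weights: list[float], match_cutoff: float, clerical_cutoff: float) -> Counter:
    # Count how many pairs land in each category for a given pair of cutoffs.
    return Counter(classify(w, match_cutoff, clerical_cutoff) for w in weights)

# Hypothetical composite weights for four record pairs.
pair_weights = [2.0, 7.5, 11.0, 14.2]
print(summarize(pair_weights, match_cutoff=12, clerical_cutoff=8))  # raised threshold
print(summarize(pair_weights, match_cutoff=8, clerical_cutoff=4))   # lowered threshold
```

With the lower cutoffs, the pair weighing 11.0 moves from clerical review to match. That is the "more matches, lower certainty" trade-off described above.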
Let's continue with the default selection.

Step # 5 - Select the additional column(s) (Optional Step) (refer Fig 6)

We can improve the match results by including more columns in the matching. For each match type, the Match Wizard provides a set of additional columns that we can include in the match to get better results. However, we can add these columns to the match specification only if the source data has been standardized with one or more of the QualityStage rule sets VDATE, VEMAIL, VPHONE and USTAXID. Under each column there is a requirements twisty that can be expanded to see the conditions that must be met to use that column in the matching. For each additional column selected, an individual match pass is created.
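This step has two rules. Each additional column needs data standardized with its rule set, and each selected column adds its own match pass. Both rules can be sketched as a small planning function. The column names, the column-to-rule-set mapping and the pass names are hypothetical and only for illustration.

```python
# Hypothetical mapping from an additional column to the rule set its
# requirement refers to; the wizard's real column names may differ.
REQUIRED_RULE_SET = {
    "DateOfBirth": "VDATE",
    "Email": "VEMAIL",
    "Phone": "VPHONE",
    "TaxID": "USTAXID",
}

def plan_passes(base_passes: list[str], selected: list[str], standardized_with: set[str]) -> list[str]:
    """Return the template's base passes plus one extra pass per selected column."""
    passes = list(base_passes)
    for column in selected:
        rule_set = REQUIRED_RULE_SET[column]
        # The column can only be used if its requirement is met.
        if rule_set not in standardized_with:
            raise ValueError(f"{column} needs data standardized with {rule_set}")
        passes.append(f"{column}Pass")
    return passes

# Hypothetical example: data standardized with US Name, US Address and VEMAIL.
rule_sets_applied = {"USNAME", "USADDR", "VEMAIL"}
print(plan_passes(["Pass1", "Pass2"], ["Email"], rule_sets_applied))  # ['Pass1', 'Pass2', 'EmailPass']
```

In this example, selecting Phone would raise an error, because VPHONE was not applied to the data.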


To keep our match specification simple, I am not selecting any of the additional columns here.

Step # 6 - Configure Test Environment (Optional Step)


To execute the match specification, the Match Designer needs the following information:
- where it can access the sample input data, and the reference data for a reference match
- the frequency distributions of the sample input and reference data
- the details of the database in which match results are stored when execution completes successfully

Providing these details is called configuring the test environment.
This step is optional. If we don't intend to complete it now, we can do it in the Match Designer before executing the match specification. We select the check box for each item we intend to provide (Fig 7).
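Which test environment items are needed depends on the match form. The sketch below models them as a hypothetical Python data class; the field names are invented and are not Match Designer settings. It lists the items that are still missing. If this optional step is skipped, these items must be supplied in the Match Designer before a test run.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TestEnvironment:
    """Hypothetical model of the test environment items (field names are illustrative)."""
    two_source: bool                                   # True for a reference match
    source_dataset: Optional[str] = None               # Step 6a
    source_frequency_dataset: Optional[str] = None     # Step 6b
    reference_dataset: Optional[str] = None            # Step 6a, reference match only
    reference_frequency_dataset: Optional[str] = None  # Step 6b, reference match only
    results_db_connection: Optional[str] = None        # Step 6c

    def missing_items(self) -> list[str]:
        """List the items still to be provided before a test run."""
        required = ["source_dataset", "source_frequency_dataset", "results_db_connection"]
        if self.two_source:
            required += ["reference_dataset", "reference_frequency_dataset"]
        return [name for name in required if getattr(self, name) is None]

# Single-source example where only the sample data set has been chosen so far.
env = TestEnvironment(two_source=False, source_dataset="sample_input.ds")
print(env.missing_items())  # ['source_frequency_dataset', 'results_db_connection']
```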


Step # 6a - Source data set (Optional Step) (refer Fig 8)


We need to provide the location of the data set that contains the sample input for the Match Designer. Here, since we are creating a single-source match specification, we see only one file selection dialog. For a two-source match (reference match), we would see an additional file selection dialog for the reference input data set.

Step # 6b - Frequency data set (Optional Step) (refer Fig 9)


We need to provide the location of the data set that contains the frequency distribution of the sample input for the Match Designer. Here too, since we are creating a single-source match specification, we see only one file selection dialog. For a two-source match (reference match), we would see an additional file selection dialog for the reference frequency data set.


Step # 6c - Database Connection (Optional Step) (refer Fig 10)


We need to provide the database connection details, or the data connection object, that the Match Designer and the QualityStage server will use to connect to the Match Designer results database.

Step # 7 - Summary (refer Fig 11)


That's it! We are almost done. The Summary form displays all the selections made in the Match Wizard. Each completed optional step has a check mark, and those not completed are greyed out. The Finish, Back and Cancel buttons are enabled. We can go back to any form and change any of the selections, and the changes will be reflected in the Summary form.

Step # 8 - Save the Match Specification in the Match Designer (refer Fig 12)
When we click the Finish button in the Summary form, the Match Designer is launched with the template-generated one-source deduplication match specification and its predetermined default match passes. Each match pass is composed of predetermined blocking columns, match commands and cutoff values, set to a lower or higher threshold according to the selection made in the wizard. The test environment is populated with the details entered in the Match Wizard. To open the Test Environment window, go to Configure Specification → Test Environment under the Compose tab. Save all the match passes and the match specification with the default names or with names of your choice, then test them and review the match results.
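The sketch below is a simplified, hypothetical model of one generated match pass, combining the blocking columns, match commands and cutoff values described above. Records are blocked on the blocking columns. Each pair is then scored with exact-comparison match commands using Fellegi-Sunter style log2 weights. Finally, the composite weight is classified against the match and clerical cutoffs. The column names, probabilities and cutoffs are made up. Real QualityStage match commands offer many comparison types beyond exact equality.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations

@dataclass
class MatchPass:
    """Hypothetical, simplified model of one match pass."""
    blocking_columns: list[str]
    commands: dict[str, tuple[float, float]]  # column -> (m, u), exact comparison only
    match_cutoff: float
    clerical_cutoff: float

def run_pass(p: MatchPass, records: list[dict]) -> list[tuple]:
    """Return (id_a, id_b, weight, status) for every pair compared in the pass."""
    blocks: dict[tuple, list[dict]] = defaultdict(list)
    for rec in records:
        blocks[tuple(rec[c] for c in p.blocking_columns)].append(rec)

    results = []
    for recs in blocks.values():
        for a, b in combinations(recs, 2):
            weight = 0.0
            for col, (m, u) in p.commands.items():
                if a[col] == b[col]:
                    weight += math.log2(m / u)              # agreement weight
                else:
                    weight += math.log2((1 - m) / (1 - u))  # disagreement weight
            if weight >= p.match_cutoff:
                status = "match"
            elif weight >= p.clerical_cutoff:
                status = "clerical"
            else:
                status = "nonmatch"
            results.append((a["id"], b["id"], round(weight, 2), status))
    return results

# Hypothetical pass and records.
pass1 = MatchPass(
    blocking_columns=["ZipCode"],
    commands={"LastName": (0.9, 0.01), "FirstName": (0.8, 0.1)},
    match_cutoff=8.0,
    clerical_cutoff=4.0,
)
records = [
    {"id": 1, "ZipCode": "10001", "LastName": "SMITH", "FirstName": "JOHN"},
    {"id": 2, "ZipCode": "10001", "LastName": "SMITH", "FirstName": "JON"},
    {"id": 3, "ZipCode": "94105", "LastName": "SMITH", "FirstName": "JOHN"},
]
print(run_pass(pass1, records))  # [(1, 2, 4.32, 'clerical')]
```

Record 3 is never compared, because it falls in a different block. This is why the choice of blocking columns matters as much as the cutoffs.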


Disclaimer: “The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or
opinions.”

Posted by Hema Sadagopan at 03:26
