
Siperian Hub

Administrator Guide
© 2008 Siperian, Inc.
Copyright 2008 Siperian Inc. [Unpublished - rights reserved under the Copyright Laws of the United
States]
Siperian and the Siperian logo are trademarks or registered trademarks of Siperian, Inc. in the US and
other countries. All other products or services mentioned are the trademarks or service marks of their
respective companies or organizations.
THIS DOCUMENTATION CONTAINS CONFIDENTIAL INFORMATION AND TRADE
SECRETS OF SIPERIAN, INC. USE, DISCLOSURE OR REPRODUCTION IS PROHIBITED
WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF SIPERIAN, INC.
Revised: February 20, 2009
Contents
Preface
Intended Audience .......................................................................................................................................xxv
Organization...................................................................................................................................................xxv
Learning About Siperian Hub ..................................................................................................................xxviii
Contacting Siperian ......................................................................................................................................xxxi

Part 1: Introduction
Chapter 1: Introduction
About Siperian Hub Administrators...............................................................................................................4
Phases in Siperian Hub Administration .........................................................................................................4
Startup Phase............................................................................................................................................4
Configuration Phase................................................................................................................................5
Production Phase.....................................................................................................................................5
Summary of Administration Tasks .................................................................................................................6
Setting Up Security ..................................................................................................................................6
Building the Data Model ........................................................................................................................7
Configuring the Data Flow ....................................................................................................................9
Executing Siperian Hub Processes .....................................................................................................13
Configuring Hierarchies .......................................................................................................................14
Configuring Workflow Integration.....................................................................................................14
Other Administration Tasks ................................................................................................................15

Chapter 2: Getting Started with the Hub Console


About the Hub Console .................................................................................................................................18
Starting the Hub Console...............................................................................................................................19

Navigating the Hub Console......................................................................................................................... 24
Toggling Between the Processes and Workbenches Views............................................................ 24
Starting a Tool in the Workbenches View ........................................................................................ 27
Acquiring Locks to Change Settings in the Hub Console .............................................................. 28
Changing the Target Database............................................................................................................ 31
Logging in as a Different User............................................................................................................ 32
Changing the Password for a User ..................................................................................................... 32
Using the Navigation Tree in the Navigation Pane ......................................................................... 33
Adding, Editing and Removing Objects Using Command Buttons............................................. 43
Customizing the Hub Console Interface........................................................................................... 45
Showing Version Details...................................................................................................................... 47
Siperian Hub Workbenches and Tools........................................................................................................ 48
Tools in the Configuration Workbench ............................................................................................ 48
Tools in the Model Workbench.......................................................................................................... 49
Tools in the Security Access Manager Workbench ......................................................................... 50
Tools in the Data Steward Workbench ............................................................................................. 50
Tools in the Utilities Workbench ....................................................................................................... 51

Part 2: Building the Data Model


Chapter 3: About the Hub Store
Databases in the Hub Store........................................................................................................................... 56
How Hub Store Databases Are Related ...................................................................................................... 57
Creating Hub Store Databases...................................................................................................................... 58
Version Requirements .................................................................................................................................... 58

Chapter 4: Configuring Operational Record Stores and Datasources


Before You Begin............................................................................................................................................ 60
About the Databases Tool............................................................................................................................. 60
Starting the Databases Tool .......................................................................................................................... 61
Configuring Operational Record Stores ...................................................................................................... 62



Registering an ORS ...............................................................................................................................62
Editing ORS Registration Properties..................................................................................................67
Editing ORS Properties ........................................................................................................................69
Testing ORS Connections....................................................................................................................71
Changing Passwords .............................................................................................................................72
Changing an ORS to Production Mode.............................................................................................75
Unregistering an ORS ...........................................................................................................................76
Configuring Datasources................................................................................................................................77
About Datasources................................................................................................................................77
Managing Datasources in WebLogic ..................................................................................................77
Creating Datasources ............................................................................................................................77
Removing Datasources .........................................................................................................................78

Chapter 5: Building the Schema


Before You Begin ............................................................................................................................................82
About the Schema ...........................................................................................................................................82
Types of Tables in an Operational Record Store .............................................................................83
Requirements for Defining Schema Objects.....................................................................................87
Starting the Schema Manager ........................................................................................................................90
Configuring Base Objects...............................................................................................................................92
About Base Objects...............................................................................................................................92
Relationships Between Base Objects and Other Tables in the Hub Store...................................93
Process Overview for Defining Base Objects...................................................................................94
Base Object Columns ...........................................................................................................................95
Cross-Reference Tables ........................................................................................................................97
History Tables ......................................................................................................................................100
Base Object Properties .......................................................................................................................101
Creating Base Objects.........................................................................................................................107
Editing Base Object Properties .........................................................................................................108
Configuring Custom Indexes for Base Objects ..............................................................................111
Viewing the Impact Analysis of a Base Object...............................................................................115
Deleting Base Objects.........................................................................................................................116
Configuring Dependent Objects.................................................................................................................117

About Dependent Objects ................................................................................................................ 117
How Dependent Objects Are Related to Base Objects and Cross-reference Tables .............. 118
Process Overview for Defining Dependent Objects .................................................................... 119
Dependent Object Columns ............................................................................................................. 120
Creating Dependent Objects............................................................................................................. 121
Editing Dependent Objects............................................................................................................... 123
Deleting Dependent Objects............................................................................................................. 125
Configuring Columns in Tables.................................................................................................................. 125
About Columns ................................................................................................................................... 126
Navigating to the Column Editor..................................................................................................... 131
Adding Columns ................................................................................................................................. 134
Importing Column Definitions From Another Table................................................................... 135
Editing Column Properties................................................................................................................ 137
Changing the Column Display Order .............................................................................................. 139
Deleting Columns ............................................................................................................................... 139
Configuring Foreign-Key Relationships Between Base Objects ........................................................... 140
About Foreign Key Relationships .................................................................................................... 140
Parent and Child Base Objects ......................................................................................................... 142
Process Overview for Defining Foreign-Key Relationships........................................................ 143
Adding Foreign-Key Relationships .................................................................................................. 143
Editing Foreign-Key Relationships .................................................................................................. 145
Configuring Lookups for Foreign-Key Relationships................................................................... 147
Deleting Foreign-Key Relationships ................................................................................................ 147
Viewing Your Schema .................................................................................................................................. 148
Starting the Schema Viewer............................................................................................................... 148
Zooming In and Out of the Schema Diagram ............................................................................... 150
Switching Views of the Schema Diagram ....................................................................................... 152
Navigating to Related Design Objects and Batch Jobs................................................................. 155
Configuring Schema Viewer Options .............................................................................................. 156
Saving the Schema Diagram as a JPG Image ................................................................................. 157
Printing the Schema Diagram ........................................................................................................... 158



Chapter 6: Configuring Queries and Packages
Before You Begin ..........................................................................................................................................161
About Queries and Packages .......................................................................................................................162
Configuring Queries......................................................................................................................................162
About Queries......................................................................................................................................162
Starting the Queries Tool ...................................................................................................................164
Configuring Query Groups................................................................................................................164
Configuring Queries............................................................................................................................166
Configuring Custom Queries.............................................................................................................190
Viewing the Results of Your Query..................................................................................................193
Viewing the Query Impact Analysis .................................................................................................194
Removing Queries ...............................................................................................................................195
Configuring Packages....................................................................................................................................196
About Packages....................................................................................................................................196
Starting the Packages Tool .................................................................................................................198
Adding Packages..................................................................................................................................199
Modifying Package Properties ...........................................................................................................201
Refreshing Packages After Changing Queries.................................................................................203
Specifying Join Queries.......................................................................................................................204
Removing Packages.............................................................................................................................204

Chapter 7: State Management


Before You Begin ..........................................................................................................................................206
About State Management in Siperian Hub................................................................................................206
About System States............................................................................................................................206
About the Hub State Indicator..........................................................................................................207
Protecting Pending Records Using the Interaction ID .................................................................208
State Transition Rules for State Management .................................................................................208
Hub States and Base Object Record Value Survivorship..............................................................211
Configuring State Management for Base Objects ....................................................................................211
Enabling State Management ..............................................................................................................211
Enabling the History of Cross-Reference Promotion ...................................................................213

Enabling Match on Pending Records .............................................................................................. 214
Enabling Message Queue Triggers for State Changes................................................................... 215
Modifying the State of Records .................................................................................................................. 216
Promoting Records in the Data Steward Tools ............................................................................. 216
Promoting Records Using the Promote Batch Job ....................................................................... 218
Rules for Loading Data ................................................................................................................................ 221

Chapter 8: Configuring Hierarchies


About Configuring Hierarchies .................................................................................................................. 224
Before You Begin................................................................................................................................ 224
Overview of Configuration Steps..................................................................................................... 225
Preparing Your Data for Hierarchy Manager ................................................................................. 225
Use Case Example of How to Prepare Data for HM ................................................................... 227
Starting the Hierarchies Tool ...................................................................................................................... 234
Creating the HM Repository Base Objects..................................................................................... 235
Uploading Default Entity Icons........................................................................................................ 237
Upgrading From Previous Versions of Hierarchy Manager......................................................... 237
Configuring Entity Icons ................................................................................................................... 238
Configuring Entity Objects and Entity Types................................................................................ 240
Configuring Hierarchies............................................................................................................................... 253
About Hierarchies ............................................................................................................................... 253
Adding Hierarchies ............................................................................................................................. 253
Editing Hierarchies ............................................................................................................................. 254
Deleting Hierarchies ........................................................................................................................... 254
Configuring Relationship Base Objects and Relationship Types .......................................................... 255
About Relationships, Relationship Objects, and Relationship Types......................................... 255
Configuring Relationship Base Objects........................................................................................... 256
Configuring Relationship Types ....................................................................................................... 265
Configuring Packages for Use by HM ....................................................................................................... 269
About Packages ................................................................................................................................... 269
Creating Packages................................................................................................................................ 270
After You Create a Package............................................................................................................... 275
Assigning Packages to Entity or Relationship Types .................................................................... 275



Configuring Profiles ......................................................................................................................................278
About Profiles ......................................................................................................................................278
Adding Profiles ....................................................................................................................................278
Editing Profiles ....................................................................................................................................280
Validating Profiles ...............................................................................................................................280
Copying Profiles...................................................................................................................................282
Deleting Profiles ..................................................................................................................................283
Deleting Relationship Types from a Profile ....................................................................................284
Deleting Entity Types from a Profile ...............................................................................................284
Assigning Packages to Entity and Relationship Types ..................................................................284
Sandboxes .......................................................................................................................................................284

Part 3: Configuring the Data Flow


Chapter 9: Siperian Hub Processes
Before You Begin ..........................................................................................................................................287
About Siperian Hub Processes....................................................................................................................288
Overall Data Flow for Batch Processes ...........................................................................................288
Consolidation Status for Base Object Records...............................................................................289
Survivorship and Order of Precedence............................................................................................291
Land Process ..................................................................................................................................................292
About the Land Process .....................................................................................................................292
Managing the Land Process ...............................................................................................................294
Stage Process ..................................................................................................................................................295
About the Stage Process.....................................................................................................................295
Managing the Stage Process...............................................................................................................298
Load Process ..................................................................................................................................................299
About the Load Process .....................................................................................................................299
Data Flow for the Load Process .......................................................................................................300
Tables Associated with the Load Process........................................................................................301
Initial Data Loads and Incremental Loads .....................................................................................302

Trust Settings and Validation Rules ................................................................................................. 303
Run-time Execution Flow of the Load Process............................................................................. 304
Other Considerations for the Load Process ................................................................................... 315
Managing the Load Process............................................................................................................... 316
Match Process................................................................................................................................................ 317
About the Match Process .................................................................................................................. 317
Match Data Flow................................................................................................................................. 319
Key Concepts for the Match Process .............................................................................................. 320
Run-Time Execution Flow of the Match Process ......................................................................... 329
Managing the Match Process............................................................................................................. 333
Consolidate Process...................................................................................................................................... 335
About the Consolidate Process......................................................................................................... 335
Consolidation Options ....................................................................................................................... 339
Consolidation and Workflow Integration ....................................................................................... 340
Managing the Consolidate Process................................................................................................... 341
Publish Process.............................................................................................................................................. 342
About the Publish Process ................................................................................................................ 342
Run-time Flow of the Publish Process ............................................................................................ 345
Managing the Publish Process........................................................................................................... 346

Chapter 10: Configuring the Land Process


Before You Begin.......................................................................................................................................... 348
Configuration Tasks for the Land Process ............................................................................................... 348
Configuring Source Systems........................................................................................................................ 348
About Source Systems........................................................................................................................ 348
Starting the Systems and Trust Tool................................................................................................ 350
Source System Properties................................................................................................................... 351
Adding Source Systems ...................................................................................................................... 352
Editing Source System Properties .................................................................................................... 353
Removing Source Systems................................................................................................................. 354
Configuring Landing Tables........................................................................................................................ 355
About Landing Tables........................................................................................................................ 355
Landing Table Columns..................................................................................................................... 356
Landing Table Properties ...................................................................................................................357
Adding Landing Tables.......................................................................................................................358
Editing Landing Table Properties .....................................................................................................360
Removing Landing Tables..................................................................................................................361

Chapter 11: Configuring the Stage Process


Before You Begin ..........................................................................................................................................364
Configuration Tasks for the Stage Process................................................................................................364
Configuring Staging Tables ..........................................................................................................................364
About Staging Tables ..........................................................................................................................364
Staging Table Columns .......................................................................................................................365
Staging Table Properties.....................................................................................................................367
Adding Staging Tables ........................................................................................................................371
Changing Properties in Staging Tables.............................................................................................374
Jumping to the Source System for a Staging Table ........................................................................376
Configuring Lookups For Foreign Key Columns ..........................................................................376
Removing Staging Tables ...................................................................................................................380
Mapping Columns Between Landing and Staging Tables.......................................................................380
About Mapping Columns...................................................................................................................380
Starting the Mappings Tool................................................................................................................384
Mapping Properties .............................................................................................................................386
Adding Mappings ................................................................................................................................386
Copying Mappings...............................................................................................................................387
Editing Mapping Properties...............................................................................................................388
Mapping Columns Between Landing and Staging Table Columns .............................................389
Configuring Query Parameters for Mappings.................................................................................392
Loading by RowID..............................................................................................................................394
Jumping to a Schema ..........................................................................................................................395
Testing Mappings ................................................................................................................................397
Removing Mappings ...........................................................................................................................398
Using Audit Trail and Delta Detection ......................................................................................................398
Configuring the Audit Trail for a Staging Table .............................................................................399
Configuring Delta Detection for a Staging Table...........................................................................401

Chapter 12: Configuring Data Cleansing
Before You Begin.......................................................................................................................................... 406
About Data Cleansing in Siperian Hub ..................................................................................................... 406
Setup Tasks for Data Cleansing........................................................................................................ 406
Configuring Cleanse Match Servers ........................................................................................................... 407
About the Cleanse Match Server ...................................................................................................... 407
Starting the Cleanse Match Server Tool .......................................................................................... 409
Cleanse Match Server Properties ...................................................................................................... 410
Adding a New Cleanse Match Server .............................................................................................. 411
Editing Cleanse Match Server Properties........................................................................................ 412
Deleting a Cleanse Match Server ...................................................................................................... 413
Testing the Cleanse Match Server Configuration .......................................................................... 413
Using Cleanse Functions.............................................................................................................................. 414
About Cleanse Functions................................................................................................................... 414
Starting the Cleanse Functions Tool................................................................................................ 415
Overview of Configuring Cleanse Functions ................................................................................. 417
Configuring Cleanse Libraries........................................................................................................... 418
Configuring Regular Expression Functions.................................................................................... 422
Configuring Graph Functions........................................................................................................... 424
Testing Functions................................................................................................................................ 437
Using Conditions in Cleanse Functions .......................................................................................... 438
Configuring Cleanse Lists ............................................................................................................................ 440
About Cleanse Lists ............................................................................................................................ 440
Adding Cleanse Lists .......................................................................................................................... 441
Editing Cleanse List Properties......................................................................................................... 442

Chapter 13: Configuring the Load Process


Before You Begin.......................................................................................................................................... 454
Configuration Tasks for Loading Data...................................................................................................... 454
Configuring Trust for Source Systems....................................................................................................... 455
About Trust.......................................................................................................................................... 455
Trust Properties................................................................................................................................... 459
Considerations for Setting Trust Values ..........................................................................................460
Enabling Trust for a Column ............................................................................................................461
Assigning Trust Levels to Trust-Enabled Columns.......................................................................462
Configuring Validation Rules.......................................................................................................................468
About Validation Rules.......................................................................................................................468
Enabling Validation Rules for a Column .........................................................................................470
Navigating to the Validation Rules Node ........................................................................................471
Validation Rule Properties .................................................................................................................473
Adding Validation Rules.....................................................................................................................478
Editing Validation Rule Properties ...................................................................................................480
Changing the Sequence of Validation Rules....................................................................................481
Removing Validation Rules................................................................................................................482

Chapter 14: Configuring the Match Process


Before You Begin ..........................................................................................................................................484
Configuration Tasks for the Match Process..............................................................................................484
Understanding Your Data ..................................................................................................................484
Base Object Properties Associated with the Match Process ........................................................484
Configuration Steps for Defining Match Rules ..............................................................................485
Configuring Base Objects with International Data ........................................................................485
Navigating to the Match/Merge Setup Details Dialog............................................................................486
Configuring Match Properties for a Base Object .....................................................................................488
Setting Match Properties ....................................................................................................................488
Match Properties..................................................................................................................................490
Supporting Long ROWID_OBJECT Values .................................................................................497
Configuring Match Paths for Related Records .........................................................................................497
About Match Paths..............................................................................................................................497
Navigating to the Paths Tab ..............................................................................................................505
Configuring Path Components .........................................................................................................507
Configuring Filters for Match Paths.................................................................................................511
Configuring Match Columns .......................................................................................................................515
About Match Columns .......................................................................................................................515
Configuring Match Columns for Fuzzy-match Base Objects ......................................................519
Configuring Match Columns for Exact-match Base Objects ...................................................... 527
Configuring Match Rule Sets ...................................................................................................................... 531
About Match Rule Sets ...................................................................................................................... 531
Match Rule Set Properties ................................................................................................................. 534
Navigating to the Match Rule Set Tab............................................................................................. 537
Adding Match Rule Sets..................................................................................................................... 538
Editing Match Rule Set Properties ................................................................................................... 539
Renaming Match Rule Sets................................................................................................................ 541
Deleting Match Rule Sets................................................................................................................... 542
Configuring Match Column Rules for Match Rule Sets ......................................................................... 542
About Match Column Rules.............................................................................................................. 542
Match Rule Properties for Fuzzy-match Base Objects Only ....................................................... 544
Match Column Properties for Match Rules.................................................................................... 559
Requirements for Exact-match Columns in Match Column Rules............................................. 563
Command Buttons for Configuring Column Match Rules .......................................................... 564
Adding Match Column Rules............................................................................................................ 565
Editing Match Column Rules............................................................................................................ 570
Deleting Match Column Rules.......................................................................................................... 572
Changing the Execution Sequence of Match Column Rules ....................................................... 573
Specifying Consolidation Options for Match Column Rules....................................................... 574
Configuring the Match Weight of a Column.................................................................................. 575
Configuring Segment Matching for a Column ............................................................................... 576
Configuring Primary Key Match Rules...................................................................................................... 578
About Primary Key Match Rules...................................................................................................... 578
Adding Primary Key Match Rules.................................................................................................... 578
Editing Primary Key Match Rules.................................................................................................... 581
Deleting Primary Key Match Rules.................................................................................................. 582
Investigating the Distribution of Match Keys .......................................................................................... 583
About Match Keys Distribution ....................................................................................................... 583
Navigating to the Match Keys Distribution Tab ........................................................................... 584
Components of the Match Keys Distribution Tab........................................................................ 585
Filtering Match Keys .......................................................................................................................... 587
Excluding Records from the Match Process ............................................................................................ 590

Chapter 15: Configuring the Consolidate Process
Before You Begin ..........................................................................................................................................594
About Consolidation Settings......................................................................................................................594
Immutable Rowid Object...................................................................................................................594
Distinct Systems...................................................................................................................................595
Unmerge Child When Parent Unmerges (Cascade Unmerge) .....................................................597
Changing Consolidation Settings ................................................................................................................598

Chapter 16: Configuring the Publish Process


Before You Begin ..........................................................................................................................................602
Configuration Steps for the Publish Process.............................................................................................602
Starting the Message Queues Tool .............................................................................................................603
Configuring Global Message Queue Settings............................................................................................604
Configuring Message Queue Servers..........................................................................................................605
About Message Queue Servers..........................................................................................................605
Message Queue Server Properties.....................................................................................................605
Adding Message Queue Servers ........................................................................................................606
Editing Message Queue Server Properties.......................................................................................607
Deleting Message Queue Servers ......................................................................................................608
Configuring Outbound Message Queues ..................................................................................................608
About Message Queues ......................................................................................................................608
Message Queue Properties .................................................................................................................608
Adding Message Queues to a Message Queue Server ...................................................................609
Editing Message Queue Properties...................................................................................................611
Deleting Message Queues ..................................................................................................................611
Configuring Message Triggers .....................................................................................................................612
About Message Triggers .....................................................................................................................612
Adding Message Triggers ...................................................................................................................615
Editing Message Triggers ...................................................................................................................621
Deleting Message Triggers .................................................................................................................622
JMS Message XML Reference .....................................................................................................................622
Generating ORS-specific XML Message Schemas.........................................................................622
Elements in an XML Message .......................................................................................................... 623
Filtering Messages ............................................................................................................................... 625
Example XML Messages ................................................................................................................... 625
Legacy JMS Message XML Reference ....................................................................................................... 644
Message Fields for Legacy XML....................................................................................................... 644
Filtering Messages for Legacy XML................................................................................................. 645
Example Messages for Legacy XML................................................................................................ 646

Part 4: Executing Siperian Hub Processes


Chapter 17: Using Batch Jobs
Before You Begin.......................................................................................................................................... 668
About Siperian Hub Batch Jobs ................................................................................................................. 668
Ways to Execute Batch Jobs ............................................................................................................. 668
Support Tables Used By Batch Jobs ................................................................................................ 669
Running Batch Jobs in Sequence...................................................................................................... 670
Best Practices for Working With Batch Jobs.................................................................................. 671
Batch Job Creation.............................................................................................................................. 672
Information-Only Batch Jobs (Not Run in the Hub Console).................................................... 673
Other Batch Jobs................................................................................................................................. 673
Running Batch Jobs Using the Batch Viewer Tool ................................................................................. 674
Batch Viewer Tool .............................................................................................................................. 674
Starting the Batch Viewer Tool......................................................................................................... 674
Grouping by Table, Data, or Procedure Type................................................................................ 675
Running Batch Jobs Manually........................................................................................................... 677
Viewing Job Execution Logs............................................................................................................. 682
Clearing the Job Execution History ................................................................................................. 687
Running Batch Jobs Using the Batch Group Tool.................................................................................. 688
About Batch Groups .......................................................................................................................... 688
Starting the Batch Group Tool ......................................................................................................... 690
Configuring Batch Groups ................................................................................................................ 691
Refreshing the Batch Groups List ....................................................................................................701
Executing Batch Groups Using the Batch Group Tool................................................................701
Filtering Execution Logs By Status...................................................................................................711
Deleting Batch Groups.......................................................................................................................712
Batch Jobs Reference ....................................................................................................................................713
Alphabetical List of Batch Jobs.........................................................................................................713
Accept Non-Matched Records As Unique .....................................................................................715
Autolink Jobs........................................................................................................................................715
Auto Match and Merge Jobs..............................................................................................................716
Automerge Jobs ...................................................................................................................................717
BVT Snapshot Jobs.............................................................................................................................719
External Match Jobs............................................................................................................................719
Generate Match Tokens Jobs ............................................................................................................725
Hub Delete Jobs ..................................................................................................................................726
Key Match Jobs....................................................................................................................................727
Load Jobs ..............................................................................................................................................727
Manual Link Jobs.................................................................................................................................732
Manual Merge Jobs..............................................................................................................................732
Manual Unlink Jobs.............................................................................................................................733
Manual Unmerge Jobs ........................................................................................................................733
Match Jobs............................................................................................................................................734
Match Analyze Jobs.............................................................................................................................738
Match for Duplicate Data Jobs .........................................................................................................740
Migrate Link Style To Merge Style Jobs...........................................................................................740
Multi Merge Jobs .................................................................................................................................741
Promote Jobs........................................................................................................................................741
Recalculate BO Jobs............................................................................................................................743
Recalculate BVT Jobs .........................................................................................................................744
Reset Links Jobs...................................................................................................................................744
Reset Match Table Jobs ......................................................................................................................744
Revalidate Jobs.....................................................................................................................................745
Stage Jobs..............................................................................................................................................745
Synchronize Jobs .................................................................................................................................747

Chapter 18: Writing Custom Scripts to Execute Batch Jobs
About Executing Siperian Hub Batch Jobs .............................................................................................. 750
Setting Up Job Execution Scripts............................................................................................................... 750
About Job Execution Scripts............................................................................................................. 750
About the C_REPOS_TABLE_OBJECT_V View ...................................................................... 751
Determining Available Execution Scripts ....................................................................................... 754
Retrieving Values from C_REPOS_TABLE_OBJECT_V at Execution Time ....................... 755
Running Scripts Asynchronously...................................................................................................... 755
Monitoring Job Results and Statistics ........................................................................................................ 755
Error Messages and Return Codes................................................................................................... 755
Job Execution Status .......................................................................................................................... 756
Stored Procedure Reference........................................................................................................................ 758
Alphabetical List of Batch Jobs ........................................................................................................ 758
Accept Non-matched Records As Unique .................................................................................... 760
Autolink Jobs ....................................................................................................................................... 762
Auto Match and Merge Jobs ............................................................................................................. 762
Automerge Jobs................................................................................................................................... 764
BVT Snapshot Jobs ............................................................................................................................ 765
Execute Batch Group Jobs................................................................................................................ 765
External Match Jobs ........................................................................................................................... 766
Generate Match Token Jobs ............................................................................................................. 767
Get Batch Group Status Jobs............................................................................................................ 769
Hub Delete Jobs.................................................................................................................................. 769
Key Match Jobs ................................................................................................................................... 773
Load Jobs.............................................................................................................................................. 775
Manual Link Jobs ................................................................................................................................ 777
Manual Merge Jobs ............................................................................................................................. 777
Manual Unlink Jobs ............................................................................................................................ 779
Manual Unmerge Jobs........................................................................................................................ 779
Match Jobs ........................................................................................................................................... 783
Match Analyze Jobs ............................................................................................................................ 785
Match for Duplicate Data Jobs......................................................................................................... 786
Multi Merge Jobs................................................................................................................................. 788

Promote Jobs........................................................................................................................................790
Recalculate BO Jobs............................................................................................................................791
Recalculate BVT Jobs .........................................................................................................................792
Reset Batch Group Status Jobs .........................................................................................................792
Reset Links Jobs...................................................................................................................................792
Reset Match Table Jobs ......................................................................................................................793
Revalidate Jobs.....................................................................................................................................794
Stage Jobs..............................................................................................................................................795
Synchronize Jobs .................................................................................................................................796
Executing Batch Groups Using Stored Procedures.................................................................................798
About Executing Batch Groups........................................................................................................798
Stored Procedures for Batch Groups ...............................................................................................799
Developing Custom Stored Procedures for Batch Jobs..........................................................................806
About Custom Stored Procedures ....................................................................................................806
Required Execution Parameters for Custom Batch Jobs ..............................................................807
Registering a Custom Stored Procedure ..........................................................................................808
Registering a Custom Index...............................................................................................................809
Removing Data from a Base Object and Supporting Metadata Tables ......................................810
Writing Messages to Siperian Hub Database Debug Log .............................................................810
Example Custom Stored Procedure .................................................................................................811

Part 5: Configuring Application Access


Chapter 19: Generating ORS-specific APIs and Message Schemas
Before You Begin ..........................................................................................................................................818
Generating ORS-specific APIs....................................................................................................................818
About ORS-specific Schemas............................................................................................................818
About the SIF Manager Tool ............................................................................................................818
Starting the SIF Manager Tool ..........................................................................................................819
Generating and Deploying ORS-specific SIF APIs .......................................................................820
Generating ORS-specific Message Schemas .............................................................................................823

About the JMS Event Schema Manager Tool ................................................................................ 824
Starting the JMS Event Schema Manager Tool.............................................................................. 825
Generating and Deploying ORS-specific Schemas........................................................................ 827

Chapter 20: Setting Up Security


About Setting Up Security ........................................................................................................................... 832
Siperian Hub Security Concepts ....................................................................................................... 832
How Users, Roles, Privileges, and Resources Are Related........................................................... 835
Security Implementation Scenarios .................................................................................................. 836
Summary of Security Configuration Tasks...................................................................................... 838
Configuration Tasks For Security Scenarios................................................................................... 839
Securing Siperian Hub Resources............................................................................................................... 841
About Siperian Hub Resources......................................................................................................... 841
About the Secure Resources Tool .................................................................................................... 845
Starting the Secure Resources Tool.................................................................................................. 845
Configuring Resources ....................................................................................................................... 846
Configuring Resource Groups .......................................................................................................... 849
Refreshing the Resources List........................................................................................................... 853
Refreshing Other Security Changes ................................................................................................. 854
Configuring Roles ......................................................................................................................................... 854
About Roles ......................................................................................................................................... 854
Starting the Roles Tool....................................................................................................................... 855
Adding Roles........................................................................................................................................ 857
Editing Roles........................................................................................................................................ 858
Deleting Roles...................................................................................................................................... 858
Mapping Internal Roles to External Roles...................................................................................... 859
Assigning Resource Privileges to Roles ........................................................................................... 859
Assigning Roles to Other Roles........................................................................................................ 862
Generating a Report of Resource Privileges for Roles.................................................................. 863
Configuring Siperian Hub Users................................................................................................................. 866
Before You Begin................................................................................................................................ 866
About Configuring Siperian Hub Users .......................................................................................... 867
Starting the Users Tool ...................................................................................................................... 868

Configuring Users................................................................................................................................869
Configuring User Access to ORS Databases ..................................................................................875
Configuring Password Policies ..........................................................................................................877
Configuring Secured JDBC Data Sources .......................................................................................880
Configuring User Groups.............................................................................................................................881
About User Groups.............................................................................................................................881
Starting the Users and Groups Tool.................................................................................................882
Adding User Groups...........................................................................................................................883
Editing User Groups...........................................................................................................................883
Deleting User Groups.........................................................................................................................884
Assigning Users and Users Groups to User Groups .....................................................................885
Assigning Users to the Current ORS Database ........................................................................................886
Assigning Roles to Users and User Groups ..............................................................................................887
Assigning Users and User Groups to Roles ....................................................................................887
Assigning Roles to Users and User Groups ....................................................................................888
Managing Security Providers .......................................................................................................................889
About Security Providers ...................................................................................................................889
Starting the Security Providers Tool.................................................................................................890
Managing Provider Files .....................................................................................................................892
Managing Security Provider Settings ................................................................................................896

Chapter 21: Viewing Registered Custom Code


About User Objects ......................................................................................................................................910
About the User Object Registry Tool ........................................................................................................910
Starting the User Object Registry Tool ......................................................................................................911
Viewing User Exits........................................................................................................................................912
About User Exits .................................................................................................................................912
Viewing User Exits..............................................................................................................................913
Viewing Custom Stored Procedures...........................................................................................................913
About Custom Stored Procedures ....................................................................................................913
How Custom Stored Procedures Are Registered ...........................................................................914
Viewing Registered Custom Stored Procedures .............................................................................914
Viewing Custom Java Cleanse Functions ..................................................................................................915

About Custom Java Cleanse Functions ........................................................................................... 915
How Custom Java Cleanse Functions Are Registered .................................................................. 915
Viewing Registered Custom Java Cleanse Functions .................................................................... 915
Viewing Custom Button Functions............................................................................................................ 916
About Custom Button Functions..................................................................................................... 916
How Custom Button Functions Are Registered ............................................................................ 917
Viewing Registered Custom Button Functions.............................................................................. 917

Chapter 22: Auditing Siperian Hub Services and Events


About Integration Auditing......................................................................................................................... 920
Auditable Events ................................................................................................................................. 920
Audit Manager Tool ........................................................................................................................... 921
Capturing XML for Requests and Responses ................................................................................ 921
Auditing Must Be Explicitly Enabled .............................................................................................. 921
Auditing Occurs After Authentication ............................................................................................ 921
Auditing Occurs for Invocations With Valid, Well-formed XML .............................................. 922
Auditing Password Changes.............................................................................................................. 922
Starting the Audit Manager.......................................................................................................................... 922
Auditable API Requests and Message Queues............................................................................... 923
Systems to Audit ................................................................................................................................. 923
Audit Properties .................................................................................................................................. 924
Auditing SIF API Requests ......................................................................................................................... 926
Auditing Message Queues............................................................................................................................ 928
Auditing Errors ............................................................................................................................................. 929
Configuring Global Error Auditing ................................................................................................. 929
Using the Audit Log ..................................................................................................................................... 930
About the Audit Log .......................................................................................................................... 930
Audit Log Table .................................................................................................................................. 931
Viewing the Audit Log ....................................................................................................................... 933
Sample Audit Log Entries.................................................................................................................. 934
Periodically Purging the Audit Log .................................................................................................. 935



Part 6: Appendixes
Appendix A: Configuring International Data Support
Configuring Unicode in Siperian Hub........................................................................................................940
Creating and Configuring the Database ...........................................................................................940
Configuring Match Settings for Non-US Populations...................................................................941
Cleanse Settings for Unicode.............................................................................................................945
Data in Landing Tables.......................................................................................................................945
Hub Console ........................................................................................................................................945
Locale Recommendations for UNIX When Using UTF8............................................................945
Configuring the ANSI Code Page (Windows Only)................................................................................946
Determining the ANSI Code Page ...................................................................................................946
Changing the ANSI Code Page.........................................................................................................947
Configuring NLS_LANG ............................................................................................................................947
Syntax for NLS_LANG .....................................................................................................................947
Configuring NLS_LANG in the Windows Registry......................................................................948
Configuring NLS_LANG as an Environment Variable................................................................948

Appendix B: Backing Up and Restoring Siperian Hub


Backing Up Siperian Hub.............................................................................................................................952
Backup and Recovery Strategies for Siperian Hub...................................................................................952
Backup and Recovery With Non-Logging Operations..................................................................953
Backup and Recovery Without Non-Logging Operations............................................................953

Appendix C: Configuring User Exits


About User Exits ...........................................................................................................................................956
Types of User Exits.......................................................................................................................................957
User Exits for the Stage Process .......................................................................................................957
User Exits for the Load Process .......................................................................................................961
User Exits for the Match Process .....................................................................................................962
User Exits for the Merge Process .....................................................................................................963

User Exits for the Unmerge Process................................................................................................ 964
Additional User Exits ......................................................................................................................... 965

Appendix D: Viewing Configuration Details


About the Enterprise Manager ................................................................................................................... 968
Starting the Enterprise Manager ................................................................................................................. 968
Enterprise Manager Properties ................................................................................................................... 969
Choosing Properties to View ............................................................................................................ 969
Hub Server Properties........................................................................................................................ 970
Cleanse Server Properties .................................................................................................................. 972
Master Database Properties............................................................................................................... 974
ORS Database Properties .................................................................................................................. 975
Environment Report .......................................................................................................................... 976

Appendix E: Implementing Custom Buttons in Hub Console Tools


About Custom Buttons in the Hub Console............................................................................................ 978
What Happens When a User Clicks a Custom Button ................................................................. 979
How Custom Buttons Appear in the Hub Console ...................................................................... 980
Adding Custom Buttons .............................................................................................................................. 981
Writing a Custom Function............................................................................................................... 981
Controlling the Custom Button Appearance.................................................................................. 985
Deploying Custom Buttons............................................................................................................... 986

Appendix F: Configuring Access to Hub Console Tools


About User Access to Hub Console Tools............................................................................................... 989
Starting the Tool Access Tool..................................................................................................................... 990
Granting User Access to Tools and Processes......................................................................................... 991
Revoking User Access to Tools and Processes ........................................................................................ 992

Glossary ..............................................................................................................................................................993

Index.....................................................................................................................................................................1041



Preface

Welcome to the Siperian Hub™ Administrator Guide. This guide explains how to
administer, manage, and configure Siperian Hub.

Intended Audience
This guide is intended for Siperian Hub administrators. These are the IT people
responsible for configuring or updating a Hub Store so that it provides the rules and
functionality required by the data stewards. Administrators should have an excellent
knowledge of database administration.

Organization
This guide contains the following chapters:

Part 1, “Introduction”
    Provides an overview of Siperian Hub administration and explains how to
    navigate the Hub Console.
Chapter 1, “Introduction”
    Introduces Siperian Hub administration phases, tools, and tasks.
Chapter 2, “Getting Started with the Hub Console”
    Introduces tools in the Hub Console and provides general navigation
    instructions.
Part 2, “Building the Data Model”
    Describes how to construct the schema (data model) used in your Siperian
    Hub implementation and stored in the Hub Store. It provides instructions
    on using Hub Console tools to configure Operational Record Stores (ORSs),
    datasources, the data model, queries, packages, hierarchies, and other
    objects.


Chapter 3, “About the Hub Store”
    Describes the key components of the Hub Store: the Master Database and
    Operational Record Stores (ORS).
Chapter 4, “Configuring Operational Record Stores and Datasources”
    Explains how to configure Operational Record Stores (ORS) and datasources.
Chapter 5, “Building the Schema”
    Describes the Hub Store schema and provides instructions on building the
    schema for your Siperian Hub implementation.
Chapter 6, “Configuring Queries and Packages”
    Explains how to use and create Siperian Hub queries and packages.
Chapter 7, “State Management”
    Describes state management concepts and provides instructions for
    configuring state management in your Siperian Hub implementation.
Chapter 8, “Configuring Hierarchies”
    Explains how to configure Siperian Hierarchy Manager (HM) and describes
    how to create and configure relationships based on foreign keys.
Part 3, “Configuring the Data Flow”
    Describes the flow of data through the Siperian Hub via a series of
    processes (land, stage, load, match, consolidate, and distribute), and
    provides instructions for configuring each process using tools in the
    Hub Console.
Chapter 9, “Siperian Hub Processes”
    Describes the flow of data through the Siperian Hub via batch processes,
    starting with the land process and concluding with the distribution
    process.
Chapter 10, “Configuring the Land Process”
    Describes the data landing process and explains how to configure source
    systems and landing tables.
Chapter 11, “Configuring the Stage Process”
    Describes the data staging process and explains how to configure staging
    tables, mappings, and other settings that affect Stage jobs.
Chapter 12, “Configuring Data Cleansing”
    Explains how to configure data cleansing rules that are run during Stage
    jobs.
Chapter 13, “Configuring the Load Process”
    Explains how to use the load process, and how to define trust and
    validation rules.
Chapter 14, “Configuring the Match Process”
    Explains how to configure your Hub Store to match data.
Chapter 15, “Configuring the Consolidate Process”
    Explains how to configure your Hub Store to consolidate data.


Chapter 16, “Configuring the Publish Process”
    Explains how to configure Siperian Hub to write changes to a message
    queue.
Part 4, “Executing Siperian Hub Processes”
    Describes how to use Hub Console tools to run Siperian Hub processes via
    batch jobs, and how to use third-party job management tools to schedule
    and manage Siperian Hub processes via stored procedures.
Chapter 17, “Using Batch Jobs”
    Explains how to use the Siperian Hub batch jobs and batch groups.
Chapter 18, “Writing Custom Scripts to Execute Batch Jobs”
    Explains how to schedule Siperian Hub batch jobs using job execution
    scripts.
Part 5, “Configuring Application Access”
    Describes how to use Hub Console tools to configure Siperian Hub client
    applications that access Siperian Hub using Services Integration
    Framework (SIF) requests.
Chapter 19, “Generating ORS-specific APIs and Message Schemas”
    Describes how to generate ORS-specific SIF APIs using the SIF Manager
    tool in the Hub Console.
Chapter 20, “Setting Up Security”
    Explains how to set up security for users who will access Siperian Hub
    resources via the Hub Console or third-party applications.
Chapter 21, “Viewing Registered Custom Code”
    Explains how to register custom code using the User Object Registry tool
    in the Hub Console.
Chapter 22, “Auditing Siperian Hub Services and Events”
    Describes how to set up auditing and debugging in the Hub Console.
Part 6, “Appendixes”
    Describes other administration-related topics.
Appendix A, “Configuring International Data Support”
    Describes how to configure different character sets for
    internationalization purposes.
Appendix B, “Backing Up and Restoring Siperian Hub”
    Explains how to back up and restore a Siperian Hub implementation.
Appendix C, “Configuring User Exits”
    Explains how to configure user exits, which are user-customized,
    unencrypted stored procedures that are configured to execute at a
    specific point during batch job execution.
Appendix D, “Viewing Configuration Details”
    Explains how to view details of your Siperian Hub implementation using
    the Enterprise Manager tool in the Hub Console.

Appendix E, “Implementing Custom Buttons in Hub Console Tools”
    Explains how to add custom buttons to tools in the Hub Console that allow users to
    invoke external services on demand.
Appendix F, “Configuring Access to Hub Console Tools”
    Describes how to grant or revoke user access to tools in the Hub Console using the
    Tool Access tool.
Glossary
    Defines Siperian Hub terminology.

Learning About Siperian Hub


What’s New in Siperian Hub

What’s New in Siperian Hub describes the new features in this Siperian Hub release.

Siperian Hub Release Notes

The Siperian Hub Release Notes contain important information about this Siperian Hub
release. Installers should read the Siperian Hub Release Notes before installing Siperian
Hub.

Siperian Hub Overview

The Siperian Hub Overview introduces Siperian Hub, describes the product architecture,
and explains core concepts that all users need to understand before using the product.

Siperian Hub Installation Guide

The Siperian Hub Installation Guide explains to installers how to set up Siperian Hub,
the Hub Store, Cleanse Match Servers, and other components. There is a Siperian Hub
Installation Guide for each supported platform.

Siperian Hub Cleanse Adapter Guide

The Siperian Hub Cleanse Adapter Guide explains to installers how to configure Siperian
Hub to use the supported adapters and cleanse engines.

Siperian Hub Data Steward Guide

The Siperian Hub Data Steward Guide explains to data stewards how to use Siperian Hub
tools to consolidate and manage their organization's data. After reading the Siperian
Hub Overview, data stewards should read the Siperian Hub Data Steward Guide.

Siperian Hub Administrator Guide

The Siperian Hub Administrator Guide explains to administrators how to use Siperian
Hub tools to build their organization’s data model, configure and execute Siperian Hub
data management processes, set up security, provide for external application access to
Siperian Hub functionality and resources, and other customization tasks. After reading
the Siperian Hub Overview, administrators should read the Siperian Hub Administrator
Guide.

Siperian Services Integration Framework Guide

The Siperian Services Integration Framework Guide explains to developers how to use
the Siperian Hub Services Integration Framework (SIF) to integrate Siperian Hub
functionality with their applications, and how to create applications using the data
provided by Siperian Hub. SIF allows developers to integrate Siperian Hub smoothly
with their organization's applications. After reading the Siperian Hub Overview,
developers should read the Siperian Services Integration Framework Guide.

Siperian Hub Metadata Manager Guide

The Siperian Hub Metadata Manager Guide explains how to use the Siperian Hub
Metadata Manager tool to validate an organization’s metadata, promote changes
between repositories, import objects into repositories, export repositories, and related
tasks.

Siperian Hub Resource Kit Guide

The Siperian Hub Resource Kit Guide explains how to install and use the Siperian Hub
Resource Kit, which is a set of utilities, examples, and libraries that assist developers
with integrating the Siperian Hub into their applications and workflows. This
document provides a description of the various sample applications that are included
with the Resource Kit.

Siperian Hub Insight Manager Guide

The Siperian Hub Insight Manager Guide explains how to install, configure, and use the
Siperian Hub Insight Manager to generate reporting metadata for the data managed
in the Hub Store. It provides a description of how to use this reporting metadata with
third-party reporting tools to create reports and metrics for this data.

Siperian Training and Materials

Siperian provides live, instructor-based training to help professionals become proficient
users as quickly as possible. From initial installation onward, a dedicated team of
qualified trainers ensures that an organization’s staff is equipped to take advantage of
this powerful platform. To inquire about training classes or to find out where and
when the next training session is offered, please visit Siperian’s web site or contact
Siperian directly.

Contacting Siperian
Technical support is available to answer your questions and to help you with any
problems encountered using Siperian products. Please contact your local Siperian
representative or distributor as specified in your support agreement. If you have a
current Siperian Support Agreement, you can contact Siperian Technical Support:

World Wide Web: http://www.siperian.com
Email: support@siperian.com
Voice (U.S.): 1-866-SIPERIAN (747-3742)

We are interested in hearing your comments about this book. Send your comments to:
by Email: docs@siperian.com
by Postal Service: Documentation Manager
Siperian, Inc.
100 Foster City Blvd.
2nd Floor
Foster City, California 94404 USA

Part 1
Introduction

Contents
• Chapter 1, “Introduction”
• Chapter 2, “Getting Started with the Hub Console”

Chapter 1: Introduction

This chapter introduces and provides an overview of administering Siperian MDM
Hub™ (hereinafter referred to as Siperian Hub). It is recommended for anyone who
manages a Siperian Hub implementation.

Note: This document assumes that you have read the Siperian Hub Overview and have a
basic understanding of Siperian Hub architecture and key concepts.

Chapter Contents
• About Siperian Hub Administrators
• Phases in Siperian Hub Administration
• Summary of Administration Tasks

About Siperian Hub Administrators


Siperian Hub administrators have primary responsibility for the configuration of the
Siperian Hub system. Administrators access Siperian Hub through the Hub Console,
which comprises a set of tools for managing a Siperian Hub implementation.

Siperian Hub administrators use the Hub Console to:


• build the data model and other objects in the Hub Store
• configure and execute Siperian Hub data management processes
• configure external application access to Siperian Hub functionality and resources
• monitor ongoing operations

For an introduction to using the Hub Console, see Chapter 2, “Getting Started with
the Hub Console.”

Phases in Siperian Hub Administration

This section describes typical phases in Siperian Hub administration. These phases may
vary for your Siperian Hub implementation based on your organization’s methodology.

Startup Phase
The startup phase involves installing and configuring core Siperian Hub components:
Hub Store, Hub Server, Cleanse Match Server(s), and cleanse adapters. For instructions
on installing the Hub Store, Hub Server, and Cleanse Match Servers, see the Siperian
Hub Installation Guide for your platform. For instructions on setting up a cleanse
adapter, see the Siperian Hub Cleanse Adapter Guide.

Note: The instructions in this document assume that you have already completed the
startup phase and are ready to begin configuring your Siperian Hub implementation.

Configuration Phase
After Siperian Hub has been installed and set up, administrators can begin configuring
and testing Siperian Hub functionality—the data model and other objects in the Hub
Store, data management processes, external application access, and so on. This phase
involves a dynamic, iterative process of building and testing Siperian Hub functionality
to meet the stated requirements of an organization. The bulk of the material in this
document refers to tasks associated with the configuration phase.

After a schema has been sufficiently built and the Siperian Hub has been properly
configured, developers can build external applications to access Siperian Hub
functionality and resources. For instructions on developing external applications, see
the Siperian Services Integration Framework Guide.

Production Phase
After a Siperian Hub implementation has been sufficiently configured and tested,
administrators deploy the Siperian Hub in a production environment. In addition to
managing ongoing Siperian Hub operations, this phase can involve performance tuning
to optimize the processing of actual business data.

Summary of Administration Tasks


This section provides a summary of administration tasks.

Setting Up Security
In this document, Chapter 20, “Setting Up Security,” describes the tasks associated
with setting up security in a Siperian Hub implementation. Setup tasks vary depending
on the particular security requirements of your Siperian Hub implementation, as
described in “Security Implementation Scenarios” on page 836. Additional security
tasks are involved if external applications access your Siperian Hub implementation
using Services Integration Framework (SIF) requests. For more information, see
“About Setting Up Security” on page 832, “Summary of Security Configuration Tasks”
on page 838, and “Configuration Tasks For Security Scenarios” on page 839.

To configure security for a Siperian Hub implementation using Siperian Hub’s internal
security framework, you complete the following tasks using tools in the Hub Console:
High-Level Tasks for Setting Up Security
Task
    Usage
“Managing the Global Password Policy” on page 877
    Required to define global password policies for all users according to your
    organization’s security policies and procedures.
“Configuring Siperian Hub Users” on page 866
    Required to define user accounts for users to access Siperian Hub resources.
“Assigning Users to the Current ORS Database” on page 886
    Required to provide users with access to the database(s) they need to use.
“Configuring User Groups” on page 881
    Optional. Simplifies security configuration tasks by letting you configure user
    groups and assign users to them.
“Securing Siperian Hub Resources” on page 841
    Required in order to selectively and securely expose Siperian Hub resources to
    external applications.
“Configuring Roles” on page 854
    Required to define roles and assign resource privileges to them.
“Assigning Roles to Users and User Groups” on page 887
    Required to assign roles to users and (optionally) user groups.
“Managing Security Providers” on page 889
    Required if you are using external security providers to handle any portion of
    security in your Siperian Hub implementation.
“Configuring Access to Hub Console Tools” on page 989
    Required to provide non-administrator users with access to Hub Console tools.

Building the Data Model


In this document, Part 2, “Building the Data Model,” describes how to construct the
schema (data model) used in your Siperian Hub implementation and stored in the Hub
Store. It provides instructions for using Hub Console tools to configure Operational
Record Stores (ORSs), datasources, the data model, queries, packages, hierarchies, and
other metadata.
High-Level Tasks for Building the Data Model
Task
    Usage
“Creating Hub Store Databases” on page 58
    Required for all Siperian Hub implementations. For more information, see the
    instructions for installing the Hub Store in the Siperian Hub Installation Guide for
    your platform.
“Configuring Operational Record Stores” on page 62
    Required for all Siperian Hub implementations. You must register an ORS so that
    Siperian Hub can connect to it. For more information, see “Databases in the Hub
    Store” on page 56.
“Configuring Datasources” on page 77
    Required only if the datasource was not automatically created upon registering an
    ORS. Every ORS requires a datasource definition in the application server
    environment. For more information, see “About Datasources” on page 77.
“Configuring Base Objects” on page 92
    Required for each base object in your schema. Base objects are used for a central
    business entity (such as customer, product, or employee) or a lookup table (such as
    country or state). For more information, see “About the Schema” on page 82, “Process
    Overview for Defining Base Objects” on page 94, and “About Base Objects” on page 92.
“Configuring Columns in Tables” on page 125
    Required for all base objects, dependent objects, landing tables, and staging tables.
    For more information, see “About Columns” on page 126.
“Configuring Foreign-Key Relationships Between Base Objects” on page 140
    Required only when you want to explicitly define a foreign-key relationship
    (parent-child) between two base objects. For more information, see “Process Overview
    for Defining Foreign-Key Relationships” on page 143 and “About Foreign Key
    Relationships” on page 140. For Hierarchy Manager, see “Configuring Hierarchies” on
    page 223 instead.
“Configuring Dependent Objects” on page 117
    Required only if a base object has a dependent object, which is a table that is used
    to store detailed information about the records in a base object (such as
    supplemental notes). For more information, see “About the Schema” on page 82,
    “Process Overview for Defining Dependent Objects” on page 119, and “About Dependent
    Objects” on page 117.
“Viewing Your Schema” on page 148
    Useful for visualizing your schema in a graphical format.
“Configuring Queries” on page 162
    Required for creating queries used in packages. For more information, see “About
    Queries” on page 162 and “Configuring Packages” on page 196.
    Required for queries used by data stewards in the Merge Manager tool. For more
    information, see the Siperian Hub Data Steward Guide.
“Configuring Packages” on page 196
    Required to allow external application users to access Siperian Hub functionality
    using Services Integration Framework (SIF) requests. For more information, see
    “About Packages” on page 196 and the Siperian Services Integration Framework Guide.
    Required to allow data stewards to merge and update records in the Hub Store using
    the Merge Manager and Data Manager tools. For more information, see the Siperian
    Hub Data Steward Guide.

Configuring the Data Flow


In this document, Part 3, “Configuring the Data Flow,” describes the flow of data
through Siperian Hub in a series of processes (land, stage, load, match,
consolidate, and publish), and provides instructions for configuring each process using
tools in the Hub Console.
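
The order of the processes named above can be illustrated with a short sketch. The following Python fragment only models the sequence (land, stage, load, match, consolidate, publish) as an ordered enumeration. The names HubProcess and remaining_processes are hypothetical, chosen for illustration, and are not part of any Siperian Hub API.

```python
from enum import IntEnum


class HubProcess(IntEnum):
    """Hypothetical model of the data flow order described in Part 3."""
    LAND = 1
    STAGE = 2
    LOAD = 3
    MATCH = 4
    CONSOLIDATE = 5
    PUBLISH = 6


def remaining_processes(current):
    """Return the processes that follow `current` in the data flow."""
    return [p for p in HubProcess if p > current]


if __name__ == "__main__":
    # Print the full sequence in order, then what follows the match process.
    print(" -> ".join(p.name.lower() for p in HubProcess))
    print([p.name.lower() for p in remaining_processes(HubProcess.MATCH)])
```

A sketch like this can be useful when planning which configuration chapters to read next: after the match process, for example, only the consolidate and publish processes remain.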

Configuring the Land Process

To configure the land process for a base object, see “Land Process” on page 292,
“Configuring the Land Process” on page 347, and the following topics:
High-Level Tasks for Configuring the Land Process
Task
    Usage
“Configuring Source Systems” on page 348
    Required to define a unique internal name for each source system (external
    applications or systems that provide data to Siperian Hub). For more information,
    see “About Source Systems” on page 348.
“Configuring Landing Tables” on page 355
    Required to create landing tables, which provide intermediate storage in the flow of
    data from source systems into Siperian Hub. For more information, see “About Landing
    Tables” on page 355.

Configuring the Stage Process

To configure the stage process for a base object, see “Stage Process” on page 295,
“Configuring the Stage Process” on page 363, and the following topics:
High-Level Tasks for Configuring the Stage Process
Task
    Usage
“Configuring Staging Tables” on page 364
    Required to create staging tables, which provide temporary, intermediate storage in
    the flow of data from landing tables into base objects and dependent objects via
    load jobs. To learn more, see “About Staging Tables” on page 364.
“Mapping Columns Between Landing and Staging Tables” on page 380
    Required to enable Siperian Hub to move data from a landing table to a staging table
    during the stage process, and also to specify cleanse operations on columns of data
    that are moved. To learn more, see “About Mapping Columns” on page 380.
“Configuring Data Cleansing” on page 405
    Required to set up data cleansing for a base object during the stage process using
    the Siperian Hub internal cleanse functionality. To learn more, see “About Data
    Cleansing in Siperian Hub” on page 406 and the following topics:
    • “Configuring Cleanse Match Servers” on page 407 to deploy Cleanse Match Servers
      that execute cleanse operations and the match process for an Operational Record
      Store (ORS). For more information, see “About the Cleanse Match Server” on page 407.
    • “Configuring Cleanse Lists” on page 440 to specify a logical grouping of cleanse
      functions that are executed at run time in a predefined order. For more
      information, see “About Cleanse Lists” on page 440.
    • “Using Cleanse Functions” on page 414 to build and execute cleanse functions that
      cleanse (standardize or verify) data. For more information, see “About Cleanse
      Functions” on page 414.

Configuring the Load Process

To configure the load process for a base object, see “Load Process” on page 299,
“Configuring the Load Process” on page 453, and the following topics:
High-Level Tasks for Configuring the Load Process
Task
    Usage
“Configuring Trust for Source Systems” on page 455
    Used when multiple source systems contribute data to a column in a base object.
    Required if you want to designate the relative trust level (confidence factor) for
    each contributing source system. For more information, see “About Trust” on page 455.
“Configuring Validation Rules” on page 468
    Required if you want to use validation rules to downgrade trust scores for cell data
    based on configured conditions. For more information, see “About Validation Rules”
    on page 468.

Configuring the Match Process

To configure the match process for a base object, see “Match Process” on page 317,
“Configuring the Match Process” on page 483, and the following topics:
High-Level Tasks for Configuring the Match Process
Task
    Usage
“Configuring Match Properties for a Base Object” on page 488
    Required for each base object that will be involved in matching. For more
    information, see “Match Properties” on page 490.
“Configuring Match Paths for Related Records” on page 497
    Required for match column rules involving related records in either separate tables
    or in the same table. For more information, see “About Match Paths” on page 497.
“Configuring Match Columns” on page 515
    Required to specify the base object columns to use in match column rules. For more
    information, see “About Match Columns” on page 515.
“Configuring Match Rule Sets” on page 531
    Required if you want to use match rule sets to execute different sets of match
    column rules at different stages in the match process. For more information, see
    “About Match Rule Sets” on page 531.
“Configuring Match Column Rules for Match Rule Sets” on page 542
    Required to specify match column rules that determine whether two records for a base
    object are similar enough to consolidate. For more information, see “About Match
    Column Rules” on page 542.
“Configuring Primary Key Match Rules” on page 578
    Required to specify the base object columns (primary keys) to use in primary key
    match rules. For more information, see “About Primary Key Match Rules” on page 578.
“Investigating the Distribution of Match Keys” on page 583
    Useful for investigating the distribution of generated match keys upon completion of
    the match process. For more information, see “About Match Keys Distribution” on
    page 583.
“Configuring Match Settings for Non-US Populations” on page 941
    Required for configuring matches involving non-US populations and multiple
    populations.

Configuring the Consolidation Process

To configure the consolidation process for a base object, see “Consolidate Process” on
page 335 and “Configuring the Consolidate Process” on page 593.

Configuring the Publish Process

To configure the publish process for a base object, see “Publish Process” on page 342,
“Configuring the Publish Process” on page 601, and the following topics:
High-Level Tasks for Configuring the Publish Process
Task
    Usage
“Configuring Global Message Queue Settings” on page 604
    Required to specify global settings for all message queues involving outbound
    Siperian Hub messages.
“Configuring Message Queue Servers” on page 605
    Required to set up one or more message queue servers that Siperian Hub will use for
    incoming and outgoing messages. The message queue server must already be defined in
    your application server environment according to the application server
    instructions. For more information, see “About Message Queue Servers” on page 605.
“Configuring Outbound Message Queues” on page 608
    Required to set up one or more outbound message queues for a message queue server.
    For more information, see “About Message Queues” on page 608.
“Configuring Message Triggers” on page 612
    Required for configuring message triggers for a base object. Message triggers
    identify which actions within Siperian Hub are communicated to outside applications
    via messages in message queues. For more information, see “About Message Triggers”
    on page 612.

Executing Siperian Hub Processes


In this document, Part 4, “Executing Siperian Hub Processes,” describes how to use
Hub Console tools to run Siperian Hub processes, either:
• as batch jobs from the Hub Console, or
• as stored procedures using third-party job management tools to schedule and
manage job execution

Executing Processes in the Hub Console

To execute Siperian Hub processes using tools in the Hub Console, see “About
Siperian Hub Batch Jobs” on page 668, “Using Batch Jobs” on page 667, and the
following topics:
High-Level Tasks for Executing Siperian Hub Processes in the Hub Console
Task
    Usage
“Running Batch Jobs Using the Batch Viewer Tool” on page 674
    Required if you want to run individual batch jobs from the Hub Console using the
    Batch Viewer tool. For more information, see “Batch Viewer Tool” on page 674.
“Running Batch Jobs Using the Batch Group Tool” on page 688
    Required if you want to run batch jobs in a group from the Hub Console, allowing you
    to configure the execution sequence for batch jobs and to execute batch jobs in
    parallel. For more information, see “About Batch Groups” on page 688.

Executing Processes Using Job Management Tools

To execute and manage Siperian Hub stored procedures on a scheduled basis (using
job management tools that control IT processes), see “About Executing Siperian Hub
Batch Jobs” on page 750, Chapter 18, “Writing Custom Scripts to Execute Batch Jobs,”
and the following topics:
High-Level Tasks for Executing Siperian Hub Processes Using Job Management Tools
Task
    Usage
“Setting Up Job Execution Scripts” on page 750
    Required for writing job execution scripts for job management tools. For more
    information, see “About Job Execution Scripts” on page 750 and “About the
    C_REPOS_TABLE_OBJECT_V View” on page 751.
“Monitoring Job Results and Statistics” on page 755
    Required for determining the execution results of job execution scripts. For more
    information, see “Error Messages and Return Codes” on page 755 and “Job Execution
    Status” on page 756.
“Executing Batch Groups Using Stored Procedures” on page 798
    Required for executing batch jobs in groups via stored procedures using job
    scheduling software (such as Tivoli, CA Unicenter, and so on). For more information,
    see “About Executing Batch Groups” on page 798.
“Developing Custom Stored Procedures for Batch Jobs” on page 806
    Required for creating, registering, and running custom stored procedures for batch
    jobs. For more information, see “About Custom Stored Procedures” on page 806.

Configuring Hierarchies
If your Siperian Hub implementation uses Hierarchy Manager to manage hierarchies,
you need to configure hierarchies and their related objects, including entity icons, entity
objects and entity types, relationship base objects (RBOs) and relationship types,
Hierarchy Manager profiles, and Hierarchy Manager packages. For more information,
see Chapter 8, “Configuring Hierarchies.”

Configuring Workflow Integration


If your Siperian Hub implementation integrates with a supported workflow engine, you
need to enable states for base objects and configure other settings. For more
information, see “Configuring State Management for Base Objects” on page 211.

Other Administration Tasks


In this document, Part 5, “Configuring Application Access,” and Part 6, “Appendixes,”
provide additional information about administration-related topics.
Other High-Level Administration Tasks
Task
    Usage
“Generating ORS-specific APIs and Message Schemas” on page 817
    Required for application developers to generate ORS-specific SIF request APIs using
    the SIF Manager tool in the Hub Console.
“Viewing Registered Custom Code” on page 909
    Used for viewing the following types of user objects that are registered in the
    selected ORS: user exits, custom stored procedures, custom Java cleanse functions,
    and custom button functions.
“Auditing Siperian Hub Services and Events” on page 919
    Used for integration auditing to track activities associated with the exchange of
    data between Siperian Hub and external systems. For more information, see “About
    Integration Auditing” on page 920.
“Backing Up and Restoring Siperian Hub” on page 951
    Used for backing up and restoring a Siperian Hub implementation.
“Configuring International Data Support” on page 939
    Required only to configure different character sets in a Siperian Hub
    implementation.
“Configuring User Exits” on page 955
    Required only if user exits are used. For more information, see “About User Exits”
    on page 956.
“Viewing Configuration Details” on page 967
    Used for remotely monitoring a Siperian Hub environment, showing configuration
    settings for the Hub Server, Cleanse Match Servers, Master Database, and Operational
    Record Stores.
“Implementing Custom Buttons in Hub Console Tools” on page 977
    Used only if you want to create custom buttons for Hub Console users to provide
    on-demand, real-time access to specialized data services. Applies only to the Merge
    Manager, Data Manager, and Hierarchy Manager tools.

Chapter 2: Getting Started with the Hub Console

This chapter introduces the Hub Console and provides a high-level overview of the
tools involved in configuring your Siperian Hub implementation.

Chapter Contents
• About the Hub Console
• Starting the Hub Console
• Navigating the Hub Console
• Siperian Hub Workbenches and Tools

About the Hub Console


Administrators and data stewards can access Siperian Hub features via the Siperian
Hub user interface, which is called the Hub Console. The Hub Console comprises a set
of tools. Each tool allows you to perform a specific action, or a set of related actions.

Note: The available tools in the Hub Console depend on your Siperian license
agreement. Therefore, the tools in your Hub Console might differ from those shown in
the previous figure.

Starting the Hub Console


To access the Hub Console:
1. Open a browser window and enter the following URL:
http://YourHubHost:port/cmx/

where YourHubHost is your local Siperian Hub host and port is the port number.
Check with your administrator for the correct port number.
Note: You must use an HTTP connection to start the Hub Console. SSL
connections are not supported.
The Siperian Hub launch screen is displayed.

2. Click the Launch button.

The first time (only) that you launch Hub Console from a client machine, Java Web
Start downloads application files and displays a progress bar.

The Siperian Hub Login dialog box is displayed.

3. Enter your user name and password.


Note: If you do not have any user names set up, contact Siperian support.
4. Click OK.

After you have logged in with a valid user name and password, Siperian Hub will
prompt you to choose a target database with which to work: the Master Database or an
Operational Record Store (ORS).

The list of databases to which you can connect is determined by your security
profile.
• The Master Database stores Siperian Hub environment configuration
settings—user accounts, security configuration, ORS registry, message queue
settings, and so on. A given Siperian Hub environment can have only one
Master Database.
• An Operational Record Store (ORS) stores the rules for processing the
master data and the rules for managing the set of master data objects, along
with the processing rules and auxiliary logic that Siperian Hub uses to define
the best version of the truth (BVT). A Siperian Hub configuration can have
one or more ORS databases.

Throughout the Hub Console, an icon next to an ORS indicates whether it has
been validated and, if so, whether the most recent validation resulted in issues.

Icon meanings:
• Unknown. ORS has not been validated since it was initially created, or since the
last time it was updated.
• ORS has been validated with no issues. No change has been made to the ORS
since the validation process was run.
• ORS has been validated with warnings.
• ORS has been validated and errors were found.

For more information, see Chapter 3, “About the Hub Store.”


5. Select the Master Database or the ORS to which you want to connect.
6. Click Connect.
Note: You can easily change the target database once inside the Hub Console, as
described in “Changing the Target Database” on page 31.

The Hub Console screen is displayed, as shown in the following example (in which
the Schema Manager is selected from the Model workbench).
[Figure: Hub Console screen, with callouts for the Menu, Workbenches/Processes pane,
Navigation Tree, and Properties Panel]

When you select a tool from the Workbenches page or start a process from the
Processes page, the window is typically divided into several panes:

Pane Description
Workbenches / Processes Displays one of the following:
• List of workbenches and tools to which you have access (as shown in the
previous figure).
• List of the steps in the process that you are running.
Note: The workbenches and tools that you see depend on what your
company has purchased and on what your administrator has given you
access to. If you do not see a particular workbench or tool when you log in to the
Hub Console, then your user account has not been assigned permission to
access it.
Navigation Tree Allows you to navigate items (a list of objects) in the current tool.
For example, in the Schema Manager, the middle pane contains a list of
schema objects (base objects, landing tables, and so on).

Properties Panel Shows details (properties) for the selected item in the navigation tree, and
possibly other panels if available in the current tool. Some of the properties
might be editable.

Navigating the Hub Console


This section describes how to navigate the Hub Console interface. Hub Console is a
collection of tools that you use to configure and manage your Siperian Hub
implementation (see “Siperian Hub Workbenches and Tools” on page 48 for a
complete list). Each tool allows you to focus on a particular area of your Siperian Hub
implementation.

Toggling Between the Processes and Workbenches Views


Siperian Hub groups its tools in two different ways:

Pane Description
By Workbenches Similar tools are grouped together by workbench—a logical collection of
related tools.
By Process Tools are grouped into a logical workflow that walks you through the
tools and steps required for completing a task.

You can click the tabs at the left-most side of the Hub Console window to toggle
between the Processes and Workbenches views.

Note: When you log into Siperian Hub, you see only those workbenches and processes
that contain the tools that your Siperian Hub security administrator has authorized you
to use. The screen shots in this document show the full set of workbenches, processes,
and tools available.


Workbenches View

To view tools by workbench:


• Click the Workbenches tab on the left side of the page.

Hub Console displays a list of available workbenches on the Workbenches tab.


The Workbenches view organizes Hub Console tools by similar functionality, as shown
in the following example.

[Figure: Workbenches view, with callouts for the Utilities Workbench and the tools
in the Utilities Workbench]

The workbench names and tool descriptions are metadata-driven, as is the way
in which tools are grouped. It is possible to have customized tool groupings.
Therefore, the arrangement of tools and workbenches that you see after you log in to
Hub Console might differ somewhat from the previous figure.


Processes View

To view tools by process:


• Click the Processes tab on the left side of the page.

Hub Console displays a list of available processes on the Processes tab. Tools are
organized into common sequences or processes, as shown in the following example.

Available Processes

Processes step you through a logical sequence of tools to complete a specific task.
The same tool can belong to several processes, and can appear many times in one
process.


Starting a Tool in the Workbenches View


To start a Hub Console tool from the Workbenches view:
1. In the Workbenches view, expand the workbench that contains the tool that you
want to start (see “Siperian Hub Workbenches and Tools” on page 48).
2. If necessary, expand the workbench node to show the tools associated with that
workbench.
3. Click the tool.
If you selected a tool that requires a different database, the Hub Console prompts
you to select it.

All tools in the Configuration workbench (Databases, Users, Security Providers,
Tool Access, Message Queues, Metadata Manager, and Enterprise Manager)
require a connection to the Master Database. All other tools require a connection to
an ORS.

The Hub Console displays the tool that you selected.
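
The rule for which database a tool requires can be summarized as a simple lookup. The following Python sketch is illustrative only: the names `required_database` and `needs_database_switch` are hypothetical and are not part of any Siperian Hub API. The list of Configuration workbench tools is taken from this section.

```python
# Illustrative sketch only: models the rule that Configuration workbench
# tools need the Master Database and every other tool needs an ORS.
# The function names are hypothetical, not a Siperian Hub API.

CONFIGURATION_TOOLS = {
    "Databases", "Users", "Security Providers", "Tool Access",
    "Message Queues", "Metadata Manager", "Enterprise Manager",
}


def required_database(tool_name: str) -> str:
    """Return the kind of database that a tool must be connected to."""
    if tool_name in CONFIGURATION_TOOLS:
        return "Master Database"
    return "ORS"


def needs_database_switch(tool_name: str, current_connection: str) -> bool:
    """True when the console would have to prompt for a different database."""
    return required_database(tool_name) != current_connection
```

For example, starting the Databases tool while connected to an ORS would require a switch, whereas starting the Mappings tool would not.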


Acquiring Locks to Change Settings in the Hub Console


In the Hub Console, a lock is required to make changes to the underlying schema.
All non-data steward tools (except the ORS security tools) are in read-only mode
unless you acquire a lock. Hub Console locking allows multiple users to make changes
to the Siperian Hub schema at the same time.

Types of Locks

In the Hub Console, the Write Lock menu provides two types of locks:

Type of Lock Description


exclusive lock Allows only one user to make changes to the underlying ORS, preventing any
other users from changing the ORS while the exclusive lock is in effect.
For more information, see “Acquiring an Exclusive Lock” on page 30.
write lock Allows multiple users to make changes to the underlying metadata at the
same time. Write locks can be obtained on the Master Database or on an
ORS. For more information, see “Acquiring a Write Lock” on page 30.

Note: Locks cannot be obtained on an ORS that is in production mode. If an ORS is
in production mode and you attempt to obtain a write lock, you will see a message
stating that you cannot acquire the lock. For more information, see “Editing ORS
Properties” on page 69.

Tools that Require a Lock

The following tools require a lock in order to make changes:

Master Database
• Databases
• Users
• Security Providers
• Tool Access
• Message Queues
• Metadata Manager

ORS
• Mappings
• Cleanse Match Server
• Cleanse Functions
• Queries
• Packages
• Schema Manager
• Schema Viewer
• Secure Resources
• Hierarchy Manager
• Roles
• Users and Groups
• Batch Group
• Systems and Trust
• SIF Manager
• Hierarchies

Note: The data steward tools—Data Manager, Merge Manager, and Hierarchy
Manager—do not require write locks. For more information about these tools, see the
Siperian Hub Data Steward Guide. The Audit Manager does not require write locks, either.

Automatic Lock Expiration

The Hub Console takes care of refreshing the lock every 60 seconds on the current
connection. The user can manually release a lock according to the instructions in
“Releasing a Lock” on page 30. If a user switches to a different database while holding
a lock, then the lock is automatically released. If the Hub Console is terminated, then
the lock expires after one minute.
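
The expiration rules can be pictured with a small model. The following Python sketch is an assumption-laden illustration, not the Hub Console implementation: the class `LockLease` and its methods are hypothetical, and the sketch assumes that the one-minute expiry is measured from the moment the console stops running.

```python
# Illustrative model of lock release and expiration as described above.
# Class and method names are hypothetical; the 60-second figure is from the text.

EXPIRY_AFTER_TERMINATION_SECONDS = 60


class LockLease:
    def __init__(self, owner):
        self.owner = owner
        self.released = False
        self.terminated_at = None  # time the console stopped running, if any

    def release(self):
        """Manual release, or automatic release when switching databases."""
        self.released = True

    def terminate_console(self, now):
        """Record that the console stopped and no longer refreshes the lock."""
        self.terminated_at = now

    def is_active(self, now):
        if self.released:
            return False
        if self.terminated_at is None:
            return True  # a running console refreshes the lock every 60 seconds
        return now - self.terminated_at < EXPIRY_AFTER_TERMINATION_SECONDS
```

In this model, a lock whose console terminated at time 100 is still active at time 130 and has expired by time 160.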

Server Caching and Hub Console Locks

When no locks are in effect in the Hub Console, the Hub Server caches metadata and
other configuration settings for performance reasons. As soon as a Hub Console user
acquires a write lock or exclusive lock, caching is disabled, the cache is emptied, and
Siperian Hub retrieves this information from the database instead. When all locks are
released, caching is enabled again.
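
The interaction between locks and caching can be sketched as follows. This Python example is a simplified illustration under stated assumptions: `MetadataCache` and the `load_from_database` callable are hypothetical names, and the real Hub Server cache is not exposed in this form.

```python
# Illustrative sketch: a cache that is bypassed and emptied while any lock is held.
# All names here are hypothetical and do not correspond to Hub Server classes.

class MetadataCache:
    def __init__(self, load_from_database):
        self._load = load_from_database  # callable: key -> value
        self._entries = {}
        self._active_locks = 0

    def lock_acquired(self):
        """A write or exclusive lock was acquired: disable and empty the cache."""
        self._active_locks += 1
        self._entries.clear()

    def lock_released(self):
        """A lock was released; caching resumes once no locks remain."""
        self._active_locks = max(0, self._active_locks - 1)

    def get(self, key):
        if self._active_locks > 0:
            return self._load(key)  # caching disabled: always read the database
        if key not in self._entries:
            self._entries[key] = self._load(key)
        return self._entries[key]
```

While a lock is held, every `get` call reads from the database, which is why configuration changes are visible immediately but lookups are slower.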


Acquiring a Write Lock

Write locks allow multiple users to edit data in the Hub Console at the same time.
However, write locks do not prevent those users from editing the same data at the same
time. In such cases, the most recently saved changes prevail.

To acquire a write lock in Hub Console:


1. From the Write Lock menu, choose Acquire Lock.

• If the lock has already been acquired by someone else, then the login name and
machine address of that person are displayed.
• If the ORS is in production mode, then a message is displayed explaining that
you cannot acquire the lock.
• If the lock is acquired successfully, then the tools are in read-write mode.
Multiple users can have a write lock per ORS or in the Master Database.
2. When you are finished, you can explicitly release the write lock according to the
instructions in “Releasing a Lock” on page 30.

Acquiring an Exclusive Lock

To acquire an exclusive lock in Hub Console:


1. From the Write Lock menu, choose Clear Lock to clear any write locks held by
other users, as described in “Clearing Locks” on page 31.
2. From the Write Lock menu, choose Acquire Exclusive Lock.
If the ORS is in production mode, then a message is displayed explaining that you
cannot acquire the exclusive lock.
3. When you are finished making changes, release the exclusive lock, as described in
“Releasing a Lock” on page 30.
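
The difference between the two lock types can be modeled in a few lines. The following Python sketch is illustrative only: `LockTable` and `LockError` are hypothetical names, and the sketch assumes that an exclusive lock is refused while other users still hold locks, which is why the procedure above clears locks first.

```python
# Illustrative model of write locks (shared) and exclusive locks (single user).
# Names are hypothetical; refusing an exclusive lock while others hold locks
# is an assumption made for this sketch.

class LockError(Exception):
    pass


class LockTable:
    """Tracks locks on one database (the Master Database or an ORS)."""

    def __init__(self, production_mode=False):
        self.production_mode = production_mode
        self.write_holders = set()
        self.exclusive_holder = None

    def acquire_write(self, user):
        if self.production_mode:
            raise LockError("cannot acquire a lock in production mode")
        if self.exclusive_holder not in (None, user):
            raise LockError(f"exclusive lock held by {self.exclusive_holder}")
        self.write_holders.add(user)

    def acquire_exclusive(self, user):
        if self.production_mode:
            raise LockError("cannot acquire a lock in production mode")
        others = self.write_holders - {user}
        if others or self.exclusive_holder not in (None, user):
            raise LockError("other users hold locks; clear them first")
        self.exclusive_holder = user

    def clear_locks(self):
        """Force the release of all locks, as the Clear Lock command does."""
        self.write_holders.clear()
        self.exclusive_holder = None

    def release(self, user):
        self.write_holders.discard(user)
        if self.exclusive_holder == user:
            self.exclusive_holder = None
```

In this model, several users can hold write locks at once, but an exclusive lock succeeds only after the other locks have been cleared.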

Releasing a Lock

To release a lock in Hub Console:


• From the Write Lock menu, choose Release Lock.


Clearing Locks

You can force the release of any locks—write or exclusive locks—held by other users.
You might want to do this, for example, to obtain an exclusive lock on the ORS.
Because other users are not warned to save changes before their write locks are
released, you should use this only when necessary.

To clear all locks:


• From the Write Lock menu, choose Clear Lock.
Hub Console releases any locks on the ORS.

Changing the Target Database


The status bar at the bottom of the Hub Console window always shows:
• the name of the target database to which you connected
• the user name you used to log in

To change the target database in the Hub Console:
1. On the status bar, click the database name.


Hub Console prompts you to choose a target database with which to work.

For a description of the types of databases that you can select, see “Starting the
Hub Console” on page 19.
2. Select the Master Database or the ORS to which you want to connect.
3. Click Connect.

Logging in as a Different User


To log in as a different user in the Hub Console:
1. Click the user name on the status bar.

2. From the Options menu, choose Re-Login As....


3. Specify the user name and password for the user account that you want to use.

Changing the Password for a User


To change the password for the currently logged-in user in the Hub Console:
1. From the Options menu, choose Change Password.

2. Specify the password that you want to use instead.


3. Click OK.


Using the Navigation Tree in the Navigation Pane


The navigation tree in the Hub Console allows you to view and manage a hierarchical
collection of objects. This section uses the Schema Manager as an example, but the
functionality described in this section also applies to using the navigation tree for the
following Hub Console tools: Message Queues, Mappings, Queries, Packages, Schema,
Users and Groups, and the Batch Viewer.

Parent and Child Nodes

Each named object is represented as a node in the hierarchy tree. A node that contains
other nodes is called a parent node. A node that belongs to a parent node is called a child
node.


In the following example in the Schema Manager, the Address base object is the parent
node to the associated child nodes (Columns, Cross-Reference, Dependent Objects,
and so on).

[Figure: Schema Manager navigation tree, with callouts for the parent node (Address
base object), the child nodes of Address, and the Tree Options area]

Showing and Hiding Child Nodes

To show child nodes beneath a parent node:


• Click the plus (+) sign next to the parent node.

To hide child nodes beneath a parent node:


• Click the minus (-) sign next to the parent node.
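
The parent and child relationship, and the effect of showing or hiding child nodes, can be sketched as a small tree structure. This Python example is illustrative only; `TreeNode` is a hypothetical name and does not reflect how the Hub Console is implemented.

```python
# Illustrative sketch of a navigation tree with expandable parent nodes.
# TreeNode is a hypothetical class used only for this example.
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    name: str
    children: list = field(default_factory=list)
    expanded: bool = False  # True after clicking the plus (+) sign

    def add_child(self, child):
        self.children.append(child)
        return child

    def visible_names(self, depth=0):
        """Names shown in the tree; children of collapsed nodes are hidden."""
        lines = ["  " * depth + self.name]
        if self.expanded:
            for child in self.children:
                lines.extend(child.visible_names(depth + 1))
        return lines
```

With an Address parent node that has Columns, Cross-Reference, and Dependent Objects as children, only Address is visible until the node is expanded.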


Sorting by Display Name

The display name is the name of an object as it appears in the navigation tree. You can
change the order in which the objects are displayed in the navigation tree by clicking
Sort By in the tree options area and selecting the appropriate sort option.

Choose from the following sort options:


• Display Name (a-z) sorts the objects in the tree alphabetically according to
display name.
• Display Name (z-a) sorts the objects in the tree in descending alphabetical order
according to display name.
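
The two sort options amount to an ascending or descending sort on the display name. The following Python sketch is illustrative; the function name is hypothetical, and ignoring letter case when sorting is an assumption.

```python
# Illustrative sketch of the Display Name (a-z) and (z-a) sort options.
# Case-insensitive ordering is an assumption made for this example.

def sort_by_display_name(display_names, descending=False):
    """Return the display names in alphabetical (or reverse) order."""
    return sorted(display_names, key=str.casefold, reverse=descending)
```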

Filtering Items

You can filter the items shown in the navigation tree by clicking the Filter area at the
bottom of the left pane and selecting the appropriate filter option. The figures in this
section are from the Schema Manager, but the same principles apply to other Hub
Console tools for which filtering is available.

Choose from the following filter options:


• No Filter (All Items)—Removes any filter that was previously defined.
• One Item—Displays a drop-down list above the navigation tree from which to
select an item.
In the Schema Manager, for example, you can choose Table type or Table.


If you choose Table type, you click the down arrow to display a list of table types
from which to select for your filter.

Select a Type


• If you choose Table, you click the down arrow to display a list of tables from which
to select for your filter.

Select a Table

• Some Items—Allows you to select one or more items.


For example, in the Schema Manager, you can choose tables based on either the
table type or table name. When you choose Some Items, the Hub Console displays
the Define Item Filter button above the navigation tree.

• Click the Define Item Filter button.

Select All Items


Clear All Selected Items

• Select the item(s) that you want to include in the filter, and then click OK.


Note: Use the No Filter (All Items) option to remove the filter.

Changing the Item View

Certain Hub Console tools show a View or View By area below the navigation tree.
• In the Schema Manager, you can show or hide the public Siperian Hub items by
clicking the View area below the navigation tree and choosing the appropriate
command.

For example, you can view all system tables.

• In the Mappings tool, you can view items by mapping, staging table, or landing
table.
• In the Packages tool, you can view items by package or by table.
• In the Users and Groups tool, you can display sub groups and sub users.
• In the Batch Viewer, you can group jobs by table, date, or procedure type.


Searching For Items

When there is no filter, or when the Some Items filter is selected, Hub Console displays
a Find area above the navigation tree so that you can search for items by name.

For example, in the Schema Manager, you can search for tables and columns.
1. Click anywhere in the Find area to display the Find window.

2. Type the name (or first few letters of the name) that you want to find.
3. Click the F3 - Find button.


The Hub Console highlights the matched item(s). In the following example, the
Schema Manager displays the list of tables and highlights the table that matches the find
criteria:

4. Click anywhere in the Find area to hide the Find window.
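
Searching by the first few letters of a name behaves like a prefix match. The following Python sketch is illustrative only: the function name and the sample table names are hypothetical, and case-insensitive matching is an assumption.

```python
# Illustrative sketch of finding items by name or by the first few letters.
# Case-insensitive prefix matching is an assumption made for this example.

def find_matches(item_names, text):
    """Return the item names that start with the typed text."""
    needle = text.casefold()
    return [name for name in item_names if name.casefold().startswith(needle)]
```

For example, with hypothetical tables named C_ADDRESS, C_PARTY, and C_PARTY_REL, typing `c_par` would match the last two.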

Running Commands On Objects in the Navigation Tree

To run commands on an object in the navigation tree, do one of the following:


• Right-click an object name to display a pop-up menu of commands that you can
perform on the object.
OR
• Select an object in the navigation tree, and then choose the command you want
from the Hub Console menu at the top of the window.


Note: Whenever possible, this document describes the first approach—right-clicking
an object in the navigation tree and choosing a command from the pop-up menu.
Alternatively, however, you can always choose the command from the Hub Console
menu.

For example, in the Schema Manager, you can right-click on certain types of objects in
the navigation tree to see a popup menu of the commands available for the selected
object.

Popup Menu


Adding, Editing and Removing Objects Using Command Buttons

This section describes generally how you use command buttons to add, edit, and delete
objects in the Hub Console.

Command Buttons

If you have access to create, modify, or delete objects in a Hub Console window, and if
you have acquired a write lock (“Acquiring a Write Lock” on page 30), you might see
some or all of the following command buttons in the Properties panel. There are other
command buttons as well.

Button Name Description


Add Add a new object.

Edit Edit a property for the selected item in the Properties panel. Indicates that
the property is editable.
Delete Remove the selected item.

Save Save changes.


The following figure shows an example of command buttons on the right side of the
properties panel for the Secure Resources tool.

Command Buttons

To see a description about what a command button does, hold the mouse over the
button to display a tooltip, as shown in the following example.

Tooltip

Adding Objects

To add an object:
1. Acquire a write lock.

2. In the Hub Console tool, click the Add button.


The Hub Console displays an Add object window, where object is the name of the
type of object that you are adding.
3. Specify the object properties.


4. Click OK.

Editing Object Properties

To edit an object’s properties:


1. Acquire a write lock.

2. In the Hub Console tool, select the object whose properties you want to edit.
3. For each property that you want to edit, click the Edit button next to it, and
specify the new value.
4. Click the Save button to save your changes.

Removing Objects

To remove an object:
1. Acquire a write lock.

2. In the Hub Console tool, select the object that you want to remove.
3. Click the Remove button.
4. If prompted to confirm deletion, choose the appropriate option (OK or Yes) to
confirm deletion.

Customizing the Hub Console Interface


To customize the Hub Console interface:
1. From the Options menu, choose Options.


The Options dialog box is displayed.

2. Specify the options you want, including:


• General tab: Specify whether to show wizard welcome screens, and whether
to save window sizes and positions.
• Quick Launch tab: Specify tools that you want to appear as icons in a tool
bar below the menu, as shown in the following example.
Toolbar


Showing Version Details


To show version details about the currently-installed Siperian Hub:
1. In the Hub Console, choose Help | About.

The Hub Console displays the About Siperian Hub dialog.

2. Click Installation Details.


The Hub Console displays the Installation Details dialog.

3. Click Close to dismiss the Installation Details dialog.
4. Click Close to dismiss the About Siperian Hub dialog.

Siperian Hub Workbenches and Tools


This section provides an overview of the Siperian Hub workbenches and tools.

Tools in the Configuration Workbench


Icon Tool Name Description
Databases Register and manage Operational Record Stores (ORSs).
To learn more, see Chapter 4, “Configuring Operational Record
Stores and Datasources.”
Users Define users and specify which databases they can access.
Manage global and individual password policies. Note that
Siperian Hub supports external authentication for users, such as
LDAP. For more information, see Chapter 20, “Configuring
Siperian Hub Users.”
Security Providers Configure security providers, which are third-party organizations
that provide security services (authentication, authorization, and
user profile services) for users accessing Siperian Hub. For more
information, see “Managing Security Providers” on page 889.
Tool Access Define which Hub Console tools and processes a user can
access. By default, new user accounts do not have access to any
tools until access is explicitly assigned. For more information,
see Appendix F, “Configuring Access to Hub Console Tools.”



Message Queues Define inbound and outbound message queue interfaces to
Siperian Hub. For more information, see Chapter 16,
“Configuring the Publish Process.”
Metadata Manager Validate Operational Record Store (ORS) metadata, promote
changes between repositories, import objects into repositories,
and export repositories. For more information, see the Siperian
Hub Metadata Manager Guide.
Enterprise Manager View configuration details and version information for the Hub
Server, Cleanse Servers, the Master Database, and Operational
Record Stores. For more information, see Appendix D, “Viewing
Configuration Details.”

Tools in the Model Workbench


Icon Tool Name Description
Schema Define base objects, dependent objects, relationships, history
and security requirements, staging and landing tables, validation
rules, match criteria, and other data model attributes. To learn
more, see Chapter 5, “Building the Schema.”
Schema Viewer View and navigate the current schema. For more information,
see “Viewing Your Schema” on page 148.
Systems and Trust Name the source systems that can provide data for consolidation
in Siperian Hub. Define the trust settings associated with each
source system for each base object column. For more
information, see “Configuring Source Systems” on page 348 and
“Configuring Trust for Source Systems” on page 455.
Queries Define query groups and queries used by packages. To learn
more, see “Configuring Queries” on page 162.
Packages Define packages (table views). To learn more, see “Configuring
Packages” on page 196.
Cleanse Functions Define cleanse functions to perform on your data. For more
information, see “Using Cleanse Functions” on page 414.
Mappings Map cleansing function outputs to target columns in staging
tables. For more information, see “Mapping Columns Between
Landing and Staging Tables” on page 380.



Hierarchies Set up the structures required to view and manipulate data
relationships in Hierarchy Manager. For more information, see
Chapter 8, “Configuring Hierarchies.”

Tools in the Security Access Manager Workbench


Icon Tool Name Description
Secure Resources Manage secure resources in Siperian Hub. Configure the status
(Private, Secure) for each Siperian Hub resource, and define
resource groups to organize secure resources. For more
information, see “Securing Siperian Hub Resources” on page
841.
Roles Define roles and privilege assignments to resources and resource
groups. Assign roles to users and user groups. For more
information, see “Configuring Roles” on page 854.
Users and Groups Manage the users and user groups within a single Hub Store.
To learn more, see Chapter 20, “Setting Up Security.”

Tools in the Data Steward Workbench


For more information about these tools, see the Siperian Hub Data Steward Guide.

Icon Tool Name Description


Data Manager Manage the content of consolidated data, view cross-references,
edit data, view history and unmerge consolidated records.
To learn more, see the Siperian Hub Data Steward Guide.
Merge Manager Review and merge the matched records that have been queued
for manual merging. For more information, see the Siperian
Hub Data Steward Guide.
Hierarchy Manager Define and manage hierarchical relationships in the Hub Store.
For more information, see the Siperian Hub Data Steward Guide.


Tools in the Utilities Workbench


Icon Tool Name Description
Batch Group Configure and run batch groups, which are collections of
individual batch jobs (for example, Stage, Load, and Match jobs)
that can be executed with a single command. For more
information, see “Running Batch Jobs Using the Batch Viewer
Tool” on page 674.
Batch Viewer Execute batch jobs to cleanse, load, match or auto-merge data,
and view job logs. For more information, see “Running Batch
Jobs Using the Batch Viewer Tool” on page 674.
Cleanse Match Server View Cleanse Match Server information, including name, port,
server type, and whether the server is online or offline. For more
information, see “About the Cleanse Match Server” on page 407.
Audit Manager Configure auditing and debugging of application requests and
message queue events. For more information, see Chapter 22,
“Auditing Siperian Hub Services and Events.”
SIF Manager Generate ORS-specific Services Integration Framework (SIF)
request APIs. SIF Manager generates and deploys the code to
support SIF request APIs for packages, remote packages,
mappings, and cleanse functions in an ORS. Once generated, the
ORS-Specific APIs are available as a Web service and via the
Siperian Client JAR. For more information, see Chapter 19,
“Generating ORS-specific APIs and Message Schemas.”
User Object Registry View registered user exits, user stored procedures, custom Java
cleanse functions, and custom GUI functions for an ORS. For
more information, see Chapter 21, “Viewing Registered Custom
Code.”



Part 2
Building the Data Model

Contents
• Chapter 3, “About the Hub Store”
• Chapter 4, “Configuring Operational Record Stores and Datasources”
• Chapter 5, “Building the Schema”
• Chapter 6, “Configuring Queries and Packages”
• Chapter 7, “State Management”
• Chapter 8, “Configuring Hierarchies”

3
About the Hub Store

The Hub Store is where business data is stored and consolidated in Siperian Hub.
The Hub Store contains common information about all of the databases that are part
of your Siperian Hub implementation.

Chapter Contents
• Databases in the Hub Store
• How Hub Store Databases Are Related
• Creating Hub Store Databases
• Version Requirements


Databases in the Hub Store


The Hub Store is a collection of databases that includes:

Element Description
Master Database Contains the Siperian Hub environment configuration settings—user
accounts, security configuration, ORS registry, message queue
settings, and so on. A given Siperian Hub environment can have only
one Master Database. The default name of the Master Database is
CMX_SYSTEM.
In the Hub Console, the tools in the Configuration workbench
(Databases, Users, Security Providers, Tool Access, and Message
Queues) manage configuration settings in the Master Database.
Operational Record Store (ORS) Database that contains the master data, content metadata,
the rules for processing the master data, the rules for managing the set of
master data objects, along with the processing rules and auxiliary
logic used by the Siperian Hub in defining the best version of the
truth (BVT). A Siperian Hub configuration can have one or more
ORS databases. The default name of an ORS is CMX_ORS.

Users for Hub Store databases are created globally—within the Master Database—and
then assigned to specific ORSs. The Master Database also stores site-level information,
such as the number of incorrect log-in attempts allowed before a user account is locked
out.


How Hub Store Databases Are Related


A Siperian Hub implementation contains one Master Database and zero or more
ORSs. If no ORS exists, then only the Configuration workbench tools are available in
the Hub Console. A Siperian Hub implementation can have multiple ORSs, such as
separate ORSs for development and production, or separate ORSs for each
geographical location or for different parts of the organization.

You can access and manage multiple ORSs from one Master Database. The Master
Database stores the connection settings and properties for each ORS.

Note: An ORS can be registered in only one Master Database. Multiple Master
Databases cannot share the same ORS. A single ORS cannot be associated with
multiple Master Databases.
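
The relationship between a Master Database and its ORSs can be pictured as a registry in which each ORS has exactly one owner. The following Python sketch is illustrative only: `HubEnvironmentRegistry` is a hypothetical class, and the ORS names other than the default Master Database name CMX_SYSTEM are made up for the example.

```python
# Illustrative sketch: each ORS can be registered in only one Master Database.
# The class and the sample ORS names are hypothetical.

class HubEnvironmentRegistry:
    """Tracks which Master Database each ORS is registered in."""

    def __init__(self):
        self._owner_by_ors = {}

    def register(self, master_db, ors_name):
        """Register an ORS, refusing it if another Master Database owns it."""
        owner = self._owner_by_ors.setdefault(ors_name, master_db)
        if owner != master_db:
            raise ValueError(f"{ors_name} is already registered in {owner}")

    def ors_for(self, master_db):
        """Return the ORSs registered in the given Master Database."""
        return sorted(o for o, m in self._owner_by_ors.items() if m == master_db)
```

One Master Database can own many ORSs (for example, separate development and production ORSs), but a second Master Database cannot register an ORS that is already owned.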


Creating Hub Store Databases


Databases are initially created and configured when you install Siperian Hub.
• To create the Master Database and one ORS, you run the setup.sql script.
• To create an individual ORS, you run the setup_ors.sql script.

To learn more, see the Siperian Hub Installation Guide for your platform.

Version Requirements
Different versions of the Siperian Hub cannot operate together in the same
environment. All components of your installation must be the same version, including
the Siperian Hub software and the databases in the Hub Store.

If you want to have multiple versions of Siperian Hub at your site, you must install each
version in a separate environment. If you try to work with a different version of a
database, you will receive a message telling you to upgrade the database to the current
version.
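
The same-version requirement can be expressed as a simple consistency check. The following Python sketch is illustrative only: the function name, the component labels, and the version strings are hypothetical placeholders, and Siperian Hub performs its own check when you connect.

```python
# Illustrative sketch: all components of an installation must report the
# same version. Component names and version strings are placeholders.

def check_versions(component_versions):
    """Return the common version, or raise if the components disagree."""
    distinct = set(component_versions.values())
    if len(distinct) > 1:
        details = ", ".join(
            f"{name}={version}"
            for name, version in sorted(component_versions.items())
        )
        raise RuntimeError(f"version mismatch: {details}")
    return distinct.pop() if distinct else None
```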



4
Configuring Operational Record Stores
and Datasources

This chapter describes how to configure Operational Record Stores (ORSs) and
datasources for the Hub Store using the Databases tool in the Hub Console.

Chapter Contents
• Before You Begin
• About the Databases Tool
• Starting the Databases Tool
• Configuring Operational Record Stores
• Configuring Datasources


Before You Begin


Before you begin, you must have installed Siperian Hub and created the Master Database
and at least one ORS, according to the instructions in the Siperian Hub Installation
Guide for your platform. Running the setup.sql script creates both the Master Database
and one ORS. You can create additional ORSs by running the setup_ors.sql script.

About the Databases Tool


After the Hub Store has been created, you can use the Databases tool in the Hub
Console to complete the following tasks:
• Register an ORS so that the Master Reference Manager can connect to it.
Registration stores the database connection properties in the Master Database.
• Define an ORS datasource in the application server environment for Siperian Hub.
An ORS datasource contains a set of properties for the ORS, such as the location of
the database server, the name of the database, the network protocol used to
communicate with the server, the database user ID and password, and so on.
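
To make the datasource properties more concrete, the following Python sketch models a few of them and derives a JDBC-style connection URL. It is an illustration under stated assumptions: `OrsConnection` is a hypothetical class, the host and database names are placeholders, and the URL formats shown are the common Oracle thin-driver and DB2 type 4 driver forms, which may differ from what your application server actually generates.

```python
# Illustrative sketch of ORS connection properties and a derived JDBC URL.
# OrsConnection is hypothetical; the URL formats are the usual Oracle thin
# and DB2 type 4 forms, assumed here for illustration.
from dataclasses import dataclass


@dataclass
class OrsConnection:
    database_type: str  # "Oracle" or "DB2"
    host: str           # IP address or name of the database server
    port: int           # typically 1521 for Oracle, 50000 for DB2
    database: str       # Oracle SID, or DB2 database name

    def jdbc_url(self) -> str:
        if self.database_type == "Oracle":
            return f"jdbc:oracle:thin:@{self.host}:{self.port}:{self.database}"
        if self.database_type == "DB2":
            return f"jdbc:db2://{self.host}:{self.port}/{self.database}"
        raise ValueError(f"unsupported database type: {self.database_type}")
```

The user ID and password are deliberately left out of the sketch; in practice they are stored with the datasource definition rather than embedded in a URL.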

Note: The Databases tool refers to an ORS as a database.


Starting the Databases Tool


To start the Databases tool:
1. In the Hub Console, connect to your Master Database. To learn more, see
“Changing the Target Database” on page 31.
2. Expand the Siperian Configuration workbench and then click Databases.
The Hub Console displays the Databases tool, as shown in the following example
(in which a registered ORS is selected).

[Figure: Databases tool, with callouts for the registered ORSs and the ORS properties]

The Databases tool displays the following areas:

Column Description
Number of databases Number of ORSs currently defined in the Hub Store.
Database List List of registered Siperian Hub ORSs.
Database Properties Database properties for the selected ORS.


Configuring Operational Record Stores


This section describes how to configure an ORS in your Hub Store. If you need
assistance with configuring the ORS, consult with your database administrator. For
more information about Operational Record Stores, see “Databases in the Hub Store”
on page 56 and the Siperian Hub Installation Guide for your platform.

Registering an ORS
Note: Registering an ORS will fail if you try to register an ORS that does not contain
the Siperian Hub repository objects or Siperian Hub procedures.

To register an ORS:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the button.


The Databases tool displays the Register Database dialog box. By default, the
selected database type is Oracle.

4. If you are registering a DB2 database, select DB2 in the Database type drop-down
list.

The Databases tool displays the Register Database dialog box for a DB2 database.

5. Specify the following settings. Note that Oracle and DB2 have slightly different
settings.

Note: The Schema Name and the User Name are both the name of the ORS that
was specified in the script used to create the ORS. If you need this information,
consult your database administrator.

Property                  Description

Identity
Database Display Name     Name for this ORS as it will be displayed in the Hub
                          Console.
Machine Identifier        Prefix given to keys to uniquely identify records from
                          this instance of the Hub Store.

Connection Properties
Database type             One of the following values: Oracle or DB2.
Database hostname         Oracle only. IP address or name (if supported on your
                          network) of the server hosting the Oracle database.
Database server name      DB2 only. IP address or name (if supported on your
                          network) of the database server.
Oracle SID                Oracle only. Oracle System Identifier (SID) that refers
                          to the instance of the Oracle database running on the
                          server.
Database name             DB2 only. Name of the DB2 database.
                          Note: The DB2 database needs to be cataloged via the
                          DB2 client on the application server machine.
Port                      One of the following settings:
                          • Oracle: The TCP port of the Oracle listener running
                            on the Oracle database server. The Oracle
                            installation default is 1521.
                          • DB2: The TCP port on which the database server
                            listens for connections. The DB2 installation
                            default is 50000.
Oracle TNS Name           Oracle only. Name by which the database is known on
                          your network as defined in the application server’s
                          TNSNAMES.ORA file. For example:
                          mydatabase.mycompany.com
                          This value is set when you install Oracle. See your
                          Oracle documentation to learn more about this name.
Schema Name               Name of the ORS.
User Name                 User name for the ORS. By default, this is the user
                          name that was specified in the script used to create
                          the ORS. This user owns all of the ORS database objects
                          in the Hub Store.
                          If a proxy user has been configured for this ORS, then
                          you can specify the proxy user instead. For
                          instructions on running the setup_ors.sql script and
                          defining proxy users, see the Siperian Hub Installation
                          Guide.
Password                  Password associated with the User Name for the ORS.
                          • For Oracle, this password is case-insensitive.
                          • For DB2, this password is case-sensitive.
                          By default, this is the password associated with the
                          user name that was specified in the script used to
                          create the ORS.
                          If a proxy user has been configured for this ORS, then
                          you specify the password for the proxy user instead.
                          For instructions on running the setup_ors.sql script
                          and defining proxy users, see the Siperian Hub
                          Installation Guide.
Create datasource after   Check (select) to create the datasource on the
registration              application server after registration. WebLogic users
                          need to specify the WebLogic username and password.

6. If you want to create the datasource on the application server after registration,
check (select) the Create datasource after registration check box.
Siperian Hub uses the datasources provided by the application server and,
therefore, does not write any data to the ORS at the time of registration.
Note for WebLogic: If you are using WebLogic, a dialog box prompts you for
your username and password. This process writes only to the Master Database.
The ORS and datasource need not be available at registration time.
If you do not check this option, then you will need to manually configure the
datasource, as described in “Configuring Datasources” on page 77.
7. Click OK.
8. Test your database connection settings. To learn more, see “Testing ORS
Connections” on page 71.

Note: If you register an ORS that has been used elsewhere, the ORS already has
Cleanse Match Servers registered, and you do not register any other servers,
then you must re-register one of the Cleanse Match Servers. This updates the
data in c_repos_db_release.
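
The registration settings described above can be pictured as a small data structure. The following Python is only a sketch of which fields apply to which database type. The class name OrsRegistration, its field names, and the validation messages are hypothetical and are not part of any Siperian API; the default ports are the installation defaults quoted in the settings table.

```python
from dataclasses import dataclass
from typing import List, Optional

# Installation default ports quoted in the settings table.
DEFAULT_PORTS = {"Oracle": 1521, "DB2": 50000}


@dataclass
class OrsRegistration:
    """Hypothetical container for the Register Database dialog settings."""
    display_name: str
    machine_identifier: str
    database_type: str                      # "Oracle" or "DB2"
    host: str                               # hostname (Oracle) or server name (DB2)
    schema_name: str
    user_name: str
    password: str
    port: Optional[int] = None              # falls back to the installation default
    oracle_sid: Optional[str] = None        # Oracle only
    oracle_tns_name: Optional[str] = None   # Oracle only
    database_name: Optional[str] = None     # DB2 only
    create_datasource: bool = False

    def __post_init__(self) -> None:
        # Use the installation default port when none is given.
        if self.port is None:
            self.port = DEFAULT_PORTS.get(self.database_type)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means nothing was flagged."""
        problems = []
        if self.database_type not in DEFAULT_PORTS:
            problems.append("Database type must be Oracle or DB2.")
        elif self.database_type == "Oracle" and not self.oracle_sid:
            problems.append("Oracle SID is missing (Oracle only).")
        elif self.database_type == "DB2" and not self.database_name:
            problems.append("Database name is missing (DB2 only).")
        return problems
```

For example, OrsRegistration("Dev ORS", "A", "DB2", "dbhost", "CMX_ORS", "CMX_ORS", "secret").validate() reports the missing DB2 database name, and the port falls back to 50000.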

Editing ORS Registration Properties


Only certain ORS registration properties are editable. For non-editable properties, you
must instead unregister and re-register the ORS with the new properties.

To edit registration settings for an ORS:


1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Select the ORS that you want to configure.
4. Click the button.
The Databases tool displays the Update Database Registration dialog box for the
selected ORS.
[Figures: Update Database Registration dialog box, showing the Oracle settings and the DB2 settings]

5. Edit any of the following settings:


• Database display name
• Password
By default, this is the password associated with the user name that was
specified when the ORS was created. If a proxy user has been configured for
this ORS, then you specify the password for the proxy user instead.
For instructions on running the setup_ors.sql script and defining proxy
users, see the Siperian Hub Installation Guide.
• Update datasource after registration check box
• Oracle TNS name (Oracle only)
6. To update the datasource on the application server with the modified settings,
select (check) the Update datasource after registration check box.
Note: Updating the datasource settings might cause the JDBC connection pool
settings to be reset to the default values. Be sure to check the JDBC connection
pool settings before and after you click OK so that you can reapply any
customizations to the JDBC connection pool settings.
7. Click OK.
The Databases tool saves your changes.

8. Test your updated database connection settings. To learn more, see “Testing ORS
Connections” on page 71.

Editing ORS Properties


To change properties for a registered ORS:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Select the ORS that you want to configure.
The Databases tool displays the database properties for the selected ORS.

The following table describes these properties.

Property                  Description
Database Type             Oracle or DB2
Database ID               Identification for the ORS. This ID is used in SIF
                          requests. The database ID lookup is case-sensitive.
                          The format for the database ID is:
                          jdbc/siperian-hostname-sid-databasename
                          Example:
                          jdbc/siperian-aiz01-aix01-cmx_ors-ds
                          When registering a new ORS, the host, server, and
                          database names are normalized.
                          • Host name is converted to lowercase.
                          • Database name is converted to uppercase (the
                            standard for schemas, tables, etc.).
                          The normalization of each field can be done on a
                          database-specific basis so that it can be changed if
                          needed.
JNDI Datasource Name      Displays the datasource JNDI name for the selected ORS.
                          This is the JNDI name that is configured for this JDBC
                          connection on the application server.
Machine Identifier        Prefix given to keys to uniquely identify records from
                          this instance of the Hub Store.
GETLIST Limit (records)   Limits the number of records returned through SIF
                          search requests, such as searchQuery, searchMatch,
                          getLookupValues, and so on.
Production Mode           Specifies whether this ORS is in production mode.
                          • If not enabled (unchecked, the default), production
                            mode is disabled, allowing authorized users to edit
                            metadata for this ORS in the Hub Console.
                          • If enabled (checked), then production mode is
                            enabled. Users cannot make changes to the metadata
                            for this ORS. If a user attempts to acquire a write
                            lock on an ORS in production mode, the Hub Console
                            will display a message explaining that the lock
                            cannot be obtained.
                          Note: Only Siperian Hub administrator users can change
                          this setting.
                          For more information, see “Changing an ORS to
                          Production Mode” on page 75.

4. To change a property, click the button next to it, and edit the property.
5. Click the Save button to save your changes.
If production mode is enabled for an ORS, then the Databases tool displays a lock
icon next to it in the list.

[Figure: Databases list, with a callout for an ORS that has production mode enabled]

Testing ORS Connections


To test a Hub Store connection to an ORS:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Select the ORS that you want to test.

4. Click the button.

The Test Database command tests for:


• the database connection parameters via the JDBC connection
• the existence of the datasource
• a valid connection via the datasource
• a valid ORS version
Note for WebSphere: If the test connection fails through the Hub Console, verify
that the test connection is successful from the WebSphere Console. The JNDI
name is case-sensitive and should match what is generated in the Hub Console.
5. Click OK.
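
The four checks that the Test Database command performs can be pictured as an ordered sequence. The sketch below only illustrates that ordering. The function run_checks and the callables passed to it are hypothetical, and stopping at the first failed check is an assumption of this sketch rather than documented Hub Console behavior.

```python
from typing import Callable, List, Tuple

# The four checks, in the order the guide lists them.
CHECK_NAMES = [
    "database connection parameters via the JDBC connection",
    "existence of the datasource",
    "valid connection via the datasource",
    "valid ORS version",
]


def run_checks(checks: List[Callable[[], bool]]) -> Tuple[bool, str]:
    """Run each check in order and report the first one that fails.

    Each element of `checks` is a caller-supplied function returning
    True on success. Stopping at the first failure is an assumption.
    """
    for name, check in zip(CHECK_NAMES, checks):
        if not check():
            return False, "Failed: " + name
    return True, "All checks passed"
```

A failure in the second position, for instance, would point at the datasource definition on the application server rather than at the database itself.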

Changing Passwords
To change passwords for the Master Database or an ORS, you need to make changes
first on your database server and possibly on your application server as well.

Changing the Password for the Master Database

To change the Master Database password:


1. On your database server, change the password for the CMX_SYSTEM database.

2. Log into the administration console for your application server and edit the
datasource connection information, specifying the new password for
CMX_SYSTEM, and then save your changes.

Changing the Password for an ORS

To change the password for an ORS, there are two options.

Option One
1. On your database server, change the password for the ORS schema.

2. Start the Hub Console and select Master Database as the target database. To learn
more, see “Changing the Target Database” on page 31.
3. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
4. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
5. Select the ORS that you want to configure.
6. Click the button.
The Databases tool displays the Update Database Registration dialog box for the
selected ORS.
7. Enter the new password in the Password text box.
8. Check (select) the Update datasource after registration check box.
9. Click OK.
10. Test your updated database connection settings. To learn more, see “Testing ORS
Connections” on page 71.

Option Two
1. On your database server, change the password for the ORS schema.
2. Start the Hub Console and select Master Database as the target database. To learn
more, see “Changing the Target Database” on page 31.
3. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
4. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
5. Select the ORS that you want to configure.
6. In the Database Properties panel, make a note of the JNDI Datasource Name for
the selected ORS.

7. Log into the administration console for your application server and edit the
datasource connection information for this ORS, specifying the new password for
the noted JNDI Datasource Name, and then save your changes.

Encrypting Passwords

To change the schema password successfully, you must also change it in the
datasources defined in the application server. This password is not encrypted,
because the application server protects it. In addition to updating the
datasources on the application server, Siperian requires that the password be
encrypted and stored in various tables.

Steps to Encrypt New Passwords

To encrypt the new password, execute the following command from the prompt:
java -classpath siperian-common.jar com.siperian.common.security.Blowfish your_new_password

The results will be echoed to the terminal window:


Plaintext Password: your_new_password
Encrypted Password: encrypted password

For example, if admin is your new password, then the command would be:
java -classpath siperian-common.jar com.siperian.common.security.Blowfish admin
Plaintext Password: admin
Encrypted Password: A75FCFBCB375F229
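
If you need to script this step, the command and its echoed output can be wrapped as follows. This is a minimal sketch, not a Siperian-supplied utility: the function names are hypothetical, and it assumes that java is on the PATH, that siperian-common.jar is in the working directory, and that the output has the two-line form shown above.

```python
import subprocess


def build_encrypt_command(password, jar="siperian-common.jar"):
    """Build the argument list for the encryption command shown above."""
    return ["java", "-classpath", jar,
            "com.siperian.common.security.Blowfish", password]


def parse_encrypt_output(output):
    """Extract the value of the 'Encrypted Password' line from the echoed text."""
    for line in output.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == "Encrypted Password":
            return value.strip()
    raise ValueError("No 'Encrypted Password' line found in output")


def encrypt_password(password, jar="siperian-common.jar"):
    """Run the command and return the encrypted password.

    Assumes java is on the PATH and the JAR is in the working directory.
    """
    result = subprocess.run(build_encrypt_command(password, jar),
                            capture_output=True, text=True, check=True)
    return parse_encrypt_output(result.stdout)
```

Passing the argument list directly to subprocess.run, rather than a shell string, avoids shell quoting problems with special characters in the password.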

Steps to Update Passwords for Your Schema

Execute the following commands to update the passwords for your ORS and Master
Database:

To update your ORS database password:


UPDATE C_REPOS_DB_RELEASE SET DB_PASSWORD = '';
COMMIT;

To update your Master Database password:


UPDATE C_REPOS_DATABASE SET PASSWORD = '' WHERE USER_NAME =

CMX_SYSTEM/ORS User and Passwords

User names and passwords that can be changed when installing or configuring the MRM:
• The CMX_SYSTEM user should not be changed.
• The CMX_SYSTEM password can be changed after the MRM is installed. You
need to change the password for the CMX user in Oracle, and you need to set the
same password in the datasource on the application server.
• The CMX_ORS user and password can be changed when the setup_ors.sql is run.
You need to use the same password when registering the ORS in the Hub Console.

Changing an ORS to Production Mode


The Hub Console allows administrators to lock the design of an ORS by enabling
production mode. Once production mode is enabled, write locks and exclusive locks
are not permitted, and no changes can be made to the schema definition in the ORS.
When a Hub Console user attempts to place a lock on an ORS for which production
mode is enabled, the Hub Console displays a message to the user explaining that the
lock cannot be obtained because the ORS is in production mode. For more
information, see “Acquiring Locks to Change Settings in the Hub Console” on page
28.

To change the production mode flag for an ORS:


1. Log into the Hub Console with administrator-level privileges to the Siperian Hub
implementation.
In order to change this setting, you must have sufficient privileges to run the
Databases tool and be able to obtain a lock on the Master Database.
2. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
3. Clear any exclusive locks on the ORS.
Note: This setting cannot be changed if the ORS is locked exclusively.

4. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
5. Select the ORS that you want to configure.
The Databases tool displays the database properties for the selected ORS.
6. Change the setting of the Production Mode check box, as described in “Editing
ORS Properties” on page 69.
Select (check) the check box to enable production mode, or clear (uncheck) it to
disable it.
7. Click the Save button to save your changes.
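
The locking behavior described in this section can be summarized in a toy model. The Python below is purely illustrative: the Ors class and LockError exception are hypothetical names, and the model ignores users, privileges, and the Master Database lock that the procedure also requires.

```python
class LockError(Exception):
    """Raised when a lock or a settings change is refused."""


class Ors:
    """Toy model of an ORS and its production mode flag."""

    def __init__(self, name, production_mode=False):
        self.name = name
        self.production_mode = production_mode
        self.lock = None  # None, "write", or "exclusive"

    def acquire_lock(self, kind):
        """Write and exclusive locks are refused while production mode is on."""
        if kind not in ("write", "exclusive"):
            raise ValueError("kind must be 'write' or 'exclusive'")
        if self.production_mode:
            raise LockError(
                "Lock cannot be obtained: %s is in production mode" % self.name)
        self.lock = kind

    def release_lock(self):
        self.lock = None

    def set_production_mode(self, enabled):
        """The flag cannot be changed while the ORS is locked exclusively."""
        if self.lock == "exclusive":
            raise LockError("Clear the exclusive lock on %s first" % self.name)
        self.production_mode = enabled
```

In this model, enabling production mode and then calling acquire_lock("write") raises LockError, which mirrors the message that the Hub Console displays.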

Unregistering an ORS
Unregistering an ORS removes the connection information to this ORS from the
Master Database and removes the datasource definition from the application server
environment.

To unregister an ORS:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Select the ORS that you want to unregister.
4. Click the button.
Note: If you are running WebLogic, enter the WebLogic user name and password
when prompted.
The Databases tool prompts you to confirm unregistering the ORS.
5. Click Yes.

Configuring Datasources
This section describes how to configure datasources for an ORS. Every ORS requires a
datasource definition in the application server environment.

About Datasources
In Siperian Hub, a datasource specifies properties for an ORS, such as the location of the
database server, the name of the database, the database user ID and password, and so
on. A Siperian Hub datasource points to a JDBC resource defined in your application
server environment. To learn more about JDBC datasources, see your application
server documentation.

Managing Datasources in WebLogic


For WebLogic application servers, whenever you attempt to add, delete, or update a
datasource, Siperian Hub prompts you to specify the application server administrative
username and password. If you are performing multiple operations in the Databases
tool, this dialog box remembers the last username that was entered, but always requires
you to enter the password.

Creating Datasources
You might need to explicitly create a datasource if, for example, you created an ORS
using a different application server, or if you did not check (select) the Create
datasource after registration check box when registering the ORS.

To create a datasource:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Right-click the ORS in the Databases list, and then choose Create Datasource.
Note: If you are running WebLogic, enter the WebLogic user name and password
when prompted.

The Databases tool creates the datasource and displays a progress message.

4. Click OK.

Removing Datasources
If you have registered an ORS with a configured datasource, you can use the Databases
tool to manually remove its datasource definition from your application server. After
you remove the datasource definition, however, the ORS still appears in the Hub Console.
To completely remove a database from the Hub Console, you need to unregister it (see
“Unregistering an ORS” on page 76).

To remove a datasource:
1. Start the Databases tool. To learn more, see “Starting the Databases Tool” on page
61.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Right-click an ORS in the Databases list, and then choose Remove Datasource.
Note: If you are running WebLogic, enter the WebLogic user name and password
when prompted.

The Databases tool removes the datasource and displays a progress message.

4. Click OK.

Chapter 5: Building the Schema

This chapter explains how to design and build your schema in Siperian Hub.

Chapter Contents
• Before You Begin
• About the Schema
• Starting the Schema Manager
• Configuring Base Objects
• Configuring Dependent Objects
• Configuring Columns in Tables
• Configuring Foreign-Key Relationships Between Base Objects
• Viewing Your Schema

Before You Begin


Before you begin, you must have installed Siperian Hub and created the Hub Store
(including an Operational Record Store) according to the instructions in the Siperian
Hub Installation Guide.

About the Schema


The schema is the data model that is used in your Siperian Hub implementation. Siperian
Hub does not impose or require any particular schema. The schema exists inside
Siperian Hub and is independent of the source systems providing data to Siperian Hub.

Note: The process of designing the schema for your Siperian Hub implementation is
outside the scope of this document. It is assumed that you have developed a data
model—using industry-standard data modeling methodologies—that is based on a
thorough understanding of your organization’s requirements and in-depth knowledge
of the data you are working with.

The Siperian schema is a flexible, repository-driven model that supports the data
structure of any vertical business sector. The Hub Store is the database that underpins
Siperian Hub and provides the foundation of Siperian Hub’s functionality. Every
Siperian Hub installation has a Hub Store, which includes one Master Database and
one or more Operational Record Store (ORS) databases. Depending on the
configuration of your system, you can have multiple ORS databases in an installation.
For example, you could have a development ORS, a testing ORS, and a production
ORS. For more information, see Chapter 3, “About the Hub Store,” and Chapter 4,
“Configuring Operational Record Stores and Datasources.”

Before you begin to implement the schema, you must understand the basic structure of
the underlying Siperian Hub schema and its components. This section introduces the
most important tables in an ORS and how they work together.

Note: You must use tools in the Hub Console to define and manage the consolidated
schema; you cannot make changes directly to the database. For example, you must use
the Schema Manager to define tables and columns. For details, see “Requirements for
Defining Schema Objects” on page 87.

Types of Tables in an Operational Record Store


An ORS contains both tables that you configure and system support tables.

Configurable Tables

The following types of Siperian Hub tables are used to model business reference data.
You must explicitly create and configure these tables.
Types of Configurable Tables in an ORS

Type of Table       Description
base object         Used to store data for a central business entity (such as
                    customer, product, or employee) or a lookup table (such as
                    country or state). In a base object table (or simply a base
                    object), you can consolidate data from multiple source
                    systems and use trust settings to determine the most
                    reliable value of each base object cell. You can define
                    one-to-many relationships between base objects. Base
                    objects must be explicitly created and configured according
                    to the instructions in “Process Overview for Defining Base
                    Objects” on page 94.
dependent object    Used to store detailed information about the records in a
                    base object (for example, supplemental notes). One record
                    in a base object can map to multiple records in a dependent
                    object table (or simply a dependent object). Dependent
                    objects must be explicitly created and configured according
                    to the instructions in “Process Overview for Defining
                    Dependent Objects” on page 119.
landing table       Used to receive batch loads from a source system. Landing
                    tables must be explicitly created and configured according
                    to the instructions in “Configuring Landing Tables” on
                    page 355.
staging table       Used to load data into base objects and dependent objects.
                    Mappings are defined between landing tables and staging
                    tables to specify whether and how data is cleansed and
                    standardized when it is moved from a landing table to a
                    staging table. Staging tables must be explicitly created
                    and configured according to the instructions in
                    “Configuring Staging Tables” on page 364.

Infrastructure Tables

The following types of Siperian Hub infrastructure tables are used to manage and
support the flow of data in the Hub Store. Siperian Hub automatically creates,
configures, and maintains these tables whenever you configure base objects and
dependent objects.
Types of Infrastructure Tables in an ORS

Type of Table           Description
cross-reference table   Used for tracking the origin of each record in the base
                        object. Named according to the following pattern:
                        C_baseObjectName_XREF
                        where baseObjectName is the root name of the base object
                        (for example, C_PARTY_XREF). For this reason, this table
                        is sometimes referred to as the XREF table. When you
                        create a base object, Siperian Hub automatically creates
                        a cross-reference table to store information about data
                        coming from source systems. For more information, see
                        “Cross-Reference Tables” on page 97.
history table           Used if history is enabled for a base object (see
                        “Enable History” on page 102). Named according to the
                        following patterns:
                        C_baseObjectName_HIST: base object history table, as
                        described in “Base Object History Tables” on page 101.
                        C_baseObjectName_HXRF: cross-reference history table, as
                        described in “Cross-Reference History Tables” on
                        page 101.
                        where baseObjectName is the root name of the base object
                        (for example, C_PARTY_HIST and C_PARTY_HXRF).
                        Siperian Hub creates and maintains several different
                        history tables to provide detailed change-tracking
                        options, including merge and unmerge history, history of
                        the pre-cleansed data, history of the base object, and
                        the cross-reference history.
match key table         Contains the match keys that were generated for all base
                        object records. Named according to the following pattern:
                        C_baseObjectName_STRP
                        where baseObjectName is the root name of the base object
                        (for example, C_PARTY_STRP). For more information, see
                        “Columns in Match Key Tables” on page 325.
match table             Contains the pairs of matched records in the base object
                        resulting from the execution of the match process on
                        this base object. Named according to the following
                        pattern:
                        C_baseObjectName_MTCH
                        where baseObjectName is the root name of the base object
                        (for example, C_PARTY_MTCH). For more information, see
                        “Populating the Match Table with Match Pairs” on
                        page 330.
external match table    Uses input (C_baseObjectName_EMI) and output
                        (C_baseObjectName_EMO) tables.
                        • The EMI table contains records to match against the
                          records in the base object.
                        • The EMO table contains the output data for External
                          Match jobs. Each row in the EMO table represents a
                          pair of matched records: one from the EMI table and
                          one from the base object.
                        For more information, see “External Match Jobs” on
                        page 719 and “External Match Jobs” on page 766.
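
The naming patterns in the table above can be expressed as a small lookup. The following Python is a sketch for illustration only. The function name infrastructure_tables is hypothetical, converting the root name to uppercase is an assumption, and Siperian Hub creates these tables itself, so code like this would only be useful for locating them.

```python
# Suffixes taken from the naming patterns in the table above.
INFRASTRUCTURE_SUFFIXES = {
    "cross-reference": "_XREF",
    "history": "_HIST",
    "cross-reference history": "_HXRF",
    "match key": "_STRP",
    "match": "_MTCH",
    "external match input": "_EMI",
    "external match output": "_EMO",
}

# History tables are used only if history is enabled for the base object.
HISTORY_KINDS = {"history", "cross-reference history"}


def infrastructure_tables(root_name, history_enabled=False):
    """Map each infrastructure table type to its table name.

    root_name is the root name of the base object, for example "PARTY".
    """
    tables = {}
    for kind, suffix in INFRASTRUCTURE_SUFFIXES.items():
        if kind in HISTORY_KINDS and not history_enabled:
            continue
        tables[kind] = "C_" + root_name.upper() + suffix
    return tables
```

For the PARTY base object used in the examples above, infrastructure_tables("PARTY")["cross-reference"] evaluates to C_PARTY_XREF.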

Supported Relationships Among Data

Siperian Hub supports one-to-many and many-to-many relationships among tables, as well
as hierarchical relationships between records in the same base object. In Siperian Hub,
relationships between records can be defined in various ways.

The following table describes these types of relationships.

Type of Relationship     Description
foreign key              One base object (the child) contains a foreign key
relationship between     column, which contains values that match values in the
base objects             primary key column of another base object (the parent).
                         For more information, see “Process Overview for
                         Defining Foreign-Key Relationships” on page 143 and
                         “Configuring Foreign-Key Relationships Between Base
                         Objects” on page 140.
base object and          A base object (the parent) has a dependent object (the
dependent objects        child). The foreign-key relationship is implicit between
                         the dependent object and its parent base object. For
                         example, a Customer base object could have an associated
                         Notes dependent object to store free-form notes about a
                         customer. For more information, see “Process Overview
                         for Defining Dependent Objects” on page 119 and
                         “Configuring Dependent Objects” on page 117.
records within the       Within a base object, records are related to each other
same base object         hierarchically. Allows you to define many-to-many
                         relationships within the base object. For more
                         information, see “Intra-Table Paths” on page 502.

Once these relationships are configured in the Hub Console, you can use these
relationships to configure match column rules by defining match paths between
records. For more information, see “Configuring Match Paths for Related Records” on
page 497.

Requirements for Defining Schema Objects


This section describes requirements for configuring schema objects.

Make Schema Changes Only in the Hub Console

Siperian Hub maintains schema consistency, provided that all model changes are
done using the Hub Console tools, and that no changes are made directly to the
database. Siperian Hub provides all the tools necessary for maintaining the schema.

Think Before You Change the Schema

Important: Schema changes can involve risk to data and should be approached in a
managed and controlled manner. You should plan the changes to be made and analyze
the impact of the changes before making them. You should also back up the database
before making any changes.

You Must Have a Write Lock to Change the Schema

In order to make any changes to the schema, you must have a write lock. For more
information, see “Acquiring a Write Lock” on page 30.

Rules for Database Object Names

Database object names cannot be longer than 22 characters.

Reserved Suffixes

Note: To understand which Hub processes create which tables and how to best
manage these tables, please refer to the “Transient Tables” technical note found on the
SHARE portal.

Siperian Hub creates metadata objects that use suffixes appended to the names you use
for base objects. In order to avoid confusion and possible data loss, database object
names must not use the following strings as either names or suffixes:
_T _L _D _C _CL _TML0
DLT _MTCH _TUPD _TGVI _TGR _TGVO
OPL _STRP _TSI _TMGA _TGV TBVB_
_TGVI _TMGA _TSU1 _TMGB _TGV1 TBVC_
_TMST _TMG1 _TMG2 _TMG3 _TRLG _TRLT
_TMP0 _EMI _TSU2 _TMG0 _TGC TBVV_
_XREF _EMO _TC0 _TMG1 _TGC1 TBVT_
_VCT _TXCU _TC1 _TMG2 _TGT _BVTB
_TGRP _TVXR TBVT_ _TMG3 _TGN _BVTC
_HIST _TROU TBVN_ _HUID _TGZ _BVTV
_HXRF _TCRV _TBVB _TNKY _TGA TUTR_
_TLA _TSRV _TBVC _TUID _TGA1 TUHM_
_TOU0 _TIND _TBVV _TGF _TGM TUGR_
_TGB1 _TLU _TIRD _TGB _TGD TVXRD_

TGCO TFK_ BVTXV_ BVTX_ _RAW REJ


BV0_ BV1_ BV2_ BV3_ BV5_ BV0_
CLC_ TFX_ TBXR_ TRBX_ TBOX_ TGMD_
CSC_ BVLNK_ TCCO_ TCHO_ TCVO_ TCMO_
TCSO_ TCGO_ TCRO_ TCXO_ TCBO_ TCCN_
TCHN_ TCVN_ TCMN_ TCSN_ TCGN_ TCRN_
TCXN_ TCBN_ TFK_ TFX_ TUK_ EXP_
T_verify_ PRL GG CTL TFG_ TGM_
TGA_ TGD_ HMRG TUID_ TGB1_ TGB_
TGC1_ TGC_ TGV1_ TGV_ TMR_ TMMA_
TUHM_ TPBR_ TUGR_ TUTR_ TUDL_ TUCA_
TUCF_ TUCX_ TUCC_ TUCT_ TUCR_ TUPT_
TXDL_ TBDL_ TOBDL_ LNK TDCC_ BVTXC_
TLL TDEL_ TXPR_ TDUMP_
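
The length rule and the reserved-string rule can be checked before a name is entered in the Hub Console. The following Python is a minimal sketch: check_object_name is a hypothetical helper, the reserved list here is only a small subset of the strings listed above, and treating the comparison as case-insensitive is an assumption.

```python
MAX_NAME_LENGTH = 22

# A small subset of the reserved strings listed above, for illustration.
# A real check would need the complete list.
RESERVED_STRINGS = ("_XREF", "_HIST", "_HXRF", "_STRP", "_MTCH", "_EMI", "_EMO")


def check_object_name(name):
    """Return a list of rule violations for a proposed database object name."""
    problems = []
    upper = name.upper()
    if len(name) > MAX_NAME_LENGTH:
        problems.append("longer than %d characters" % MAX_NAME_LENGTH)
    for reserved in RESERVED_STRINGS:
        # endswith also covers a name that equals the reserved string.
        if upper.endswith(reserved):
            problems.append("uses reserved string %s" % reserved)
    return problems
```

An empty list means that neither rule was violated for the subset checked here.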

Reserved Column Names

The following column names are reserved and cannot be used for user-defined
columns.
ROWID_OBJECT CONSOLIDATION_IND
PKEY_SRC_OBJECT DELETED_IND
CREATE_DATE DELETED_BY
LAST_UPDATE_DATE DELETED_DATE
CREATOR LAST_ROWID_SYSTEM
UPDATED_BY DIRTY_IND
HIST_CREATE_DATE INTERACTION_ID
HIST_UPDATE_DATE HUB_STATE_IND
SRC_ROWID ROWID_SYSTEM
ROWID_XREF SRC_LUD

PROMOTE_IND PUT_UPDATE_MERGE_IND
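
A proposed column name can be screened against this list before you define the column. The sketch below uses the reserved names exactly as listed above; the helper name is_reserved_column is hypothetical, and the case-insensitive comparison is an assumption.

```python
# Reserved column names, as listed above.
RESERVED_COLUMN_NAMES = frozenset({
    "ROWID_OBJECT", "CONSOLIDATION_IND",
    "PKEY_SRC_OBJECT", "DELETED_IND",
    "CREATE_DATE", "DELETED_BY",
    "LAST_UPDATE_DATE", "DELETED_DATE",
    "CREATOR", "LAST_ROWID_SYSTEM",
    "UPDATED_BY", "DIRTY_IND",
    "HIST_CREATE_DATE", "INTERACTION_ID",
    "HIST_UPDATE_DATE", "HUB_STATE_IND",
    "SRC_ROWID", "ROWID_SYSTEM",
    "ROWID_XREF", "SRC_LUD",
    "PROMOTE_IND", "PUT_UPDATE_MERGE_IND",
})


def is_reserved_column(name):
    """Return True if the name matches a reserved column name."""
    return name.strip().upper() in RESERVED_COLUMN_NAMES
```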

Adding Columns for Technical Reasons

For purely technical reasons, you might want to add columns to a base object.
For example, for a segment match, you must add a segment column. For more
information on adding columns for segment matches, see “Segment Matching” on
page 562.

We recommend that you distinguish columns added to base objects for purely technical
reasons from those added for other business reasons, because you generally do not
want to include these columns in most views used by data stewards. Prefixing these
column names with a specific identifier, such as CSTM_, is one way to easily filter them
out.
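
The filtering that such a prefix makes possible can be sketched in a few lines of Python. The helper name business_columns and the sample column names are hypothetical; CSTM_ is the example prefix suggested above.

```python
TECHNICAL_PREFIX = "CSTM_"


def business_columns(columns, prefix=TECHNICAL_PREFIX):
    """Return the columns that do not carry the technical prefix, in order."""
    return [c for c in columns if not c.upper().startswith(prefix)]
```

For example, business_columns(["FIRST_NAME", "CSTM_SEGMENT", "LAST_NAME"]) keeps FIRST_NAME and LAST_NAME and drops the technical column.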

Starting the Schema Manager


You use the Schema Manager in the Hub Console to define the schema, staging tables,
and landing tables. The Schema Manager is also used to define rules for match and
merge, validation, and message queues.

To start the Schema Manager:


• In the Hub Console, expand the Model workbench, and then click Schema.


The Hub Console displays the Schema Manager.


The Schema Manager is divided into two panes.

Pane             Description
Navigation pane  Shows (in a tree view) the core schema objects: base objects and landing
                 tables. Expanding an object in the tree shows you the property groups
                 available for that object.
Properties pane  Shows the properties for the selected object in the left-hand pane.
                 Clicking any node in the schema tree displays the corresponding
                 properties page (that you can view and edit) in the right-hand pane.

For general instructions about using the Schema Manager, see “Navigating the Hub
Console” on page 24. You must use the Schema Manager when defining tables in an
ORS, as described in “Requirements for Defining Schema Objects” on page 87.


Configuring Base Objects


This section describes how to configure base objects for your Siperian Hub
implementation.

About Base Objects


In Siperian Hub, central business entities—such as customers, accounts, products, or
employees—are represented in tables called base objects. A base object is a table in the
Hub Store that contains collections of data about individual entities—such as customer
A, customer B, customer C, and so on.

Each individual entity has a single master record—the best version of the truth—for that
entity. An individual entity might have additional records in the base object (contributing
records) that contain the “multiple versions of the truth” that need to be consolidated
into the master record. Consolidation is the process of merging duplicate records into a
single consolidated record that contains the most reliable cell values from all of the source
records.

[Figure labels: Most Reliable Cell Value, Master Record, Contributing Records]

Important: You must use the Schema Manager to define base objects—you cannot
configure them directly in the database. For more information, see “Requirements for
Defining Schema Objects” on page 87.


Relationships Between Base Objects and Other Tables in the Hub Store
The following figure shows base objects in relation to other tables in the Hub Store.


Process Overview for Defining Base Objects


To define a base object:
1. Using the Schema Manager, create a base object table according to the instructions
in “Creating Base Objects” on page 107.
The Schema Manager automatically adds system columns, as described in “Base
Object Columns” on page 95.
2. Add the user-defined columns that will contain business data according to the
instructions in “Configuring Columns in Tables” on page 125.
Note: Column names cannot be longer than 26 characters.
3. While configuring column properties, specify which column(s) will use trust to
determine the most reliable value when different source systems provide different
values for the same cell. For more information, see “Configuring Trust for Source
Systems” on page 455.
4. For this base object, create one staging table per source system according to the
instructions in “Configuring Staging Tables” on page 364. For each staging table,
select the base object columns that you want to include.
5. Create any landing tables that you need to store data from source systems.
For more information, see “Configuring Landing Tables” on page 355.
6. Map the landing tables to the staging tables according to the instructions in
“Mapping Columns Between Landing and Staging Tables” on page 380.
If any columns need data cleansing, specify the cleanse function in the mapping
according to the instructions in Chapter 12, “Configuring Data Cleansing.”
Each staging table must get its data from one landing table (with any intervening
cleanse functions), but the same landing table can provide data to more than one
staging table. Map the primary key column of the landing table to the
PKEY_SRC_OBJECT column in the staging table.
7. Populate each landing table with data using an ETL tool or some other process, as
described in “Land Process” on page 292.
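
The mapping rule in step 6 (each staging table is fed by exactly one landing table, while one landing table may feed several staging tables) can be expressed as a simple check. This is a minimal sketch under stated assumptions: the function and the table names in the usage note are hypothetical and do not correspond to any Siperian Hub API.

```python
# Hypothetical model of landing-to-staging mappings; not a Siperian Hub API.
def staging_tables_with_multiple_sources(mappings):
    """mappings: iterable of (landing_table, staging_table) pairs.

    Returns the staging tables that are fed by more than one landing table,
    which would violate the rule described in step 6.
    """
    sources = {}
    for landing, staging in mappings:
        sources.setdefault(staging, set()).add(landing)
    return sorted(s for s, landings in sources.items() if len(landings) > 1)
```

A landing table such as L_CRM feeding both S_CRM_CUSTOMER and S_CRM_ADDRESS passes the check, whereas a staging table mapped from both L_CRM and L_ERP is reported.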


Base Object Columns


Base objects have two types of columns:

Column Type Description


system columns Columns that are automatically created and maintained by the
Schema Manager.
user-defined columns Columns that have been added by users according to the instructions
in “Configuring Columns in Tables” on page 125.

Base objects have the following system columns.

Physical Name Data Type (Size) Description


ROWID_OBJECT CHAR (14) Primary key. Unique value assigned by
Siperian Hub whenever a new record is
inserted into the base object.
CREATOR VARCHAR (50) User or process responsible for creating
the record.
CREATE_DATE DATE Date on which the record was created.
UPDATED_BY VARCHAR (50) User or process responsible for the most
recent update on the record.
LAST_UPDATE_DATE DATE Date of the most recent update to any cell
on the record.
CONSOLIDATION_IND INT Integer value indicating the consolidation
state of this record. Valid values are:
• 1=Consolidated
• 2=Ready for merge
• 3=Undergoing the match process
• 4=Ready for match
• 9=On Hold
For more information, see “Consolidation
Status for Base Object Records” on page
289.
DELETED_IND INT Reserved for future use.
DELETED_BY VARCHAR (50) Reserved for future use.

DELETED_DATE DATE Reserved for future use.
LAST_ROWID_SYSTEM CHAR (14) The identifier of the system responsible for
the most recent update to any cell in the
base object record.
Foreign key referencing the ROWID_SYSTEM
column on the C_REPOS_SYSTEM table.
DIRTY_IND INT Used to determine whether the tokenize
process generates match keys for this
record. Valid values are:
• 0 = record is up to date
• 1 = record is new or has been updated
and needs to be tokenized
After the record has been tokenized, this
flag is reset to zero (0). For more
information, see “Base Object Records
Flagged for Tokenization” on page 323.
INTERACTION_ID INT For state-enabled base objects only.
Interaction identifier that is used to protect
a pending cross-reference record from
updates that are not part of the same
process as the original cross-reference
record. For details, see “Protecting Pending
Records Using the Interaction ID” on page
208.
HUB_STATE_IND INT For state-enabled base objects only. Integer
value indicating the state of this record.
Valid values are:
• 0=Pending
• 1=Active (Default)
• -1=Deleted
For details, see “About the Hub State
Indicator” on page 207.
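
When you read these indicator columns in reports or scripts, it can help to give the integer codes symbolic names. The following sketch simply restates the values from the table above as Python enumerations. The class and member names are hypothetical, and only the numeric values come from this guide.

```python
from enum import IntEnum


class ConsolidationInd(IntEnum):
    """Values of CONSOLIDATION_IND as listed in the table above."""
    CONSOLIDATED = 1
    READY_FOR_MERGE = 2
    UNDERGOING_MATCH = 3
    READY_FOR_MATCH = 4
    ON_HOLD = 9


class HubStateInd(IntEnum):
    """Values of HUB_STATE_IND (state-enabled base objects only)."""
    DELETED = -1
    PENDING = 0
    ACTIVE = 1   # default


def needs_tokenization(dirty_ind):
    """DIRTY_IND is 1 when the record is new or updated and must be tokenized."""
    return dirty_ind == 1
```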


Cross-Reference Tables
This section describes cross-reference tables in the Hub Store.

About Cross-Reference Tables

Each base object has one associated cross-reference table (or XREF table), which is used
for tracking the lineage (origin) of records in the base object. Siperian Hub
automatically creates a cross-reference table when you create a base object. Siperian
Hub uses cross-reference tables to translate all source system identifiers into the
appropriate ROWID_OBJECT values.

Note: Cross-reference tables are not created or needed for dependent objects, as
dependent objects are not matched and consolidated.

Records in Cross-Reference Tables

Each row in the cross-reference table represents a separate record from a source
system. If multiple sources provide data for a single column (for example, the phone
number comes from both the CRM and ERP systems), then the cross-reference table
contains separate records from each source system. Each base object record will have
one or more associated cross-reference records.

The cross-reference record contains:


• an identifier for the source system that provided the record
• the primary key value of that record in the source system
• the most recent cell value(s) provided by that system

Load Process and Cross-Reference Tables

The load process populates cross-reference tables. During load inserts, new records are
added to the cross-reference table. During load updates, changes are written to the
affected cross-reference record(s).
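
The behavior described above can be pictured with a small in-memory model. This is a conceptual sketch only: the class and method names are hypothetical, real cross-reference tables live in the ORS and are maintained by Siperian Hub, and the key used here is the PKEY_SRC_OBJECT and ROWID_SYSTEM combination that this guide identifies as unique.

```python
# Illustrative in-memory model; not how Siperian Hub is implemented.
class XrefTable:
    def __init__(self):
        # Unique key: (PKEY_SRC_OBJECT, ROWID_SYSTEM) -> cross-reference record
        self.records = {}

    def load(self, rowid_system, pkey_src_object, rowid_object, cells):
        """Insert a new cross-reference record or update the existing one."""
        key = (pkey_src_object, rowid_system)
        if key in self.records:
            # Load update: keep the most recent cell values from this source.
            self.records[key]["cells"].update(cells)
            return "update"
        # Load insert: first record from this source for this source key.
        self.records[key] = {"rowid_object": rowid_object, "cells": dict(cells)}
        return "insert"

    def rowid_object_for(self, rowid_system, pkey_src_object):
        """Translate a source system identifier into a ROWID_OBJECT value."""
        return self.records[(pkey_src_object, rowid_system)]["rowid_object"]
```

In this model, a phone number supplied by both a CRM and an ERP system produces two cross-reference records that point to the same ROWID_OBJECT.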


Data Steward Tools and Cross-Reference Tables

Cross-reference records are visible in the Merge Manager and can be modified using
the Data Manager. For more information, see the Siperian Hub Data Steward Guide.

Relationships Between Base Objects and Cross-Reference Tables

The following figure shows an example of the relationships between base objects,
cross-reference tables, and C_REPOS_SYSTEM.


Columns in Cross-Reference Tables

Cross-reference tables have the following system columns. Note that cross-reference
tables have a unique key representing the combination of the PKEY_SRC_OBJECT
and ROWID_SYSTEM columns.

Physical Name Data Type (Size) Description


ROWID_XREF NUMBER (38) Primary key that uniquely identifies this record in the
cross-reference table.
PKEY_SRC_OBJECT VARCHAR2 (255) Primary key value from the source system.
Multi-field/multi-column keys from source systems must
be concatenated into a single key value using the Siperian
Hub internal cleanse process (see “About Data Cleansing
in Siperian Hub” on page 406) or external cleanse process
(an ETL tool or some other data loading utility).
ROWID_SYSTEM CHAR (14) Foreign key to C_REPOS_SYSTEM, which is the
Siperian Hub repository table that stores a Siperian Hub
identifier and description of each source system that can
populate the ORS. For more information, see
“Configuring Source Systems” on page 348.
ROWID_OBJECT CHAR (14) Foreign key to the base object. Unique value assigned by
Siperian to the associated record in the base object.
SRC_LUD DATE Last source update date. Updated only when an update is
received from the source system.
CREATOR VARCHAR2 (50) User or process responsible for creating the
cross-reference record.
CREATE_DATE DATE Date on which the cross-reference record was created.
UPDATED_BY VARCHAR2 (50) User or process responsible for the most recent update to
the cross-reference record.
LAST_UPDATE_DATE DATE Date of the most recent update to any cell in the
cross-reference record. Can be updated as applicable
during the load and consolidation processes.
DELETED_IND NUMBER (38) Reserved for future use.
DELETED_BY VARCHAR2 (50) Reserved for future use.
DELETED_DATE DATE Reserved for future use.
PUT_UPDATE_MERGE_IND NUMBER (38) Indicates whether a record has been edited using the Data
Manager.

INTERACTION_ID NUMBER (38) For state-enabled base objects only. Interaction identifier
that is used to protect a pending cross-reference record
from updates that are not part of the same process as the
original cross-reference record. For more information, see
“Protecting Pending Records Using the Interaction ID”
on page 208.
HUB_STATE_IND NUMBER (38) For state-enabled base objects only. Integer value
indicating the state of this record. Valid values are:
• 0=Pending
• 1=Active (Default)
• -1=Deleted
For more information, see “About the Hub State
Indicator” on page 207.
PROMOTE_IND NUMBER (38) For state-enabled base objects only. Integer value
indicating the promotion status. Used by the Promote job
to determine whether to promote the record to an
ACTIVE state. Valid values are:
• 0=Do not promote this record
• 1=Promote this record to ACTIVE
This value is not changed to 0 during the Promote job if
the record is not promoted.
For more information, see “Promoting Records Using the
Promote Batch Job” on page 218.
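
The PKEY_SRC_OBJECT description above requires multi-column source keys to be concatenated into a single key value before they reach the cross-reference table. The following sketch shows one way an external process might do this. The function name and the "|" separator are assumptions chosen for illustration, and the 255-character limit is the column size listed in the table.

```python
MAX_PKEY_LENGTH = 255  # size of PKEY_SRC_OBJECT listed in the table above


def concatenate_key(parts, separator="|"):
    """Build one key value from the parts of a multi-column source key.

    The separator is an assumption; pick one that cannot occur in the data
    so that different part combinations never yield the same key.
    """
    key = separator.join(str(p).strip() for p in parts)
    if len(key) > MAX_PKEY_LENGTH:
        raise ValueError("concatenated key exceeds 255 characters")
    return key
```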

History Tables
This section describes history tables in the Hub Store. If history is enabled for a base
object (see “Enable History” on page 102), then Siperian Hub maintains history tables
for base objects and cross-reference tables. History tables are used by Siperian Hub to
provide detailed change-tracking options, including merge and unmerge history, history
of the pre-cleansed data, history of the base object, the cross-reference history, and so
on.


Base Object History Tables

A history-enabled base object has a single history table (named
C_baseObjectName_HIST) that contains historical information about data changes in the
base object. Whenever a record is added or updated in the base object, a new record is
inserted into the base object history table to capture the event.

Cross-Reference History Tables

A history-enabled base object has a single cross-reference history table (named
C_baseObjectName_HXRF) that contains historical information about data changes in the
cross-reference table. Whenever a record changes in the cross-reference table, a new
record is inserted into the cross-reference history table to capture the event.
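
Conceptually, a history table is an append-only log: the current table holds the latest version of each record, and every change adds one more row to the history. The sketch below models that idea with plain Python structures. It is an illustration under simplifying assumptions, the function name is hypothetical, and it does not reflect the actual columns of the HIST or HXRF tables.

```python
def apply_change(table, history, rowid, new_values):
    """Add or update a record and append the resulting version to history.

    table:   dict mapping rowid -> current record (a dict of cell values)
    history: list that receives one entry per change event
    """
    record = dict(table.get(rowid, {}))
    record.update(new_values)
    table[rowid] = record
    # One history entry per insert or update; earlier entries are never changed.
    history.append({"rowid": rowid, **record})
```

After two changes to the same record, the table holds one current row while the history holds two, which is what makes the history usable for audit purposes.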

Base Object Properties


This section describes the basic and advanced properties for base objects.

Basic Base Object Properties

This section describes the basic base object properties.

Item Type

The type of table that you are adding. Select Base Object.

Display Name

The name of this base object as it will be displayed in the Hub Console. Enter a
descriptive name.

Physical Name

The actual name of the table in the database. Siperian Hub will suggest a physical name
for the table based on the display name that you enter. Make sure that you do not use
any reserved name suffixes, as described in “Rules for Database Object Names” on
page 88.

Data Tablespace

The name of the data tablespace. Read-only. For more information, see the Siperian
Hub Installation Guide for your platform.

Index Tablespace

The name of the index tablespace. Read-only. For more information, see the Siperian
Hub Installation Guide for your platform.

Description

A brief description of this base object.

Enable History

Specifies whether history is enabled for this base object. If enabled, Siperian Hub keeps
a log of records that are inserted, updated, or deleted for this base object. You can use
the information in history tables for audit purposes. For more information, see
“History Tables” on page 100.

Advanced Base Object Properties

This section describes the advanced base object properties.

Complete Tokenize Ratio

When the percentage of the records that have changed is higher than this value, a
complete re-tokenization is performed. If the number of records to be tokenized does
not exceed this threshold, then Siperian Hub deletes the records requiring
re-tokenization from the match key table, calculates the tokens for those records, and
then reinserts them into the match key table. The default value is 60. For more
information, see “Match Keys and the Tokenization Process” on page 322.


Note: Deleting can be a slow process. However, if your Cleanse Match Server is fast
and the network connection between Cleanse Match Server and the database server is
also fast, then you may test with a much lower tokenization threshold (such as 10%).
This will enable you to determine whether there are any gains in performance.
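
The threshold decision described for Complete Tokenize Ratio can be sketched as follows. The function name and return values are hypothetical. Only the comparison (the percentage of changed records being higher than the property value) and the default of 60 come from this guide.

```python
def tokenization_strategy(changed_records, total_records, complete_tokenize_ratio=60):
    """Choose between complete and incremental re-tokenization.

    Returns "complete" when the percentage of changed records is higher than
    the ratio, otherwise "incremental" (delete and reinsert only the changed
    records in the match key table).
    """
    if total_records == 0:
        return "none"
    percent_changed = 100.0 * changed_records / total_records
    if percent_changed > complete_tokenize_ratio:
        return "complete"
    return "incremental"
```

With the default ratio, 700 changed records out of 1,000 trigger a complete re-tokenization, while exactly 600 do not, because 60 percent is not higher than the threshold.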

Allow constraints to be disabled

During the initial load/updates, or if there is no real-time, concurrent access, you
can disable the referential integrity constraints on the base object to improve
performance. The default value is 1, signifying that constraints are disabled. For more
information, see “Load Process” on page 299 and Chapter 13, “Configuring the Load
Process.”

Duplicate Match Threshold

This parameter is used only with the Match for Duplicate Data job for initial data
loads. The default value is 0. To enable this functionality, this value must be set to 2 or
above. For more information, see “Match for Duplicate Data Jobs” on page 740 and
the Siperian Hub Data Steward Guide.

Load Batch Size

The load process inserts and updates batches of records in the base object. The load
batch size specifies the number of records to load per batch cycle (default is 1000000).
For more information, see “Loading Records by Batch” on page 305, and Chapter 13,
“Configuring the Load Process.”
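
As a rough planning aid, you can estimate how many batch cycles a load needs for a given Load Batch Size. This sketch assumes that records are processed in full batches followed by one partial batch. The function name is hypothetical, and the default matches the value stated above.

```python
import math


def batch_cycles(record_count, load_batch_size=1000000):
    """Estimate the number of batch cycles needed to load record_count records.

    Assumption: every cycle loads up to load_batch_size records, so the last
    cycle may be only partially filled.
    """
    return math.ceil(record_count / load_batch_size)
```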

Max Elapsed Match Minutes

This specifies the execution timeout (in minutes) for a match rule. If this time limit is
reached, the match process exits, whether the match rule was executed manually or by a
batch job. If the match process was executed as part of a batch job, the system moves on
to the next match. If it was a single match process, it stops. The default value is 20.
Increase this value only if the match rule and data are very complex. Generally, rules
are able to complete within 20 minutes (the default).
For more information, see “Match Process” on page 317 and Chapter 14, “Configuring
the Match Process.”


Parallel Degree

Oracle only. This specifies the degree of parallelism set on the base object table and its
related tables. It does not take effect for all batch processes, but can have a beneficial
effect on performance when it is used. However, its use is constrained by the number
of CPUs on the database server machine, as well as the amount of memory available.
The default value is 1.

Requeue On Parent Merge

If this value is greater than zero, then when parents are merged, the related child
records are set as unconsolidated: they are flagged as New again (consolidation
indicator is 4, see “Consolidation Status for Base Object Records” on page 289) so that
they can be matched. The default value is 0.
For more information, see “Consolidation Indicator” on page 289 and “Immutable
Rowid Object” on page 594.
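
The effect of Requeue On Parent Merge can be pictured as follows. This is a conceptual sketch, not the actual merge logic: the function, the dictionary keys, and the record layout are hypothetical, and only the rule (a value greater than zero resets related child records to consolidation indicator 4) comes from this guide.

```python
READY_FOR_MATCH = 4  # consolidation indicator for records flagged as New


def requeue_children(children, merged_parent_ids, requeue_on_parent_merge=0):
    """Flag children of merged parents as New when the property is greater than zero.

    children: list of dicts with "parent_rowid" and "consolidation_ind" keys
    Returns the number of child records that were requeued.
    """
    if requeue_on_parent_merge <= 0:
        return 0  # default: child records are left untouched
    requeued = 0
    for child in children:
        if child["parent_rowid"] in merged_parent_ids:
            child["consolidation_ind"] = READY_FOR_MATCH
            requeued += 1
    return requeued
```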

Generate Match Tokens on Load

If selected (checked), then the tokenization process executes after the completion of
the load process. This is useful for intertable match scenarios in which the parent must
be loaded first, followed by the child match/merge. By not tokenizing the parent, the
child match/merge will not need to update any of the parent records in the match key
table.

Once the child match/merge is complete, you can run the match process on the parent
to force it to tokenize. This is also useful in cases where you have a limited window in
which to perform the load process. Not tokenizing will save time in the load process, at
the cost of tokenizing the data later.

You must tokenize before you match your data. For more information, see “Load
Process” on page 299, “Generating Match Tokens (Optional)” on page 316, and
“Generating Match Tokens During Load Jobs” on page 730.


Generate Match Tokens on Put

You can PUT data into a base object using the Data Manager (see the Siperian Hub Data
Steward Guide). If you are using the Data Manager to PUT data, you can enable (check)
this value to tokenize your data later. Performing this operation later allows you to
process PUT requests faster. Use this only when you know that the data will not be
matched immediately. For more information, see “Match Keys and the Tokenization
Process” on page 322.

Note: Do not use the Generate Match Tokens on Put option if you are using the SIF
API. If you have this parameter enabled, your SIF Put and CleansePut requests will
fail. Use the Tokenize request instead. Enable Generate Match Tokens on Put only if
you are not using the SIF API and you want data steward updates from the Hub
Console to be tokenized immediately. For more information, see “Editing Base Object
Properties” on page 108.

Enable Row Locking During Batch

If checked (selected), this feature enables locking of the data during updates, which
allows for a higher degree of concurrent access. The default value is 0, signifying that
row locking is disabled during batch.

Match Flag Audit Table

Specifies whether a match flag audit table is created.


• If checked (selected), then an audit table (BusinessObjectName_FMHA) is created
and populated with the userID of the user who, in Merge Manager, queued a
manual match record for automerging. For more information about the Merge
Manager tool, see the Siperian Hub Data Steward Guide.
• If unchecked (not selected), then the Updated_By column is set to the userID of
the person who executed the Automerge batch job.

For more information, see “Match Process” on page 317 and Chapter 14, “Configuring
the Match Process.”


Enable State Management

Specifies whether Siperian Hub manages the system state for records in this base
object. By default, state management is disabled. Select (check) this check box to enable
state management for this base object in support of approval workflows. If enabled,
this base object is referred to in this document as a state-enabled base object. For more
information, see Chapter 7, “State Management,” and “Enabling State Management”
on page 211.

Enable History of Cross-Reference Promotion

For state-enabled base objects, specifies whether Siperian Hub maintains the
promotion history for cross-reference records that undergo a state transition from
PENDING (0) to ACTIVE (1). By default, this option is disabled. For more
information, see Chapter 7, “State Management,” and “Enabling the History of
Cross-Reference Promotion” on page 213.

Base Object Style

Select the style (merge or link) for this base object.


• A merge-style base object (the default) is used with Siperian Hub’s match and
merge capabilities.
• A link-style base object is used with Siperian Hub’s match and link capabilities.
If selected, Siperian Hub creates a LINK table for this base object.
If you change a link-style base object back to a merge-style base object, the Schema
Manager prompts you to confirm whether you want to drop the LINK table.


Creating Base Objects


To create each base object in your schema:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Right-click in the left pane of the Schema Manager and choose Add Item from the
popup menu.
The Schema Manager displays the Add Table dialog box.

4. Specify the basic base object properties. For more information, see “Basic Base
Object Properties” on page 101.
5. Click OK.
The Schema Manager creates the new base table in the Operational Record Store
(ORS), along with any support tables, and then adds the new base object table to
the schema tree.


Editing Base Object Properties


To edit the properties of an existing base object:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the schema tree, select the base object that you want to modify.
The Schema Manager displays the Basic tab of the Base Object Properties page.

4. For each property that you want to edit on the Basic tab, click the Edit button
next to it, and specify the new value. For more information, see “Basic Base
Object Properties” on page 101.
5. If you want, check (select) the Enable History check box to have Siperian Hub
keep a log of records that are inserted, updated, or deleted. You can use a history
table for audit purposes.


6. To modify other base object properties, click the Advanced tab.

7. Specify the advanced properties for this base object. For more information, see
“Advanced Base Object Properties” on page 102.


8. In the left pane, click Match/Merge Setup beneath the base object’s name.

9. Specify the match / merge object properties. At a minimum, consider configuring
the following properties:
• maximum number of matches for manual consolidation (see “Maximum
Matches for Manual Consolidation” on page 490)
• number of rows per match job batch cycle (see “Number of Rows per Match
Job Batch Cycle” on page 491)
To edit a property, click the Edit button and enter a new value.
10. Click the Save button to save your changes.


For more information about setting the properties for matching and merging, see
“Configuring Match Properties for a Base Object” on page 488.

Configuring Custom Indexes for Base Objects


This section describes how to configure custom indexes for a base object.

About Custom Indexes

When you configure columns for a base object, system indexes are created
automatically for primary keys and unique columns. In addition, Siperian Hub
automatically drops and creates system indexes as needed when executing batch jobs or
stored procedures.

A custom index is an optional, supplemental index for a base object that you can define
and have Siperian Hub maintain automatically. Custom indexes are non-unique.

You might want to add a custom index to a base object for performance reasons. For
example, suppose an external application calls the SIF SearchQuery request to search a
base object by last name. If the base object has a custom index on the last name
column, the last name search is processed more quickly. Custom indexes that are
registered in Siperian Hub are automatically dropped and recreated during batch
execution to improve performance.

You have the option to manually define indexes outside the Hub Console using a
database utility for your database platform. For example, you could create a
function-based index (such as one with Upper(Last_Name) in the index expression) in
support of some specialized operation. However, if you add a user-defined index that
is not supported by the Schema Manager, then the custom index is not registered with
Siperian Hub, and you are responsible for maintaining that index; Siperian Hub will
not maintain it for you. If you do not properly maintain the index, you risk affecting
batch processing performance.
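
To illustrate what a function-based index on an expression such as Upper(Last_Name) looks like, the following sketch uses Python with an in-memory SQLite database as a stand-in. This is an assumption made purely so that the example is self-contained and runnable: Siperian Hub does not use SQLite, the table and index names are hypothetical, and the syntax and behavior on your database platform may differ, so consult your database documentation before creating such an index on an ORS.

```python
import sqlite3

# SQLite is used only as a runnable stand-in for your database platform.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c_customer (rowid_object TEXT PRIMARY KEY, last_name TEXT)"
)

# Index on an expression rather than on the raw column value.
conn.execute("CREATE INDEX ni_upper_last_name ON c_customer (upper(last_name))")

conn.executemany(
    "INSERT INTO c_customer VALUES (?, ?)",
    [("1", "Smith"), ("2", "smith"), ("3", "Jones")],
)

# A case-insensitive search that uses the same expression as the index.
matches = conn.execute(
    "SELECT rowid_object FROM c_customer "
    "WHERE upper(last_name) = ? ORDER BY rowid_object",
    ("SMITH",),
).fetchall()
```

Because an index like this is created outside the Schema Manager, it is not registered with Siperian Hub, and maintaining it remains your responsibility.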


Navigating to the Custom Index Setup Node


1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the schema tree, expand the tree beneath the base object you want to work with.
4. Click the Custom Index Setup node.
The Schema Manager displays the Custom Index Setup page.

Creating a Custom Index

To add a new custom index:


1. In the Schema Manager, navigate to the Custom Index Setup node for the base
object that you want to work with, as described in “Navigating to the Custom
Index Setup Node” on page 112.
2. Click the Add button.

The Schema Manager creates a new custom index (NI_C_BaseObjectName_inc,
where inc is an incremented number) and displays the list of columns in the base
object.

3. Select the column(s) that you want in the custom index.


4. Click the Save button to save your changes.


If an index already exists for the selected column(s), the Schema Manager displays
an error message and does not create the index.

Click OK to close the dialog box.
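
The naming pattern and the duplicate check described in this procedure can be modeled with a short sketch. It is a conceptual illustration only: the class is hypothetical, and it assumes that numbering starts at 1, that column order is significant, and that only custom indexes are compared. The Schema Manager may apply different rules.

```python
# Hypothetical model of custom index registration; not a Siperian Hub API.
class CustomIndexRegistry:
    def __init__(self, base_object_name):
        self.base_object_name = base_object_name
        self.indexes = {}    # index name -> tuple of column names
        self._next = 1       # assumption: the incremented number starts at 1

    def add(self, columns):
        """Register a custom index, rejecting a duplicate column selection."""
        columns = tuple(columns)
        if columns in self.indexes.values():
            raise ValueError("an index already exists for the selected column(s)")
        # Pattern from this guide: NI_C_BaseObjectName_inc
        name = "NI_C_%s_%d" % (self.base_object_name, self._next)
        self._next += 1
        self.indexes[name] = columns
        return name
```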

Editing a Custom Index

To change a custom index, you must delete the existing custom index and add a new
custom index with the columns that you want.

Deleting a Custom Index

To delete a custom index:


1. In the Schema Manager, navigate to the Custom Index Setup node for the base
object that you want to work with, as described in “Navigating to the Custom
Index Setup Node” on page 112.
2. In the Indexes list, select the custom index that you want to delete.
3. Click the Delete button.
The Schema Manager prompts you to confirm deletion.
4. Click Yes.


Viewing the Impact Analysis of a Base Object


The Schema Manager allows you to view all of the tables, packages, and queries
associated with a base object. You would typically do this before deleting a base object
to ensure that you do not delete other associated objects by mistake.

To view the impact analysis for a base object:


1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the schema tree, select the base object that you want to view.
4. Right-click the mouse and choose Impact Analysis.
The Schema Manager displays the Table Impact Analysis dialog box.

5. Click Close.


Deleting Base Objects


To delete a base object:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the schema tree, select the base object that you want to delete.
4. Right-click the mouse and choose Remove.
The Schema Manager prompts you to confirm deletion.
5. Choose Yes.
The Schema Manager asks you whether you want to view the impact analysis
before deleting the base object.
6. Choose No if you want to delete the base object without viewing the impact
analysis.
The Schema Manager removes the deleted base object from the schema tree.


Configuring Dependent Objects


This section describes how to configure dependent objects for your Siperian Hub
implementation.

About Dependent Objects


A dependent object is used to store supplemental information about the records in a base
object (for example, a header-detail relationship). One record in a base object table
can map to multiple records in a dependent object table. In the schema hierarchy,
dependent objects are wholly subordinate to the parent base object. As such,
dependent objects require less functionality than base objects—they do not support
such features as match and consolidation, history, or trust. For more information, see
“Types of Tables in an Operational Record Store” on page 83.

Important: You must use the Schema Manager to define dependent objects—you
cannot configure them directly in the database. For more information, see
“Requirements for Defining Schema Objects” on page 87.

For example, a Customer base object might have a dependent object called Notes that
contains free-form notes about each customer.


How Dependent Objects Are Related to Base Objects and Cross-Reference Tables
The following figure shows how dependent objects are related to base objects and
cross-reference tables.


Process Overview for Defining Dependent Objects


To create dependent object tables:
1. Create the base object according to the instructions in “Creating Base Objects” on
page 107.
2. Create the dependent object table according to the instructions in “Creating
Dependent Objects” on page 121.
3. Configure user-defined columns for this dependent object according to the
instructions in “Configuring Columns in Tables” on page 125.
4. Create staging tables for the base object table and the dependent object table
according to the instructions in “Configuring Staging Tables” on page 364.
5. Create landing tables for the source systems, if they do not already exist, according
to the instructions in “Configuring Landing Tables” on page 355.
6. Map the landing tables to the staging tables. Map the column that contains the
source system primary key for the base object to the ROWID_OBJECT column in
the dependent object’s staging table. For more information, see “Mapping
Columns Between Landing and Staging Tables” on page 380.
7. Populate the landing tables.
When data is loaded, Siperian Hub copies the appropriate primary key value from
the base object table into the dependent object table. The same record in a base
object table can correspond to multiple records in a dependent object table.


Dependent Object Columns


Dependent objects have two types of columns:

Column Type            Description
system columns         Columns that are automatically created and maintained by the
                       Schema Manager.
user-defined columns   Columns that have been added by users according to the
                       instructions in “Configuring Columns in Tables” on page 125.

Dependent objects have the following system columns.

Physical Name          Data Type (Size)   Description
ROWID_XREF             INT                Cross-reference key from the parent base object’s
                                          cross-reference table.
ROWID_OBJECT           CHAR (14)          Foreign key that points to the primary key of the
                                          base object record associated with this dependent
                                          object record.
DEP_ROWID_SYSTEM       CHAR (14)          Identifier of the source system dependent object.
DEP_PKEY_SRC_OBJECT    VARCHAR (255)      Primary key of the dependent object in the source
                                          system.
                                          The combination of this column and ROWID_XREF
                                          must be unique.
                                          It is recommended that, in the case where the
                                          source system does not provide a single unique
                                          column for the dependent object, the
                                          DEP_PKEY_SRC_OBJECT should have the concatenated
                                          values from the columns that actually make up a
                                          unique combination.
INTERACTION_ID         INT                For state-enabled base objects only. Interaction
                                          identifier that is used to protect a pending
                                          cross-reference record from updates that are not
                                          part of the same process as the original
                                          cross-reference record. For details, see
                                          “Protecting Pending Records Using the Interaction
                                          ID” on page 208.
CREATOR                VARCHAR (50)       User or process responsible for creating the
                                          record.
CREATE_DATE            DATE               Date on which the record was created.
UPDATED_BY             VARCHAR (50)       User or process responsible for the most recent
                                          update.
LAST_UPDATE_DATE       DATE               Date of the most recent update.
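
The recommendation for DEP_PKEY_SRC_OBJECT can be pictured with a short sketch. This is an illustration only, written as it might appear in a script that prepares landing data; the build_dep_pkey helper, the pipe delimiter, and the ORDER_NO / LINE_NO column names are hypothetical and are not part of Siperian Hub.

```python
MAX_PKEY_LENGTH = 255  # DEP_PKEY_SRC_OBJECT is VARCHAR (255)


def build_dep_pkey(row, key_columns, delimiter="|"):
    """Concatenate the source columns that together form a unique key.

    row is a dict of source column values; key_columns lists, in order,
    the columns that make up the unique combination. The delimiter is an
    assumption of this sketch, chosen so that ("12", "3") and ("1", "23")
    do not collapse into the same key.
    """
    parts = []
    for column in key_columns:
        value = row.get(column)
        if value is None:
            raise ValueError(f"key column {column!r} is missing or null")
        parts.append(str(value).strip())
    key = delimiter.join(parts)
    if len(key) > MAX_PKEY_LENGTH:
        raise ValueError("concatenated key exceeds 255 characters")
    return key


# Hypothetical order-line record with a two-column natural key.
line = {"ORDER_NO": 1001, "LINE_NO": 3, "NOTE_TEXT": "Deliver after 5 pm"}
dep_pkey = build_dep_pkey(line, ["ORDER_NO", "LINE_NO"])
```

A delimiter that cannot occur in the key values keeps the concatenated key unambiguous; choose one that suits your source data.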

Creating Dependent Objects


To create a dependent object:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.


3. Expand the schema tree for the base object on which the new object will depend.

4. Right-click Dependent Objects and choose Add Dependent.


The Schema Manager displays the Add Table dialog box with Dependent Object
as the Item type.


5. Specify the following information:

Property Description
Item Type Type of table that you are adding (Dependent Object).
Display Name Name for this dependent object as it will be displayed in the
Hub Console.
Physical Name Actual name of the table in the database. Siperian Hub will
suggest a physical name for the table based on the display name
that you enter.
Data Tablespace Name of the data tablespace. For more information, see the
Siperian Hub Installation Guide for your platform.
Index Tablespace Name of the index tablespace. For more information, see the
Siperian Hub Installation Guide for your platform.
Description Description of this dependent object.

6. Click OK.
The Schema Manager creates the new dependent object table in the Operational
Record Store (ORS) and then adds it to the schema tree.

Editing Dependent Objects


To edit an existing dependent object:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Expand the schema tree for the base object associated with the dependent object.
4. Expand the Dependent Objects list.


5. Select the dependent object that you want to edit.

6. For each property that you want to edit, click the Edit button next to it, and
specify the new value.
7. Expand the tree below the dependent object.
Note: Dependent objects do not have all of the nodes that are available to base
objects.
• To modify columns, select Columns and follow the instructions in
“Configuring Columns in Tables” on page 125.
• To modify the message trigger configuration, select Message Trigger Setup
and follow the instructions in “Adding Message Triggers” on page 615.
• To modify staging tables, select Staging Tables and follow the instructions in
“Configuring Staging Tables” on page 364.


Deleting Dependent Objects


To delete a dependent object:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the schema tree, select the base object associated with the dependent object that
you want to delete.
4. Expand the Dependent Objects list.
5. Select the dependent object.
6. Right-click the mouse and choose Remove.
The Schema Manager prompts you to confirm deletion.
7. Choose Yes.
The Schema Manager removes the deleted dependent object from the schema tree.

Configuring Columns in Tables


After you have created a table (base object, dependent object, or landing table), you use
the Schema Manager to define the columns for that table according to the
“Requirements for Defining Schema Objects” on page 87. You must use the Schema
Manager to define columns in tables—you cannot configure them directly in the
database.

Note: In the Schema Manager, you can also view the columns for cross-reference
tables and history tables, but you cannot edit them.


About Columns
This section provides general information about table columns.

Types of Columns in ORS Tables

Tables in the Hub Store contain two types of columns:

Column                 Description
system columns         A column that Siperian Hub automatically creates and maintains.
                       System columns contain metadata.
user-defined columns   Any column in a table that is not a system column. User-defined
                       columns are added in the Schema Manager and usually contain
                       business data.

Warning: The system columns contain Siperian Hub metadata. Do not alter Siperian
Hub metadata in any way. Doing so will cause Siperian Hub to behave in unpredictable
ways and you can lose data.

For more information about system columns in Hub Store tables, see:
• “Base Object Columns” on page 95
• “Columns in Cross-Reference Tables” on page 99
• “History Tables” on page 100
• “Dependent Object Columns” on page 120
• “Landing Table Columns” on page 356
• “Staging Table Columns” on page 365

Data Types for Columns

Siperian Hub uses a common set of data types for columns that map directly to the
following Oracle and DB2 data types.


Note: For information regarding the available data types, refer to the product
documentation for your database platform.

Siperian Hub Data Type   Oracle Data Type   DB2 Data Type
CHAR                     CHAR               CHAR
VARCHAR                  VARCHAR2           VARCHAR
NVARCHAR2                NVARCHAR2
NCHAR                    NCHAR
DATE                     DATE               DATE
NUMBER                   NUMBER             NUMERIC
INT                      INTEGER            INT or INTEGER
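
For scripts that need to reason about these mappings, the table can be expressed as a simple lookup. The sketch below is illustrative only: the DATA_TYPE_MAP name and the native_type helper are hypothetical. It assumes that the two entries listed for NVARCHAR2 and NCHAR are the Siperian Hub and Oracle types, so no DB2 equivalent is recorded for them, and it picks INTEGER where the table lists "INT or INTEGER".

```python
# Siperian Hub data type -> native data type per database platform.
# None means the table above lists no equivalent (an assumption of this
# sketch for the NVARCHAR2 and NCHAR rows, which show only two entries).
DATA_TYPE_MAP = {
    "CHAR": {"oracle": "CHAR", "db2": "CHAR"},
    "VARCHAR": {"oracle": "VARCHAR2", "db2": "VARCHAR"},
    "NVARCHAR2": {"oracle": "NVARCHAR2", "db2": None},
    "NCHAR": {"oracle": "NCHAR", "db2": None},
    "DATE": {"oracle": "DATE", "db2": "DATE"},
    "NUMBER": {"oracle": "NUMBER", "db2": "NUMERIC"},
    "INT": {"oracle": "INTEGER", "db2": "INTEGER"},
}


def native_type(hub_type, platform):
    """Return the native type name, or None if the table lists none.

    Raises KeyError for a data type or platform that is not in the table.
    """
    return DATA_TYPE_MAP[hub_type.upper()][platform.lower()]
```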

Column Properties

Siperian Hub columns have the following properties.


Column Properties
Property Description
Display Name Name for this column as it will be displayed in the Hub Console.
Physical Name Actual name of the column in the table. Siperian Hub will suggest a
physical name for the column based on the display name that you enter.
Note: For physical names of columns, do not use:
• any reserved column names, as described in “Reserved Column
Names” on page 89
• the dollar sign ($) character
Nullable Enable (check) this option if the column can be empty (null).
• If null values are allowed, you do not need to specify a default value.
• If null values are not allowed, then you must specify a default value.
Data Type For character data types, you can specify the length. For certain numeric
data types, you can specify the precision and scale. For more information,
see “Data Types for Columns” on page 126.
Has Default Enable (check) this option if this column has a default value.

Default Used if no value is provided for the column but the column cannot be
null.
Trust Enable (check) this option if this column will contain values from more
than one source system, and you want to use trust to determine the most
reliable value. If you do not enable trust for the column, then the most
recent value will always be used. For more information, see “Enabling
Trust for a Column” on page 461 and “Configuring Trust for Source
Systems” on page 455.
Unique Enable (check) this option to enforce a unique constraint on values loaded
into this column from a staging table. Most organizations use the primary key
from the source system for the lookup value. A record with a duplicate value in
this column will be rejected.
Warning: Avoid enabling the Unique option on base objects that might
be consolidated. If you have a base object with a unique column and then
load the same key from different systems, the insert into this base object
fails. To use this feature, you must have unique keys across all systems.
Validate Enable (check) this option if validation rule(s) will be configured for this
column. Validation rules are applied during the load process to downgrade
trust scores for cell values in this column. For more information, see
“Enabling Validation Rules for a Column” on page 470.
Null Value Merge Determines the survivorship of null values during the consolidation
process.
• By default, this option is disabled. Trust scores for cells containing
null values are automatically downgraded so that, during
consolidation, null values are unlikely to win over non-null values.
Instead, non-null values from the next available trusted source would
survive.
• If enabled (checked), trust scores for cells containing null values are
calculated normally, and null values might overwrite non-null values
during consolidation. If you want to reduce trust on cells containing
null data, you must write validation rules to do so.
GBID Enable (check) this option if you want to define this column as the Global
Business Identifier (GBID) for this object. Examples include a social
security number, a driver’s license number, and so on. Doing so eliminates
the need to custom-define identifiers. You can configure any number of
GBID columns for API access and batch loads. For more information,
see “Global Identifier (GBID) Columns” on page 129.
Note: To be configured as a GBID column, the column must be an INT
data type, or it must be exactly 255 characters in length and use one of the
following data types: CHAR, NCHAR, VARCHAR, or NVARCHAR2.
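
Two of the rules in the table above lend themselves to a quick pre-check before you enter column definitions in the Schema Manager. The following is a minimal sketch, not a Siperian Hub API: the check_column_definition function and its parameters are hypothetical, and it covers only the dollar sign rule and the Nullable / Default rule. It does not check reserved column names, which are listed in “Reserved Column Names” on page 89.

```python
def check_column_definition(physical_name, nullable, has_default, default=None):
    """Return a list of problems found in a proposed column definition.

    Only two rules from the Column Properties table are checked here:
    the physical name must not contain '$', and a column that does not
    allow nulls must have a default value.
    """
    problems = []
    if "$" in physical_name:
        problems.append("physical name must not contain the dollar sign ($)")
    if not nullable and not (has_default and default is not None):
        problems.append("a non-nullable column requires a default value")
    return problems


# Hypothetical definitions: the first passes, the second breaks both rules.
ok = check_column_definition("CUST_NAME", nullable=True, has_default=False)
bad = check_column_definition("CUST$NAME", nullable=False, has_default=False)
```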


Global Identifier (GBID) Columns

A Global Business Identifier (GBID) column contains common identifiers (key values) that
allow you to uniquely and globally identify a record based on your business needs.
Examples include:
• Identifiers defined by applications external to Siperian Hub, such as ERP (SAP or
Siebel customer numbers) or CRM systems.
• Identifiers defined by external organizations, such as industry-specific codes (AMA
numbers, DEA numbers, and so on), or government-issued identifiers (social
security number, tax ID number, driver’s license number, and so on).

Note: To be configured as a GBID column, the column must be an integer, CHAR,
VARCHAR, NCHAR, or NVARCHAR2 column type. A non-integer column must be
exactly 255 characters in length.
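
The eligibility rule in the note above can be written as a small predicate, for example when reviewing a planned data model before building it in the Schema Manager. This is a sketch under stated assumptions: the is_gbid_eligible function is hypothetical, and "integer" is taken to mean the Siperian Hub INT data type.

```python
GBID_CHARACTER_TYPES = {"CHAR", "NCHAR", "VARCHAR", "NVARCHAR2"}
GBID_CHARACTER_LENGTH = 255


def is_gbid_eligible(data_type, length=None):
    """Return True if a column with this type and length may be a GBID.

    An INT column is always eligible; a character column is eligible
    only when its length is exactly 255.
    """
    data_type = data_type.upper()
    if data_type == "INT":
        return True
    return data_type in GBID_CHARACTER_TYPES and length == GBID_CHARACTER_LENGTH
```

For example, is_gbid_eligible("VARCHAR", 255) is true, while is_gbid_eligible("VARCHAR", 50) and is_gbid_eligible("DATE") are false.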

In the Schema Manager, you can define multiple GBID columns in a base object. For
example, an employee table might have columns for social security number and driver’s
license number, or a vendor table might have a tax ID number.

A Master Identifier (MID) is a common identifier that is generated by a system of
reference or system of record that is used by others (for example, CIF, legacy hubs,
CDI/MDM Hub, counterparty hub, and so on). In Siperian Hub, the MID is the
ROWID_OBJECT, which uniquely identifies individual records from various source
systems.

GBIDs do not replace the ROWID_OBJECT. GBIDs provide additional ways to help
you integrate your Siperian Hub implementation with external systems, allowing you to
query and access data through unique identifiers of your own choosing (using SIF
requests, as described in the Siperian Services Integration Framework Guide). In addition, by
configuring GBID columns using already-defined identifiers, you can avoid the need to
custom-define identifiers.

GBIDs help with the traceability of your data. Traceability is keeping track of the data so
that you can determine its lineage—which systems, and which records from those
systems, contributed to consolidated records. When you define GBID columns in a
base object, the Schema Manager creates a separate table for this base object (the table
name ends with _HUID) that tracks the old and new values (current/obsolete value
pairs).

For example, suppose two of your customers (both of which had different tax ID
numbers) merged into a single company, and one tax ID number survived while the
other one became obsolete. If you defined the tax ID number column as a GBID,
Siperian Hub could help you track both the current and historical tax ID numbers so
that you could access data (via SIF APIs) using the historical value.

Note: Siperian Hub does not perform any data verification or error detection on
GBID columns. If the source system has duplicate GBID values, then those duplicate
values will be passed into Siperian Hub.
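
Because Siperian Hub passes duplicate GBID values through unchanged, you might want to screen source data yourself before loading it. The following is a minimal sketch of such a check; the find_duplicate_gbids function, the record layout, and the TAX_ID column name are hypothetical and are not part of Siperian Hub.

```python
from collections import Counter


def find_duplicate_gbids(records, gbid_column):
    """Return {value: count} for GBID values that occur more than once.

    records is a list of dicts, one per source record. Records with no
    value in the GBID column are ignored.
    """
    counts = Counter(
        record[gbid_column]
        for record in records
        if record.get(gbid_column) is not None
    )
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical source extract with one repeated tax ID.
source = [
    {"NAME": "Acme", "TAX_ID": "11-111"},
    {"NAME": "Bolt", "TAX_ID": "22-222"},
    {"NAME": "Acme Corp", "TAX_ID": "11-111"},
    {"NAME": "Unknown", "TAX_ID": None},
]
duplicates = find_duplicate_gbids(source, "TAX_ID")
```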

Columns in Staging Tables

The columns for staging tables cannot be defined using the column editor. Staging
table columns are a special case, as they are based on some or all columns in the staging
table’s target object. You use the Add/Edit Staging Table window to select the columns
on the target table that can be populated by the staging table. Siperian Hub then creates
each staging table column with the same data type as the corresponding column in the
target table. See “Configuring Staging Tables” on page 364 for more information on
choosing the columns for staging tables.

Maximum Number of Columns for Base Objects

A base object cannot have more than 200 user-defined columns if it will have match
rules that are configured for automatic consolidation. For more information, see
“Flagging Matched Records for Automatic or Manual Consolidation” on page 333 and
“Specifying Consolidation Options for Matched Records” on page 543.


Navigating to the Column Editor


To configure columns for base objects, dependent objects, and landing tables:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Expand the schema tree for the object to which you want to add columns.
4. Select Columns.
The Schema Manager displays column definitions in the Properties pane.

Note: In the above example, the schema shows ANSI SQL data types that Oracle
converts to its own data types. For more information, see “Data Types for
Columns” on page 126.

The Column Editor displays a “locked” icon next to system columns.


Command Buttons in the Column Editor

The Properties pane in the Column Editor contains the following command buttons:

Button Name Description


Add Add new columns. For more information, see “Adding Columns” on
page 134.
Delete Remove existing columns. For more information, see “Deleting
Columns” on page 139.
Move Up Move the selected column up in the display order. For more
information, see “Changing the Column Display Order” on page 139.
Move Down Move the selected column down in the display order. For more
information, see “Changing the Column Display Order” on page 139.
Import Add new columns by importing column definitions from another
table. For more information, see “Importing Column Definitions
From Another Table” on page 135.
Expand View Expand the table columns view. For more information, see
“Expanding the Table Columns View” on page 133.
Restore View Restore the table columns view. For more information, see
“Expanding the Table Columns View” on page 133.
Save Saves changes to the column definitions.

Showing or Hiding System Columns

You can toggle the Show System Columns check box to show or hide system columns.
For more information, see “Types of Columns in ORS Tables” on page 126.


Expanding the Table Columns View

You can expand the properties pane to display all the column properties in a single
pane. By default, the Schema Manager displays column definitions in a contracted view.

To show the expanded table columns view:
• Click the Expand View button.

The Schema Manager displays the expanded table columns view.

To show the default table columns view:
• Click the Restore View button.

The Schema Manager displays the default table columns view.


Adding Columns
To add a column:
1. Navigate to the column editor for the table that you want to configure. For more
information, see “Navigating to the Column Editor” on page 131.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Add button.
The Schema Manager displays an empty row.

4. For each column, specify its properties. For more information, see “Column
Properties” on page 127.
5. Click the Save button to save the columns you have added.


Importing Column Definitions From Another Table


To import some of the column definitions from another table:
1. Navigate to the column editor for the table that you want to configure. For more
information, see “Navigating to the Column Editor” on page 131.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Import Schema button.
The Import Schema dialog is displayed.

4. Specify the connection properties for the schema that you want to import.
If you need more information about the connection information to specify here,
contact your database administrator.
The settings for the User name / Password fields depend on whether proxy users
are configured for your Siperian Hub implementation.
• If proxy users are not configured (the default), then the user name will be the
same as the schema name.
• If proxy users are configured, then you must specify the custom user name /
password so that Siperian Hub can use those credentials to access the schema.


For more information about proxy user support, see the Siperian Hub Installation
Guide for your platform.
5. Click Next.
Note: The database you enter does not need to be the same as the Siperian ORS
that you’re currently working in, nor does it need to be a Siperian ORS.
The only restriction is that you cannot import from a relational database that is a
different type from the one in which you are currently working. For example, if
your database is an Oracle database, then you can import columns only from
another Oracle database.
The Schema Manager displays a list of the tables that are available for import.

6. Select the table that you want to import.


7. Click Next.


The Schema Manager displays a list of columns for the selected table.

8. Select the column(s) you want to import.


9. Click Finish.
10. Click the Save button to save the column(s) that you have added.

Editing Column Properties


Once columns have been added and saved, you can change certain column properties.
Before you make any changes, however, bear in mind that once a table has been
defined and saved, you cannot:
• reduce the length of a CHAR, VARCHAR, NCHAR, or NVARCHAR2 field
• change the scale or precision of a NUMBER field
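
The two restrictions above can be summarized as a simple rule check, which may be useful when planning schema changes in advance. This is a sketch only, not the Schema Manager's own validation: the change_is_allowed function and the dictionary layout are hypothetical, and only the two restrictions listed above are encoded.

```python
CHARACTER_TYPES = {"CHAR", "VARCHAR", "NCHAR", "NVARCHAR2"}


def change_is_allowed(data_type, old, new):
    """Check a proposed change against the two restrictions listed above.

    old and new are dicts that may carry 'length', 'precision', and
    'scale'. Character fields may keep or increase their length but not
    reduce it; NUMBER fields must keep their precision and scale. Other
    data types are not covered by these restrictions, so the sketch
    returns True for them.
    """
    if data_type in CHARACTER_TYPES:
        return new["length"] >= old["length"]
    if data_type == "NUMBER":
        old_shape = (old.get("precision"), old.get("scale"))
        new_shape = (new.get("precision"), new.get("scale"))
        return new_shape == old_shape
    return True
```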

Important: As with any schema changes that are attempted after the tables have been
populated with data, manage changes to columns in a planned and controlled fashion,
and ensure that the appropriate database backups are done before making changes.

To change column properties:


1. Navigate to the column editor for the table that you want to configure. For more
information, see “Navigating to the Column Editor” on page 131.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. For each column, you can change the following properties. Be sure to read about
the implications of changing a property before you make the change. For more
information about each property, see “Column Properties” on page 127.

Property Notes for Editing Values in This Column


Display Name Name for this column as it will be displayed in the Hub Console.
Length You can only increase the length of a CHAR, VARCHAR, NCHAR, or
NVARCHAR2 field.
Default Used if no value is provided for the column but the column cannot be
null.
Trust Note: You need to synchronize metadata if you enable trust. If you
enable trust for a column on a table that already contains data, you will
be warned that your trust settings have changed and that you need to
run the trust Synchronization batch job in the Batch Viewer tool before
doing any further loads to the table (see “Running Synchronize Batch
Jobs After Changes to Trust Settings” on page 467). Siperian Hub will
automatically make sure that the Synchronization job is available in the
Batch Viewer tool. For more information, see Chapter 17, “Using Batch
Jobs”.
Warning: You must execute the synchronization process before you run
any more Load jobs. Otherwise, the trusted values used to populate the
column will be incorrect.
Warning: Be very careful about disabling (unchecking) trust for
columns that already contain data. Disabling trust removes columns
from some of the underlying metadata tables, and the data in those
columns is lost.
If you inadvertently disable trust and save that change, correct the
error by enabling trust again and immediately running the
Synchronization job to recreate the metadata.
Unique Enabling the Unique indicator will fail if the column already contains
duplicate values. As noted before, it is recommended that you avoid
using the Unique option, particularly on base objects that might be
merged.

Validate Warning: Disabling validation results in the loss of metadata for the
associated column. Disable validation only when you are certain that
you want to do so.

4. Click the Save button to save your changes.

Changing the Column Display Order


You can move columns up or down in the display order. Changing the display order
does not affect the physical table in the database.

To change the column display order:


1. Navigate to the column editor for the table that you want to configure. For more
information, see “Navigating to the Column Editor” on page 131.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the column that you want to move.
4. Do one of the following:
• Click the Move Up button to move the selected column up in the display order.
• Click the Move Down button to move the selected column down in the display order.
5. Click the Save button to save your changes.

Deleting Columns
Removing columns should be approached with extreme caution. Any data that has
already been loaded into a column will be lost when the column is removed. It can also
be a slow process due to the number of underlying tables that could be affected. You
must save the changes immediately after removing the existing columns.

To delete a column from base objects, dependent objects, and landing tables:
1. Navigate to the column editor for the table that you want to configure. For more
information, see “Navigating to the Column Editor” on page 131.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Scroll the column definitions in the Properties pane and select a column that you
want to delete.
4. Click the Delete button.
The Schema Manager prompts you to confirm deletion.
5. Click Yes.
The Schema Manager removes the deleted column definition from the list.
6. Click the Save button to save your changes.

Configuring Foreign-Key Relationships Between Base Objects
This section describes how to configure foreign key relationships between base objects
in your Siperian Hub implementation. For a general overview of foreign key
relationships, see “Process Overview for Defining Foreign-Key Relationships” on page
143. For more information about parent-child relationships, see “Configuring Match
Paths for Related Records” on page 497.

About Foreign Key Relationships


In Siperian Hub, a foreign key relationship establishes an association between two base
objects via matching columns. In a foreign-key relationship, one base object (the child)
contains a foreign key column, which contains values that match values in the primary
key column of another base object (the parent).
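
The matching-column idea can be illustrated with a short sketch that lists child records whose foreign key value has no matching parent. It is an illustration only and does not reflect how Siperian Hub enforces relationships; the find_orphans function, the record layout, and the CUSTOMER_FK column name are hypothetical.

```python
def find_orphans(child_rows, parent_rows, fk_column, pk_column="ROWID_OBJECT"):
    """Return the child rows whose foreign key matches no parent row.

    child_rows and parent_rows are lists of dicts. fk_column names the
    foreign key column in the child; pk_column names the primary key
    column in the parent.
    """
    parent_keys = {row[pk_column] for row in parent_rows}
    return [row for row in child_rows if row[fk_column] not in parent_keys]


# Hypothetical parent (Customer) and child (Address) records.
customers = [{"ROWID_OBJECT": "1"}, {"ROWID_OBJECT": "2"}]
addresses = [
    {"ROWID_OBJECT": "10", "CUSTOMER_FK": "1"},
    {"ROWID_OBJECT": "11", "CUSTOMER_FK": "1"},
    {"ROWID_OBJECT": "12", "CUSTOMER_FK": "9"},
]
orphans = find_orphans(addresses, customers, "CUSTOMER_FK")
```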


Types of Foreign Key Relationships in ORS Tables

There are two types of foreign-key relationships in Hub Store tables.

Type                                     Description
system foreign key relationships         Automatically defined and enforced by Siperian Hub to
                                         protect the referential integrity of your schema.
user-defined foreign key relationships   Custom foreign key relationships that are manually
                                         defined according to the instructions later in this
                                         section.

Foreign Key Relationships and Dependent Objects

Foreign-key relationships are implicit between a dependent object and its parent base
object. This relationship is defined according to the instructions in “Configuring
Dependent Objects” on page 117.


Parent and Child Base Objects


The following diagram shows a foreign key relationship between parent and child base
objects. The foreign key column in the child base object points to the
ROWID_OBJECT column in the parent base object.


Process Overview for Defining Foreign-Key Relationships


To create a foreign-key relationship:
1. Create the parent table. For more information, see “Creating Base Objects” on
page 107.
2. Create the child table. For more information, see “Creating Base Objects” on page
107.
3. Define the foreign key relationship between them according to the instructions in
“Adding Foreign-Key Relationships” on page 143.

If the child table contains generated keys from the parent table, the load process copies
the appropriate primary key value from the parent table into the child table.

Adding Foreign-Key Relationships


To add a foreign-key relationship between two base objects:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the schema tree, expand a base object (the base object that will be the child in
the relationship).
4. Right-click Relationships.
The Schema Manager displays the Properties tab of the Relationships page.


5. Click the Add button.


The Schema Manager displays the Add Relationship dialog.

6. Define the new relationship by selecting:


• a column in the Relate from tree, and
• a column in the Relate to tree
7. Optionally, check (select) the Virtual relationship check box to create a foreign
key relationship that is not enforced by the database. Instead, metadata in the
ORS records that an implicit relationship exists.
Note: You cannot select a display column for foreign key relationships that
Siperian Hub automatically creates.
8. Click OK.


9. Click the Diagram tab to view the foreign-key relationship diagram.

10. Click the Save button to save your changes.

Note: After you have created a relationship, a column that is already used in that
relationship is not displayed when you try to create another relationship. When you
delete the relationship, the column is displayed again.

Editing Foreign-Key Relationships


You can change only the Lookup Display Name in a foreign key relationship.
To change any other properties, you need to delete the relationship, add it again, and
specify the properties you want.

To edit the lookup display name for a foreign-key relationship between two base
objects:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the schema tree, expand a base object and right-click Relationships.


The Schema Manager displays the Properties tab of the Relationships page.

4. On the Properties tab, click the foreign-key relationship whose properties you want
to view.
The Schema Manager displays the relationship details.

5. Click the Edit button next to the Lookup Display Name and specify the new
value.
6. Click the Save button to save your changes.


Configuring Lookups for Foreign-Key Relationships


After you have created a foreign key relationship, you can configure a lookup for the
column. A lookup causes Siperian Hub to retrieve a data value from a parent table
during the stage process. For example, if an Address staging table includes a
CONSUMER_CODE_FK column, you could have Siperian Hub perform a lookup to
the ROWID_OBJECT column in the Consumer base object and retrieve the ROWID_OBJECT
value of the associated parent record in the Consumer table. For more
information, see “Configuring Lookups For Foreign Key Columns” on page 376.
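
The lookup described above can be pictured as a key translation step. The sketch below is a simplified illustration, not the stage process itself: the resolve_foreign_keys function is hypothetical, and it assumes that the parent record is identified by a CONSUMER_CODE column in the Consumer table, which is a name introduced here for the example.

```python
def resolve_foreign_keys(staging_rows, parent_rows, fk_column, lookup_column):
    """Replace each staged foreign key value with the parent's ROWID_OBJECT.

    The parent rows are indexed by lookup_column. A staged value with no
    matching parent resolves to None in this sketch. The input rows are
    not modified; new dicts are returned.
    """
    lookup = {row[lookup_column]: row["ROWID_OBJECT"] for row in parent_rows}
    resolved = []
    for row in staging_rows:
        resolved.append({**row, fk_column: lookup.get(row[fk_column])})
    return resolved


# Hypothetical Consumer parent record and staged Address records.
consumers = [{"ROWID_OBJECT": "10", "CONSUMER_CODE": "A1"}]
staged = [
    {"CONSUMER_CODE_FK": "A1", "CITY": "Austin"},
    {"CONSUMER_CODE_FK": "ZZ", "CITY": "Boston"},
]
resolved = resolve_foreign_keys(staged, consumers, "CONSUMER_CODE_FK", "CONSUMER_CODE")
```

In this example the first staged record resolves to the parent's ROWID_OBJECT value, and the second has no matching parent.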

Deleting Foreign-Key Relationships


You can delete any user-defined foreign-key relationship that has been added according
to the instructions in “Adding Foreign-Key Relationships” on page 143. You cannot
delete the system foreign key relationships that Siperian Hub automatically defines and
enforces to protect the referential integrity of your schema.

To delete a foreign-key relationship between two base objects:


1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the schema tree, expand a base object and right-click Relationships.
4. On the Properties tab, click the foreign-key relationship that you want to delete.
5. Click the button to delete the relationship.
The Schema Manager prompts you to confirm deletion.
6. Click Yes.
The Schema Manager deletes the foreign key relationship.
7. Click the Save button to save your changes.


Viewing Your Schema


You can use the Schema Viewer tool in the Hub Console to visualize the schema in an
ORS. The Schema Viewer is particularly helpful for visualizing a complex schema.

Starting the Schema Viewer


Note: The Schema Viewer can also be launched from within the Metadata Manager, as
described in the Siperian Hub Metadata Manager Guide. The instructions for using the
Schema Viewer are the same regardless of where you launch it.

To start the Schema Viewer tool:


• In the Hub Console, expand the Model workbench, and then click Schema
Viewer.
The Hub Console starts the Schema Viewer and loads the data model, showing a
progress dialog.


The Hub Console displays the Schema Viewer tool, as shown in the following
example.

[Figure: Schema Viewer, with the Diagram Pane and the Overview Pane labeled]

Panes in the Schema Viewer

The Schema Viewer is divided into two panes.

Pane            Description
Diagram pane    Shows a detailed diagram of your schema.
Overview pane   Shows an abstract overview of your schema. The gray box highlights the
                portion of the overall schema diagram that is currently displayed in the
                diagram pane. Drag the gray box to move the display area over a particular
                portion of your schema.


Command Buttons in the Schema Viewer

The Diagram Pane in the Schema Viewer contains the following command buttons:

Button Name   Description
Zoom In       Zooms in and magnifies a smaller area of the schema diagram, as
              described in “Zooming In” on page 150.
Zoom Out      Zooms out and displays a larger area of the schema diagram, as
              described in “Zooming Out” on page 151.
Zoom All      Zooms out to display the entire schema diagram, as described in
              “Zooming All” on page 152.
Layout        Toggles between a hierarchic and orthogonal view, as described in
              “Switching Views of the Schema Diagram” on page 152.
Options       Shows or hides column names and controls the orientation of the
              hierarchic view, as described in “Configuring Schema Viewer
              Options” on page 156.
Save          Saves the schema diagram as a JPG file, as described in “Saving the
              Schema Diagram as a JPG Image” on page 157.
Print         Prints the schema diagram, as described in “Printing the Schema
              Diagram” on page 158.

Zooming In and Out of the Schema Diagram


You can zoom in and out of the schema diagram.

Zooming In

To zoom into a portion of the schema diagram:


• Click the Zoom In button.


The Schema Viewer magnifies a portion of the screen.

Note that the gray highlight box in the Overview Pane has grown smaller to indicate
the portion of the schema that is displayed in the diagram pane.

Zooming Out

To zoom out of the schema diagram:


• Click the Zoom Out button.

The Schema Viewer zooms out of the schema diagram.

Note that the gray box in the Overview Pane has grown larger to indicate a larger
viewing area.


Zooming All

To zoom all, that is, to display the entire schema diagram in the Diagram Pane:
• Click the Zoom All button.

The Schema Viewer zooms out to display the entire schema diagram.

Switching Views of the Schema Diagram


The Schema Viewer can display the schema diagram in either of two views: hierarchic
(the default) or orthogonal.


Hierarchic View

The following figure shows an example of the hierarchic view (the default).


Orthogonal View

The following figure shows the same schema in the orthogonal view.

Toggling Views

To switch between the hierarchic and orthogonal views:


• Click the Layout button.

The Schema Viewer displays the other view.


Navigating to Related Design Objects and Batch Jobs


Right-clicking on an object in the Schema Viewer displays a context menu.

The context menu displays the following commands.

Command               Description
Go to BaseObject      Launches the Schema Manager and displays this base object with an
                      expanded base object node.
Go to Staging Table   Launches the Schema Manager and displays the selected staging table
                      under the associated base object.
Go to Mapping         Launches the Mappings tool and displays the properties for the
                      selected mapping.
Go to Job             Launches the Batch Viewer and displays the properties for the
                      selected batch job.
Go to Batch Groups    Launches the Batch Group tool.


Configuring Schema Viewer Options


To configure Schema Viewer options:
1. Click the Options button.
The Schema Viewer displays the Options dialog.

2. Specify the options you want.

Option              Description
Show column names   Controls whether column names appear in the entity boxes.
                    • Check (select) this option to display column names in the
                      entity boxes.
                    • Uncheck (clear) this option to hide column names and display
                      only entity names in the entity boxes.
Orientation         Controls the orientation of the schema hierarchy. One of the
                    following values:
                    • Top to Bottom (default): Hierarchy goes from top to
                      bottom, with the highest-level node at the top.
                    • Bottom to Top: Hierarchy goes from bottom to top, with
                      the highest-level node at the bottom.
                    • Left to Right: Hierarchy goes from left to right, with the
                      highest-level node at the left.
                    • Right to Left: Hierarchy goes from right to left, with the
                      highest-level node at the right.


In the following example, column names are hidden.

3. Click OK.

Saving the Schema Diagram as a JPG Image


To save the schema diagram as a JPG image:
1. Click the Save button.


The Schema Viewer displays the Save dialog.

2. Navigate to the location on the file system where you want to save the JPG file.
3. Specify a descriptive name for the JPG file.
4. Click Save.
The Schema Viewer saves the file.

Printing the Schema Diagram


To print the schema diagram:
1. Click the Print button.
The Schema Viewer displays the Print dialog.


2. Select the print options that you want.

Option             Description
Print Area         Scope of what to print:
                   • Print All: Print the entire schema diagram.
                   • Print viewable: Print only the portion of the schema
                     diagram that is currently visible in the Diagram Pane.
Page Settings      Page output options, such as media, orientation, and margins.
Printer Settings   Printer options based on available printers in your environment.

3. Click Print.
The Schema Viewer sends the schema diagram to the printer.



6
Configuring Queries and Packages

This chapter describes how to configure Siperian Hub to provide queries and packages
that data stewards and applications can use to access data in the Hub Store.

Chapter Contents
• Before You Begin
• About Queries and Packages
• Configuring Queries
• Configuring Packages

Before You Begin


Before you begin to define queries and packages, you must have:
• installed Siperian Hub and created the Hub Store according to the instructions in
Siperian Hub Installation Guide for your platform
• built the schema according to the instructions in Chapter 5, “Building the Schema”


About Queries and Packages


In Siperian Hub, a query is a request to retrieve data from the Hub Store. A package is a
public view of one or more underlying tables in Siperian Hub. A package is based on a
query, which can select records from a table or from another package. Queries and
packages go together. Queries define the criteria for selecting data, and packages are
views that users use to operate on that data. A query can be used in multiple packages.
For more information, see:
• “Configuring Queries” on page 162
• “Configuring Packages” on page 196
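
The relationship between a query and the packages built on it can be pictured with ordinary database views. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in database; the table, column, and package names are hypothetical, and Siperian Hub packages are configured in the Hub Console rather than created this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c_consumer (rowid_object TEXT, last_name TEXT, city TEXT);
    INSERT INTO c_consumer VALUES
        ('1', 'Johnson', 'Austin'), ('2', 'Vallejo', 'Boston'), ('3', 'Major', 'Austin');
""")

# The "query": selection criteria over an underlying table.
query_sql = "SELECT rowid_object, last_name FROM c_consumer WHERE city = 'Austin'"

# Two "packages": named views that expose the same query to different users.
for package_name in ("pkg_steward_consumers", "pkg_app_consumers"):
    conn.execute(f"CREATE VIEW {package_name} AS {query_sql}")

steward_rows = conn.execute(
    "SELECT last_name FROM pkg_steward_consumers ORDER BY last_name"
).fetchall()
app_rows = conn.execute(
    "SELECT last_name FROM pkg_app_consumers ORDER BY last_name"
).fetchall()
print(steward_rows, app_rows)
```

Both views return the same records because they share one query, which mirrors the statement that a query can be used in multiple packages.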

Configuring Queries
This section describes how to create and modify queries using the Queries tool in the
Hub Console. The Queries tool allows you to create simple, advanced, and custom
queries.

About Queries
In Siperian Hub, a query is a request to retrieve data from the Hub Store. Just like any
SQL-based query statement, Siperian Hub queries allow you to specify, via the Hub
Console, the criteria used to retrieve that data—tables and columns to include,
conditions for filtering records, and sorting and grouping the results. Queries that you
save in the Queries tool can be used in packages, and data stewards can use them in the
Data Manager and Merge Manager tools.

Query Capabilities

You can define a query to:


• return selected columns
• filter the result set with a WHERE clause
• use complex query syntax, such as GROUP BY, ORDER BY, and HAVING
clauses
• use aggregate functions, such as SUM, COUNT, and AVG
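
The capabilities listed above map onto standard SQL clauses. The following sketch combines them in one statement, assuming Python's built-in sqlite3 module as a stand-in database and a hypothetical orders table; it illustrates the clauses, not the SQL that the Queries tool generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, region TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('Ann', 'East', 100), ('Ann', 'East', 50), ('Bob', 'West', 70),
        ('Cy',  'East', 20),  ('Cy',  'East', 10), ('Dee', 'East', 200);
""")

summary = conn.execute("""
    SELECT customer, COUNT(*) AS order_count, SUM(amount) AS total  -- selected columns, aggregates
    FROM orders
    WHERE region = 'East'            -- filter individual records
    GROUP BY customer                -- one result row per customer
    HAVING SUM(amount) > 100         -- filter the groups
    ORDER BY total DESC              -- sort the result set
""").fetchall()
print(summary)
```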


Types of Queries

You can create the following types of queries:

Type           Description
query          Created by selecting tables and columns, and configuring query conditions,
               sort by, and group by options, according to the instructions in
               “Configuring Queries” on page 166.
custom query   Created by specifying a SQL statement according to the instructions in
               “Configuring Custom Queries” on page 190.

How Schema Changes Affect Queries

Queries are dependent on the base object columns from which they retrieve data.
If changes are made to the column configuration in the base object associated with a
query, then the queries—including custom queries—are updated automatically.
For example, if a column is renamed, then the name is updated in any dependent
queries. If a column is deleted in the base object, then the consequences depend on the
type of query:
• For a custom query, the query becomes invalid and must be manually fixed in the
Queries tool or the Packages tool. Otherwise, if executed, an invalid query will
return an error.
• For all other queries, the column is removed from the query, as well as from any
packages that depend on the query.
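
The custom-query case can be reproduced in any SQL database: a stored SQL statement that names a column stops working once that column is gone. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in database and hypothetical table and column names. The column deletion is simulated by rebuilding the table without the column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE c_consumer (rowid_object TEXT, last_name TEXT, nickname TEXT)")

# A custom query stores its SQL text as written, including the column names.
custom_sql = "SELECT rowid_object, nickname FROM c_consumer"
conn.execute(custom_sql).fetchall()  # valid while the column exists

# Simulate deleting the nickname column by rebuilding the table without it.
conn.executescript("""
    CREATE TABLE c_consumer_new (rowid_object TEXT, last_name TEXT);
    INSERT INTO c_consumer_new SELECT rowid_object, last_name FROM c_consumer;
    DROP TABLE c_consumer;
    ALTER TABLE c_consumer_new RENAME TO c_consumer;
""")

try:
    conn.execute(custom_sql)
    still_valid = True
except sqlite3.OperationalError as exc:
    still_valid = False
    error_text = str(exc)  # the database reports the missing column
    print("Custom query is now invalid:", error_text)
```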


Starting the Queries Tool


To start the Queries tool:
• Expand the Model workbench and then click Queries.
The Hub Console displays the Queries tool, as shown in the following example.

[Figure: Queries tool, with the Navigation Pane and the Properties Pane labeled]


The Queries tool is divided into two panes:

Pane              Description
navigation pane   Displays a hierarchical list of configured queries and query groups.
properties pane   Displays the properties of the selected query or query group.

Configuring Query Groups


This section describes how to configure query groups.

About Query Groups

A query group is a logical group of queries. A query group is simply a mechanism for
organizing queries in the Queries tool.


Adding Query Groups

To add a query group:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Right-click in the navigation pane and choose New Query Group.
The Queries tool displays the Add Query Group window.

4. Enter a descriptive name for this query group.


5. Enter a description for this query group.
6. Click OK.
The Queries tool adds the new query group to the tree.

Editing Query Group Properties

To edit query group properties:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.


3. In the navigation pane, select the query group that you want to configure.
4. For each property that you want to edit, click the Edit button next to it, and
specify the new value.
5. Click the Save button to save your changes.

Deleting Query Groups

You can delete an empty query group but not a query group that contains queries.

To delete a query group:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the navigation pane, right-click the empty query group that you want to delete,
and choose Delete Query Group.
The Queries tool prompts you to confirm deletion.
4. Click Yes.

Configuring Queries
This section describes how to configure queries.

Adding Queries

To add a query:
1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the query group to which you want to add the query.
4. Right-click in the Queries pane and choose New Query.


The Queries tool displays the New Query Wizard.


5. If you see a Welcome screen, click Next.

6. Specify the following query properties:

Property               Description
Query name             Descriptive name for this query.
Description            Optional description of this query.
Query Group            Select the query group to which this query belongs.
Select primary table   Primary table from which this query retrieves data.

7. Do one of the following:


• If you want the query to retrieve all columns and all records from the primary
table, click Finish to complete the process of creating the query.
• If you want to specify selection criteria, click Next and continue.


The Queries tool displays the Select query columns window.

8. Select the query columns from which you want the query to retrieve data.
Note: PUT-enabled packages require the Rowid Object column in the query.
9. Click Finish.
The Queries tool adds the new query to the tree.
10. Refine the query criteria by proceeding to the instructions in “Editing Query
Properties” on page 168.

Editing Query Properties

Once you have created a query, you can modify its properties to refine the criteria it
uses to retrieve data from the ORS.


To modify the query properties:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the navigation tree, select the query that you want to modify.
The current query properties are displayed in the properties pane.

The properties pane displays the following set of tabs:

Tab          Description
Tables       Tables associated with this query. Corresponds to the SQL FROM clause.
             For more information, see “Configuring the Table(s) in a Query” on page
             170.
Select       Columns associated with this query. Corresponds to the SQL SELECT
             clause. For more information, see “Configuring the Column(s) in a
             Query” on page 174.
Conditions   Conditions associated with this query. Determines selection criteria for
             individual records. Corresponds to the SQL WHERE clause. For more
             information, see “Configuring Conditions for Selecting Records of Data”
             on page 178.
Sort         Sort order for the results of this query. Corresponds to the SQL ORDER
             BY clause. For more information, see “Specifying the Sort Order for
             Query Results” on page 183.
Grouping     Grouping for the results of this query. Corresponds to the SQL GROUP
             BY clause. For more information, see “Specifying the Grouping for
             Query Results” on page 186.
SQL          Displays the SQL associated with the selected query settings. For more
             information, see “Viewing the SQL for a Query” on page 190.

4. Make the changes you want.


5. Click the Save button.
The Queries tool validates your query settings and prompts you if it finds errors.

Configuring the Table(s) in a Query

The Tables tab displays the table(s) from which the query will retrieve information.
The information in this tab corresponds to the SQL FROM clause.

Adding a Table to a Query

To add a table to a query:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Tables tab.
4. Click the button to add a table.


The Queries tool prompts you to select the table you want to add.

5. Select a table and then click OK.


If one or more other tables exist on the Tables tab, the Queries tool might prompt
you to select a foreign key relationship between the table you just added and
another table, as shown in the following example.

6. If prompted, select a foreign key relationship (if you want), and then click OK.


The Queries tool displays the added table in the Tables tab.

For multiple tables, the Queries tool displays all added tables in the Tables tab.

[Figure: Tables tab, with the foreign key relationship and the join type labeled]

If you specified a foreign key between tables, the corresponding key columns are
linked. Also, if tables are linked by foreign key relationships, then the Queries tool
allows you to select the type of join for this query.

7. Click the Save button.
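
The choice of join type changes which records the query returns. The following sketch contrasts an inner join with a left outer join across a foreign-key relationship, assuming Python's built-in sqlite3 module as a stand-in database. The tables and columns are hypothetical, and the join types that the Queries tool actually offers may differ from the two shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE consumer (rowid_object TEXT PRIMARY KEY, last_name TEXT);
    CREATE TABLE address (
        street      TEXT,
        consumer_fk TEXT REFERENCES consumer (rowid_object)
    );
    INSERT INTO consumer VALUES ('1', 'Johnson'), ('2', 'Vallejo');
    INSERT INTO address VALUES ('12 Main St', '1');   -- Vallejo has no address
""")

def joined(join_type):
    """Join consumer to address on the foreign key, using the given join type."""
    sql = f"""
        SELECT c.last_name, a.street
        FROM consumer c {join_type} JOIN address a ON a.consumer_fk = c.rowid_object
        ORDER BY c.last_name
    """
    return conn.execute(sql).fetchall()

inner_rows = joined("INNER")        # only consumers that have an address
outer_rows = joined("LEFT OUTER")   # all consumers, with NULL where no address exists
print(inner_rows)
print(outer_rows)
```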

Deleting a Table from a Query

A query must have multiple tables in order for you to remove a table. You cannot
remove the last table in a query.

To remove a table from a query:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Tables tab.
4. Select the table that you want to delete.
5. Click the button to remove the table.
The Queries tool removes the selected table from the query.
6. Click the Save button.


Configuring the Column(s) in a Query

The Select tab displays the list of column(s) in one or more source tables from which
the query will retrieve information, as shown in the following example.
The information in this tab corresponds to the SQL SELECT clause.

Adding Table Column(s) to a Query

To add a table column to a query:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Select tab.
4. Click the button to add a column.


The Queries tool prompts you to select from a list of one or more tables.

5. Expand the list for the table containing the column that you want to add.
The Queries tool displays the list of columns for the selected table.

6. Select the column(s) you want to include in the query.


7. Click OK.


The Queries tool adds the selected column(s) to the list of columns on the Select
tab.
8. Click the Save button.

Removing Table Column(s) from a Query

To remove a table column from the query:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Select tab.
4. Select one or more column(s) that you want to remove.
5. Click the button to remove the selected column(s).
The Queries tool removes the selected column(s) from the query.
6. Click the Save button.

Changing the Column Order

To change the order in which the columns will appear in the result set (if the list
contains multiple columns):
1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Select tab.
4. Select one column that you want to move.
5. Do one of the following:
• To move the selected column up the list, click the button.
• To move the selected column down the list, click the button.
The Queries tool moves the selected column up or down.


6. Click the Save button.

Adding Functions

You can add aggregate functions to your queries (such as COUNT, MIN, or MAX).
At run time, these aggregate functions appear in the usual syntax for the SQL
statement used to execute the query—such as:
select col1, count(col2) as c1 from table_name group by col1
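
To see what such a statement returns, the following sketch runs it against a small sample table, assuming Python's built-in sqlite3 module as a stand-in database. The sample rows are hypothetical, and an ORDER BY clause is added here only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_name (col1 TEXT, col2 TEXT);
    INSERT INTO table_name VALUES ('a', 'x'), ('a', 'y'), ('a', NULL), ('b', 'z');
""")

# COUNT(col2) counts only the non-NULL values of col2 within each col1 group.
rows = conn.execute(
    "select col1, count(col2) as c1 from table_name group by col1 order by col1"
).fetchall()
print(rows)
```

Note that every column in the select list that is not wrapped in an aggregate function (col1 here) also appears in the GROUP BY clause.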

To add a function to a table column:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Select tab.
4. Click the button to add a function.
The Queries tool prompts you to select the function you want to add.

5. If you want, select a different column.


6. Select the function that you want to use on the selected column.
7. Click OK.
8. Click the Save button.


Adding Constants

To add a constant to a table column:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Select tab.
4. Click the button to add a constant.
The Queries tool prompts you to specify the constant that you want to add.

5. Select the data type from the list.


6. Enter a value that is compatible with the selected data type.
7. Click OK.
8. Click the Save button.
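
In SQL terms, a constant in the select list is a literal value that is returned unchanged with every record. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in database; the table, the constants, and their aliases (source_system, is_active) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c_consumer (rowid_object TEXT, last_name TEXT);
    INSERT INTO c_consumer VALUES ('1', 'Johnson'), ('2', 'Major');
""")

# A string constant and a numeric constant alongside two table columns.
cursor = conn.execute(
    "SELECT rowid_object, last_name, 'CRM' AS source_system, 1 AS is_active "
    "FROM c_consumer ORDER BY rowid_object"
)
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(column_names)
print(rows)
```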

Configuring Conditions for Selecting Records of Data

The Conditions tab displays a list of condition(s) that the query will use to select
records from the table. A comparison is a query condition that involves one column, one
operator, and either another column or a constant value. The information in this tab
corresponds to the SQL WHERE clause.

Operators

For an operator, you can select one of the following values.

Operator      Description
=             Equals.
<>            Does not equal.
IS NULL       Value in the comparison column is null.
IS NOT NULL   Value in the comparison column is not null.
LIKE          Value in the comparison column must be like the search value (includes
              column values that match the search value). For example, if the search
              value is %JO% for the last_name column, then the condition matches
              column values such as “Johnson”, “Vallejo”, and “Major”.
NOT LIKE      Value in the comparison column must not be like the search value
              (excludes column values that match the search value). For example, if
              the search value is %JO% for the last_name column, then the condition
              omits column values such as “Johnson”, “Vallejo”, and “Major”.
<             Less than.
<=            Less than or equal to.
>             Greater than.
>=            Greater than or equal to.

Adding a Comparison

To add a comparison to this query:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Conditions tab.
4. Click the button to add a comparison.
The Queries tool prompts you to add a comparison.

5. If you want, select a different column.


6. Select the operator that you want to use on the selected column.
7. Select the type of comparison (Constant or Column).
• If you select Column, then select a column from the Edit Column drop-down
list.
• If you select Constant, then click the button, specify the constant that
you want to add, and then click OK.

8. Click OK.
The Queries tool adds the comparison to the list on the Conditions tab.
9. Click the Save button.
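
The behavior of the LIKE, NOT LIKE, and IS NULL operators can be checked with a few sample records. The following sketch assumes Python's built-in sqlite3 module as a stand-in database and a hypothetical table. SQLite's LIKE is case-insensitive for ASCII characters by default, which is why %JO% also matches “Vallejo” and “Major” here; case sensitivity on other database platforms may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c_consumer (last_name TEXT, middle_name TEXT);
    INSERT INTO c_consumer VALUES
        ('Johnson', 'Lee'), ('Vallejo', NULL), ('Major', 'Ann'), ('Smith', NULL);
""")

def last_names(where_clause, params=()):
    """Return the sorted last names of the records that satisfy one comparison."""
    sql = f"SELECT last_name FROM c_consumer WHERE {where_clause} ORDER BY last_name"
    return [row[0] for row in conn.execute(sql, params)]

# Comparisons against a constant (bound as a parameter) and a null test.
like_matches = last_names("last_name LIKE ?", ("%JO%",))
not_like_matches = last_names("last_name NOT LIKE ?", ("%JO%",))
null_matches = last_names("middle_name IS NULL")
print(like_matches, not_like_matches, null_matches)
```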

Editing a Comparison

To edit a comparison in this query:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Conditions tab.
4. Select the comparison that you want to edit.
5. Click the Edit button.


The Queries tool prompts you to edit the comparison.

6. Change the settings you want according to the instructions in “Adding a
Comparison” on page 180.
7. Click OK.
The Queries tool updates the comparison in the list on the Conditions tab.
8. Click the Save button.

Removing a Comparison

To remove a comparison from this query:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Conditions tab.
4. Select the comparison that you want to remove.
5. Click the button to remove the comparison.
The Queries tool removes the selected comparison from the query.
6. Click the Save button.


Specifying the Sort Order for Query Results

The Sort By tab displays a list of column(s) containing the values that the query will use
to sort the query results at run time. The information in this tab corresponds to the
SQL ORDER BY clause.

Selecting the Sort Columns

To select the sort columns in this query:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Sort tab.
4. Click the button to add a sort column.


The Queries tool prompts you to select sort columns.

5. Expand the list for the table containing the column(s) that you want to select for
sorting.
The Queries tool displays the list of columns for the selected table.

6. Select the column(s) you want to use for sorting.


7. Click OK.
The Queries tool adds the selected column(s) to the list of columns on the Sort By
tab.
8. Do one of the following:
• Enable (check) the Ascending check box to sort records in ascending order for
the specified column.
• Disable (uncheck) the Ascending check box to sort records in descending
order for the specified column.
9. Click the Save button.
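
The sort settings translate into an ORDER BY clause in which each column carries its own direction. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in database; the table and the sort_columns list are hypothetical stand-ins for the entries on the tab and the state of each Ascending check box.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c_consumer (last_name TEXT, city TEXT);
    INSERT INTO c_consumer VALUES
        ('Major', 'Austin'), ('Johnson', 'Boston'),
        ('Vallejo', 'Austin'), ('Adams', 'Boston');
""")

# (column, ascending) pairs, in the order they would appear in the sort list.
sort_columns = [("city", True), ("last_name", False)]
order_by = ", ".join(f"{col} {'ASC' if asc else 'DESC'}" for col, asc in sort_columns)

rows = conn.execute(
    f"SELECT city, last_name FROM c_consumer ORDER BY {order_by}"
).fetchall()
print(order_by)
print(rows)
```

The first column in the list is the primary sort key; later columns only break ties, so the order of the list matters.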

Removing Table Column(s) from a Sort Order

To remove a table column from the sort by list:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Sort tab.
4. Select one or more column(s) that you want to remove.
5. Click the button to remove the selected column(s).
The Queries tool removes the selected column(s) from the sort by list.
6. Click the Save button.

Changing the Column Order

To change the order in which the sort columns are applied (if the list
contains multiple columns):
1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.


3. Click the Sort tab.


4. Select one column that you want to move.
5. Do one of the following:
• To move the selected column up the list, click the button.
• To move the selected column down the list, click the button.
The Queries tool moves the selected column up or down in the list.
6. Click the Save button.

Specifying the Grouping for Query Results

The Grouping tab displays a list of column(s) containing the values that the query will
use for grouping the query results at run time. The information in this tab corresponds
to the SQL GROUP BY clause.

Selecting the Grouping Columns

To select the grouping columns in this query:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.

2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Grouping tab.
4. Click the button to add a grouping column.
The Queries tool prompts you to select grouping columns.

5. Expand the list for the table containing the column(s) that you want to select for
grouping.


The Queries tool displays the list of columns for the selected table.

6. Select the column(s) you want to use for grouping.


7. Click OK.
The Queries tool adds the selected column(s) to the list of columns on the
Grouping tab.
8. Click the Save button.
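
Adding a second grouping column splits each group further. The following sketch compares grouping by one column with grouping by two, assuming Python's built-in sqlite3 module as a stand-in database and a hypothetical address table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (state TEXT, city TEXT);
    INSERT INTO address VALUES
        ('TX', 'Austin'), ('TX', 'Austin'), ('TX', 'Dallas'), ('MA', 'Boston');
""")

# One grouping column: one result row per state.
by_state = conn.execute(
    "SELECT state, COUNT(*) FROM address GROUP BY state ORDER BY state"
).fetchall()

# Two grouping columns: one result row per distinct (state, city) pair.
by_state_city = conn.execute(
    "SELECT state, city, COUNT(*) FROM address "
    "GROUP BY state, city ORDER BY state, city"
).fetchall()
print(by_state)
print(by_state_city)
```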


Removing Table Column(s) from a Grouping Order

To remove a table column from the grouping list:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Grouping tab.
4. Select one or more column(s) that you want to remove.
5. Click the button to remove the selected column(s).
The Queries tool removes the selected column(s) from the grouping list.
6. Click the Save button.

Changing the Column Order

To change the order in which the columns will be grouped in the result set (if the list
contains multiple columns):
1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Grouping tab.
4. Select one column that you want to move.
5. Do one of the following:
• To move the selected column up the list, click the button.
• To move the selected column down the list, click the button.
The Queries tool moves the selected column up or down in the list.
6. Click the Save button.


Viewing the SQL for a Query

The SQL tab displays the SQL statement that corresponds to the query options you
have specified for the selected query, as shown in the following example.
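
The correspondence between the tabs and the clauses of a SQL statement can be sketched as a small function that assembles a SELECT statement from settings shaped like the tabs. This is an illustration only: build_sql, its parameters, and the sample table and column names are hypothetical, and the sketch assumes that multiple conditions are combined with AND. The SQL that the Queries tool actually generates may be formatted differently.

```python
def build_sql(tables, select, conditions=(), grouping=(), sort=()):
    """Assemble a SELECT statement from settings that mirror the tabs."""
    parts = [
        "SELECT " + ", ".join(select),   # Select tab
        "FROM " + ", ".join(tables),     # Tables tab
    ]
    if conditions:                       # Conditions tab
        parts.append("WHERE " + " AND ".join(conditions))
    if grouping:                         # Grouping tab
        parts.append("GROUP BY " + ", ".join(grouping))
    if sort:                             # Sort tab: (column, ascending) pairs
        parts.append("ORDER BY " + ", ".join(
            f"{col} {'ASC' if asc else 'DESC'}" for col, asc in sort))
    return "\n".join(parts)

sql = build_sql(
    tables=["C_CONSUMER"],
    select=["CITY", "COUNT(ROWID_OBJECT) AS C1"],
    conditions=["LAST_NAME LIKE '%JO%'"],
    grouping=["CITY"],
    sort=[("CITY", True)],
)
print(sql)
```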

Configuring Custom Queries


This section describes how to configure custom queries in the Queries tool.

About Custom Queries

A custom query is simply a query for which you supply the SQL statement directly, rather
than building it according to the instructions in “Configuring Queries” on page 166.
Custom queries can be used in packages and in the data steward tools.

Adding Custom Queries

To add a custom query:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.


3. Select the query group to which you want to add the query.
4. Right-click in the Queries pane and choose New Custom Query.
The Queries tool displays the New Custom Query Wizard.
5. If you see a Welcome screen, click Next.

6. Specify the following custom query properties:

Property Description
Query name Descriptive name for this query.
Description Optional description of this query.
Query Group Select the query group to which this query belongs.

7. Click Finish.

The Queries tool displays the newly-added custom query.

8. Click the Edit button next to the SQL field.


9. Enter the SQL query according to the syntax rules for your database platform.
10. Click the Save button.
If an error occurs when the query is submitted to the database, then the Queries
tool displays the database error message, as shown in the following example.

Fix any errors and save your changes.
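
As a rough illustration of steps 9 and 10, the following sketch submits a custom SQL statement to a database and reports the database's own error message when the statement is invalid. It is only an analogy: it uses Python's built-in sqlite3 module as a stand-in database, and the table C_PARTY and the function try_custom_query are hypothetical names, not part of Siperian Hub.

```python
import sqlite3


def try_custom_query(conn, sql):
    """Run a custom SQL statement; return (rows, None) or (None, error message)."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        # Report the database's own message so the author can fix the SQL.
        return None, str(exc)


# Stand-in database with one hypothetical base object table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE C_PARTY (ROWID_OBJECT TEXT, LAST_NAME TEXT)")
conn.execute("INSERT INTO C_PARTY VALUES ('1', 'Smith')")

# A valid custom query, followed by one that misspells the table name.
rows, error = try_custom_query(conn, "SELECT LAST_NAME FROM C_PARTY")
bad_rows, bad_error = try_custom_query(conn, "SELECT LAST_NAME FROM C_PARTYY")
```

In this sketch the second call returns no rows and an error message naming the unknown table, which corresponds to the situation in which you fix the SQL and save again.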

Editing a Custom Query

Once you have created a custom query, you can modify its properties to refine the
criteria it uses to retrieve data from the ORS.

To modify the custom query properties:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the navigation tree, select the custom query that you want to modify.
4. Edit the property settings that you want to change, clicking the Edit button
next to the field if applicable.
5. Click the Save button.
The Queries tool validates your query settings and prompts you if it finds errors.

Deleting a Custom Query

You delete a custom query in the same way in which you delete a regular query.
For more information, see “Removing Queries” on page 195.

Viewing the Results of Your Query


To view the results of your query:
1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. In the navigation tree, expand the query for which you want to view the results.
3. Click View.

The Queries tool displays the results of your query, as shown in the following
example.

Viewing the Query Impact Analysis


The Queries tool allows you to view the packages based on a given query, along with
any tables and columns used by the query.

To view the impact analysis of a query:


1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Expand the query group associated with the query you want to select.
3. Right-click the query and choose Impact Analysis from the pop-up menu.

4. The Queries tool displays the Impact Analysis dialog.

5. Expand the list next to a table to display the columns associated with the query, if
you want.
6. Click Close.

Removing Queries
If a query has multiple packages based on it, remove those packages before
attempting to remove the query.

To remove a query:
1. In the Hub Console, start the Queries tool according to the instructions in
“Starting the Queries Tool” on page 164.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Expand the query group associated with the query you want to remove.
4. Select the query you want to remove.
5. Right-click the query and choose Delete Query from the pop-up menu.
The Queries tool prompts you to confirm deletion.
6. Click Yes.
The Queries tool removes the query from the list.

Configuring Packages
This section describes how to create and modify PUT and display packages. You use
the Packages tool in the Hub Console to define packages.

About Packages
A package is a public view of one or more underlying tables in Siperian Hub. Packages
represent subsets of the columns in those tables, along with any other tables that are
joined to the tables. A package is based on a query. The underlying query can select a
subset of records from the table or from another package. For more information, see
“Configuring Queries” on page 162.

What Packages Are Used For

Packages are used for:


• defining user views of the underlying data
• updating data via the Hub Console or applications that invoke Services Integration
Framework (SIF) requests. Some, but not all, SIF requests use packages.
For more information, see the Siperian Services Integration Framework Guide.

How Packages Are Used

Packages are used in the following ways:


• The Siperian Hub security model uses packages to control access to data for
third-party applications that access Siperian Hub functionality and resources using
the Services Integration Framework (SIF). To learn more, see “About Setting Up
Security” on page 832 and the Siperian Services Integration Framework Guide.
• The Merge Manager and Data Manager tools use packages to determine the ways
in which data stewards can view data. For more information, see the Siperian Hub
Data Steward Guide.
• Hierarchy Manager uses packages. For more information, see Chapter 8,
“Configuring Hierarchies,” and “Using the Hierarchy Manager” in the Siperian Hub
Data Steward Guide.

Packages and SECURE Resources

Packages are configured as either SECURE or PRIVATE resources. For more
information, see “Securing Siperian Hub Resources” on page 841.

When to Create a Package

You must create a package if you want your Siperian Hub implementation to:
• Merge and update records in the Hub Store using the Merge Manager and Data
Manager tools. For more information, see the Siperian Hub Data Steward Guide.
• Allow an external application user to access Siperian Hub functionality using
Services Integration Framework (SIF) requests. For more information, see the
Siperian Services Integration Framework Guide.

In most cases, you create one set of packages for the Merge Manager and Data
Manager tools, and a different set of packages for external application users.

PUT-Enabled and Display Packages

There are two types of packages:


• PUT-enabled packages can be used to update data.
• Display packages cannot be used to update data.

You must use PUT-enabled packages when you:


• execute the SIF put request, which inserts or updates records
• use the Merge Manager and Data Manager tools

PUT-enabled packages:
• cannot include joins to other tables
• cannot be based on system tables or other packages
• cannot be based on queries that have constant columns, aggregate functions, or
group by settings
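
The restrictions listed above can be read as a checklist. The following sketch expresses that checklist in Python; it is illustrative only, and the QuerySummary class, its field names, and the put_restrictions_broken function are hypothetical rather than part of any Siperian Hub API.

```python
from dataclasses import dataclass


@dataclass
class QuerySummary:
    """Hypothetical summary of the query that a package is based on."""
    joins_other_tables: bool = False
    based_on_system_table: bool = False
    based_on_package: bool = False
    has_constant_columns: bool = False
    has_aggregate_functions: bool = False
    has_group_by: bool = False


def put_restrictions_broken(query):
    """Return the PUT-enabled restrictions that this query would break."""
    checks = [
        (query.joins_other_tables, "includes joins to other tables"),
        (query.based_on_system_table, "is based on a system table"),
        (query.based_on_package, "is based on another package"),
        (query.has_constant_columns, "has constant columns"),
        (query.has_aggregate_functions, "has aggregate functions"),
        (query.has_group_by, "has group by settings"),
    ]
    return [reason for broken, reason in checks if broken]


simple = QuerySummary()
grouped_join = QuerySummary(joins_other_tables=True, has_group_by=True)
```

An empty result means that the query breaks none of the listed restrictions; any other result suggests that the package should be a display package instead.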

Note: In the Merge Manager Setup screen, a PUT-enabled package is referred to as a
merge package. The Merge Manager also allows you to choose a display package.

Starting the Packages Tool


To start the Packages tool:
1. Select the Packages tool in the Model workbench.

The Packages tool is displayed.


2. Select a package in the list.
The Packages tool displays properties for the selected package.

The Packages tool is divided into two panes:

Pane Description
navigation pane Displays a hierarchical list of configured packages.
properties pane Displays the properties of the selected package.

Adding Packages
To add a new package:
1. In the Hub Console, start the Packages tool according to the instructions in
“Starting the Packages Tool” on page 198.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Right-click in the Packages pane and choose New Package.
The Packages tool displays the New Package Wizard.
Note: If the welcome screen is displayed, click Next.

4. Specify the following information.

Field Description
Display Name Name of this package as it will be displayed in the Hub Console.
Physical Name Actual name of the package in the database. Siperian Hub will
suggest a physical name for the package based on the display
name that you enter.
Description Description of this package.
Enable PUT Check (select) to create a PUT-enabled package, which can insert or
update records in base object tables.
Note: Every package that you use for merging data or updating
data must be PUT-enabled.
If you do not enable PUT, you create a display (read-only)
package.
Secure Resource Check (enable) to make this package a secure resource, which
allows you to control access to this package. Once a package is
designated as a secure resource, you can assign privileges to it in
the Roles tool. For more information, see “Securing Siperian
Hub Resources” on page 841, and “Assigning Resource
Privileges to Roles” on page 859.

5. Click Next.

The New Package Wizard displays the Select Query dialog.

6. If you want, click New Query Group to add a new query group, as described in
“Configuring Query Groups” on page 164.
7. If you want, click New Query to add a new query, as described in “Configuring
Queries” on page 166.
8. Select a query.
Note: For PUT-enabled packages:
• only queries with ROWID_OBJECT can be used
• custom queries cannot be used
9. Click Finish.
The Packages tool adds the new package to the list.

Modifying Package Properties


To edit the package properties:
1. In the Hub Console, start the Packages tool according to the instructions in
“Starting the Packages Tool” on page 198.

2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the package to configure.
4. In the properties pane, change any of the package properties that have an edit
button to the right.
5. If you want, expand the package in the packages list.
6. To change the query, select Query beneath the package and modify the query as
described in “Editing Query Properties” on page 168.

7. To display the package view, select View beneath the package.

Refreshing Packages After Changing Queries


If a query has been changed, then any packages based on that query must be refreshed.

To refresh a package:
1. In the Hub Console, start the Packages tool according to the instructions in
“Starting the Packages Tool” on page 198.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the package that you want to refresh.

4. From the Packages menu, choose Refresh.

Note: If the query remains out of sync with the package after a refresh, then
check (select) or uncheck (clear) any columns for this query. For more information, see
“Configuring the Column(s) in a Query” on page 174.

Specifying Join Queries


You can allow data stewards to view base object information, along with
information from other tables, in the Data Manager or Merge Manager.

To expose this information:


1. Create a PUT-enabled base object package.

2. Create a query to join the PUT-enabled base object package with the other tables.
3. Create a display package based on the query you just created.
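
The three steps above can be pictured with ordinary SQL views. The following sketch is an analogy only: in Siperian Hub you create packages and queries in the Packages and Queries tools, not with hand-written DDL. It uses Python's built-in sqlite3 module, and all table, column, and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE C_PARTY (ROWID_OBJECT TEXT PRIMARY KEY, LAST_NAME TEXT);
    CREATE TABLE C_ADDRESS (ROWID_OBJECT TEXT PRIMARY KEY, PARTY_ID TEXT, CITY TEXT);
    INSERT INTO C_PARTY VALUES ('1', 'Smith');
    INSERT INTO C_ADDRESS VALUES ('10', '1', 'Boston');

    -- Step 1 analogue: a single-table view standing in for the PUT-enabled package.
    CREATE VIEW PKG_PARTY_PUT AS
        SELECT ROWID_OBJECT, LAST_NAME FROM C_PARTY;

    -- Steps 2 and 3 analogue: a join query exposed as a read-only display view.
    CREATE VIEW PKG_PARTY_DISPLAY AS
        SELECT p.ROWID_OBJECT, p.LAST_NAME, a.CITY
        FROM PKG_PARTY_PUT p
        JOIN C_ADDRESS a ON a.PARTY_ID = p.ROWID_OBJECT;
""")

# The display view shows base object columns together with the joined columns.
display_rows = conn.execute("SELECT * FROM PKG_PARTY_DISPLAY").fetchall()
```

The point of the split is that updates go through the single-table package, while the joined view is used only for display.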

Removing Packages
To remove a package:
1. In the Hub Console, start the Packages tool according to the instructions in
“Starting the Packages Tool” on page 198.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the package to remove.
4. Right-click the package and choose Delete Package.
The Packages tool prompts you to confirm deletion.
5. Click Yes.
The Packages tool removes the package from the list.

7
State Management

This chapter describes how to configure state management in your Siperian Hub
implementation.

Chapter Contents
• Before You Begin
• About State Management in Siperian Hub
• State Transition Rules for State Management
• Configuring State Management for Base Objects
• Modifying the State of Records
• Rules for Loading Data

Before You Begin


Before you begin to use state management, you must have:
• installed Siperian Hub and created the Hub Store according to the instructions in
Siperian Hub Installation Guide for your platform
• built a schema; for more information, see “About the Schema” on page 82.

About State Management in Siperian Hub


Siperian Hub supports workflow tools by storing pre-defined system states for base
object and XREF records. By enabling state management on your data, Siperian Hub
offers the following additional flexibility:
• Allows integration with workflow integration processes and tools
• Supports a “change approval” process
• Tracks intermediate stages of the process (pending records)

About System States


System state describes how base object records are supported by Siperian Hub. The
following table describes the supported system states:

State Description
ACTIVE Default state. Record has been reviewed and approved. Active
records participate in Hub processes by default.
This is a state associated with a base object or cross reference record.
A base object record is active if at least one of its cross reference
records is active. A cross reference record contributes to the
consolidated base object only if it is active.
These are the records that are available to participate in any
operation. If records are required to go through an approval process,
then these records have been through that process and have been
approved.
Note that Siperian Hub allows matches to and from PENDING and
ACTIVE records.

PENDING Pending records are records that have not yet been approved for
general usage in the Hub. These records can have most operations
performed on them, but operations have to specifically request
pending records. If records are required to go through an approval
process, then these records have not yet been approved and are in the
midst of an approval process.
If there are only pending XREF records, then the Best Version of the
Truth (BVT) on the base object is determined through trust on the
PENDING records.
Note that Siperian Hub allows matches to and from PENDING and
ACTIVE records.
DELETED Deleted records are records that are no longer desired to be part of
the Hub’s data. These records are not used in processes (unless
specifically requested). Records can only be deleted explicitly and
once deleted can be restored if desired. When a record that is
pending is deleted, it is physically deleted, does not enter the
DELETED state, and cannot be restored.
In order for a record to be deleted, it must be in either the ACTIVE
state for soft delete or the PENDING state for hard delete.
Note that Siperian Hub does not include records in the DELETED
state for trust and validation rules.

About the Hub State Indicator


All base objects and cross-reference tables have a system column, HUB_STATE_IND,
that indicates the system state for records in those tables. This column contains the
following values associated with system states:

System State Value


ACTIVE (Default) 1
PENDING 0
DELETED -1
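
When you read HUB_STATE_IND values in your own scripts or reports, it can help to give the numbers names. The following is a minimal sketch using the values from the table above; the HubState class and the describe_state function are hypothetical helper names, not part of Siperian Hub.

```python
from enum import IntEnum


class HubState(IntEnum):
    """System states and their HUB_STATE_IND values, as listed in the table."""
    ACTIVE = 1
    PENDING = 0
    DELETED = -1


def describe_state(hub_state_ind):
    """Translate a HUB_STATE_IND value into the name of its system state."""
    return HubState(hub_state_ind).name


pending_name = describe_state(0)
```

Passing a value that is not in the table raises a ValueError, which makes unexpected column values easy to spot.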

Protecting Pending Records Using the Interaction ID


You cannot use the tools in the Hub Console to change the state of a base object or
XREF record from PENDING to ACTIVE if the Interaction ID is set. Use one of the
state management SIF API requests instead. The Interaction ID column protects a
pending XREF record from updates that are not part of the same process as the
original XREF record. For more information, see the Siperian Services Integration
Framework Guide.

Note: The Interaction ID can be specified through any API; however, it cannot be
specified when performing batch processing. For example, records that are protected
by an Interaction ID cannot be updated by the Load batch process.

The protection provided by interaction IDs is outlined in the following table. Note that
in the following table the Version A and Version B examples are used to represent the
situations where the incoming and existing interaction ID do and do not match:

                          Existing Interaction ID
Incoming Interaction ID   Version A   Version B   Null
Version A                 OK          Error       OK
Version B                 Error       OK          OK
Null                      Error       Error       OK
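
The table reduces to one rule: an update is accepted when the existing record has no Interaction ID, or when the incoming and existing Interaction IDs match. The following sketch states that rule in Python; the function name is hypothetical, and None stands for Null.

```python
def interaction_id_allows_update(incoming, existing):
    """Return True for the OK cells of the table and False for the Error cells."""
    # An unprotected record (no existing ID) accepts any update;
    # a protected record accepts only updates that carry the same ID.
    return existing is None or incoming == existing


same_process = interaction_id_allows_update("Version A", "Version A")
other_process = interaction_id_allows_update("Version B", "Version A")
```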

State Transition Rules for State Management


This section describes transition rules for state management.

About State Transition Rules

State transition rules determine whether and when a record can change from one state to
another. State transition for base object and XREF records can be enabled using the
following methods:
• Using the Data Manager or Merge Manager tools in the Hub Console; for more
information, see Siperian Hub Data Steward Guide.

• Promote batch job; for more information, see “Promote Jobs” on page 741.
• SiperianClient API; for more information, see Siperian Services Integration Framework
Guide.

State transition rules differ for base object and cross-reference records.

Transition Rules for Base Object Records


State Description
ACTIVE • Can transition to DELETED state.
• Can transition to PENDING state only if the base object record
becomes DELETED and a pending XREF record is added.
PENDING • Can transition to ACTIVE state. This transition is called
promotion. To learn more, see “Modifying the State of Records”
on page 216.
• Cannot transition to DELETED state. Instead, a PENDING
record is physically removed from the Hub.
DELETED • Can transition to ACTIVE state only if XREF records are
restored.
• Cannot transition to PENDING state.
• Note: In order for a record to be deleted, it must be in either the
ACTIVE state for soft delete or the PENDING state for hard
delete.

Transition Rules for Cross-reference (XREF) Records


State Description
ACTIVE • Can transition to DELETED state.
• Cannot transition to PENDING state.
PENDING • Can transition to ACTIVE state. This transition is called
promotion. To learn more, see “Modifying the State of Records”
on page 216.
• Cannot transition to DELETED state. Instead, a PENDING
record is physically removed from the Hub.
DELETED • Can transition to ACTIVE state. This transition is called restore.
• Cannot transition to PENDING state.
• Note: In order for a record to be deleted, it must be in either the
ACTIVE state for soft delete or the PENDING state for hard
delete.
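
The XREF transition rules above can be written as a small lookup table. The following sketch covers XREF records only; the conditional base object transitions (for example, ACTIVE to PENDING) depend on the state of the related XREF records and are not modeled here. All names are hypothetical.

```python
# Allowed state changes for XREF records, taken from the table above.
XREF_TRANSITIONS = {
    "ACTIVE": {"DELETED"},   # soft delete
    "PENDING": {"ACTIVE"},   # promotion
    "DELETED": {"ACTIVE"},   # restore
}


def can_transition_xref(current, target):
    """Return True if an XREF record may move from current to target."""
    return target in XREF_TRANSITIONS[current]


def delete_kind(current):
    """Return how a delete request is carried out for a record in this state."""
    if current == "ACTIVE":
        return "soft delete"   # the record enters the DELETED state
    if current == "PENDING":
        return "hard delete"   # the record is physically removed
    raise ValueError("a record must be ACTIVE or PENDING to be deleted")


promotion_allowed = can_transition_xref("PENDING", "ACTIVE")
```

Note that a PENDING record has no entry for DELETED: deleting it removes the record rather than changing its state.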

Hub States and Base Object Record Value Survivorship


When there are active and pending (or deleted) cross-references together in the same
base object record, whether after a merge, put, or load, the values on the base object
record reflect only the values from the active cross-reference records. As such:
• ACTIVE values always prevail over PENDING and DELETED values.
• PENDING values always prevail over DELETED values.
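
One way to picture this precedence is as a filter that keeps only the cross-references in the highest-ranking state that is present. The following sketch assumes each cross-reference is a (state, value) pair; it does not model how trust then chooses among the remaining values, and the names are hypothetical.

```python
# Precedence from the rules above: ACTIVE over PENDING over DELETED.
PRECEDENCE = ("ACTIVE", "PENDING", "DELETED")


def contributing_xrefs(xrefs):
    """Keep only the cross-references in the highest-precedence state present."""
    for state in PRECEDENCE:
        winners = [xref for xref in xrefs if xref[0] == state]
        if winners:
            return winners
    return []


mixed = [("PENDING", "Smyth"), ("ACTIVE", "Smith"), ("DELETED", "Smithe")]
survivors = contributing_xrefs(mixed)
```

With at least one active cross-reference, only active values survive; with none, the pending values are used, which matches the description of PENDING records earlier in this chapter.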

Configuring State Management for Base Objects


You can configure state management for base objects using the Schema tool. How you
configure the base object depends on your focus. Once you enable state management
for a base object, you can also configure the following options for the base object:
• Enable the history of cross-reference promotion; for more information, see
“Enabling the History of Cross-Reference Promotion” on page 213
• Include pending records in the match process; for more information, see
“Enabling Match on Pending Records” on page 214
• Enable message queue triggers for a state-enabled base object record; for more
information, see “Enabling Message Queue Triggers for State Changes” on page
215

Enabling State Management


State management is configured per base object and is disabled by default—it must be
explicitly enabled.

To enable state management for a base object:


1. Open the Model workbench and click Schema.

2. In the Schema tool, select the desired base object.


3. Click the Enable State Management checkbox on the Advanced tab of the Base
Object properties.

Enable State Management Check box

Enabling the History of Cross-Reference Promotion


When the History of Cross-Reference Promotion option is enabled, the Hub creates
and stores history information in the _HXPR table for a base object each time an
XREF belonging to a record in this base object undergoes a state transition from
PENDING (0) to ACTIVE (1).
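
Conceptually, the option adds one history row for every promotion. The following sketch shows that idea with plain Python dictionaries; the row layout of the _HXPR table is not described in this section, so the ROWID_XREF and PROMOTED_AT keys, like the function name, are assumptions made for illustration.

```python
from datetime import datetime, timezone


def promote_xref(xref, history_enabled, hxpr_rows):
    """Promote a pending XREF and, if enabled, record the promotion."""
    if xref["HUB_STATE_IND"] != 0:
        raise ValueError("only PENDING (0) records can be promoted")
    xref["HUB_STATE_IND"] = 1  # PENDING (0) to ACTIVE (1)
    if history_enabled:
        # Hypothetical history row; the real _HXPR columns may differ.
        hxpr_rows.append({
            "ROWID_XREF": xref["ROWID_XREF"],
            "PROMOTED_AT": datetime.now(timezone.utc),
        })
    return xref


history = []
promoted = promote_xref({"ROWID_XREF": "42", "HUB_STATE_IND": 0}, True, history)
```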

To enable the history of cross-reference promotion for a base object:


1. Open the Model workbench and click on the Schema tool.

2. In the Schema tool, select the desired base object.


3. Click the Enable State Management checkbox on the Advanced tab of the Base
Object properties.
4. Click the History of Cross-Reference Promotion checkbox on the Advanced
tab of Base Object properties.

History of Cross-Reference Promotion Check box

Enabling Match on Pending Records


By default, the match process includes only active records and ignores pending records.
For base objects that have state management enabled, you must explicitly enable
match on pending records to include pending records in the match process.
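
The effect of the option can be sketched as a filter on HUB_STATE_IND. The following example is a simplification that assumes records are dictionaries with a HUB_STATE_IND key; the function name and its parameter are hypothetical.

```python
def match_candidates(records, match_on_pending=False):
    """Return the records that take part in matching."""
    # ACTIVE (1) records always take part; PENDING (0) records only on request.
    allowed = {1, 0} if match_on_pending else {1}
    return [record for record in records if record["HUB_STATE_IND"] in allowed]


records = [
    {"ROWID_OBJECT": "1", "HUB_STATE_IND": 1},
    {"ROWID_OBJECT": "2", "HUB_STATE_IND": 0},
    {"ROWID_OBJECT": "3", "HUB_STATE_IND": -1},
]
default_ids = [r["ROWID_OBJECT"] for r in match_candidates(records)]
pending_ids = [r["ROWID_OBJECT"] for r in match_candidates(records, match_on_pending=True)]
```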

To enable match on pending records for a base object:


1. Open the Model workbench and click on the Schema tool.

2. In the Schema tool, select the desired base object.


3. Click the Enable State Management checkbox on the Advanced tab of the Base
Object properties.
4. Select Match/Merge Setup for the base object.
5. Click the Enable Match on Pending Records checkbox on the Properties tab of
Match/Merge Setup Details panel.

Enable Match on Pending Records Check box

Enabling Message Queue Triggers for State Changes


Siperian Hub uses message triggers to identify which actions are communicated to
outside applications using messages in message queues. When an action occurs for
which a rule is defined, a message is placed in the message queue. A message trigger
specifies the queue in which messages are placed.

Siperian Hub enables you to trigger message events for base object records when a
pending update occurs. The following message triggers are available for state changes
to base object or XREF records:

Event Trigger Action


Add new pending data A new pending record is created.
Update existing pending data A pending base object record is updated.
Pending update; only XREF changed A pending XREF record is updated. This event
includes the promotion of a record.
Delete base object data A base object record is soft deleted.
Delete XREF data An XREF record is soft deleted.
Delete pending base object data A base object record is hard deleted.
Delete pending XREF data An XREF record is hard deleted.

To enable the message queue triggers on a pending update for a base object:
1. Open the Model workbench and click on Schema.

2. In the Schema tool, click the Trigger on Pending Updates checkbox for message
queues in the Message Queues tool.

To learn more about message queues and message triggers, including how to enable
message queue triggers for state changes to base object and XREF records, see
“Configuring Message Triggers” on page 612.

Modifying the State of Records


Promotion of a record is the process of changing the system state of individual records in
Siperian Hub from PENDING state to the ACTIVE state. You can set a record for
promotion immediately using the Data Steward tools, or you can flag records to be
promoted at a later time using the Promote batch process.
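
The difference between the two approaches is when the state changes: immediate promotion changes the state at once, while flagging leaves the state alone until the Promote batch process runs. The following sketch illustrates the deferred path; the flagged_for_promote key and both function names are hypothetical, and how Siperian Hub stores the flag is not described in this chapter.

```python
def flag_for_promote(record):
    """Mark a record for later promotion without changing its state."""
    record["flagged_for_promote"] = True


def run_promote_batch(records):
    """Promote every flagged PENDING record; return how many were promoted."""
    promoted = 0
    for record in records:
        if record.get("flagged_for_promote") and record["HUB_STATE_IND"] == 0:
            record["HUB_STATE_IND"] = 1  # PENDING to ACTIVE
            record["flagged_for_promote"] = False
            promoted += 1
    return promoted


flagged = {"ROWID_OBJECT": "1", "HUB_STATE_IND": 0}
unflagged = {"ROWID_OBJECT": "2", "HUB_STATE_IND": 0}
flag_for_promote(flagged)
state_after_flagging = flagged["HUB_STATE_IND"]
promoted_count = run_promote_batch([flagged, unflagged])
```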

Promoting Records in the Data Steward Tools


You can immediately promote PENDING base object or XREF records to ACTIVE
state using the tools in the Data Steward workbench (that is, the Data Manager or
Merge Manager). You can also flag these records for promotion at a later time using
either tool. To learn more about using the Hub Console to perform these tasks, see the
Siperian Hub Data Steward Guide.

Flagging Base Object or XREF Records for Promotion at a Later Time

To flag base object or XREF records for promotion at a later time using the Data
Manager:
1. Open the Data Steward workbench and click on the Data Manager tool.

2. In the Data Manager tool, click on the desired base object or XREF record.
3. Click on the Flag for Promote button on the associated panel.

Flag for Promote Buttons

Note: If HUB_STATE_IND is set to read-only for a package, the Set Record
State button is disabled (greyed-out) in the Data Manager and Merge Manager
Hub Console tools for the associated records. However, the Flag for Promote
button remains active because it doesn’t directly alter the HUB_STATE_IND
column for the record(s).
4. Run a batch job to promote records that are flagged for promotion. For more
information, see “Promoting Records Using the Promote Batch Job”.

Promoting Matched Records Using the Merge Manager

To promote matched records at a later time using the Merge Manager:


1. Open the Data Steward workbench and click on the Merge Manager tool.

2. In the Merge Manager tool, click on the desired matched record.


3. Click on the Flag for Promote button on the Matched Records panel.

You can now promote these PENDING XREF records using the Promote batch job.

Promoting Records Using the Promote Batch Job


You can run a batch job to promote records that are flagged for promotion using the
Batch Viewer or Batch Group tool.

Setting Up a Promote Batch Job Using the Batch Viewer

To set up a batch job using the Batch Viewer to promote records flagged for
promotion:
1. Flag the desired PENDING records for promotion.

For more information, see “Modifying the State of Records” on page 216.
2. Open the Utilities workbench and click on the Batch Viewer tool.
3. Click on the Promote batch job under the Base Object node displayed in the
Batch Viewer.
4. Select Promote flagged records abc.
Where abc represents the associated records that you have previously flagged for
promotion.
5. Click the Execute Batch button to promote the records flagged for promotion.

Setting Up a Promote Batch Job Using the Batch Group Tool

To add a Promote batch job using the Batch Group tool to promote records flagged
for promotion:
1. Flag the desired PENDING records for promotion.

For more information, see “Modifying the State of Records” on page 216.
2. Open the Utilities workbench and click on the Batch Group tool.

3. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
4. Right-click the Batch Groups node in the Batch Group tree and choose Add
Batch Group from the pop-up menu (or select Add Batch Group from the
Batch Group menu). For more information, see “Adding Batch Groups” on page
691.
5. In the batch groups tree, right-click any level, and choose the desired option to
add a new level to the batch group.
The Batch Group tool displays the Choose Jobs to Add to Batch Group dialog.
For more information, see “Adding Levels to a Batch Group” on page 694.
6. Expand the base object(s) for the job(s) that you want to add.

7. Select the Promote flagged records in [XREF table] job.


8. Click OK.

The Batch Group tool adds the selected job(s) to the batch group.

9. Click the Save button to save your changes.

You can now execute the batch group job. For more information, see “Executing a
Batch Group” on page 704.

Rules for Loading Data


The load batch process loads records in any state. The state is specified as an input
column on the staging table. The input state can be specified in the mapping as a
landing table column or it can be derived. If an input state is not specified in the
mapping, then the state is assumed to be ACTIVE (for Load inserts). When a record is
updated through a Load batch job and the incoming state is null, the existing state of
the record to update will remain unchanged.

The following table describes how input states affect the states of existing XREF
records.

Incoming      Existing XREF State
XREF State    ACTIVE            PENDING            DELETED                    No XREF (Load by rowid)   No Base Object Record
ACTIVE        Update            Update + Promote   Update + Restore           Insert                    Insert
PENDING       Pending Update    Pending Update     Pending Update + Restore   Pending Update            Pending Insert
DELETED       Soft Delete       Hard Delete        Hard Delete                Error                     Error
Undefined     Treat as ACTIVE   Treat as PENDING   Treat as DELETED           Treat as ACTIVE           Treat as ACTIVE
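
The table can be used as a two-key lookup: incoming state first, then existing state. The following sketch encodes the table as a Python dictionary. The keys NO_XREF and NO_BASE_OBJECT and both function names are hypothetical labels for this example. The Undefined row is handled by a separate function that only reports which state an undefined input is treated as, because the table does not say how that result combines with the other rows.

```python
# Outcome of a load, keyed by incoming state and then by existing state.
LOAD_ACTIONS = {
    "ACTIVE": {
        "ACTIVE": "Update", "PENDING": "Update + Promote",
        "DELETED": "Update + Restore",
        "NO_XREF": "Insert", "NO_BASE_OBJECT": "Insert",
    },
    "PENDING": {
        "ACTIVE": "Pending Update", "PENDING": "Pending Update",
        "DELETED": "Pending Update + Restore",
        "NO_XREF": "Pending Update", "NO_BASE_OBJECT": "Pending Insert",
    },
    "DELETED": {
        "ACTIVE": "Soft Delete", "PENDING": "Hard Delete",
        "DELETED": "Hard Delete",
        "NO_XREF": "Error", "NO_BASE_OBJECT": "Error",
    },
}


def load_action(incoming, existing):
    """Look up the outcome for a defined incoming state."""
    return LOAD_ACTIONS[incoming][existing]


def resolve_undefined_state(existing):
    """Return the state that an undefined incoming state is treated as."""
    # An existing XREF keeps its state; inserts default to ACTIVE.
    return existing if existing in ("ACTIVE", "PENDING", "DELETED") else "ACTIVE"


promote_on_load = load_action("ACTIVE", "PENDING")
```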

8
Configuring Hierarchies

This chapter explains how to configure Siperian Hierarchy Manager (HM) using the
Siperian Hierarchies tool in the Hub Console. The chapter describes how to set up
your data and how to configure the components needed by Hierarchy Manager for
your Siperian Hub implementation, including entity types, hierarchies, relationship
types, packages, and profiles. For instructions on using the Hierarchy Manager, see the
Siperian Hub Data Steward Guide. This chapter is recommended for Siperian Hub
administrators and implementers.

Chapter Contents
• About Configuring Hierarchies
• Starting the Hierarchies Tool
• Configuring Hierarchies
• Configuring Relationship Base Objects and Relationship Types
• Configuring Packages for Use by HM
• Configuring Profiles
• Sandboxes

About Configuring Hierarchies


Siperian Hub administrators use the Hierarchies tool to set up the structures required
to view and manipulate data relationships in Hierarchy Manager. Use the Hierarchies
tool to define Hierarchy Manager components (such as entity types, hierarchies,
relationship types, packages, and profiles) for your Siperian Hub implementation.

When you have finished defining Hierarchy Manager components, you can use the
Packages or Queries tools to update the query criteria.

To understand the concepts in this chapter, you must be familiar with the concepts in
the following chapters in this guide (Siperian Hub Administrator Guide):
• Chapter 5, “Building the Schema”
• Chapter 6, “Configuring Queries and Packages”
• Chapter 15, “Configuring the Consolidate Process”
• Chapter 20, “Setting Up Security”

Before You Begin


Before you begin to configure your Hierarchy Manager (HM) system, you must have
completed the following tasks:
• Start with a blank ORS or a valid ORS and register the database in CMX_SYSTEM, as
described in “Registering an ORS” on page 62.
• Verify that you have a license for Hierarchy Manager. For details, consult your
Siperian Hub administrator.
• Perform data analysis, as described in Preparing Your Data for Hierarchy Manager.

Overview of Configuration Steps


To configure Hierarchy Manager, complete the following steps:
1. Start the Hub Console, as described in “Starting the Hub Console” on page 19.

2. Launch the Hierarchies tool, as described in “Starting the Hierarchies Tool” on
page 234.
If you have not already created the Repository Base Object (RBO) tables, Hub
Console walks you through the process, as described in “Creating the HM
Repository Base Objects” on page 235.
3. Create entity objects and types, as described in “Configuring Entity Objects and
Entity Types” on page 240.
4. Create hierarchies, as described in “Configuring Hierarchies” on page 253.
5. Create relationship objects and types, as described in “Configuring Relationship
Base Objects and Relationship Types” on page 255.
6. Create packages, as described in “Configuring Packages for Use by HM” on page
269.
7. Configure profiles, as described in “Deleting Relationship Types from a Profile” on
page 284.
8. Validate the profile, as described in “Validating Profiles” on page 280.

Note: The same options you see on the right-click menu in the Hierarchy Manager are
also available on the Hierarchies menu.

Preparing Your Data for Hierarchy Manager


To make the best use of HM, you should analyze your information and make sure you
have done the following:
• Verified that your data sources are valid and trustworthy.
For more information on security issues, see Chapter 20, “Setting Up Security”.

• Created a valid schema to work with Siperian Hub and HM.
For more information on schemas and how to create them, see Chapter 5,
“Building the Schema”.
• Created all relationships between your entities, including:
• Hierarchical relationships:
• All child entities must have a valid parent entity related to them.
Your data cannot have any ‘orphan’ child entities when it enters HM.
• All hierarchies must be validated (see Chapter 9, “Siperian Hub
Processes”).
• Foreign key relationships.
For a general overview of foreign key relationships, see “Process Overview for
Defining Foreign-Key Relationships” on page 143. For more information
about parent-child relationships, see “Configuring Match Paths for Related
Records” on page 497.
• One-hop and multi-hop relationships (direct and indirect relationships
between entities). For more information on these kinds of relationships, see
the Siperian Hub Data Steward Guide.
• Derived HM types.
• Consolidated duplicate entities from multiple source systems.
For example, a group of entities (Source A) might be the same as another group of
entities (Source B), but the two groups of entities might have different group
names. Once the entities are identified as being identical, the two groups can be
consolidated.
For more information on consolidation, see Chapter 9, “Siperian Hub Processes”.
• Grouped your entities into logical categories, such as physicians’ names in the
“Physician” category.
For more information on how to group your data, see Chapter 4, “Configuring
Operational Record Stores and Datasources”.

• Made sure that your data complies with the rules for:
• Referential integrity.
• Invalid data.
• Data volatility.
For more information on these database concepts, see a database reference text.
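The rule that no child entity may arrive in HM as an orphan can be checked before the data is loaded. The following is a minimal sketch in Python, not part of Siperian Hub: it assumes the source data can be exported as a simple mapping of each entity to its parent, and the function name find_orphans is hypothetical. The sample names are borrowed from the use case later in this chapter.

```python
def find_orphans(parent_of):
    """Return the entities whose declared parent is not itself a known entity.

    parent_of maps an entity name to its parent's name, or to None for a
    top-level entity. This layout is an assumption made for illustration.
    """
    return sorted(
        child
        for child, parent in parent_of.items()
        if parent is not None and parent not in parent_of
    )


records = {
    "Mice + Pointers": None,             # top-level product group
    "Mice": "Mice + Pointers",
    "Laser Mouse": "Mice",
    "Gaming Keyboard": "Keyboards",      # "Keyboards" was never loaded
}

for orphan in find_orphans(records):
    print("Orphan child entity:", orphan)
```

Any entity the check reports needs a valid parent (or needs to be removed) before the data enters HM.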

Use Case Example of How to Prepare Data for HM


This section contains an example of how to manipulate your data before it enters
Siperian Hub and before it is viewed in Hierarchy Manager. Typically, a company’s data
would be much larger than the example given here.

Scenario

John has been tasked with manipulating his company’s data so that it can be viewed
and used within Hierarchy Manager in the most efficient way. To simplify the example,
we are describing a subset of the data that involves product types and products of the
company, which sells computer components.

The company sells three types of products: mice, trackballs, and keyboards. Each of
these product types includes several vendors and different levels of products, such as
the Gaming keyboard and the TrackMan trackball.

Methodology

This section describes the method of data simplification.

Step 1 - Organizing Data into the Hierarchy

In this step you organize the data into the Hierarchy that will then be translated into
the HM configuration.

John begins by analyzing the product and product group hierarchy. He organizes the
products by their product group and product groups by their parent product group.

The sheer volume of data and the relationships contained within the data are difficult
to visualize, so John lists the categories and sees if there are relationships between
them.

The following table (which contains data from the Marketing department) shows an
example of how John might organize his data.

Note: Most data sets will have many more items.

The table shows the data that will be stored in the Products BO. This is the BO to
convert (or create) in HM. The table shows Entities, such as Mice or Laser Mouse. The
relationships are shown by the grouping, that is, there is a relationship between Mice
and Laser Mouse. The heading values are the Entity Types: Mice is a Product Group
and Laser Mouse is a Product. This Type is stored in a field on the Product table.

Organizing the data in this manner allows John to clearly see how many entities and
entity types are part of the data, and what relationships those entities have.

The major category is ProdGroup, which can include a product group (such as
mice and pointers), the category Product, and the products themselves (such as the
TrackMan Wheel). The relationships between these items can be encapsulated in a
relationship object, which John calls Product Rel. In the information for the Product
Rel, John has explained the relationships: Product Group is the parent of both Product
and Product Group.

Step 2 - Creating Relationship Base Object Tables

Having analyzed the data, John comes to the following conclusions:


• Product (the BO) should be converted to an Entity Object.
• Product Group and Product are the Entity Types.
• Product Rel is the Relationship Object to be created.
• The following relationship types (not all shown in the table) need to be created:
• Product is the parent of Product (not shown).
• Product Group is the parent of Product (such as with the Mice to Laser
Mouse example).
• Product Group is the parent of Product Group (such as with Mice + Pointers
being the parent of Mice).
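John's conclusions can be written down as a small data model before anything is created in the Hierarchies tool. The sketch below is illustrative Python only; the class and function names are hypothetical, and it models just the entity types and relationship types listed above, not how Siperian Hub stores them.

```python
from dataclasses import dataclass

# Entity types identified in the analysis.
ENTITY_TYPES = {"Product Group", "Product"}


@dataclass(frozen=True)
class RelationshipType:
    parent_type: str
    child_type: str


# The three relationship types John decided to create.
RELATIONSHIP_TYPES = {
    RelationshipType("Product", "Product"),
    RelationshipType("Product Group", "Product"),
    RelationshipType("Product Group", "Product Group"),
}

# A few sample entities and the entity type of each.
ENTITIES = {
    "Mice + Pointers": "Product Group",
    "Mice": "Product Group",
    "Laser Mouse": "Product",
}


def is_allowed(parent, child):
    """True if a relationship type exists for this parent/child pairing."""
    rel = RelationshipType(ENTITIES[parent], ENTITIES[child])
    return rel in RELATIONSHIP_TYPES


print(is_allowed("Mice", "Laser Mouse"))   # Product Group over Product
print(is_allowed("Laser Mouse", "Mice"))   # Product over Product Group
```

Listing the pairings this way makes it easy to see that no relationship type places a Product above a Product Group.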

John begins by accessing the Hierarchies tool. When he accesses the tool, the system
creates the Repository Base Object tables (RBO tables). RBO tables are essentially
system base objects: required base objects that contain specific columns. They
store the HM configuration data, such as the data that you see in the table in Step 1.

The Siperian Hub Administrator Guide explains how to create base objects in detail. This
section describes the choices you would make when you create the example base
objects in the Schema tool.

You must create and configure a base object for each entity object and relationship
object that you identified in the previous step. In the example, you would create a base
object for Product and convert it to an HM Entity Object. The Product Rel BO should
be created in HM directly (an easier process) instead of converting. Each new base
object is displayed in the Schema panel under the category Base Objects. Repeat this
process to create all your base objects.

In the next section, you configure the base objects so that they are optimized for HM
use.

Step 3 - Configuring Base Objects

You created the two base objects (Product and Product Rel) in the previous section.
This section describes how to configure them.

Configuring a base object involves filling in the criteria for the object’s properties, such
as the number and type of columns, the content of the staging tables, the name of the
cross-reference tables (if any), and so on. You might also enable the history function,
set up validation rules and message triggers, create a custom index, and configure the
external match table (if any).

Whether or not you choose these options and how you configure them depends on
your data model and base object choices.

In the example, John configures his base objects as the following sections explain.

Note: Not all components of the base-object creation are addressed here, only the
ones that have specific significance for data that will be used in the HM. For more
information on the components not discussed here, see the Schema chapter in this
Guide.

Columns

This table shows the Product BO after conversion to an HM entity object. In this list,
only the Product Type field is an HM field.

Every base object has system columns and user-defined columns. System columns are
created automatically, and include the required column: Rowid Object. This is the
Primary key for each base object table and contains a unique, Hub-generated value.
This value cannot be null because it is the HM lookup for the class code. HM makes a
foreign key constraint in the database so a ROWID_OBJECT value is required and cannot
be null.

For the user-defined columns, John chose logical names that would effectively include
information about the products, such as Product Number, Product Type, and Product
Description. These same columns and column values must appear in the staging tables.

Staging Tables

John makes sure that all the user-defined columns from the staging tables are added as
columns in the base object, as the graphic above shows. The Lookup column shows
the HM-added lookup value.

Notice that several columns in the Staging Table (Status Cd, Product Type, and
Product Type Cd) have references to lookup tables. You can set these references up
when you create the Staging Table. You would use lookups if you do not want to
hardcode a value in your staging table, but would rather have the server look up a value
in the parent table.

Most of the lookups are unrelated to HM and are part of the data model. The Rbo Bo
Class lookup is the exception because it was added by HM. HM adds the lookup on the
Product Type column.

Note: When you are converting entities to entity base objects (entities that are
configured to be used in HM), you must have lookup tables to check the values for the
Status Cd, Product Type, and Product Type Cd.

Warning: HM Entity objects do not require start and end dates. Any start and end
dates would be user defined. However, Rel Objects do use these. Do not create new
Rel Objects with different names for start and end dates. These are already provided.

Step 4 - Creating Entity Types

You create entity types in the Hierarchies tool. John creates two entity types: ProdGroup
and Product Type. The following graphic shows the completed Product Entity Type
information.

Each entity type has a code that derives from the data analysis and the design. In this
example, John chose to use Product as one type, and Product Group as another.

This code must be referenced in the corresponding RBO base object table. In this
example, the code Product is referenced in the C_RBO_BO_CLASS table. The value
of the BO_CLASS_CODE is ‘Product’.
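The code-based reference can be pictured with a small, self-contained sketch. It uses Python's built-in sqlite3 module purely as a stand-in for the ORS database, so it is not the real schema: only the names C_RBO_BO_CLASS, BO_CLASS_CODE, and ROWID_OBJECT come from this guide, while the C_PRODUCT table, its columns, and the insert_rejected helper are assumptions made for illustration.

```python
import sqlite3

# In-memory stand-in for the ORS; the real RBO tables have more columns.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE C_RBO_BO_CLASS (
        ROWID_OBJECT  TEXT PRIMARY KEY,
        BO_CLASS_CODE TEXT NOT NULL UNIQUE
    )""")

# Hypothetical entity base object whose type column points at the class code.
conn.execute("""
    CREATE TABLE C_PRODUCT (
        ROWID_OBJECT TEXT PRIMARY KEY,
        PRODUCT_NAME TEXT,
        PRODUCT_TYPE TEXT NOT NULL
            REFERENCES C_RBO_BO_CLASS (BO_CLASS_CODE)
    )""")

conn.execute("INSERT INTO C_RBO_BO_CLASS VALUES ('1', 'Product')")
conn.execute("INSERT INTO C_RBO_BO_CLASS VALUES ('2', 'Product Group')")
conn.execute("INSERT INTO C_PRODUCT VALUES ('100', 'Laser Mouse', 'Product')")


def insert_rejected(type_code):
    """Try to insert a product with the given type code; True if refused."""
    try:
        conn.execute(
            "INSERT INTO C_PRODUCT VALUES ('101', 'Sample', ?)", (type_code,)
        )
    except sqlite3.IntegrityError:
        return True
    return False


print(insert_rejected("Widget"))  # 'Widget' is not a defined entity type code
```

The point of the sketch is only that an entity record must carry a code that already exists in the entity type table; a record with an undefined code is refused.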

The following graphic shows how the HM entity objects and HM relationship objects
relate to the RBO tables:

When John has completed all the steps in this section, he will be ready to create other
HM components, such as packages, and to view his data in the HM. For example, the
following graphic shows the relationships that John has set up in the Hierarchies tool,
displayed in the Hierarchy Manager. This example shows the full hierarchy involving
Mice devices. For more information on how to use HM, see the Data Steward Guide.

Starting the Hierarchies Tool


To start the Hierarchies tool:
• In the Hub Console, do one of the following:
• Expand the Model workbench, and then click Hierarchies.
OR
• In the Hub Console toolbar, click the Hierarchies tool quick launch button.

The Hub Console displays the Hierarchies tool, as shown in the following
example:

[Figure: the Hierarchies tool, with the Navigation Pane and the Properties Pane labeled]

If you are setting up the Hierarchies tool, see “Creating the HM Repository Base
Objects” on page 235. If you already have RBO tables set up, see “Configuring Entity
Icons” on page 238.

Creating the HM Repository Base Objects


To use the Hierarchies tool with an ORS, the system must first create the Repository
Base Objects (RBO tables) for the ORS. RBO tables are essentially system base objects.
They are required base objects that must contain specific columns.

Queries and MRM packages (and their associated queries) will also be created for these
RBO tables.

Warning: Never modify these RBO tables, queries, or packages.

To create the RBOs:


1. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.

2. Start the Hierarchies tool. Expand the Model workbench and click Hierarchies.
To learn more, see “Starting the Hierarchies Tool” on page 234.

Note: Any option that you can select by right-clicking in the navigation panel, you can
also choose from the Hierarchies tool menu.

After you start the Hierarchies tool, if an ORS does not have the necessary RBO tables,
then the Hierarchies tool walks you through the process of creating them.

The following steps explain what to select in the dialog boxes that the Hierarchies tool
displays:
1. Choose Yes in the Siperian Hub Console dialog to create the metadata (RBO
tables) for HM in the ORS.
2. Select the tablespace names in the Create RBO tables dialog, and then click OK.

Uploading Default Entity Icons


The Hierarchies tool prompts you to upload default entity icons. These icons will
be useful to you when you are creating entities.
1. Click Yes.
2. The Hub Console displays the Hierarchies tool with the default metadata, as
shown in the following example.

Upgrading From Previous Versions of Hierarchy Manager


After you upgrade a pre-XU schema to XU, you will be prompted to upgrade the
XU-specific Hierarchy Manager (HM) metadata when you open the Hierarchies tool in
the Hub Console.

To upgrade the HM metadata:


1. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.

2. Start the Hub Console. To learn more, see “Starting the Hub Console” on page 19.
3. Launch the Hierarchies tool in the Hub Console.
4. Click Yes to add additional columns.

After you upgrade a pre-XU schema to XU, you will be reminded to remove obsolete
HM metadata when you open the Hierarchies tool.

To remove obsolete HM metadata:


1. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.

2. Start the Hub Console. To learn more, see “Starting the Hub Console” on page
19.
3. Launch the Hierarchies tool in the Hub Console.

4. Click Yes to delete a base object.

Note: If the Rbo Rel Type Usage base object is being used by some other non-HM
base object, you will be told to delete the table manually in the Schema
Manager.

Siperian Hub XU shows relationship and entity types under the base object with which
they are associated. If a type is not associated with a base object, for example it does
not have packages assigned, it is not displayed in the GUI, but does remain in the
database.

During the ORS upgrade process, the migration script skips over the orphan entity and
relationship types, displays a related warning message, then continues. After the ORS
upgrade, you can delete the orphan types or associate entities and relationship types
with them.

If you want to associate orphan types but you have not created the corresponding base
objects, create the objects, then press refresh. The software prompts you to create the
association.

Configuring Entity Icons


Using the Hierarchies tool, you can add or configure your own entity icons that you
can subsequently use when configuring your entity types. These entity icons are used to
represent your entities in graphic displays within Hierarchy Manager. Entity icons must
be stored in a JAR or ZIP file.

Adding Entity Icons

To import your own icons, create a ZIP or JAR file containing your icons. For each
icon, create a 16 x 16 icon for the small icon and a 48 x 48 icon for the large icon.
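A short script can check icon dimensions and bundle the files before you upload them. The following Python sketch rests on several assumptions that this guide does not confirm: that the icons are PNG files, that any file names are acceptable inside the archive, and that a flat ZIP layout is sufficient. The function names are hypothetical. Compare the result with the default icon archive before relying on it.

```python
import io
import struct
import zipfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
ALLOWED_SIZES = ((16, 16), (48, 48))   # small and large icon sizes


def png_size(data):
    """Return (width, height) read from the IHDR chunk of a PNG image."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    return struct.unpack(">II", data[16:24])


def build_icon_archive(icons):
    """Pack icons (file name -> PNG bytes) into a ZIP and return its bytes.

    Raises ValueError if an icon is neither 16 x 16 nor 48 x 48.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in icons.items():
            size = png_size(data)
            if size not in ALLOWED_SIZES:
                raise ValueError(f"{name}: unexpected icon size {size}")
            archive.writestr(name, data)
    return buffer.getvalue()
```

To use the sketch, read each icon file into memory, pass the resulting dictionary to build_icon_archive, and write the returned bytes to a .zip file that you then select in the Add Entity Icons dialog.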

To add new entity icons:


1. Acquire a write lock.

2. Start the Hierarchies tool.


3. Right-click anywhere in the navigation pane and choose Add Entity Icons.
Note: You must acquire a lock to display the right-click menu.
A browse files window opens.
4. Browse for the JAR or ZIP file containing your icons.
5. Click Open to add the icons.

Modifying Entity Icons

You cannot modify icons directly from the console. You can download a ZIP or JAR
file, modify its contents, then upload it again into the console.

You can either delete icon groups or make them inactive. If an icon is already
associated with an entity, or if you could use a group of icons in the future, consider
inactivating the group instead of deleting it.

You inactivate a group of icons by marking the icon package Inactive. Inactive icons
are not displayed in the UI and cannot be assigned to an entity type. To reactivate the
icon package, mark it Active.

Warning: Siperian Hub does not validate icon assignments before deleting. If you
delete an icon that is currently assigned to an Entity Type, you will get an error when
you try to save the edit.

Deleting Entity Icons

You cannot delete individual icons from a ZIP or JAR file from the console; you can
only delete them as a group or package.

To delete a group of entity icons:


1. Acquire a write lock.

2. Start the Hierarchies tool. To learn more, see “Starting the Hierarchies Tool” on
page 234.
3. Right-click the icon collections in the navigation pane and choose Delete Entity
Icons.

Configuring Entity Objects and Entity Types


This section describes how to define entity objects and entity types using the
Hierarchies tool.

About Entities, Entity Objects, and Entity Types

This section describes entities, entity objects, and entity types in Hierarchy Manager.

Entities

In Hierarchy Manager, an entity is any object, person, place, organization, or other thing
that has a business meaning and can be acted upon in your database. Examples include
a specific person’s name, a specific checking account number, a specific company, a
specific address, and so on.

Entity Base Objects

An entity base object is a base object that has been configured in HM, and that is used to
store HM entities. When you create an entity base object using the Hierarchies tool
(instead of the Schema Manager), the Hierarchies tool automatically creates the
columns required for Hierarchy Manager. You can also convert an existing MRM base
object to an entity base object by using the options in the Hierarchies tool.

After adding an entity base object, you use the Schema Manager to view, edit, or delete
it. To learn more, see “Configuring Base Objects” on page 92.

Entity Types

In Hierarchy Manager, an entity type is a logical classification of one or more entities.
Examples include doctors, checking accounts, banks, and so on. An Entity Base Object
must have a Foreign Key to the Entity Type table (Rbo BO Class). The foreign key can
be defined as either a ROWID or predefined Code value. All entities with the same
entity type are stored in the same entity object. In the Hierarchies tool, entity types are
displayed in the navigation tree under the Entity Object with which the Type is
associated.

Well-defined entity types have the following characteristics:


• They effectively segment the data in a way that reflects the real-world nature of the
entities.
• They are disjoint. That is, no entity can have more than one entity type.
• Taken collectively, they cover the entire set of entities. That is, every entity has one
and only one entity type.
• They are granular enough so that you can easily define the types of relationships
that each entity type can have. For example, an entity type of “doctor” can have the
relationships: “member of ” with a medical group, “staff ” (or “non-staff with
admitting privileges”) with a hospital, and so on.
• A more general entity type, such as “care provider” (which encompasses nurses,
nurse practitioners, doctors, and others) is not granular enough. In this case, the
types of relationships that such a general entity type will have will depend on
something beyond just the entity type. Therefore, you need to define
more-granular entity types.
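The "disjoint" and "cover" characteristics above amount to a simple rule: every entity carries exactly one entity type, and that type is one you have defined. The following Python sketch shows one way to test a proposed classification against that rule during data analysis. It is illustrative only; the function name and the data layout are assumptions.

```python
def find_type_violations(defined_types, assignments):
    """Report entities that do not have exactly one defined entity type.

    assignments maps an entity name to the list of entity types proposed
    for it. Returns a dict of entity name -> description of the problem.
    """
    violations = {}
    for entity, types in assignments.items():
        if len(types) == 0:
            violations[entity] = "no entity type"
        elif len(types) > 1:
            violations[entity] = "more than one entity type"
        elif types[0] not in defined_types:
            violations[entity] = "undefined entity type"
    return violations


defined = {"doctor", "hospital", "medical group"}
proposed = {
    "Dr. Smith": ["doctor"],
    "City Hospital": ["hospital", "medical group"],   # not disjoint
    "Nurse Jones": [],                                 # not covered
    "Main St. Clinic": ["clinic"],                     # type never defined
}

for entity, problem in find_type_violations(defined, proposed).items():
    print(f"{entity}: {problem}")
```

An empty result means the proposed entity types are disjoint and cover every entity in the sample.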

Adding Entity Base Objects

To create a new entity base object:


1. In the Hierarchies tool, acquire a write lock.

2. Right-click anywhere in the navigation pane and choose Create New
Entity/Relationship Object. You can also choose this option from the
Hierarchies tool menu.
3. In the Create New Entity/Relationship Base Object dialog, select Create New Entity
Base Object and click OK.

4. Click OK.
The Hierarchies tool prompts you to enter information about the new base object.

5. Specify the following properties for this new entity base object.

Field                 Description
Item Type             Read-only. Already specified.
Display name          Name of this base object as it will be displayed in the Hub
                      Console.
Physical name         Actual name of the table in the database. Siperian Hub will
                      suggest a physical name for the table based on the display name
                      that you enter.
                      The RowId is generated and assigned by the system, but the BO
                      Class Code is created by the user, making it easier to remember.
Data tablespace       Name of the data tablespace. To learn more, see the Siperian Hub
                      Installation Guide for your platform.
Index tablespace      Name of the index tablespace. To learn more, see the Siperian
                      Hub Installation Guide for your platform.
Description           Description of this base object.
Foreign Key column    Column used as the Foreign Key for this entity type; can be
for Entity Types      either ROWID or CODE.
                      The ability to choose a BO Class CODE column reduces
                      the complexity by allowing you to define the foreign key
                      relationship based on a predefined code, rather than the
                      Siperian-generated ROWID.
Display name          Descriptive name of the column of the Entity Type Foreign Key
                      that is displayed in Hierarchy Manager.
Physical name         Actual name of the FK column in the table. Siperian Hub will
                      suggest a physical name for the FK column based on the display
                      name that you enter.

6. Click OK to save the new base object.

The base object you created has the columns required by Hierarchy Manager. You
probably require additional columns in the base object, which you can add using the
Schema Manager, as described in “Configuring Columns in Tables” on page 125.

Important: When you modify the base object using the Schema Manager, do not
change any of the columns added by Hierarchy Manager. Modifying any of these
Hierarchy Manager columns will result in unpredictable behavior and possible data loss.

Converting Base Objects to Entity Base Objects

You must convert base objects to entity base objects before you can use them in HM.

Base objects created in MRM do not have the metadata required by Hierarchy
Manager. In order to use these MRM base objects with HM, you must add this
metadata via a conversion process. Once you have done this, you can use these
converted base objects with both MRM and HM.

To convert an existing MRM base object to work with HM:


1. In the Hierarchies tool, acquire a write lock.

2. Right-click anywhere in the navigation pane and choose Convert BO to
Entity/Relationship Object.
Note: The same options you see on the right-click menu are also available on the
Hierarchies menu.
3. In the Modify Existing Base Object dialog, select Convert to Entity and click OK.

Note: If you do not see any choices in the Modify Base Object field, then there are
no non-hierarchy base objects available. You must create one in the Schema tool.
4. Click OK.

If the base object already has HM metadata, the Hierarchies tool will display a
message indicating the HM metadata that exists.

5. In the Foreign Key Column for Entity Types field, select the column to be added:
RowId Object or BO Class Code.
This is the descriptive name of the column of the Entity Type Foreign Key that is
displayed in Hierarchy Manager.
The ability to choose a BO Class Code column reduces the complexity by allowing
you to define the foreign key relationship based on a predefined code, rather than
the Siperian generated ROWID.
6. In the Existing BO Column to use, select an existing column or select the Create
New Column option.
If no BO columns exist, only the Create New Column option is available.
7. In the Display Name and Physical Name fields, create display and physical names
for the column, and click OK.

The base object will now have the columns that Hierarchy Manager requires. To add
additional columns, use the Schema Manager (see “Configuring Columns in Tables” on
page 125).

Important: When you modify the base object using the Schema Manager tool, do not
change any of the columns added using the Hierarchies tool. Modifying any of these
columns will result in unpredictable behavior and possible data loss.

Adding Entity Types

To add a new entity type:


1. In the Hierarchies tool, right-click on the entity object in which you want to store
the new entity type you are creating and select Add Entity Type.

The Hierarchies tool displays a new entity type (called New Entity Type) in the
navigation tree under the Entity Object you selected.

2. In the properties panel, specify the following properties for this new entity
type.

Field                 Description
Code                  Unique code name of the Entity Type. Can be used as a foreign
                      key from HM entity base objects.
Display name          Name of this entity type as it will be displayed in the Hub
                      Console. Specify a unique, descriptive name.
Description           Description of this entity type.
Color                 Color of the entities associated with this entity type as they will
                      be displayed in the Hub Console in the Hierarchy Manager
                      Console and Business Data Director.
Small Icon            Small icon for entities associated with this entity type as they will
                      be displayed in the Hub Console in the Hierarchy Manager
                      Console and Business Data Director.
Large Icon            Large icon for entities associated with this entity type as they will
                      be displayed in the Hub Console in the Hierarchy Manager
                      Console and Business Data Director.

3. To designate a color for this entity type, click the button next to Color.

The color choices window is displayed.

The color you choose determines how entities of this type are displayed in the
Hierarchy Manager. Select a color and click OK.
4. To select a small icon for the new entity type, click the button next to Small Icon.
The Choose Small Icon window is displayed.

Small icons determine how entities of this type are displayed when the graphic
view window shows many entities. To learn more about adding icon graphics for
your entity types, see “Configuring Entity Icons” on page 238.
Select a small icon and click OK.
5. To select a large icon for the new entity type, click the button next to Large Icon.

The Choose Large Icon window is displayed.

Large icons determine how entities of this type are displayed when the graphic
view window shows few entities. To learn more about adding icon graphics for
your entity types, see “Configuring Entity Icons” on page 238.
Select a large icon and click OK.
6. Click the Save button to save the new entity type.

Editing Entity Types

To edit an entity type:


1. In the Hierarchies tool, in the navigation tree, click the entity type to edit.

2. For each field that you want to edit, click the edit button next to the field and
make the change that you want.
For more information about these fields, see “Adding Entity Types” on page 246.
3. When you have finished making changes, click the Save button to save your changes.

Warning: If your entity object uses the code column, you probably do not want to
modify the entity type code if you already have records for that entity type.

Deleting Entity Types

You can delete any entity type that is not used by any relationship types. If the entity
type is being used by one or more relationship types, attempting to delete it will
generate an error.
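Before you attempt a deletion, it can help to list which relationship types still refer to the entity type. The sketch below expresses the rule in Python; it is an illustration of the rule only and does not query Siperian Hub. The function names and the dictionary layout (relationship type name mapped to the two entity types it connects) are assumptions.

```python
def relationship_types_using(entity_type, relationship_types):
    """Return the relationship types that refer to the given entity type."""
    return sorted(
        rel_name
        for rel_name, (type_1, type_2) in relationship_types.items()
        if entity_type in (type_1, type_2)
    )


def can_delete_entity_type(entity_type, relationship_types):
    """An entity type can be deleted only if no relationship type uses it."""
    return not relationship_types_using(entity_type, relationship_types)


relationship_types = {
    "member of": ("doctor", "medical group"),
    "staff": ("doctor", "hospital"),
}

print(can_delete_entity_type("doctor", relationship_types))
print(relationship_types_using("doctor", relationship_types))
print(can_delete_entity_type("bank", relationship_types))
```

In this sample, "doctor" is blocked by two relationship types, which would have to be deleted or changed first, while "bank" is not referenced by any relationship type.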

To delete an entity type:


1. Acquire a write lock.

2. In the Hierarchies tool, in the navigation tree, right-click the entity type that you
want to delete, and choose Delete Entity Type.
If the entity type is not used by any relationship types, then the Hierarchies tool
prompts you to confirm deletion.
3. Choose Yes.
The Hierarchies tool removes the selected entity type from the list.

Warning: You probably do not want to delete an entity type if you already have entity
records that use that type. If your entity object uses the code column instead of the
rowid column and you have records in that entity object for the entity type you are
trying to delete, you will get an error.

Display Options for Entities

In addition to configuring color and icons for entities, you can also configure the font
size and maximum width. While color and icons can be specified for each entity type,
the font size and width apply to entities of all types.

To change the font size in HM, use the HM Font Size and Entity Box Size. The default
entity font size (38 pts) and max entity box width (600 pixels) can be overridden by
settings in the cmxserver.properties file. The settings to use are:
sip.hm.entity.font.size=fontSize
sip.hm.entity.max.width=maxWidth

The value for fontSize can be from 6 to 100 and the value for maxWidth can be from
20 to 5000. If the value specified is outside the range, the minimum or maximum value is
used. Default values are used if the values specified are not numbers.
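The range and default rules can be summarized in a few lines of code. The following Python sketch mirrors the behavior described above so that you can predict the effective value of a setting; it is not Siperian Hub code. The function name is hypothetical, and treating only whole numbers as valid is an assumption, since this guide says only that non-numeric values fall back to the default.

```python
# Defaults and ranges as stated in this section.
DEFAULTS = {
    "sip.hm.entity.font.size": 38,
    "sip.hm.entity.max.width": 600,
}
LIMITS = {
    "sip.hm.entity.font.size": (6, 100),
    "sip.hm.entity.max.width": (20, 5000),
}


def effective_value(key, raw):
    """Return the value HM would be expected to use for a property setting."""
    low, high = LIMITS[key]
    try:
        value = int(raw.strip())
    except (ValueError, AttributeError):
        # Not a number (or not set): the default applies.
        return DEFAULTS[key]
    # Out-of-range values are pulled back to the nearest limit.
    return max(low, min(high, value))


for raw in ("14", "200", "3", "large"):
    print(raw, "->", effective_value("sip.hm.entity.font.size", raw))
```

For example, a font size setting of 200 is treated as 100, a setting of 3 is treated as 6, and a setting of "large" is ignored in favor of the default.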

Reverting Entity Base Objects to Base Objects


If you inadvertently converted a base object to an entity object, or if you no longer
want to work with an entity object in Hierarchy Manager, you can revert the entity
object to a base object. In doing so, you are removing the HM metadata from the
object.
To revert an entity base object to a base object:
1. In the Hierarchies tool, acquire a write lock.
2. Right-click on an entity base object and choose Revert Entity/Relationship
Object to BO.
3. If the following Siperian Hub Console dialog box is displayed, click OK:

Note that when you revert the entity object, you are also reverting its corresponding
relationship objects.

4. In the Revert Entity/Relationship Object dialog box, click OK.

5. A dialog is displayed when the entity is reverted.

Configuring Hierarchies
This section describes how to define hierarchies using the Hierarchies tool.

About Hierarchies
A hierarchy is a set of relationship types (as described in “About Relationships,
Relationship Objects, and Relationship Types” on page 255). These relationship types
are not ranked, nor are they necessarily related to each other. They are merely
relationship types that are grouped together for ease of classification and identification.
The same relationship type can be associated with multiple hierarchies. A hierarchy type is
a logical classification of hierarchies.
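Because a hierarchy is just a named grouping, it can be pictured as a set of relationship types, and the same relationship type can appear in more than one set. The Python sketch below illustrates this; the hierarchy names and the function name are hypothetical, and the relationship type names are borrowed from the product example earlier in this chapter.

```python
# Each hierarchy is an unordered set of relationship types.
hierarchies = {
    "Product Catalog": {
        "Product Group is parent of Product Group",
        "Product Group is parent of Product",
    },
    "Marketing View": {
        "Product Group is parent of Product",
    },
}


def hierarchies_containing(relationship_type, hierarchies):
    """Return the names of all hierarchies that include the relationship type."""
    return sorted(
        name
        for name, rel_types in hierarchies.items()
        if relationship_type in rel_types
    )


print(hierarchies_containing("Product Group is parent of Product", hierarchies))
```

Using sets rather than lists reflects the point that the relationship types in a hierarchy are not ranked.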

Adding Hierarchies
To add a new hierarchy:
1. In the Hierarchies tool, acquire a write lock.

2. Right-click an entity object in the navigation pane and choose Add Hierarchy.
The Hierarchies tool displays a new hierarchy (called New Hierarchy) in the
navigation tree under the Hierarchies node. The default properties are displayed in
the properties pane.

3. Specify the following properties for this new hierarchy.

Field Description
Code Unique code name of the hierarchy. Can be used as a foreign key
from HM relationship base objects.
Display name Name of this hierarchy as it will be displayed in the Hub
Console. Specify a unique, descriptive name.
Description Description of this hierarchy.

4. Click the Save button to save the new hierarchy.

Editing Hierarchies
To edit a hierarchy:
1. In the Hierarchies tool, acquire a write lock.

2. In the navigation tree, click the hierarchy to edit.


3. Click the Edit button and edit the name.

4. Click the Save button to save your changes.

Warning: If your relationship object uses the hierarchy code column (instead of the
rowid column), you probably do not want to modify the hierarchy code if you already
have records for that hierarchy in the relationship object.
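
The reason for this warning can be illustrated with a small sketch. It assumes that existing relationship records are not updated automatically when a hierarchy code changes, which the warning implies but this guide does not state. All names and values are hypothetical.

```python
def orphaned(records, fk_field, valid_keys):
    """Return records whose foreign key no longer matches any hierarchy."""
    return [r for r in records if r[fk_field] not in valid_keys]


hierarchy = {"rowid": "1", "code": "CORP"}
by_code = [{"rel": "A-B", "hierarchy_code": "CORP"}]    # FK column is CODE
by_rowid = [{"rel": "A-B", "hierarchy_rowid": "1"}]     # FK column is ROWID

hierarchy["code"] = "CORPORATE"  # the hierarchy code is edited

# Records keyed by code still hold the old value; records keyed by rowid do not care.
print(orphaned(by_code, "hierarchy_code", {hierarchy["code"]}))
print(orphaned(by_rowid, "hierarchy_rowid", {hierarchy["rowid"]}))
```

In this sketch the first call returns the record that still refers to the old code, and the second returns an empty list.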

Deleting Hierarchies
Warning: You do not want to delete a hierarchy if you already have relationship
records that use the hierarchy. If your relationship object uses the hierarchy code
column instead of the rowid column and you have records in that relationship object
for the hierarchy you are trying to delete, you will get an error.

To delete a hierarchy:
1. In the Hierarchies tool, acquire a write lock.

2. In the navigation tree, right-click the hierarchy that you want to delete, and choose
Delete Hierarchy.
The Hierarchies tool prompts you to confirm deletion.
3. Choose Yes.
The Hierarchies tool removes the selected hierarchy from the list.

Note: You are allowed to delete a hierarchy that has relationship types associated with
it. There will be a warning with the list of associated relationship types. If you elect to
delete the hierarchy, all references to it will automatically be removed.

Configuring Relationship Base Objects and Relationship Types
This section describes how to define relationship types using the Hierarchies tool.

About Relationships, Relationship Objects, and Relationship Types
This section introduces relationships, relationship base objects, and relationship types
in Hierarchy Manager.

Relationships

A relationship describes the affiliation between two specific entities. Hierarchy Manager
relationships are defined by specifying the relationship type, hierarchy type, attributes
of the relationship, and dates for when the relationship is active.
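
The parts of a relationship listed above can be pictured as a simple record. This is only a sketch: the field names are hypothetical, and treating the end date as inclusive is an assumption that this guide does not confirm.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Relationship:
    entity_1: str
    entity_2: str
    rel_type: str      # relationship type
    hierarchy: str     # hierarchy in which the relationship applies
    start_date: date   # first day the relationship is active
    end_date: date     # last day the relationship is active (assumed inclusive)

    def is_active(self, on: date) -> bool:
        """True if the relationship is active on the given day."""
        return self.start_date <= on <= self.end_date


rel = Relationship("Acme Labs", "Acme Corp", "is_subsidiary_of", "CORP",
                   date(2009, 1, 1), date(2010, 12, 31))
print(rel.is_active(date(2009, 6, 1)))  # True
```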

Relationship Base Objects

A relationship base object is a base object used to store HM relationships.

Relationship Types

A relationship type describes classes of relationships and defines the types of entities that
a relationship of this type can include, the direction of the relationship (if any), and
how the relationship is displayed in the Hub Console.

Note: Relationship Type is a physical construct and can be configuration heavy, while
Hierarchy Type is more of a logical construct and is typically configuration light.
Therefore, it is often easier to have many Hierarchy Types than to have many
Relationship Types. Be sure to understand your data and hierarchy management
requirements prior to defining Hierarchy Types and Relationship Types within Siperian.

A well-defined set of Hierarchy Manager relationship types has the following
characteristics:
• It reflects the real-world relationships between your entity types.
• It supports multiple relationship types for each relationship.

Configuring Relationship Base Objects


This section describes how to configure relationship base objects in the Hierarchies
tool.

Creating Relationship Base Objects

A relationship base object is used to store HM relationships.

To add a new relationship base object:


1. In the Hierarchies tool, acquire a write lock.

2. Right-click anywhere in the navigation pane and choose Create New
Entity/Relationship Object...

The Hierarchies tool prompts you to select the type of base object to create.

3. Select Create New Relationship Base Object.


4. Click OK.
The Hierarchies tool prompts you to enter information about the new relationship
base object.

5. Specify the following properties for this new relationship base object.

Field                       Description
Item Type                   Read-only. Already specified.
Display name                Name of this base object as it will be displayed in the Hub
                            Console.
Physical name               Actual name of the table in the database. Siperian Hub will
                            suggest a physical name for the table based on the display name
                            that you enter.
Data tablespace             Name of the data tablespace. To learn more, see the Siperian Hub
                            Installation Guide for your platform.
Index tablespace            Name of the index tablespace. To learn more, see the Siperian
                            Hub Installation Guide for your platform.
Description                 Description of this base object.
Entity Base Object 1        Entity base object to be linked via this relationship base object.
Display name                Name of the column that is a FK to the entity base object 1.
Physical name               Actual name of the column in the database. Siperian Hub will
                            suggest a physical name for the column based on the display
                            name that you enter.
Entity Base Object 2        Entity base object to be linked via this relationship base object.
Display name                Name of the column that is a FK to the entity base object 2.
Physical name               Actual name of the column in the database. Siperian Hub will
                            suggest a physical name for the column based on the display
                            name that you enter.
Hierarchy FK Column         Column used as the foreign key for the hierarchy; can be either
                            ROWID or CODE.
                            The ability to choose a BO Class CODE column reduces the
                            complexity by allowing you to define the foreign key relationship
                            based on a predefined code, rather than the Siperian generated
                            ROWID.
Hierarchy FK Display Name   Name of this FK column as it will be displayed in the Hub
                            Console.
Hierarchy FK Physical Name  Actual name of the hierarchy foreign key column in the table.
                            Siperian Hub will suggest a physical name for the column based
                            on the display name that you enter.
Rel Type FK Column          Column used as the foreign key for the relationship; can be
                            either ROWID or CODE.
Rel Type Display Name       Name of the column that is used to store the Rel Type CODE
                            or ROWID.
Rel Type Physical Name      Actual name of the relationship type FK column in the table.
                            Siperian Hub will suggest a physical name for the column based
                            on the display name that you enter.

6. Click OK to save the new base object.

The relationship base object you created has the columns required by Hierarchy
Manager. You may require additional columns in the base object, which you can add
using the Schema Manager, as described in “Configuring Columns in Tables” on page
125.

Important: When you modify the base object using the Schema Manager, do not
change any of the columns added by Hierarchy Manager. Modifying any of these
columns will result in unpredictable behavior and possible data loss.
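
To make the shape of a relationship base object more concrete, the following sketch uses an in-memory SQLite database to model a table that links two entity records and carries hierarchy and relationship type foreign keys (here using the CODE option). Every table and column name is hypothetical; the actual columns are generated by Hierarchy Manager and are not listed in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE party (rowid_object TEXT PRIMARY KEY, name TEXT);

-- Hypothetical relationship table: two entity FKs plus hierarchy and rel type FKs.
CREATE TABLE party_rel (
    rowid_object   TEXT PRIMARY KEY,
    party_1        TEXT REFERENCES party(rowid_object),
    party_2        TEXT REFERENCES party(rowid_object),
    hierarchy_code TEXT,
    rel_type_code  TEXT
);

INSERT INTO party VALUES ('1', 'Acme Corp'), ('2', 'Acme Labs');
INSERT INTO party_rel VALUES ('10', '1', '2', 'CORP', 'is_parent_of');
""")

# Resolve both ends of the relationship through the two entity foreign keys.
row = conn.execute("""
    SELECT p1.name, r.rel_type_code, p2.name
    FROM party_rel r
    JOIN party p1 ON p1.rowid_object = r.party_1
    JOIN party p2 ON p2.rowid_object = r.party_2
""").fetchone()
print(row)  # ('Acme Corp', 'is_parent_of', 'Acme Labs')
```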

Creating a Foreign Key Relationship Base Object

A foreign key relationship base object is an entity base object with a foreign key to
another entity base object.

To create a foreign key relationship base object:


1. In the Hierarchies tool, acquire a write lock.

2. Right-click anywhere in the navigation pane and choose Create Foreign Key
Relationship.

The Hierarchies tool displays the Modify Existing Base Object dialog.

3. Specify the base object and the number of Foreign Key columns, then click OK.
The Hierarchies tool displays the Convert to FK Relationship Base Object dialog.

4. Specify the following properties for this new FK relationship object.

Field                       Description
FK Constraint Entity BO 1   Select FK entity base object from list.
Existing BO Column to Use   Name of existing base object column used for FK, or choose to
                            create a new column.
FK Column Display Name 1    Name of FK column as it will be displayed in the Hub Console.
FK Column Physical Name 1   Actual name of FK column in the database. Siperian Hub will
                            suggest a physical name for the column based on the display name
                            that you enter.
FK Column Represents        Choose Entity1 or Entity2, depending on what the FK Column
                            represents in the relationship.

5. Click OK to save the new FK relationship object.

The base object you created has the columns required by Hierarchy Manager. You may
require additional columns in the base object, which you can add using the Schema
Manager, as described in “Configuring Columns in Tables” on page 125.

Important: When you modify the base object using the Schema Manager tool, do not
change any of the columns added by the Hierarchies tool. Modifying any of these
columns will result in unpredictable behavior and possible data loss.

For more information about foreign key relationships, see Chapter 5, “Building the
Schema.”

Converting Base Objects to Relationship Base Objects

Relationship base objects are tables that contain information about two entity base
objects.

Base objects created in MRM do not have the metadata required by Hierarchy Manager
for relationship information. In order to use these MRM base objects with Hierarchy
Manager, you must add this metadata via a conversion process. Once you have done
this, you can use these converted base objects with both MRM and HM.

To convert a base object to a relationship object for use with HM:


1. In the Hierarchies tool, acquire a write lock.

2. Right-click in the navigation pane and choose Convert BO to
Entity/Relationship Object.

3. Click OK.
The Convert to Relationship Base Object screen is displayed.

4. Specify the following properties for this base object.


Field                       Description
Entity Base Object 1        Entity base object to be linked via this relationship base object.
Display name                Name of the column that is a FK to the entity base object 1.
Physical name               Actual name of the column in the database. Siperian Hub will
                            suggest a physical name for the column based on the display
                            name that you enter.
Entity Base Object 2        Entity base object to be linked via this relationship base object.
Display name                Name of the column that is a FK to the entity base object 2.
Physical name               Actual name of the column in the database. Siperian Hub will
                            suggest a physical name for the column based on the display
                            name that you enter.
Hierarchy FK Column         Column used as the foreign key for the hierarchy; can be either
                            ROWID or CODE.
                            The ability to choose a BO Class CODE column reduces the
                            complexity by allowing you to define the foreign key relationship
                            based on a predefined code, rather than the Siperian generated
                            ROWID.
Existing BO Column to Use   Actual column in the existing BO to use.
Hierarchy FK Display Name   Name of this FK column as it will be displayed in the Hub
                            Console.
Hierarchy FK Physical Name  Actual name of the hierarchy foreign key column in the table.
                            Siperian Hub will suggest a physical name for the column based
                            on the display name that you enter.
Rel Type FK Column          Column used as the foreign key for the relationship; can be
                            either ROWID or CODE.
Existing BO Column to Use   Actual column in the existing BO to use.
Rel Type FK Display Name    Name of the FK column that is used to store the Rel Type
                            CODE or ROWID.
Rel Type FK Physical Name   Actual name of the relationship type FK column in the table.
                            Siperian Hub will suggest a physical name for the column based
                            on the display name that you enter.

5. Click OK.

Warning: When you modify the base object using the Schema Manager tool, do not
change any of the columns added by HM. Modifying any of these HM columns will
result in unpredictable behavior and possible data loss.

Reverting Relationship Base Objects to Base Objects

This removes HM metadata from the relationship object. The relationship object
remains as a base object, but is no longer displayed in the Hierarchy Manager.

To revert a relationship object to a base object:


1. In the Hierarchies tool, acquire a write lock.

2. Right-click on a relationship base object and choose Revert Entity/Relationship
Object to BO.
3. In the Revert Entity/Relationship Object dialog box, click OK.

4. A dialog is displayed when the entity is reverted.

Configuring Relationship Types


This section describes how to configure relationship types in the Hierarchies tool.

Adding Relationship Types

To add a new relationship type:


1. In the Hierarchies tool, acquire a write lock.

2. Right-click on a relationship object and choose Add Relationship Type.


The Hierarchies tool displays a new relationship type (called New Rel Type) in
the navigation tree under the Relationship Types node. The default properties are
displayed in the properties pane.

Note: You can only save a relationship type if you associate it with a hierarchy.

A Foreign Key Relationship Base Object is an Entity Base Object containing a foreign
key to another Entity Base Object. A Relationship Base Object is a table that relates the
two Entity Base Objects.

Note: FK relationship types can only be associated with a single hierarchy.

3. The properties panel displays the properties you must enter to create the
relationship.

4. In the properties panel, specify the following properties for this new relationship
type.

Field               Description
Code                Unique code name of the rel type. Can be used as a foreign key
                    from HM relationship base objects.
Display name        Name of this relationship type as it will be displayed in the Hub
                    Console. Specify a unique, descriptive name.
Description         Description of this relationship type.
Color               Color of the relationships associated with this relationship type
                    as they will be displayed in the Hub Console in the Hierarchy
                    Manager Console and Business Data Director.
Entity Type 1       First entity type associated with this new relationship type.
                    Any entities of this type will be able to have relationships of this
                    relationship type.
Entity Type 2       Second entity type associated with this new relationship type.
                    Any entities of this type will be able to have relationships of this
                    relationship type.
Direction           Select a direction for the new relationship type to allow a
                    directed hierarchy. The possible directions are:
                    • Entity 1 to Entity 2
                    • Entity 2 to Entity 1
                    • Undirected
                    • Bi-Directional
                    • Unknown
                    An example of a directed hierarchy is an organizational chart,
                    with the relationship reports to being directed from employee to
                    supervisor, and so on, up to the head of the organization.
FK Rel Start Date   The start date of the foreign key relationship.
FK Rel End Date     The end date of the foreign key relationship.
Hierarchies         Check the check box next to any hierarchy that you want
                    associated with this new relationship type. Any selected
                    hierarchies can contain relationships of this relationship type.
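
The way a relationship type constrains relationships can be sketched as follows. The five direction values are taken from the table above; the class names, the allows method, and the rule that entity types must match in the stated order are illustrative assumptions rather than documented Siperian Hub behavior.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    ENTITY_1_TO_ENTITY_2 = "Entity 1 to Entity 2"
    ENTITY_2_TO_ENTITY_1 = "Entity 2 to Entity 1"
    UNDIRECTED = "Undirected"
    BI_DIRECTIONAL = "Bi-Directional"
    UNKNOWN = "Unknown"


@dataclass
class RelationshipType:
    code: str
    entity_type_1: str
    entity_type_2: str
    direction: Direction

    def allows(self, type_1: str, type_2: str) -> bool:
        """True if a relationship between these entity types fits this type."""
        return (type_1, type_2) == (self.entity_type_1, self.entity_type_2)


# The organizational chart example: "reports to" runs from employee to supervisor.
reports_to = RelationshipType("reports_to", "Employee", "Employee",
                              Direction.ENTITY_1_TO_ENTITY_2)
print(reports_to.allows("Employee", "Employee"))      # True
print(reports_to.allows("Employee", "Organization"))  # False
```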

5. Click the button next to Color to designate a color for this relationship type.

The color choices window is displayed.

The color you choose determines how relationships of this type are displayed in the
Hierarchy Manager. Select a color and click OK.
6. Click the Calendar button to designate a start and end date for a foreign key
relationship. All relationships of this FK relationship type will have the same start
and end date. If you do not specify these dates, the default values are automatically
added.
7. Select a hierarchy.
8. Click the Save button to save the new relationship type.

Editing Relationship Types

To edit a relationship type:


1. In the Hierarchies tool, acquire a write lock.

2. In the navigation tree, click the relationship type that you want to edit.
3. For each field that you want to edit, click the Edit button and make the change that you want.
To learn more about these fields, see “Adding Relationship Types” on page 265.
4. When you have finished making changes, click the Save button to save your changes.

Warning: If your relationship object uses the code column, you probably do not want
to modify the relationship type code if you already have records for that relationship
type.

This warning does not apply to FK relationship types.

Deleting Relationship Types

Warning: You probably do not want to delete a relationship type if you already have
relationship records that use the relationship type. If your relationship object uses the
relationship type code column instead of the rowid column and you have records in
that relationship object for the relationship type you are trying to delete, you will get an
error.

The above warnings are not applicable to FK relationship types. You can delete
relationship types that are associated with hierarchies. The confirmation dialog displays
the hierarchies associated with the relationship type being deleted.

To delete a relationship type:


1. In the Hierarchies tool, acquire a write lock.

2. In the navigation tree, right-click the relationship type that you want to delete, and
choose Delete Relationship Type.
The Hierarchies tool prompts you to confirm deletion.
3. Choose Yes.
The Hierarchies tool removes the selected relationship type from the list.

Configuring Packages for Use by HM


This section describes how to add MRM packages to your schema using the
Hierarchies tool. You can create MRM packages for entity base objects, relationship
base objects, and foreign key relationship base objects. If records will be inserted or
changed in the package, be sure to enable the Put option.

About Packages
As described in Chapter 6, “Configuring Queries and Packages,” a package is a public
view of one or more underlying tables in Siperian Hub. Packages represent subsets of
the columns in those tables, along with any other tables that are joined to the tables. A
package is based on a query. The underlying query can select a subset of records from
the table or from another package. Packages are used for configuring user views of the
underlying data. For more information, see “Configuring Queries and Packages” on
page 161.
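
Conceptually, a package behaves much like a database view defined by a query that exposes a subset of columns and records. The following sketch shows that idea with an in-memory SQLite database. It is an analogy only: the table, column, and view names are hypothetical, and it does not show how Siperian Hub implements packages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE party (rowid_object TEXT PRIMARY KEY, name TEXT, tax_id TEXT, status TEXT);
INSERT INTO party VALUES
    ('1', 'Acme Corp', '11-111', 'ACTIVE'),
    ('2', 'Old Co',    '22-222', 'INACTIVE');

-- The "package": a subset of columns and a subset of records from the table.
CREATE VIEW pkg_active_party AS
    SELECT rowid_object, name FROM party WHERE status = 'ACTIVE';
""")

rows = conn.execute("SELECT * FROM pkg_active_party").fetchall()
print(rows)  # [('1', 'Acme Corp')]
```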

You must first create a package to use with Hierarchy Manager, then you must
associate it with Entity Types or Relationship Types.

Creating Packages
This section describes how to create HM and Relationship packages.

Creating Entity, Relationship, and FK Relationship Object Packages

To create an HM package:
1. Acquire a write lock.

2. In the Hierarchies tool, right-click anywhere in the navigation pane and choose
Create New Package.
The Hierarchies tool starts the Create New Package wizard and displays the first
dialog box.

3. Specify the following information for this new package.

Field Description
Type of Package One of the following types:
• Entity Object
• Relationship Object
• FK Relationship Object
Query Group Select an existing query group or choose to create a new one. In
Siperian Hub, query groups are logical groups of queries. For
more information, see “Configuring Query Groups” on page
164.
Query group name Name of the new query group - only needed if you chose to
create a new group above.
Description Optional description for the new query group you are creating.

4. Click Next.
The Create New Package wizard displays the next dialog box.

5. Specify the following information for this new package.

Field Description
Query Name Name of the query. In Siperian Hub, a query is a request to
retrieve data from the Hub Store. For more information, see
“Configuring Queries” on page 166.
Description Optional description.
Select Primary Table Primary table for this query.

6. Click Next.
The Create New Package wizard displays the next dialog box.

7. Specify the following information for this new package.

Field Description
Display Name Display name for this package, which will be used to display this
package in the Hub Console.
Physical Name Physical name for this package. The Hub Console will suggest a
physical name based on the display name you entered.
Description Optional description.
Enable PUT Select to enable records to be inserted or changed. (optional)
If you do not choose this, your package will be read only. If you
are creating a foreign key relationship object package, you have
additional steps in Step 9 of this procedure.
Note: You must have both a PUT and a non-PUT package for
every Foreign Key relationship. Both Put and non-Put packages
that you create for the same foreign key relationship object must
have the same columns.
Secure Resource Select to create a secure resource. (optional)

8. Click Next.
The Create New Package wizard displays a final dialog box. The dialog box you see
depends on the type of package you are creating.
• If you selected to create either a package for entities or relationships or a PUT
package for FK relationships, a dialog box similar to the following dialog box
is displayed. The required columns (shown in grey) are automatically selected
— you cannot deselect them.

Deselect the columns that are irrelevant to your package.

Note: You must have both a PUT and a non-PUT package for every Foreign Key
relationship. Both Put and non-Put packages that you create for the same foreign key
relationship object must have the same columns.
• If you selected to create a non-Put enabled package for foreign key
relationships (see Step 7 of this procedure - do not check the Put check box),
the following dialog box is displayed:

9. If you are creating a non-Put enabled package for foreign key relationships, specify
the following information for this new package.
Field Description
Hierarchy Hierarchy associated with this package. For more information,
see “Configuring Hierarchies” on page 253.
Relationship Type Relationship type associated with this package. For more
information, see “Configuring Relationship Base Objects and
Relationship Types” on page 255.

Note: You must have both a PUT and a non-PUT package for every Foreign Key
relationship. Both Put and non-Put packages that you create for the same foreign key
relationship object must have the same columns.
10. Select the columns for this new package.

11. Click Finish to create the package.

Use the Packages tool to view, edit, or delete this newly-created package, as described
in “Configuring Packages” on page 196.

You should not remove columns that are needed by Hierarchy Manager. These
columns are automatically selected (and greyed out) when the user creates packages
using the Hierarchies tool.
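
The requirement that the PUT and non-PUT packages for a foreign key relationship object have the same columns can be checked with a simple comparison. The sketch below is a hypothetical helper for reviewing column lists by hand; it is not a Siperian Hub API, and the column names are invented.

```python
def column_mismatch(put_columns, non_put_columns):
    """Return the columns that appear in only one of the two packages."""
    return sorted(set(put_columns) ^ set(non_put_columns))


put_pkg = ["ROWID_OBJECT", "NAME", "MANAGER_ID"]
display_pkg = ["ROWID_OBJECT", "NAME", "MANAGER_ID", "TITLE"]

print(column_mismatch(put_pkg, display_pkg))  # ['TITLE']
```

An empty result means the two packages expose the same set of columns, regardless of order.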

After You Create a Package


After creating a package, assign that package to an entity or relationship type.

Assigning Packages to Entity or Relationship Types


After you create a profile, and a package for each of the entity/relationship types in a
profile, you must assign the packages. This defines what fields are displayed when an
entity is displayed in HM. To learn more, see “Customizing the Hub Console
Interface” on page 45. You can also assign a package for relationship types and entity
types.

To assign a package to an entity/relationship type:


1. Acquire a write lock.

2. In the Hierarchies tool, select the Entity/Relationship Type.


The Hierarchy Manager displays the properties for the Package for that type if they
exist, or the same properties pane with empty fields. When you make the display
and Put package selections, the HM package column information is displayed in
the lower panel.

The numbers in the cells define the sequence in which the attributes are displayed.
3. Configure the package for your entity or relationship type.

Label     Columns used to display the label of the entity/relationship you are
          viewing in the HM graphical console. These columns are used to
          create the Label Pattern in the Hierarchy Manager Console and
          Business Data Director.
          To edit a label, click the label value to the right of the label. In the
          Edit Pattern dialog, enter a new label or double-click a column to use
          it in a pattern.
Tooltip   Columns used to display the description or comment that appears
          when you scroll over the entity/relationship. Used to create the
          tooltip pattern in the Hierarchy Manager Console and Business Data
          Director.
          To edit a tooltip, click the tooltip pattern value to the right of the
          Tooltip Pattern label. In the Edit Pattern dialog, enter a new tooltip
          pattern or double-click a column to use it in a pattern.
Common    Columns used when entities/relationships of different types are
          displayed in the same list. The selected columns must be in packages
          associated with all Entity/Relationship Types in the Profile.
Search    Columns that can be used with the search tool.
List      Columns to be displayed in a search result.
Detail    Columns used for the detailed view of an entity/relationship
          displayed at the bottom of the screen.
Put       Columns that are displayed when you want to edit a record.
Add       Columns that are displayed when you want to create a new record.

4. When you have finished making changes, click the Save button to save your changes.
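
The sequence numbers entered in the cells determine display order. As a sketch of that rule, the hypothetical helper below takes the numbers entered for one purpose (for example, List) and returns the selected columns in the order they would appear. Representing an unselected column as None is an assumption made for this illustration.

```python
def display_order(sequence_by_column):
    """Return the selected columns sorted by their sequence number."""
    selected = {col: n for col, n in sequence_by_column.items() if n is not None}
    return sorted(selected, key=selected.get)


# Hypothetical sequence numbers for the List purpose; TAX_ID is not selected.
list_columns = {"NAME": 1, "CITY": 3, "TAX_ID": None, "COUNTRY": 2}
print(display_order(list_columns))  # ['NAME', 'COUNTRY', 'CITY']
```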

Configuring Profiles
This section describes how to configure profiles using the Hierarchies tool.

About Profiles
In Hierarchy Manager, a profile is used to define user access to HM objects—what users
can view and what the HM objects look like to those users. A profile determines what
fields and records an HM user may display, edit, or add. For example, one profile can
allow full read/write access to all entities and relationships, while another profile can be
read-only (no add or edit operations allowed). Once you define a profile, you can
configure it as a secure resource, as described in “Securing Siperian Hub Resources” on
page 841.

Adding Profiles
A new profile (called Default) is created automatically for you before you access the
HM. The default profile can be maintained, and you can also add additional profiles.

Note: The Business Data Director uses the Default Profile to define how Entity
Labels as well as Relationship and Entity Tooltips are displayed. Additional Profiles, as
well as the additional information defined within Profiles, are only used within the
Hierarchy Manager Console and not the Business Data Director.

To add a new profile:


1. Acquire a write lock.

2. In the Hierarchies tool, right-click anywhere in the navigation pane and choose Add
Profiles.

The Hierarchies tool displays a new profile (called New Profile) in the navigation
tree under the Profiles node. The default properties are displayed in the properties
pane.

When you select these relationship types and click Save, the tree below the Profile
will be populated with Entity Objects, Entity Types, Rel Objects and Rel Types.
When you deselect a Rel type, only the Rel types will be removed from the tree -
not the Entity Types.
3. Specify the following information for this new profile.

Field Description
Name Unique, descriptive name for this profile.
Description Description of this profile.
Relationship Types Select one or more relationship types associated with this profile.

4. Click the Save button to save the new profile.


The Hierarchies tool displays information about the relationship types you selected
in the References section of the screen. Entity types are also displayed. This
information is derived from the relationship types you selected.
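
The derivation of entity types from the selected relationship types can be sketched as follows. The relationship type definitions and the function name are hypothetical; the sketch only illustrates that each selected relationship type contributes its two entity types to the profile.

```python
# Hypothetical relationship types: code -> (entity type 1, entity type 2)
REL_TYPES = {
    "employs": ("Organization", "Person"),
    "is_parent_of": ("Organization", "Organization"),
}


def entity_types_for(selected_rel_types):
    """Collect the entity types referenced by the selected relationship types."""
    found = set()
    for code in selected_rel_types:
        found.update(REL_TYPES[code])
    return found


print(sorted(entity_types_for(["employs"])))       # ['Organization', 'Person']
print(sorted(entity_types_for(["is_parent_of"])))  # ['Organization']
```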

Editing Profiles
To edit a profile:
1. Acquire a write lock.

2. In the Hierarchies tool, in the navigation tree, click the profile that you want to
edit.
3. Configure the profile as needed (specifying the appropriate profile name,
description, and relationship types and assigning packages), according to the
instructions in “Adding Profiles” on page 278 and “Configuring Packages for Use
by HM” on page 269.
4. When you have finished making changes, click the Save button to save your changes.

Validating Profiles
To validate a profile:
1. Acquire a write lock.

2. In the Hierarchies tool, in the navigation pane, select the profile to validate.

3. In the properties pane, click the Validate tab.


Note: Profiles can be successfully validated only after the packages are assigned to
Entity Types and Relationship Types.

The Hierarchies tool displays the Validate tab.

4. Select a sandbox to use.


For information about creating and configuring sandboxes, see the Siperian Hub
Data Steward Guide.
5. To validate the data, check Validate Data. This may take a long time if you have a
lot of records.
6. To start the validation process, click Validate HM Configuration.

The Hierarchies tool displays a progress window during the validation process. The
results of the validation appear in the window below the buttons.

7. When the validation is finished, click Save.


8. Choose the directory where the validation report will be saved.
9. Click Clear to clear the box containing the description of the validation results.
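
Because a profile validates successfully only after packages are assigned to its entity types and relationship types, it can help to review the assignments before starting validation. The sketch below shows that pre-check on a hypothetical mapping of type names to package names. It is not part of the Hierarchies tool and does not reproduce the checks the tool performs.

```python
def types_missing_packages(assigned_packages):
    """Return the entity/relationship types that have no package assigned."""
    return sorted(name for name, pkg in assigned_packages.items() if not pkg)


# Hypothetical profile contents: type name -> assigned package (None if unassigned).
profile = {"Organization": "PKG_ORG", "Person": None, "employs": "PKG_EMPLOYS"}
print(types_missing_packages(profile))  # ['Person']
```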

Copying Profiles
To copy a profile:
1. Acquire a write lock.

2. In the Hierarchies tool, right-click the profile that you want to copy, and then
choose Copy Profile.
The Hierarchies tool displays a new profile (called New Profile) in the navigation
tree under the Profiles node. This new profile is an exact copy (with a
different name) of the profile that you selected to copy. The default properties are
displayed in the properties pane.

3. Configure the profile as needed (specifying the appropriate profile name,
description, relationship types, and assigning packages), according to the
instructions in “Adding Profiles” on page 278.
4. Click the Save button to save the new profile.

Deleting Profiles
To delete a profile:
1. Acquire a write lock.

2. In the Hierarchies tool, right-click the profile that you want to delete, and choose
Delete Profile.
The Hierarchies tool displays a window that warns that packages will be removed
when you delete this profile.
3. Click Yes.
The Hierarchies tool removes the deleted profile.

Deleting Relationship Types from a Profile


To delete a relationship type:
1. Acquire a write lock.

2. In the Hierarchies tool, right-click the relationship type and choose Delete Entity
Type/Relationship Type From Profile.
If the profile contains relationship types that use the entity/relationship type that
you want to delete, you will not be able to delete it unless you delete the
relationship type from the profile first.

Deleting Entity Types from a Profile


To delete an entity type:
1. Acquire a write lock.

2. In the Hierarchies tool, right-click the entity type and choose Delete Entity
Type/Relationship Type From Profile.
If the profile contains relationship types that use the entity type that you want to
delete, you will not be able to delete it unless you delete the relationship type from
the profile first.

Assigning Packages to Entity and Relationship Types


After you create a profile, you must:
• Assign packages to the entity types and relationship types associated with the
profile. To learn more, see “Assigning Packages to Entity or Relationship Types”
on page 275.

• Configure the package as a secure resource. To learn more, see “Securing Siperian Hub
Resources” on page 841.

Sandboxes
To learn about sandboxes, see the Hierarchy Manager chapter in the Siperian Hub Data
Steward Guide.



Part 3
Configuring the Data Flow

Contents
• Chapter 9, “Siperian Hub Processes”
• Chapter 10, “Configuring the Land Process”
• Chapter 11, “Configuring the Stage Process”
• Chapter 12, “Configuring Data Cleansing”
• Chapter 13, “Configuring the Load Process”
• Chapter 14, “Configuring the Match Process”
• Chapter 15, “Configuring the Consolidate Process”
• Chapter 16, “Configuring the Publish Process”

Chapter 9: Siperian Hub Processes

This chapter provides an overview of the processes associated with batch processing in
Siperian Hub, including key concepts, tasks, and references to related topics in the
Siperian Hub documentation.

Chapter Contents
• About Siperian Hub Processes
• Land Process
• Stage Process
• Load Process
• Match Process
• Consolidate Process
• Publish Process

Before You Begin


Before you begin, you should be thoroughly familiar with the concepts of
reconciliation, distribution, best version of the truth (BVT), and batch processing that
are described in Chapter 3, “Key Concepts,” in the Siperian Hub Overview.

About Siperian Hub Processes


With batch processing in Siperian Hub, data flows through Siperian Hub in a sequence
of individual processes.

Overall Data Flow for Batch Processes


The following figure provides a detailed perspective on the overall flow of data through
the Siperian Hub using batch processes, including individual processes, source systems,
base objects, and support tables.

Note: The publish process is not shown in this figure because it is not a batch process.

Consolidation Status for Base Object Records


This section describes the consolidation status of records in a base object.

Consolidation Indicator

All base objects have a system column named CONSOLIDATION_IND. This
consolidation indicator represents the consolidation status of individual records as they
progress through various processes in Siperian Hub.

The consolidation indicator is one of the following values:

Indicator Value   State Name         Description
1                 CONSOLIDATED       Indicates the record has been through the match and
                                     merge process.
2                 UNMERGED           Indicates that the record has gone through the match
                                     process.
3                 QUEUED_FOR_MATCH   Indicates that the record is ready to be put through
                                     the match process against the rest of the records in
                                     the base object.
4                 NEWLY_LOADED       Indicates that the record has been newly loaded into
                                     the base object and has not gone through the match
                                     process.
9                 ON_HOLD            Indicates that the Data Steward has put the record on
                                     hold, to deal with later.
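
For scripts that inspect base object records, the indicator values in the table above can be given readable names. The following Python sketch is illustrative only and is not part of Siperian Hub. The class and function names are assumptions, while the numeric values and state names come from the table.

```python
from enum import IntEnum


class ConsolidationIndicator(IntEnum):
    """CONSOLIDATION_IND values and state names from the table above."""

    CONSOLIDATED = 1      # has been through the match and merge process
    UNMERGED = 2          # has gone through the match process
    QUEUED_FOR_MATCH = 3  # ready to be matched against the rest of the base object
    NEWLY_LOADED = 4      # newly loaded, not yet through the match process
    ON_HOLD = 9           # put on hold by the Data Steward


def state_name(consolidation_ind: int) -> str:
    """Translate a raw CONSOLIDATION_IND value into its state name."""
    return ConsolidationIndicator(consolidation_ind).name
```

A value that is not listed in the table (for example, 5) raises a ValueError, which makes unexpected data easy to spot.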

How the Consolidation Indicator Changes

Siperian Hub updates the consolidation indicator for base object records in the
following sequence.
1. During the load process, when a new or updated record is loaded into a base
object, Siperian Hub assigns the record a consolidation indicator of 4, indicating
that the record needs to be matched.
2. Near the start of the match process, when a record is selected as a match
candidate, the match process changes its consolidation indicator to 3.
Note: Any change to the match or merge configuration settings will trigger a reset
match dialog, asking whether you want to reset the records in the base object
(change the consolidation indicator to 4, ready for match). For more information,
see Chapter 14, “Configuring the Match Process,” and Chapter 15, “Configuring
the Consolidate Process.”
3. Before completing, the match process changes the consolidation indicator of
match candidate records to 2 (ready for consolidation).
Note: The match process may or may not have found matches for the record.
A record with a consolidation indicator of 2 or 4 is visible in Merge Manager.
For more information, see the Siperian Hub Data Steward Guide.
4. If Accept All Unmatched Rows as Unique is enabled, and a record has undergone
the match process but no matches were found, then Siperian Hub automatically
changes its consolidation indicator to 1 (unique). For more information, see
“Accept All Unmatched Rows as Unique” on page 492.
5. If Accept All Unmatched Rows as Unique is enabled, after the record has
undergone the consolidate process, and once a record has no more duplicates to
merge with, Siperian Hub changes its consolidation indicator to 1, meaning that
this record is unique in the base object, and that it represents the master record
(best version of the truth) for that entity in the base object.

Note: Once a record has its consolidation indicator set to 1, Siperian Hub will
never directly match it against any other record. New or updated records (with a
consolidation indicator of 4) can be matched against consolidated records.
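
The sequence above can be summarized as a small state machine. The following Python sketch models a record as a plain dictionary. The function names and parameters are hypothetical, and the sketch simplifies step 5 by looking only at the number of remaining duplicates. It shows the order of the transitions, not how Siperian Hub implements them.

```python
# Consolidation indicator values documented in this section.
CONSOLIDATED, UNMERGED, QUEUED_FOR_MATCH, NEWLY_LOADED = 1, 2, 3, 4


def after_load(record: dict) -> None:
    """Step 1: a new or updated record needs to be matched."""
    record["CONSOLIDATION_IND"] = NEWLY_LOADED


def select_as_match_candidate(record: dict) -> None:
    """Step 2: the match process picks the record as a match candidate."""
    if record["CONSOLIDATION_IND"] == NEWLY_LOADED:
        record["CONSOLIDATION_IND"] = QUEUED_FOR_MATCH


def finish_match(record: dict, matches_found: bool,
                 accept_unmatched_as_unique: bool) -> None:
    """Steps 3 and 4: mark the candidate as matched, or as unique if allowed."""
    if record["CONSOLIDATION_IND"] != QUEUED_FOR_MATCH:
        return
    record["CONSOLIDATION_IND"] = UNMERGED
    if accept_unmatched_as_unique and not matches_found:
        record["CONSOLIDATION_IND"] = CONSOLIDATED


def after_consolidate(record: dict, duplicates_remaining: int) -> None:
    """Step 5 (simplified): no duplicates left to merge with, so the record is unique."""
    if duplicates_remaining == 0:
        record["CONSOLIDATION_IND"] = CONSOLIDATED
```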

Survivorship and Order of Precedence


When evaluating cells to merge from two records, Siperian Hub determines which cell
data should survive and which one should be discarded. The surviving cell data (or winning
cell) is considered to represent the better version of the truth between the two cells.
Ultimately, a single, consolidated record contains the best surviving cell data and
represents the best version of the truth.

Survivorship applies to both trust-enabled columns and columns that are not trust
enabled. When comparing cells from two different records, Siperian Hub determines
survivorship based on the following factors, in order of precedence:
1. If the two columns are trust-enabled, then the data with the highest trust score
wins.
2. If there are no trust scores, then the data with the more recent
LAST_UPDATE_DATE wins.
3. If trust scores are the same from both systems, then the data with the more recent
cross-reference SRC_LUD wins.
4. If the SRC_LUD values are equal, then Siperian Hub compares whether the record
is an incoming load update (applies to the load process only).
5. If both records are incoming load updates, then Siperian Hub compares the
LAST_UPDATE_DATE values in the associated cross-reference records and the
one with the more recent LAST_UPDATE_DATE wins.
6. If the LAST_UPDATE_DATE values are equal, then Siperian Hub compares the
ROWID_OBJECT, in numeric descending order. The highest ROWID_OBJECT
has the winning values.
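
The order of precedence above can be read as a chain of tie-breakers. The following Python sketch is one possible reading, not the product's implementation. The CellCandidate class and its field names are hypothetical. The sketch also assumes that, for factor 4, an incoming load update wins over a record that is not one, which the list above does not state explicitly.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CellCandidate:
    """One competing cell value plus the attributes used to rank it."""

    value: str
    trust_score: Optional[float]  # None when the column is not trust-enabled
    last_update_date: date        # LAST_UPDATE_DATE of the record
    src_lud: date                 # SRC_LUD of the cross-reference record
    is_load_update: bool          # incoming load update (load process only)
    xref_last_update_date: date   # LAST_UPDATE_DATE of the cross-reference record
    rowid_object: int             # ROWID_OBJECT, compared numerically


def surviving_cell(a: CellCandidate, b: CellCandidate) -> CellCandidate:
    """Return the winning cell; each factor applies only if earlier ones tie."""
    if a.trust_score is not None and b.trust_score is not None:
        if a.trust_score != b.trust_score:                  # factor 1
            return a if a.trust_score > b.trust_score else b
    elif a.last_update_date != b.last_update_date:          # factor 2
        return a if a.last_update_date > b.last_update_date else b
    if a.src_lud != b.src_lud:                              # factor 3
        return a if a.src_lud > b.src_lud else b
    if a.is_load_update != b.is_load_update:                # factor 4 (assumed winner)
        return a if a.is_load_update else b
    if a.is_load_update and a.xref_last_update_date != b.xref_last_update_date:
        return a if a.xref_last_update_date > b.xref_last_update_date else b  # factor 5
    return a if a.rowid_object >= b.rowid_object else b     # factor 6
```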

Land Process

This section describes concepts and tasks associated with the land process in Siperian
Hub.

About the Land Process


Landing data is the initial step for loading data into Siperian Hub.

Source Systems and Landing Tables

Landing data involves the transfer of data from one or more source systems to Siperian
Hub landing tables.

• A source system is an external system that provides data to Siperian Hub. Source
systems can be applications, data stores, and other systems that are internal to your
organization, or obtained or purchased from external sources. For more
information, see “About Source Systems” on page 348.
• A landing table is a table in the Hub Store that contains the data that is initially
loaded from a source system. For more information, see “About Landing Tables”
on page 355.

Data Flow of the Land Process

The following figure shows the land process in relation to other Siperian Hub
processes.

Land Process is External to Siperian Hub

The land process is external to Siperian Hub and is executed using an external batch
process or an external application that directly populates landing tables in the Hub
Store. Subsequent processes for managing data are internal to Siperian Hub.

Ways to Populate Landing Tables

Landing tables can be populated in the following ways:

Load Method              Description
external batch process   An ETL (Extract-Transform-Load) tool or other external process
                         copies data from a source system to Siperian Hub. Batch loads are
                         external to Siperian Hub. Only the results of the batch load are
                         visible to Siperian Hub in the form of populated landing tables.
                         Note: This process is handled by a separate ETL tool of your choice.
                         This ETL tool is not part of the Siperian Hub suite of products.
real-time processing     External applications can populate landing tables in on-line,
                         real-time mode. Such applications are not part of the Siperian Hub
                         suite of products.

For any given source system, the approach used depends on whether it is the most
efficient (or perhaps the only) way to obtain data from that source system. In
addition, batch processing is often used for the initial data load (the first time that
business data is loaded into the Hub Store), as it can be the most efficient way to
populate the landing table with a large number of records. For more information, see
“Initial Data Loads and Incremental Loads” on page 302.

Note: Data in the landing tables cannot be deleted until after the load process for the
base object has been executed and completed successfully.
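
As an illustration of the external batch approach, the following Python sketch copies a source-system extract into a landing table. It is a minimal sketch under stated assumptions: an in-memory SQLite database stands in for the Hub Store, and the table name C_CUSTOMER_LND, its columns, and both function names are hypothetical. A real deployment would use your ETL tool and the database that hosts your Hub Store.

```python
import csv
import io
import sqlite3


def create_demo_landing_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical landing table (name and columns are illustrative)."""
    conn.execute(
        "CREATE TABLE C_CUSTOMER_LND ("
        "CUSTOMER_ID TEXT, FULL_NAME TEXT, LAST_UPDATE_DATE TEXT)"
    )


def land_batch(conn: sqlite3.Connection, extract_csv: str) -> int:
    """Copy every row of a source-system extract into the landing table."""
    rows = list(csv.DictReader(io.StringIO(extract_csv)))
    conn.executemany(
        "INSERT INTO C_CUSTOMER_LND (CUSTOMER_ID, FULL_NAME, LAST_UPDATE_DATE) "
        "VALUES (:CUSTOMER_ID, :FULL_NAME, :LAST_UPDATE_DATE)",
        rows,
    )
    conn.commit()
    return len(rows)
```

Only the populated landing table is visible to Siperian Hub. The extract, the script, and its scheduling all remain outside the product.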

Managing the Land Process


To manage the land process, refer to the following topics in this documentation:

Task            Topic(s)
Configuration   Chapter 10, “Configuring the Land Process”
                • “Configuring Source Systems” on page 348
                • “Configuring Landing Tables” on page 355
Execution       Execution of the land process is external to Siperian Hub and
                depends on the approach you are using to populate landing tables, as
                described in “Ways to Populate Landing Tables” on page 294.
Application     If you are using external application(s) to populate landing tables, see
Development     the developer documentation for the API used by your application(s).

Stage Process

This section describes concepts and tasks associated with the stage process in Siperian
Hub.

About the Stage Process


The stage process transfers data from a populated landing table to the staging table
associated with a particular base object or dependent object.

Data is transferred according to mappings that link a source column in the landing
table with a target column in the staging table. Mappings also define data cleansing, if
any, to perform on the data before it is saved in the target table.

If delta detection is enabled (see “Configuring Delta Detection for a Staging Table” on
page 401), Siperian Hub detects which records in the landing table are new or updated
and then copies only these records, unchanged, to the corresponding RAW table.
Otherwise, all records are copied to the target table. Records with obvious problems in
the data are rejected and stored in a corresponding reject table, which can be inspected
after running the stage process (see “Viewing Rejected Records” on page 685).

Data from landing tables can be distributed to multiple staging tables. However, each
staging table receives data from only one landing table.

The stage process prepares data for the load process, described in “Load Process” on
page 299, which subsequently loads data from the staging table into a target
table—either a base object or a dependent object.

Data Flow of the Stage Process

The following figure shows the stage process in relation to other Siperian Hub
processes.

Tables Associated With the Stage Process

The following tables in the Hub Store are associated with the stage process.

Type of Table   Description
landing table   Contains data that is copied from a source system. For more information,
                see “About the Land Process” on page 292 and “About Landing Tables”
                on page 355.
staging table   Contains data that was accepted and copied from the landing table during
                the stage process. For more information, see “About Staging Tables” on
                page 364.
raw table       Contains data that was archived from landing tables. Raw data can be
                configured to get archived based on the number of loads or the duration
                (specific time interval). For more information, see “Configuring the Audit
                Trail for a Staging Table” on page 399 and “Configuring Delta Detection
                for a Staging Table” on page 401.
reject table    Contains records that Siperian Hub has rejected for a specific reason.
                Records in these tables will not be loaded into base objects and dependent
                objects. Data gets rejected automatically during Stage jobs for the
                following reasons:
                • future date or NULL date in the LAST_UPDATE_DATE column
                • NULL value mapped to the PKEY_SRC_OBJECT of the staging
                  table
                • duplicates found in PKEY_SRC_OBJECT
                • invalid value in the HUB_STATE_IND field (for state-enabled base
                  objects only)
                • duplicate value found in a unique column
                The reject table is associated with the staging table (called
                stagingTableName_REJ). Rejected records can be inspected after running
                Stage jobs (see “Viewing Rejected Records” on page 685).
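
The rejection reasons listed for Stage jobs can be used to pre-check data before it is landed. The following Python sketch is an illustration only. The function name, parameters, and reason strings are hypothetical, and the sketch assumes that the first row with a given key is kept and only later duplicates are flagged, which this guide does not specify.

```python
from datetime import datetime


def stage_reject_reasons(rows, now, unique_columns=(), valid_states=None):
    """Return {row index: [reasons]} for rows that match a documented reject reason.

    valid_states is only checked when provided (state-enabled base objects only).
    """
    rejected = {}
    seen_keys = set()
    seen_unique = {col: set() for col in unique_columns}
    for index, row in enumerate(rows):
        reasons = []
        last_update = row.get("LAST_UPDATE_DATE")
        if last_update is None or last_update > now:
            reasons.append("future or NULL LAST_UPDATE_DATE")
        key = row.get("PKEY_SRC_OBJECT")
        if key is None:
            reasons.append("NULL PKEY_SRC_OBJECT")
        elif key in seen_keys:
            reasons.append("duplicate PKEY_SRC_OBJECT")
        else:
            seen_keys.add(key)
        if valid_states is not None and row.get("HUB_STATE_IND") not in valid_states:
            reasons.append("invalid HUB_STATE_IND")
        for col in unique_columns:
            value = row.get(col)
            if value is None:
                continue  # NULL handling for unique columns is not covered here
            if value in seen_unique[col]:
                reasons.append(f"duplicate value in unique column {col}")
            else:
                seen_unique[col].add(value)
        if reasons:
            rejected[index] = reasons
    return rejected
```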

Managing the Stage Process


To manage the stage process, refer to the following topics in this documentation:

Task            Topic(s)
Configuration   Chapter 11, “Configuring the Stage Process”
                • “Configuring Staging Tables” on page 364
                • “Mapping Columns Between Landing and Staging Tables” on
                  page 380
                • “Using Audit Trail and Delta Detection” on page 398
                Chapter 12, “Configuring Data Cleansing”
                • “Configuring Cleanse Match Servers” on page 407
                • “Using Cleanse Functions” on page 414
                • “Configuring Cleanse Lists” on page 440
Execution       Chapter 17, “Using Batch Jobs”
                • “Stage Jobs” on page 745
                Chapter 18, “Writing Custom Scripts to Execute Batch Jobs”
                • “Stage Jobs” on page 795
Application     Siperian Services Integration Framework Guide
Development

Load Process

This section describes concepts and tasks associated with the load process in Siperian
Hub. For related tasks, see “Managing the Load Process” on page 316.

About the Load Process


In Siperian Hub, the load process moves data from a staging table to the corresponding
target table (the base object or dependent object to which the staging table belongs) in
the Hub Store.

The load process determines what to do with the data in the staging table based on:
• whether the target table is a base object or dependent object
• whether a corresponding record already exists in the target table and, if so, whether
the record in the staging table has been updated since the load process was last run
• whether trust is enabled for certain columns (base objects only); if so, the load
process calculates trust scores for the cell data
• whether the data is valid to load; if not, the load process rejects the record instead
• other configuration settings

Data Flow for the Load Process


The following figure shows the load process in relation to other Siperian Hub
processes.

Tables Associated with the Load Process


In addition to base objects and dependent objects, the following tables in the Hub
Store are associated with the load process.

Type of Table     Description
staging table     Contains the data that was accepted and copied from the landing table
                  during the stage process. For more information, see “Stage Process” on
                  page 295 and “About Staging Tables” on page 364.
cross-reference   Used for tracking the lineage of data (the source system for each record
table             in the base object). For each source system record that is loaded into the
                  base object, Siperian Hub maintains a record in the cross-reference table
                  that includes:
                  • an identifier for the system that provided the record
                  • the primary key value of that record in the source system
                  • the most recent cell values provided by that system
                  Each base object record will have one or more cross-reference records.
                  For more information, see “Cross-Reference Tables” on page 97.
history tables    If history is enabled for the base object, and records are updated or
                  inserted, then the load process writes this information into two tables:
                  • base object history table
                  • cross-reference history table
                  For more information, see “History Tables” on page 100.
reject table      Contains records from the staging table that the load process has rejected
                  for a specific reason. Rejected records will not be loaded into base objects
                  or dependent objects. The reject table is associated with the staging table
                  (called stagingTableName_REJ). For more information, see “Rejected
                  Records in Load Jobs” on page 314. Rejected records can be inspected
                  after running Load jobs (see “Viewing Rejected Records” on page 685).

Initial Data Loads and Incremental Loads


The initial data load (IDL) is the very first time that data is loaded into a newly-created,
empty base object.

During the initial data load, all records in the staging table are inserted into the base
object as new records. For more information, see “Load Inserts” on page 306.

Once the initial data load has occurred for a base object, any subsequent load processes
are called incremental loads because only new or updated data is loaded into the base
object.

Duplicate data is ignored. For more information, see “Run-time Execution Flow of the
Load Process” on page 304.

Trust Settings and Validation Rules


Siperian Hub uses trust and validation rules to help determine the most reliable data.

Trust Settings

If a column in a base object derives its data from multiple source systems, Siperian Hub
uses trust to help with comparing the relative reliability of column data from different
source systems. For example, the Orders system might be a more reliable source of
billing addresses than the Direct Marketing system.

Trust is enabled and configured at the column level. For example, you can specify a
higher trust level for Customer Name in the Orders system and for Phone Number in
the Billing system.

Trust provides a mechanism for measuring the relative confidence factor associated
with each cell based on its source system, change history, and other business rules.

Trust takes into account the quality and age of the cell data, and how its reliability
decays (decreases) over time. Trust is used to determine survivorship (when two
records are consolidated) and whether updates from a source system are sufficiently
reliable to update the master record. For more information, see “Survivorship and
Order of Precedence” on page 291 and “Configuring Trust for Source Systems” on
page 455.

Data stewards can manually override a calculated trust setting if they have direct
knowledge that a particular value is correct. Data stewards can also enter a value
directly into a record in a base object. For more information, see the Siperian Hub Data
Steward Guide.

Validation Rules

Trust is often used in conjunction with validation rules, which might downgrade (reduce)
trust scores according to configured conditions and actions. For more information, see
“Configuring Validation Rules” on page 468.

When data meets the criterion specified by the validation rule, then the trust value for
that data is downgraded by the percentage specified in the validation rule. For example:
Downgrade trust on First_Name by 50% if Length < 3
Downgrade trust on Address Line 1, City, State, Zip and Valid_address_ind
    if Valid_address_ind = ‘False’

If the Reserve Minimum Trust flag is enabled (checked) for a column, then the trust
cannot be downgraded below the column’s minimum trust setting.
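
The effect of a downgrade and of the Reserve Minimum Trust flag can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the product's trust algorithm: the function names are hypothetical, trust is treated as a plain number, and "downgraded by the percentage" is read as multiplying the score by (1 - percentage / 100).

```python
def first_name_too_short(first_name: str) -> bool:
    """Condition of the first example rule above: Length < 3."""
    return len(first_name) < 3


def downgrade_trust(trust: float, downgrade_pct: float,
                    min_trust: float, reserve_min_trust: bool) -> float:
    """Apply a percentage downgrade, honoring the column's minimum trust if reserved."""
    downgraded = trust * (1 - downgrade_pct / 100)
    if reserve_min_trust:
        # The score cannot fall below the column's minimum trust setting.
        downgraded = max(downgraded, min_trust)
    return downgraded
```

For example, a trust score of 80 downgraded by 50% becomes 40 without the flag. With the flag enabled and a minimum trust of 60, the score stops at 60.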

Run-time Execution Flow of the Load Process


This section provides a detailed explanation of what can occur during the load process
based on configured settings as well as characteristics of the data being processed. This
section describes the default behavior of the Siperian Hub load process. Alternatively,
for incremental loads, you can streamline load, match, and merge processing by loading
by RowID, as described in “Loading by RowID” on page 394.

Loading Records by Batch

The load process handles staging table records in batches. For each base object, the
load batch size setting (see “Load Batch Size” on page 103) specifies the number of
records to load per batch cycle (default is 1000000).

During execution of the load process for a base object, Siperian Hub creates a
temporary table (_TLL) for each batch as it cycles through records in the staging table.
For example, suppose the staging table contained 250 records to load, and the load
batch size were set to 100. During execution, the load process would:
• create a TLL table and process the first 100 records
• drop and create the TLL table and process the second 100 records
• drop and create the TLL table and process the remaining 50 records
• drop and create the TLL table and stop executing because the TLL table contained
no records
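
The batch cycle in the example above can be sketched as a simple loop. In the following Python sketch, a list slice stands in for the temporary TLL table. The function name is hypothetical, and the sketch only counts records per cycle instead of loading them.

```python
def load_in_batches(staging_records: list, load_batch_size: int) -> list:
    """Return the number of records handled in each batch cycle."""
    cycle_sizes = []
    offset = 0
    while True:
        # Drop and re-create the TLL table with the next batch of records.
        tll = staging_records[offset:offset + load_batch_size]
        cycle_sizes.append(len(tll))
        if not tll:
            break  # the TLL table contained no records, so stop executing
        offset += len(tll)
    return cycle_sizes
```

With 250 staging records and a load batch size of 100, the function returns four cycles of 100, 100, 50, and 0 records, matching the sequence described above.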

Determining Whether Records Already Exist

During the load process, Siperian Hub first checks to see whether the record has the
same primary key as an existing record from the same source system. It compares each
record in the staging table with records in the target table to determine whether it
already exists in the target table.

What occurs next depends on the results of this comparison.

Load Operation   Description
load insert      If a record in the staging table does not already exist in the target table,
                 then Siperian Hub inserts that new record in the target table.
load update      If a record in the staging table already exists in the target table, then
                 Siperian Hub takes the appropriate action. A load update occurs if the
                 target table (base object or dependent object) gets updated with data in a
                 record from the staging table. The load process updates a record only if
                 it has changed since the record was last supplied by the source system.
                 Load updates are governed by current Siperian Hub configuration
                 settings and characteristics of the data in each record in the staging table.
                 For example, if Force Update is enabled (see “Forcing Updates in Load
                 Jobs” on page 730), the records will be updated regardless of whether
                 they have already been loaded.

During the load process, load updates are executed first, followed by load inserts.
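
The comparison described in this section amounts to a lookup on the combination of source system and primary key. The following Python sketch is illustrative only. The function name and the set of existing keys are hypothetical stand-ins for the cross-reference lookup that Siperian Hub performs.

```python
def plan_load(staging_rows: list, existing_xref_keys: set, source_system: str) -> list:
    """Classify staging rows as load updates or load inserts, updates first."""
    updates, inserts = [], []
    for row in staging_rows:
        # A record already exists if the same source system supplied the same key.
        key = (source_system, row["PKEY_SRC_OBJECT"])
        if key in existing_xref_keys:
            updates.append(("load update", row))
        else:
            inserts.append(("load insert", row))
    return updates + inserts
```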

Load Inserts

What happens during a load insert depends on the target table (base object or
dependent object) and other factors.

Load Inserts and Target Base Objects

To perform a load insert for a record in the staging table:


• The load process generates a unique ROWID_OBJECT value for the new record.
• The load process performs foreign key lookups and substitutes any foreign key
value(s) required to maintain referential integrity. For more information, see
“Performing Lookups Needed to Maintain Referential Integrity” on page 312.
• The load process inserts the record into the base object, and copies into this new
record the generated ROWID_OBJECT value (as the primary key for this record
in the base object), any foreign key lookup values, and all of the column data from
the staging table (except PKEY_SRC_OBJECT)—including null values.
The base object may have multiple records for the same object (for example, one
record from source system A and another from source system B). Siperian Hub
flags both new records as new.

• For each new record in the base object, the load process sets its DIRTY_IND to 1
so that match keys can be regenerated during the tokenization process, as
described in “Base Object Records Flagged for Tokenization” on page 323.
• For each new record in the base object, the load process sets its
CONSOLIDATION_IND to 4 (ready for match) so that the new record can be
matched to other records in the base object. For more information, see
“Consolidation Status for Base Object Records” on page 289.
• The load process inserts a record into the cross-reference table associated with the
base object. The load process generates a primary key value for the cross-reference
table, then copies into this new record the generated key, an identifier for the
source system, and the columns in the staging table (including
PKEY_SRC_OBJECT). For more information, see “Cross-Reference Tables” on page 97.
Note: The base object does not contain the primary key value from the source
system. Instead, the base object’s primary key is the generated ROWID_OBJECT
value. The primary key from the source system (PKEY_SRC_OBJECT) is stored
in the cross-reference table instead.
• If history is enabled for the base object (see “History Tables” on page 100), then
the load process inserts a record into its history and cross-reference history tables.
• If trust is enabled for one or more columns in the base object, then the load
process also inserts records into control tables that support the trust algorithms,
populating the elements of trust and validation rules for each trusted cell with the
values used for trust calculations. This information can be used subsequently to
calculate trust when needed. For more information, see “Configuring Trust for
Source Systems” on page 455 and “Control Tables for Trust-Enabled Columns”
on page 457.
• If Generate Match Tokens on Load is enabled for a base object (see “Generate
Match Tokens on Load” on page 104), then the tokenization process is
automatically started after the load process completes.
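
The core of a load insert into a base object can be sketched with in-memory lists standing in for the base object and cross-reference tables. This Python sketch is a simplification under stated assumptions: the function name and the XREF_KEY and SOURCE_SYSTEM column names are hypothetical, keys are generated by counting rows, the ROWID_OBJECT link in the cross-reference record is an assumption, and foreign key lookups, history, and trust control tables are omitted.

```python
def load_insert(staging_row: dict, source_system: str,
                base_object: list, xref: list) -> int:
    """Insert one staging record into the base object and its cross-reference table."""
    rowid_object = len(base_object) + 1  # stand-in for the generated ROWID_OBJECT

    # The base object receives all staging columns except PKEY_SRC_OBJECT.
    bo_record = {k: v for k, v in staging_row.items() if k != "PKEY_SRC_OBJECT"}
    bo_record.update(
        ROWID_OBJECT=rowid_object,
        DIRTY_IND=1,          # match keys must be regenerated during tokenization
        CONSOLIDATION_IND=4,  # ready for match
    )
    base_object.append(bo_record)

    # The cross-reference record keeps the source system's primary key.
    xref_record = dict(staging_row)
    xref_record.update(
        XREF_KEY=len(xref) + 1,       # hypothetical generated key
        SOURCE_SYSTEM=source_system,  # hypothetical source system identifier
        ROWID_OBJECT=rowid_object,    # assumed link back to the base object record
    )
    xref.append(xref_record)
    return rowid_object
```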

Load Inserts and Target Dependent Objects

For load inserts into target dependent objects, the load process:
• inserts the new record into the dependent object
• substitutes any foreign keys required to maintain referential integrity

Load Updates

What happens during a load update depends on the target table (base object or
dependent object) and other factors.

Load Updates and Target Base Objects

For load updates on target base objects:


• By default, for each record in the staging table, the load process compares the
value in the LAST_UPDATE_DATE column with the source last update date
(SRC_LUD) in the associated cross-reference table.

• If the record in the staging table has been updated since the last time the
record was supplied by the source system, then the load process proceeds with
the load update.
• If the record in the staging table is unchanged since the last time the record
was supplied by the source system, then the load process ignores the record (no
action is taken) if the dates are the same and trust is not enabled, or rejects the
record if it is a duplicate.
Administrators can change the default behavior so that the load process bypasses
this LAST_UPDATE_DATE check and forces an update of the records regardless
of whether the records might have already been loaded. For more information, see
“Forcing Updates in Load Jobs” on page 730.

• The load process performs foreign key lookups and substitutes any foreign key
value(s) required to maintain referential integrity. For more information, see
“Performing Lookups Needed to Maintain Referential Integrity” on page 312.
• If the target base object has trust-enabled columns, then the load process:
• calculates the trust score for each trust-enabled column in the record to be
updated, based on the configured trust settings for this trusted column (as
described in “Configuring Trust for Source Systems” on page 455)
• applies validation rules, if defined, to downgrade trust scores where applicable
(see “Configuring Validation Rules” on page 468)
The load process updates the target record in the base object according to the
following rules:
• If the trust score for the cell in the staging table record is higher than the trust
score in the corresponding cell in the target base object record, then the load
process updates the cell in the target record.
• If the trust score for the cell in the staging table record is lower than the trust
score in the corresponding cell in the target base object record, then the load
process does not update the cell in the target record.
• If the trust score for the cell in the staging table record is the same as the trust
score in the corresponding cell in the target base object record, or if trust is
not enabled for the column, then the cell value in the record with the most
recent LAST_UPDATE_DATE wins.
• If the staging table record has a more recent LAST_UPDATE_DATE,
then the corresponding cell in the target base object record is updated.
• If the target record in the base object has a more recent
LAST_UPDATE_DATE, then the cell is not updated.
For more information, see “Survivorship and Order of Precedence” on page 291.
• For each updated record in the base object, the load process sets its DIRTY_IND
to 1 so that match keys can be regenerated during the tokenization process. For
more information, see “Base Object Records Flagged for Tokenization” on page
323.
• For each updated record in the base object, the load process sets its
CONSOLIDATION_IND to 4 so that the updated record can be matched to other
records in the base object. For more information, see “Consolidation Status for
Base Object Records” on page 289.
• Whenever the load process updates a record in the base object, it also updates the
associated record in the cross-reference table (“Cross-Reference Tables” on page
97), history tables (if history is enabled, see “History Tables” on page 100), and
other control tables as applicable.

• If Generate Match Tokens on Load is enabled for a base object (see “Generate
Match Tokens on Load” on page 104), then the tokenization process is
automatically started after the load process completes.
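
The cell-level rules for load updates on trust-enabled base objects, described earlier in this list, reduce to one decision per cell. The following Python sketch is illustrative. The function name and parameters are hypothetical, a trust score of None represents a column without trust enabled, and the sketch assumes that a cell is left unchanged when both the trust scores and the dates are equal, which this guide does not state.

```python
from datetime import date
from typing import Optional


def should_update_cell(staging_trust: Optional[float], target_trust: Optional[float],
                       staging_last_update: date, target_last_update: date) -> bool:
    """Decide whether the staging cell replaces the cell in the target record."""
    both_scored = staging_trust is not None and target_trust is not None
    if both_scored and staging_trust != target_trust:
        # A higher trust score updates the cell; a lower one does not.
        return staging_trust > target_trust
    # Equal trust scores, or trust not enabled: the more recent
    # LAST_UPDATE_DATE wins.
    return staging_last_update > target_last_update
```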

Load Updates and Target Dependent Objects

For load updates with target dependent objects, the load process updates the records in
the target dependent object with the values in the staging table without checking the last
update date.

Note: Data in staging tables from different source systems must have unique keys in
order to be loaded into a dependent object. Records coming from different source
systems each have their own key that uniquely identifies the record in that source
system. Siperian Hub considers any records from the same source system with the
same key values to be the same record. Therefore, if a record in the staging table has
the same key value as an existing cross-reference record, Siperian Hub performs a load
update because the record is considered to exist already in the base object.
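
The rule in the note above can be sketched as follows. This is only an illustration, not Siperian Hub code: the in-memory set standing in for the cross-reference table, the source_system field name, and the function name are assumptions made for the example.

```python
def classify_staging_records(staging_records, xref_keys):
    """Split staging records into load inserts and load updates.

    xref_keys is a set of (source_system, PKEY_SRC_OBJECT) pairs that already
    exist in the cross-reference table (a hypothetical in-memory stand-in).
    """
    inserts, updates = [], []
    for rec in staging_records:
        key = (rec["source_system"], rec["PKEY_SRC_OBJECT"])
        if key in xref_keys:
            updates.append(rec)  # same system and same key: treated as the same record
        else:
            inserts.append(rec)  # key not yet known for this source system
    return inserts, updates


existing = {("CRM", "101")}
staging = [
    {"source_system": "CRM", "PKEY_SRC_OBJECT": "101"},
    {"source_system": "CRM", "PKEY_SRC_OBJECT": "102"},
    {"source_system": "ERP", "PKEY_SRC_OBJECT": "101"},
]
inserts, updates = classify_staging_records(staging, existing)
```

Note that the ERP record with key 101 is classified as an insert: the same key value from a different source system identifies a different record.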

Performing Lookups Needed to Maintain Referential Integrity

Regardless of whether the load process is inserting or updating a record, it performs
any lookups needed to translate source system foreign keys into Siperian Hub foreign
key values using the lookup settings configured for the staging table. For more
information, see “Configuring Lookups For Foreign Key Columns” on page 376.

Disabling Referential Integrity Constraints

During initial loads and updates, or whenever there is no real-time, concurrent access,
you can disable the referential integrity constraints on the base object to improve
performance. For more information, see “Allow constraints to be disabled” on page
103.

Undefined Lookups

If a lookup on a child object is not defined (the lookup table and column were not
populated), you must repeat the stage process for the child object before executing
the load process. Otherwise, the data cannot be loaded successfully. For more
information, see “Stage Jobs” on page 745 and “Load Jobs” on page 727.

Allowing Null Foreign Keys

When configuring columns for a staging table in the Schema Manager, you can specify
whether to allow NULL foreign keys for target base objects; this setting does not apply
to dependent objects. In the Schema Manager, the Allow Null Foreign Key check box
(see “Properties for Columns in Staging Tables” on page 370) determines whether
NULL foreign keys are permitted.
• By default, the Allow Null Foreign Key check box is unchecked, which means that
NULL foreign keys are not allowed. The load process:
• accepts records with valid lookup values
• rejects records with NULL foreign keys
• rejects records with invalid foreign key values
• If Allow Null Foreign Key is enabled (selected), then the load process:
• accepts records with valid lookup values
• accepts records with NULL foreign keys (and permits load inserts and load
updates for these records)
• rejects records with invalid foreign key values

The load process permits load inserts and load updates for accepted records only.
Rejected records are inserted into the reject table rather than being loaded into the
target table.

Note: During the initial data load only, when the target base object is empty, the load
process allows null foreign keys. For more information, see “Initial Data Loads and
Incremental Loads” on page 302.
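
The accept and reject rules above can be summarized in a short sketch. The function and parameter names are hypothetical, and the sketch assumes that invalid (non-NULL) foreign keys are still rejected during an initial load, which the text above does not state explicitly.

```python
def check_foreign_key(fk_value, valid_keys, allow_null_fk=False, initial_load=False):
    """Return "accept" or "reject" for one staging record's foreign key.

    valid_keys holds the values that the configured lookup can resolve.
    """
    if fk_value is None:
        # NULL keys pass only when Allow Null Foreign Key is enabled,
        # or during the initial load into an empty base object
        return "accept" if (allow_null_fk or initial_load) else "reject"
    # Non-NULL keys must resolve through the lookup, whatever the setting
    return "accept" if fk_value in valid_keys else "reject"


known = {"101", "102"}
default_null = check_foreign_key(None, known)
enabled_null = check_foreign_key(None, known, allow_null_fk=True)
invalid_key = check_foreign_key("999", known, allow_null_fk=True)
```

Records for which the function returns "reject" would go to the reject table rather than the target table.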

Rejected Records in Load Jobs

During the load process, records in the staging table might be rejected for the
following reasons:
• future date or NULL date in the LAST_UPDATE_DATE column
• NULL value mapped to the PKEY_SRC_OBJECT of the staging table
• duplicates found in PKEY_SRC_OBJECT
• invalid value in the HUB_STATE_IND field (for state-enabled base objects only)
• invalid or NULL foreign keys, as described in “Allowing Null Foreign Keys” on
page 313

Rejected records will not be loaded into base objects or dependent objects. Rejected
records can be inspected after running Load jobs (see “Viewing Rejected Records” on
page 685).

For more information about configuring the behavior of delta detection for duplicates
and the retention of records in the REJ and RAW tables for a staging table, see “Using
Audit Trail and Delta Detection” on page 398.

Note: To reject records, the load process requires traceability back to the landing table.
If you are loading a record from a staging table and its corresponding record in the
associated landing table has been deleted, then the load process does not insert it into
the reject table.

Other Considerations for the Load Process


This section describes other considerations for the load process.

How the Load Process Handles Parent-Child Records

If the child table contains generated keys from the parent table, the load process copies
the appropriate primary key value from the parent table into the child table.
For example, suppose you had the following data.

PARENT TABLE:
PARENT_ID   FNAME   LNAME
101         Joe     Smith
102         Jane    Smith

CHILD TABLE (has a relationship to the parent's PKEY_SRC_OBJECT):
ADDRESS   CITY      STATE   FKEY_PARENT
1893      my city   CA      101
1893      my city   CA      102

In this example, you can have a relationship pointing to the ROWID_OBJECT, to
PKEY_SRC_OBJECT, or to a unique column for table lookup.
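
As a rough illustration of the lookup in this example, the sketch below resolves a child record's FKEY_PARENT to the parent's key in the Hub. The ROWID_OBJECT values and the function name are invented for the example, and the sketch assumes that PARENT_ID is mapped to PKEY_SRC_OBJECT in the parent table.

```python
def resolve_parent_rowid(fkey_parent, parents, lookup_column="PKEY_SRC_OBJECT"):
    """Return the ROWID_OBJECT of the parent whose lookup column matches."""
    for parent in parents:
        if parent[lookup_column] == fkey_parent:
            return parent["ROWID_OBJECT"]
    return None  # lookup failed: the foreign key value is invalid


# ROWID_OBJECT values are hypothetical placeholders
parents = [
    {"ROWID_OBJECT": "R1", "PKEY_SRC_OBJECT": "101", "FNAME": "Joe", "LNAME": "Smith"},
    {"ROWID_OBJECT": "R2", "PKEY_SRC_OBJECT": "102", "FNAME": "Jane", "LNAME": "Smith"},
]
children = [
    {"ADDRESS": "1893", "CITY": "my city", "STATE": "CA", "FKEY_PARENT": "101"},
    {"ADDRESS": "1893", "CITY": "my city", "STATE": "CA", "FKEY_PARENT": "102"},
]
resolved = [resolve_parent_rowid(c["FKEY_PARENT"], parents) for c in children]
```

Passing a different lookup_column models a relationship that points to ROWID_OBJECT or to another unique column instead.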

Loading State-Enabled Base Objects

The load process has special considerations when processing records for state-enabled
base objects. For more information, see “Rules for Loading Data” on page 221.

Note: The load process rejects any record from the staging table that has an invalid
value in the HUB_STATE_IND column. For more information, see “About the Hub
State Indicator” on page 207.

Generating Match Tokens (Optional)

Tokenizing data prepares it for the match process. In the Schema Manager, when
configuring a base object, you can specify whether to generate match tokens
immediately after the Load job completes, or to delay tokenizing data until the Match
job runs. The setting of the Generate Match Tokens on Load check box determines
when tokenization occurs. For more information, see “Match Process” on page 317
and “Generate Match Tokens on Load” on page 104.

Managing the Load Process


To manage the load process, refer to the following topics in this documentation:

Task Topic(s)
Configuration Chapter 13, “Configuring the Load Process”
• “Configuring Trust for Source Systems” on page 455
• “Configuring Validation Rules” on page 468
Execution Chapter 17, “Using Batch Jobs”
• “Load Jobs” on page 727
• “Synchronize Jobs” on page 747
• “Revalidate Jobs” on page 745
Chapter 18, “Writing Custom Scripts to Execute Batch Jobs”
• “Load Jobs” on page 775
• “Synchronize Jobs” on page 796
• “Revalidate Jobs” on page 794
Application Development Siperian Services Integration Framework Guide

Match Process

This section describes concepts and tasks associated with the match process in Siperian
Hub.

About the Match Process


Before records in a base object can be consolidated, Siperian Hub must determine
which records are likely duplicates (matches) of each other. The match process uses match
rules to:
• identify which records in the base object are likely duplicates (identical or similar)
• determine which records are sufficiently similar to be consolidated automatically,
and which records should be reviewed manually by a data steward prior to
consolidation

In Siperian Hub, the match process provides you with two main ways in which to
compare records and determine duplicates:
• Fuzzy matching is the most common means used in Siperian Hub to match records
in base objects. Fuzzy matching looks for sufficient points of similarity between
records and makes probabilistic match determinations that consider likely
variations in data patterns, such as misspellings, transpositions, the combining or
splitting of words, omissions, truncation, phonetic variations, and so on.
• Exact matching is less commonly used because it matches only records with identical
values in the match column(s). An exact strategy is faster, but an exact match
might miss some matches if the data is imperfect.

The best option to choose depends on the characteristics of the data, your knowledge
of the data, and your particular match and consolidation requirements. For more
information, see “Exact-match and Fuzzy-match Base Objects” on page 320.

During the match process, Siperian Hub compares records in the base object for points
of similarity. If the match process finds sufficient points of similarity (identical or
similar matches) between two records, indicating that the two records probably are
duplicates of each other, then the match process:
• populates a match table with ROWID_OBJECT references to matched record
pairs, along with the match rule that identified the match, and whether the
matched records qualify for automatic consolidation

• flags those records for consolidation by changing their consolidation indicator to 2
(ready for consolidation), as described in “Consolidation Status for Base Object
Records” on page 289

Match Data Flow


The following figure shows the match process in relation to other Siperian Hub
processes.

Key Concepts for the Match Process


This section describes key concepts that apply to the match process.

Match Rules

A match rule defines the criteria by which Siperian Hub determines whether two records
in the base object might be duplicates. Siperian Hub supports two types of match rules:

Type Description
Match column rules Used to match base object records based on the values in columns
you have defined as match columns, such as last name, first name,
address1, and address2. This is the most commonly-used method
for identifying matches. For more information, see “Configuring
Match Columns” on page 515.
Primary key match rules Used to match records from two systems that use the same
primary keys for records. It is uncommon for two different source
systems to use identical primary keys. However, when this does
occur, primary key matches are quick and very accurate. For more
information, see “Configuring Primary Key Match Rules” on page
578.

Both kinds of match rules can be used together for the same base object.

Exact-match and Fuzzy-match Base Objects

A base object is configured to use one of the following types of matching:

Type of Base Object Description


exact-match base object Can have only exact match columns. For more information, see
“Match Column Types” on page 515.
fuzzy-match base object Can have both fuzzy match and exact match columns:
• fuzzy match only
• exact match only, or
• some combination of fuzzy and exact match

The type of base object determines the type of match and the type of match columns
you can define. The base object type is determined by the selected match / search
strategy for the base object. For more information, see “Match/Search Strategy” on
page 493.

Support Tables Used in the Match Process

The match process uses the following support tables:

Table Description
match key table Contains the match keys that were generated for all base object records.
A match key table uses the following naming convention:
C_baseObjectName_STRP
where baseObjectName is the root name of the base object.
Example: C_PARTY_STRP. For more information, see “Columns in
Match Key Tables” on page 325.
match table Contains the pairs of matched records in the base object resulting from
the execution of the match process on this base object.
Match tables use the following naming convention:
C_baseObjectName_MTCH
where baseObjectName is the root name of the base object.
Example: C_PARTY_MTCH. For more information, see “Populating the
Match Table with Match Pairs” on page 330.
Note: Link-style base objects use a link table (*_LNK) instead.
match flag audit table Contains the userID of the user who, in Merge Manager, queued a manual
match record for automerging.
Match flag audit tables use the following naming convention:
C_baseObjectName_FHMA
where baseObjectName is the root name of the base object.
Used only if Match Flag Audit Table is enabled for this base object, as
described in “Match Flag Audit Table” on page 105.
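
The naming conventions above can be expressed as a small helper. The function name and dictionary keys are illustrative only; the suffixes come from the table above.

```python
# Suffixes taken from the naming conventions described above
SUPPORT_TABLE_SUFFIXES = {
    "match_key": "STRP",
    "match": "MTCH",
    "match_flag_audit": "FHMA",
}


def support_table_names(base_object_root):
    """Build the support table names for a base object root name such as PARTY."""
    root = base_object_root.upper()
    return {kind: f"C_{root}_{suffix}" for kind, suffix in SUPPORT_TABLE_SUFFIXES.items()}


party_tables = support_table_names("PARTY")
```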

Match Keys and the Tokenization Process

Match keys are strings that encode data in the columns used to identify candidates for
matching. Match keys are fixed length, compressed, and encoded values built from a
combination of the words and numbers in a name or address such that relevant
variations have the same match key value. Match tokens are strings consisting of match
keys plus the flattened data from the match columns.

The process of generating match tokens is called tokenization. Match tokens are stored
in the match key table associated with the base object. For each record in the base
object, tokenization stores one or more generated match keys in the match key table. In
the match key table, match keys are stored in the SSA_KEY column, and match
tokens are the combination of data stored in the SSA_KEY plus the SSA_DATA
columns. For more information, see “Columns in Match Key Tables” on page 325.

When to Generate Match Tokens

Match keys are maintained independently of the match process. The match process
depends on the match keys in the match key table being current. Match keys can be
updated:
• after the load process (see “Generate Match Tokens on Load” on page 104), for
records added by load inserts or changed by load updates
• when a record is put into the base object using SIF Put or CleansePut requests (see
“Generate Match Tokens on Load” on page 104, as well as the Siperian Services
Integration Framework Guide and the Siperian Hub Javadoc)
• when you run the Generate Match Tokens job (see “Generate Match Tokens Jobs”
on page 725)
• at the start of a match job, as described in “Regenerating Match Keys If Needed”
on page 329
• after consolidating data, as described in “Consolidate Process” on page 335

Base Object Records Flagged for Tokenization

All base objects have a system column named DIRTY_IND. This dirty indicator
identifies when match keys need to be generated for the base object record. Match keys
are stored in the match key table.

The dirty indicator is one of the following values:

Value  Meaning                        Description
0      Record is up to date           Record does not need to be tokenized.
1      Record needs to be tokenized   This flag is set to 1 when a record has been:
                                      • added (load insert)
                                      • updated (load update)
                                      • consolidated
                                      • edited in the Data Manager

For each record in the base object whose DIRTY_IND is 1, the tokenization process
generates match keys, and then resets the DIRTY_IND to 0.
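
The dirty-indicator cycle described above can be sketched as a simple loop. This is a conceptual model, not the product's implementation: the dictionary standing in for the match key table and the generate_keys callback are hypothetical, and the toy key generator bears no resemblance to real match key encoding.

```python
def tokenize_dirty_records(base_object, match_keys, generate_keys):
    """Regenerate match keys for every record whose DIRTY_IND is 1.

    match_keys maps ROWID_OBJECT to a list of keys (a stand-in for the match
    key table); generate_keys is a hypothetical key generator.
    """
    tokenized = 0
    for record in base_object:
        if record["DIRTY_IND"] != 1:
            continue  # 0 means the record is up to date
        match_keys[record["ROWID_OBJECT"]] = list(generate_keys(record))
        record["DIRTY_IND"] = 0  # reset once the keys are current
        tokenized += 1
    return tokenized


def toy_keys(record):
    # Placeholder only: real match keys are encoded, fixed-length values
    return [record["LNAME"][:4].upper()]


records = [
    {"ROWID_OBJECT": "1", "LNAME": "Smith", "DIRTY_IND": 1},
    {"ROWID_OBJECT": "2", "LNAME": "Jones", "DIRTY_IND": 0},
]
keys = {}
count = tokenize_dirty_records(records, keys, toy_keys)
```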

The following figure shows how the DIRTY_IND flag changes during various batch
processes:

Match Keys Differ Based on Match / Search Strategy

The match / search strategy affects match key generation.

Match / Search Strategy Description


exact-match base objects Match keys are generated for the primary key column.
fuzzy-match base objects Match keys are generated for the fuzzy match key (such as names,
addresses, or organization names). For fuzzy-match base objects,
tokenization allows Siperian Hub to match rows with a degree of
fuzziness: the match need not be identical, just sufficiently
similar to be considered a match.

Key Types and Key Widths in Fuzzy-Match Base Objects

For fuzzy-match base objects, match keys are generated based on the following
settings:

Property Description
key type Identifies the primary type of information being tokenized (Person_Name,
Organization_Name, or Address_Part1) for this base object. The match process
uses its intelligence about name and address characteristics to generate match keys
and conduct searches. Available key types depend on the population set being
used, as described in “Population Sets” on page 326. For more information, see
“Key Types” on page 521.
key width Determines the thoroughness and speed of the search, the number of possible
match candidates returned, and how much disk space the keys consume. Available
key widths are Limited, Standard, Extended, and Preferred. For more
information, see “Key Widths” on page 522.

Because match keys must be able to overcome errors, variations, and word
transpositions in the data, Siperian Hub generates multiple match tokens for each
name, address, or organization. The number of keys generated per base object record
varies, depending on your data and the match key width.

Match Key Distribution and Hot Spots

The Match Keys Distribution tab in the Match / Merge Setup Details pane of the
Schema Manager allows you to investigate the distribution of match keys in the match
key table. This tool can help you identify potential hot spots in your data: high
concentrations of match keys that could result in overmatching, where the match
process generates too many matches, including matches that are not relevant. For more
information, see “Investigating the Distribution of Match Keys” on page 583.
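
To make the idea of a hot spot concrete, the following sketch counts how often each key value occurs in a set of match key rows and reports the keys at or above a chosen count. It is not how the Match Keys Distribution tab is implemented; the min_count threshold and the sample rows are assumptions for illustration.

```python
from collections import Counter


def find_hot_spots(match_key_rows, min_count):
    """Return the SSA_KEY values that occur at least min_count times."""
    counts = Counter(row["SSA_KEY"] for row in match_key_rows)
    return {key: n for key, n in counts.items() if n >= min_count}


rows = [
    {"ROWID_OBJECT": "1", "SSA_KEY": "PCOG$$$$"},
    {"ROWID_OBJECT": "2", "SSA_KEY": "PCOG$$$$"},
    {"ROWID_OBJECT": "3", "SSA_KEY": "PCOG$$$$"},
    {"ROWID_OBJECT": "4", "SSA_KEY": "VL/IEFLM"},
]
hot = find_hot_spots(rows, min_count=3)
```

Every record that shares a heavily used key becomes a match candidate for every other record with that key, which is why such concentrations tend to cause overmatching.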

Example Match Keys

The match keys that are generated depend on your configured match settings and
characteristics of the data in the base object. The following example shows match keys
generated from strings using a fuzzy match / search strategy:

String in Record Generated Match Key


BETH O'BRIEN MMU$?/$-
BETH O'BRIEN PCOG$$$$
BETH O'BRIEN VL/IEFLM
LIZ O'BRIEN PCOG$$$$
LIZ O'BRIEN SXOG$$$-
LIZ O'BRIEN VL/IEFLM

In this example, the strings BETH O'BRIEN and LIZ O'BRIEN share match key values
(keys #2 and #4 are both PCOG$$$$, and keys #3 and #6 are both VL/IEFLM). The match
process would therefore treat these records as match candidates.
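
The way shared keys produce match candidates can be illustrated with the example data above. The grouping logic below is a simplified sketch and not the actual search algorithm used by the match process.

```python
from collections import defaultdict
from itertools import combinations

# (string in record, generated match key) rows from the example above
example_rows = [
    ("BETH O'BRIEN", "MMU$?/$-"),
    ("BETH O'BRIEN", "PCOG$$$$"),
    ("BETH O'BRIEN", "VL/IEFLM"),
    ("LIZ O'BRIEN", "PCOG$$$$"),
    ("LIZ O'BRIEN", "SXOG$$$-"),
    ("LIZ O'BRIEN", "VL/IEFLM"),
]


def candidate_pairs(rows):
    """Pair up records that share at least one match key value."""
    by_key = defaultdict(set)
    for name, key in rows:
        by_key[key].add(name)
    pairs = set()
    for names in by_key.values():
        # Every two distinct records under the same key are candidates
        pairs.update(combinations(sorted(names), 2))
    return pairs


pairs = candidate_pairs(example_rows)
```

The two records are paired only once even though they share two key values, because the pairs are collected in a set.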

Columns in Match Key Tables

The match key table has the following system columns.

Column Name    Data Type (Size)   Description
ROWID_OBJECT   CHAR (14)          Identifies the record for which this match key was
                                  generated.
SSA_KEY        CHAR (8)           Generated match key for this record.
SSA_DATA       VARCHAR2 (500)     Concatenated, plain text string representing the
                                  source data from all of the match columns defined
                                  in the base object, not just the match key stored
                                  in the SSA_KEY column.

Tokenization Ratio

You can configure the match process to repeat the tokenization process whenever the
percentage of changed records exceeds the specified ratio, which is configured as an
advanced property in the base object. For more information, see “Complete Tokenize
Ratio” on page 102.

Population Sets

For base objects with the fuzzy match/search strategy, the match process uses standard
population sets to account for national, regional, and language differences. The
population set affects how the match process handles tokenization, the match / search
strategy, and match purposes. For more information, see “Fuzzy Population” on page
494.

A population set encapsulates intelligence about name, address, and other identification
information that is typical for a given population. For example, different countries use
different address formats, such as the placement of street numbers and street names,
location of postal codes, and so on. Similarly, different regions have different
distributions for surnames—the surname “Smith” is quite common in the United
States population, for example, but not so common for other parts of the world.

Population sets improve match accuracy by accommodating the variations and
errors that are likely to appear in data for a particular population. For more
information, see “Configuring Match Settings for Non-US Populations” on page 941.

Matching for Duplicate Data

The match for duplicate data functionality is used to generate matches for duplicates of
all non-system base object columns. These matches are generated when there are more
than a set number of occurrences of complete duplicates on the base object columns
(see “Duplicate Match Threshold” on page 103). For most data, the optimal value is 2.

Although the matches are generated, the consolidation indicator (see “Consolidation
Indicator” on page 289) remains at 4 (unconsolidated) for those records, so that they
can be later matched using the standard match rules.

Note: The Match for Duplicate Data job is visible in the Batch Viewer if the threshold
is set above 1 and there are no NON_EQUAL match rules defined on the
corresponding base object. For more information, see “Match for Duplicate Data
Jobs” on page 740.

Build Match Groups and Transitive Matches

The Build Match Group (BMG) process removes redundant matching in advance of
the consolidate process. For example, suppose a base object had the following match
pairs:
• record 1 matches to record 2
• record 2 matches to record 3
• record 3 matches to record 4

After running the match process and building match groups, and before running the
consolidate process, you might see the following records:
• record 2 matches to record 1
• record 3 matches to record 1
• record 4 matches to record 1

In this example, there was no explicit rule that matched 4 to 1. Instead, the match was
made indirectly due to the behavior of other matches (record 1 matched to 2, 2 matched
to 3, and 3 matched to 4). An indirect matching is also known as a transitive match. In
the Merge Manager and Data Manager, you can display the complete match history to
expose the details of transitive matches.
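
The effect of building match groups can be sketched with a small union-find routine that collapses chained match pairs into groups. This illustrates the concept of transitive matching only. How the BMG process actually selects the record that the others point to is not described here, so choosing the lowest record identifier is an assumption made to reproduce the example above.

```python
def build_match_groups(match_pairs):
    """Collapse chained match pairs so every record points at one group record."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in match_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: the lower identifier represents the group
            low, high = sorted((root_a, root_b))
            parent[high] = low

    return sorted((rec, find(rec)) for rec in parent if find(rec) != rec)


# record 1 matches 2, record 2 matches 3, record 3 matches 4
groups = build_match_groups([(1, 2), (2, 3), (3, 4)])
```

In the result, records 2, 3, and 4 all point to record 1, including the transitive match between 4 and 1 that no single rule produced directly.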

Maximum Matches for Manual Consolidation

You can configure the maximum number of manual matches to process during batch
jobs. Setting a limit helps prevent data stewards from being overwhelmed with
thousands of manual consolidations to process. Once this limit is reached, the match
process stops running until the number of records ready for manual consolidation
has been reduced. For more information, see “Maximum Matches for Manual
Consolidation” on page 490 and “Consolidate Process” on page 335.

External Match Jobs

Siperian Hub provides a way to match new data with an existing base object without
actually loading the data into the base object. Rather than run an entire Match job, you
can run the External Match job instead to test for matches and inspect the results. For
more information, see “External Match Jobs” on page 719.

Distributed Cleanse Match Servers

For your Siperian Hub implementation, you can increase the throughput of the match
process by running multiple Cleanse Match Servers in parallel. For more information,
see “Configuring Cleanse Match Servers” on page 407 and the material about
distributed Cleanse Match Servers in the Siperian Hub Installation Guide for your
platform.

Handling Application Server or Database Server Failures

When running very large Match jobs with large match batch sizes, if there is a failure of
the application server or the database, you must re-run the entire batch. Match batches
are a unit. There are no incremental checkpoints. To address this, if you think there
might be a database or application server failure, set your match batch sizes smaller to
reduce the amount of time that will be spent re-running your match batches. For more
information, see “Number of Rows per Match Job Batch Cycle” on page 491 and
“Match Jobs” on page 734.

Run-Time Execution Flow of the Match Process


This section describes the overall sequence of activities that occur during the execution
of the match process. The following figure provides an overview of the flow, which is
determined by the configured match/search strategy for the base object:

Cycles for Match and Auto Match and Merge Jobs

The Match job executes the match process for a single match batch (see “Flagging the
Match Batch” on page 329). The Auto Match and Merge job cycles repeatedly until
there are no more records to match (no more base object records with a
CONSOLIDATION_IND of 4).

Base Object Records Excluded from the Match Process

The following base object records are ignored during the match process:
• Records with a CONSOLIDATION_IND of 9 (on hold).
• Records with a PENDING or DELETED status. PENDING records can be
included if explicitly enabled according to the instructions in “Enabling Match on
Pending Records” on page 214.

Regenerating Match Keys If Needed

When the match process (such as a Match or Auto Match and Merge job) executes, it
first checks to determine whether match keys need to be generated for any records in
the base object and, if so, generates the match keys and updates the match key table.
Match keys will be generated if the c_repos_table.STRIP_INCOMPLETE_IND flag
for the base object is 1, or if any base object records have a DIRTY_IND=1 (see
“Base Object Records Flagged for Tokenization” on page 323). For more information,
see “Match Keys and the Tokenization Process” on page 322.

Flagging the Match Batch

The match process cycles through a series of batches until there are no more base
object records to process. It matches a subset of base object records (the match batch)
against all the records available for matching in the base object (the match pool). The size
of the match batch is determined by the Number of Rows per Match Job Batch Cycle
setting (“Number of Rows per Match Job Batch Cycle” on page 491).

For the match batch, the match process retrieves, in no specific order, base object
records that meet the following conditions:
• the record has a CONSOLIDATION_IND value of 4 (ready for match)
The load process sets the CONSOLIDATION_IND to 4 for any record that is
new (load insert) or updated (load update).
• the record qualifies based on rule set filtering, if configured (see “Enable Filtering”
on page 536 and “Filtering SQL” on page 536)

Internally, the match process sets CONSOLIDATION_IND to 3 for any records in the
match batch. At the end, the match process sets CONSOLIDATION_IND to 2
(match is complete).
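
The batch flagging sequence can be modeled roughly as follows. This sketch is a conceptual model under several assumptions: records are plain dictionaries, find_matches is a hypothetical callback, rule set filtering is omitted, and the match pool excludes only records that are on hold (CONSOLIDATION_IND of 9).

```python
def run_match_batch(base_object, batch_size, find_matches):
    """Process one match batch and return the match pairs found."""
    # Select records that are ready for match, up to the batch size
    batch = [r for r in base_object if r["CONSOLIDATION_IND"] == 4][:batch_size]
    for record in batch:
        record["CONSOLIDATION_IND"] = 3  # flagged as part of the current batch

    # Records on hold are not available for matching
    pool = [r for r in base_object if r["CONSOLIDATION_IND"] != 9]

    matches = []
    for record in batch:
        matches.extend(find_matches(record, pool))

    for record in batch:
        record["CONSOLIDATION_IND"] = 2  # match is complete
    return matches


def same_last_name(record, pool):
    # Toy comparison used only to drive the example
    return [
        (record["ROWID_OBJECT"], other["ROWID_OBJECT"])
        for other in pool
        if other is not record and other["LNAME"] == record["LNAME"]
    ]


records = [
    {"ROWID_OBJECT": "1", "LNAME": "SMITH", "CONSOLIDATION_IND": 4},
    {"ROWID_OBJECT": "2", "LNAME": "SMITH", "CONSOLIDATION_IND": 1},
    {"ROWID_OBJECT": "3", "LNAME": "SMITH", "CONSOLIDATION_IND": 9},
]
found = run_match_batch(records, batch_size=10, find_matches=same_last_name)
```

Record 3 shares the last name but is on hold, so it is neither selected for the batch nor offered as a match.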

Applying Match Rules and Generating Matches

In this step, the match process applies the configured match rules to the match
candidates. The match process executes the match rules one at a time, in the
configured order. The match process executes exact-match rules and exact
match-column rules first, then it executes fuzzy-match rules.

For a match to be declared:
• all match columns in a match rule must pass
• only one match rule needs to pass

The match process continues executing the match rules until there is a match or there
are no more rules to execute.
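
The rule evaluation described above (every column in a rule must pass, and the first rule that passes ends the search) can be sketched as follows. The rule structure, function names, and sample data are hypothetical. The fuzzy comparison uses Python's difflib as a stand-in and is not the product's fuzzy matching logic. The rules are assumed to be listed already in execution order, with exact rules first.

```python
from difflib import SequenceMatcher


def exact(a, b):
    return a == b


def roughly_similar(a, b):
    # Stand-in for fuzzy matching; the 0.8 cutoff is an arbitrary choice
    return SequenceMatcher(None, a, b).ratio() >= 0.8


def first_matching_rule(record_a, record_b, match_rules):
    """Return the name of the first rule whose columns all pass, else None."""
    for rule in match_rules:
        if all(compare(record_a[col], record_b[col])
               for col, compare in rule["columns"].items()):
            return rule["name"]  # one passing rule is enough
    return None


rules = [
    {"name": "rule 1", "columns": {"LNAME": exact, "ZIP": exact}},
    {"name": "rule 2", "columns": {"LNAME": roughly_similar, "FNAME": exact}},
]
a = {"FNAME": "BETH", "LNAME": "OBRIEN", "ZIP": "94000"}
b = {"FNAME": "BETH", "LNAME": "OBRIAN", "ZIP": "94001"}
matched_by = first_matching_rule(a, b, rules)
```

Here the exact rule fails on both LNAME and ZIP, so evaluation continues to the second rule, which passes on all of its columns.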

Populating the Match Table with Match Pairs

When all of the records in the match batch have been processed, the match process
adds all of the matches for that batch to the match table and sets
CONSOLIDATION_IND to 2 for the records in the match batch.

Match Pairs

The match process populates a match table for that base object. Each row in the match
table represents a pair of matched records in the base object. The match table stores
the ROWID_OBJECT values for each pair of matched records, as well as the identifier
for the match rule that resulted in the match, an automerge indicator, and other
information.

Columns in the Match Table

Match (_MTCH) tables have the following columns:

Column Name                 Data Type (Size)   Description
ROWID_OBJECT                CHAR (14)          Identifies one of the records in the matched
                                               pair.
ROWID_OBJECT_MATCHED        CHAR (14)          Identifies the record that matched the record
                                               specified in ROWID_OBJECT.
ORIG_ROWID_OBJECT_MATCHED   CHAR (14)          Identifies the original record that was
                                               matched to (prior to merge).
MATCH_REVERSE_IND           NUMBER (38)        Indicates the direction of the original match.
                                               One of the following values:
                                               • Zero (0): ROWID_OBJECT matched
                                               ROWID_OBJECT_MATCHED.
                                               • One (1): ROWID_OBJECT_MATCHED
                                               matched ROWID_OBJECT.
ROWID_USER                  CHAR (14)          User who executed the match process.
ROWID_MATCH_RULE            CHAR (14)          Identifies the match rule that was used to
                                               match the two records.
AUTOMERGE_IND               NUMBER (38)        Specifies whether a record qualifies for
                                               automatic consolidation during the
                                               consolidate process. One of the following
                                               values:
                                               • Zero (0): Record does not qualify for
                                               automatic consolidation.
                                               • One (1): Record does qualify for
                                               automatic consolidation.
                                               The Automerge and Autolink jobs process
                                               any records with an AUTOMERGE_IND
                                               of 1. For more information, see “Automerge
                                               Jobs” on page 717 and “Autolink Jobs” on
                                               page 715.
CREATOR                     VARCHAR2 (50)      User or process responsible for creating the
                                               record.
CREATE_DATE                 DATE               Date on which the record was created.
UPDATED_BY                  VARCHAR2 (50)      User or process responsible for the most
                                               recent update to the record.
LAST_UPDATE_DATE            DATE               Date on which the record was last updated.

Flagging Matched Records for Automatic or Manual Consolidation

Match rules also determine how matched records are consolidated: automatically or
manually.

Type of Consolidation Description


automatic consolidation Identifies records in the base object that can be consolidated
automatically, without manual intervention. For more
information, see “Automerge Jobs” on page 717.
manual consolidation Identifies records in the base object that have enough points of
similarity to warrant attention from a data steward, but not
enough points of similarity to automatically consolidate them.
The data steward uses the Merge Manager to review and
manually merge records. For more information, see the Siperian
Hub Data Steward Guide.

For more information, see “Specifying Consolidation Options for Matched Records”
on page 543.
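
As a simple illustration of how the AUTOMERGE_IND value separates the two consolidation paths, the sketch below partitions match table rows into those eligible for automatic consolidation and those left for a data steward. The function name and sample ROWID values are invented for the example.

```python
def split_by_automerge(match_rows):
    """Partition match table rows by their AUTOMERGE_IND value."""
    automatic = [row for row in match_rows if row["AUTOMERGE_IND"] == 1]
    manual = [row for row in match_rows if row["AUTOMERGE_IND"] == 0]
    return automatic, manual


match_rows = [
    {"ROWID_OBJECT": "2", "ROWID_OBJECT_MATCHED": "1", "AUTOMERGE_IND": 1},
    {"ROWID_OBJECT": "5", "ROWID_OBJECT_MATCHED": "4", "AUTOMERGE_IND": 0},
    {"ROWID_OBJECT": "7", "ROWID_OBJECT_MATCHED": "6", "AUTOMERGE_IND": 1},
]
automatic, manual = split_by_automerge(match_rows)
```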

Managing the Match Process


To manage the match process, refer to the following topics in this documentation:

Task Topic(s)
Configuration Chapter 14, “Configuring the Match Process”
• “Configuring Match Properties for a Base Object” on page 488
• “Configuring Match Paths for Related Records” on page 497
• “Configuring Match Columns” on page 515
• “Configuring Match Rule Sets” on page 531
• “Configuring Match Column Rules for Match Rule Sets” on
page 542
• “Configuring Primary Key Match Rules” on page 578
• “Investigating the Distribution of Match Keys” on page 583
• “Excluding Records from the Match Process” on page 590
Appendix A, “Configuring International Data Support”
• “Configuring Match Settings for Non-US Populations” on page
941
Execution Chapter 17, “Using Batch Jobs”
• “Auto Match and Merge Jobs” on page 716
• “External Match Jobs” on page 719
• “Generate Match Tokens Jobs” on page 725
• “Key Match Jobs” on page 727
• “Match Jobs” on page 734
• “Match Analyze Jobs” on page 738
• “Match for Duplicate Data Jobs” on page 740
• “Reset Links Jobs” on page 744
• “Reset Match Table Jobs” on page 744
Chapter 18, “Writing Custom Scripts to Execute Batch Jobs”
• “Auto Match and Merge Jobs” on page 762
• “External Match Jobs” on page 766
• “Generate Match Token Jobs” on page 767
• “Key Match Jobs” on page 773
• “Match Jobs” on page 783
• “Match Analyze Jobs” on page 785
• “Match for Duplicate Data Jobs” on page 786
Application Development Siperian Services Integration Framework Guide

Consolidate Process

This section describes concepts and tasks associated with the consolidate process in
Siperian Hub.

About the Consolidate Process


After match pairs have been identified in the match process, consolidation is the process
of consolidating data from matched records into a single, master record.

The following figure shows cell data in records from three different source systems
being consolidated into a single master record.

Consolidating Records Automatically or Manually

As described in “Flagging Matched Records for Automatic or Manual Consolidation”
on page 333, match rules set the AUTOMERGE_IND column in the match table to
specify how matched records are consolidated: automatically or manually.
• Records flagged for manual consolidation are reviewed by a data steward using the
Merge Manager tool. For more information, see the Siperian Hub Data Steward
Guide.
• Records flagged for automatic consolidation are automatically merged (see
“Automerge Jobs” on page 717). Alternatively, you can run the
automatch-and-merge job (see “Auto Match and Merge Jobs” on page 716) for a
base object. This job calls the match job and then the automerge job repeatedly
until either all records in the base object have been checked for matches or the
maximum number of records for manual consolidation is reached.
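
The repeat-until behavior of the automatch-and-merge job can be pictured with the following minimal Python sketch. It is not Siperian Hub code: the function name, the run_match and run_automerge callables, and the max_manual parameter are hypothetical stand-ins for the batch jobs and the configured limit.

```python
from typing import Callable, Tuple


def auto_match_and_merge(
    run_match: Callable[[], Tuple[int, int]],
    run_automerge: Callable[[], None],
    max_manual: int,
) -> int:
    """Alternate match and automerge cycles until a stop condition is met.

    run_match (hypothetical) runs one match cycle and returns
    (records_still_unchecked, records_queued_for_manual_consolidation).
    run_automerge (hypothetical) merges the records flagged for automerge.
    Returns the number of cycles executed.
    """
    cycles = 0
    while True:
        unchecked, manual_queue = run_match()
        run_automerge()
        cycles += 1
        # Stop when every record has been checked for matches, or when the
        # manual-consolidation queue has reached its configured maximum.
        if unchecked == 0 or manual_queue >= max_manual:
            return cycles
```

In this sketch the loop always runs at least one match cycle followed by one automerge cycle before it evaluates either stop condition.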

Consolidate Data Flow

The following figure shows the consolidate process in relation to other Siperian Hub
processes.

Traceability

The goal in Siperian Hub is to identify all duplicate records and to merge or
link them together into a single, consolidated record while maintaining full traceability.
Traceability is Siperian Hub functionality that maintains knowledge about which
systems—and which records from those systems—contributed to consolidated
records. Siperian Hub maintains traceability using cross-reference and history tables.

Key Configuration Settings for the Consolidate Process

The following configurable settings affect the consolidate process.

Option Description
base object style Determines whether the consolidate process uses merging or
linking. For more information, see “Base Object Style” on page 106
and “Consolidation Options” on page 339.
immutable sources Allows you to specify source systems as immutable, meaning that
records from that source system will be accepted as unique and, once
a record from that source has been fully consolidated, it will not be
changed subsequently. For more information, see “Immutable Rowid
Object” on page 594.
distinct systems Allows you to specify source systems as distinct, meaning that the
data from that system gets inserted into the base object without being
consolidated. For more information, see “Distinct Systems” on page
595.
cascade unmerge for child base objects Allows you to enable cascade unmerging for
child base objects and to specify what happens if records in the parent base object
are unmerged. For more information, see “Unmerge Child When Parent
Unmerges (Cascade Unmerge)” on page 597.
child base object records on parent merge For two base objects in a parent-child
relationship, if enabled on the child base object, child records are resubmitted for
the match process if parent records are consolidated. For more information, see
“Requeue On Parent Merge” on page 104.

Consolidation Options
There are two ways to consolidate matched records:
• Merging (physical consolidation) combines the matched records and updates the
base object. Merging occurs for merge-style base objects (link is not enabled).
• Linking (virtual consolidation) creates a logical link between the matched records.
Linking occurs for link-style base objects (link is enabled).

By default, base object consolidation is physically saved, so merging is the default
behavior. For more information, see “Base Object Style” on page 106.

Merging combines two or more records in a base object table. Depending on the
degree of similarity between the two records, merging is done automatically or
manually.
• Records that are definite matches are automatically merged (automerge process).
For more information, see “Automerge Jobs” on page 717.
• Records that are close but not definite matches are queued for manual review
(manual merge process) by a data steward in the Merge Manager tool. The data
steward inspects the candidate matches and selectively chooses matches that
should be merged. Manual merge match rules are configured to identify close
matches. For more information, see “Manual Merge Jobs” on page 732 and, for
the Merge Manager, see the Siperian Hub Data Steward Guide.
• Siperian Hub queues all other records for manual review by a data steward in the
Merge Manager tool.

Match rules are configured to identify definite matches for automerging and close
matches for manual merging.

To allow Siperian Hub to automatically change the state of these unmatched records to
Consolidated (thus removing them from the data steward’s queue), you can select
the Accept all other unmatched rows as unique check box. For more
information, see “Accept All Unmatched Rows as Unique” on page 492.

Best Version of the Truth

For a base object, the best version of the truth (sometimes abbreviated as BVT) is a record
that has been consolidated with the best cells of data from the source records.
The precise definition depends on the base object style:
• For merge-style base objects, the base object record is the BVT record, and is built
by consolidating with the most-trustworthy cell values from the corresponding
source records.
• For link-style base objects, the BVT Snapshot job will build the BVT record(s) by
consolidating with the most-trustworthy cell values from the corresponding linked
base object records and return to the requestor a snapshot for consumption.

Consolidation and Workflow Integration


For state-enabled base objects, consolidation behavior is affected by the current system
state of records in the base object. For example, only ACTIVE records can be
automatically consolidated—records with a PENDING or DELETED system state
cannot be. To understand the implications of system states during consolidation, refer
to the following topics:
• Chapter 7, “State Management,” especially “State Transition Rules for State
Management” on page 208 and “Hub States and Base Object Record Value
Survivorship” on page 211
• “Consolidating Data” in the Siperian Hub Data Steward Guide

Managing the Consolidate Process


To manage the consolidate process, refer to the following topics in this documentation:

Task Topic(s)
Configuration Chapter 15, “Configuring the Consolidate Process”
• “About Consolidation Settings” on page 594
• “Changing Consolidation Settings” on page 598
Execution Siperian Hub Data Steward Guide
• “Managing Data”
• “Consolidating Data”
Chapter 17, “Using Batch Jobs”
• “Accept Non-Matched Records As Unique” on page 715
• “Auto Match and Merge Jobs” on page 716
• “Autolink Jobs” on page 715
• “Automerge Jobs” on page 717
• “BVT Snapshot Jobs” on page 719
• “Manual Link Jobs” on page 732
• “Manual Merge Jobs” on page 732
• “Manual Unlink Jobs” on page 733
• “Manual Unmerge Jobs” on page 733
• “Multi Merge Jobs” on page 741
• “Reset Links Jobs” on page 744
• “Reset Match Table Jobs” on page 744
• “Synchronize Jobs” on page 747
Chapter 18, “Writing Custom Scripts to Execute Batch Jobs”
• “Auto Match and Merge Jobs” on page 762
• “Autolink Jobs” on page 762
• “Automerge Jobs” on page 764
• “BVT Snapshot Jobs” on page 765
• “Manual Link Jobs” on page 777
• “Manual Unlink Jobs” on page 779
• “Manual Unmerge Jobs” on page 779
Application Development Siperian Services Integration Framework Guide

Publish Process

This section describes concepts and tasks associated with the publish process in
Siperian Hub.

About the Publish Process


This section describes how Siperian Hub integrates with external systems by generating
XML messages about data changes in the Hub Store and publishing these messages to
an outbound Java Message Service (JMS) queue, also known as a message queue in the
Hub Console.

Other external systems, processes, or applications can listen on the JMS message
queue, retrieve the XML messages, and process them accordingly.

Siperian Hub supports two JMS models:
• point-to-point: a specific destination for a target external system
• publish/subscribe: point-to-point to an Enterprise Service Bus (ESB), then
publish/subscribe from the ESB to other systems

Using the Publish Process is Optional

Siperian Hub implementations use the publish process in support of stated business
and technical requirements. However, not all organizations will take advantage of this
functionality, and its use in Siperian Hub implementations is optional.

Publish Process is Part of the Siperian Hub Distribution Flow

The processes previously described in this chapter—land, stage, load, match, and
consolidate—are all associated with reconciliation, which is the main inbound flow for
Siperian Hub. With reconciliation, Siperian Hub receives data from one or more source
systems, cleanses the data if applicable, and then reconciles “multiple versions of the
truth” to arrive at the master record—the best version of the truth—for that entity.

In contrast, the publish process belongs to the main Siperian Hub outbound
flow—distribution. Once the master record is established or updated for a given entity,
Siperian Hub can then (optionally) distribute the master record data to other
applications or databases. For an introduction to reconciliation and distribution, see the
Siperian Hub Overview. In another scenario, data changes can be sent to the Activity
Manager Rules queue so that the data change can be evaluated against user-defined
rules.

Publish Process Executes By Message Triggers

The land, stage, load, match, and consolidate processes work with batches of records
and are executed as batch jobs or stored procedures. In contrast, the publish process is
executed as the result of a message trigger that executes when a data change occurs in the
Hub Store. The message trigger creates an XML message that gets published on a JMS
message queue.

Outbound JMS Message Queues

Siperian Hub uses an outbound message queue as a communication channel to feed
data changes back to external systems. Siperian supports embedded message queues,
which use the JMS providers that come with application servers. To connect to an
embedded message queue, Siperian Hub uses the JNDI name of the ConnectionFactory
and the name of the JMS queue. Both JNDI names must already have been set up in the
application server. The Hub Console allows you to register message queue servers and
message queues that have already been configured in the application server
environment.

ORS-specific XML Message Schemas

XML messages are created using an ORS-specific schema file
(<ors-name>-siperian-mrm-event.xsd) that is based on a common XML schema
(siperian-mrm-events.xsd). You use the JMS Event Schema Manager to generate
this ORS-specific schema. This is a required task for setting up the publish process.
For more information, see “Generating and Deploying ORS-specific Schemas” on
page 827.

Run-time Flow of the Publish Process


The following figure shows the run-time flow of the publish process.

In this scenario:
1. A batch load or a real-time SIF API request (SIF put or cleanse_put request) may
result in an insert or update on a base object.
You can configure a message rule to control data going to the
C_REPOS_MQ_DATA_CHANGE table.
2. The Hub Server polls data from the C_REPOS_MQ_DATA_CHANGE table at regular
intervals.
3. For data that has not been sent, the Hub Server constructs an XML message based on
the data and sends it to the outbound queue configured for the message queue.
4. It is the external application's responsibility to retrieve the message from the
outbound queue and process it.
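
Steps 2 through 4 can be sketched in Python as a single polling pass. This is an illustration only, under stated assumptions: the in-memory list stands in for rows read from the C_REPOS_MQ_DATA_CHANGE table, queue.Queue stands in for the outbound JMS queue, and the dictionary keys, the C_PARTY base object name, and the XML element names are hypothetical (they are not the ORS-specific schema).

```python
import queue
import xml.etree.ElementTree as ET


def publish_pending(change_rows, outbound):
    """One polling pass: build an XML message for each unsent row and enqueue it.

    change_rows stands in for rows read from C_REPOS_MQ_DATA_CHANGE. The keys
    "sent", "base_object", "rowid_object", and "event" are hypothetical.
    """
    published = 0
    for row in change_rows:
        if row["sent"]:
            continue  # step 3 only handles data that has not been sent
        msg = ET.Element("dataChange", event=row["event"])
        ET.SubElement(msg, "baseObject").text = row["base_object"]
        ET.SubElement(msg, "rowidObject").text = row["rowid_object"]
        outbound.put(ET.tostring(msg, encoding="unicode"))
        row["sent"] = True
        published += 1
    return published


# Usage: only the unsent change is published. The external application
# (step 4) then retrieves the message from the outbound queue.
rows = [
    {"sent": True, "base_object": "C_PARTY", "rowid_object": "1", "event": "insert"},
    {"sent": False, "base_object": "C_PARTY", "rowid_object": "2", "event": "update"},
]
outbound_queue = queue.Queue()
count = publish_pending(rows, outbound_queue)
message = outbound_queue.get_nowait()
```

Marking each row as sent after it is enqueued keeps a later polling pass from publishing the same change twice, which is the behavior step 3 implies.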

Managing the Publish Process


To manage the publish process, refer to the following topics in this documentation:

Task Topic(s)
Configuration Chapter 16, “Configuring the Publish Process”
• “Configuring Global Message Queue Settings” on page 604
• “Configuring Message Queue Servers” on page 605
• “Configuring Outbound Message Queues” on page 608
• “Configuring Message Triggers” on page 612
• “Generating and Deploying ORS-specific Schemas” on page 827
Execution Siperian Hub publishes an XML message to an outbound message
queue whenever a message trigger is fired. You do not need to
explicitly execute a batch job from the Batch Viewer or Batch Group
tool.
To monitor run-time activity for message queues using the Audit
Manager tool in the Hub Console, see “Auditing Message Queues”
on page 928.
Application Development Siperian Services Integration Framework Guide

Chapter 10: Configuring the Land Process

This chapter explains how to configure the land process for your Siperian Hub
implementation. For an introduction, see “Land Process” on page 292.

Chapter Contents
• Before You Begin
• Configuration Tasks for the Land Process
• Configuring Source Systems
• Configuring Landing Tables

Before You Begin


Before you begin to configure the land process, you must have completed the following
tasks:
• Installed Siperian Hub and created the Hub Store according to the instructions in
the Siperian Hub Installation Guide
• Built the schema, including defining base objects, according to the instructions in
Chapter 5, “Building the Schema”
• Learned about the land process described in “Land Process” on page 292

Configuration Tasks for the Land Process


To set up the land process for your Siperian Hub implementation, you must complete
the following tasks in the Hub Console:
• “Configuring Source Systems” on page 348
• “Configuring Landing Tables” on page 355

Configuring Source Systems


This section describes how to define source systems for your Siperian Hub
implementation. For an introduction, see “Land Process” on page 292.

About Source Systems


Source systems are external applications or systems that provide data to Siperian Hub.
In order to manage input from various source systems, Siperian Hub requires a unique
internal name for each source system. You use the Systems and Trust tool in the Model
workbench to define source systems for your Siperian Hub implementation.

Configuring Trust for Source Systems

If multiple source systems contribute data for the same column in a base object, you
can configure trust on a column-by-column basis to specify which source system(s) are
more reliable providers of data (relative to other source systems) for that column. Trust
is used to determine survivorship when two records are consolidated, and whether
updates from a source system are sufficiently reliable to update the “best version of the
truth” record. For more information, see “Configuring Trust for Source Systems” on
page 455.
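
The idea of column-by-column survivorship can be illustrated with a small Python sketch. It assumes a single static trust score per source system and column, which is a simplification made for illustration; the actual trust calculation is configured in the Systems and Trust tool and may take other factors into account. The system names, column names, and scores are hypothetical.

```python
def surviving_values(candidates, trust):
    """Pick, per column, the value from the most trusted source system.

    candidates: {column: {source_system: value}}
    trust: {(source_system, column): score}; higher means more trusted.
    Both structures and the scores are illustrative assumptions.
    """
    result = {}
    for column, by_system in candidates.items():
        best = max(by_system, key=lambda system: trust.get((system, column), 0))
        result[column] = by_system[best]
    return result


candidates = {
    "PHONE": {"CRM": "555-0100", "Billing": "555-0199"},
    "ADDRESS": {"CRM": "1 Main St", "Billing": "1 Main Street, Suite 4"},
}
trust = {
    ("CRM", "PHONE"): 90, ("Billing", "PHONE"): 40,
    ("CRM", "ADDRESS"): 50, ("Billing", "ADDRESS"): 80,
}
best_version = surviving_values(candidates, trust)
```

Here the surviving record takes PHONE from the CRM system and ADDRESS from the Billing system, because each system is the more trusted provider for that particular column.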

Administration Source System

Siperian Hub uses an administration source system for manual trust overrides and data
edits from the Data Manager or Merge Manager tools, which are described in the
Siperian Hub Data Steward Guide. This administration source system can contribute data
to any trust-enabled column. The administration source system is named Admin by
default, but you can optionally change its name according to the instructions in
“Editing Source System Properties” on page 353.

Siperian System Repository Table

The source systems that you define in the Systems and Trust tool are stored in a special
public Siperian Hub repository table (C_REPOS_SYSTEM, with a display name of
MRM System). This table is visible in the Schema Manager if the Show System Tables
option is selected (for more information, see “Changing the Item View” on page 39).
C_REPOS_SYSTEM can also be used in packages, as described in “Configuring
Packages” on page 196.

Warning: The C_REPOS_SYSTEM table contains Siperian Hub metadata.
As with any Siperian Hub system table, you should never alter the structure of, or
data in, the C_REPOS_SYSTEM table. Doing so causes Siperian Hub to behave
unpredictably and can result in data loss.

Starting the Systems and Trust Tool


To start the Systems and Trust tool:
• In the Hub Console, expand the Model workbench, and then click Systems and
Trust.

The Hub Console displays the Systems and Trust tool, as shown in the following
example.

The Systems and Trust tool displays the following panes:

Pane Description
Navigation Systems: List of every source system that contributes data to Siperian Hub,
including the administration source system described in “Administration
Source System” on page 349.
Trust: Expand the tree to display:
• base objects containing one or more trust-enabled columns
• trust-enabled columns (only)
For more information about configuring trust for base object columns, see
“Configuring Trust for Source Systems” on page 455.
Properties Properties for the selected source system. Trust settings for the base object
column if the base object column is selected.

Source System Properties


A source system definition in Siperian Hub has the following properties.

Property Description
Name Unique, descriptive name for this source system.
Primary Key Primary key for this source system. Unique identifier for this system in the
ROWID_SYSTEM column of C_REPOS_SYSTEM. Read only.
Description Optional description for this source system.

Adding Source Systems


Using the Systems and Trust tool, you need to define each source system that will
contribute data to your Siperian Hub implementation.

To add a source system definition:


1. Start the Systems and Trust tool according to the instructions in “Starting the
Systems and Trust Tool” on page 350.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Right-click in the list of source systems and choose Add System.
The Systems and Trust tool displays the New System dialog.

4. Specify the source system properties. For more information, see “Source System
Properties” on page 351.
5. Click OK.
The Systems and Trust tool displays the newly-added source system in the list of
source systems.
Note: When you add a source system, Hub Store uses the first 14 characters of the
system name (in all uppercase letters) as its primary key (ROWID_SYSTEM value
in C_REPOS_SYSTEM).
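
The rule in the note above can be expressed as a one-line Python function. This is a sketch of the stated rule only; the function name is hypothetical, and any padding or other normalization that the Hub Store might apply is not modeled.

```python
def rowid_system(system_name: str) -> str:
    """Return the first 14 characters of the system name, in uppercase."""
    return system_name[:14].upper()


example = rowid_system("Customer Relationship Mgmt")
```

Under this rule, two system names that share the same first 14 characters (ignoring case) would produce the same value in this sketch, so it can be helpful to choose names that differ early.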

Editing Source System Properties


You can rename any source system, including the administration system (see
“Administration Source System” on page 349). You can change the display name used
in the Hub Console to identify this source system—renaming it has no effect outside
of the Hub Console.

Note: If this source system has already contributed data to your Siperian Hub
implementation, Siperian Hub continues to track the lineage (history) of data from this
source system even after you have renamed it.

To edit source system properties:


1. Start the Systems and Trust tool according to the instructions in “Starting the
Systems and Trust Tool” on page 350.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the list of source systems, select the source system that you want to configure.
The screen refreshes, showing the Edit button next to the name and
description fields for the selected source system.

4. Change any of the editable properties. For more information, see “Source System
Properties” on page 351.
5. To change trust settings for a source system, see “Configuring Trust for Source
Systems” on page 455.
6. Click the Save button to save your changes.

Removing Source Systems


You can remove any source system except:
• the administration system (see “Administration Source System” on page 349)
• any source system that has contributed data to a staging table after the stage
process has been run
You can remove a source system only before the stage process has copied data from
an associated landing table to a staging table.
• any source system that is configured as a source for a base object (meaning that a
staging table associated with a base object points to the source system)

Note: Removing a source system deletes only the source system definition in the Hub
Console—it has no effect outside of Siperian Hub.
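
The three exceptions listed above amount to a simple check, sketched here in Python. The SourceSystem class, its field names, and the can_remove function are hypothetical names used only to restate the rules; they do not correspond to a Siperian Hub API.

```python
from dataclasses import dataclass


@dataclass
class SourceSystem:
    name: str
    is_admin: bool = False               # the administration source system
    has_staged_data: bool = False        # the stage process already copied its data
    used_by_staging_table: bool = False  # a staging table points to this system


def can_remove(system: SourceSystem) -> bool:
    """Return True only if none of the three exceptions applies."""
    return not (
        system.is_admin
        or system.has_staged_data
        or system.used_by_staging_table
    )
```

For example, can_remove(SourceSystem("Legacy")) is True, while a system with has_staged_data set to True cannot be removed.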

To remove a source system:


1. Start the Systems and Trust tool according to the instructions in “Starting the
Systems and Trust Tool” on page 350.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the list of source systems, right-click the source system that you want to remove,
and choose Remove System.
The Systems and Trust tool prompts you to confirm deletion.
4. Click Yes.
The Systems and Trust tool removes the source system from the list, along with
any metadata associated with this source system.

Configuring Landing Tables


This section describes how to configure landing tables in your Siperian Hub
implementation. For an introduction, see “Land Process” on page 292.

About Landing Tables


A landing table provides intermediate storage in the flow of data from source systems
into Siperian Hub. In effect, landing tables are “where data lands” from source systems
into the Hub Store. You use the Schema Manager in the Model workbench to define
landing tables.

The manner in which source systems populate landing tables with data is entirely
external to Siperian Hub. The data model you use for collecting data in landing tables
from various source systems is also external to Siperian Hub. One source system could
populate multiple landing tables. A single landing table could receive data from
different source systems. The data model you use is entirely up to your particular
implementation requirements.

Inside Siperian Hub, however, landing tables are mapped to staging tables, as described
in “Mapping Columns Between Landing and Staging Tables” on page 380. The source
system supplying the data to the base object is identified in the staging table that is
mapped to the landing table. During the stage process, Siperian Hub copies data from
a landing table to a target staging table, tags the data with the source system
identification, and optionally cleanses data in the process. A landing table can be
mapped to one or more staging tables. A staging table is mapped to only one landing
table.

As described in “Ways to Populate Landing Tables” on page 294, landing tables are
populated using batch or real-time approaches that are external to Siperian Hub.
After a landing table is populated, the stage process pulls data from the landing tables,
further cleanses the data if appropriate, and then populates the appropriate staging
tables. For more information, see “Stage Process” on page 295.

Landing Table Columns


Landing tables have two types of columns:

Column Type Description


system columns Columns that are automatically created and maintained by the
Schema Manager.
user-defined columns Columns that have been added by users according to the instructions
in “Configuring Columns in Tables” on page 125.

Landing tables have only one system column.

Physical Name Data Type Description


LAST_UPDATE_DATE DATE Date on which the record was last updated in the
source system (for base objects, this will populate
LAST_UPDATE_DATE and SRC_LUD in the
cross-reference table, and may also populate
LAST_UPDATE_DATE on the base object,
depending on trust).

All other columns in the landing table are user-defined columns.

Note: If the source system table has a multiple-column key, concatenate these columns
to produce a single unique VARCHAR value for the primary key column.
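
A minimal Python sketch of this concatenation follows. The delimiter is an assumption added for illustration (the note above does not prescribe one); it keeps different column combinations, such as ("AB", "C") and ("A", "BC"), from producing the same key. The function name is hypothetical.

```python
def composite_key(*parts, delimiter="|"):
    """Join key column values into one string for a VARCHAR key column."""
    text = [str(p) for p in parts]
    if any(delimiter in t for t in text):
        # A delimiter inside a value could make two different keys collide.
        raise ValueError("key part contains the delimiter; choose another one")
    return delimiter.join(text)


key = composite_key("US", 1042)
```

Because landing tables are populated outside Siperian Hub, a function like this would run in whatever process loads the landing table.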

Landing Table Properties


Landing tables have the following properties.

Property Description
Item Type Type of table that you are adding. Select Landing Table.
Display Name Name of this landing table as it will be displayed in the Hub Console.
Physical Name Actual name of the landing table in the database. Siperian Hub will
suggest a physical name for the landing table based on the display name
that you enter.
Data Tablespace Name of the data tablespace for this landing table. For more
information, see the Siperian Hub Installation Guide for your platform.
Index Tablespace Name of the index tablespace for this landing table. For more
information, see the Siperian Hub Installation Guide for your platform.
Description Description of this landing table.
Create Date Date and time when this landing table was created.
Contains Full Data Set Specifies whether this landing table contains the full data set
from the source system, or only updates.
• If selected (default), indicates that this landing table contains the full
set of data from the source system (such as for the initial data load).
When this check box is selected, you can configure Siperian Hub’s
delta detection feature (see “Configuring Delta Detection for a
Staging Table” on page 401) so that, during the stage process, only
changed records are copied to the staging table.
• If not selected, indicates that this landing table contains only
changed data from the source system (such as for incremental
loads). In this case, Siperian Hub assumes that you filtered out
unchanged records before populating the landing table. Therefore,
the stage process inserts all records from the landing table directly
into the staging table. When this check box is not selected, Siperian
Hub’s delta detection feature is not available.
Note: You can change this property only when editing the landing
table properties, as described in “Editing Landing Table Properties”
on page 360.
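
The effect of the Contains Full Data Set property on the stage process can be sketched as follows. This Python example is an illustration under assumptions, not the product's delta detection algorithm: it detects changes by comparing each landing row with the row staged previously, and the function name, parameter names, and the "pkey" field are hypothetical.

```python
def records_to_stage(landing_rows, previous_rows, full_data_set, delta_detection):
    """Return the landing rows that a stage run would copy.

    Rows are dictionaries keyed by a hypothetical "pkey" field. Comparing
    against the previous run's rows is an assumption made for illustration.
    """
    if not (full_data_set and delta_detection):
        # The landing table holds only changes, or delta detection is not
        # configured: every landing row is copied.
        return list(landing_rows)
    previous = {row["pkey"]: row for row in previous_rows}
    # Keep rows that are new or that differ from the previously staged row.
    return [row for row in landing_rows if previous.get(row["pkey"]) != row]
```

With a full data set and delta detection configured, unchanged rows are skipped; in every other combination the sketch copies all landing rows, matching the two cases described for this property.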

Adding Landing Tables


To add a landing table:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.

2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the Landing Tables node.

4. Right-click the Landing Tables node and choose Add Item.

The Schema Manager displays the Add Table dialog box.

5. Specify the properties (described in “Landing Table Properties” on page 357) for
this new landing table.
6. Click OK.
The Schema Manager creates the new landing table in the Operational Record
Store (ORS), along with support tables, and then adds the new landing table to the
schema tree.

7. Configure the columns for your landing table according to the instructions in
“Configuring Columns in Tables” on page 125.
8. If you want to configure this landing table to contain only changed data from the
source system (Contains Full Data Set), edit the landing table properties according
to the instructions in “Editing Landing Table Properties” on page 360.

Editing Landing Table Properties


To edit properties in a landing table:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the landing table that you want to edit.
The Schema Manager displays the Landing Table Identity pane for the selected
table.

4. Change the landing table properties as needed. For more information, see “Landing
Table Properties” on page 357.
5. Click the Save button to save your changes.
6. If necessary, change the column configuration for your landing table according to
the instructions in “Configuring Columns in Tables” on page 125.

Removing Landing Tables


To remove a landing table:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the schema tree, expand the Landing Tables node.
4. Right-click the landing table that you want to remove, and choose Remove.
The Schema Manager prompts you to confirm deletion.
5. Choose Yes.
The Schema Manager drops the landing table from the database, deletes any
mappings between this landing table and any staging table (but does not delete the
staging table), and removes the deleted landing table from the schema tree.

Chapter 11: Configuring the Stage Process

This chapter explains how to configure the data staging process for your Siperian Hub
implementation. For an introduction, see “Stage Process” on page 295. In addition, to
learn about cleansing data during the data staging process, see Chapter 12,
“Configuring Data Cleansing.”

Chapter Contents
• Before You Begin
• Configuration Tasks for the Stage Process
• Configuring Staging Tables
• Mapping Columns Between Landing and Staging Tables
• Using Audit Trail and Delta Detection

Before You Begin


Before you begin to configure the stage process, you must have completed the following
tasks:
• Installed Siperian Hub and created the Hub Store according to the instructions in
the Siperian Hub Installation Guide
• Built the schema according to the instructions in Chapter 5, “Building the Schema”
• Learned about the stage process described in “Stage Process” on page 295

Configuration Tasks for the Stage Process


In addition to the prerequisites described in “Before You Begin” on page 364, to set up
the process of staging data in your Siperian Hub implementation, you must complete
the following tasks in the Hub Console:
• “Configuring Staging Tables” on page 364
• “Mapping Columns Between Landing and Staging Tables” on page 380
• “Configuring Data Cleansing” on page 405, if you plan to use Siperian Hub
internal cleansing to normalize your data.

Configuring Staging Tables


This section describes how to configure staging tables in your Siperian Hub
implementation.

About Staging Tables


A staging table provides temporary, intermediate storage in the flow of data from landing
tables into base objects and dependent objects via load jobs (see “Load Jobs” on page
727). Staging tables:
• contain data from one source system for one table in the Hub Store
• are populated from landing tables by stage jobs (see “Stage Jobs” on page 745)
• can be created for base objects and dependent objects

The structure of a staging table is directly based on the structure of the target object
that will contain the consolidated data. You use the Schema Manager in the Model
workbench to configure staging tables.

Note: You must have at least one source system defined before you can define a
staging table. For more information, see “Configuring Source Systems” on page 348.

Staging Table Columns


Staging tables have two types of columns:

Column Type Description


system columns Columns that are automatically created and maintained by the
Schema Manager.
user-defined columns Columns that have been added by users. To add columns to a staging
table, you select from a list of columns that are already defined in the
base object or dependent object associated with the staging table.
For more information, see “Adding Staging Tables” on page 371 and
“Configuring Columns in Tables” on page 125.

Staging tables have the following system columns.

Physical Name     Data Type (Size)  Description
PKEY_SRC_OBJECT   VARCHAR (255)     Primary key from the source system. This must
                                    be unique. If the source record does not have
                                    a single unique column, then concatenate the
                                    values from multiple columns to uniquely
                                    identify the record. Display name is Pkey Src
                                    Object (or, in some places, Primary Key from
                                    Source System).
ROWID_OBJECT      CHAR (14)         Primary key. Unique value assigned by Siperian
                                    during the stage process.
DELETED_IND       INT               Reserved for future use.
DELETED_DATE      DATE              Reserved for future use.
DELETED_BY        VARCHAR (50)      Reserved for future use.
LAST_UPDATE_DATE  DATE              Date on which the record was last updated in
                                    the source system. For base objects, this will
                                    populate LAST_UPDATE_DATE and SRC_LUD in the
                                    cross-reference table, and (depending on trust
                                    settings) may also populate LAST_UPDATE_DATE
                                    on the base object.
UPDATED_BY        VARCHAR (50)      User or process responsible for the most
                                    recent update.
CREATE_DATE       DATE              Date on which the record was created.
CREATOR           VARCHAR (50)      User or process responsible for creating the
                                    record.
SRC_ROWID         VARCHAR (30)      Database internal Rowid column that is used to
                                    uniquely trace back records to the Landing
                                    table from Staging.
HUB_STATE_IND     INT               For state-enabled base objects only. Integer
                                    value indicating the state of this record.
                                    Valid values are:
                                    • 0=Pending
                                    • 1=Active (Default)
                                    • -1=Deleted
                                    For details, see “About the Hub State
                                    Indicator” on page 207.
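
The guidance for PKEY_SRC_OBJECT (concatenate several source columns when no single column is unique) can be illustrated with a minimal Python sketch. This is not Siperian Hub code: the function name, the "|" separator, and the sample column names are assumptions chosen for illustration, and in a real implementation the concatenation would be done upstream of the staging table.

```python
def build_pkey_src_object(record, key_columns, separator="|"):
    """Concatenate several source columns into one candidate key value.

    The separator and column names are hypothetical; only the 255-character
    limit comes from the PKEY_SRC_OBJECT definition, VARCHAR (255).
    """
    parts = []
    for column in key_columns:
        value = record[column]
        if value is None:
            raise ValueError(f"key column {column!r} is null")
        parts.append(str(value).strip())
    key = separator.join(parts)
    if len(key) > 255:
        raise ValueError("concatenated key exceeds 255 characters")
    return key


# Neither column is unique on its own, but the pair is.
rows = [
    {"region": "EU", "account_no": 1001, "name": "Acme"},
    {"region": "US", "account_no": 1001, "name": "Acme Inc"},
]
keys = [build_pkey_src_object(row, ["region", "account_no"]) for row in rows]
```

Using a separator that cannot occur in the data avoids collisions such as "AB" + "C" and "A" + "BC" producing the same key.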

Staging tables must be based on the columns provided by the source system for the
target base object or dependent object for which the staging table is defined, even if the
landing tables are shared across multiple source systems. If you do not make the
columns in staging tables source-specific, then you create unnecessary trust and
validation requirements.

Trust is a powerful mechanism, but it carries performance overhead. Use trust where it
is appropriate and necessary, but not where the most recent cell value will suffice for
the surviving record.

If you limit the columns in the staging tables to the columns actually provided by the
source systems, then you can restrict the trust columns to those that come from two or
more staging tables. Use this approach instead of treating every column as if it comes
from every source, which would mean needing to add trust for every column, and then
validation rules to downgrade the trust on null values for all of the sources that do not
provide values for the columns.
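
As a rough illustration of this guideline, the following Python sketch lists the columns that are supplied by two or more staging tables, which are the only candidates for trust under the approach described above. The staging table names and column lists are invented for the example; nothing here calls Siperian Hub.

```python
from collections import Counter


def columns_needing_trust(staging_columns):
    """Return the columns that are provided by two or more staging tables."""
    counts = Counter()
    for columns in staging_columns.values():
        counts.update(set(columns))  # count each table at most once per column
    return sorted(column for column, n in counts.items() if n >= 2)


# Hypothetical staging tables for one base object, each limited to the
# columns that its source system actually provides.
staging_columns = {
    "C_STG_CRM_CUSTOMER": ["FIRST_NAME", "LAST_NAME", "EMAIL"],
    "C_STG_BILLING_CUSTOMER": ["LAST_NAME", "TAX_ID"],
    "C_STG_WEB_CUSTOMER": ["EMAIL", "LAST_NAME"],
}
trust_candidates = columns_needing_trust(staging_columns)
```

Columns that appear in only one staging table (FIRST_NAME and TAX_ID in this example) have a single contributor, so they do not need trust or the associated validation rules.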

Each additional trust column and validation rule adds work to the load and merge
processes. In addition, the more trusted columns there are, the longer the update
statements for the control table become. Bear in mind that Oracle and DB2 have a 32K
limit on the size of the SQL buffer for SQL statements. For this reason, more than 40
trust columns result in a horizontal split in the update of the control table: MRM will
try to update only 40 columns at a time.
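
The effect of the 40-column split can be pictured with a short Python sketch. It only shows how a list of trust columns divides into groups of at most 40; the column names are placeholders, and how MRM actually builds its update statements is not described in this guide.

```python
def split_into_batches(columns, batch_size=40):
    """Split a column list into consecutive groups of at most batch_size."""
    return [columns[i:i + batch_size] for i in range(0, len(columns), batch_size)]


# 95 hypothetical trust columns would need three separate updates.
trust_columns = [f"COL_{n:03d}" for n in range(1, 96)]
batches = split_into_batches(trust_columns)
batch_sizes = [len(batch) for batch in batches]
```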

Staging Table Properties


Staging tables have the following properties.

Property                         Description
Staging Identity
Display Name                     Name of this staging table as it will be displayed
                                 in the Hub Console.
Physical Name                    Actual name of the staging table in the database.
                                 Siperian Hub will suggest a physical name for the
                                 staging table based on the display name that you
                                 enter.
System                           Select the source system for this data. For more
                                 information, see “Configuring Source Systems” on
                                 page 348.
Preserve Source System Keys      Copy key values from the source system rather than
                                 using Siperian Hub’s internally-generated key
                                 values. Applies to staging tables associated with
                                 base objects only (not with dependent objects). To
                                 learn more, see “Preserving Source System Keys” on
                                 page 368.
Highest Reserved Key             Specify the amount by which the key is increased
                                 after the first load. Visible only if the Preserve
                                 Source System Key checkbox is selected. To learn
                                 more, see “Specifying the Highest Reserved Key” on
                                 page 369.
Data Tablespace                  Name of the data tablespace for this staging table.
                                 For more information, see the Siperian Hub
                                 Installation Guide for your platform.
Index Tablespace                 Name of the index tablespace for this staging
                                 table. For more information, see the Siperian Hub
                                 Installation Guide for your platform.
Description                      Description of this staging table.
Cell Update                      Determines whether Siperian Hub updates the cell in
                                 the target table if the value in the incoming
                                 record from the staging table is the same. For more
                                 information, see “Enabling Cell Update” on page
                                 369.
Columns                          Columns in this staging table. For more
                                 information, see “Configuring Columns in Tables” on
                                 page 125.
Audit Trail and Delta Detection  Configurable after mappings between landing and
                                 staging tables have been defined. For more
                                 information, see “Mapping Columns Between Landing
                                 and Staging Tables” on page 380.
Audit Trail                      If enabled, retains the history of the data in the
                                 RAW table based on the number of loads and
                                 timestamps. For more information, see “Configuring
                                 the Audit Trail for a Staging Table” on page 399.
Delta Detection                  If enabled, Siperian Hub processes only new or
                                 changed records and ignores unchanged records. For
                                 more information, see “Configuring Delta Detection
                                 for a Staging Table” on page 401.
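
The idea behind the Delta Detection property (process only new or changed records) can be sketched in a few lines of Python. This is a conceptual model only: it compares whole records against a snapshot of the previous load, which is an assumption made for the example and not a description of how Siperian Hub detects changes.

```python
def detect_delta(previous, incoming):
    """Return the records that are new, or whose values differ from the previous load.

    Both arguments map a source key to a record (a dict of column values).
    """
    return {key: row for key, row in incoming.items() if previous.get(key) != row}


previous_load = {
    "A1": {"name": "Ann Lee", "city": "Austin"},
    "A2": {"name": "Bo Chan", "city": "Boston"},
}
current_load = {
    "A1": {"name": "Ann Lee", "city": "Austin"},   # unchanged, ignored
    "A2": {"name": "Bo Chan", "city": "Chicago"},  # changed, processed
    "A3": {"name": "Cy Diaz", "city": "Denver"},   # new, processed
}
delta = detect_delta(previous_load, current_load)
```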

Preserving Source System Keys

By default, this option is not enabled. During Siperian Hub stage jobs (see “Stage Jobs”
on page 745), for each inbound record of data, Siperian Hub generates an internal key
that it inserts in the ROWID_OBJECT column of the target base object.

Enable this option when you want to use the value from the primary key column from
the source system instead of Siperian Hub’s internally-generated key. To enable this
option, when adding a staging table to a base object (see “Adding Staging Tables” on
page 371), check (select) the Preserve Source System Keys check box in the Add
staging to Base Object dialog. Once enabled, during stage jobs, instead of generating an
internal key, Siperian Hub takes the value in the PKEY_SRC_OBJECT column
from the staging table and inserts it into the ROWID_OBJECT column in the target
base object.

Note: Once a base object is created, you cannot change this setting.

Specifying the Highest Reserved Key

If the Preserve Source System Keys check box is enabled, then the Schema Manager
displays the Highest Reserved Key field. If you want to insert a gap between the source
key and Siperian Hub’s key, then enter the amount by which the key is increased after
the first load.

Note: Set the Highest Reserved Key to the upper boundary of the source system keys.
To allow a margin, set this number slightly higher, adding a buffer to the expected
range of source system keys. Any records added to the base object that do not contain
this key will be given a key by Siperian Hub that is above the highest reserved value you
set.

Enabling this option has the following consequences when the base object is first
loaded:
1. From the staging table, Siperian Hub takes the value in PKEY_SRC_OBJECT
and inserts it into the base object’s ROWID_OBJECT instead of generating
Siperian Hub’s internal key.
2. Siperian Hub then resets the key's starting position to MAX(PKEY_SRC_OBJECT)
+ the GAP value.
3. On the next load for this staging table, Siperian Hub continues to use the
PKEY_SRC_OBJECT. For loads from other staging tables, it uses the Siperian
Hub-generated key.

Note: Only one staging table per base object can have this option enabled (even if it is
from the same system). The reserved key range is set at the initial load only.
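
The sequence above can be modeled with a small Python sketch. It is a toy model under stated assumptions: keys are plain integers (the real ROWID_OBJECT is a CHAR (14) value), the class and method names are invented, and the first generated key is taken to be exactly the maximum preserved key plus the gap.

```python
class KeyAllocator:
    """Toy model of key assignment for one base object."""

    def __init__(self, gap):
        self.gap = gap           # value entered for Highest Reserved Key
        self.next_generated = 1  # next Hub-generated key

    def initial_load(self, source_keys):
        """First load from the staging table that preserves source keys."""
        preserved = list(source_keys)  # source keys are used as they are
        # Reset the starting position to MAX(source key) + gap.
        self.next_generated = max(preserved) + self.gap
        return preserved

    def generate(self):
        """Key for a record that arrives from any other staging table."""
        key = self.next_generated
        self.next_generated += 1
        return key


allocator = KeyAllocator(gap=1000)
preserved_keys = allocator.initial_load([101, 102, 250])
first_generated = allocator.generate()
second_generated = allocator.generate()
```

With a gap of 1000 and a highest source key of 250, generated keys start at 1250, which leaves room for later source keys between 251 and 1249.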

Enabling Cell Update

By default, during the stage process (see “Stage Jobs” on page 745), for each inbound
record of data, Siperian Hub replaces the cell value in the target base object whenever
an incoming record has a higher trust level—even if the value it replaces is identical.
Even though the value has not changed, Siperian Hub updates the last update date for
the cell to the date associated with the incoming record, and assigns to the cell the
same trust level as a new value. For more information, see “Configuring Trust for
Source Systems” on page 455.

You can change this behavior by checking (selecting) the Cell Update check box when
configuring a staging table. If cell update is enabled, then during Stage jobs, Siperian
Hub will compare the cell value with the current contents of the cross-reference table
before it updates the target record in the base object. If the cross-reference record for
this system has an identical value in this cell, then Siperian Hub will not update the cell
in the Hub Store. Enabling cell update can increase performance during Stage jobs if
your Siperian Hub implementation does not require updates to the last update date and
trust value in the target base object record.
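
The difference between the two behaviors comes down to one comparison, sketched below in Python. The function name is hypothetical, and the sketch assumes that the incoming record has already qualified on trust; it models only the decision described in this section.

```python
def should_update_cell(incoming_value, xref_value, cell_update_enabled):
    """Decide whether the target cell is rewritten for an incoming value.

    Assumption: the incoming record has already won on trust.
    """
    if cell_update_enabled and incoming_value == xref_value:
        # Identical value: leave the cell, its last update date, and its trust alone.
        return False
    # Default behavior, or a value that really changed.
    return True


default_behavior = should_update_cell("Smith", "Smith", cell_update_enabled=False)
with_cell_update = should_update_cell("Smith", "Smith", cell_update_enabled=True)
changed_value = should_update_cell("Smyth", "Smith", cell_update_enabled=True)
```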

Properties for Columns in Staging Tables

Columns in staging tables have the following properties:

Property                Description
Column                  Name of this column as defined in the associated base
                        object or dependent object.
Lookup System           Name of the lookup system if the Lookup Table is a
                        cross-reference table.
Lookup Table            For foreign key columns in the staging table, the name of
                        the table containing the lookup column.
Lookup Column           For foreign key columns in the staging table, the name of
                        the lookup column in the lookup table. For more
                        information, see “Configuring Lookups For Foreign Key
                        Columns” on page 376.
Allow Null Update       Determines whether null updates are allowed when a Load
                        job specifies a null value for a cell that already
                        contains a non-null value.
                        • Check (select) this check box to have the Load job
                          update the cell. Do this if you want Siperian Hub to
                          update the cell value even though the new value would
                          be null.
                        • Uncheck (clear, the default) this check box to prevent
                          null updates and retain the existing non-null value.
Allow Null Foreign Key  Determines whether null foreign keys are allowed. Use this
                        option only if null values are valid for the foreign key
                        relationship, that is, if the foreign key is an optional
                        relationship.
                        • Check (select) this check box to allow data to be
                          loaded when you do not have a value for lookup.
                        • Uncheck (clear, the default) this check box to prevent
                          null foreign keys. In this case, records with null
                          values in the lookup column will be written to the
                          rejects table instead of being loaded.
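
The two null-handling properties can be summarized as simple decision rules. The Python sketch below is illustrative only; the function names and the "load"/"reject" return values are assumptions, not part of Siperian Hub.

```python
def apply_null_update(existing, incoming, allow_null_update=False):
    """Return the value that remains in the cell after a load."""
    if incoming is None and existing is not None and not allow_null_update:
        return existing  # default: keep the existing non-null value
    return incoming


def check_foreign_key(lookup_value, allow_null_foreign_key=False):
    """Return 'load' or 'reject' for a record, given its foreign key lookup value."""
    if lookup_value is None and not allow_null_foreign_key:
        return "reject"  # default: written to the rejects table
    return "load"


kept = apply_null_update("555-0100", None)
cleared = apply_null_update("555-0100", None, allow_null_update=True)
rejected = check_foreign_key(None)
loaded = check_foreign_key(None, allow_null_foreign_key=True)
```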

Adding Staging Tables


To add a staging table:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the schema tree, expand the Base Objects node.
4. In the schema tree, expand the node for the base object associated with this
staging table.
5. Do one of the following to identify the base object or dependent object that this
staging table will populate.
• If you want to add a staging table to this base object, right-click the Staging
Tables node and choose Add Staging Table.
• If you want to add a staging table to a dependent object, expand the
Dependent Objects node under the base object, expand the dependent object
node, right-click the Staging Tables node and choose Add Staging Table.

The Schema Manager displays the Add staging to Base Object (or Dependent
Object) dialog.

6. Specify the staging table properties. For more information, see “Staging Table
Properties” on page 367.
Note: Some of these settings cannot be changed after the staging table has been
added, so make sure that you specify the settings you want before closing this
dialog.

7. From the list of the columns in the base object or dependent object, select all of
the columns that this source system will provide. For more information, see
“Staging Table Columns” on page 365.

Check (select) the columns to include in this staging table.

• Click the Select All button to select all of the columns without needing to
click each column individually.
• Click the Clear All button to unselect all selected columns.
These staging table columns inherit the properties of their corresponding columns
in the base object or dependent object. You can select columns, but you cannot
change their inherited data types and column widths.
Note: The Rowid Object and the Last Update Date are automatically selected.
You cannot uncheck these columns or change their properties.
8. Specify column properties. For more information, see “Properties for Columns in
Staging Tables” on page 370.
9. For each column that has an associated foreign key relationship, select the row and
click the Edit Lookup button to define the lookup column. For more information, see
“Configuring Lookups For Foreign Key Columns” on page 376.
Note: You will not be able to save this new staging table unless you complete this
step.
10. Click OK.

The Schema Manager creates the new staging table in the Operational Record
Store (ORS), along with any support tables, and then adds the new staging table to
the schema tree.
11. If you want, configure an Audit Trail and Delta Detection for this staging table.
To learn more, see “Using Audit Trail and Delta Detection” on page 398.

Changing Properties in Staging Tables


To change properties in a staging table:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the schema tree, expand the Base Objects node, and then expand the node for
the base object associated with this staging table.
• If the staging table is associated with the base object, then expand the Staging
Tables node to display it.
• If the staging table is associated with a dependent object, expand the
Dependent Objects node under the base object, then expand the Staging
Tables node to display it.
4. Select the staging table that you want to configure.

The Schema Manager displays the properties for the selected table.

5. Specify the staging table properties. For more information, see “Staging Table
Properties” on page 367.
For each property that you want to edit (Display Name and Description), click the
Edit button next to it, and specify the new value.
6. From the list of the columns in the base object or dependent object, change the
columns that this source system will provide.
• Click the Select All button to select all of the columns without needing to
click each column individually.
• Click the Clear All button to unselect all selected columns.
Note: The Rowid Object and the Last Update Date are automatically selected.
You cannot uncheck these columns or change their properties.
7. If you want, change column properties. For more information, see “Properties for
Columns in Staging Tables” on page 370.
8. If you want, change lookups for foreign key columns. Select the column and click
the Edit Lookup button to configure the lookup column. For more information, see
“Configuring Lookups For Foreign Key Columns” on page 376.
9. If you want to change cell updating (see “Enabling Cell Update” on page 369),
check or uncheck the Cell Update check box.

10. Change the column configuration for your staging table, if you want. For more
information, see “Configuring Columns in Tables” on page 125.
11. If you want, configure an Audit Trail and Delta Detection for this staging table.
To learn more, see “Using Audit Trail and Delta Detection” on page 398.
12. Click the Save button to save your changes.

Jumping to the Source System for a Staging Table


To view the source system associated with a staging table:
• Right-click the staging table and choose Jump to Source System.

The Hub Console launches the Systems and Trust tool and displays the source system
associated with this staging table. For more information, see “Configuring Source
Systems” on page 348.

Configuring Lookups For Foreign Key Columns


This section describes how to configure lookups for foreign key columns in staging
tables associated with base objects.

About Lookups

A lookup is the process of retrieving a data value from a parent table during Load jobs.
In Siperian Hub, when configuring a staging table associated with a base object, if a
foreign key column in the staging table (as the child table) is related to the primary key
in a parent table, you can configure a lookup to retrieve data from that parent table.
The target column in the lookup table must be a unique column (such as the primary
key). For more information, see “Performing Lookups Needed to Maintain Referential
Integrity” on page 312.

For example, suppose your Siperian Hub implementation had two base objects: a
Consumer parent base object and an Address child base object, with the following
relationship between them:
Consumer.Rowid_object = Address.Consumer_Fkey

In this case, the Consumer_Fkey column is included in the Address staging table, and
its value is resolved by a lookup on a column in the parent table.

Note: The Address.Consumer_Fkey must be the same as Consumer.Rowid_object.

In this example, you could configure three types of lookups:


• to the ROWID_OBJECT (primary key) of the Consumer base object (lookup
table)
• to the PKEY_SRC_OBJECT column (primary key) of the cross-reference table
for the Consumer base object
In this case, you must also define the lookup system. Configuring a lookup to the
PKEY_SRC_OBJECT column of a cross-reference table allows you to point to
parent tables associated with a source system that differs from the source system
associated with this staging table.
• to any other unique column, if available, in the base object or its cross-reference
table

Once defined, when the Load job runs on the base object, Siperian Hub looks up the
source system’s Consumer key value in the primary key from source system column
(PKEY_SRC_OBJECT) of the Consumer cross-reference table, and returns the
ROWID_OBJECT value of the corresponding Consumer record.
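
A lookup against a cross-reference table amounts to finding a ROWID_OBJECT by the pair (lookup system, primary key from source system). The Python sketch below models that with a dictionary. The system names, key values, and the consumer_key field are made up for the example, and the short ROWID values stand in for real CHAR (14) keys.

```python
# Hypothetical cross-reference data for the Consumer base object:
# (source system, PKEY_SRC_OBJECT) -> ROWID_OBJECT
consumer_xref = {
    ("CRM", "C-1001"): "1",
    ("CRM", "C-1002"): "2",
    ("BILLING", "9001"): "1",
}


def lookup_rowid(xref, lookup_system, pkey_src_object):
    """Return the parent ROWID_OBJECT, or None if the lookup finds no match."""
    return xref.get((lookup_system, pkey_src_object))


# An Address record that carries the CRM system's key for its consumer.
address = {"street": "1 Main St", "consumer_key": "C-1002"}
address["CONSUMER_FKEY"] = lookup_rowid(consumer_xref, "CRM", address["consumer_key"])
no_match = lookup_rowid(consumer_xref, "CRM", "C-9999")
```

Including the lookup system in the dictionary key mirrors why a lookup system must be defined for cross-reference lookups: the same source key value can exist in more than one system.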

Configuring Lookups

To configure a lookup via foreign key relationship:


1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the schema tree, expand the Base Objects node, and then expand the node for
the base object associated with this staging table.
4. Select the staging table that you want to configure.
5. Select the row of the foreign key column that you want to configure.

The Edit Lookup button is enabled only for foreign key columns.

6. Click the Edit Lookup button.


7. The Schema Manager displays the Define Lookup dialog.

The Define Lookup dialog contains the parent base object and its cross-reference
table, along with any unique columns (only).
8. Select the target column for the lookup.

• To define the lookup to a base object, expand the base object and select
Rowid_Object (the primary key for this base object).
• To define the lookup to a cross-reference table, select PKey Src Object
(the primary key for the source system in this cross-reference table).
• To define the lookup to any other unique column, simply select the column.
Note: Deleting a relationship clears the lookup.
9. If the lookup column is PKey Src Object in the cross-reference table, select the
lookup system from the Lookup System drop-down list.
10. Click OK.
11. If you want, configure the Allow Null Update check box to specify what will
happen if a Load job specifies a null value for a cell that already contains a non-null
value. For more information, see “Properties for Columns in Staging Tables” on
page 370.
12. For each column, configure the Allow Null Foreign Key option to specify what
happens if the foreign key column contains a null value (no lookup value is
available). For more information, see “Properties for Columns in Staging Tables”
on page 370.
13. Click the Save button to save your changes.

Removing Staging Tables


To remove a staging table:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the schema tree, expand the Base Objects node, and then expand the node for
the base object associated with this staging table.
4. Right-click the staging table that you want to remove, and then choose Remove.
The Schema Manager prompts you to confirm deletion.
5. Choose Yes.
The Schema Manager drops the staging table from the Operational Record Store
(ORS), deletes associated control tables, and removes the deleted staging table
from the schema tree.

Mapping Columns Between Landing and Staging Tables

This section describes how to configure the mapping between landing and staging
tables. Mapping defines how the data is transferred from landing to staging tables via
Stage jobs.

About Mapping Columns


To give Siperian Hub the ability to move data from a landing table to a staging table,
you need to define a mapping from columns in the landing table to columns in the
staging table. This mapping defines:
• which landing table column is used to populate a column in the staging table
• what standardization and verification (cleansing) must be done, if any, before the
staging table is populated

Mappings are configured as either SECURE or PRIVATE resources. For more
information, see “Securing Siperian Hub Resources” on page 841.

Relationships Between Landing and Staging Tables

You can map columns from one landing table to multiple staging tables. However, each
staging table is mapped to only one landing table.

Data is Either Cleansed or Passed Through Unchanged

For each column of data in the staging table, the data comes from the landing column
in one of two ways:

Copy Method     Description
passed through  Siperian Hub copies the data as is, without making any changes to
                it. Data comes directly from a column in the landing table.
cleansed        Siperian Hub standardizes and verifies data using cleanse
                functions. The output of the cleanse function becomes the input
                to the target column in the staging table. For more information
                about cleanse functions, see Chapter 12, “Configuring Data
                Cleansing.”

In the following figure, data in the Name column is cleansed via a cleanse function,
while data from all other columns is passed directly to the corresponding target column
in the staging table.

Note: A staging table does not need to use every column in the landing table or every
output string from a cleanse function. The same landing table can provide input to
multiple staging tables, and the same cleanse function can be reused for multiple
columns in multiple landing tables.

Decomposition and Aggregation

Cleanse functions can also decompose and aggregate data. Either way, your mappings
need to accommodate the required inputs and outputs.

Cleanse Functions that Decompose Data

In the following figure, the cleanse function decomposes the name field, breaking the
data into smaller pieces.

This cleanse function has one input string and five output strings. In your mapping,
you need to make sure that the input string is mapped to the cleanse function, and each
output string is mapped to the correct target column in the staging table.
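
To make the one-input, five-output shape concrete, here is a minimal Python sketch of a decomposing function. It is not a Siperian cleanse function: the five output names (prefix, first, middle, last, suffix), the prefix and suffix lists, and the parsing rules are all assumptions chosen for illustration.

```python
PREFIXES = {"MR", "MRS", "MS", "DR"}
SUFFIXES = {"JR", "SR", "II", "III"}


def decompose_name(full_name):
    """Split one name string into five output strings (hypothetical rules)."""
    tokens = full_name.replace(",", " ").split()
    prefix = suffix = ""
    if tokens and tokens[0].rstrip(".").upper() in PREFIXES:
        prefix = tokens.pop(0)
    if tokens and tokens[-1].rstrip(".").upper() in SUFFIXES:
        suffix = tokens.pop()
    first = tokens[0] if tokens else ""
    last = tokens[-1] if len(tokens) > 1 else ""
    middle = " ".join(tokens[1:-1])
    return {"prefix": prefix, "first": first, "middle": middle,
            "last": last, "suffix": suffix}


parts = decompose_name("Dr. John Q. Public Jr.")
simple = decompose_name("Jane Doe")
```

In a mapping, each of the five outputs would be connected to its own target column in the staging table; outputs that the staging table does not need can be left unmapped.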

Cleanse Functions that Aggregate Data

In the following figure, the cleanse function aggregates data from five fields into a
single string.

This cleanse function has five input strings and one output string. In your mapping,
you need to make sure that the input strings are mapped to the cleanse function and
the output string is mapped to the correct target column in the staging table.
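
The reverse shape, several inputs and one output, can be sketched the same way. The function below is a hypothetical stand-in for an aggregating cleanse function; the choice to skip empty inputs and join with a single space is an assumption.

```python
def aggregate_fields(*fields, separator=" "):
    """Join several input strings into one, skipping empty or missing inputs."""
    return separator.join(f.strip() for f in fields if f and f.strip())


full_name = aggregate_fields("Dr.", "John", "Q.", "Public", "Jr.")
sparse_name = aggregate_fields("", "Jane", None, "Doe", "")
```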

Considerations for Column Mappings

When mapping columns, consider the following rules and guidelines:


• The source column must have the same data type as the target column, or it must
be a data type that can be implicitly converted to the target column’s data type.
• For string (char or varchar) columns, the length does not need to be the same.
When data is loaded from the landing table to the staging table, any data value that
is too long for the target column will trigger Siperian Hub to place the entire
record in a reject table.
• Although more than three columns from the landing table can be mapped to the
Pkey Src Object column in the staging table, index creation is restricted to only
three columns.
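
The string-length rule above (a value that is too long sends the entire record to a reject table) can be illustrated with a short Python sketch. The column names and lengths are invented, and the sketch models only the length check, not data type conversion or the actual Stage job.

```python
def stage_records(records, column_lengths):
    """Separate records into staged and rejected lists based on column length.

    column_lengths maps a target column name to its maximum length.
    A single oversized value rejects the whole record.
    """
    staged, rejected = [], []
    for record in records:
        too_long = [column for column, max_len in column_lengths.items()
                    if record.get(column) is not None and len(record[column]) > max_len]
        if too_long:
            rejected.append((record, too_long))
        else:
            staged.append(record)
    return staged, rejected


# Hypothetical target column widths in the staging table.
column_lengths = {"FIRST_NAME": 10, "LAST_NAME": 10}
landing_records = [
    {"FIRST_NAME": "Ann", "LAST_NAME": "Lee"},
    {"FIRST_NAME": "Maximilian", "LAST_NAME": "Featherstonehaugh"},
]
staged, rejected = stage_records(landing_records, column_lengths)
```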

Starting the Mappings Tool


To start the Mappings tool:
• In the Hub Console, expand the Model workbench, and then click Mappings.
The Hub Console displays the Mappings tool, as shown in the following example.

The Mappings tool displays the following panels:

Panel          Description
Mappings List  List of every defined landing-to-staging mapping.
Properties     Properties for the selected mapping.

When you select a mapping in the mappings list, its properties are displayed.

Tabs in the Mappings Tool

When a mapping is selected, the Mappings tool displays the following tabs.

Tab               Description
General           General properties for this mapping. For more information, see
                  “Mapping Properties” on page 386.
Diagram           Interactive diagram that lets you define mappings between
                  columns in the landing and staging tables. For more
                  information, see “Mapping Columns Between Landing and Staging
                  Table Columns” on page 389.
Query Parameters  Allows you to specify query parameters for this mapping. For
                  more information, see “Configuring Query Parameters for
                  Mappings” on page 392.
Test              Allows you to test the mapping.

Mapping Diagrams

When you click the Diagram tab for a mapping, the Mappings tool displays the current
column mappings.

[Figure: Diagram tab with callouts for the landing table (source), the mapping lines,
and the staging table (target)]

Mapping lines show the mapping from source columns in the landing table to target
columns in the staging table. Colors in the circles at either end of the mapping lines
indicate data types.

Mapping Properties
Mappings have the following properties.

Field Description
Name Name of this mapping as it will be displayed in the Hub Console.
Description Description of this mapping.
Landing Table Select the landing table that will be the source of the mapping.
Staging Table Select the staging table that will be the target of the mapping.
Secure Resource Check (enable) to make this mapping a secure resource, which allows you to
control access to this mapping. Once a mapping is designated as a secure
resource, you can assign privileges to it in the Secure Resources tool.
To learn more, see “Securing Siperian Hub Resources” on page 841, and
“Assigning Resource Privileges to Roles” on page 859.

Adding Mappings
To create a new mapping:
1. Start the Mappings tool according to the instructions in “Starting the Mappings
Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Right-click in the area where the mappings are listed and choose Add Mapping.

The Mappings tool displays the Mapping dialog.

4. Specify the mapping properties. For more information, see “Mapping Properties”
on page 386.
5. Click OK.
The Mappings tool displays the landing table and staging table on the workspace.
6. Using the workspace tools and the input and output nodes, connect the column in
the landing table to the corresponding column in the staging table.
Tip: If you want to automatically map columns in the landing table to columns
with the same name in the staging table, click the button.
7. Click OK.
8. When you are finished, click the Save button to save your changes.

Copying Mappings
To create a new mapping by copying an existing one:
1. Start the Mappings tool according to the instructions in “Starting the Mappings
Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.

3. Right-click the mapping that you want to copy, and then choose Copy Mapping.
The Mappings tool displays the Mapping dialog.

4. Specify the mapping properties. The landing table is already specified. For more
information, see “Mapping Properties” on page 386.
5. Click OK.
6. Click the Save button to save your changes.

Editing Mapping Properties


To edit the properties of an existing mapping:
1. Start the Mappings tool according to the instructions in “Starting the Mappings
Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the mapping that you want to edit.
4. Edit the mapping properties, diagram, and mapping settings as needed.
5. Click the Save button to save your changes.

Mapping Columns Between Landing and Staging Table Columns

You use the Diagrams tab in the Mappings tool to define the mappings between source
columns in landing tables and target columns in staging tables. How you map depends on
whether it is a pass-through mapping (directly between columns) or a cleansed
mapping (data is processed by a cleanse function).

For each mapping:


• inputs are columns from the landing table
• outputs are the columns in the staging table

The workspace and the methods of creating a mapping are the same as for creating
cleanse functions. To learn how to use the workspace to define functions, inputs, and
outputs, see “Configuring Graph Functions” on page 424.

Navigate to the Diagrams Tab

To navigate to the Diagrams tab:


1. Start the Mappings tool according to the instructions in “Starting the Mappings
Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the mapping that you want to configure.
4. Click the Diagram tab.
The Mappings tool displays the Diagram tab for this mapping.

Mapping Columns Directly

To configure mappings directly between columns in landing and staging tables:


1. Navigate to the Diagrams tab according to the instructions in “Navigate to the
Diagrams Tab” on page 389.

2. Mouse-over the output connector (circle) to the right of the column in the landing
table (the circle outline turns red), drag the line to the input connector (circle) to
the left of the column in the staging table, and then release the mouse button.

Note: If you want to load by RowID, create a mapping between the primary key in
the landing table and the Rowid object in the staging table. For more information,
see “Loading by RowID” on page 394.

3. Click the Save button to save your changes.

Mapping Columns Using Cleanse Functions

To cleanse data during Stage jobs, you can include one or more cleanse functions in
your mapping. This section provides brief instructions for configuring cleanse
functions in mappings. To learn more, see “Using Cleanse Functions” on page 414.

To configure mappings between columns in landing and staging tables via cleanse
functions:
1. Navigate to the Diagrams tab according to the instructions in “Navigate to the
Diagrams Tab” on page 389.
2. Add the cleanse function(s) that you want to configure by right-clicking anywhere
in the workspace and choosing the cleanse function that you want to add.
3. For each input connector on the cleanse function, mouse-over the output
connector from the appropriate column in the landing table, drag the line to its
corresponding input connector, and release the mouse button.
4. Similarly, for each output connector on the cleanse function, mouse-over the
output connector, drag the line to its corresponding column in the staging table,
and release the mouse button.
In the following example, the Titlecase cleanse function will process data that
comes from the Last Name column in the landing table and then populate the Last
Name column in the staging table with the cleansed data.

5. Click the Save button to save your changes.
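
As a rough illustration of what such a mapping does to each record, the following Python sketch places a cleansed mapping and a pass-through mapping side by side. It is not Siperian Hub code: the titlecase helper, the stage_record function, and the LAST_NAME and CUST_ID column names are assumptions made for this example only.

```python
def titlecase(value):
    """Hypothetical stand-in for the Titlecase cleanse function."""
    return value.title() if value is not None else None


def stage_record(landing_row):
    """Build one staging record from one landing record."""
    return {
        # Cleansed mapping: landing column -> cleanse function -> staging column.
        "LAST_NAME": titlecase(landing_row["LAST_NAME"]),
        # Pass-through mapping: value copied directly between columns.
        "CUST_ID": landing_row["CUST_ID"],
    }


staged = stage_record({"CUST_ID": 1, "LAST_NAME": "SMITH"})
print(staged)
```

In the Mappings tool, the same wiring is drawn with connectors in the workspace rather than written as code.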

Configuring Query Parameters for Mappings


To configure query parameters for a mapping:
1. Start the Mappings tool according to the instructions in “Starting the Mappings
Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the mapping that you want to configure.
4. Click the Query Parameters tab.
The Mappings tool displays the Query Parameters tab for this mapping.

5. If you want, check or uncheck the Enable Distinct check box, as appropriate, to
configure distinct mapping. For more information, see “Distinct Mapping” on
page 393.
6. If you want, check or uncheck the Enable Condition check box, as appropriate, to
configure conditional mapping. For more information, see “Conditional Mapping”
on page 394.
If enabled, type the SQL WHERE clause (omitting the WHERE keyword), and
then click Validate to validate the clause.
7. Click the Save button to save your changes.

Filtering Records in Mappings

By default, all records are retrieved from the landing table. Optionally, you can
configure a mapping that filters records in the landing table. There are two types of
filters: distinct and conditional. You configure these settings on the Query Parameters
tab in the Mappings tool. For more information, see “Configuring Query Parameters
for Mappings” on page 392.

Distinct Mapping

If you select the Enable Distinct check box on the Query Parameters tab, the Stage job
selects only the distinct records from the landing table. Siperian Hub populates the
staging table using the following SELECT statement:

select distinct * from landing_table

Using distinct mapping is useful in situations in which you have a single landing table
feeding multiple staging tables and the landing table is denormalized (for example, it
contains both customer and address data). A single customer could have three
addresses. In this case, using distinct mapping prevents the two extra customer records
from being written to the rejects table.

In another example, suppose a landing table contained the following data:


LUD   CUST_ID  NAME  ADDR_ID  ADDR
7/24  1        JOHN  1        1 MAIN ST
7/24  1        JOHN  2        1 MAPLE ST

In the mapping to the customer table, check (select) Enable Distinct to avoid having
duplicate records because only LUD, CUST_ID, and NAME are mapped to the
Customer staging table. With Distinct enabled, only one record would populate your
customer table and no rejects would occur.

Alternatively, for the address mapping, you map ADDR_ID and ADDR with Distinct
disabled so that you get two records and no rejects.
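
The effect of the two settings can be reproduced with the sample data above. The following Python sketch uses an in-memory SQLite database purely as a stand-in for the Hub Store; restricting the SELECT to the mapped columns is an assumption drawn from the example, since the statement documented above is select distinct * from landing_table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE landing_table "
    "(LUD TEXT, CUST_ID INTEGER, NAME TEXT, ADDR_ID INTEGER, ADDR TEXT)"
)
conn.executemany(
    "INSERT INTO landing_table VALUES (?, ?, ?, ?, ?)",
    [
        ("7/24", 1, "JOHN", 1, "1 MAIN ST"),
        ("7/24", 1, "JOHN", 2, "1 MAPLE ST"),
    ],
)

# Customer mapping with Distinct enabled: the two landing rows collapse to one.
customer_rows = conn.execute(
    "SELECT DISTINCT LUD, CUST_ID, NAME FROM landing_table"
).fetchall()

# Address mapping with Distinct disabled: both rows are kept.
address_rows = conn.execute(
    "SELECT ADDR_ID, ADDR FROM landing_table ORDER BY ADDR_ID"
).fetchall()

print(customer_rows)
print(address_rows)
```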

Conditional Mapping

If you select the Enable Condition check box, you can apply a SQL WHERE clause to
filter the data that is unloaded from the landing table for cleansing. For example,
suppose the data in your landing table is from all states in the US. You can use the
WHERE clause to filter the data that is written to the staging tables to include only
data from one state, such as California. To do this, type in a WHERE clause (but omit
the WHERE keyword): STATE = 'CA'. When the cleanse job is run, it unloads and
processes records as SELECT * FROM LANDING WHERE STATE = 'CA'. If you specify
conditional mapping, click the Validate button to validate the SQL statement.
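
The following Python sketch shows how a condition entered without the WHERE keyword could be combined into the full statement. It runs against an in-memory SQLite table as a stand-in for the landing table; the CUST_ID and NAME columns and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LANDING (CUST_ID INTEGER, NAME TEXT, STATE TEXT)")
conn.executemany(
    "INSERT INTO LANDING VALUES (?, ?, ?)",
    [(1, "JOHN", "CA"), (2, "MARY", "NY"), (3, "ALEX", "CA")],
)

# Text as typed on the Query Parameters tab: the WHERE keyword is omitted.
condition = "STATE = 'CA'"

# Assumed equivalent of the statement built from that condition.
query = "SELECT * FROM LANDING WHERE " + condition
rows = conn.execute(query + " ORDER BY CUST_ID").fetchall()
print(rows)
```

Only the two California records are returned; the New York record is never unloaded.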

Loading by RowID
You can streamline load, match, and merge processing by explicitly configuring
Siperian Hub to load by RowID. Otherwise, Siperian Hub loads data according to its
default behavior, which is described in “Run-time Execution Flow of the Load
Process” on page 304.

Note: If you clean the BASE OBJECT using the stored procedure, and if you had
set up the TAKE-ON GAP for the particular staging table, the ROWID sequences are
reset to 1.

In the staging table, the Rowid Object column (a nullable column) has a specialized usage.
You can streamline load, match, and merge processing by mapping any column in a
landing table to the Rowid Object column in a staging table. In the following example,
the Address Id column in the landing table is mapped to the Rowid Object column in
the staging table.

Mapping to the Rowid Object column allows for the loading of records by present- or
lineage-based ROWID_OBJECT. During the load, if an incoming record with a
populated ROWID_OBJECT is new (the incoming PKEY_SRC_OBJECT +
ROWID_SYSTEM is checked), then this record bypasses the match and merge process
and is added to the base object directly (a real-time API PUT(_XREF) by
ROWID_OBJECT). Using this feature enhances lineage and unmerge support, enables
closed-loop integration with downstream systems, and can increase throughput.

The initial data load for a base object inserts all records into the target base object.
Therefore, enable loading by RowID for incremental loads that occur after the initial
data load. For more information, see “Initial Data Loads and Incremental Loads” on
page 302 and “Run-time Execution Flow of the Load Process” on page 304.
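
The decision described above can be summarized in a small Python sketch. It is a simplification under stated assumptions, not the actual load logic: the load_path function, the known_source_keys set, and the returned labels are hypothetical, and only the ROWID_OBJECT, PKEY_SRC_OBJECT, and ROWID_SYSTEM column names come from this guide.

```python
def load_path(record, known_source_keys):
    """Return the path a staged record takes during the load (simplified)."""
    source_key = (record["PKEY_SRC_OBJECT"], record["ROWID_SYSTEM"])
    has_rowid = record.get("ROWID_OBJECT") is not None
    if has_rowid and source_key not in known_source_keys:
        # New record with a populated ROWID_OBJECT: added to the base
        # object directly, bypassing match and merge.
        return "direct insert by ROWID_OBJECT"
    # Everything else follows the default load behavior.
    return "default load"


known = {("A-100", "CRM")}
new_with_rowid = {"PKEY_SRC_OBJECT": "A-200", "ROWID_SYSTEM": "CRM", "ROWID_OBJECT": "42"}
new_without_rowid = {"PKEY_SRC_OBJECT": "A-300", "ROWID_SYSTEM": "CRM", "ROWID_OBJECT": None}

print(load_path(new_with_rowid, known))
print(load_path(new_without_rowid, known))
```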

Jumping to a Schema
The Mappings tool allows you to quickly launch the Schema Manager and display the
schema associated with the selected mapping.

Note: The Jump to Schema command is available only in the Workbenches view, not
the Processes view.

To jump to the schema for a mapping:


1. Start the Mappings tool according to the instructions in “Starting the Mappings
Tool” on page 384.

2. Select the mapping whose schema you want to view.


3. In the View By list at the bottom of the navigation pane, choose one of the
following options:
• By Staging Table
• By Landing Table
• By Mapping
4. Right-click anywhere in the navigation pane, and then choose Jump to Schema.

5. The Mappings tool displays the schema for the selected mapping.

Testing Mappings
To test a mapping that you have configured:
1. Start the Mappings tool according to the instructions in “Starting the Mappings
Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the mapping that you want to test.
4. Click the Test tab.
The Mappings tool displays the Test tab for this mapping.

5. Specify input values for the columns under Input Name.


6. Click Test.
7. The Mappings tool tests the mapping and populates the columns under Output
Name with the results.

Removing Mappings
To remove a mapping:
1. Start the Mappings tool according to the instructions in “Starting the Mappings
Tool” on page 384.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Right-click the mapping that you want to remove, and choose Delete Mapping.
The Mappings tool prompts you to confirm deletion.
4. Click Yes.
The Mappings tool drops supporting tables, removes the mapping from the
metadata, and updates the list of mappings.

Using Audit Trail and Delta Detection


After you have completed mapping columns between landing and staging tables, you
can configure the audit trail and delta detection features for a staging table. For more
information, see “Mapping Columns Between Landing and Staging Tables” on page
380.

To configure audit trail and delta detection, click the Settings tab.

Configuring the Audit Trail for a Staging Table


Siperian Hub allows you to configure an audit trail that retains the history of the data in
the RAW table based on the number of loads and timestamps. This audit trail is useful,
for example, when using HDD (Hard Delete Detection). By default, audit trails are not
enabled, and the RAW table is empty. If enabled, then records are kept in the RAW
table for either the configured number of Stage job executions or the specified
retention period.

Note: The Audit Trail has very different functionality from—and is not to be confused
with—the Audit Manager tool described in Chapter 22, “Auditing Siperian Hub
Services and Events”.

To configure the audit trail for a staging table:


1. Start the Schema Manager according to the instructions in “Building the Schema”
on page 81.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.

3. If you have not already done so, add a mapping for the staging table. For more
information, see “Adding Mappings” on page 386.
4. Select the staging table that you want to configure.
5. At the bottom of the properties panel, click Preserve an audit trail in the raw
table to enable the raw data audit trail.
The Schema Manager prompts you to select the retention period for the audit
table.

6. Select one of the following options for the audit retention period:

Option       Description
Loads        Number of batch loads for which to retain data.
Time Period  Period of time for which to retain data.

7. Click Save to save your changes.

Once configured, the audit trail keeps data for the retention period that you specified.
For example, suppose you configured the audit trail for two loads (Stage job
executions). In this case, the audit trail will retain data for the two most recent loads to
the staging table. If there were ten records in each load in the landing table, then the
total number of records in the RAW table would be 20.

If the Stage job is run multiple times, then the data in the RAW table is retained
for the two most recent sets, based on ROWID_JOB. Data for older ROWID_JOB
values is deleted. For example, suppose the value of ROWID_JOB for the first
Stage job is 1, for the second Stage job is 2, and so on. When you run the Stage job a
third time, the records in which ROWID_JOB=1 are discarded.
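
The retention rule for the Loads option can be sketched in a few lines of Python. This is only a model of the behavior described above, not the way Siperian Hub maintains the RAW table; the prune_raw function and the in-memory row format are assumptions.

```python
def prune_raw(raw_rows, loads_to_keep):
    """Keep only rows from the most recent `loads_to_keep` Stage job runs."""
    if loads_to_keep <= 0:
        return []
    job_ids = sorted({row["ROWID_JOB"] for row in raw_rows})
    recent = set(job_ids[-loads_to_keep:])
    return [row for row in raw_rows if row["ROWID_JOB"] in recent]


# Three Stage job runs of ten records each, audit trail configured for two loads.
raw = [{"ROWID_JOB": job, "PKEY_SRC_OBJECT": n} for job in (1, 2, 3) for n in range(10)]
kept = prune_raw(raw, loads_to_keep=2)
print(len(kept), sorted({row["ROWID_JOB"] for row in kept}))
```

With three runs of ten records, 20 rows remain and the rows with ROWID_JOB=1 are gone, which matches the example in the text.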

Note: The following applies when you use the Clear History button in the Batch Viewer
after the first run of the process. If the audit trail is enabled for a staging table and
you click the Clear History button in the Batch Viewer while the associated Stage job is
selected, the records in the RAW and REJ tables will be cleared the next time the Stage
job is run.

Configuring Delta Detection for a Staging Table


If you enable delta detection for a staging table, Siperian Hub processes only new or
changed records and ignores unchanged records.

Enabling Delta Detection for a Staging Table

To enable delta detection for a staging table:


1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Select the staging table that you want to configure.
3. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.

4. Select (check) the Enable delta detection check box to enable delta detection for
the table. You might need to scroll down to see this option.

5. Specify the manner in which you want to have deltas detected. You can choose:
• Detect deltas by comparing all columns in mapping
• Detect deltas via a date column (select the column)
6. Specify whether to allow staging if a prior duplicate was rejected during the stage
process or load process.
• Select (check) this option to allow the duplicate record being staged, during this
next stage process execution, to bypass delta detection if its previously-staged
duplicate was rejected.
Note: If this option is enabled, and a user in the Batch Viewer clicks the Clear
History button while the associated stage job is selected, then the history of
the prior rejection (that this feature relies on) will be discarded because the
records in the REJ table will be cleared the next time the stage job is run.
• Clear (uncheck) this option (the default) to prevent the duplicate record being
staged, during this next stage process execution, from bypassing delta
detection if its previously-staged duplicate was rejected. Delta detection will
filter out any corresponding duplicate landing record that is subsequently
processed in the next stage process execution.

How Siperian Hub Handles Delta Detection

If delta detection is enabled, then the Stage job compares the contents of the landing
table—which is mapped to the selected staging table—against the data set processed in
the previous run of the stage job. This comparison is done to determine whether the
data has changed since the previous run. New, changed, and rejected records
are put into the staging table. Duplicate records are ignored. For more information,
see “Mapping Columns Between Landing and Staging Tables” on page 380.

Note: Reject records move from cleanse to load after the second stage run.
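
The comparison of all mapped columns can be pictured with the following Python sketch. It is a simplified model, assuming records are keyed by PKEY_SRC_OBJECT and held in memory; the detect_deltas function and the NAME and CITY columns are hypothetical, and the special handling of previously rejected records is left out.

```python
def detect_deltas(landing_rows, previous_rows, key="PKEY_SRC_OBJECT"):
    """Return landing rows that are new or differ from the previous run."""
    previous = {row[key]: row for row in previous_rows}
    # A row passes if its key was not seen before or any column value differs.
    return [row for row in landing_rows if previous.get(row[key]) != row]


previous_run = [
    {"PKEY_SRC_OBJECT": "1", "NAME": "JOHN", "CITY": "AUSTIN"},
    {"PKEY_SRC_OBJECT": "2", "NAME": "MARY", "CITY": "BOSTON"},
]
current_run = [
    {"PKEY_SRC_OBJECT": "1", "NAME": "JOHN", "CITY": "AUSTIN"},  # unchanged
    {"PKEY_SRC_OBJECT": "2", "NAME": "MARY", "CITY": "DENVER"},  # changed
    {"PKEY_SRC_OBJECT": "3", "NAME": "ALEX", "CITY": "MIAMI"},   # new
]

deltas = detect_deltas(current_run, previous_run)
print([row["PKEY_SRC_OBJECT"] for row in deltas])
```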

Considerations for Using Delta Detection

When using delta detection, consider the following issues:


• Delta detection can be done either by comparing entire records or via a date
column. Delta detection on last update date is the most efficient, as Siperian Hub
can simply compare the last update date columns for each incoming record against
the record’s previous last update date.
• When processing records by last update date, do not use the Now cleanse function
to compare last update values (for example, testing whether the last update date in
a source record occurred before the current system date). Using Now in this way
can produce unpredictable results. For more information, see Chapter 12,
“Configuring Data Cleansing.”
• Perform delta detection by comparing columns only for those sources where the Last
Update Date is not a true indicator of change. The Siperian Hub Stage job will compare
the entire source record against the most recent corresponding record in the raw
table. If any cell is different, then the record is passed on to the staging table.
• If you update a record and set the Last Update Date to a different date, it will not
get delta detected if you have a date in the landing table that is earlier than the date
you entered. The new Last Update Date always needs to be earlier than the max
date value in the RAW table.
• During delta detection, when you are checking for deltas on all columns, only
records that have null primary keys are rejected. This is expected behavior. Any
other records that fail the delta process are rejected on subsequent stage processes.
• When delta detection is based on the Last Update Date, any changes to the last
update date or the primary key will be detected. Updates to any values that are not
the last update date or part of the concatenated primary key will not be detected.
• Duplicate primary keys are not considered during subsequent stage processes when
using delta detection by mapped columns.
• Reject handling allows you to:
• View all reject records for a given staging table regardless of the batch job
• View all reject records by day across all staging tables
• Query reject tables based on query filters
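
To make the difference between the two detection methods concrete, the following Python sketch models detection by a date column. It is an illustration under assumptions, not the Stage job's implementation: the detect_deltas_by_date function and the LAST_UPDATE_DATE and NAME column names are hypothetical.

```python
from datetime import date


def detect_deltas_by_date(landing_rows, previous_rows,
                          key="PKEY_SRC_OBJECT", date_col="LAST_UPDATE_DATE"):
    """Return rows whose key is new or whose date column value has changed."""
    previous_dates = {row[key]: row[date_col] for row in previous_rows}
    return [
        row for row in landing_rows
        if row[key] not in previous_dates or previous_dates[row[key]] != row[date_col]
    ]


previous_run = [
    {"PKEY_SRC_OBJECT": "1", "NAME": "JOHN", "LAST_UPDATE_DATE": date(2009, 1, 5)},
    {"PKEY_SRC_OBJECT": "2", "NAME": "MARY", "LAST_UPDATE_DATE": date(2009, 1, 5)},
]
current_run = [
    # NAME changed but the date did not: this change goes undetected.
    {"PKEY_SRC_OBJECT": "1", "NAME": "JON", "LAST_UPDATE_DATE": date(2009, 1, 5)},
    # Date changed: detected.
    {"PKEY_SRC_OBJECT": "2", "NAME": "MARY", "LAST_UPDATE_DATE": date(2009, 2, 1)},
    # New primary key: detected.
    {"PKEY_SRC_OBJECT": "3", "NAME": "ALEX", "LAST_UPDATE_DATE": date(2009, 2, 1)},
]

detected = detect_deltas_by_date(current_run, previous_run)
print([row["PKEY_SRC_OBJECT"] for row in detected])
```

The first record shows why a source whose Last Update Date is not reliably maintained is a poor fit for this method.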

Chapter 12: Configuring Data Cleansing

This chapter describes how to configure your Hub Store to cleanse data during the
stage process. This chapter is a companion to the material provided in Chapter 11,
“Configuring the Stage Process.”

Chapter Contents
• Before You Begin
• About Data Cleansing in Siperian Hub
• Configuring Cleanse Match Servers
• Using Cleanse Functions
• Configuring Cleanse Lists

Before You Begin


Before you begin, you must have completed the following tasks:
• Installed Siperian Hub and created the Hub Store according to the instructions in
the Siperian Hub Installation Guide for your platform.
• Built the schema according to the instructions in Chapter 5, “Building the
Schema.”
• Created staging tables and landing tables according to the instructions in Chapter
11, “Configuring the Stage Process.”
• Installed and configured your cleanse engine according to the documentation
included in your cleanse engine distribution.

About Data Cleansing in Siperian Hub


Data cleansing is the process of standardizing data to optimize it for input into the match
process. Matching cleansed data results in a greater number of reliable matches.
This chapter describes internal cleansing—the data cleansing that occurs inside Siperian
Hub, specifically during a Stage job, when data is copied from landing tables to the
appropriate staging tables (see Chapter 11, “Configuring the Stage Process”).

Note: Data cleansing that occurs prior to its arrival in the landing tables is outside the
scope of this chapter.

Setup Tasks for Data Cleansing


To set up data cleansing for your Siperian Hub implementation, you complete the
following tasks:
• “Configuring Cleanse Match Servers” on page 407
• “Using Cleanse Functions” on page 414
• “Configuring Cleanse Lists” on page 440

Configuring Cleanse Match Servers


This section describes how to configure Cleanse Match Servers for your Siperian Hub
implementation. To learn more, see “About Data Cleansing in Siperian Hub” on page
406.

About the Cleanse Match Server


The Cleanse Match Server is a servlet that handles cleanse requests. This servlet is
deployed in an application server environment. The servlet contains two server
components:
• a cleanse server handles data cleansing operations
• a match server handles match operations

The Cleanse Match Server is multi-threaded so that each instance can process multiple
requests concurrently. It can be deployed on a variety of application servers. See the
Siperian Hub Release Notes for a list of supported application servers. See the Siperian Hub
Installation Guide for your platform for instructions on installing and configuring
Cleanse Match Server(s).

Siperian Hub supports running multiple Cleanse Match Servers for each Operational
Record Store (ORS). The cleanse process is generally CPU-bound. This scalable
architecture allows you to scale your Siperian Hub implementation as the volume of
data increases. Deploying Cleanse Match Servers on multiple hosts distributes the
processing load across multiple CPUs and permits the running of cleanse operations in
parallel. In addition, some external adapters are inherently single-threaded, so this
Siperian Hub architecture allows you to simulate multi-threaded operations by running
one processing thread per application server instance.

Modes of Cleanse Operations

Cleanse operations can be classified according to the following modes:


• Online and Batch (default)
• Online Only
• Batch Only

The CLEANSE_TYPE can be used to specify which class(es) of operations a
particular Cleanse Match Server will run. If you deploy two Cleanse Match Servers, you
could make one batch-only and the other online-only, or you could make them both
accept both classes of requests. Unless otherwise specified, a Cleanse Match Server will
default to running both kinds of requests.
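
The following Python sketch models how the three modes determine which servers are eligible for a given class of request. It is illustrative only: the accepts function, the host names, and the use of the display names as mode values are assumptions, and the actual values stored for CLEANSE_TYPE are not shown in this guide.

```python
ONLINE_AND_BATCH = "Online and Batch"  # default mode
ONLINE_ONLY = "Online Only"
BATCH_ONLY = "Batch Only"


def accepts(mode, request_kind):
    """Return True if a server in `mode` runs requests of `request_kind`."""
    if request_kind == "online":
        return mode in (ONLINE_AND_BATCH, ONLINE_ONLY)
    if request_kind == "batch":
        return mode in (ONLINE_AND_BATCH, BATCH_ONLY)
    raise ValueError("request_kind must be 'online' or 'batch'")


# Hypothetical deployment: one server per mode.
servers = {"host-a": ONLINE_ONLY, "host-b": BATCH_ONLY, "host-c": ONLINE_AND_BATCH}

batch_servers = sorted(name for name, mode in servers.items() if accepts(mode, "batch"))
online_servers = sorted(name for name, mode in servers.items() if accepts(mode, "online"))
print(batch_servers, online_servers)
```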

Distributed Cleanse Match Servers

For your Siperian Hub implementation, you can increase the throughput of the cleanse
process by running multiple Cleanse Match Servers in parallel. To learn more about
distributed Cleanse Match Servers, see the Siperian Hub Installation Guide.

Cleanse Match Servers and Proxy Users

If proxy users have been configured for your Siperian Hub implementation, and if you
created proxy_user and cmx_ors with different passwords, then you need to either:
• restart the application server and log in to the proxy user from the Hub Console
or
• register the Cleanse Match Server for the proxy user again

Otherwise, Stage jobs will fail.

Cleanse Requests

All requests for cleansing are issued by database stored procedures. These stored
procedures package a cleanse request as an XML payload and transmit it to a Cleanse
Match Server. When the Cleanse Match Server receives a request, it parses the XML
and invokes the appropriate code:

Mode Type           Description
On-line operations  The result is packaged as an XML response and sent back via an
                    HTTP POST connection.
Batch jobs          The Cleanse Match Server pulls the data to be processed into a
                    flat file, processes it, and then uses a bulk loader to write the data
                    back.
                    • For Oracle, it uses the Oracle loader (SQLLDR) utility.
                    • For DB2, it uses the DB2 Load utility.

The Cleanse Match Server is multi-threaded so that each instance can process multiple
requests concurrently. The default timeout for batch requests from Oracle to a Cleanse
Match Server is one year, and the default timeout for on-line requests is one minute.
For DB2, the default timeout for batch requests or SIF requests is 600 seconds (10
minutes).

When running a Stage or Match job, if more than one Cleanse Match Server is registered,
and if the total number of records to be staged or matched is more than 500, then the
job will get distributed in parallel among the available Cleanse Match Servers.
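
The distribution condition can be expressed as a short Python sketch. Only the two conditions (more than one registered server, more than 500 records) come from this guide; the plan_distribution function, the host names, and the even split across servers are assumptions, and the actual allocation may differ.

```python
def plan_distribution(total_records, servers, threshold=500):
    """Return a {server: record_count} plan for one Stage or Match job."""
    if not servers:
        raise ValueError("at least one Cleanse Match Server must be registered")
    if len(servers) > 1 and total_records > threshold:
        # Assumed policy: split as evenly as possible across all servers.
        base, extra = divmod(total_records, len(servers))
        return {name: base + (1 if i < extra else 0) for i, name in enumerate(servers)}
    # Otherwise the whole job runs on a single server.
    return {servers[0]: total_records}


print(plan_distribution(1000, ["host-a", "host-b"]))
print(plan_distribution(500, ["host-a", "host-b"]))
```

Note that a job of exactly 500 records is not distributed, because the condition is "more than 500".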

Starting the Cleanse Match Server Tool


To view Cleanse Match Server information (including name, port, server type, and
whether the server is on- or off-line):
• In the Hub Console, expand the Model workbench and then click Cleanse Match
Server.

The Cleanse Match Server tool displays a list of any configured Cleanse Match Servers.

Cleanse Match Server Properties


When configuring Cleanse Match Servers, you can specify the following settings.

Property Description
Server Host or machine name of the application server on which you
deployed Siperian Hub Cleanse Match Server.
Port HTTP port of the application server on which you deployed the
Cleanse Match Server.
Cleanse Server Determines whether to use the Cleanse Match Server for cleansing
data.
• Select (check) this check box to use the Cleanse Match Server for
cleansing data.
• Clear (uncheck) this check box if you do not want to use the
Cleanse Match Server for cleansing data.
If an ORS has multiple associated Cleanse Match Servers, you can
enhance performance by configuring each Cleanse Match Server as
either a match-only or a cleanse-only server. Use this option in
conjunction with the Match Server check box to implement this
configuration.
Cleanse Mode Mode that the Cleanse Match Server uses for cleansing data. For
details, see “Modes of Cleanse Operations” on page 407.
Match Server Determines whether to use the Match Server for matching data.
• Check (select) this check box to use the Match Server for
matching data.
• Uncheck (clear) this check box if you do not want to use the
Match Server for matching data.
If an ORS has multiple associated Cleanse Match Servers, you can
enhance performance by configuring each Cleanse Match Server as
either a match-only or a cleanse-only server. Use this option in
conjunction with the Cleanse Server check box to implement
this configuration.
Match Mode Mode that the Match Server uses for matching data. One of the
following values:
For details, see “Cleanse Requests” on page 408.
Offline Determines whether the Cleanse Match Server is offline or online.
• Select (check) this check box to take the Cleanse Match Server
offline, making it temporarily unavailable. Once offline, no
cleanse jobs are sent to that Cleanse Match Server (servlet).
• Clear (uncheck) this check box to make an offline Cleanse Match
Server available again so that Siperian Hub can once again send
cleanse jobs to that Cleanse Match Server.
Note: Siperian Hub looks at this field but does not set it. Taking a
Cleanse Match Server offline is an administrative action.
Thread Count Overrides the default thread count. The default, recommended value
is 1 thread. Thread counts are defined in the Siperian Hub Console
and can be changed without having to restart the server.
Note: You must change this value after migration from an earlier hub
version or all values will default to 1 thread.
CPU Rating Specifies a relative CPU performance rating for the host machine on
which this Cleanse Match Server runs. This rating is relevant only in
relation to CPU ratings for other host machines on which Cleanse
Match Servers are also running.

Adding a New Cleanse Match Server


To add a new Cleanse Match Server:
1. Start the Cleanse Match Server tool. To learn more, see “Starting the Cleanse
Match Server Tool” on page 409.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. In the right pane of the Cleanse Match Server tool, click the button to add a
new Cleanse Match Server.
The Cleanse Match Server tool displays the Add/Edit Match Cleanse Server dialog.
4. Set the properties for this new Cleanse Match Server. To learn more, see “Cleanse
Match Server Properties” on page 410.
If proxy users have been configured for your Siperian Hub implementation, see
“Cleanse Match Servers and Proxy Users” on page 408.
5. Click OK.
6. Click the Save button to save your changes.

Editing Cleanse Match Server Properties


To edit Cleanse Match Server properties:
1. Start the Cleanse Match Server tool. To learn more, see “Starting the Cleanse
Match Server Tool” on page 409.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Select the Cleanse Match Server that you want to configure.
4. Click the button.
The Cleanse Match Server tool displays the Add/Edit Match Cleanse Server dialog
for the selected Cleanse Match Server.

5. Change the properties you want for this Cleanse Match Server. To learn more, see
“Cleanse Match Server Properties” on page 410.
If proxy users have been configured for your Siperian Hub implementation, see
“Cleanse Match Servers and Proxy Users” on page 408.
6. Click OK to apply your changes.
7. Click the Save button to save your changes.

Deleting a Cleanse Match Server


To delete a Cleanse Match Server:
1. Start the Cleanse Match Server tool. To learn more, see “Starting the Cleanse
Match Server Tool” on page 409.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the Cleanse Match Server that you want to delete.
4. Click the button.
5. The Cleanse Match Server tool prompts you to confirm deletion. Click OK to
delete the server.

Testing the Cleanse Match Server Configuration


Whenever you add or change your Cleanse Match Server information, it is
recommended that you check the configuration to make sure that the connection
works properly.

To test the Cleanse Match Server configuration:


1. Start the Cleanse Match Server tool. To learn more, see “Starting the Cleanse
Match Server Tool” on page 409.
2. Select the Cleanse Match Server that you want to test.
3. Click the button to test the configuration.
If the test succeeds, the Cleanse Match Server tool displays a window showing the
connection information and a success message.

If there was a problem, Siperian Hub will display a window with information about
the connection problem.
4. Click OK.

Using Cleanse Functions


This section describes how to use cleanse functions to clean data in your Siperian Hub
implementation. To learn more, see “About Data Cleansing in Siperian Hub” on page
406.

About Cleanse Functions


In Siperian Hub, you can build and execute cleanse functions that cleanse data. A cleanse
function is a function that is applied to a data value in a record to standardize or verify it.
For example, if your data has a column for salutation, you could use a cleanse function
to standardize all instances of “Doctor” to “Dr.” You can apply cleanse functions
successively, or simply assign the output value to a column in the staging table.
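
The salutation example, and the idea of applying cleanse functions successively, can be sketched in Python as follows. These are not Siperian Hub cleanse functions: trim and standardize_salutation are hypothetical helpers written for this illustration.

```python
import re


def trim(value):
    """Hypothetical cleanse step: remove leading and trailing whitespace."""
    return value.strip()


def standardize_salutation(value):
    """Hypothetical cleanse step: replace the word "Doctor" with "Dr."."""
    return re.sub(r"\bdoctor\b", "Dr.", value, flags=re.IGNORECASE)


# Applying the functions successively: the output of one feeds the next,
# and the final output would be assigned to a staging table column.
raw_value = "  DOCTOR  "
cleansed = standardize_salutation(trim(raw_value))
print(cleansed)
```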

Types of Cleanse Functions

In Siperian Hub, each cleanse function is one of the following types:


• a Siperian Hub-defined function
• a function defined by your cleanse engine
• a custom cleanse function you define

The pre-defined functions provide access to specialized cleansing functionality, such as
name and address standardization, address decomposition, gender determination, and
so on. To learn more, see “Using Cleanse Functions” on page 414.

Libraries

Functions are organized into libraries—Java libraries and user libraries, which are folders
used to organize the functions that you can use in the Cleanse Functions tool in the
Model workbench. To learn more, see “Configuring Cleanse Libraries” on page 418.

Cleanse Functions are Secure Resources

Cleanse functions can be configured as secure resources and made SECURE or
PRIVATE. To learn more, see “Securing Siperian Hub Resources” on page 841.

Available Functions Subject to Cleanse Engine

The functions you see in the Hub Console depend on the cleanse engine that you are
using. Siperian Hub shows the cleanse functions that your cleanse engine makes
available. Regardless of which cleanse engine you use, the overall process of data
cleansing in Siperian Hub is the same.

Starting the Cleanse Functions Tool


The Cleanse Functions tool provides the interface for defining how you cleanse your
data.

To start the Cleanse Functions tool:


• In the Hub Console, expand the Model workbench and then click Cleanse
Functions.

The Cleanse Functions tool is divided into two panes:

Pane             Description
Navigation pane  Shows the cleanse functions in a tree view. Clicking on any node in the
                 tree shows you the appropriate properties page in the right-hand pane.
Properties pane  Shows the properties for the selected function. For any of the custom
                 cleanse functions, you can edit properties in the right-hand pane.

The functions you see in the left pane depend on the cleanse engine you are using.
Your functions may differ from the ones shown in the previous figure.

Cleanse Function Types

Cleanse functions are grouped in the tree according to their type. Cleanse function
types are high-level categories that are used to group similar cleanse functions for
easier management and access.

Cleanse Function Properties

If you expand the list of cleanse function types in the navigation pane, you can select a
cleanse function to display its particular properties.

In addition to specific cleanse functions, the Misc Functions include Read Database
and Reject functions that provide efficiencies in data management.

Field          Description
Read Database  Allows a map to look up records directly from a database table.
               Note: This function is designed to be used when there are many
               references to the same limited number of data items.
Reject         Allows the creator of a map to identify incorrect data and reject the
               record, noting the reason.

Overview of Configuring Cleanse Functions


To define cleanse functions, you complete the following tasks:
1. Start the Cleanse Functions tool according to the instructions in “Starting the
Cleanse Functions Tool” on page 415.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click Refresh to refresh your cleanse library.
4. Create your own cleanse library, which is simply a folder where you keep your
custom cleanse functions. See “Configuring Cleanse Libraries” on page 418.
5. Define regular expression functions in the new library, if applicable. See
“Configuring Regular Expression Functions” on page 422.
6. Define graph functions in the new library, if applicable. See “Configuring Graph
Functions” on page 424.

7. Add cleanse functions to your graph function. See “Adding Functions to a Graph
Function” on page 427.
8. Test your functions. See “Testing Functions” on page 437.

Configuring Cleanse Libraries


You can configure either user libraries or Java libraries.

Configuring User Libraries

You can add a User Library when you want to create a customized cleanse function
from existing internal or external Siperian cleanse functions.

To add a user cleanse library:


1. Start the Cleanse Functions tool according to the instructions in “Starting the
Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click Refresh to refresh your cleanse library.
4. In the tree, select the Cleanse Functions node.
5. Right-click and choose Add User Library from the pop-up menu.
The Cleanse Functions tool displays the Add User Library dialog.

6. Specify the following properties:

Field        Description
Name         Unique, descriptive name for this library.
Description  Optional description of this library.

7. Click OK.
The Cleanse Functions tool displays the new library you added in the list under
Cleanse libraries in the navigation pane.

Configuring Java Libraries

To add a Java cleanse library:


1. Start the Cleanse Functions tool according to the instructions in “Starting the
Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click Refresh to refresh your cleanse library.
4. In the tree, select the Cleanse Functions node.
5. Right-click and choose Add Java Library from the pop-up menu.
The Cleanse Functions tool displays the Add Java Library dialog.

6. Specify the JAR file for this library. You can click the Browse button to look for
the JAR file.

7. Specify the following properties:

Field        Description
Name         Unique, descriptive name for this library.
Description  Optional description of this library.

8. If applicable, click the Parameters button to specify any parameters for this
library.
The Cleanse Functions tool displays the parameters dialog.

You can add as many parameters as needed for this library.


• To add a parameter, click the button. The Cleanse Functions tool displays
the Add Value dialog.

Type a name and value, and then click OK.

• To import parameters, click the button. The Cleanse Functions tool
displays the Open dialog, prompting you to select a properties file containing
the parameter(s) you want.

The name-value pairs that are imported from the file will be available to the
user-defined Java function at run time as elements of its Java properties. This
allows you to provide customized values in a generic function, such as “userid”
or “target URL”.
9. Click OK.
The Cleanse Functions tool displays the new library in the list under Cleanse
libraries in the navigation pane.
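
The name-value pairs imported in step 8 come from a standard Java properties file. As a rough illustration only, the following sketch uses the standard java.util.Properties class to parse such a file and read two values. The class name, the key spelling targetUrl, the timeout key, and all sample values are hypothetical. How Siperian Hub actually passes the properties to a user-defined Java function is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration only; this class is not part of Siperian Hub.
class LibraryParametersSketch {

    // Parses the text of a properties file (name=value lines) into Properties.
    static Properties parse(String fileContents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Made-up file contents; the names and values are assumptions.
        String fileContents = "userid=cleanse_user\n"
                + "targetUrl=http://example.com/service\n";
        Properties props = parse(fileContents);

        System.out.println(props.getProperty("userid"));
        System.out.println(props.getProperty("targetUrl"));
        // A fallback can be supplied for a name that was not imported.
        System.out.println(props.getProperty("timeout", "30"));
    }
}
```

Keeping such values in a properties file lets the same generic function be reused with different settings without recompiling the JAR file.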

To learn about adding graph functions to your library, see “Configuring Graph
Functions” on page 424.

Configuring Regular Expression Functions


This section describes how to configure regular expression functions for your Siperian
Hub implementation.

About Regular Expression Functions

In Siperian Hub, a regular expression function allows you to use regular expressions for
cleanse operations. Regular expressions are computational expressions that are used to
match and manipulate text data according to commonly-used syntactic conventions
and symbolic patterns. To learn more about regular expressions, including syntax and
patterns, refer to the Javadoc for java.util.regex.Pattern. Alternatively, to define a graph
function instead, see “Configuring Graph Functions” on page 424.
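
To make the idea of matching and manipulating text concrete, here is a minimal sketch that uses java.util.regex directly, outside Siperian Hub. The class name, the phone-number pattern, and the sample inputs are assumptions chosen for illustration. The sketch does not show how the Cleanse Functions tool evaluates the expressions you enter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this class is not part of Siperian Hub.
class RegexBasicsSketch {

    // Optional parentheses around the first group, optional separators between groups.
    private static final Pattern PHONE =
            Pattern.compile("\\(?(\\d{3})\\)?[-. ]?(\\d{3})[-. ]?(\\d{4})");

    // Returns the digits as NNN-NNN-NNNN, or null when the whole input does not match.
    static String standardizePhone(String raw) {
        Matcher m = PHONE.matcher(raw.trim());
        if (!m.matches()) {
            return null;
        }
        // Groups 1 to 3 are the parenthesized parts of the pattern.
        return m.group(1) + "-" + m.group(2) + "-" + m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(standardizePhone("(415) 555-0100"));
        System.out.println(standardizePhone("415.555.0100"));
        System.out.println(standardizePhone("not a phone number"));
    }
}
```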

Adding Regular Expression Functions

To add a regular expression function:


1. Start the Cleanse Functions tool according to the instructions in “Starting the
Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Right-click a User Library name and choose Add Regular Expression Function.
The Cleanse Functions tool displays the Add Regular Expression dialog.

4. Specify the following properties:

Field        Description
Name         Unique, descriptive name for this regular expression function.
Description  Optional description of this regular expression function.

5. Click OK.
The Cleanse Functions tool displays the new regular expression function under the
user library in the list in the left pane, with the properties in the right pane.

6. Click the Details tab.

7. If you want, specify an input or output expression by clicking the icon to edit
the field, entering a regular expression, and then clicking the icon to apply the
change.
8. Click the icon to save your changes.

Configuring Graph Functions


This section describes how to configure graph functions for your Siperian Hub
implementation.

About Graph Functions

In Siperian Hub, a graph function is a cleanse function that you can visualize and
configure graphically using the Cleanse Functions tool in the Hub Console. You can
add any pre-defined functions to a graph function. Alternatively, to define a regular
expression function, see “Configuring Regular Expression Functions” on page 422.

Inputs and Outputs

Graph functions have:


• one or more inputs (input parameters)
• one or more outputs (output parameters)

For each graph function, you must configure all required inputs and outputs. Inputs
and outputs have the following properties.

Field        Description
Name         Unique, descriptive name for this input or output.
Description  Optional description of this input or output.
Data Type    Data type. Must match exactly. One of the following values:
             • Boolean: accepts Boolean values only
             • Date: accepts date values only
             • Float: accepts float values only
             • Integer: accepts integer values only
             • String: accepts any data
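
One way to picture the "must match exactly" rule is as an equality check between the data types at the two ends of a link. The sketch below is an assumption made for illustration: the enum, the class, and the canConnect method are hypothetical names, and the Cleanse Functions tool may apply additional rules that are not described here.

```java
// Hypothetical illustration only; this class is not part of Siperian Hub.
class DataTypeMatchSketch {

    // The five data types listed in the table above.
    enum DataType { BOOLEAN, DATE, FLOAT, INTEGER, STRING }

    // Assumed rule: a link is valid only when both ends have the same data type.
    static boolean canConnect(DataType from, DataType to) {
        return from == to;
    }

    public static void main(String[] args) {
        // A Float value feeding a Float input is accepted.
        System.out.println(canConnect(DataType.FLOAT, DataType.FLOAT));
        // A Float value feeding an Integer input is rejected under this rule.
        System.out.println(canConnect(DataType.FLOAT, DataType.INTEGER));
    }
}
```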

Adding Graph Functions

To add a graph function:


1. Start the Cleanse Functions tool according to the instructions in “Starting the
Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Right-click on a User Library name and choose Add Graph Function.

The Cleanse Functions tool displays the Add Graph Function dialog.

4. Specify the following properties:

Field        Description
Name         Unique, descriptive name for this graph function.
Description  Optional description of this graph function.

5. Click OK.
The Cleanse Functions tool displays the new graph function under the library in
the list in the left pane, with the properties in the right pane:

This graph function is empty. To configure it and add functions, see “Adding
Functions to a Graph Function” on page 427.

Adding Functions to a Graph Function

You can add as many functions as you want to a graph function. The example in this
section shows adding only a single function.

If you already have graph functions defined, you can treat them just like any other
function in the cleanse libraries. This means that you can add a graph function inside
another graph function. This approach allows you to reuse functions.
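
Nesting one graph function inside another is comparable to function composition in code. The following sketch uses java.util.function.Function as an analogy only. The function names and the cleansing steps are hypothetical and do not correspond to actual Siperian Hub cleanse functions.

```java
import java.util.Locale;
import java.util.function.Function;

// Hypothetical illustration only; this class is not part of Siperian Hub.
class NestedFunctionSketch {

    // Two simple steps standing in for pre-defined cleanse functions.
    static final Function<String, String> TRIM = String::trim;
    static final Function<String, String> UPPER = s -> s.toUpperCase(Locale.ROOT);

    // An "inner" function assembled from the two steps.
    static final Function<String, String> NORMALIZE = TRIM.andThen(UPPER);

    // An "outer" function that reuses NORMALIZE as one of its own steps.
    static final Function<String, String> NORMALIZE_WITH_TITLE =
            NORMALIZE.andThen(s -> "DR. " + s);

    public static void main(String[] args) {
        System.out.println(NORMALIZE.apply("  smith "));
        System.out.println(NORMALIZE_WITH_TITLE.apply("  smith "));
    }
}
```

Because the outer function refers to the inner one rather than copying its steps, a change to the inner function is picked up everywhere it is reused.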

To add functions to a graph function:


1. Start the Cleanse Functions tool according to the instructions in “Starting the
Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click your graph function, and then click the Details tab to see the function
represented in graphical format.

The area in this tab is referred to as the workspace. You might need to resize the
window to see both the input and output on the workspace.

By default, graph functions have one input and one output that are of type string
(gray circle). The function that you are defining might require more inputs and/or
outputs and different data types. To learn more, see “Configuring Inputs” on page
434 and “Configuring Outputs” on page 435.
4. Right-click on the workspace and choose Add Function from the pop-up menu.
For more on the other commands on this pop-up menu, see “Workspace
Commands” on page 432. You can also add or delete these functions using the
toolbar buttons.
The Cleanse Functions tool displays the Choose Function to Add dialog.

5. Expand the folder containing the function you want to add, select the function to
add, and then click OK.
Note: The functions that are available for you to add depend on your cleanse
engine and its configuration. Therefore, the functions that you see might differ
from the cleanse functions shown in the previous figure.
The Cleanse Functions tool displays the added function in your workspace.

Note: Although this example shows a single function on the workspace, you
can add multiple functions to a graph function.
To move a function, click it and drag it wherever you need it on the workspace.

6. Right-click on the function and choose Expanded Mode.

The expanded mode shows the labels for all available inputs and outputs for this
function.

For more on the modes, see “Function Modes” on page 432.


The color of the circle indicates the data type of the input or output. The data
types must match. In the following example, for the Round function, the input is a
Float value and the output is an Integer. Therefore, the Inputs and Outputs have
been changed to reflect the corresponding data types.

To learn more, see “Configuring Inputs” on page 434 and “Configuring Outputs”
on page 435.

7. Mouse-over the input connector, which is the little circle on the right side of the
input box. It turns red when ready for use.

8. Click the node and draw a line to one of the function input nodes.

9. Draw a line from one of the function output nodes to the output box node.

10. Click the button to save your changes. To learn about testing your new
function, see “Testing Functions” on page 437.

Workspace Commands

There are several ways to complete common tasks on the workspace.


• One way is to use the buttons on the toolbar. To learn more about these buttons,
see “Workspace Buttons” on page 433.
• Another method to access many of the same features is to right-click on the
workspace. The right-click menu has the following commands:

Function Modes

Function modes determine how the function is displayed on the workspace. Each
function has the following modes, which are accessible by right-clicking the function:

Option           Description
Compact          Displays the function as a small box, with just the function name.
Standard         Displays the function as a larger box, with the name and the nodes for the
                 input and output, but the nodes are not labeled. This is the default mode.
Expanded         Displays the function as a large box, with the name, the input and output
                 nodes, and the names of those nodes.
Logging Enabled  Used for debugging. Choosing this option generates a log file for this
                 function when you run a Stage job (see “Stage Jobs” on page 745). The log
                 file records the input and output each time the function is called
                 during the stage job. A new log file is created for each stage job.
                 The log file is named <jobID><graph function name>.log and is stored
                 in:
                 \Siperian\hub\cleanse\tmp\<ORS>
                 Note: Do not use this option in production, because it consumes disk space
                 and incurs performance overhead from the disk I/O. To disable
                 this logging, right-click on the function and uncheck Enable Logging.
Delete Object    Deletes the function from the graph function.

You can cycle through the display modes (compact, standard, and expanded) by
double-clicking on the function.

Workspace Buttons

The toolbar on the right side of the workspace provides buttons for the following
actions:

• Save changes.
• Edit the function inputs.
• Edit the function outputs.
• Add a function. To learn more, see “Adding Functions to a Graph Function” on
page 427.
• Add a constant. To learn more, see “Using Constants” on page 433.
• Add a conditional execution component. To learn more, see “Using Conditions in
Cleanse Functions” on page 438.
• Edit the selected component.
• Delete the selected component.
• Expand the graph. This makes more room for the workspace on the screen by
hiding the left pane.

Using Constants

Constants are useful in cases where you know that you have standardized input.
For example, if you have a data set that you know consists entirely of doctors, then you
can use a constant to put Dr. in the title. When you use constants in your graph
function, they are differentiated visually from other functions by their grey background
color.

Configuring Inputs

To add more inputs:


1. Start the Cleanse Functions tool according to the instructions in “Starting the
Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the cleanse function that you want to configure.
4. Click the Details tab.
5. Right-click on the input and choose Edit inputs.
The Cleanse Functions tool displays the Inputs dialog.

Note: Once you create an input, you cannot later edit the input to change its type.
If you must change the type of an input, create a new one of the correct type and
delete the old one.
6. Click the button to add another input.

The Cleanse Functions tool displays the Add Parameter dialog.

7. Specify the following properties:

Field        Description
Name         Unique, descriptive name for this parameter.
Data Type    Data type of this parameter.
Description  Optional description of this parameter.

8. Click OK.
Add as many inputs as you need for your functions.

Configuring Outputs

To add more outputs:


1. Start the Cleanse Functions tool according to the instructions in “Starting the
Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the cleanse function that you want to configure.
4. Click the Details tab.
5. Right-click on the output and choose Edit outputs.

The Cleanse Functions tool displays the Outputs dialog.

Note: Once you create an output, you cannot later edit the output to change its
type. If you must change the type of an output, create a new one of the correct
type and delete the old one.
6. Click the button to add another output.
The Cleanse Functions tool displays the Add Parameter dialog.

Field        Description
Name         Unique, descriptive name for this parameter.
Data Type    Data type of this parameter.
Description  Optional description of this parameter.

7. Click OK.
Add as many outputs as you need for your functions.

Testing Functions
Once you have added and configured a graph or regular expression function, it is
recommended that you test it to make sure it is behaving as expected. This test process
mimics a single record coming into the function.

To test your function:


1. Start the Cleanse Functions tool according to the instructions in “Starting the
Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the cleanse function that you want to test.
4. Click the Test tab.
The Cleanse Functions tool displays the test screen.

5. For each input, specify the value that you want to test by clicking the cell in the
Value column and typing a value that complies with the data type of the input.
• For Boolean inputs, the Cleanse Functions tool displays a true/false
drop-down list.

• For Calendar inputs, the Cleanse Functions tool displays a Calendar button
that you can click to select a date from the Date dialog.

6. Click Test.
If the test completed successfully, the output is displayed in the output section.

Using Conditions in Cleanse Functions


This section describes how to add conditions to graph functions.

About Conditional Execution Components

Conditional execution components are similar to the construct of a case (or switch)
statement in a programming language. The cleanse function evaluates the condition
and, based on this evaluation, applies the appropriate graph function associated with
the case that matches the condition. If no case matches the condition, then the default
case is used—the case flagged with an asterisk (*).

When to Use Conditional Execution Components

Conditional execution components are useful when, for example, you have segmented
data. Suppose a table has several distinct groups of data (such as customers and
prospects). You could create a column that indicates the group of which the record is a
member. Each group is called a segment. In this example, customers might have C in
this column, while prospects would have P. You could use a conditional execution
component to cleanse the data differently for each segment. If the conditional value
does not meet any of the conditions you specify, then the default case will be executed.
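
Because a conditional execution component behaves like a case (or switch) statement, the customer and prospect example can be sketched in code as follows. This is an analogy only. The class name, the method, and the cleansing applied to each segment are hypothetical; in Siperian Hub you build the equivalent logic graphically.

```java
import java.util.Locale;

// Hypothetical illustration only; this class is not part of Siperian Hub.
class ConditionalExecutionSketch {

    // Applies a different (made-up) cleansing rule depending on the segment value.
    static String cleanseName(String segment, String name) {
        // Treat a missing segment like any other unmatched value.
        String key = (segment == null) ? "" : segment;
        switch (key) {
            case "C":
                // Customers: trim and convert to upper case.
                return name.trim().toUpperCase(Locale.ROOT);
            case "P":
                // Prospects: trim only.
                return name.trim();
            default:
                // Default case (*): used when no other case matches.
                return name;
        }
    }

    public static void main(String[] args) {
        System.out.println(cleanseName("C", " acme "));
        System.out.println(cleanseName("P", " acme "));
        System.out.println(cleanseName("X", " acme "));
    }
}
```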

Adding Conditional Execution Components

To add a conditional execution component:


1. Start the Cleanse Functions tool according to the instructions in “Starting the
Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the cleanse function that you want to configure.
4. Right-click on the workspace and choose Add Condition.
The Cleanse Functions tool displays the Edit Condition dialog.

5. Click the button to add a value.


The Cleanse Functions tool displays the Add Value dialog.

6. Enter a value for the condition. Using the customer and prospect example, you
would enter C or P. Click OK.
The Cleanse Functions tool displays the new condition in the list of conditions on
the left, as well as in the input box.
Add as many conditions as you require. You do not need to specify a default
condition, because the default case is automatically created when you create a new
conditional execution component. However, you can specify the default case with
the asterisk (*). The default case will be executed for all cases that are not covered
by the cases you specify.
7. Add as many functions as you require to process all of the conditions. To learn
more, see “Adding Functions to a Graph Function” on page 427.
8. For each condition, including the default condition, draw a link from the
input node to the input of the function. In addition, draw links between the
outputs of the functions and the output of your cleanse function.

Note: You can specify nested processing logic in graph functions. For example, you
can nest conditional components within other conditional components (such as nested
case statements). In fact, you can define an entire complex process containing many
conditional tests, each one of which contains any level of complexity as well.

Configuring Cleanse Lists


This section describes how to configure cleanse lists in your Siperian Hub
implementation.

About Cleanse Lists


A cleanse list is a logical grouping of string functions that are executed at run time in a
predefined order.

Adding Cleanse Lists


To add a new cleanse list:
1. Start the Cleanse Functions tool according to the instructions in “Starting the
Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click Refresh to refresh your cleanse library. This step applies to external cleanse engines.
Important: You must choose Refresh after acquiring a write lock and before
processing any records. Otherwise, your external cleanse engine will throw an
error.
4. Right-click your cleanse library in the list under Cleanse Functions and choose
Add Cleanse List.
The Cleanse Functions tool displays the Add Cleanse List dialog.

5. Specify the following properties:

Field        Description
Name         Unique, descriptive name for this cleanse list.
Description  Optional description of this cleanse list.

6. Click OK.

The Cleanse Functions tool displays the details pane for the new (empty) cleanse
list on the right side of the screen.

Editing Cleanse List Properties


New cleanse lists are empty lists. You need to edit the cleanse list to add match and
output strings.

To edit your cleanse list to add match and output strings:


1. Start the Cleanse Functions tool according to the instructions in “Starting the
Cleanse Functions Tool” on page 415.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the cleanse list that you want to configure.

The Cleanse Functions tool displays information about the cleanse list in the right
pane.

4. Change the display name and description in the right pane, if you want, by clicking
the Edit button next to a value that you want to change.
5. Click the Details tab.

The Cleanse Functions tool displays the details for the cleanse list.

6. Click the button in the right hand pane.


The Cleanse Functions tool displays the Output String dialog.

7. Specify a search string, an output string, a match type, and click OK.
The search string is the input that you want to cleanse, resulting in the output
string.
Important: Siperian Hub will search through the strings in the order in which they
are entered. The order in which you specify the items can therefore affect the
results obtained. To learn more about the types of matches available, see “Types of
String Matches” on page 445.
Note: As soon as you add strings to a cleanse list, the cleanse list is saved.
The strings that you specified are shown in the Cleanse List Details section.

8. You can add and remove strings. You can also move strings forward or backward in
the cleanse list, which affects their order in the run-time execution sequence and,
therefore, the results obtained.
9. You can also specify the “Default value” for every input string that does not match
any of the search strings.
If you do not specify a default value, every input string that does not match a
search string is passed to the output string with no changes.
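
The effect of entry order and of the default value, described in the steps above, can be sketched as an ordered list of search and output pairs. The sketch makes two assumptions that the text does not confirm: the first matching entry wins, and each search string is treated as a regular expression that must match the whole input. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; this class is not part of Siperian Hub.
class CleanseListSketch {

    // One entry of the list: a search pattern and the output it produces.
    private static final class Rule {
        final Pattern search;
        final String output;

        Rule(Pattern search, String output) {
            this.search = search;
            this.output = output;
        }
    }

    private final List<Rule> rules = new ArrayList<>(); // keeps entry order
    private final String defaultValue;                  // null means "pass through"

    CleanseListSketch(String defaultValue) {
        this.defaultValue = defaultValue;
    }

    // Appends an entry; earlier entries are checked first.
    CleanseListSketch add(String searchRegex, String output) {
        rules.add(new Rule(Pattern.compile(searchRegex), output));
        return this;
    }

    // Returns the output of the first matching entry (assumed behavior).
    String apply(String input) {
        for (Rule rule : rules) {
            if (rule.search.matcher(input).matches()) {
                return rule.output;
            }
        }
        // No entry matched: use the default value, or pass the input through.
        return (defaultValue != null) ? defaultValue : input;
    }

    public static void main(String[] args) {
        CleanseListSketch list = new CleanseListSketch("UNKNOWN")
                .add("I.M.*", "IBM")
                .add(".*Corp", "Corporation");
        System.out.println(list.apply("IBM Corp"));  // matches both entries
        System.out.println(list.apply("Acme Corp")); // matches the second entry only
        System.out.println(list.apply("Acme"));      // matches nothing
    }
}
```

In this sketch the input "IBM Corp" matches both entries, so swapping the two add calls would change its result. That is the kind of order dependence the Important note in step 7 warns about.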

Types of String Matches

For the output string, you can specify one of the following match types:

Match Type          Description
Exact Match         Text string (for example, “IBM”).
Regular Expression  Pattern using the syntax for regular expressions (for example,
                    “I.M.*” would match “IBM”, “IBM Corp” and “IXM Inc.”). To
                    parse a name field that consists of first, middle, and last names,
                    you could use the regular expression (\S+$), which returns the
                    last name regardless of the name supplied.
                    The regular expression that is typed in as a parameter is
                    applied to the string, and the matched output is sent to
                    the outlet. You can also specify the group number to match an
                    inner group of the regular expression. Refer to the Javadoc for
                    java.util.regex.Pattern for documentation on regular
                    expression construction and how groups work.
SQL Match           Pattern using the syntax for the LIKE operator in SQL (for
                    example, “I_M%” would match “IBM”, “IBM Corp” and “IXM
                    Inc.”).
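
The three match types can be compared side by side in a short sketch. It is written against the standard java.util.regex classes and is not the Siperian Hub implementation. The class and method names are hypothetical, the regular expression and SQL patterns are assumed to match the whole input, and the LIKE translation handles only the % and _ wildcards (no escape character).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this class is not part of Siperian Hub.
class MatchTypesSketch {

    // Exact Match: the input must equal the search string.
    static boolean exactMatch(String search, String input) {
        return search.equals(input);
    }

    // Regular Expression: assumed here to require a match of the whole input.
    static boolean regexMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // SQL Match: translates LIKE wildcards (% and _) into a regular expression.
    static boolean sqlMatch(String like, String input) {
        StringBuilder regex = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                regex.append(".*");          // any run of characters
            } else if (c == '_') {
                regex.append('.');           // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL)
                .matcher(input).matches();
    }

    // Returns the text captured by the given group of the first match, or null.
    static String extractGroup(String regex, int group, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        System.out.println(exactMatch("IBM", "IBM"));
        System.out.println(regexMatch("I.M.*", "IXM Inc."));
        System.out.println(sqlMatch("I_M%", "IBM Corp"));
        // Group 1 of (\S+$) is the last run of non-space characters.
        System.out.println(extractGroup("(\\S+$)", 1, "John Q Public"));
    }
}
```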

Importing Match Strings

To import match strings from a source (such as a file or a database table):


1. Click the button in the right hand pane.

The Import Match Strings wizard opens.

2. Specify the connection properties for the source of the data and click Next.
The Cleanse Functions tool displays a list of tables available for import.

3. Select the table you want to import and click Next.

The Cleanse Functions tool displays a list of columns available for import.

4. Click the columns you want to import and click Next.


The Cleanse Functions tool displays a list of match strings available for import.

You can import the records of the sample data either as phrases (one entry for
each record) or as words (one entry for each word in each record). Choose whether
to import the match strings as words or phrases and then click Finish.

The Cleanse List Details box is now populated with data from the specified source.

Note: The imported match strings are not part of the match list. To add them to
the match list, you need to move them to the Search Strings on the right hand side.
• To add match strings to the match list with the match string value in both the
Search String and Output String, select the strings in the Match Strings list, and
click the button.
• If you add match strings to the match list with an Output String value that you
want to define, simply click the record you added and specify a new Search and
Output String.
• To add all Match Strings to the match list, click the button.
• To clear all Match Strings from the match list, click the button.
• Repeat these steps until you have constructed a complete match list.

5. When you have finished changing the match list properties, click the button
to save your changes.
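
The difference between importing records as phrases and as words, described in step 4 above, can be sketched as follows. The class and method names and the sample records are hypothetical. Splitting on whitespace is an assumption, and whether the wizard removes duplicate entries is not stated, so the sketch keeps them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this class is not part of Siperian Hub.
class ImportModeSketch {

    // Phrases: one entry for each non-empty record.
    static List<String> asPhrases(List<String> records) {
        List<String> entries = new ArrayList<>();
        for (String record : records) {
            String trimmed = record.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    // Words: one entry for each whitespace-separated word in each record.
    static List<String> asWords(List<String> records) {
        List<String> entries = new ArrayList<>();
        for (String record : records) {
            for (String word : record.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    entries.add(word);
                }
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("IBM Corp", "Acme Inc");
        System.out.println(asPhrases(records));
        System.out.println(asWords(records));
    }
}
```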

Importing Match Output Strings

To import match output strings from a source, such as a file or a database table:


1. Click the button in the right hand pane.
The Import Match Output Strings wizard opens.

2. Specify the connection properties for the source of the data.


3. Click Next.

The Cleanse Functions tool displays a list of tables available for import.

4. Select the table that you want to import.


5. Click Next.
The Cleanse Functions tool displays a list of columns available for import.

6. Select the columns that you want to import.


7. Click Next.

The Cleanse Functions tool displays a list of match strings available for import.

8. Click Finish.
The Cleanse List Details box is now populated with data from the specified source.
9. When you have finished changing the match list properties, click the button
to save your changes.

Chapter 13: Configuring the Load Process

This chapter explains how to configure the load process in your Siperian Hub
implementation. For an introduction, see “Load Process” on page 299.

Chapter Contents
• Before You Begin
• Configuration Tasks for Loading Data
• Configuring Trust for Source Systems
• Configuring Validation Rules

Before You Begin


Before you begin to configure the load process, you must have completed the following
tasks:
• Installed Siperian Hub and created the Hub Store according to the instructions in
the Siperian Hub Installation Guide for your platform
• Built the schema according to the instructions in Chapter 5, “Building the Schema”
• Defined source systems according to the instructions in “Configuring Source
Systems” on page 348
• Created landing tables according to the instructions in “Configuring Landing
Tables” on page 355
• Created staging tables according to the instructions in “Configuring Staging
Tables” on page 364
• Learned about the load process described in “Load Process” on page 299

Configuration Tasks for Loading Data


In addition to the prerequisites described in “Before You Begin” on page 454, to set up
the process of loading data in your Siperian Hub implementation, you must complete
the following tasks in the Hub Console:
• “Configuring Trust for Source Systems” on page 455
• “Configuring Validation Rules” on page 468

For additional configuration settings that can affect the load process, see:
• “Loading by RowID” on page 394
• “Distinct Systems” on page 595
• “Generate Match Tokens on Load” on page 104
• “Load Process” on page 299

Configuring Trust for Source Systems


This section describes how to configure trust in your Siperian Hub implementation.
For an introduction, see “Trust Settings” on page 303.

About Trust
Several source systems may contain attributes that correspond to the same column in a
base object table. For example, several systems may store a customer’s address.
However, one system might be a more reliable source for that data than others. If these
systems disagree, then Siperian Hub must decide which value is the best one to use.

To help with comparing the relative reliability of column data from different source
systems, Siperian Hub allows you to configure trust for a column. Trust is a designation
of the confidence in the relative accuracy of a particular piece of data. For each column
from each source, you can define a trust level represented by a number between 0 and
100, with zero being the least trustworthy and 100 being the most trustworthy. By
itself, this number has no meaning. It becomes meaningful only when compared with
another trust number to determine which is higher.

Trust takes into account the age of data, how much its reliability has decayed over time,
and the validity of the data. Trust is used to determine survivorship (when two records
are consolidated), and whether updates from a source system are sufficiently reliable to
update the master record.

Trust Levels

A trust level is a number between 0 and 100. By itself, this number has no meaning.
It has meaning only when compared with another trust number.

Data Reliability Decays Over Time

The reliability of data from a given source system can decay (diminish) over time. In
order to reflect this fact in trust calculations, Siperian Hub allows you to configure
decay characteristics for trust-enabled columns. The decay period is the amount of time
that it takes for the trust level to decay from the maximum trust level (see “Maximum
Trust” on page 459) to the minimum trust level (see “Minimum Trust” on page 459).
For more information, see “Units” on page 459, “Decay” on page 459, and “Graph
Type” on page 460.

Trust Calculations

The load process calculates trust for trust-enabled columns in the base object. For
records with trust-enabled columns, the load process assigns a trust score to cell data.
This trust score is initially based on the configured trust settings for that column.
The trust score may be subsequently downgraded when the load process applies
validation rules—if configured for a trust-enabled column—after the trust calculations.
For more information, see “Run-time Execution Flow of the Load Process” on page
304.

Trust Calculations for Load Update Operations

During the load process, if a record in the staging table will be used for a load update
operation, and if that record contains a changed cell value in a trust-enabled column,
the load process calculates trust scores for:
• the cell data in the source record in the staging table (which contains the updated
information)
• the cell data in the target record in the base object (which contains the existing
information)

If the cell data in the source record has a higher trust score than the cell data in the
target record, then Siperian Hub updates the cell in the base object record with the cell
data in the staging table record.
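
The comparison described above can be sketched in Python. This is a minimal illustration, not Siperian Hub code: the Cell class and the apply_load_update function are hypothetical names, the trust scores are assumed to have been calculated already, and keeping the existing value when the two scores are equal is an assumption, because the text covers only the case where the source score is higher.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """A cell value together with its calculated trust score (0 to 100)."""
    value: str
    trust: float


def apply_load_update(source: Cell, target: Cell) -> Cell:
    """Return the cell that the base object should hold after a load update."""
    # The staging (source) value replaces the base object (target) value
    # only when its trust score is strictly higher.
    if source.trust > target.trust:
        return source
    # Assumption: on a lower or equal score, the existing value is kept.
    return target


staged = Cell("555-4321", 80.0)
existing = Cell("555-1234", 65.0)
print(apply_load_update(staged, existing).value)
```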

Trust Calculations When Consolidating Two Base Object Records

When two records in a base object are consolidated, Siperian Hub calculates the trust
score for each trusted column in the two records being merged. Cells with the highest
trust scores survive in the final consolidated record. If the trust scores are the same,
then Siperian Hub compares records according to an order of precedence, as described
in “Survivorship and Order of Precedence” on page 291.
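
As a rough sketch of this survivorship step, the Python function below picks, column by column, the value with the higher trust score. The consolidate function and the dictionary layout are hypothetical, both records are assumed to contain the same trust-enabled columns, and the order of precedence used for equal scores is not modeled: tied columns are only reported so that they can be resolved separately.

```python
def consolidate(record_a, record_b):
    """Merge two records given as {column: (value, trust_score)} dictionaries.

    Returns the surviving values and the list of columns whose scores tied.
    """
    merged = {}
    ties = []
    for column in record_a:
        value_a, trust_a = record_a[column]
        value_b, trust_b = record_b[column]
        if trust_a > trust_b:
            merged[column] = value_a
        elif trust_b > trust_a:
            merged[column] = value_b
        else:
            # Equal scores: the guide defers to an order of precedence,
            # which this sketch does not implement. record_a's value is
            # kept only as a placeholder.
            merged[column] = value_a
            ties.append(column)
    return merged, ties


a = {"PHONE": ("555-1234", 70), "CITY": ("Foster City", 40)}
b = {"PHONE": ("555-4321", 55), "CITY": ("San Mateo", 40)}
print(consolidate(a, b))
```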

Control Tables for Trust-Enabled Columns

The following figure shows control tables associated with trust-enabled columns in a
base object.

For each trust-enabled column in a base object record, Siperian Hub maintains a record
in a corresponding control table that contains the last update date and an identifier of
the source system. Based on these settings, Siperian Hub can always calculate the
current trust for the column value.

If history is enabled for a base object, Siperian Hub also maintains a separate history
table for the control table, in addition to history tables for the base object and its
cross-reference table.

Cell Values in Base Object Records and Cross-Reference Records

The cross-reference table for a base object contains the most recent value from each
source system. By default (without trust settings), the base object contains the most
recent value no matter which source system it comes from.

For trust-enabled columns, the cell value in a base object record might not have the
same value as its corresponding record in the cross-reference table. Validation rules,
which are run during the load process after trust calculations, can downgrade trust for
a cell so that a source that had previously provided the cell value might not update the
cell. For more information about validation rules, see “Configuring Validation Rules”
on page 468.

Overriding Trust Scores

Data stewards can manually override a calculated trust setting if they have direct
knowledge that a particular value is correct. Data stewards can also enter a value
directly into a record in a base object. For more information, see the Siperian Hub Data
Steward Guide.

Trust for State-Enabled Base Objects

For state-enabled base objects, trust is calculated for records with a PENDING or
ACTIVE state, but records with a DELETE state are ignored. For more information,
see Chapter 7, “State Management.”

Batch Job Constraints on Number of Trust-Enabled Columns

Synchronize batch jobs can fail for base objects with a large number of trust-enabled
columns. Similarly, Automerge jobs can fail if there is a large number of trust-enabled
or validation-enabled columns. The exact number of columns that cause the job to fail
is variable and is based on the length of the column names and the number of
trust-enabled columns (or, for Automerge jobs, validation-enabled columns as well).
Long column names are those at, or close to, the maximum allowable length of 26
characters. To avoid this problem, keep the number of trust-enabled columns below 60,
keep the column names short, or both. A workaround is to enable all trust and
validation columns before saving the base object, which avoids running the
synchronization job.
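
The guideline above can be turned into a simple pre-check. The sketch below is illustrative only: review_trust_columns is a hypothetical helper, the values 26 and 60 come from the text, and the margin used to decide that a name is "close to" the maximum length is an assumption. Passing the check does not guarantee that a batch job will succeed, because the guide states that the exact failure point varies.

```python
MAX_COLUMN_NAME_LENGTH = 26   # maximum allowable length stated in the guide
TRUST_COLUMN_GUIDELINE = 60   # the guide recommends staying below this count
NEAR_MAX_MARGIN = 2           # assumption: how near counts as "close to" the maximum


def review_trust_columns(column_names):
    """Return a list of warnings for a base object's trust-enabled columns."""
    warnings = []
    if len(column_names) >= TRUST_COLUMN_GUIDELINE:
        warnings.append(
            f"{len(column_names)} trust-enabled columns; keep the count below "
            f"{TRUST_COLUMN_GUIDELINE}"
        )
    long_names = [
        name for name in column_names
        if len(name) >= MAX_COLUMN_NAME_LENGTH - NEAR_MAX_MARGIN
    ]
    if long_names:
        warnings.append("long column names: " + ", ".join(long_names))
    return warnings


print(review_trust_columns(["PHONE", "CUSTOMER_PRIMARY_ADDRESS_1"]))
```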

Trust Properties
This section describes the trust properties that you can configure for trust-enabled
columns. Trust properties are configured separately for each source system that could
provide records for trust-enabled columns in a base object.

Maximum Trust

The maximum trust (starting trust) is the trust level that a data value will have if it has
just been changed. For example, if source system X changes a phone number field
from 555-1234 to 555-4321, the new value will be given system X’s maximum trust
level for the phone number field. By setting the maximum trust level relatively high,
you can ensure that changes in the source systems will usually be applied to the base
object.

Minimum Trust

The minimum trust is the trust level that a data value will have when it is old (after the
decay period has elapsed). This value must be less than or equal to the maximum trust.

Note: If the maximum and minimum trust are equal, then the decay curve is a flat line
and the decay period and decay type have no effect.

Units

Specifies the units used in calculating the decay period—day, week, month, quarter, or
year.

Decay

Specifies the number (of days, weeks, months, quarters, or years) used in calculating the
decay period.

Note: For the best graph view, limit the decay period you specify to between 1 and
100.
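
The Units and Decay properties together define the decay period. As a hedged sketch, the Python helper below multiplies the Decay number by an approximate length for each unit. The decay_period function and the DAYS_PER_UNIT table are hypothetical, and the day counts for month, quarter, and year are rough approximations chosen for illustration; the guide does not state how Siperian Hub converts these units internally.

```python
from datetime import timedelta

# Approximate unit lengths in days. These are assumptions for illustration,
# not values documented for Siperian Hub.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "quarter": 91, "year": 365}


def decay_period(decay, units):
    """Return the decay period for a Decay number and a Units setting."""
    if units not in DAYS_PER_UNIT:
        raise ValueError(f"unknown units: {units}")
    return timedelta(days=decay * DAYS_PER_UNIT[units])


print(decay_period(2, "week"))
```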

Graph Type

Decay follows a pattern in which the trust level decreases during the decay period. The
graph type determines this decay pattern and can have any of the following settings.

Graph Type                        Description
Linear                            Simplest decay. Decay follows a straight line from the
                                  maximum trust to the minimum trust.
Rapid Initial Slow Later (RISL)   Most of the decrease occurs toward the beginning of the
                                  decay period. Decay follows a concave curve. If a source
                                  system has this graph type, then a new value from the
                                  system will probably be trusted, but this value will soon
                                  become much more likely to be overridden.
Slow Initial Rapid Later (SIRL)   Most of the decrease occurs toward the end of the decay
                                  period. Decay follows a convex curve. If a source system
                                  has this graph type, it will be relatively unlikely for any
                                  other system to override the value that it sets until the
                                  value is near the end of its decay period.
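
To make the three graph types concrete, the Python sketch below computes a decayed trust level from the maximum trust, the minimum trust, and the elapsed share of the decay period. The Linear case follows the straight-line description directly. The RISL and SIRL cases use a square root and a square as stand-in curves with the right general shape (fast early drop and fast late drop, respectively); these formulas are assumptions for illustration, and the guide does not document the curves that Siperian Hub actually uses. The function name decayed_trust is hypothetical.

```python
def decayed_trust(max_trust, min_trust, elapsed, decay_period, graph_type="LINEAR"):
    """Return the trust level after `elapsed` units of a `decay_period`."""
    if decay_period <= 0:
        raise ValueError("decay_period must be positive")
    if elapsed <= 0:
        return float(max_trust)   # a value that has just changed
    if elapsed >= decay_period:
        return float(min_trust)   # the decay period has fully elapsed

    fraction = elapsed / decay_period  # share of the decay period used up
    if graph_type == "LINEAR":
        dropped = fraction
    elif graph_type == "RISL":
        dropped = fraction ** 0.5      # assumed shape: most of the drop comes early
    elif graph_type == "SIRL":
        dropped = fraction ** 2        # assumed shape: most of the drop comes late
    else:
        raise ValueError(f"unknown graph type: {graph_type}")
    return max_trust - (max_trust - min_trust) * dropped


for graph_type in ("LINEAR", "RISL", "SIRL"):
    print(graph_type, decayed_trust(90, 30, 3, 12, graph_type))
```

When the maximum and minimum trust are equal, every branch returns the same value, which matches the note that the decay curve is then a flat line.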

Test Offset Date

By default, the start date for trust decay shown in the Trust Decay Graph is the current
system date. To see the impact of trust decay based on a different start date for a given
source system, specify a different test offset date according to the instructions in
“Changing the Offset Date for a Trust-Enabled Column” on page 466.

Considerations for Setting Trust Values


Choosing the correct trust values can be a complex process. It is not enough to
consider one system in isolation. You must ensure that the combinations of trust
settings for all of the source systems that contribute to a particular column produce the
behavior that you want. Trust levels for a source system are not absolute; they are
meaningful only in relation to the trust levels of other source systems that contribute
data for the trust-enabled column.

When determining trust, consider the following questions.


• Does the source system validate this data value? How reliably does it do this?
• How important is this data value to the users of the source system, as compared
with other data values? Users are likely to put the most effort into validating the
data that is central to their work.
• How frequently is the source system updated?
• How frequently is a particular attribute likely to be updated?

Enabling Trust for a Column


Trust is enabled and configured on a per-column basis for base objects in the Schema
Manager. Trust does not apply to columns in dependent objects or any other tables in
an ORS. For more information, see “Configuring Columns in Tables” on page 125.

Trust is disabled by default. When trust is disabled, Siperian Hub uses the value from
the most recently-executed load process regardless of which source system it comes
from. If column data for a base object comes from only one system, then trust should
remain disabled for that column.

Trust should be enabled, however, for columns in which data can come from multiple
source systems. If you enable trust for a column, you also assign trust levels to specify
the relative reliability of any source systems that could provide records that update the
column.

Assigning Trust Levels to Trust-Enabled Columns


This section describes how to configure trust levels for trust-enabled columns.
Assigning Trust Levels to the Admin Source System

Before You Configure Trust for Trust-Enabled Columns

Before you configure trust for trust-enabled columns, you must have:
• enabled trust for base object columns according to the instructions in “Enabling
Trust for a Column” on page 461
• configured staging tables in the Schema Manager, including associated source
systems and staging table columns that correspond to base object columns,
according to the instructions in “Configuring Staging Tables” on page 364

Specifying Trust for the Administration Source System

At a minimum, you must specify trust settings for trust-enabled columns in the
administration source system (called Admin by default). This source system represents
manual updates that you make within Siperian Hub. This source system can contribute
data to any trust-enabled column. Set the trust settings for this source system to high
values (relative to other source systems) to ensure that manual updates override any
existing values from other source systems. For more information, see “Administration
Source System” on page 349.

Assigning Trust Levels to Trust-Enabled Columns in a Base Object

To assign trust levels to trust-enabled columns in a base object:

1. Start the Systems and Trust tool according to the instructions in “Starting the
Systems and Trust Tool” on page 350.

2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the navigation pane, expand the Trust node.
The Systems and Trust tool displays all base objects with trust-enabled columns.

4. Select a base object.

The Systems and Trust tool displays a read-only view of the trust-enabled columns
in the selected base object, indicating with a check mark whether a given source
system supplies data for that column.

Note: The association between trust-enabled columns and source systems is
specified in the staging tables for this base object. For more information, see
“Configuring Staging Tables” on page 364.
5. Expand a base object to see its trust-enabled columns.

6. Select the trust-enabled column that you want to configure.

For the selected trust-enabled column, the Systems and Trust tool displays the list
of source systems associated with the column, along with editable trust settings to
be configured per source system, and a trust decay graph.

7. Specify the trust properties for each column. For more information, see “Trust
Properties” on page 459.
8. Optionally, you can change the offset date, as described in “Changing the Offset
Date for a Trust-Enabled Column” on page 466.
9. Click the button to save your changes.

The Systems and Trust tool refreshes the Trust Decay Graph based on the trust
settings you specified for each source system for this trust-enabled column.

The X-axis is the trust score and the Y-axis is the time.

Changing the Offset Date for a Trust-Enabled Column

By default, the Trust Decay Graph shows the trust decay across all source systems
from the current system date. You can specify a different date (such as a future date) to
test your current trust settings and see how trust would decay from that date. Note that
offset dates are not saved.

To change the offset date for a trust-enabled column:


1. In the Systems and Trust tool, select a trust-enabled column according to the
instructions in “Assigning Trust Levels to Trust-Enabled Columns in a Base
Object” on page 462.
2. Click the Calendar button next to the source system for which you want to
specify a different offset date.

The Systems and Trust tool prompts you to specify a date.

3. Select a different date.


4. Choose OK.
The Systems and Trust tool updates the Trust Decay Graph based on your current
trust settings and the Offset Date you specified.

To remove the Offset Date:


• Click the Delete button next to the source system for which you want to
remove the Offset Date.
The Systems and Trust tool updates the Trust Decay Graph based on your current
trust settings and the current system date.

Running Synchronize Batch Jobs After Changes to Trust Settings

After records have been loaded into a base object, if you enable trust for any column,
or if you change trust settings for any trust-enabled column(s) in that base object, then
you must run the Synchronize batch job (see “Synchronize Jobs” on page 747) before
running the consolidation process. If this batch job is not run, then errors will occur
during the consolidation process.

Configuring Validation Rules


This section describes how to configure validation rules in your Siperian Hub
implementation. For an introduction, see “Validation Rules” on page 304.

About Validation Rules


A validation rule downgrades trust for a cell value when the cell value matches a given
condition. Each validation rule specifies:
• a condition that determines whether the cell value is valid
• an action to take if the condition is met (downgrade trust by a certain percentage)

For example, the following validation rule:


Downgrade trust on First_Name by 50% if Length < 3

consists of:
• Condition: Length < 3
• Action: Downgrade trust on First_Name by 50%

If the Reserve Minimum Trust flag is set for the column, then the trust cannot be
downgraded below the column’s minimum trust. You use the Schema Manager to
configure validation rules for a base object.

Validation rules are executed during the load process, after trust has been calculated for
trust-enabled columns in the base object. If validation rules have been defined, then
the load process applies them to determine the final trust scores, and then uses the
final trust values to determine whether to update records in the base object with cell
data from the updated records. For more information, see “Run-time Execution Flow
of the Load Process” on page 304.

Validation Checks

A validation check can be done on any column in a base object. The downgrade resulting
from the validation check can be applied to the same column, as well as to any other
columns that can be validated. Invalid data in one column can therefore result in trust
downgrades on many columns.

For example, suppose you used an address verification flag in which the flag is OK if
the address is complete and BAD if the address is not complete. You could configure a
validation rule that downgrades the trust on all address fields if the verification flag is
not OK. Note that, in this case, the verification flag should also be downgraded.

Required Columns

Validation rules are applied regardless of the source of the incoming data. However,
validation rules are applied only if the input, whether a staging table or a Services
Integration Framework (SIF) request, contains all of the required columns. If any
required columns are missing, validation rules are not applied.
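
The gating condition can be expressed as a subset test. The sketch below is a minimal illustration with hypothetical names (should_apply_rules and the column lists); it assumes that "required columns" means the set of columns that the configured validation rules depend on, which the guide does not spell out here.

```python
def should_apply_rules(required_columns, input_columns):
    """Return True only if the input supplies every required column."""
    return set(required_columns) <= set(input_columns)


required = ["FIRST_NAME", "ADDRESS_OK_FLAG"]
print(should_apply_rules(required, ["FIRST_NAME", "LAST_NAME", "ADDRESS_OK_FLAG"]))
print(should_apply_rules(required, ["FIRST_NAME", "LAST_NAME"]))
```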

Recalculating Trust Scores After Changing Validation Rules

If a base object contains existing data and you change validation rules, you must run
the Revalidate job to recalculate trust scores for new and existing data, as described in
“Revalidate Jobs” on page 745.

Validation Rules and State-Enabled Base Objects

For state-enabled base objects, validation rules are applied to records with a
PENDING or ACTIVE state, but records with a DELETE state are ignored. For
more information, see Chapter 7, “State Management.”

Automerge Job Constraints on Number of Validation Columns

Automerge jobs can fail if there is a large number of validation-enabled columns.
The exact number of columns that cause the job to fail is variable and is based on the
length of the column names and the number of validation-enabled columns. Long
column names are those at, or close to, the maximum allowable length of 26 characters.
To avoid this problem, keep the number of validation-enabled columns below 60, keep
the column names short, or both. A workaround is to enable all trust and validation
columns before saving the base object, which avoids running the synchronization job.

Enabling Validation Rules for a Column


A validation rule is enabled and configured on a per-column basis for base objects in
the Schema Manager. Validation rules do not apply to columns in dependent objects or
any other tables in an ORS. For more information, see “Configuring Columns in
Tables” on page 125.

Validation rules are disabled by default. Validation rules should be enabled, however,
for any trust-enabled columns that will use validation rules for trust downgrades.

How the Downgrade Percentage is Applied

Validation rules downgrade trust scores according to the following algorithm:


Final trust = Trust - (Trust * Validation_Downgrade / 100)

For example, with a validation downgrade percentage of 50%, and a trust level
calculated at 60:
Final Trust Score = 60 - (60 * 50 / 100)

The final trust score is:


Final Trust Score = 60 - 30 = 30

Execution Sequence of Validation Rules

Validation rules are executed in sequence. If multiple validation rules are configured for
a column, only one validation rule (the rule with the greatest downgrade percentage) is
applied to the column. Downgrade percentages are not cumulative; rather, the “winning”
validation rule overwrites any previously applied changes.

Therefore, when configuring multiple validation rules for a column, specify an
execution order of increasing downgrade percentage, starting with the validation rule
that has the lowest impact (downgrade percentage) and ending with the validation
rule that has the highest impact (downgrade percentage).
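
One way to picture this behavior is the Python sketch below. It is an interpretation of the text, not Siperian Hub code: rules are sorted by increasing downgrade percentage, each rule whose condition matches overwrites the result of the previous one, and the last match therefore carries the greatest percentage. The rule dictionaries, the lambda conditions, and the function name winning_downgrade are all hypothetical.

```python
def winning_downgrade(rules, record):
    """Return the downgrade percentage that ends up applied to a column."""
    # Execute rules in order of increasing downgrade percentage.
    ordered = sorted(rules, key=lambda rule: rule["downgrade"])
    winner = 0
    for rule in ordered:
        if rule["condition"](record):
            # Not cumulative: a later match replaces the earlier, smaller one.
            winner = rule["downgrade"]
    return winner


rules = [
    {"name": "Empty name", "downgrade": 75,
     "condition": lambda r: not r["FIRST_NAME"]},
    {"name": "Short name", "downgrade": 50,
     "condition": lambda r: len(r["FIRST_NAME"] or "") < 3},
]

print(winning_downgrade(rules, {"FIRST_NAME": ""}))
print(winning_downgrade(rules, {"FIRST_NAME": "Al"}))
print(winning_downgrade(rules, {"FIRST_NAME": "Alice"}))
```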

Note: The execution sequence for validation rules differs between the load process
described in this chapter and PUT requests invoked by external applications using the
Services Integration Framework (SIF). For PUT requests, validation rules are executed
in order of decreasing downgrade percentage. For more information, see the Siperian
Services Integration Framework Guide and the Siperian Hub Javadoc.

Navigating to the Validation Rules Node


To configure validation rules, you navigate to the Validation Rules node for a base
object in the Schema Manager:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Expand the tree for the base object that you want to configure, and then click its
Validation Rules Setup node.

The Schema Manager displays the Validation Rules editor.

The Validation Rules editor is divided into the following sections.

Pane Description
Number of Rules Number of configured validation rules for the selected base object.
Validation Rules List of configured validation rules for the selected base object.
Properties Pane Properties for the selected validation rule. For more information, see
“Validation Rule Properties” on page 473.

Validation Rule Properties


Validation rules have the following properties.

Rule Name

A unique, descriptive name for this validation rule.

Rule Type

The type of validation rule. One of the following values.

Rule Type Description


Existence Check Trust will be downgraded if the cell has a null value (the cell value
does not exist).
Domain Check Trust will be downgraded if the cell value does not fall within a list or
range of allowed values.
Referential Integrity Trust will be downgraded if the value in a cell does not exist in the set
of values in a column on a different table. This rule is for use in cases
where an explicit foreign key has not been defined, and an incorrect
cell value can be allowed if there is no correct cell value that has
higher trust.
Pattern Validation Trust will be downgraded if the value in a cell conforms (LIKE) or
does not conform (NOT LIKE) to the specified pattern.
Custom Used for entering complex validation rules. This rule type should
only be used when SQL functions (such as LENGTH, ABS, etc.)
might be required, or if a complex join is required.
Note: Custom SQL code must conform with the SQL syntax for
your database platform. SQL entered in this pane is not validated at
design time. Invalid SQL syntax errors cause problems when the load
process executes.

Rule Columns

For each column, you specify the downgrade percentage and whether to reserve
minimum trust.

Downgrade Percentage

Percentage by which the trust level of the specified column will be decreased if this
validation rule condition is met. The larger the percentage, the greater the downgrade.
For example, 0% has no effect on the trust, while 100% downgrades the trust
completely (unless the reserve minimum trust is specified, in which case 100%
downgrades the trust so that it equals minimum trust).

If trust is downgraded by 100% and you have not enabled Reserve Minimum Trust for
the column, then the value of that column will not be populated into the base object.

Reserve Minimum Trust

Specifies what will happen if the downgrade causes the trust level to fall below the
column’s minimum trust level. You can retain the minimum trust (so that the trust level
will be reduced to the minimum trust but no lower). If this box is cleared (unchecked),
then the trust level will be reduced by the specified percentage even if this means going
below the minimum trust.
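
The interaction between the downgrade percentage and the Reserve Minimum Trust setting can be sketched as follows. The downgrade formula is the one given in “How the Downgrade Percentage is Applied”; the function name apply_downgrade and its parameter names are hypothetical, and the sketch assumes that the incoming trust score is not already below the minimum trust.

```python
def apply_downgrade(trust, downgrade_percentage, minimum_trust, reserve_minimum_trust):
    """Return the final trust score after a validation rule downgrade."""
    final = trust - (trust * downgrade_percentage / 100)
    if reserve_minimum_trust and final < minimum_trust:
        # Reserve Minimum Trust selected: never drop below the column minimum.
        final = minimum_trust
    return final


# A 100% downgrade on a trust score of 60, with a minimum trust of 20.
print(apply_downgrade(60, 100, 20, reserve_minimum_trust=True))
print(apply_downgrade(60, 100, 20, reserve_minimum_trust=False))
```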

Rule SQL

Specifies the SQL WHERE clause representing the condition for this validation rule.
During the load process, the validation rule is executed. If data meets the criteria
specified in the Rule SQL field, then the trust value is downgraded by the downgrade
percentage configured for this validation rule.

SQL WHERE Clause Based on the Rule Type

The Validation Rules editor prompts you to configure the SQL WHERE clause based
on the selected Rule Type for this validation rule.

During the load process, this query is used to check the validity of the data in the
staging table.

Example SQL WHERE Clauses

The following table provides examples of SQL WHERE clauses based on the selected
rule type.
Examples of WHERE Clause for Each Rule Type

Rule Type:     Existence Check
WHERE clause:  WHERE S.ColumnName IS NULL
Example:       WHERE S.MIDDLE_NAME IS NULL
Result:        Affected columns will be downgraded for records with middle names that
               are null. The records that do not meet the condition will not be affected.

Rule Type:     Domain Check
WHERE clause:  WHERE S.ColumnName IN ('?', '?', '?')
Example:       WHERE S.Gender NOT IN ('M', 'F', 'U')
Result:        Affected columns will be downgraded if the Gender is any value other
               than M, F, or U.

Rule Type:     Referential Integrity
WHERE clause:  WHERE NOT EXISTS (SELECT <blank> 'a' FROM ? WHERE ?.? =
               S.<Column_Name>)
               WHERE NOT EXISTS (SELECT <blank> 'a' FROM <Ref_Table> WHERE
               <Ref_Table>.<Ref_Column> = S.<Column_Name>)
Example:       WHERE NOT EXISTS (SELECT DISTINCT 'a' FROM ACCOUNT_TYPE WHERE
               ACCOUNT_TYPE.Account_Type = S.Account_Type)
Result:        Affected columns will be downgraded for records with Account Type
               values that are not on the Account Type table.

Rule Type:     Pattern Validation
WHERE clause:  WHERE S.ColumnName LIKE 'Pattern'
Example:       WHERE S.eMail_Address NOT LIKE '%@%'
Result:        Downgrade will be applied if the e-mail address does not contain an @
               character.

Rule Type:     Custom
WHERE clause:  WHERE
Example:       WHERE LENGTH(S.ZIP_CODE) < 4
Result:        Downgrade will be applied if the length of the zip code column is less
               than 4.
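
To see how such WHERE clauses select records, the sketch below runs one example clause per rule type against a small in-memory table, using Python's standard sqlite3 module. SQLite is only a stand-in for the database platform of a real Siperian Hub implementation, and the STG and ACCOUNT_TYPE tables, their columns, and the sample rows are hypothetical. The Custom clause uses LENGTH(S.ZIP_CODE) < 4 so that it matches the described result (zip codes shorter than 4 characters). In this sample, record 2 fails every check and record 1 fails none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ACCOUNT_TYPE (Account_Type TEXT);
    INSERT INTO ACCOUNT_TYPE VALUES ('CHK'), ('SAV');

    -- Hypothetical staging table, aliased as S in the rule clauses.
    CREATE TABLE STG (ID INTEGER, MIDDLE_NAME TEXT, Gender TEXT,
                      Account_Type TEXT, eMail_Address TEXT, ZIP_CODE TEXT);
    INSERT INTO STG VALUES
        (1, 'Ann', 'F', 'CHK', 'ann@example.com', '94404'),
        (2, NULL,  'X', 'XXX', 'not-an-address',  '944');
""")

WHERE_CLAUSES = {
    "Existence Check": "WHERE S.MIDDLE_NAME IS NULL",
    "Domain Check": "WHERE S.Gender NOT IN ('M', 'F', 'U')",
    "Referential Integrity": (
        "WHERE NOT EXISTS (SELECT DISTINCT 'a' FROM ACCOUNT_TYPE "
        "WHERE ACCOUNT_TYPE.Account_Type = S.Account_Type)"
    ),
    "Pattern Validation": "WHERE S.eMail_Address NOT LIKE '%@%'",
    "Custom": "WHERE LENGTH(S.ZIP_CODE) < 4",
}


def matching_ids(where_clause):
    """Return the IDs of staging records that meet a rule's condition."""
    rows = conn.execute(f"SELECT S.ID FROM STG S {where_clause} ORDER BY S.ID")
    return [row[0] for row in rows]


for rule_type, clause in WHERE_CLAUSES.items():
    print(rule_type, matching_ids(clause))
```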

Table Aliases and Wildcards

You can use the wildcard character (*) to reference tables via an alias.
• s.* aliases the staging table
• I.* aliases a temporary table and provides ROWID_OBJECT, PKEY_SRC_OBJECT,
and ROWID_SYSTEM information for the records being updated.

Custom Rule Types and SQL WHERE Syntax

For Custom rule types, write SQL statements that are well formed and well tuned. If
you need more information about SQL WHERE clause syntax and wild card patterns,
refer to the product documentation for the database platform used in your Siperian
Hub implementation.

Note: Be sure to specify precedence correctly using parentheses according to the SQL
syntax for your database platform. Incorrect or omitted parentheses can produce
unexpected results and long-running queries. For example, the following statement is
ambiguous and leaves it up to the database server to determine precedence:
WHERE conditionA AND conditionB OR conditionC

The following statements use parentheses to explicitly specify precedence:


WHERE (conditionA AND conditionB) OR conditionC
WHERE conditionA AND (conditionB OR conditionC)

These two statements will yield very different results when evaluating records.
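
The difference between the two groupings can be checked with a small experiment. The sketch below uses Python's standard sqlite3 module as a stand-in database (your database platform may differ) and evaluates both parenthesized forms, plus an unparenthesized form, for the case where conditionA and conditionB are false and conditionC is true. The variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# conditionA is false, conditionB is false, conditionC is true.
a, b, c = 0, 0, 1

grouped_left, grouped_right, ungrouped = conn.execute(
    "SELECT (? AND ?) OR ?, ? AND (? OR ?), ? AND ? OR ?",
    (a, b, c, a, b, c, a, b, c),
).fetchone()

print("(A AND B) OR C ->", grouped_left)
print("A AND (B OR C) ->", grouped_right)
print("A AND B OR C   ->", ungrouped)
```

For these inputs the first grouping is true and the second is false, so a record would be downgraded under one rule and left alone under the other. In SQLite the unparenthesized form is evaluated like the first grouping; writing the parentheses explicitly removes any dependence on that behavior.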

Adding Validation Rules


To add a validation rule:
1. Navigate to the Validation Rules editor. For more information, see “Navigating to
the Validation Rules Node” on page 471.
2. Click the button.
The Schema Manager displays the Add Validation Rule dialog.

3. Specify the properties for this validation rule. For more information, see
“Validation Rule Properties” on page 473.
4. If you want, select the rule column(s) for this validation rule by clicking the
button.

The Validation Rules editor displays the Select Rule Columns dialog.

The available columns are those that have the Validate flag enabled (see “Column
Properties” on page 127). For more information, see “Configuring Columns in
Tables” on page 125.
Select the column(s) for which the trust level will be downgraded if the condition
specified in the WHERE clause for this validation rule is met, and then click OK.
5. Click OK.
The Schema Manager adds the new rule to the list of validation rules.
Note: If a base object contains existing data and you change validation rules, you
must run the Revalidate job to recalculate trust scores for new and existing data, as
described in “Revalidate Jobs” on page 745.

Editing Validation Rule Properties


To edit a validation rule:
1. Navigate to the Validation Rules editor in the Schema Manager. For more
information, see “Navigating to the Validation Rules Node” on page 471.
2. In the Validation Rules list, select the validation rule that you want to configure.
The Validation Rules editor displays the properties for the selected validation rule.

3. Specify the editable properties for this validation rule. You cannot change the rule
type. For more information, see “Validation Rule Properties” on page 473.
4. If you want, select the rule column(s) for this validation rule by clicking the
button.

The Validation Rules editor displays the Select Rule Columns dialog.

The available columns are those that have the Validate flag enabled (see “Column
Properties” on page 127). For more information, see “Configuring Columns in
Tables” on page 125.
Select the column(s) for which the trust level will be downgraded if the condition
specified in the WHERE clause for this validation rule is met, and then click OK.
5. Click the button to save changes.
Note: If a base object contains existing data and you change validation rules, you
must run the Revalidate job to recalculate trust scores for new and existing data, as
described in “Revalidate Jobs” on page 745.

Changing the Sequence of Validation Rules


The execution order for validation rules is extremely important. For more information,
see “Execution Sequence of Validation Rules” on page 471.

Use the following buttons to change the sequence of validation rules in the list.

Click To....
Move the selected validation rule higher in the sequence.

Move the selected validation rule further down in the sequence.

Removing Validation Rules


To remove a validation rule:
1. Navigate to the Validation Rules editor in the Schema Manager. For more
information, see “Navigating to the Validation Rules Node” on page 471.
2. In the Validation Rules list, select the validation rule that you want to remove.
3. Click the button.
The Schema Manager prompts you to confirm deletion.
4. Click Yes.
Note: If a base object contains existing data and you change validation rules, you
must run the Revalidate job to recalculate trust scores for new and existing data, as
described in “Revalidate Jobs” on page 745.



Chapter 14: Configuring the Match Process

This chapter describes how to configure your Hub Store to identify and handle
potential duplicate records. For an introduction to the match process, see “Match
Process” on page 317.

Chapter Contents
• Configuration Tasks for the Match Process
• Navigating to the Match/Merge Setup Details Dialog
• Configuring Match Properties for a Base Object
• Configuring Match Paths for Related Records
• Configuring Match Columns
• Configuring Match Rule Sets
• Configuring Match Column Rules for Match Rule Sets
• Configuring Primary Key Match Rules
• Investigating the Distribution of Match Keys
• Excluding Records from the Match Process


Before You Begin


Before you begin, you must have installed Siperian Hub, created the Hub Store
according to the instructions in Siperian Hub Installation Guide, and built the schema
according to the instructions in Chapter 5, “Building the Schema.”

Configuration Tasks for the Match Process


This section provides an overview of the configuration tasks associated with the match
process. For an introduction to the match process, see “Match Process” on page 317.

Understanding Your Data


Before you define match rules, you must be very familiar with your data and
understand:
• the distribution of the values in the columns you intend to use to determine
duplicate records, and
• the general proportion of the total number of records that are duplicates.

Base Object Properties Associated with the Match Process


The following base object properties affect the behavior of the match process.

Property                    Description
Duplicate Match Threshold   Used only with the Match for Duplicate Data job for initial data
                            loads. For more information, see “Duplicate Match Threshold” on
                            page 103.
Max Elapsed Match Minutes   Timeout (in minutes) when executing a match rule. If exceeded, the
                            match process exits. For more information, see “Max Elapsed Match
                            Minutes” on page 103.
Match Flag audit table      If enabled, then an audit table (BusinessObjectName_FMHA) is created
                            and populated with the userID of the user who, in Merge Manager,
                            queued a manual match record for automerging. For more information,
                            see “Match Flag Audit Table” on page 105 and the Siperian Hub Data
                            Steward Guide.


Configuration Steps for Defining Match Rules


To define match rules:
1. Configure the match properties for the base object. For more information, see
“Setting Match Properties” on page 488.
2. Define your match columns. For more information, see “Match Columns Depend
on the Search Strategy” on page 515.
3. Define a match rule set for your match rules. For more information, see “Adding
Match Rule Sets” on page 538.
4. Define your match rules for the rule set. For more information, see “Adding Match
Column Rules” on page 565.
5. Repeat steps 3 and 4 until you are finished creating match rules.
6. Based on your knowledge of your data, determine whether you require matching
based on primary keys. For more information, see “Configuring Primary Key
Match Rules” on page 578.
7. If your data is appropriate for primary key matching, create your primary key
match rules. For more information, see “Adding Primary Key Match Rules” on
page 578.
8. Tune your rules. This is an iterative process by which you apply your match rules
to a representative data set, analyze the results, and adjust your settings to optimize
the match performance.

Configuring Base Objects with International Data


Siperian Hub supports matching for base objects that contain data from non-United
States populations, as well as base objects that contain data from different populations
(for example, the United States and China). For more information, see “Configuring
Match Settings for Non-US Populations” on page 941.


Navigating to the Match/Merge Setup Details Dialog


To set up the match and merge process for a base object, begin by completing the
following steps:
1. Start the Schema Manager. For more information, see “Starting the Schema
Manager” on page 90.
2. In the schema navigation tree, expand the base object for which you want to define
match properties.
3. In the schema navigation tree, select Match/Merge Setup.
The Schema Manager displays the Match/Merge Setup Details dialog, as shown in
the following example.

If you want to change settings, you need to acquire a write lock according to the
instructions in “Acquiring a Write Lock” on page 30.


The Match/Merge Setup Details dialog contains the following tabs:

Tab Name                  Description
Properties                Summarizes the match/merge setup and provides various configurable
                          match/merge settings. For more information, see “Configuring Match
                          Properties for a Base Object” on page 488.
Paths                     Allows you to configure the match path for parent/child relationships
                          for records in different base objects or in the same base object. For
                          more information, see “Configuring Match Paths for Related Records”
                          on page 497.
Match Columns             Allows you to configure match columns for match column rules.
                          For more information, see “Configuring Match Columns” on page 515 and
                          “Configuring Match Column Rules for Match Rule Sets” on page 542.
Match Rule Sets           Allows you to define a search strategy and rules using match rule sets.
                          For more information, see “Configuring Match Rule Sets” on page 531.
Primary Key Match Rules   Allows you to define primary key match rules. For more information,
                          see “Configuring Primary Key Match Rules” on page 578.
Match Key Distribution    Shows the distribution of match keys. For more information, see
                          “Investigating the Distribution of Match Keys” on page 583.
Merge Settings            Allows you to configure merge and link settings. For more information,
                          see Chapter 15, “Configuring the Consolidate Process.”


Configuring Match Properties for a Base Object


You must set the match properties for a base object before you can configure other
match features, such as match columns and match rules. These match properties apply
to all rules for the base object.

Setting Match Properties


You configure match properties for each base object. These settings apply to all of its
match rules and rule sets.

To configure match properties for a base object:


1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base
object that you want to configure according to the instructions in “Navigating to
the Match/Merge Setup Details Dialog” on page 486.
2. In the Match/Merge Setup Details pane, click the Properties tab.
The Schema Manager displays the Properties tab.


3. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.

For a description of each property, see the next section, “Match Properties” on
page 490.
4. Edit the property settings that you want to change, clicking the Edit button
next to the field if applicable.
5. Click the Save button to save your changes.


Match Properties
This section describes the configuration settings on the Match Properties tab.

Calculated, Read-Only Fields

The Match Properties tab displays the following read-only fields.


Read-Only Match Properties
Property                    Description
Match Columns               Number of match columns configured for this base object.
                            Read-only.
Match Rule Sets             Number of match rule sets configured for this base object.
                            Read-only.
Match Rules in Active Set   Number of match rules configured for this base object in the
                            rule set currently selected as active. Read-only.
Primary key match rules     Number of primary key match rules configured for this base
                            object. Read-only.

Maximum Matches for Manual Consolidation

This setting helps prevent data stewards from being overwhelmed with thousands of
matches for manual consolidation. This sets the limit on the list of possible matches
that must be decided upon by a data steward (default is 1000). Once this limit is
reached, Siperian Hub stops the match process until the number of records for manual
consolidation has been reduced.

The number of records awaiting manual consolidation is calculated by counting the
records that have CONSOLIDATION_IND=2.

At the end of each automatch and merge cycle, this count is checked and, if the count
exceeds the maximum number of matches for manual consolidation, then the
automatch-and-merge process will exit.
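
The end-of-cycle check described above can be pictured with the following minimal Python sketch. It is an illustration only: the function and variable names are hypothetical and are not part of Siperian Hub. Only the consolidation indicator value of 2 and the default limit of 1000 come from this section.

```python
# Illustrative sketch only; these names are hypothetical, not Siperian Hub APIs.
DEFAULT_MAX_MANUAL_MATCHES = 1000  # default limit stated in this section


def should_stop_matching(consolidation_inds, max_manual_matches=DEFAULT_MAX_MANUAL_MATCHES):
    """Return True when the number of records queued for manual consolidation
    (CONSOLIDATION_IND = 2) exceeds the configured maximum."""
    queued = sum(1 for ind in consolidation_inds if ind == 2)
    return queued > max_manual_matches
```

In this sketch, a base object with 1001 records at indicator 2 would stop the cycle under the default limit, while one with exactly 1000 would not.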


Number of Rows per Match Job Batch Cycle

This setting specifies an upper limit on the number of records that Siperian Hub will
process for matching during match process execution (Match or Auto Match and
Merge jobs). When the match process starts executing, it begins by flagging records to
be included in the match job batch. From the pool of new/unconsolidated records that
are ready for match (CONSOLIDATION_IND=4, as described in “Consolidation
Indicator” on page 289), the match process changes CONSOLIDATION_IND to 3.
The number of records flagged is determined by the Number of Rows per Match Job
Batch Cycle. The match process then matches those records in the match job batch
against all of the records in the base object.
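
As a rough illustration of the flagging step, the following Python sketch changes the consolidation indicator from 4 to 3 for at most a given number of records. The record layout, the function name, and the order in which records are picked are assumptions made for the example; the actual selection logic inside Siperian Hub is not described here.

```python
# Illustrative sketch only; record layout and names are hypothetical.
def flag_match_batch(records, rows_per_batch):
    """Flag up to rows_per_batch records that are ready for match
    (CONSOLIDATION_IND = 4) by setting CONSOLIDATION_IND to 3.

    Returns the ROWID_OBJECT values of the flagged records. Records are
    taken in list order, which is an assumption of this sketch.
    """
    flagged = []
    for record in records:
        if len(flagged) >= rows_per_batch:
            break
        if record["CONSOLIDATION_IND"] == 4:
            record["CONSOLIDATION_IND"] = 3
            flagged.append(record["ROWID_OBJECT"])
    return flagged
```

The flagged records form the match job batch; records that remain at indicator 4 wait for a later cycle.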

The number of records in the match job batch affects how long the match process
takes to execute. The value to specify depends on the size of your data set, the
complexity of your match rules, and the length of the time window you have available
to run the match process. The default match batch size is low (10). Increase this value
based on the number of records in the base object, as well as the number of matches
generated for those records by its match rules.
• The lower your match batch size, the more times you will need to run the match
and consolidation processes.
• The higher your match batch size, the more work each match and consolidation
process does.

For each base object, there is a middle ground where you reach the optimal match
batch size. You need to identify this optimal batch size as part of performance tuning
in your environment. Start with a match batch size of 10% of the volume of records to
be matched and merged, run the match job only, see how many matches are generated
by your match rules, and then adjust upwards or downwards accordingly.


Accept All Unmatched Rows as Unique

Enable (set to Yes) this feature to have Siperian Hub mark as unique
(CONSOLIDATION_IND=1) any records that have been through the match process,
but for which no matches were identified. If enabled, for such records, Siperian Hub
automatically changes their state to consolidated (changes the consolidation indicator
from 2 to 1). Consolidated records are removed from the data steward’s queue via the
Automerge batch job.
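
The effect of this setting on the consolidation indicator can be sketched as follows. This is a simplified Python model with hypothetical names, assuming that the set of records for which matches were found is already known. It is not how Siperian Hub implements the feature.

```python
# Illustrative sketch only; names and data layout are hypothetical.
def accept_unmatched_as_unique(indicators, matched_rowids, accept_as_unique=True):
    """Return a copy of indicators (ROWID_OBJECT -> CONSOLIDATION_IND) in which
    records at indicator 2 that found no match are marked unique (1).

    When accept_as_unique is False, the indicators are returned unchanged.
    """
    if not accept_as_unique:
        return dict(indicators)
    return {
        rowid: 1 if ind == 2 and rowid not in matched_rowids else ind
        for rowid, ind in indicators.items()
    }
```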

By default, this option is disabled. In a development environment, you might want this
option disabled, for example, while iteratively testing and tuning match rules to
determine which records are found to be unique for a given set of match rules.

This option should always be enabled in a production environment. Otherwise, you can
end up with a large number of records with a consolidation indicator of 2. If this
backlog of records exceeds the Maximum Matches for Manual Consolidation setting
(see “Maximum Matches for Manual Consolidation” on page 490), then you will need
to process these records first before you can continue matching and consolidating
other records.

For more information, see:


• “Initial Data Loads and Incremental Loads” on page 302
• “Consolidation Indicator” on page 289
• “Accept Non-Matched Records As Unique” on page 715
• “Automerge Jobs” on page 717
• “Autolink Jobs” on page 715


Match/Search Strategy

Select the match/search strategy to specify the reliability of the match versus the
performance you require. Select one of the following options.

Strategy Option   Description
Fuzzy             Probabilistic match that takes into account spelling variations, possible
                  misspellings, and other differences that can make matching records
                  non-identical. This is the primary means of matching data in a base
                  object. Referred to in this document as fuzzy-match base objects.
                  Note: If you specify a Fuzzy match/search strategy, you must specify a
                  fuzzy match key.
Exact             Matches only records with identical values in the match column(s). If
                  you specify an exact match, you can define only exact-match columns
                  for this base object (exact-match base objects cannot have fuzzy-match
                  columns). Referred to in this document as exact-match base objects.

An exact strategy is faster, but an exact match will miss some matches if the data is
imperfect. The best option to choose depends on the characteristics of the data, your
knowledge of the data, and your particular match and consolidation requirements.

Certain configuration settings on the Match/Merge Setup tabs apply to only one type of
base object. In this document, such features are indicated with a graphic that shows
whether the feature applies to fuzzy-match base objects only (as in the following
example) or to exact-match base objects only. No graphic means that the feature applies
to both.

Note: The match / search strategy is configured at the base object level. For more
information about the match / search strategy configured at the match rule level, see
“Match / Search Strategy” on page 544.


Fuzzy Population

If the match/search strategy is Fuzzy, then you must select a population, which defines
certain characteristics about the records that you are matching. Data characteristics can
vary from country to country. By default, Siperian Hub comes with the US population,
but Siperian provides standard populations per country. If you require another
population, contact Siperian support. If you chose an exact match/search strategy, then
this value is ignored.

Populations perform the following functions for matching:
• account for the inevitable variations and errors that are likely to exist in name,
address, and other identification data
For example, the population for the US has some intelligence about the typical
identification numbers used in US data, such as the social security number.
Populations also have some intelligence about the distribution of common names.
For example, the US population has a relatively high percentage of the surname
Smith. But a population for a non-English-speaking country would not have Smith
among the common names.
• specify how Siperian Hub builds match tokens, which are described in “Match
Keys and the Tokenization Process” on page 322
• specify how search strategies and match purposes operate on the population of
data to be matched

Match Only Previous Rowid Objects

If this setting is enabled (checked), then Siperian Hub matches the current records
against records with lower ROWID_OBJECT values. For example, if the current
record has a ROWID_OBJECT value of 100, then the record will be matched only
against other records in the base object with a ROWID_OBJECT value that is less
than 100 (ignoring all records with a ROWID_OBJECT value that is higher than 100).

Using this feature can reduce the number of matches required and speed performance.
However, if PUTs are executed, or if records are inserted out of rowid order, then
records might not be fully matched. You must assess the trade-off between
performance and match quantity based on the characteristics of your data and your
particular match requirements. By default, this option is disabled (unchecked).
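
The following Python sketch illustrates how this setting narrows the set of candidate records. The function name is hypothetical, and treating ROWID_OBJECT values as integers is a simplification made for the example.

```python
# Illustrative sketch only; the function name is hypothetical and ROWID_OBJECT
# values are modeled as integers for simplicity.
def match_candidates(current_rowid, all_rowids, match_only_previous=False):
    """Return the ROWID_OBJECT values that the current record is compared against.

    With match_only_previous enabled, only records with a lower ROWID_OBJECT
    are considered; otherwise every other record is a candidate.
    """
    if match_only_previous:
        return [rowid for rowid in all_rowids if rowid < current_rowid]
    return [rowid for rowid in all_rowids if rowid != current_rowid]
```

For a current record with ROWID_OBJECT 100, enabling the setting removes candidates such as 101 or 200 from the comparison, which is where both the performance gain and the risk of incomplete matching come from.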

Match Only Once

Available only for fuzzy key matching and only if “Match Only Previous Rowid
Objects” is checked (selected). If Match Only Once is enabled (checked), then once a
record has found a match, Siperian Hub will not match it any further within this search
range (the set of similar match key values). Using this feature can reduce duplicates and
increase performance. Instead of finding every match for a record in a search range,
Siperian Hub can find a single match for each. In subsequent match cycles, the merge
process will put these into large groups of XREF records associated with the base
object.

By default, this option is unchecked (disabled). If this feature is enabled, however, you
can miss matches. For example, suppose record A matches record B, and record A
matches record C, but records B and C do not match. Once record A has been matched
to record B, it is not matched any further within the search range, so the match between
A and C is not found. You must assess the trade-off between performance and match
quantity based on the characteristics of your data and your particular match requirements.

Dynamic Match Analysis Threshold

During the match process, dynamic match analysis determines whether the match
process will take an unacceptably long period of time. This threshold value specifies
the maximum acceptable number of comparisons.

To enable the dynamic match threshold, specify a non-zero value. Enable this feature if
you have data that is very similar (with high concentrations of matches) to reduce the
amount of work expended for a hot spot in your data. A hot spot is a group of records
representing overmatched data (a large intersection of matches). If Dynamic Match
Analysis Threshold is enabled, then records that produce more than the specified
number of potential match candidates will be skipped during the match process. By
default, this option is zero (disabled).


Before conducting a match on a given search range, Siperian Hub calculates the
number of search records (records being searched for matches), and multiplies it by the
number of file records (the number of records returned from the match key table that
need to be compared). If the result is greater than the specified Dynamic Match
Analysis Threshold, then no comparisons are performed on that range of data, and the
range is noted in the application server log for further investigation.
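
The calculation above amounts to a simple product check, sketched here in Python. The function name is hypothetical; the logic follows the description in this section, including the convention that a threshold of zero disables the check.

```python
# Illustrative sketch only; the function name is hypothetical.
def should_skip_range(search_records, file_records, threshold):
    """Return True when a search range should be skipped.

    search_records: number of records being searched for matches
    file_records:   number of records returned from the match key table
    threshold:      Dynamic Match Analysis Threshold (0 means disabled)
    """
    if threshold == 0:
        return False
    return search_records * file_records > threshold
```

For example, with a threshold of 1,000,000, a range with 2,000 search records and 600 file records (1,200,000 comparisons) would be skipped, while a range with 1,000 search records and 1,000 file records would still be compared.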

Enable Match on Pending Records

By default, the match process includes only ACTIVE records and ignores PENDING
records. For state management-enabled objects, select this check box to include
PENDING records in the match process. Note that, regardless of this setting,
DELETED records are ignored by the match process. For more information, see
“Enabling Match on Pending Records” on page 214.

Reset Link Properties for Link-style Base Objects

For link-style base objects only, you can unlink consolidated records and requeue them
for match. This can be configured to occur automatically on load update, or manually
via the Reset Links batch job. For more information, see “Reset Links Jobs” on
page 744.

For link-style base objects only, the Schema Manager displays the following properties.

Property                         Description
Allow prompt for reset of        Specifies whether to prompt for a reset of match links when
match links when match rules /   configuration settings for match rules or match columns are
columns are changed              changed.
Allow reset of match links for   Specifies whether the reset links prompt applies to updated
updated data                     data (load updates). This prompt is triggered automatically
                                 upon load update.
Allow reset of links to include  Specifies whether the reset links process applies to
consolidated records             consolidated records.
                                 Note: The reset links process always applies to
                                 unconsolidated records.
Allow reset of links to include  Specifies whether manually-linked records are included by
manually linked records          the reset links process. Autolinked records are always
                                 included.
                                 Note: This setting affects the scope of all other reset links
                                 settings.

Supporting Long ROWID_OBJECT Values


If a base object has such a large number of records that the ROWID_OBJECT values
might exceed 12 digits, you need to explicitly enable support for longer values
in the Cleanse Match Server. To enable the Cleanse Match Server to use long
ROWID_OBJECT values, edit the cmxcleanse.properties file and configure the
cmx.server.bmg.use_longs setting:
cmx.server.bmg.use_longs=1

By default, this option is disabled.

Configuring Match Paths for Related Records


This section describes how to configure match paths for related records, which are
used for matching in your Siperian Hub implementation.

About Match Paths


This section describes match paths and related concepts.

Match Paths

A match path allows you to traverse the hierarchy between records—whether that
hierarchy exists between base objects (inter-table paths) or within a single base object
(intra-table paths). Match paths are used for configuring match column rules involving
related records in either separate tables or in the same table.


Foreign Key Relationships and Filters

Configuring match paths that point to other records involves two main components:

Component           Description
foreign key         Used to traverse the relationships to other records. Allows you to
relationships       specify parent-to-child and child-to-parent relationships.
filters (optional)  Allow you to selectively include or exclude records based on values in
                    a given column, such as ADDRESS_TYPE or PARTY_TYPE.
                    For more information, see “Configuring Filters for Match Paths” on
                    page 511.

Relationship Base Objects

In order to configure match rules for these kinds of relationships, particularly
many-to-many relationships, you need to create a separate base object that serves as a
relationship base object to describe to Siperian Hub the relationships between records. You
populate this relationship base object with information about the relationships using a
data management tool (outside of Siperian Hub) rather than using the Siperian Hub
processes (land, stage, and load, as described in Chapter 9, “Siperian Hub Processes”).

You configure a separate relationship base object for each type of relationship. You can
include additional attributes of the relationship type, such as start date, end date, and
other relationship details. The relationship base object defines a match path that
enables you to configure match column rules.

Important: Do not run the match and consolidation processes on a base object that is
used to define relationships between records in inter-table or intra-table match paths.
Doing so will change the relationship data, resulting in the loss of the associations
between records.

Inter-Table Paths

An inter-table path defines the relationship between records in two different base
objects. In many cases, this relationship can be defined simply by configuring a foreign
key relationship: a key column in the child base object points to the primary key of the
parent base object. For more information, see “Configuring Foreign-Key Relationships
Between Base Objects” on page 140.

In some cases, however, the relationship between records can be more complex,
requiring an intermediary base object that defines the relationship between records in
the two tables.

Example Base Objects for Inter-Table Paths

Consider the following example in which a Siperian Hub implementation has two base
objects:

Base Object   Description
Person        Contains any type of person, such as employees for your organization,
              employees for some other organizations (prospects, customers, vendors, or
              partners), contractors, and so on.
Address       Contains any type of address: mailing, shipping, home, work, and so on.

In this example, there is the potential for many-to-many relationships:


• A person could have multiple addresses, such as a home and work address.
• A single address could have multiple persons, such as a workplace or home.

In order to configure match rules for this kind of relationship between records in
different base objects, you would create a separate base object (such as PersonAddrRel)
that describes to Siperian Hub the relationships between records in the two base
objects.

Columns in the Example Base Objects

Suppose the Person base object had the following columns:

Column         Type          Description
ROWID_OBJECT   CHAR(14)      Primary key. Uniquely identifies this person in the
                             base object.
TYPE           CHAR(14)      Type of person, such as an employee or customer
                             contact.
NAME           VARCHAR(50)   Person’s name (simplified for this example).
EMPLOYER       VARCHAR(50)   Person’s employer.
...            ...           ...

Suppose the Address base object had the following columns:

Column         Type          Description
ROWID_OBJECT   CHAR(14)      Primary key. Uniquely identifies this address in the
                             base object.
TYPE           CHAR(14)      Type of address, such as a home, work, mailing, or
                             shipping address.
NAME           VARCHAR(50)   Name of the individual or organization residing at this
                             address.
ADDRESS_1      VARCHAR(50)   First address line.
ADDRESS_2      VARCHAR(50)   Second address line.
CITY           VARCHAR(50)   City.
STATE_PROV     VARCHAR(50)   State or province.
POSTAL_CODE    VARCHAR(50)   Postal code.
...            ...           ...

To define the relationship between records in the two base objects, the PersonAddrRel
base object could have the following columns:

Column         Type       Description
ROWID_OBJECT   CHAR(14)   Primary key. Uniquely identifies this relationship
                          record.
PERS_FK        CHAR(14)   Foreign key to the ROWID_OBJECT column in the
                          Person base object.
ADDR_FK        CHAR(14)   Foreign key to the ROWID_OBJECT column in the
                          Address base object.


Note that the column type of the foreign key columns—CHAR(14)—matches the
primary key to which they point.

Example Configuration Steps

After you have configured the relationship base object (PersonAddrRel), you would
complete the following tasks:
1. Configure foreign keys from this base object to the ROWID_OBJECT of the
Person and Address base objects. For more information, see “Configuring
Foreign-Key Relationships Between Base Objects” on page 140.

2. Load the PersonAddrRel base object with data that describes the relationships
between records, as shown in the following example.

ROWID_OBJECT   PERS_FK   ADDR_FK
1              380       132
2              480       920
3              786       432
4              786       980
5              12        1028
6              922       1028
7              1302      110
...            ...       ...


In this example, note that Person #786 has two addresses, and that Address #1028
has two persons.
3. Use the PersonAddrRel base object when configuring match column rules for the
related records. For more information, see “Configuring Match Column Rules for
Match Rule Sets” on page 542.
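
To make the many-to-many relationships in the example data easier to see, the following Python sketch groups the relationship rows by person and by address. It is only a way to inspect the sample data outside of Siperian Hub; the variable and function names are hypothetical.

```python
from collections import defaultdict

# Rows from the example relationship table: (ROWID_OBJECT, person key, address key)
PERSON_ADDR_REL = [
    (1, 380, 132),
    (2, 480, 920),
    (3, 786, 432),
    (4, 786, 980),
    (5, 12, 1028),
    (6, 922, 1028),
    (7, 1302, 110),
]


def group_relationships(rows):
    """Build person -> addresses and address -> persons lookups."""
    addresses_by_person = defaultdict(list)
    persons_by_address = defaultdict(list)
    for _rowid, person_key, address_key in rows:
        addresses_by_person[person_key].append(address_key)
        persons_by_address[address_key].append(person_key)
    return addresses_by_person, persons_by_address


addresses_by_person, persons_by_address = group_relationships(PERSON_ADDR_REL)
print(addresses_by_person[786])   # [432, 980]
print(persons_by_address[1028])   # [12, 922]
```

The two printed lists correspond to the observations in step 2: Person #786 has two addresses, and Address #1028 has two persons.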

Intra-Table Paths

Within a base object, parent/child relationships can exist between individual records.
Siperian Hub allows you to clarify relationships between records in the same base
object, and then use those relationships when configuring match column rules.

Example Base Object for Intra-Table Paths

Consider the following example of an Employee base object in which reporting
relationships exist between employees.

The relationships among employees are hierarchical. The CEO is at the top of the
hierarchy, representing what is called the global ultimate parent record.


Columns in the Example Base Object

Suppose the Employee base object had the following columns:

Column         Type          Description
ROWID_OBJECT   CHAR(14)      Primary key. Uniquely identifies this employee in the
                             base object.
NAME           VARCHAR(50)   Employee name.
TITLE          VARCHAR(50)   Employee’s job title.
...            ...           ...

Create a Relationship Base Object

In order to configure match rules for this kind of object, you would create a separate
base object to describe to Siperian Hub the relationships between records.
For example, you could create and configure an EmplRepRel base object with the
following columns:

Column          Type       Description
ROWID_OBJECT    CHAR(14)   Primary key. Uniquely identifies this relationship
                           record.
EMPLOYEE_FK     CHAR(14)   Foreign key to the ROWID_OBJECT of the
                           employee record.
REPORTS_TO_FK   CHAR(14)   Foreign key to the ROWID_OBJECT of a manager
                           record.

Note that the column type of the foreign key columns—CHAR(14)—matches the
primary key to which they point.

Example Configuration Steps

After you have configured this base object, you must complete the following tasks:


1. Configure foreign keys from this base object to the ROWID_OBJECT of the
Employee base object. For more information, see “Configuring Foreign-Key
Relationships Between Base Objects” on page 140.

2. Load this base object with data that describes the relationships between records, as
shown in the following example.

ROWID_OBJECT EMPLOYEE REPORTS_TO


1 7 93
2 19 71
3 24 82
4 29 82
5 31 82
6 31 71
7 48 16
8 53 12

Note that you can define many-to-many relationships between records. For
example, the employee whose ROWID_OBJECT is 31 reports to two different
managers (ROWID_OBJECT=82 and ROWID_OBJECT=71), while this
manager (ROWID_OBJECT=82) has three reports (ROWID_OBJECT=24, 29,
and 31).
3. Use the EmplRepRel base object when configuring match column rules for the
related records according to the instructions in “Configuring Match Column Rules
for Match Rule Sets” on page 542.
For example, you could create a match rule that takes into account the employee’s
manager to produce more accurate matches.
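
The reporting relationships in the example data can be checked with a short Python sketch. The helper names are hypothetical and the code only inspects the sample rows; it does not interact with Siperian Hub.

```python
# Rows from the example relationship table: (ROWID_OBJECT, employee, reports-to)
EMPL_REP_REL = [
    (1, 7, 93),
    (2, 19, 71),
    (3, 24, 82),
    (4, 29, 82),
    (5, 31, 82),
    (6, 31, 71),
    (7, 48, 16),
    (8, 53, 12),
]


def managers_of(employee, rows):
    """Return the managers that the given employee reports to."""
    return [manager for _rowid, emp, manager in rows if emp == employee]


def reports_of(manager, rows):
    """Return the employees who report to the given manager."""
    return [emp for _rowid, emp, mgr in rows if mgr == manager]


print(managers_of(31, EMPL_REP_REL))  # [82, 71]
print(reports_of(82, EMPL_REP_REL))   # [24, 29, 31]
```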

Note: This example used a REPORTS_TO field to define the relationship, but you
could use any piece of information to associate the records, even something more
generic and flexible like RELATIONSHIP_TYPE.

Navigating to the Paths Tab


To navigate to the Paths tab for a base object:
1. In the Schema Manager, navigate to the Match/Merge Setup Details dialog for the
base object that you want to configure. For more information, see “Navigating to
the Match/Merge Setup Details Dialog” on page 486.
2. Click the Paths tab.


The Schema Manager displays the Paths tab.

Sections of the Paths Tab

The Paths tab has two sections:

Section           Description
Path Components   Configure the foreign keys used to traverse the relationships. For more
                  information, see “Configuring Path Components” on page 507.
Filters           Configure filters used to include or exclude records for matching. For
                  more information, see “Configuring Filters for Match Paths” on page 511.

Root Base Object

The root base object is displayed automatically in the Path Components section of the
screen and is always available. The root base object represents an entity without child
or parent relationships. If you want to configure match rules that involve parent or
child records, you need to explicitly add path components to the root base object, and
these relationships must have been configured beforehand (see “Configuring
Foreign-Key Relationships Between Base Objects” on page 140).

Configuring Path Components


This section describes how to configure path components in the Schema Manager.
Path components provide a way to define the connection between parent and child
tables using foreign keys for the purpose of using columns from that table in a match
column.
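
As an illustration only, the following Python sketch shows the idea behind a child-to-parent path component: a foreign key is followed from one record to a related record so that a column from the related record can be used alongside the record's own columns. The table contents, the NAME column, and the helper functions are hypothetical and are not part of Siperian Hub; REPORTS_TO is the self-referencing foreign key from the earlier example.

```python
# Hypothetical in-memory stand-in for a base object with a self-referencing
# foreign key (REPORTS_TO). None of these names come from Siperian Hub itself.
employees = {
    1: {"NAME": "Pat Lee", "REPORTS_TO": None},
    2: {"NAME": "John Smith", "REPORTS_TO": 1},
    3: {"NAME": "Jon Smith", "REPORTS_TO": 1},
}


def follow_child_to_parent(record, table, fk_column):
    """Return the row that fk_column points to, or None if there is none."""
    parent_id = record.get(fk_column)
    return table.get(parent_id) if parent_id is not None else None


def match_values(record, table):
    """Collect the record's own column plus a column reached through the path."""
    manager = follow_child_to_parent(record, table, "REPORTS_TO")
    return {
        "employee_name": record["NAME"],
        "manager_name": manager["NAME"] if manager else None,
    }


print(match_values(employees[2], employees))
```

In this sketch the root record has no manager, so its manager_name is None; a record reached only through the root would need its own path component in the same way.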

Properties of Path Components

This section describes properties of path components.

Display Name

The name of this path component as it will be displayed in the Hub Console.

Physical Name

Actual name of the path component in the database. Siperian Hub will suggest a
physical name for the path component based on the display name that you enter.

Check For Missing Children

The Check for Missing Children check box instructs Siperian Hub to either allow for
missing child records (enabled, the default) or to require all parent records to have child
records.

Setting                Description
Enabled (Checked)      Use if some child records might be missing and you have rules that do
                       not include columns in the tables that might be missing records.
Disabled (Unchecked)   Use if all of your rules use the child columns and do not have null match
                       enabled. In this case, checking for missing children does not add any
                       value, and it can have a negative impact on performance.

If you are certain that your data is complete (parent records have child records), and
you include the parent in the child match rule, then inter-table matching works as
expected. However, if your data tends to contain parent records that are missing child
records, or if you do not include the parent column in the child match rule, you must
check (select) the Check for Missing Children check box in the path component
associated with this match column rule to ensure that an outer join occurs when
Siperian Hub checks for records to match.

Note: If the Check for Missing Children option is enabled, Siperian Hub performs an
outer join between the parent and child tables, which can have a performance impact.
Therefore, when not needed, it is more efficient to disable this option.
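
The difference between an inner join and an outer join can be sketched with Python's built-in sqlite3 module. This is a generic SQL illustration, not the SQL that Siperian Hub generates; the party and address tables and their columns are hypothetical.

```python
import sqlite3

# Hypothetical parent (party) and child (address) tables; party 2 has no child row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE party (party_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (address_id INTEGER PRIMARY KEY,
                          party_id INTEGER REFERENCES party(party_id),
                          city TEXT);
    INSERT INTO party VALUES (1, 'Anna Gonzales'), (2, 'John Smith');
    INSERT INTO address VALUES (10, 1, 'Austin');
""")

# Inner join: parents without children drop out of the result.
inner_rows = conn.execute(
    "SELECT p.name, a.city FROM party p "
    "JOIN address a ON a.party_id = p.party_id ORDER BY p.party_id"
).fetchall()

# Outer join: every parent is kept, with NULL (None) for missing child columns.
outer_rows = conn.execute(
    "SELECT p.name, a.city FROM party p "
    "LEFT OUTER JOIN address a ON a.party_id = p.party_id ORDER BY p.party_id"
).fetchall()

print(inner_rows)
print(outer_rows)
```

The outer join keeps the parent that has no child record, which is the behavior described for the enabled setting; the extra rows are also why it can cost more than the inner join.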

Constraints

Property         Description
Table            List of tables in the schema.
Direction        Direction of the foreign key:
                 • Parent-to-Child
                 • Child-to-Parent
                 • N/A
Foreign Key On   Column to which the foreign key points. This column can be either in a
                 different base object or the same base object.

Adding Path Components

To add a path component:


1. In the Schema Manager, navigate to the Paths tab according to the instructions in
“Navigating to the Paths Tab” on page 505.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the Path Components section, click the Add button.
The Schema Manager displays the Add Path Component dialog.

4. Specify the properties for this path component. For more information, see
“Properties of Path Components” on page 507.
5. Click OK.
6. Click the Save button to save your changes.

Editing Path Components

To edit a path component:


1. In the Schema Manager, navigate to the Paths tab according to the instructions in
“Navigating to the Paths Tab” on page 505.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the Path Components tree, select the path component that you want to edit.

4. In the Path Components section, click the Edit button.
The Schema Manager displays the Edit Path Component dialog.

5. Specify the properties for this path component. You can change the following
values:
• Display Name (see “Display Name” on page 507)
• Check for Missing Children (see “Check For Missing Children” on page 507)
6. Click OK.
7. Click the Save button to save your changes.

Deleting Path Components

You can delete path components but not the root base object. To delete a path
component:
1. In the Schema Manager, navigate to the Paths tab according to the instructions in
“Navigating to the Paths Tab” on page 505.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the Path Components tree, select the path component that you want to delete.
4. In the Path Components section, click the Delete button.
The Schema Manager prompts you to confirm deletion.
5. Click Yes.
6. Click the Save button to save your changes.

Configuring Filters for Match Paths


This section describes how to configure filters for match paths in the Schema Manager.

About Filters

In match paths, a filter allows you to selectively determine whether to include or exclude
records for matching based on values in a given column. When you define a filter for a
column, you specify the filter condition with one or more values that determine which
records qualify for match processing. For example, if you have an Address base object
that contains both shipping and billing addresses, you might configure a filter that
includes only billing addresses for matching and ignores the shipping addresses. During
execution, the match process will match records in the match batch with billing address
records only.

Filter Properties

In Siperian Hub, filters have the following properties.

Setting    Description
Column     Column to configure in the currently-selected base object.
Operator   Operator to use for this filter. One of the following values:
           • IN: Include records in which the column contains one of the specified values.
           • NOT IN: Exclude records in which the column contains one of the specified values.
Values     One or more values to use for this filter.

Example Filter

For example, if you wanted to match only on mailing addresses in an Address base
object, you could specify:

Setting    Example Value
Column     ADDR_TYPE
Operator   IN
Values     MAILING

In this example, only mailing addresses would qualify for matching, that is, records in
which the ADDR_TYPE column contains “MAILING”. All other records would be ignored.

Adding Filters

If you add multiple filters, Siperian Hub evaluates the entire expression using the
logical AND operator. For example,
xExpr AND yExpr AND zExpr
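
The following Python sketch illustrates how IN and NOT IN filters combined with a logical AND select records. It is a simplified model under stated assumptions, not Siperian Hub code: the helper functions and the COUNTRY column are hypothetical, and database-style handling of null values is not modeled.

```python
def passes_filter(record, column, operator, values):
    """Evaluate one filter (column, operator, values) against a record."""
    value = record.get(column)
    if operator == "IN":
        return value in values
    if operator == "NOT IN":
        return value not in values
    raise ValueError(f"unsupported operator: {operator}")


def qualifies(record, filters):
    """A record qualifies only if every filter passes (logical AND)."""
    return all(passes_filter(record, *f) for f in filters)


# Hypothetical address records; COUNTRY is an illustrative column.
addresses = [
    {"ADDR_TYPE": "MAILING", "COUNTRY": "US"},
    {"ADDR_TYPE": "SHIPPING", "COUNTRY": "US"},
    {"ADDR_TYPE": "MAILING", "COUNTRY": "XX"},
]
filters = [
    ("ADDR_TYPE", "IN", {"MAILING"}),
    ("COUNTRY", "NOT IN", {"XX"}),
]

selected = [a for a in addresses if qualifies(a, filters)]
print(selected)
```

Only the first record passes both filters; the second fails the ADDR_TYPE filter and the third fails the COUNTRY filter.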

To add a filter:
1. In the Schema Manager, navigate to the Paths tab according to the instructions in
“Navigating to the Paths Tab” on page 505.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the Filters section, click the Add button.
The Schema Manager displays the Add Filter dialog.

4. Specify the properties for this filter. For more information, see
“Filter Properties” on page 511.
5. Specify the value(s) for this filter according to the instructions in “Editing Values
for a Filter” on page 513.
6. Click the Save button to save your changes.

Editing Values for a Filter

To edit values for a filter:


1. Do one of the following:

• Add a filter. For more information, see “Adding Filters” on page 512.
• Edit filter properties. For more information, see “Editing Filter Properties” on
page 513.
2. In either the Add Filter or Edit Filter dialog, click the button next to the
Values field.
The Schema Manager displays the Edit Values dialog.
3. Configure the values for this filter.
• To add a value, click the button. When prompted, specify a value and
then click OK.

• To delete a value, select it in the Edit Values dialog, click the button, and
then click Yes when prompted to delete the value.
4. Click OK.
5. Click the Save button to save your changes.

Editing Filter Properties

To edit filter properties:


1. In the Schema Manager, navigate to the Paths tab according to the instructions in
“Navigating to the Paths Tab” on page 505.

2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the Filters section, select the filter that you want to edit, and then click the
Edit button.
The Schema Manager displays the Edit Filter dialog.

4. Specify the properties for this filter. For more information, see
“Filter Properties” on page 511.
5. Specify the value(s) for this filter according to the instructions in “Editing Values
for a Filter” on page 513.
6. Click the Save button to save your changes.

Deleting Filters

To delete a filter:
1. In the Schema Manager, navigate to the Paths tab according to the instructions in
“Navigating to the Paths Tab” on page 505.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the Filters section, select the filter that you want to delete, and then click the
button.
The Schema Manager prompts you to confirm deletion.
4. Click Yes.

Configuring Match Columns


This section describes how to configure match columns so that you can use them in
match column rules (see “Configuring Match Column Rules for Match Rule Sets” on
page 542). If you want to configure primary key match rules instead, see the
instructions in “Configuring Primary Key Match Rules” on page 578.

About Match Columns


A match column is a column that you want to use in a match rule, such as name or
address columns. Before you can use a column in rule definitions, you must first
designate it as a column that can be used in match rules, and provide information
about the data it contains. To learn more, see “Match Columns Depend on the Search
Strategy” on page 515.

Match Column Types

There are two types of columns used in match rules:

Column Type   Description
Fuzzy         Probabilistic match. Suitable for columns containing data that varies in
              spelling, abbreviations, word sequence, completeness, reliability, and
              other inconsistencies. Examples include street addresses and names of
              people or organizations.
Exact         Deterministic match. Suitable for columns containing consistent and
              predictable patterns. Exact match columns match only on identical data.
              Examples include IDs, postal codes, industry codes, or any other
              well-defined piece of information.
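
To make the distinction concrete, the following Python sketch contrasts a deterministic comparison with a similarity score. It uses difflib from the Python standard library purely as a stand-in; it is not the matching algorithm that Siperian Hub uses, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def exact_match(a, b):
    """Deterministic: true only when the two values are identical."""
    return a == b


def similarity(a, b):
    """Score between 0.0 and 1.0; tolerant of small spelling differences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# A postal code either matches or it does not.
print(exact_match("78701", "78701"))

# A name with a spelling variation fails the exact test but scores highly.
print(exact_match("Jon Smith", "John Smith"))
print(similarity("Jon Smith", "John Smith"))
```

An exact comparison suits well-defined values such as IDs, while a score-based comparison lets a rule accept values that differ slightly, at the cost of needing a threshold.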

Match Columns Depend on the Search Strategy

The types of match columns that you can configure depend on the type of the base
object that you are configuring (see “Exact-match and Fuzzy-match Base Objects” on
page 320). The type of base object is defined by the selected match / search strategy
(see “Match/Search Strategy” on page 493).

Match Strategy             Description
Fuzzy-match base objects   Allows you to configure fuzzy-match columns as well as
                           exact-match columns. For more information, see “Configuring
                           Match Columns for Fuzzy-match Base Objects” on page 519.
Exact-match base objects   Allows you to configure exact-match columns but not
                           fuzzy-match columns. For more information, see “Configuring
                           Match Columns for Exact-match Base Objects” on page 527.

Path Component

The path component is either the source table to use for a match column definition, or
the match path used to navigate a hierarchy of records. Match paths are used for
configuring match column rules involving related records in either separate tables or in
the same table. Before you can specify a path component, the match path must be
configured. For more information, see “Configuring Match Paths for Related Records”
on page 497.

To specify a path component for a match column:


1. Click the button next to the Path Component field.
The Schema Manager displays the Select Match Path Component dialog.

2. Select the match path component.


3. Click OK.

Field Types

For fuzzy-match columns, the field name drop-down list displays the following field
types. For more information, see “Adding a Fuzzy-match Column for Fuzzy-match Base
Objects” on page 523.
Field Types
Field Name Description
Address_Part1 Includes the part of address up to, but not including, the locality last
line. The position of the address components should be the normal
word order used in your data population. Pass this data in one field.
Depending on your base object, you may concatenate these
attributes into one field before matching. For example, in the US, an
Address_Part1 string includes the following fields: Care-of +
Building Name + Street Number + Street Name + Street Type +
Apartment Details. Address_Part1 uses methods and options
designed specifically for addresses.
Address_Part2 Locality line in an address. For example, in the US, a typical
Address_Part2 includes: City + State + Zip (+ Country). Matching
on Address_Part2 uses methods and options designed specifically
for addresses.
Attribute1, Attribute2 Two general purpose fields. These fields are matched using a general
purpose, string matching algorithm that compensates for
transpositions and missing characters or digits.
Date Matches any type of date, such as date of birth, expiry date, date of
contract, date of change, creation date, and so on. It expects the date
to be passed in Day+Month+Year format. It supports the use or
absence of delimiters between the date components. Matching on
dates uses methods and options designed specifically for dates. It
overcomes the typical error and variation found in this data type.
ID Matches any type of ID number, such as: Account number,
Customer number, Credit Card number, Drivers License number,
Passport, Policy number, SSN or other identity code, VIN, and so
on. It uses a string matching algorithm that compensates for
transpositions and missing characters or digits.
Organization_Name Matches the names of organizations, such as company names,
business names, institution names, department names, agency names,
trading names, and so on. This field supports matching on a single
name or on a compound name (such as a legal name and its trading
style). You may also use multiple names (for example, a legal name
and a trading style) in a single Organization_Name column for the
match.
Person_Name Matches the names of people. Use the full person name.
The position of the first name, middle names, and family names,
should be the normal word order used in your population. For
example, in English-speaking countries, the normal order is: First
Name + Middle Name(s) + Family Name(s). Depending on your
base object design, you can concatenate these fields into one field
before matching. This field supports matching on a single name, or
an account name (such as JOHN & MARY SMITH). You may also
use multiple names, such as a married name and a former name.
Postal_Area Can be used to place more emphasis on the postal code than if it
were included in the Address_Part2 field. It is for all types of postal
codes, including Zip codes. It uses a string matching algorithm that
compensates for transpositions and missing characters or digits.
Telephone_Number Used to match telephone numbers. It uses a string matching
algorithm that compensates for transpositions and missing digits or
area codes.

Selecting Multiple Columns for Matching

If you specify more than one column for matching:


• Values are concatenated into the field used by the match purpose, with a space
inserted between each value. For example, you can select first, middle, last, and
suffix columns in your base object. The concatenated fields will look like this
(a space follows the last word in the string):
first middle last suffix

For example:
Anna Maria Gonzales MD

• For data containing spaces or null data:


• If there are spaces in the data, then the spaces remain and the field is not
NULL.
• If all the fields are null, then the combined value is null.
• If any component of the combined field is null, then no extra space will be
added to replace the null.
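
The concatenation rules above can be sketched in Python as follows. This is an illustrative model of the described behavior, not Siperian Hub code; the function name is hypothetical and None stands in for a null column value.

```python
def concatenate_match_columns(values):
    """Combine column values as described above (None represents null).

    - All values null: the combined value is null.
    - Null components are skipped, so no extra space is added for them.
    - Each remaining value is followed by one space, so a space also
      follows the last word in the string.
    """
    if all(v is None for v in values):
        return None
    return "".join(f"{v} " for v in values if v is not None)


print(repr(concatenate_match_columns(["Anna", "Maria", "Gonzales", "MD"])))
print(repr(concatenate_match_columns(["Anna", None, "Gonzales", None])))
print(repr(concatenate_match_columns([None, None])))
```

A value that consists only of spaces is not null, so it is kept and the combined value is not null either.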

Note: Concatenating columns is not recommended for exact match columns.

Configuring Match Columns for Fuzzy-match Base Objects

Fuzzy-match base objects can have both fuzzy and exact-match columns.
For exact-match base objects instead, see “Configuring Match Columns for
Exact-match Base Objects” on page 527.

Navigating to the Match Columns Tab for a Fuzzy-match Base Object

To define match columns for a fuzzy-match base object:


1. In the Schema Manager, select the fuzzy-match base object that you want to
configure.
2. Click the Match/Merge Setup node. For more information, see “Navigating to
the Match/Merge Setup Details Dialog” on page 486.
3. Click the Match Columns tab.
The Schema Manager displays the Match Columns tab for the fuzzy-match base
object.

The Match Columns tab for a fuzzy-match base object has the following sections.

Property                Description
Fuzzy Match Key         Properties for the fuzzy match key. For more information, see
                        “Configuring Fuzzy Match Key Properties” on page 521.
Match Columns           Match columns and their properties:
                        • Field Name (see “Field Types” on page 517)
                        • Column Type (see “Match Column Types” on page 515)
                        • Path Component (see “Path Component” on page 516)
                        • Source Table: table referenced in the path component, or the
                          base object (if the path component is root)
Match Column Contents   List of available columns in the base object, as well as columns that
                        have been selected for match.

Configuring Fuzzy Match Key Properties

This section describes how to configure the match column properties for fuzzy-match
base objects (see “Match/Search Strategy” on page 493).

Key Types

The match key type describes important characteristics about a column to Siperian Hub.
Siperian Hub has some intelligence about names and addresses, so this information
helps Siperian Hub generate keys correctly and conduct better searches. This is the
main criterion for the search that builds the initial list of potential match candidates.
This key type should be based on the main type of data that is in the physical column(s)
that make up the fuzzy match key.

For a fuzzy-match base object, you can select one of the following key types:

Key Type            Description
Person_Name         Used if your fuzzy match key contains data for individuals only.
Organization_Name   Used if your fuzzy match key contains data for organizations only, or
                    if it contains data for both organizations and individuals.
Address_Part1       Used if your fuzzy match key contains address data to be
                    consolidated.

Note: Key types are based on the population you select. The above list of key types
applies to the default population (US). Other populations might have different key
types. If you require another population, contact Siperian support.

Key Widths

The match key width determines how fast the searches are, the number of possible match
candidates returned, and how much disk space the keys consume. Key widths apply to
fuzzy-match base objects only.

Key Width Description


Standard Appropriate for most fuzzy match keys, balancing reliability and space usage.
Extended Might result in more match candidates, but at the cost of longer processing
time to generate keys. This option provides some additional matching
capability due to the concatenation of columns. This key width works best
when:
• your data set is not extremely large
• your data set is not complete
• you have sufficient resources to handle the processing time and disk
space requirements
Limited Trades some match reliability for disk space savings. This option might result
in fewer match candidates, but searches can be faster. This option works well
if you are willing to undermatch for faster searches that use less disk space for
the keys. Limited keys match fewer records with word-order variations than
standard keys. This choice provides a subset of the Standard key set, but might
be the best option if disk space is restricted or the data volume is extremely
large.
Preferred Generates a single key per base object record. This option trades some match
reliability for performance (reduces the number of matches that need to be
performed) and disk space savings (reduces the size of the match key table).
Depending on characteristics of the data, a preferred key width might result in
fewer match candidates.

Steps to Configure Fuzzy Match Key Properties

To configure fuzzy match key properties for a fuzzy-match base object:


1. In the Schema Manager, navigate to the Match Columns tab according to the
instructions in “Navigating to the Match Columns Tab for a Fuzzy-match Base
Object” on page 519.

2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Configure the following settings for this fuzzy-match base object.

Property         Description
Key Type         Type of field primarily used in the match. This is the main criterion
                 for the search that builds the initial list of potential match
                 candidates. This key type should be based on the main type of data
                 stored in the base object. For more information, see “Key Types”
                 on page 521.
Key Width        Size of the search range for which keys are generated. For more
                 information, see “Key Widths” on page 522.
Path Component   Path component for this fuzzy match key. This is a table containing
                 the column(s) to designate as the key type: Base Object, Child Base
                 Object table, or Cross-reference table. For more information, see
                 “Path Component” on page 516.

4. Click the Save button to save your changes.

Adding a Fuzzy-match Column for Fuzzy-match Base Objects

To define a fuzzy-match column for a fuzzy-match base object:


1. In the Schema Manager, navigate to the Match Columns tab. For more
information, see “Navigating to the Match Columns Tab for a Fuzzy-match Base
Object” on page 519.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. To add a fuzzy-match column, click the button.
The Schema Manager displays the Add Fuzzy-match Column dialog.

4. Specify the following settings.

Property               Description
Match Path Component   Match path component for this fuzzy-match column. For a
                       fuzzy-match column, the source table can be the parent table, a
                       parent cross-reference table, or any child base object table. For
                       more information, see “Path Component” on page 516.
Field Name             Name of this field as it will be displayed in the Hub Console. For
                       fuzzy match columns, this is a drop-down list where you can select
                       the type of data in the match column being defined, as described in
                       “Field Types” on page 517.

5. Specify the base object column(s) for the fuzzy match.


To add a column to the Selected Columns list, select a column name and then click
the right arrow button.
Note: If you add multiple columns, the values are concatenated, with a separator
space between values. For more information, see “Selecting Multiple Columns for
Matching” on page 518.
6. Click OK.
The Schema Manager adds the match column to the Match Columns list.
7. Click the Save button to save your changes.

Adding Exact-match Columns for Fuzzy-match Base Objects

To define an exact-match column for a fuzzy-match base object:


1. In the Schema Manager, navigate to the Match Columns tab. For more
information, see “Navigating to the Match Columns Tab for a Fuzzy-match Base
Object” on page 519.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. To add an exact-match column, click the button.
The Schema Manager displays the Add Exact-match Column dialog.

4. Specify the following settings.

Property               Description
Match Path Component   Match path component for this exact-match column. For an
                       exact-match column, the source table can be the parent table and/or
                       child physical columns. For more information, see “Path
                       Component” on page 516.
Field Name             Name of this field as it will be displayed in the Hub Console.

5. Specify the base object column(s) for the exact match.


To add a column to the Selected Columns list, select a column name and then click
the right arrow.
Note: If you add multiple columns, the values are concatenated, with a separator
space between values. For more information, see “Selecting Multiple Columns for
Matching” on page 518.
Note: Concatenating columns is not recommended for exact match columns.
6. Click OK.
The Schema Manager adds the match column to the Match Columns list.
7. Click the Save button to save your changes.

Editing Match Column Properties for Fuzzy-match Base Objects

Instead of editing match column properties, you must:


• delete the match column, as described in “Deleting Match Columns for
Fuzzy-match Base Objects” on page 526
• add a new match column, specifying the settings that you want, as described in
“Adding Exact-match Columns for Fuzzy-match Base Objects” on page 525

Deleting Match Columns for Fuzzy-match Base Objects

To delete a match column for a fuzzy-match base object:


1. In the Schema Manager, navigate to the Match Columns tab. For more
information, see “Navigating to the Match Columns Tab for a Fuzzy-match Base
Object” on page 519.

2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the Match Columns list, select the match column that you want to delete.
4. Click the button.
The Schema Manager prompts you to confirm deletion.
5. Click Yes.
6. Click the Save button to save your changes.

Configuring Match Columns for Exact-match Base Objects

Before you define match column rules, you must define the match columns on which
they will be based. Exact-match base objects can have only exact-match columns. For
more information about configuring match columns for fuzzy-match base objects
instead, see “Configuring Match Columns for Fuzzy-match Base Objects” on page 519.

Navigating to the Match Columns Tab for an Exact-match Base Object

To define match columns for an exact-match base object:


1. In the Schema Manager, display the Match/Merge Setup Details dialog for the
exact-match base object that you want to configure. For more information, see
“Navigating to the Match/Merge Setup Details Dialog” on page 486.
2. Click the Match Columns tab.
The Schema Manager displays the Match Columns tab for the exact-match base
object.

The Match Columns tab for an exact-match base object has the following sections.

Property                Description
Match Columns           Match columns and their properties:
                        • Field Name
                        • Column Type (see “Match Column Types” on page 515)
                        • Path Component (see “Path Component” on page 516)
                        • Source Table: table referenced in the path component, or the
                          base object (if the path component is root)
Match Column Contents   List of available columns and columns selected for matching.

Adding Match Columns for Exact-match Base Objects

You can add only exact-match columns for exact-match base objects. Fuzzy-match
columns are not allowed.

To add an exact-match column for an exact-match base object:


1. In the Schema Manager, navigate to the Match Columns tab. For more
information, see “Navigating to the Match Columns Tab for an Exact-match Base
Object” on page 527.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. To add an exact-match column, click the button.
The Schema Manager displays the Add Exact-match Column dialog.

4. Specify the following settings.

Property               Description
Match Path Component   Match path component for this exact-match column. For an
                       exact-match column, the source table can be the parent table and/or
                       child physical columns. For more information, see “Path
                       Component” on page 516.
Field Name             Name of this field as it will be displayed in the Hub Console.

5. Specify the base object column(s) for the exact match.


To add a column to the Selected Columns list, select a column name and then click
the right arrow.
Note: If you add multiple columns, the values are concatenated, with a separator
space between values. For more information, see “Selecting Multiple Columns for
Matching” on page 518.
Note: Concatenating columns is not recommended for exact match columns.
6. Click OK.
The Schema Manager adds the selected match column(s) to the Match Columns
list.
7. Click the Save button to save your changes.

Editing Match Column Properties for Exact-match Base Objects

Instead of editing match column properties, you must:


1. Delete the match column, as described in “Deleting Match Columns for
Exact-match Base Objects” on page 531.
2. If you want to add a match column with the same name, click the Save button to
save your changes first.
3. Add a new match column, specifying the settings that you want, as described in
“Adding Match Columns for Exact-match Base Objects” on page 529.

Deleting Match Columns for Exact-match Base Objects

To delete a match column for an exact-match base object:


1. In the Schema Manager, navigate to the Match Columns tab. For more
information, see “Navigating to the Match Columns Tab for an Exact-match Base
Object” on page 527.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the Match Columns list, select the match column that you want to delete.
4. Click the button.
The Schema Manager prompts you to confirm deletion.
5. Click Yes.
6. Click the Save button to save your changes.

Configuring Match Rule Sets


This section describes how to configure match rule sets for your Siperian Hub
implementation.

About Match Rule Sets


A match rule set is a logical collection of match column rules (see “Configuring Match
Column Rules for Match Rule Sets” on page 542) that have some properties in
common. Match rule sets are associated with match column rules only—not primary
key match rules (which are described in “Configuring Primary Key Match Rules” on
page 578).

Match rule sets allow you to execute different sets of match column rules at different
times. The match process uses only one match rule set per execution. To match using a
different match rule set, the match rule set must be selected and the match process
must be executed again.

Note: Only one match column rule in the match rule set needs to succeed in order to
declare a match between records.
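
The note above can be sketched in Python as a logical OR across the rules in a rule set. This is a simplified model, not Siperian Hub code: it assumes, for illustration, that a rule is a list of columns that must all be non-null and identical, and the column names and helper functions are hypothetical.

```python
def rule_matches(rule_columns, left, right):
    """Assumed rule semantics: every listed column is non-null and identical."""
    return all(
        left.get(c) is not None and left.get(c) == right.get(c)
        for c in rule_columns
    )


def rule_set_matches(rule_set, left, right):
    """Two records match as soon as any one rule in the set succeeds."""
    return any(rule_matches(cols, left, right) for cols in rule_set)


# Hypothetical rule set with two rules.
rule_set = [
    ["SSN"],
    ["LAST_NAME", "POSTAL_CODE"],
]

a = {"SSN": "111", "LAST_NAME": "Smith", "POSTAL_CODE": "78701"}
b = {"SSN": "222", "LAST_NAME": "Smith", "POSTAL_CODE": "78701"}
c = {"SSN": "333", "LAST_NAME": "Jones", "POSTAL_CODE": "78701"}

print(rule_set_matches(rule_set, a, b))
print(rule_set_matches(rule_set, a, c))
```

Records a and b fail the first rule but satisfy the second, so they are declared a match; records a and c satisfy neither rule.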

What Match Rule Sets Specify

Match rule sets include:


• a search level that dictates the search strategy
• any number of automatic and manual match column rules
• optionally, a filter that allows you to selectively include or exclude records from the
match batch during the match process

Multiple Match Rule Sets and the Specified Default

You can configure any number of rule sets. When users want to run the Match batch
job, they select one rule set from the list of rule sets that have been defined for the base
object.

For more information about choosing match rule sets, see “Selecting a Match Rule Set”
on page 737.

In the Schema Manager, you designate one match rule set as the default.

[Figure: list of match rule sets, with the default rule set marked by an asterisk (*)]

When to Use Match Rule Sets

Match rule sets allow you to accommodate different match column rule requirements
at different times. For example, you might use one match rule set for an initial data load
and a different match rule set for subsequent incremental loads. Similarly, you might
use one match rule set to process all records, and another match rule set with a filter to
process just a subset of records (see “Filtering SQL” on page 536).

Rule Set Evaluation

Before saving any changes to a match rule set (including any changes to match rules in
the match rule set), the Schema Manager analyzes the match rule set and prompts you
with a warning message if the match rule set has any issues, as shown in the following
example.

Note: This is only a warning message. You can choose to ignore the message and save
changes anyway.

Example issues include a match rule set that:


• is identical to an already existing match rule set
• is empty—no match column rules have been added
• contains no fuzzy-match column rules for a fuzzy-match base object
• contains one or more fuzzy-match columns but no exact-match column (can
impact match performance)
• contains fuzzy and exact-match columns with the same source columns
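
The checks listed above can be sketched in code. The following Python fragment is an illustration only, not the Schema Manager's implementation: the MatchColumn and MatchRuleSet classes, the evaluate_rule_set function, and the warning texts are all hypothetical, and the model flattens a rule set into a plain list of match columns.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MatchColumn:
    name: str
    kind: str                  # "fuzzy" or "exact" (hypothetical labels)
    source_columns: frozenset  # base object columns that feed this match column


@dataclass
class MatchRuleSet:
    name: str
    columns: list = field(default_factory=list)


def evaluate_rule_set(rule_set, other_sets, fuzzy_base_object):
    """Return warning strings for the example issues (hypothetical model)."""
    warnings = []
    fuzzy = [c for c in rule_set.columns if c.kind == "fuzzy"]
    exact = [c for c in rule_set.columns if c.kind == "exact"]

    if not rule_set.columns:
        warnings.append("rule set is empty")
    elif any(other.columns == rule_set.columns for other in other_sets):
        warnings.append("identical to an existing rule set")

    if rule_set.columns and fuzzy_base_object and not fuzzy:
        warnings.append("no fuzzy-match columns for a fuzzy-match base object")
    if fuzzy and not exact:
        warnings.append("fuzzy-match columns but no exact-match column")

    # Collect the source columns used by each kind and look for overlap.
    fuzzy_sources = set().union(*(c.source_columns for c in fuzzy))
    exact_sources = set().union(*(c.source_columns for c in exact))
    if fuzzy_sources & exact_sources:
        warnings.append("fuzzy and exact-match columns share source columns")
    return warnings
```

As in the Schema Manager, the result is advisory: the caller can show the warnings and still save the rule set.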

Match Rule Set Properties


This section describes the properties for match rule sets.

Name

The name of the rule set. Specify a unique, descriptive name.

Search Levels

Used with fuzzy-match base objects only. When you configure a match rule set, you
define a search level that instructs Siperian Hub on how stringently and thoroughly to
search for candidate matches.

The goal of the match process is to find the optimal number of matches for your data:
• not too few (called undermatching), which misses relevant matches, or
• not too many (called overmatching), which generates too many matches, including
matches that are not relevant

For any name or address in a fuzzy match key, Siperian Hub uses the defined search
level to generate different key ranges for the purpose of determining which records are
possible match candidates—and to which records the match column rules will be
applied.

You can choose one of the following search levels:

Search Level   Description

Narrow         Most stringent level in searching for possible match candidates.
               This search level is fast, but it can result in fewer matches
               than other search levels might generate and possibly result in
               undermatching. Narrow can be appropriate if your data set is
               relatively correct and complete, or for very large data sets
               with highly matchy data.
Typical        Appropriate for most rule sets.
Exhaustive     Generates a larger set of possible match candidates than the
               Typical level. This can result in more matches than other search
               levels might generate, possibly result in overmatching, and take
               more time. This level might be appropriate for smaller data sets
               that are less complete.
Extreme        Generates a still larger set of possible match candidates, which
               can result in overmatching and take much more time. This level
               might be appropriate for smaller data sets that are less
               complete, or to identify the highest possible number of matching
               records.

The search level you choose should be determined by the size of your data set, your
time constraints, and how critical the matches are. Depending on your circumstances
and requirements, it is sometimes more appropriate to undermatch, while at other
times, it is more appropriate to overmatch. Implementations dealing with relatively
reliable and complete data can use the Narrow level, while implementations dealing
with less reliable data or with more critical problems should use Exhaustive or
Extreme.

The search level might also differ depending on the phase of a project. It might be
necessary to use a looser level (Exhaustive or Extreme) for initial matching, and then
tighten the level as the data is deduplicated.

Enable Search by Rules

This setting specifies whether searching by rules is enabled (checked) or not
(unchecked, the default). Used with fuzzy-match base objects only and applies only to
the SIF searchMatch request. The searchMatch request searches for records in a
package based on match column and rule definitions. The searchMatch request uses
the columns in these records to generate match columns that are used by the match
server to find match candidates. For more information about searchMatch, see the
Siperian Services Integration Framework Guide and the Siperian Hub Javadoc.

By default, when an application calls the SIF searchMatch request, all possible match
columns are generated from the package or mapping records specified in the request,
and the match is performed by treating all columns with equal weight. You can enable
this option, however, to allow applications to specify input match columns, in which
case the searchMatch API ignores any columns that were not passed as part of the
request. You might use this feature if, for example, you were using a custom population
definition and wanted to call the searchMatch API with a particular set of rules.

Enable Filtering

Specifies whether filtering is enabled for this match rule set.


• If checked (selected), allows you to define a filter (see “Filtering SQL” on page
536) for this match rule set. When running a Match job, users can select the match
rule set (see “Selecting a Match Rule Set” on page 737) with a filter defined so that
the Match job processes only the subset of records that meet the filter criteria.
• If unchecked (not selected), then all records will be processed by the match rule set
when the Match batch job runs.

For example, if you had an Organization base object that contained multiple types of
organizations (customers, vendors, prospects, partners, and so on), you could define
different match rule sets that selectively processed only the type of records you want to
match: MatchAll (no filter), MatchCustomersOnly, MatchVendorsOnly, and so on.

Filtering SQL

By default, when the Match batch job is run (see “Match Jobs” on page 734), the
match rule set processes all records. If the Enable Filtering check box (see “Enable
Filtering” on page 536) is selected (checked), you can specify a filter condition to
restrict processing to only those records that meet the filter condition. A filter is analogous
to a WHERE clause in a SQL statement. The filter expression can be any expression
that is valid for the WHERE clause syntax used in your database platform.

Note: The match rule set filter is applied to the base object records that are selected
for the match batch only (the records to match from)—not the records in the match pool
(the records to match to). For more information, see “Flagging the Match Batch” on
page 329.

For example, suppose your implementation had an Organization base object that
contained multiple types of organizations (customers, vendors, prospects, partners, and
so on). Using filters, you could define a match rule set (MatchCustomersOnly) that
processed customer data only.
org_type='C'

All other, non-customer records would be ignored and not processed by the Match job.
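
To illustrate how such a filter behaves like a WHERE clause, the following Python sketch runs the same expression against an in-memory SQLite table. It is only an analogy: SQLite stands in for your database platform, and the table name c_organization, its columns, and the select_match_batch function are hypothetical names, not part of Siperian Hub.

```python
import sqlite3


def select_match_batch(conn, table, filter_sql=None):
    """Return the IDs of the records to match from, optionally filtered.

    The filter text is appended as a WHERE clause. It is trusted
    administrator input here, so it is not parameterized.
    """
    query = f"SELECT rowid_object FROM {table}"
    if filter_sql:
        query += f" WHERE {filter_sql}"
    query += " ORDER BY rowid_object"
    return [row[0] for row in conn.execute(query)]


# Hypothetical Organization base object with several organization types.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c_organization (rowid_object INTEGER, org_name TEXT, org_type TEXT)"
)
conn.executemany(
    "INSERT INTO c_organization VALUES (?, ?, ?)",
    [(1, "Acme", "C"), (2, "Bolt Supply", "V"), (3, "Cobalt", "C"), (4, "Delta", "P")],
)

all_records = select_match_batch(conn, "c_organization")                      # MatchAll
customers_only = select_match_batch(conn, "c_organization", "org_type='C'")   # MatchCustomersOnly
```

With the filter in place, only the two customer records (IDs 1 and 3) are selected; without it, all four records are selected.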

Note: It is the administrator’s responsibility to specify an appropriate SQL expression
that correctly filters records during the Match job. The Schema Manager validates the
SQL syntax according to your database platform, but it does not check the logic or
suitability of your filter condition.

Match Rules

This area of the window displays a list of match column rules that have been
configured for the selected match rule set. For more information, see “Configuring
Match Column Rules for Match Rule Sets” on page 542.

Navigating to the Match Rule Set Tab


To navigate to the Match Rule Set tab:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base
object that you want to configure. For more information, see “Navigating to the
Match/Merge Setup Details Dialog” on page 486.
2. Click the Match Rule Sets tab.

The Schema Manager displays the Match Rule Sets tab for the selected base object.

The Match Rule Sets tab consists of the following sections:

Section           Description

Match Rule Sets   List of configured match rule sets.
Properties        Properties for the selected match rule set.

Adding Match Rule Sets


To add a new match rule set:
1. In the Schema Manager, display the Match Rule Sets tab in the Match/Merge
Setup Details dialog for the base object that you want to configure. For more
information, see “Navigating to the Match Rule Set Tab” on page 537.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Add button.
The Schema Manager displays the Add Match Rule Set dialog.

4. Enter a unique, descriptive name for this new match rule set.
5. Click OK.
The Schema Manager adds the new match rule set to the list.
6. Configure the match rule set according to the instructions in the next section,
“Editing Match Rule Set Properties” on page 539.

Editing Match Rule Set Properties


To edit the properties of a match rule set:
1. In the Schema Manager, display the Match Rule Sets tab in the Match/Merge
Setup Details dialog for the base object that you want to configure. For more
information, see “Navigating to the Match Rule Set Tab” on page 537.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the match rule set that you want to configure.
The Schema Manager displays its properties in the properties panel.
• The following example shows the properties for a fuzzy-match base object.
• The following example shows the properties for an exact-match base object.

4. Configure properties for this match rule set. For more information, see “Match
Rule Set Properties” on page 534.
5. Configure match columns for this match rule set according to the instructions in
“Configuring Match Column Rules for Match Rule Sets” on page 542.
6. Click the Save button to save your changes.
Before saving changes, the Schema Manager analyzes the match rule set and
prompts you with a message if the match rule set contains certain issues.
For more information, see “Rule Set Evaluation” on page 533.
7. If you are prompted to confirm saving changes, click the OK button to save your
changes.

Renaming Match Rule Sets


To rename a match rule set:
1. In the Schema Manager, display the Match Rule Sets tab in the Match/Merge
Setup Details dialog for the base object that you want to configure. For more
information, see “Navigating to the Match Rule Set Tab” on page 537.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the match rule set that you want to rename.
4. Click the Edit button.
The Schema Manager displays the Edit Rule Set Name dialog.

5. Specify a unique, descriptive name for this match rule set.


6. Click OK.
The Schema Manager updates the name of the match rule set in the list.

Deleting Match Rule Sets


To delete a match rule set:
1. In the Schema Manager, display the Match Rule Sets tab in the Match/Merge
Setup Details dialog for the base object that you want to configure. For more
information, see “Navigating to the Match Rule Set Tab” on page 537.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Select the name of the match rule set that you want to delete.
4. Click the Delete button.
The Schema Manager prompts you to confirm deletion.
5. Click Yes.
The Schema Manager removes the deleted match rule set, along with all of the
match column rules it contains, from the list.

Configuring Match Column Rules for Match Rule Sets

This section describes how to configure match column rules for a match rule set in
your Siperian Hub implementation. For more information about match rule sets, see
“Configuring Match Rule Sets” on page 531. For more information about the
difference between match column rules and primary key rules, see “Configuring
Primary Key Match Rules” on page 578.

About Match Column Rules


A match column rule determines what constitutes a match during the match process.
Match column rules determine whether two records are similar enough to consolidate.
Each match rule is defined as a set of one or more match columns that it needs to
examine for points of similarity. Match rules are configured by setting the conditions
for identifying matching records within and across source systems. For more
information, see “About the Match Process” on page 317.

Prerequisites for Configuring Match Column Rules

You can configure match column rules only after you have:
• configured the columns that you intend to use in your match rules, as described in
“Configuring Match Columns” on page 515
• created at least one match rule set, as described in “Configuring Match Rule Sets”
on page 531

Match Column Rules Differ Between Exact-Match and Fuzzy-Match Base Objects

The properties for match column rules differ between exact-match and fuzzy-match
base objects (see “Exact-match and Fuzzy-match Base Objects” on page 320).
• For exact-match base objects, you can configure only exact column types.
• For fuzzy-match base objects, you can configure fuzzy or exact column types.
For more information, see “Match Rule Properties for Fuzzy-match Base Objects
Only” on page 544.

Specifying Consolidation Options for Matched Records

For each match column rule, decide whether matched records should be automatically or
manually consolidated. For more information, see “Specifying Consolidation Options
for Match Column Rules” on page 574 and “Consolidating Records Automatically or
Manually” on page 336.

Match Rule Properties for Fuzzy-match Base Objects Only

This section describes match rule properties for fuzzy-match base objects.
These properties do not apply to exact-match base objects.

Match / Search Strategy

For fuzzy-match base objects, the match / search strategy defines the strategy that Siperian
Hub uses for searching and matching in the match rule. Select one of the following
options:

Strategy Option   Description

Fuzzy             Probabilistic match that takes into account spelling
                  variations, possible misspellings, and other differences
                  that can make matching records non-identical.
Exact             Matches only records that are identical.

Certain configuration settings on the Match / Merge Setup tab apply to only one type
of column. In this document, such features are indicated with a graphic that shows
whether it applies to fuzzy-match columns only (as in the following example), or
exact-match columns only. No graphic means that the feature applies to both.

The match / search strategy determines how to match candidate A with candidate B
using fuzzy or exact methods. The match / search strategy can affect the quantity and
quality of the match candidates. An exact match / search strategy requires clean and
complete data; it might miss some matches if the data is less clean, incomplete, or full
of duplicates. When defining match rule properties, you must find the optimal balance
between finding all possible candidates and not encumbering the process with too many
irrelevant candidates.

Note: This match / search strategy is configured at the match rule level. For more
information about the match / search strategy configured at the base object level
(which determines whether it is a fuzzy-match base object or exact-match base object),
see “Match/Search Strategy” on page 493.

When specifying the match / search strategy for a fuzzy-match base object, consider
the implications of configuring the following types of match rules:

Type of Match Rule                 Applies to

Fuzzy - Fuzzy Search Strategy      Fuzzy and exact-match columns.
Exact - Exact Search Strategy      Exact-match columns only. This option bypasses
                                   the fuzziness of the base object and executes
                                   a simple exact match rule on a fuzzy base
                                   object.
Filtered - Fuzzy Search Strategy   Exact-match columns only. This option uses the
                                   fuzzy match key as a filter, and then applies
                                   the exact match rule.

Match Purpose

For fuzzy-match base objects, the match purpose defines the primary goal behind a match
rule. For example, if you're trying to identify matches for people where address is an
important part of determining whether two records are for the same person, then you
would choose the Match Purpose called Resident.

For every match rule you define, you must choose the purpose of the rule from a list of
predefined match purposes provided by Siperian. Each match purpose contains
knowledge about how best to compare two records to achieve the purpose of the
match. Siperian Hub uses the selected match purpose as a basis for applying the match
rules to determine matched records. The behavior of the rules is dependent on the
selected purpose. The list of available match purposes depends on the population used,
as described in “Fuzzy Population” on page 494.

What the Match Purpose Determines

The match purpose determines:


• how your match rules behave
• which columns are required
• how much attention Siperian Hub pays to each of the columns used in the match
process

Two rules with all attributes identical (except for the purpose) will return different sets
of matches because of the different purpose.

Mandatory and Optional Fields

Each match purpose supports a combination of mandatory and optional fields. Each
field is weighted according to its influence in the match decision. Some fields in some
purposes may be grouped. There are two types of groupings:
• Required—requires at least one of the field members to be non-null
• Best of—contributes only the best score from the fields in the group to the overall
match score

For example, in the Individual match purpose:


• Person_Name is a mandatory field
• One of either ID Number or Date of Birth is required
• Other attributes are optional

The overall score returned by each purpose is calculated by adding the participating
field scores multiplied by their respective weight and divided by the total of all field
weights. If a field is optional and is not provided, it is not included in the weight
calculation.
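
The calculation described above amounts to a weighted average over the fields that participate. The following Python sketch shows the arithmetic only. The weights and field scores are invented for illustration, and the overall_score function is a hypothetical name. The actual weights are defined by the match purpose and are not listed in this guide.

```python
def overall_score(field_scores, weights):
    """Weighted average of field scores.

    A score of None means an optional field was not provided, so the
    field is left out of both the numerator and the total weight.
    """
    participating = {f: s for f, s in field_scores.items() if s is not None}
    total_weight = sum(weights[f] for f in participating)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(score * weights[f] for f, score in participating.items())
    return weighted_sum / total_weight


# Illustrative numbers only.
weights = {"Person_Name": 5, "ID": 3, "Date": 2}
scores = {"Person_Name": 90, "ID": 80, "Date": None}  # Date not provided

score = overall_score(scores, weights)  # (90*5 + 80*3) / (5 + 3) = 86.25
```

Because Date is missing, its weight of 2 is excluded from the divisor, so the result is 690 / 8 rather than 690 / 10.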

Name Formats

Siperian Hub match has the concept of a default name format which tells it where to
expect the last name. The options are:
• Left—last name is at the start of the full name, for example Smith Jim
• Right—last name is at the end of the full name, for example, Jim Smith

The name format used by Siperian Hub depends on the purpose that you're using.
If you are using Organization, then the default is Last name, First name, Middle name.
If using Person/Resident then the default is First Middle Last.

Bear this in mind when formatting data for matching. It might not make a big
difference, but there are edge cases where it helps, particularly for names that do not
fall within the selected population.

List of Match Purposes

Siperian supplies the following match purposes:


Match Purpose Settings
Match Purpose Description
Person_Name This purpose is for matches intended to identify a person by name. This
purpose is best suited to online searches when a name-only lookup is
required and a human is available to make the choice. Matching in batch
typically requires other attributes in addition to name to make match
decisions. Use this purpose only when the rule does not contain address
fields. This purpose will allow matches between people with an address
and those without an address. If the rules contain address fields, use the
Resident purpose instead.
This purpose uses the following fields:
• Person_Name (Required)
• Address_Part1
• Address_Part2
• Postal_Area
• Telephone_Number
• ID
• Date
• Attribute1
• Attribute2
Unless otherwise indicated, fields are not required.
To achieve a “best of” score between Address_Part2 and Postal_Area,
use Postal_Area as a repeat value in the Address_Part2 field.

Individual This purpose is intended to identify a specific individual by name and
with either the same ID number or date of birth attributes.
Since this purpose requires additional information, it is typically used
after a search by Person_Name.
This purpose uses the following fields:
• Person_Name (Required)
• ID (either ID or Date is required; using both is acceptable)
• Date
• Attribute1
• Attribute2
Unless otherwise indicated, fields are not required.
Resident Intended to identify a person at an address. This purpose is typically used
after a search by either Person_Name or Address_Part1. Optional input
fields help qualify or rank a match if more information is available.
To achieve a “best of” score between Address_Part2 and Postal_Area,
pass Postal_Area as a repeat value in the Address_Part2 field.
This purpose uses the following fields:
• Person_Name (Required)
• Address_Part1 (Required)
• Address_Part2
• Postal_Area
• Telephone_Number
• ID
• Date
• Attribute1
• Attribute2
Unless otherwise indicated, fields are not required.

Household Designed to identify matches where individuals with the same or similar
family names share the same address.
This purpose is typically used after a search by Address_Part1. (Note: it
is not practical to search by Person_Name because ultimately only one
word from the Person_Name must match, and a one-word search will
not perform well in most situations).
Emphasis is placed on the Last Name, the major word of the Person_Name
field, so this is one of the few cases where word order is important in
the way the records are presented for matching.
However, a reasonable score will be generated provided that a match
occurs between the major word in one name and any other word in the
other name.
This purpose uses the following fields:
• Person_Name (Required)
• Address_Part1 (Required)
• Address_Part2
• Postal_Area
• Telephone_Number
• Attribute1
• Attribute2
Unless otherwise indicated, fields are not required.
To achieve a “best of” score between Address_Part2 and Postal_Area,
pass Postal_Area as a repeat value in the Address_Part2 field.

Family Designed to identify matches where individuals with the same or similar
family names share the same address or the same telephone number.
This purpose is typically used after a tiered search (multi-search) by
Address_Part1 and Telephone_Number. (Note: it is not practical to
search by Person_Name because ultimately only one word from the
Person_Name needs to match, and a one-word search will not perform
well in most situations).
Emphasis is placed on the Last Name, the major word of the Person_Name
field, so this is one of the few cases where word order is important in
the way the records are presented for matching.
However, a reasonable score will be generated provided that a match
occurs between the major word in one name and any other word in the
other name.
This purpose uses the following fields:
• Person_Name (Required)
• Address_Part1 (Required)
• Telephone_Number (Required) (Score will be based on best of
Address_Part1 and Telephone_Number)
• Address_Part2
• Postal_Area
• Attribute1
• Attribute2
Unless otherwise indicated, fields are not required.
To achieve a “best of” score between Address_Part2 and Postal_Area,
pass Postal_Area as a repeat value in the Address_Part2 field.

Wide_Household Designed to identify matches where the same address is shared by
individuals with the same family name or with the same telephone
number.
This purpose is typically used after a search by Address_Part1. (Note: it
is not practical to search by Person_Name because ultimately only one
word from the Person_Name needs to match, and a one-word search will
not perform well in most situations).
Emphasis is placed on the last name, the major word of the Person_Name
field, so this is one of the few cases where word order is important in
the way the records are presented for matching.
However, a reasonable score will be generated provided that a match
occurs between the major word in one name and any other word in the
other name.
This purpose uses the following fields:
• Address_Part1 (Required)
• Person_Name (Required)
• Telephone_Number (Required) (Score will be based on best of
Person_Name and Telephone_Number)
• Address_Part2
• Postal_Area
• Attribute1
• Attribute2
Unless otherwise indicated, fields are not required.
To achieve a “best of” score between Address_Part2 and Postal_Area,
pass Postal_Area as a repeat value in the Address_Part2 field.

Address Designed to identify an address match. The address might be postal,
residential, delivery, descriptive, formal, or informal.
The only required field is Address_Part1. The fields Address_Part2,
Postal_Area, Telephone_Number, ID, Date, Attribute1 and Attribute2
are available as optional input fields to further differentiate an address.
For example, if the name of a City and/or State is provided as
Address_Part2, it will help differentiate between a common street address
[100 Main Street] in different locations.
This purpose uses the following fields:
• Address_Part1 (Required)
• Address_Part2
• Postal_Area
• Telephone_Number
• ID
• Date
• Attribute1
• Attribute2
Unless otherwise indicated, fields are not required.
To achieve a “best of” score between Address_Part2 and Postal_Area,
pass Postal_Area as a repeat value in the Address_Part2 field. In that case,
the Address_Part2 score used will be the higher of the two scored fields.

Organization Designed to match organizations primarily by name. It is targeted at
online searches when a name only lookup is required and a human is
available to make the choice. Matching in batch typically requires other
attributes in addition to name to make match decisions. Use this purpose
only when the rule does not contain address fields. This purpose will
allow matches between organizations with an address and those without
an address. If the rules contain address fields, use the Division purpose.
This purpose uses the following fields:
• Organization_Name (Required)
• Address_Part1
• Address_Part2
• Postal_Area
• Telephone_Number
• ID
• Date
• Attribute1
• Attribute2
Unless otherwise indicated, fields are not required. Any optional input
fields you provide refine the ranking of matches.
To achieve a “best of” score between Address_Part2 and Postal_Area,
pass Postal_Area as a repeat value in the Address_Part2 field.

Division Designed to identify an Organization at an Address. It is typically used
after a search by Organization_Name or by Address_Part1, or both.
It is in essence the same purpose as Organization, except that
Address_Part1 is a required field. Thus, this purpose is designed to match
company X at an address of Y (or Z, etc., if multiple addresses are
supplied).
This purpose uses the following fields:
• Organization_Name (Required)
• Address_Part1 (Required)
• Address_Part2
• Postal_Area
• Telephone_Number
• ID
• Attribute1
• Attribute2
Unless otherwise indicated, fields are not required.
To achieve a “best of” score between Address_Part2 and Postal_Area,
pass Postal_Area as a repeat value in the Address_Part2 field.

Contact Designed to identify a contact within an organization at a specific
location.
This Match purpose is typically used after a search by Person_Name.
However, either Organization_Name or Address_Part1 may be used as
the search criteria.
This purpose uses the following fields:
• Person_Name (Required)
• Organization_Name (Required)
• Address_Part1 (Required)
• Address_Part2
• Postal_Area
• Telephone_Number
• ID
• Date
• Attribute1
• Attribute2
Unless otherwise indicated, fields are not required.
To achieve a “best of” score between Address_Part2 and Postal_Area,
pass Postal_Area as a repeat value in the Address_Part2 field.

Corporate_Entity Designed to identify an Organization by its legal corporate name,
including the legal endings such as INC, LTD, etc. It is designed for
applications that need to honor the differences between such names as
ABC TRADING INC and ABC TRADING LTD.
This purpose is typically used after a search by Organization_Name. It is
in essence the same purpose as Organization, except that tighter
matching is performed and legal endings are not treated as noise.
This purpose uses the following fields:
• Organization_Name (Required)
• Address_Part1
• Address_Part2
• Postal_Area
• Telephone_Number
• ID
• Attribute1
• Attribute2
Unless otherwise indicated, fields are not required.
To achieve a “best of” score between Address_Part2 and Postal_Area,
pass Postal_Area as a repeat value in the Address_Part2 field.
Wide_Contact Designed to loosely identify a contact within an organization—that is,
without regard to actual location.
It is typically used after a search by Person_Name.
In addition to the required fields, ID, Attribute1 and Attribute2 may be
optionally provided for matching to further qualify a contact.
This purpose uses the following fields:
• Person_Name (Required)
• Organization_Name (Required)
• ID
• Attribute1
• Attribute2
Unless otherwise indicated, fields are not required.
Fields Provided for general, non-specific use. It is designed in such a way that
there are no required fields. All field types are available as optional input
fields.
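
The required fields in the table above can be summarized as a lookup. The following Python sketch is a convenience for checking a planned rule against that table, not part of Siperian Hub. The REQUIRED_FIELDS and EITHER_OF structures and the missing_fields function are hypothetical names. The sketch treats every field marked (Required) as mandatory, which is an assumption and may be stricter than the actual behavior of purposes that score grouped fields on a “best of” basis.

```python
# Fields marked (Required) in the Match Purpose Settings table.
REQUIRED_FIELDS = {
    "Person_Name": {"Person_Name"},
    "Individual": {"Person_Name"},
    "Resident": {"Person_Name", "Address_Part1"},
    "Household": {"Person_Name", "Address_Part1"},
    "Family": {"Person_Name", "Address_Part1", "Telephone_Number"},
    "Wide_Household": {"Address_Part1", "Person_Name", "Telephone_Number"},
    "Address": {"Address_Part1"},
    "Organization": {"Organization_Name"},
    "Division": {"Organization_Name", "Address_Part1"},
    "Contact": {"Person_Name", "Organization_Name", "Address_Part1"},
    "Corporate_Entity": {"Organization_Name"},
    "Wide_Contact": {"Person_Name", "Organization_Name"},
    "Fields": set(),
}

# Purposes where at least one field of a group must be present.
EITHER_OF = {"Individual": {"ID", "Date"}}


def missing_fields(purpose, rule_fields):
    """Return the required fields that a planned rule does not include."""
    fields = set(rule_fields)
    missing = sorted(REQUIRED_FIELDS[purpose] - fields)
    group = EITHER_OF.get(purpose)
    if group and not (group & fields):
        missing.append(" or ".join(sorted(group)))
    return missing
```

For example, missing_fields("Resident", ["Person_Name"]) reports that Address_Part1 is missing, and missing_fields("Individual", ["Person_Name"]) reports that either Date or ID is still needed.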

Match Levels

For fuzzy-match base objects, the match level determines how precise the match is.
You can specify one of the following match levels for a fuzzy-match base object:
Match Levels

Level          Description

Typical        Appropriate for most matches.
Conservative   Produces fewer matches than the Typical level. Some data that
               actually matches may pass through the match process without
               being flagged as a match. This situation is called
               undermatching.
Loose          Produces more matches than the Typical level. Loose matching may
               produce a significant number of match candidates that are not
               really matches. This situation is called overmatching. You might
               choose to use this in a match rule for manual merges, to make
               sure that other, tighter match rules have not missed any
               potential matches.

Select the level based on your knowledge of the data to be matched: Typical,
Conservative (fewer matches), or Loose (more matches). When in doubt, use Typical.

Accept Limit Adjustment

For fuzzy-match base objects, the accept limit is a number that determines the
acceptability of a match. This setting does the exact same thing as the match level (see
“Match Levels” on page 558), but to a more granular degree. The accept limit is
defined by Siperian within a population in accordance with its match purpose. The
Accept Limit Adjustment allows a coarse adjustment to what is considered to be a
match for this match rule.
• A positive adjustment results in more conservative matching.
• A negative adjustment results in looser matching.

558 Siperian Hub Administrator Guide


For example, suppose that, for a given field and a given population, the accept limit for
a typical match level is 80, for a loose match level is 70, and for a conservative match
level is 90. If you specify a positive number (such as 3) for the adjustment, then the
accept limit becomes slightly more conservative. If you specify a negative number (such
as -2), then the accept limit becomes looser.
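
The arithmetic in this example can be sketched as follows. This is an illustration only, not Siperian Hub code: the limit values are taken from the example above, the names `ACCEPT_LIMITS`, `adjusted_accept_limit`, and `is_accepted` are hypothetical, and the sketch assumes that the adjustment is simply added to the population-defined limit and that a candidate is accepted when its score reaches the limit. The text implies both but does not state either outright.

```python
# Accept limits from the example above (one field, one population).
ACCEPT_LIMITS = {"Conservative": 90, "Typical": 80, "Loose": 70}


def adjusted_accept_limit(match_level: str, adjustment: int = 0) -> int:
    """Return the accept limit after applying an Accept Limit Adjustment.

    Assumption: the adjustment is additive. A positive value raises the
    limit (more conservative); a negative value lowers it (looser).
    """
    return ACCEPT_LIMITS[match_level] + adjustment


def is_accepted(score: int, match_level: str, adjustment: int = 0) -> bool:
    """Assumption: a candidate is accepted when its score reaches the limit."""
    return score >= adjusted_accept_limit(match_level, adjustment)
```

Under these assumptions, a candidate scoring 81 at the Typical level is accepted with no adjustment but rejected with an adjustment of 3, which shows how a change of only a few points can move records in or out of the match set.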

Configuring this setting provides an optional refinement to your match settings that
might be helpful in certain circumstances. Adjusting the accept limit even a few points
can have a dramatic effect on your matches, resulting in overmatching or
undermatching. Therefore, it is recommended that you test different settings iteratively,
with small increments, to determine the best setting for your data.

Match Column Properties for Match Rules


This section describes the match column properties that you can configure for match
rules.

Match Subtype

For base objects containing different types of data, the match subtype option allows you
to apply match rules to specific types of data within the same base object. You have the
option to enable or disable match subtyping for exact-match columns that have
parent/child path components. Match subtype is available only for:
• exact-match column types that are based on a non-root Path Component, and
• match rules that have a fuzzy match / search strategy

To use match subtyping, for each match rule, specify one or more exact-match column(s)
that will serve as the “subtyping” column(s) to use. The subtype indicator can be set
for any of the exact-match columns regardless of whether they are used for segment
match or not. During the match process, evaluation of the subtype column precedes
evaluation of the other match columns. Use match subtyping judiciously, because it can
have a performance impact on the match process.

Match Subtype behaves just like a standard parent/child matching scenario with the
additional requirement that the match column marked as Match Subtype must be the
same across all records being matched. In the following example, the Match Subtype
column is Address Type and the match rule consists of Address Line 1, City, and State.

Parent ID   Address Line 1   City      State   Address Type
3           123 Main         NYC       ON      Billing
3           50 John St       Toronto   NY      Shipping
5           123 Main         Toronto   BC      Billing
5           20 Adelaide St   Markham   AB      Shipping
5           50 John St       Ottawa    ON      Billing
7           50 John St       Barrie    BC      Billing
7           20 Adelaide St   Toronto   NB      Shipping
7           90 Yonge St      Toronto   ON      Billing

Without Match Subtype, Parent ID 3 would match with 5 and 7. With Match Subtype,
however, Parent ID 3 will not match with either 5 or 7 because the matching rows are
distributed between different Address Types. Parent IDs 5 and 7 will match with each
other, however, because the matching rows all fall within the 'Billing' Address Type.

Non-Equal Matching

Note: Non-Equal Matching and Segment Matching are mutually exclusive. If one is
selected, then the other cannot be selected.

Use non-equal matching in match rules to prevent equal values in a column from
matching each other. Non-equal matching applies only to exact-match columns.


NULL Matching

Note: NULL Matching and Segment Matching are mutually exclusive. If one is selected,
then the other cannot be selected.

Use NULL matching to specify how the match process should behave when null values
match other null values. NULL matching applies only to exact-match columns.

By default, null matching is disabled, meaning that Siperian Hub treats nulls as unequal
values when it searches for matches (a null value will not match with anything).
To enable null matching, you must explicitly select a null matching option for the
match columns to allow null matching.

A match column containing a NULL value is identified as matching based on the
following settings:

Property Description
Disabled Regardless of the other value, nothing will match (nulls are
unequal values). Default setting. A NULL is seen as a
placeholder for an unknown value.
NULL Matches NULL If both values are NULL, then it is considered a match.
NULL Matches Non-NULL If one value is NULL and the other value is not NULL,
then it is considered a match.

Once null matching is configured, Build Match Groups will allow only a single “NULL to
non-NULL” match into any group, thereby reducing the possibility of unwanted
transitive matching. For more information, see “Build Match Groups and Transitive
Matches” on page 327.

Note: Null matching is exclusive of exact matching. For example, if you enable NULL
Matches Non-Null, the match rule returns only those matches in which one of the cell
values is NULL. It will not also provide exact matches where both cells are equal.
Therefore, if you need both behaviors, you must create two exact match rules: one with
NULL matching enabled, and the other with NULL matching disabled.
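
The three settings and the two-rule approach described in the note can be sketched as follows. This is a minimal illustration, not Siperian Hub code. The function and constant names are hypothetical, and Python's `None` stands in for NULL. The text does not spell out how NULL Matches NULL treats two equal non-null values, so the sketch assumes it follows the same exclusivity as NULL Matches Non-NULL.

```python
from typing import Optional

# Hypothetical constants named after the settings in the table above.
DISABLED = "Disabled"
NULL_MATCHES_NULL = "NULL Matches NULL"
NULL_MATCHES_NON_NULL = "NULL Matches Non-NULL"


def exact_column_matches(a: Optional[str], b: Optional[str],
                         null_setting: str = DISABLED) -> bool:
    """Decide whether two values of an exact-match column match."""
    if null_setting == DISABLED:
        # Nulls are unequal values: a NULL never matches anything.
        return a is not None and b is not None and a == b
    if null_setting == NULL_MATCHES_NULL:
        # Assumption: only NULL/NULL pairs match under this setting.
        return a is None and b is None
    if null_setting == NULL_MATCHES_NON_NULL:
        # Only pairs in which exactly one value is NULL match.
        return (a is None) != (b is None)
    raise ValueError(f"unknown NULL matching setting: {null_setting}")


def matches_either_rule(a: Optional[str], b: Optional[str]) -> bool:
    """Two rules, as the note recommends: one with NULL matching enabled,
    one with NULL matching disabled."""
    return (exact_column_matches(a, b, NULL_MATCHES_NON_NULL)
            or exact_column_matches(a, b, DISABLED))
```

With a single rule set to NULL Matches Non-NULL, two equal non-null values do not match. Only the combination of both rules returns equal values as well as NULL against non-NULL pairs.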

Segment Matching

Note: Segment Matching and Non-Equal Matching are mutually exclusive. If one is
selected, then the other cannot be selected. Segment Matching and NULL Matching
are also mutually exclusive. If one is selected, then the other cannot be selected.

For exact-match columns only, you can use segment matching to limit match rules to
specific subsets of data. For example, you could define different match rules for
customers in different countries by using segment matching to limit certain rules to
specific country codes. Segment matching applies to both exact-match and
fuzzy-match base objects. For more information, see “Configuring Segment Matching
for a Column” on page 576.

If the Segment Matching check box is checked (selected), you can configure two other
options: Segment Matches All Data and Segment Match Values.

Segment Matches All Data

When unchecked (the default), Siperian Hub will only match records within the set of
values defined in Segment Match Values. For example, suppose a base object contained
Leads, Partners, Customers, and Suppliers. If Segment Match Values contained the
values Leads and Partners, and Segment Matches All Data were unchecked, then
Siperian Hub would only match within records that contain Leads or Partners.
All Customers and Suppliers records would be ignored.

With Segment Matches All Data checked (selected), Leads and Partners would
match with Customers and Suppliers, but Customers and Suppliers would not match
with each other.
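
The Leads and Partners example can be sketched as a simple eligibility test on a pair of records. This is an illustration under assumptions, not Siperian Hub code: the function name `segment_allows_match` and its parameters are hypothetical. The comparison is deliberately case-sensitive, because segment match values are case-sensitive.

```python
from typing import Set


def segment_allows_match(segment_a: str, segment_b: str,
                         segment_match_values: Set[str],
                         segment_matches_all_data: bool = False) -> bool:
    """Return True if a pair of records is eligible for matching.

    segment_a and segment_b are the segment column values of the two
    records. Set membership is case-sensitive.
    """
    a_in_segment = segment_a in segment_match_values
    b_in_segment = segment_b in segment_match_values
    if segment_matches_all_data:
        # Records in the segment can match any record, but two records
        # outside the segment never match each other.
        return a_in_segment or b_in_segment
    # Default: both records must belong to the segment.
    return a_in_segment and b_in_segment
```

For instance, with segment match values of Leads and Partners, a Leads record and a Customers record are eligible only when Segment Matches All Data is checked, and a Customers record and a Suppliers record are never eligible.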


Segment Match Values

Specifies the list of segment values to use for segment matching. You must specify one
or more values (for a match column) that define the segment. For example, for a given
match rule, suppose you wanted to define segment matching by Gender. If you specified
a segment match value of M (for male), then, for that match rule, Siperian Hub searches
for matches (based on the other match columns) only on male records, and can match
them only to other male records, unless you also enabled Segment Matches All Data.

Note: Segment match values are case-sensitive. This applies to both fuzzy-match and
exact-match base objects when the Match batch job is executed.

Concatenation of Values in Multiple Columns

For exact matches with segment matching enabled on concatenated columns, a space
character must be added to each piece of data present in the concatenated fields.

Note: Concatenating columns is not recommended for exact match columns.

Requirements for Exact-match Columns in Match Column Rules

Exact-match columns are subject to the following rules:


• The names of exact-match columns cannot be longer than 26 characters.
• Exact-match columns must be of type VARCHAR or CHAR.
• Match columns can be used to match on any text column or combination of text
columns from a base object.
• If you want to use numerics or dates, you must convert them to VARCHAR using
cleanse functions before they are loaded into your base object. For more
information, see “Using Cleanse Functions” on page 414.
• Match columns can also be used to match on a column from a child base object,
which in turn can be based on any text column or combination of text columns in
the child base object. Matching on the match columns of a child base object is
called intertable matching.
• When using intertable match and creating match rules for the child table (via a
foreign key), you must include the foreign key from the parent table in each match
rule on the child. If you do not, when the child is merged, the parent records
would lose the child records that had previously belonged to them.
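
The first two requirements in this list lend themselves to a simple pre-check. The following is a minimal sketch, not a Siperian Hub utility: the function name `check_exact_match_column` and its arguments are hypothetical, and it covers only the name-length and data-type rules stated above.

```python
from typing import List

MAX_NAME_LENGTH = 26                # exact-match column names: at most 26 characters
ALLOWED_TYPES = {"VARCHAR", "CHAR"}  # exact-match columns must be one of these types


def check_exact_match_column(name: str, data_type: str) -> List[str]:
    """Return a list of problems; an empty list means both rules are met."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(
            f"{name}: name is {len(name)} characters (limit {MAX_NAME_LENGTH})")
    if data_type.upper() not in ALLOWED_TYPES:
        problems.append(
            f"{name}: type {data_type} must be converted to VARCHAR "
            "with a cleanse function before loading")
    return problems
```

A DATE column, for example, is reported with one problem, which corresponds to the requirement to convert numerics and dates to VARCHAR before they are loaded into the base object.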

For more information, see “Match Columns Depend on the Search Strategy” on page
515.

Command Buttons for Configuring Column Match Rules


In the Match Rule Sets tab, if you select a match rule set in the list, the Schema
Manager displays the following command buttons.

Button Description
Adds a match rule. For more information, see “Adding Match Column Rules” on
page 565.
Edits properties for the selected match rule. For more information, see “Editing
Match Column Rules” on page 570.
Deletes the selected match rule. For more information, see “Deleting Match
Column Rules” on page 572.
Moves the selected match rule up in the sequence. For more information, see
“Changing the Execution Sequence of Match Column Rules” on page 573.
Moves the selected match rule down in the sequence. For more information, see
“Changing the Execution Sequence of Match Column Rules” on page 573.
Changes a manual consolidation rule to an automatic consolidation rule. Select a
manual consolidation record and then click the button. For more information, see
“Specifying Consolidation Options for Match Column Rules” on page 574.
Changes an automatic consolidation rule to a manual consolidation rule. Select an
automatic consolidation record and then click the button. For more information,
see “Specifying Consolidation Options for Match Column Rules” on page 574.


Important: If you change your match rules after matching, you are prompted to reset
your matches. Resetting your matches deletes everything in the match table and, in
records where the consolidation indicator is 2, resets the consolidation indicator
to 4. For more information, see “About the Consolidate Process” on page 335 and
“Reset Match Table Jobs” on page 744.
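
The effect of a reset can be sketched on in-memory data. This is an illustration only: the function name `reset_matches`, the list-based match table, and the `consolidation_ind` dictionary key are all hypothetical stand-ins, and the sketch does not reflect how Siperian Hub stores or updates these tables.

```python
from typing import Dict, List, Tuple


def reset_matches(match_table: List[Tuple[str, str]],
                  base_object_records: List[Dict]) -> int:
    """Empty the match table and reset consolidation indicators.

    Returns the number of match rows that were deleted.
    """
    deleted = len(match_table)
    match_table.clear()  # everything in the match table is deleted
    for record in base_object_records:
        # Records whose consolidation indicator is 2 are reset to 4;
        # all other records are left unchanged.
        if record["consolidation_ind"] == 2:
            record["consolidation_ind"] = 4
    return deleted
```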

Adding Match Column Rules


To add a new match rule using match columns:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base
object that you want to configure. For more information, see “Navigating to the
Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Match Rule Sets tab. For more information, see “Navigating to the Match
Rule Set Tab” on page 537.
4. Select a match rule set in the list.
The Schema Manager displays the properties for the selected match rule set.

5. In the Match Rules section of the screen, click the Plus button.
The Schema Manager displays the Edit Match Rule dialog. This dialog differs
slightly between exact match and fuzzy-match base objects.

[Figure: Edit Match Rule dialog for exact-match base objects]

[Figure: Edit Match Rule dialog for fuzzy-match base objects]

6. For fuzzy-match base objects, configure the match rule properties at the top of the
dialog box. For more information, see “Match Rule Properties for Fuzzy-match
Base Objects Only” on page 544.
7. Configure the match column(s) for this match rule.
Only columns you have previously defined as match columns are shown.
• For exact-match base objects or match rules with an exact match / search
strategy, only exact column types are available.
• For fuzzy-match base objects, you can choose fuzzy or exact column types.
To learn more, see “Match Columns Depend on the Search Strategy” on page 515.
a. Click the Edit button next to the Match Columns list.
The Schema Manager displays the Add/Remove Match Columns dialog.

b. Check (select) the check box next to any column that you want to include.
c. Uncheck (clear) the check box next to any column that you want to omit.
d. Click OK.
The Schema Manager displays the selected columns in the Match Columns list.

8. Configure the match properties for each match column in the Match Columns list.
For more information, see:
• “Match Column Properties for Match Rules” on page 559
• “Configuring the Match Weight of a Column” on page 575
• “Configuring Segment Matching for a Column” on page 576
• “NULL Matching” on page 561
• “Match Subtype” on page 559
9. Click OK.
10. If this is an exact match, specify the match properties for this match rule. For more
information, see “Requirements for Exact-match Columns in Match Column
Rules” on page 563. Click OK.
11. Click the Save button to save your changes.
Before saving changes, the Schema Manager analyzes the match rule set and
prompts you with a message if the match rule set contains certain incongruences.
For more information, see “Rule Set Evaluation” on page 533.
12. If you are prompted to confirm saving changes, click the OK button to save your
changes.

Editing Match Column Rules


To edit the properties for an existing match rule:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base
object that you want to configure. For more information, see “Navigating to the
Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Match Rule Sets tab. For more information, see “Navigating to the Match
Rule Set Tab” on page 537.
4. Select a match rule set in the list.
The Schema Manager displays the properties for the selected match rule set.
5. In the Match Rules section of the screen, click the Edit button.
The Schema Manager displays the Edit Match Rule dialog. This dialog differs
slightly between exact match and fuzzy-match base objects. For more information,
see “Adding Match Column Rules” on page 565.
6. For fuzzy-match base objects, change the match rule properties at the top of the
dialog box, if you want. For more information, see “Match Rule Properties for
Fuzzy-match Base Objects Only” on page 544.
7. Configure the match column(s) for this match rule, if you want.
Only columns you have previously defined as match columns are shown.
• For exact-match base objects or match rules with an exact match / search
strategy, only exact column types are available.
• For fuzzy-match base objects, you can choose fuzzy or exact column types.
To learn more, see “Match Columns Depend on the Search Strategy” on page 515.
a. Click the Edit button next to the Match Columns list.
The Schema Manager displays the Add/Remove Match Columns dialog.

b. Check (select) the check box next to any column that you want to include.
c. Uncheck (clear) the check box next to any column that you want to omit.
d. Click OK.
The Schema Manager displays the selected columns in the Match Columns list.
8. Change the match properties for any match column that you want to edit. For
more information, see:
• “Match Column Properties for Match Rules” on page 559
• “Configuring the Match Weight of a Column” on page 575
• “Configuring Segment Matching for a Column” on page 576
• “NULL Matching” on page 561
• “Match Subtype” on page 559
9. Click OK.
10. If this is an exact match, specify the match properties for this match rule. For more
information, see “Requirements for Exact-match Columns in Match Column
Rules” on page 563. Click OK.
11. Click the Save button to save your changes.
Before saving changes, the Schema Manager analyzes the match rule set and
prompts you with a message if the match rule set contains certain incongruences.
For more information, see “Rule Set Evaluation” on page 533.
12. If you are prompted to confirm saving changes, click the OK button to save your
changes.

Deleting Match Column Rules


To delete a match column rule:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base
object that you want to configure. For more information, see “Navigating to the
Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Match Rule Sets tab. For more information, see “Navigating to the Match
Rule Set Tab” on page 537.
4. Select a match rule set in the list.
5. In the Match Rules section, select the match rule that you want to delete.
6. Click the Remove button.
The Schema Manager prompts you to confirm deletion.
7. Click Yes.


Changing the Execution Sequence of Match Column Rules


To change the execution sequence of match column rules:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base
object that you want to configure. For more information, see “Navigating to the
Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Match Rule Sets tab. For more information, see “Navigating to the Match
Rule Set Tab” on page 537.
4. Select a match rule set in the list.
5. In the Match Rules section, select the match rule that you want to move up or
down.
6. Do one of the following:
• Click the button to move the selected match rule up in the execution
sequence.
• Click the button to move the selected match rule down in the execution
sequence.
7. Click the Save button to save your changes.
Before saving changes, the Schema Manager analyzes the match rule set and
prompts you with a message if the match rule set contains certain incongruences.
For more information, see “Rule Set Evaluation” on page 533.
8. If you are prompted to confirm saving changes, click the OK button to save your
changes.


Specifying Consolidation Options for Match Column Rules


During the match process, a match column rule must determine whether matched
records should be queued for manual or automatic consolidation. For more
information, see “About the Consolidate Process” on page 335.

Note: A base object cannot have more than 200 user-defined columns if it will have
match rules that are configured for automatic consolidation.

To toggle between manual and automatic consolidation for a match rule:


1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base
object that you want to configure. For more information, see “Navigating to the
Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Match Rule Sets tab. For more information, see “Navigating to the Match
Rule Set Tab” on page 537.
4. Select a match rule set in the list.
5. In the Match Rules section, select the match rule that you want to configure.
6. Do one of the following:
• Click the button to change a manual consolidation rule to an automatic
consolidation rule.
• Click the button to change an automatic consolidation rule to a manual
consolidation rule.
7. Click the Save button to save your changes.
Before saving changes, the Schema Manager analyzes the match rule set and
prompts you with a message if the match rule set contains certain incongruences.
For more information, see “Rule Set Evaluation” on page 533.
8. If you are prompted to confirm saving changes, click the OK button to save your
changes.


Configuring the Match Weight of a Column

For a fuzzy-match column, you can change its match weight in the Edit Match Rule
dialog box. For each column, Siperian Hub assigns an internal match weight, which is a
number that indicates the importance of this column (relative to other columns in the
table) for matching. The match weight varies according to the selected match purpose
and population. For example, if the match purpose is Person_Name, then Siperian
Hub, when evaluating matches, views a data match in the name column with greater
importance than a data match in a different column (such as the address).

By adjusting the match weight of a column, you give added weight to, and elevate the
significance of, that column (relative to other columns) when Siperian Hub analyzes
values for matches.

To configure the match weight of a column:


1. In the Edit Match Rule dialog box, select a column in the list.
2. Click the Match Weight Adjustment button.
If adjusted, the name of the selected column shows in a bold font.

3. Click the Save button to save your changes.
Before saving changes, the Schema Manager analyzes the match rule set and
prompts you with a message if the match rule set contains certain incongruences.
For more information, see “Rule Set Evaluation” on page 533.
4. If you are prompted to confirm saving changes, click the OK button to save your
changes.

Configuring Segment Matching for a Column

As described in “Segment Matching” on page 562, segment matching is used with
exact-match columns to limit match rules to specific subsets of data.

To configure segment matching for an exact-match column:
1. In the Edit Match Rule dialog box, select an exact-match column in the Match
Columns list.
2. Check (select) the Segment Matching check box to enable this feature.
3. Check (select) the Segment Matches All Data check box, if you want. For more
information, see “Segment Matches All Data” on page 562.
4. Specify the segment match values for segment matching. For more information,
see “Segment Match Values” on page 563.
a. Click the Edit button.
The Schema Manager displays the Edit Values dialog.

b. Do one of the following:
• To add a value, click , type the value you want to add, and click OK.
• To delete a value, select it in the list, click , and choose Yes when
prompted to confirm deletion.
5. Click OK.
6. Click the Save button to save your changes.
Before saving changes, the Schema Manager analyzes the match rule set and
prompts you with a message if the match rule set contains certain incongruences.
For more information, see “Rule Set Evaluation” on page 533.
7. If you are prompted to confirm saving changes, click the OK button to save your
changes.

Configuring Primary Key Match Rules


This section describes how to configure primary key match rules for your Siperian Hub
implementation. If you want to configure match column match rules instead, see the
instructions in “Configuring Match Columns” on page 515.

About Primary Key Match Rules


Matching on primary keys can be used when two or more different source systems for
a base object have identical primary key values. This situation occurs infrequently in
source systems, but when it does occur, you can make use of the primary key matching
option in Siperian Hub to rapidly match and automatically consolidate records from
the source systems that have the matching primary keys.

For example, two systems might use the same set of customer IDs. If both systems
provide information about customer XYZ123 using identical primary key values, the
two systems are certainly referring to the same customer and the records should be
automatically consolidated.

When you specify a primary key match, you simply specify which source systems
have the same primary key values. You also check the Auto-merge matching records
check box to have Siperian Hub automatically consolidate matching records when a
Merge or Link batch job is run. To learn more, see “Automerge Jobs” on page 717 and
“Autolink Jobs” on page 715.
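
The idea behind a primary key match rule can be sketched as pairing records from two source systems that carry identical primary key values. This is a minimal illustration, not Siperian Hub code: the function name `primary_key_matches`, the record layout, and the `rowid`, `source_system`, and `pkey` keys are hypothetical.

```python
from typing import Dict, List, Tuple


def primary_key_matches(records: List[Dict[str, str]],
                        system_a: str, system_b: str) -> List[Tuple[str, str]]:
    """Pair records from two source systems that share a primary key value."""
    # Index the records of the first system by their primary key value.
    rowids_by_key: Dict[str, List[str]] = {}
    for record in records:
        if record["source_system"] == system_a:
            rowids_by_key.setdefault(record["pkey"], []).append(record["rowid"])

    # Look up each record of the second system in that index.
    pairs = []
    for record in records:
        if record["source_system"] == system_b:
            for rowid in rowids_by_key.get(record["pkey"], []):
                pairs.append((rowid, record["rowid"]))
    return pairs
```

In the customer XYZ123 example, the two records that share that key in the two selected systems form one pair, while records from any other source system are ignored by the rule.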

Adding Primary Key Match Rules


To add a new primary key match rule:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base
object that you want to configure. For more information, see “Navigating to the
Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Primary Key Match Rules tab.
The Schema Manager displays the Primary Key Match Rules tab.

The Primary Key Match Rules tab has the following columns.

Column Description
Key Combination Two source systems for which this primary key match rule will
be used for matching. These source systems must already be
defined in Siperian Hub (see “Configuring Source Systems” on
page 348), and staging tables for this base object must be
associated with these source systems (see “Configuring Staging
Tables” on page 364).
Auto-Merge Specifies whether this primary key match rule results in
automatic or manual consolidation. For more information, see
“About the Consolidate Process” on page 335.

4. Click the Plus button to add a primary key match rule.
The Add Primary Key Match Rule dialog is displayed.
5. Check (select) the check box next to two source systems for which you want to
match records based on the primary key.
6. Check (select) the Auto-merge matching records check box if you are certain
that records with identical primary keys are matches.
You can change your choice for Auto-merge matching records later, if you want.
7. Click OK.
The Schema Manager displays the new rule in the Primary Key Match Rules tab.

8. Click the Save button to save your changes.
The Schema Manager asks you whether you want to reset existing matches.

9. Choose Yes to delete all matches currently stored in the match table, if you want.


Editing Primary Key Match Rules


Once you have defined a primary key match rule, you can change the value of the
Auto-merge matching records check box.

To edit an existing primary key match rule:


1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base
object that you want to configure. For more information, see “Navigating to the
Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Primary Key Match Rules tab.
The Schema Manager displays the Primary Key Match Rules tab.

4. Scroll to the primary key match rule that you want to edit.
5. Check or uncheck the Auto-merge matching records check box to enable or
disable auto-merging, respectively.
6. Click the Save button to save your changes.
The Schema Manager asks you whether you want to reset existing matches.

7. Choose Yes to delete all matches currently stored in the match table, if you want.

Deleting Primary Key Match Rules


To delete an existing primary key match rule:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base
object that you want to configure. For more information, see “Navigating to the
Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Primary Key Match Rules tab.
The Schema Manager displays the Primary Key Match Rules tab.

4. Select the primary key match rule that you want to delete.
5. Click the Delete button.
The Schema Manager prompts you to confirm deletion.
6. Choose Yes.
The Schema Manager removes the deleted rule from the Primary Key Match Rules
tab.

7. Click the Save button to save your changes.
The Schema Manager asks you whether you want to reset existing matches.

8. Choose Yes to delete all matches currently stored in the match table, if you want.

Investigating the Distribution of Match Keys


This section describes how to investigate the distribution of match keys in the match
key table.

About Match Keys Distribution


As described in “Match Keys and the Tokenization Process” on page 322, match keys
are strings that encode data in the fuzzy match key column used to identify candidates
for matching. The tokenization process generates match keys for all the records in a
base object and stores them in its match key table. Depending on the nature of the data
in the base object record, the tokenization process generates at least one match
key—and possibly multiple match keys—for each base object record. Match keys are
used subsequently in the match process to help determine possible matches between
base object records.

In the Match / Merge Setup Details pane of the Schema Manager, the Match Keys
Distribution tab allows you to investigate the distribution of match keys in the match
key table. This tool can assist you with identifying potential hot spots in your data—high
concentrations of match keys that could result in overmatching—where the match
process generates too many matches, including matches that are not relevant.
By knowing where hot spots occur in your data, you can refine data cleansing and
match rules to reduce hot spots and generate an optimal distribution of match keys for
use in the match process. Ideally, you want to have a relatively even distribution across
all keys.


Navigating to the Match Keys Distribution Tab


To navigate to the Match Keys Distribution tab:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base
object that you want to configure. For more information, see “Navigating to the
Match/Merge Setup Details Dialog” on page 486.
2. Click the Match Keys Distribution tab.
The Schema Manager displays the Match Keys Distribution tab.

[Figure: Match Keys Distribution tab, with callouts for the Histogram, Match Keys List, and Match Columns areas]


Components of the Match Keys Distribution Tab


The Match Keys Distribution tab displays a histogram, match keys, and match columns.

Histogram

The histogram displays the statistical distribution of match keys in the match key table.

Axis            Description
Key (X-axis)    Starting character(s) of the match key. If no filter is applied (the default),
                this is the starting character of the match key. If a filter is applied, this is the
                starting sequence of characters in the match key, beginning with the
                left-most character. For more information, see “Filtering Match Keys” on
                page 587.
Count (Y-axis)  Number of match keys in the match key table that begin with the starting
                character(s). Hot spots in the match key table show up as disproportionately
                tall spikes (high number of match keys), relative to other characters in the
                histogram.
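
The counts shown in the histogram can be approximated outside the Hub Console. The following is a minimal sketch, not Siperian code: it assumes that the match keys (the SSA_KEY values) have already been exported into a Python list, and the function name key_histogram and the sample keys are hypothetical.

```python
from collections import Counter


def key_histogram(match_keys, prefix=""):
    """Count match keys by the character that follows the given prefix.

    With an empty prefix, keys are grouped by their first character,
    which mirrors the default (unfiltered) histogram view.
    """
    width = len(prefix) + 1
    counts = Counter(
        key[:width]
        for key in match_keys
        if key.startswith(prefix) and len(key) >= width
    )
    return dict(sorted(counts.items()))


# Hypothetical sample of SSA_KEY values, for illustration only.
sample_keys = ["MDA1", "MDB2", "MXC3", "NAA4", "MDA9", "PQR5"]

histogram = key_histogram(sample_keys)
for bucket, count in histogram.items():
    # A disproportionately long bar suggests a potential hot spot.
    print(f"{bucket}: {'#' * count} ({count})")
```

In this sample, the bucket for M is much taller than the others, which is the kind of spike that the Count axis is meant to expose.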


Match Keys List


The Match Keys List on the Match Keys Distribution tab displays records in the match
key table. For each record, it displays cell data for the following columns:

Column Name  Description
ROWID        ROWID_OBJECT that uniquely identifies the record in the base object
             that is associated with this match key.
KEY          Generated match key. SSA_KEY column in the match key table.

Depending on the configured match rules and the nature of the data in a record, a
single record in the base object table can have multiple generated match keys.

[Figure: Multiple match keys for a base object record]


Paging Through Records in the Match Key Table

Use the following command buttons to navigate the records in the match key table.

Button Description
Displays the first page of records in the match key table.

Displays the previous page of records in the match key table.

Displays the next page of records in the match key table.

Jumps to the page number you enter.

Match Columns

The Match Columns area on the Match Keys Distribution tab displays match column
data for the selected record in the match keys list. This is the SSA_DATA column in
the match key table. For each match column that is configured for this base object (see
“Configuring Match Columns” on page 515), it displays the column name and cell data.

Filtering Match Keys


You can use a match key filter to focus your investigation on hotspots or other match
key distribution patterns. A match key filter restricts the data in the Histogram and the
Match Keys List to the subset of match keys that meets the filter condition. By default,
no filter is defined—all records in the match key table are displayed.

The filter condition specifies the beginning string sequence for qualified match keys,
evaluated from left to right. For example, to view only match keys beginning with the
letter M, you would select M for the filter. To further restrict match keys and view data
for only the match keys that start with the letters MD, you would add the letter D to the
filter. The longer the filter expression, the more restrictive the display.
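
The effect of lengthening a filter can be sketched as a simple prefix test. This is an illustration only, under the assumption that the match keys are available as plain strings. The function names and sample keys are hypothetical and are not part of Siperian Hub.

```python
def apply_filter(match_keys, filter_prefix):
    """Return only the match keys that begin with filter_prefix.

    An empty prefix corresponds to the default view (no filter).
    """
    return [key for key in match_keys if key.startswith(filter_prefix)]


def remove_last_character(filter_prefix):
    """Step back to the previously selected filter."""
    return filter_prefix[:-1]


# Hypothetical sample of match keys, for illustration only.
sample_keys = ["MDA1", "MDB2", "MXC3", "NAA4", "MDA9", "PQR5"]

keys_m = apply_filter(sample_keys, "M")    # keys beginning with M
keys_md = apply_filter(sample_keys, "MD")  # narrower: keys beginning with MD
print(len(sample_keys), len(keys_m), len(keys_md))
```

Each character added to the filter can only keep or shrink the set of qualifying keys, which is why a longer filter expression is more restrictive.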


Setting a Filter

To set a filter:
• Click the vertical bar in the Histogram associated with the character you want to
add to the filter.

For example, suppose you started with the following default view in the Histogram.

If you click the vertical bar above the M character, the Histogram refreshes and displays
the distribution for all match keys beginning with the character M.


Note that the Match Keys List now displays only those match keys that meet the filter
condition.

Navigating Filters

Use the following command buttons to navigate filters.

Button Description
Clears the filter. Displays the default view (no filter).

Displays the previously selected filter (removes the right-most character from the
filter).


Excluding Records from the Match Process

Siperian Hub provides a mechanism for selectively excluding records from the match
process. You might want to do this if, for example, your data contained records that
you wanted the match process to ignore.

To configure this feature, in the Schema Manager, you add a column named
EXCLUDE_FROM_MATCH to a base object. This column must be an integer type
with a default value of zero (0), as described in “Adding Columns” on page 134.

Once the table is populated and before running the Match job, to exclude a record
from matching, change its value in the EXCLUDE_FROM_MATCH column to a one
(1) in the Data Manager. When the Match job runs, only those records with an
EXCLUDE_FROM_MATCH value of zero (0) will be tokenized and processed—all
other records will be ignored. When the cell value is changed, the DIRTY_IND for
this record is set to 1 so that match keys will be regenerated when the tokenization
process is executed, as described in “Match Keys and the Tokenization Process” on
page 322.
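
The behavior described above can be modeled in a few lines. This sketch is not how Siperian Hub is implemented. It assumes base object records held in memory as dictionaries, and the function names are hypothetical. Only the column names EXCLUDE_FROM_MATCH, DIRTY_IND, and ROWID_OBJECT come from this guide.

```python
def set_exclude_flag(record, value):
    """Change EXCLUDE_FROM_MATCH and mark the record dirty if the value changed.

    Setting DIRTY_IND to 1 models the requirement that match keys are
    regenerated the next time the tokenization process runs.
    """
    if record["EXCLUDE_FROM_MATCH"] != value:
        record["EXCLUDE_FROM_MATCH"] = value
        record["DIRTY_IND"] = 1


def eligible_for_match(records):
    """Return the records that the Match job would tokenize and process."""
    return [r for r in records if r["EXCLUDE_FROM_MATCH"] == 0]


# Hypothetical records, for illustration only.
records = [
    {"ROWID_OBJECT": "1", "EXCLUDE_FROM_MATCH": 0, "DIRTY_IND": 0},
    {"ROWID_OBJECT": "2", "EXCLUDE_FROM_MATCH": 0, "DIRTY_IND": 0},
]

set_exclude_flag(records[1], 1)  # exclude record 2 from matching
eligible = eligible_for_match(records)
print([r["ROWID_OBJECT"] for r in eligible])
```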


Excluding records from the match process is available for:


• fuzzy-match base objects only (see “Exact-match and Fuzzy-match Base Objects”
on page 320),
• match column rules only (not primary key match rules) that do not match for
duplicates (see “Match for Duplicate Data Jobs” on page 740)



Chapter 15: Configuring the Consolidate Process

This chapter describes how to configure the consolidate process for your Siperian Hub
implementation.

Chapter Contents
• Before You Begin
• About Consolidation Settings
• Changing Consolidation Settings


Before You Begin


Before you begin, you must have installed Siperian Hub, created the Hub Store
according to the instructions in the Siperian Hub Installation Guide, and built the schema
according to the instructions in Chapter 5, “Building the Schema.” To learn about the
consolidate process, see “Consolidate Process” on page 335.

About Consolidation Settings


Consolidation settings affect the behavior of the consolidate process in Siperian Hub.
This section describes the settings that you can configure on the Merge Settings tab in
the Match/Merge Setup Details dialog. To learn more, see “About the Consolidate
Process” on page 335.

Immutable Rowid Object


For a given base object, you can designate a source system as an immutable source, which
means that records from that source system will be accepted as unique
(CONSOLIDATION_IND = 1)—even in the event of a merge. Once a record from
that source has been fully consolidated, it will not be changed subsequently, nor will it
be matched to any other record (although other records can be matched to it). Only
one source system can be configured as an immutable source.

Note: If the Requeue on Parent Merge setting for a child base object is set to 2, in the
event of a merging parent, the consolidation indicator will be set to 4 for the child
record. For more information, see “Requeue On Parent Merge” on page 104.

Immutable sources are also distinct systems, as described in “Distinct Source Systems”
on page 596. All records are stored in Siperian Hub as master records. For all
source records from an immutable source system, the consolidation indicator for Load
and PUT is always 1 (consolidated record). For more information, see
“Consolidation Status for Base Object Records” on page 289.


To specify an immutable source for a base object, click the drop-down list next to
Immutable Rowid Object and select a source system.

This list displays the source system(s) associated with this base object. Only one source
system can be designated an immutable source system. To learn more, see
“Configuring Source Systems” on page 348.

Immutable source systems are applicable when, for example, Siperian Hub is the only
persistent store for the source data. Designating an immutable source system
streamlines the load, match, and merge processes by preventing intra-source matches
and automatically accepting records from immutable sources as unique. If two
immutable records must be merged, then a data steward needs to perform a manual
verification in order to allow that change. At that point, Siperian Hub allows the data
steward to choose the key that remains.

Distinct Systems
A distinct system provides data that gets inserted into the base object without being
consolidated. Records from a distinct system will never match with other records from
the same system, but they can be matched to and from other records in other systems
(their CONSOLIDATION_IND is set to 4 on load). You can specify distinct source
systems and configure whether, for each source system, records are consolidated
automatically or manually.


Distinct Source Systems

You can designate a source system as a distinct source (also known as a golden source),
which means that records from that source will not be merged. For example, if the
ABC source has been designated as a distinct source, then the match rules will never
match (or merge) two records that come from the same source. Records from a distinct
source will not match through a transient match in an Auto Match and Merge process
(see “Auto Match and Merge Jobs” on page 716). Such records can be merged only
manually by flagging them as matches.
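
The same-source restriction can be pictured as a filter over candidate match pairs. The following is a simplified sketch under stated assumptions, not the Siperian Hub match engine: records are modeled as dictionaries with a hypothetical source field, the function name is invented for illustration, and transient matches are not modeled.

```python
def allowed_auto_matches(candidate_pairs, distinct_systems):
    """Drop candidate pairs whose records both come from the same distinct source."""
    allowed = []
    for left, right in candidate_pairs:
        same_source = left["source"] == right["source"]
        if same_source and left["source"] in distinct_systems:
            # Two records from one distinct source are never matched automatically.
            continue
        allowed.append((left, right))
    return allowed


# Hypothetical records, for illustration only.
a1 = {"rowid": "1", "source": "ABC"}
a2 = {"rowid": "2", "source": "ABC"}
x1 = {"rowid": "3", "source": "XYZ"}
x2 = {"rowid": "4", "source": "XYZ"}

pairs = [(a1, a2), (a1, x1), (x1, x2)]
kept = allowed_auto_matches(pairs, distinct_systems={"ABC"})
print([(left["rowid"], right["rowid"]) for left, right in kept])
```

In this sample, only the pair in which both records come from the distinct ABC source is dropped. The cross-system pair and the pair from the non-distinct XYZ source remain candidates.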

To designate a distinct source system:


1. From the list of source systems on the Merge Settings tab, select (check) each
source system for which intra-system merges should not be allowed.
2. For each distinct source system, designate whether you want it to use Auto Rules
only (see “Auto Rules Only” on page 597).

The following example shows both options selected for the Billing system.


Auto Rules Only

For distinct systems only, this option controls which types of rules are executed for
the associated distinct source system. Check (select) this check box if you want
Siperian Hub to apply only the automatic consolidation rules (not the manual
consolidation rules) for this distinct system. By default, this option is disabled
(unchecked).

Unmerge Child When Parent Unmerges (Cascade Unmerge)

Important: This feature applies only to child base objects with configured match rules
and foreign keys.

For child base objects, Siperian Hub provides a cascade unmerge feature that allows you to
specify what happens if records in the parent base object are unmerged. By default, this
feature is disabled, so that unmerging parent records does not unmerge associated child
records. In the Unmerge Child When Parent Unmerges portion near the bottom of the
Merge Settings tab, if you check (select) the Cascade Unmerge check box for a child
base object, when records in the parent object are unmerged, Siperian Hub also
unmerges affected records in the child base object.

Prerequisites for Cascade Unmerge

To enable cascade unmerge:


• the parent-child relationship must already be configured in the child base object
• the foreign key column in the child base object must be a match-enabled column

In the Unmerge Child When Parent Unmerges portion near the bottom of the Merge
Settings tab, the Schema Manager displays only those match-enabled columns in the
child base object that are configured with a foreign key. To learn more, see
“Configuring Foreign-Key Relationships Between Base Objects” on page 140.


Parents with Multiple Children

In situations where a parent base object has multiple child base objects, you can
explicitly enable cascade unmerge for each child base object. Once configured, when
the parent base object is unmerged, then all affected records in all associated child base
objects are unmerged as well.

Considerations for Using Cascade Unmerge

A full unmerge of affected records is not required in all implementations, and it can
add performance overhead to the unmerge because many child records can be
affected. In addition, it does not always make sense to enable this property. One
example is when Customer is a child of Customer Type. In this situation, you might not
want to unmerge Customers if Customer Type is unmerged. However, in most cases, it
is a good idea to unmerge addresses linked to customers if Customer unmerges.

Note: When cascade unmerge is enabled, the child record may not be unmerged if a
previous manual unmerge was done on the child base object.

When you enable the unmerge feature, it applies to the child table and the child
cross-reference table. Once enabled, if you then unmerge the parent cross-reference,
the original child cross-reference should be unmerged as well. This feature has no
impact on the parent—the feature operates on the child tables to provide additional
flexibility.

Changing Consolidation Settings


To change consolidation settings on the Merge Settings tab:
1. In the Schema Manager, display the Match/Merge Setup Details dialog for the base
object that you want to configure. To learn more, see “Navigating to the
Match/Merge Setup Details Dialog” on page 486.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Click the Merge Settings tab.


The Schema Manager displays the Merge Settings tab for the selected base object.

4. Change any of the following settings:


• “Immutable Rowid Object” on page 594
• “Distinct Systems” on page 595
• “Unmerge Child When Parent Unmerges (Cascade Unmerge)” on page 597
5. Click the Save button to save your changes.



Chapter 16: Configuring the Publish Process

This chapter describes how to configure the publish process for Siperian Hub data
using message triggers and embedded message queues. For an introduction, see
“Publish Process” on page 342.

Chapter Contents
• Before You Begin
• Configuration Steps for the Publish Process
• Starting the Message Queues Tool
• Configuring Global Message Queue Settings
• Configuring Message Queue Servers
• Configuring Outbound Message Queues
• Configuring Message Triggers
• JMS Message XML Reference


Before You Begin


Before you begin, you must have completed the following tasks:
• Installed Siperian Hub, created the Hub Store, and successfully set up message
queues according to the instructions in the Siperian Hub Installation Guide for your
platform
• Completed the tasks in the Siperian Hub Installation Guide to configure Siperian Hub
to handle asynchronous Services Integration Framework (SIF) requests, if
applicable
Note: SIF uses a message-driven bean (MDB) on the JMS message queue (named
siperian.sif.jms.queue) to process incoming asynchronous SIF requests. This
required queue is set up during the installation process, as described in the Siperian
Hub Installation Guide for your platform. If your Siperian Hub implementation does
not require any additional message queues, then you can skip this chapter.
• Built the schema according to the instructions in Chapter 5, “Building the Schema”
• Read the introduction to the publish process in “Publish Process” on page 342.

Configuration Steps for the Publish Process


After installing Siperian Hub, you use the Message Queues tool in the Hub Console to
configure message queues for your Siperian Hub implementation. The following tasks
are mandatory if you want to publish events in the outbound message queue:
1. Configure the message queues on your application server.

The Siperian installer automatically sets up message queues and the connection
factory configuration. For more information, see the Siperian Hub Installation Guide
for your platform.
2. Configure global message queue settings. For more information, see “Configuring
Global Message Queue Settings” on page 604.
3. Add at least one message queue server. For more information, see “Configuring
Message Queue Servers” on page 605.
4. Add at least one message queue to the message queue server. For more
information, see “Configuring Outbound Message Queues” on page 608.


5. Generate the JMS event message schema for each ORS that has data that you want
to publish. For more information, see “Generating and Deploying ORS-specific
Schemas” on page 827.
6. Configure message triggers for your message queues. For more information, see
“Configuring Message Triggers” on page 612.

After you have configured message queues, you can review run-time activities using the
Audit Manager according to the instructions in “Auditing Message Queues” on page
928.

Starting the Message Queues Tool


To start the Message Queues tool:
1. In the Hub Console, connect to the Master Database.

Message queues are defined in the Master Database.


2. In the Hub Console, expand the Configuration workbench, and then click
Message Queues.
The Hub Console displays the Message Queues tool, as shown here:

[Figure: Message Queues tool, showing the Navigation pane and the Properties pane]


The Message Queues tool is divided into two panes.

Pane Description
Navigation pane Shows (in a tree view) the message queues that are defined for this
Siperian Hub implementation.
Properties pane Shows the properties for the selected message queue.

Configuring Global Message Queue Settings


To configure the global message queue settings for your Siperian Hub implementation:
1. In the Hub Console, start the Message Queues tool. For more information, see
“Starting the Message Queues Tool” on page 603.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Specify settings for Data Changes Monitoring, which monitors the queue for
outgoing messages.
To enable or disable Data Changes Monitoring, click the Toggle Data Changes
Monitoring Status button.
4. Specify the following monitoring settings:

Monitoring Setting                          Description
Receive Timeout (milliseconds)              Default is 0. Amount of time allowed to receive the messages
                                            from the queue.
Receive Batch Size                          Default is 100. Maximum number of events processed and
                                            placed in the message queue in a single pass.
Message Check Interval (milliseconds)       Default is 300000. Amount of time to pause before polling for
                                            inbound messages or processing outbound messages. The same
                                            value applies to both inbound and outbound message queues.
Out of sync check interval (milliseconds)   If configured, periodically polls for ORS metadata and
                                            regenerates the XML message schema if subsequent changes
                                            have been made to design objects in the ORS. For more
                                            information, see “Generating and Deploying ORS-specific
                                            Schemas” on page 827.
                                            By default, this feature is disabled (set to zero) and is
                                            available only if:
                                            • Data Changes Monitoring is enabled.
                                            • ORS-specific XML message schema has been generated
                                              using the JMS Event Schema Manager.
                                            Note: Make sure that this value is greater than or equal to the
                                            Message Check Interval.

Click the button next to any property that you want to change.
5. Click the Save button to save your changes.
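
Before entering values in the Message Queues tool, it can help to check them against the documented defaults and the one documented constraint. The sketch below is an assumption-laden illustration: the class and field names are hypothetical and do not correspond to any Siperian Hub API. Only the default values and the rule about the out of sync check interval are taken from the monitoring settings described in this section.

```python
from dataclasses import dataclass


@dataclass
class MonitoringSettings:
    """Hypothetical container for the documented monitoring settings."""
    receive_timeout_ms: int = 0
    receive_batch_size: int = 100
    message_check_interval_ms: int = 300000
    out_of_sync_check_interval_ms: int = 0  # 0 means the feature is disabled

    def problems(self):
        """Return a list of violations of the documented constraint."""
        issues = []
        enabled = self.out_of_sync_check_interval_ms != 0
        if enabled and self.out_of_sync_check_interval_ms < self.message_check_interval_ms:
            issues.append(
                "Out of sync check interval must be greater than or equal "
                "to the Message Check Interval."
            )
        return issues


defaults = MonitoringSettings()
too_short = MonitoringSettings(out_of_sync_check_interval_ms=60000)
print(defaults.problems())
print(too_short.problems())
```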

Configuring Message Queue Servers


This section describes how to configure message queue servers for your Siperian Hub
implementation.

About Message Queue Servers


Before you can define message queues in Siperian Hub, you must define the message
queue server(s) that Siperian Hub will use for handling message queues. Before you can
define a message queue server in Siperian Hub, it must already be defined on your
application server according to the documented instructions for your application
server. You will need the connection factory name.

Message Queue Server Properties


This section describes the settings that you can configure for message queue servers.


WebLogic and JBoss Properties

You can configure the following message queue server properties.

Property                 Description
Connection Factory Name  Name of the connection factory for this message queue server.
Display Name             Name of this message queue server as it will be displayed in the Hub
                         Console.
Description              Descriptive information for this message queue server.

WebSphere Properties

IBM WebSphere implementations have the following properties.

Property Description
Server Name Name of the server where the message queue is defined.
Channel Channel of the server where the message queue is defined.
Port Port on the server where the message queue is defined.

Adding Message Queue Servers


To add a message queue server:
1. In the Hub Console, start the Message Queues tool. For more information, see
“Starting the Message Queues Tool” on page 603.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Right-click anywhere in the Navigation pane and choose Add Message Queue
Server.


The Message Queues tool displays the Add Message Queue Server dialog.

4. Specify the properties for this message queue server. For more information, see
“Message Queue Server Properties” on page 605.

Editing Message Queue Server Properties


To edit the properties of an existing message queue server:
1. In the Hub Console, start the Message Queues tool. For more information, see
“Starting the Message Queues Tool” on page 603.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the navigation pane, select the name of the message queue server that you want
to configure.
4. Change the editable properties for this message queue server. For more
information, see “Message Queue Server Properties” on page 605.
Click the button next to any property that you want to change.
5. Click the Save button to save your changes.


Deleting Message Queue Servers


To delete an existing message queue server:
1. In the Hub Console, start the Message Queues tool. For more information, see
“Starting the Message Queues Tool” on page 603.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the navigation pane, right-click the name of the message queue server that you
want to delete, and then choose Delete from the pop-up menu.
4. The Message Queues tool prompts you to confirm deletion.
5. Click Yes.

Configuring Outbound Message Queues


This section describes how to configure outbound JMS message queues for your
Siperian Hub implementation.

About Message Queues


Before you can define outbound JMS message queues in Siperian Hub, you must
define the message queue server(s) that will service the message queue. For more
information, see “Configuring Message Queue Servers” on page 605. In JMS, a message
queue is a staging area for XML messages. Siperian Hub publishes XML messages to the
message queue. External applications retrieve these published XML messages from the
message queue.

Message Queue Properties


You can configure the following message queue properties.

Property      Description
Queue Name    Name of this message queue. This must match the JNDI queue name
              as configured on your application server.
Display Name  Name of this message queue as it will be displayed in the Hub
              Console.
Description   Descriptive information for this message queue.

Adding Message Queues to a Message Queue Server


To add a message queue to a message queue server:
1. In the Hub Console, start the Message Queues tool. For more information, see
“Starting the Message Queues Tool” on page 603.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the navigation pane, right-click the name of the message queue server to which
you want to add a message queue, and choose Add Message Queue.
The Message Queues tool displays the Add Message Queue dialog.

4. Specify the message queue properties. For more information, see “Message Queue
Properties” on page 608.
5. Click OK.


The Message Queues tool prompts you to choose the queue assignment.

6. Select one of the following options:

Assignment                       Description
Leave Unassigned                 Queue is currently unassigned and not in use. Select this option
                                 to use this queue as the outbound queue for Siperian Hub API
                                 responses, or to indicate that the queue is currently unassigned
                                 and is not in use.
Use with Message Queue Triggers  Queue is currently assigned and is available for use by message
                                 triggers that are defined in the Schema Manager according to the
                                 instructions in “Configuring Message Triggers” on page 612.
Use Legacy XML                   Select (check) this option only if your Siperian Hub
                                 implementation requires that you use the legacy XML message
                                 format (Siperian Hub XU version) instead of the current version
                                 of the XML message format. For more information, see “Legacy
                                 JMS Message XML Reference” on page 644.

7. Click the Save button to save your changes.


Editing Message Queue Properties


To edit the properties of an existing message queue:
1. In the Hub Console, start the Message Queues tool. For more information, see
“Starting the Message Queues Tool” on page 603.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the navigation pane, select the name of the message queue that you want to
configure.
4. Change the editable properties for this message queue. For more information, see
“Message Queue Properties” on page 608.
Click the button next to any property that you want to change.
5. Change the queue assignment, if you want.
6. Click the Save button to save your changes.

Deleting Message Queues


To delete an existing message queue:
1. In the Hub Console, start the Message Queues tool. For more information, see
“Starting the Message Queues Tool” on page 603.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. In the navigation pane, right-click the name of the message queue that you want to
delete, and then choose Delete from the pop-up menu.
4. The Message Queues tool prompts you to confirm deletion.
5. Click Yes.


Configuring Message Triggers


This section describes how to configure message triggers for your Siperian Hub
implementation. You configure message triggers in the Schema Manager tool.

About Message Triggers


Use message triggers to identify which actions within Siperian Hub are communicated to
external applications, and where to publish XML messages. When an action occurs for
which a rule is defined, an XML message is placed in a JMS message queue. A message
trigger specifies the JMS message queue in which messages are placed. For example:
1. A user inserts a record in a base object.

2. This insert action initiates a message trigger.


3. Siperian Hub evaluates the message trigger and sends a message to the appropriate
message queue.
4. An outside application polls the message queue, picks up the message, and
processes it.

You can use the same message queue for all triggers, or you can use a different message
queue for each trigger. In order for an action to fire a message trigger, the message
queues must be configured, and a message trigger must be defined for that base object
and action.
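
The flow described above (an action occurs, the matching trigger fires, a message lands in a queue, and an outside application picks it up) can be sketched with an in-process queue. This is a conceptual model only. It uses Python's standard queue module as a stand-in for a JMS message queue, a dictionary as a stand-in for the XML message, and hypothetical names (MessageTrigger, publish_event, CUSTOMER, ADDRESS) that are not part of Siperian Hub.

```python
import queue
from dataclasses import dataclass


@dataclass
class MessageTrigger:
    """Hypothetical trigger: one base object, one action, one target queue."""
    base_object: str
    action: str
    target_queue: queue.Queue


def publish_event(triggers, base_object, action, record):
    """Place a message on the queue of every trigger defined for this
    base object and action. Returns the number of messages placed."""
    delivered = 0
    for trigger in triggers:
        if trigger.base_object == base_object and trigger.action == action:
            # A dictionary stands in for the XML message in this sketch.
            trigger.target_queue.put(
                {"base_object": base_object, "action": action, "record": record}
            )
            delivered += 1
    return delivered


customer_queue = queue.Queue()
triggers = [MessageTrigger("CUSTOMER", "INSERT", customer_queue)]

# An insert on CUSTOMER fires the trigger. An insert on ADDRESS does not,
# because no trigger is defined for that base object and action.
sent = publish_event(triggers, "CUSTOMER", "INSERT", {"ROWID_OBJECT": "1"})
skipped = publish_event(triggers, "ADDRESS", "INSERT", {"ROWID_OBJECT": "7"})

# The outside application polls the queue and picks up the message.
message = customer_queue.get_nowait()
print(sent, skipped, message["base_object"])
```

The second call places nothing in the queue, which illustrates why a trigger must be defined for each base object and action that you want to publish.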

Types of Events for Message Triggers

The following types of events can cause a message trigger to be fired and a message
placed in the queue.
Events for Which Message Queue Rules Can Be Defined

Event                              Description
Add new data                       • Add the data through the load process
                                   • Add the data through the Data Manager
                                   • Add the data through the API verb using PUT or CLEANSE_PUT
                                     (through HTTP, SOAP, MQ, and so on)
Add new pending data               A new record with a PENDING state is created. Applies to
                                   state-enabled base objects only.
Update existing data               • Update the data through the load process
                                   • Update the data through the Data Manager
                                   • Update the data through the API verb using PUT or CLEANSE_PUT
                                     (through HTTP, SOAP, MQ, and so on)
                                   Note:
                                   • If trust rules prevent the base object columns from being updated,
                                     no message is generated.
                                   • If one or more of the specified columns are updated, a single
                                     message is generated. This single message includes data from all of
                                     the cross-references in all output systems.
Update existing pending data       An existing record with a PENDING state is updated. Applies to
                                   state-enabled base objects only. For more information, see Chapter 7,
                                   “State Management.”
Update, only XREF changed          • Update the data through the load process when only the XREF
                                     has changed
                                   • Update the data through the API using PUT or CLEANSE_PUT
                                     (through HTTP, SOAP, MQ, and so on) when only the XREF has
                                     changed
Pending update, only XREF changed  An XREF record with a PENDING state is updated. This includes
                                   promotion of a record. Applies to state-enabled base objects only.
                                   For more information, see Chapter 7, “State Management.”
Merging data                       • Manual merge through the Merge Manager
                                   • Merge through the API verb (through HTTP, SOAP, MQ, and so on)
                                   • Auto Match and Merge
Merging data, Base object updated  Merging data when the base object has been updated
Unmerging data                     • Unmerge the data through the Data Manager
                                   • Unmerge the data through the API verb using UNMERGE
                                     (through HTTP, SOAP, EJB, and so on)
Accepting data as unique           • Accepting a single record as unique through the Merge Manager
                                   • Accepting multiple records as unique through the Merge Manager
                                   • Having Accept as Unique turned on in the base object's match
                                     rules (this happens during the match/merge process)
                                   Note: When a record is accepted as unique (either automatically
                                   through a match rule or manually by a data steward), Siperian Hub
                                   generates a message with the record information, including the
                                   cross-reference information for all output systems. This message is
                                   placed in the queue.
Delete BO data                     A base object record is soft deleted (state changed to DELETED).
                                   Applies to state-enabled base objects only. For more information, see
                                   Chapter 7, “State Management.”
Delete XREF data                   An XREF record is soft deleted (state changed to DELETED). Applies
                                   to state-enabled base objects only. For more information, see Chapter 7,
                                   “State Management.”
Delete pending BO data             A base object record with a PENDING state is hard deleted. Applies to
                                   state-enabled base objects only. For more information, see Chapter 7,
                                   “State Management.”
Delete pending XREF data           An XREF record with a PENDING state is hard deleted. Applies to
                                   state-enabled base objects only. For more information, see Chapter 7,
                                   “State Management.”
No action                          Applies only to Activity Manager. Returned only by a cleanse_put
                                   operation and only if delta detection is enabled. If delta detection is not
                                   enabled, then an Update action type is returned.

Considerations for Message Triggers

Consider the following issues when setting up message triggers for your
implementation:
• If a message queue is used in any message trigger definition under a base object in
any Hub Store, the message queue displays the following message: “The message
queue is currently in use by message triggers.” In this case, you cannot edit the
properties of the message queue. Instead, you must create another message queue
to make the necessary changes.

• Message triggers apply to one base object only, and they fire only when a specific
action occurs directly on that base object. If you have two tables that are in a
parent-child relationship, then you need to explicitly define message queues
separately, for each table. Change detection is based on specific changes to each
base object (such as a load INSERT, load UPDATE, MERGE, or PUT). Changes
to a record of the parent table can fire a message trigger for the parent record only.
If changes in the parent record affect one or more associated child records, then a
message trigger for the child table must be explicitly configured to fire when such
an action occurs in the child records.
• In addition to base objects, message triggers can be configured for dependent and
relationship objects. However, only insert and update actions are available for
dependent and relationship objects.

Adding Message Triggers


To add a message trigger for a base object:
1. Configure the message queue to be usable with message triggers. For more
information, see “Editing Message Queue Properties” on page 611.
2. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
3. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
4. Expand the base object that will be monitored, and select the Message Trigger
Setup node.

If no message triggers have been set up, then the Schema Manager displays an empty
screen.

5. Do one of the following:


• If no message triggers have been defined, click Add Message Trigger.
OR
• If message triggers have been defined, then click the button.

The Schema Manager displays the Add Message Trigger wizard.

6. Specify a name and description for the new message trigger.


7. Click Next.
The Add Message Trigger wizard prompts you to specify the messaging package.

8. Select the package that will be used to build the message. For more information,
see “Configuring Packages” on page 196.

9. Click Next.
The Add Message Trigger wizard prompts you to specify the target message queue.

10. Select the message queue to which the message will be written.
11. Click Next.
The Add Message Trigger wizard prompts you to specify the rules for this message
trigger.

12. Select the event type(s) for this message trigger.

For more information, see “Types of Events for Message Triggers” on page 612.
13. Configure the system properties for this message trigger:

Check Box      Description
Triggering     System(s) that will trigger the action.
In Message     For each message that is placed on a message queue due to the
               trigger, the message includes the pkey_src_object value for each
               cross-reference that the base object record has in one of the
               selected In Message systems.

Note: You must select at least one Triggering system and one In Message system.
For example, suppose your implementation had three source systems (A, B, and C)
and a base object record had cross-reference records for A and B. Suppose the
cross-reference in system A for this base object record were updated.
The following table shows possible message trigger configurations and the
resulting message:

In Message Systems    Resulting Message
A                     Message with cross-reference for system A
B                     Message with cross-reference for system B
C                     No message (no cross-references from In Message systems)
A & B                 Message with cross-references for systems A and B
A & C                 Message with cross-reference for system A
B & C                 Message with cross-reference for system B
A & B & C             Message with cross-references for systems A and B

14. Identify the system to which the event applies, columns to listen to for changes,
and the package used to construct the message.
All events send the base object record—and all corresponding cross-references
that make up that record—to the message, based on the specified package.
15. If you selected an Update event, click Next. Otherwise, click Finish.
16. If you selected an Update event, the Schema Manager prompts you to select
the columns to monitor for update actions.

17. Do one of the following:


• Select the column(s) to monitor for the events associated with this message
trigger, or

• Select the Trigger message if change on any column check box to monitor
all columns for updates.
18. Click Finish.
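
The In Message Systems table in step 13 can be read as a set intersection: the
message carries the cross-references that the base object record has in the
selected In Message systems, and no message results when that intersection is
empty. The following Python sketch only illustrates this reading of the table. The
function name and the use of plain sets of system names are assumptions made for
illustration and are not part of Siperian Hub.

```python
def xrefs_in_message(record_xref_systems, in_message_systems):
    """Return the systems whose cross-references would appear in the message.

    An empty result corresponds to the "No message" row in the table.
    """
    return sorted(set(record_xref_systems) & set(in_message_systems))


# The base object record has cross-references in systems A and B only.
record_systems = {"A", "B"}

for selected in (["A"], ["C"], ["A", "C"], ["A", "B", "C"]):
    included = xrefs_in_message(record_systems, selected)
    print(selected, "->", included if included else "no message")
```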

Editing Message Triggers


To edit the properties of an existing message trigger:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Expand the base object that will be monitored, and select the Message Trigger
Setup node.
4. In the Message Triggers list, click the message trigger that you want to configure.
The Schema Manager displays the settings for the selected message trigger.

5. Change the settings you want. For more information, see “Adding Message
Triggers” on page 615 and “Types of Events for Message Triggers” on page 612.

Click the button next to the editable property that you want to change.
6. Click the button to save your changes.

Deleting Message Triggers


To delete an existing message trigger:
1. Start the Schema Manager according to the instructions in “Starting the Schema
Manager” on page 90.
2. Acquire a write lock according to the instructions in “Acquiring a Write Lock” on
page 30.
3. Expand the base object that will be monitored, and select the Message Trigger
Setup node.
4. In the Message Triggers list, click the message trigger that you want to delete.
5. Click the button.
The Schema Manager prompts you to confirm deletion.
6. Click Yes.

JMS Message XML Reference


This section describes the structure of Siperian Hub XML messages and provides
example messages.

Note: If your Siperian Hub implementation requires that you use the legacy XML
message format (Siperian Hub XU version) instead of the current version of the XML
message format (described in this section), see “Legacy JMS Message XML Reference”
on page 644 instead.

Generating ORS-specific XML Message Schemas


As described in “ORS-specific XML Message Schemas” on page 344, to create XML
messages, the publish process relies on an ORS-specific schema file
(<ors-name>-siperian-mrm-event.xsd) that you generate using the JMS Event
Schema Manager tool in the Hub Console. For more information, see “Generating and
Deploying ORS-specific Schemas” on page 827.

Elements in an XML Message


The following table describes the elements in an XML message.

Field Description
Root Node
<siperianEvent> Root node in the XML message.
Event Metadata
<eventMetadata> Root node for event metadata.
<messageId> Unique ID for siperianEvent messages.
<eventType> Type of event, as described in “Types of Events for Message
Triggers” on page 612. One of the following values:
• Insert
• Update
• Update XREF
• Accept as Unique
• Merge
• Unmerge
• Merge Update
<baseObjectUid> UID of the base object affected by this action.
<packageUid> UID of the package associated with this action.
<messageDate> Date/time when this message was generated.
<orsId> ID of the Operational Record Store (ORS) associated with this
event.
<triggerUid> UID of the rule that triggered the event that generated this
message.
Event Details
<eventTypeEvent> Root node for event details. The element name reflects the
event type, for example <insertEvent> or <mergeEvent>.
<sourceSystemName> Name of the source system associated with this event.

<sourceKey> Value of the PKEY_SRC_OBJECT associated with this event.
<eventDate> Date/time when the event was generated.
<rowid> RowID of the base object record that was affected by the
event.
<xrefKey> Root node of a cross-reference record affected by this event.
<systemName> System name of the cross-reference record affected by this
event.
<sourceKey> PKEY_SRC_OBJECT of the cross-reference record affected
by this event.
<packageName> Name of the secure package associated with this event.
<columnName> Each column in the package is represented by an element in
the XML file. Examples: rowidObject and
consolidationInd. Defined in the ORS-specific XSD that
is generated using the JMS Event Schema Manager tool. For
more information, see “Generating and Deploying
ORS-specific Schemas” on page 827.
<mergedRowid> List of ROWID_OBJECT values for the losing records in the
merge. This field is included in messages for Merge events only.
<dependentSourceKey> Applies only to an insert in or update of the relationship of
dependent objects.
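
As a rough illustration of how a consumer might read the elements described in the
table above, the following Python sketch parses an abbreviated event message with
the standard library. It assumes that the elements are not namespace-qualified, as
in the listings in this section, and that the event details node is the only
sibling of <eventMetadata>. The function name read_event is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Abbreviated message that uses only elements described in the table above.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
  <eventMetadata>
    <eventType>Insert</eventType>
    <baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
    <packageUid>PACKAGE.CONTACT_PKG</packageUid>
    <orsId>localhost-mrm-CMX_ORS</orsId>
    <triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
    <messageId>114</messageId>
    <messageDate>2008-09-08T16:02:11.000-07:00</messageDate>
  </eventMetadata>
  <insertEvent>
    <sourceSystemName>CRM</sourceSystemName>
    <sourceKey>PK12658</sourceKey>
    <rowid>66 </rowid>
  </insertEvent>
</siperianEvent>"""


def read_event(xml_bytes):
    """Return (metadata dict, event details element) for a siperianEvent."""
    root = ET.fromstring(xml_bytes)
    if root.tag != "siperianEvent":
        raise ValueError("not a siperianEvent message: " + root.tag)
    metadata = {child.tag: (child.text or "").strip()
                for child in root.find("eventMetadata")}
    # Assumption: the event details node is the sibling of <eventMetadata>.
    details = next(child for child in root if child.tag != "eventMetadata")
    return metadata, details


metadata, details = read_event(SAMPLE)
# messageDate carries a UTC offset, so the parsed value is timezone-aware.
generated = datetime.fromisoformat(metadata["messageDate"])
print(metadata["eventType"], details.tag, generated.isoformat())
```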

Filtering Messages
You can use the custom JMS header named MessageType to filter incoming messages
based on the message type. The following message types are indicated in the message
header.

Message Type Description


siperianEvent Event notification message.
<serviceNameReturn> For Services Integration Framework (SIF) responses, the
response begins with the name of the SIF request, as in the
following fragment of a response to a get request:
<getReturn>
<message>The GET was executed successfully
- retrieved 1 records</message>
<recordKey>
<ROWID>2</ROWID>
</recordKey>
...
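
A consumer could branch on the MessageType value before parsing the message body.
The following Python sketch shows one possible classification. It assumes that
your messaging client exposes the custom headers as a dictionary, and it treats
any value that ends in Return as a SIF response, based on the getReturn example
above. Both assumptions, and the function name, are illustrative only.

```python
def classify_message_type(message_type):
    """Classify the value of the custom MessageType header."""
    if message_type == "siperianEvent":
        return "event"
    # Assumption: SIF responses are named <serviceName>Return, as in getReturn.
    if message_type and message_type.endswith("Return"):
        return "sif-response"
    return "unknown"


# Hypothetical header dictionaries, as a messaging client might expose them.
incoming = [
    {"MessageType": "siperianEvent"},
    {"MessageType": "getReturn"},
    {},
]
for headers in incoming:
    print(classify_message_type(headers.get("MessageType")))
```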

Example XML Messages


This section provides listings of example XML messages.

Accept As Unique Message

The following is an example of an Accept As Unique message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>Accept as Unique</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
<messageId>192</messageId>
<messageDate>2008-09-10T16:33:14.000-07:00</messageDate>
</eventMetadata>

<acceptAsUniqueEvent>
<sourceSystemName>Admin</sourceSystemName>
<sourceKey>SVR1.1T1</sourceKey>
<eventDate>2008-09-10T16:33:14.000-07:00</eventDate>
<rowid>2 </rowid>
<xrefKey>
<systemName>Admin</systemName>
<sourceKey>SVR1.1T1</sourceKey>
</xrefKey>
<contactPkg>
<rowidObject>2 </rowidObject>
<creator>admin</creator>
<createDate>2008-08-13T20:28:02.000-07:00</createDate>
<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-10T16:33:14.000-07:00</lastUpdateDate>
<consolidationInd>1</consolidationInd>
<lastRowidSystem>SYS0 </lastRowidSystem>
<dirtyInd>0</dirtyInd>
<firstName>Joey</firstName>
<lastName>Brown</lastName>
</contactPkg>
</acceptAsUniqueEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
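
The following Python sketch shows one way to pull the cross-reference keys and the
package columns out of an event details node such as the one above. The details
node is abbreviated from the listing, the helper names are hypothetical, and the
element name contactPkg applies only to this example package. Values are trimmed
because ROWID values in the listings carry trailing spaces.

```python
import xml.etree.ElementTree as ET

# Event details node abbreviated from the Accept As Unique listing.
DETAILS = """<acceptAsUniqueEvent>
  <sourceSystemName>Admin</sourceSystemName>
  <sourceKey>SVR1.1T1</sourceKey>
  <rowid>2 </rowid>
  <xrefKey>
    <systemName>Admin</systemName>
    <sourceKey>SVR1.1T1</sourceKey>
  </xrefKey>
  <contactPkg>
    <rowidObject>2 </rowidObject>
    <consolidationInd>1</consolidationInd>
    <firstName>Joey</firstName>
    <lastName>Brown</lastName>
  </contactPkg>
</acceptAsUniqueEvent>"""


def xref_keys(details):
    """Return (systemName, sourceKey) pairs; sourceKey is None when absent."""
    return [(x.findtext("systemName"), x.findtext("sourceKey"))
            for x in details.findall("xrefKey")]


def package_columns(details, package_tag):
    """Return the package columns as a dict, trimming padded values."""
    package = details.find(package_tag)
    return {col.tag: (col.text or "").strip() for col in package}


details = ET.fromstring(DETAILS)
print(xref_keys(details))
print(package_columns(details, "contactPkg"))
```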

AMRule Message

The following is an example of an AMRule message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>AM Rule Event</eventType>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<interactionId>12</interactionId>
<activityName>Changed Contact and Address </activityName>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdateLegacy</triggerUid>
<messageId>291</messageId>
<messageDate>2008-09-19T11:43:42.979-07:00</messageDate>
</eventMetadata>
<amRuleEvent>

<eventDate>2008-09-19T11:43:42.979-07:00</eventDate>
<contactPkgAmEvent>
<amRuleUid>AM_RULE.RuleSet1|Rule1</amRuleUid>
<contactPkg>
<rowidObject>64 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-08T16:24:35.000-07:00</createDate>
<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-18T16:26:45.000-07:00</lastUpdateDate>
<consolidationInd>2</consolidationInd>
<lastRowidSystem>SYS0 </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>Johnny</firstName>
<lastName>Brown</lastName>
<hubStateInd>1</hubStateInd>
</contactPkg>
<cContact>
<event>
<eventType>Update</eventType>
<system>Admin</system>
</event>
<event>
<eventType>Update XREF</eventType>
<system>Admin</system>
</event>
<xrefKey>
<systemName>CRM</systemName>
<sourceKey>PK1265</sourceKey>
</xrefKey>
<xrefKey>
<systemName>Admin</systemName>
<sourceKey>64</sourceKey>
</xrefKey>
</cContact>
</contactPkgAmEvent>
</amRuleEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
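
Unlike the other event messages, the AMRule message nests one or more <event>
elements under the base object element. The following Python sketch collects the
rule UID and the reported events from a details node abbreviated from the listing
above. The function name is hypothetical, and the sketch assumes that every
<event> element in the message belongs to the same rule.

```python
import xml.etree.ElementTree as ET

# Event details node abbreviated from the AMRule listing.
AM_DETAILS = """<amRuleEvent>
  <eventDate>2008-09-19T11:43:42.979-07:00</eventDate>
  <contactPkgAmEvent>
    <amRuleUid>AM_RULE.RuleSet1|Rule1</amRuleUid>
    <cContact>
      <event>
        <eventType>Update</eventType>
        <system>Admin</system>
      </event>
      <event>
        <eventType>Update XREF</eventType>
        <system>Admin</system>
      </event>
    </cContact>
  </contactPkgAmEvent>
</amRuleEvent>"""


def am_rule_summary(details):
    """Return the rule UID and the (eventType, system) pairs in the message."""
    rule_uid = details.findtext(".//amRuleUid")
    events = [(e.findtext("eventType"), e.findtext("system"))
              for e in details.iter("event")]
    return rule_uid, events


print(am_rule_summary(ET.fromstring(AM_DETAILS)))
```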

BoDelete Message

The following is an example of a BoDelete message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>BO Delete</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
<messageId>328</messageId>
<messageDate>2008-09-19T14:35:53.000-07:00</messageDate>
</eventMetadata>
<boDeleteEvent>
<sourceSystemName>Admin</sourceSystemName>
<eventDate>2008-09-19T14:35:53.000-07:00</eventDate>
<rowid>107 </rowid>
<xrefKey>
<systemName>CRM</systemName>
</xrefKey>
<xrefKey>
<systemName>Admin</systemName>
</xrefKey>
<xrefKey>
<systemName>WEB</systemName>
</xrefKey>
<contactPkg>
<rowidObject>107 </rowidObject>
<creator>sifuser</creator>
<createDate>2008-09-19T14:35:28.000-07:00</createDate>
<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-19T14:35:53.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>CRM </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>John</firstName>
<lastName>Smith</lastName>
<hubStateInd>-1</hubStateInd>
</contactPkg>
</boDeleteEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

BoSetToDelete Message

The following is an example of a BoSetToDelete message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>BO set to Delete</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
<messageId>319</messageId>
<messageDate>2008-09-19T14:21:03.000-07:00</messageDate>
</eventMetadata>
<boSetToDeleteEvent>
<sourceSystemName>Admin</sourceSystemName>
<eventDate>2008-09-19T14:21:03.000-07:00</eventDate>
<rowid>102 </rowid>
<xrefKey>
<systemName>CRM</systemName>
</xrefKey>
<xrefKey>
<systemName>Admin</systemName>
</xrefKey>
<xrefKey>
<systemName>WEB</systemName>
</xrefKey>
<contactPkg>
<rowidObject>102 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-19T13:57:09.000-07:00</createDate>
<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-19T14:21:03.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>SYS0 </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<hubStateInd>-1</hubStateInd>
</contactPkg>
</boSetToDeleteEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

Delete Message

The following is an example of a Delete message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>Delete</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
<messageId>328</messageId>
<messageDate>2008-09-19T14:35:53.000-07:00</messageDate>
</eventMetadata>
<deleteEvent>
<sourceSystemName>Admin</sourceSystemName>
<eventDate>2008-09-19T14:35:53.000-07:00</eventDate>
<rowid>107 </rowid>
<xrefKey>
<systemName>CRM</systemName>
</xrefKey>
<xrefKey>
<systemName>Admin</systemName>
</xrefKey>
<xrefKey>
<systemName>WEB</systemName>
</xrefKey>
<contactPkg>
<rowidObject>107 </rowidObject>
<creator>sifuser</creator>
<createDate>2008-09-19T14:35:28.000-07:00</createDate>
<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-19T14:35:53.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>CRM </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>John</firstName>
<lastName>Smith</lastName>
<hubStateInd>-1</hubStateInd>
</contactPkg>

</deleteEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

Insert Message

The following is an example of an Insert message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>Insert</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdateLegacy</triggerUid>
<messageId>114</messageId>
<messageDate>2008-09-08T16:02:11.000-07:00</messageDate>
</eventMetadata>
<insertEvent>
<sourceSystemName>CRM</sourceSystemName>
<sourceKey>PK12658</sourceKey>
<eventDate>2008-09-08T16:02:11.000-07:00</eventDate>
<rowid>66 </rowid>
<xrefKey>
<systemName>CRM</systemName>
<sourceKey>PK12658</sourceKey>
</xrefKey>
<contactPkg>
<rowidObject>66 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-08T16:02:11.000-07:00</createDate>
<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-08T16:02:11.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>CRM </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>Joe</firstName>
<lastName>Brown</lastName>
</contactPkg>
</insertEvent>

</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

Merge Message

The following is an example of a Merge message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>Merge</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdateLegacy</triggerUid>
<messageId>130</messageId>
<messageDate>2008-09-08T16:13:28.000-07:00</messageDate>
</eventMetadata>
<mergeEvent>
<sourceSystemName>CRM</sourceSystemName>
<sourceKey>PK126566</sourceKey>
<eventDate>2008-09-08T16:13:28.000-07:00</eventDate>
<rowid>65 </rowid>
<xrefKey>
<systemName>CRM</systemName>
<sourceKey>PK126566</sourceKey>
</xrefKey>
<xrefKey>
<systemName>Admin</systemName>
<sourceKey>SVR1.28E</sourceKey>
</xrefKey>
<mergedRowid>62 </mergedRowid>
<contactPkg>
<rowidObject>65 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-08T15:49:17.000-07:00</createDate>
<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-08T16:13:28.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>SYS0 </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>Joe</firstName>

<lastName>Brown</lastName>
</contactPkg>
</mergeEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
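
For Merge messages, a consumer typically needs the affected record and the losing
records reported in <mergedRowid>. The following Python sketch reads both from a
details node abbreviated from the listing above. It assumes that multiple losing
records, if present, appear as repeated <mergedRowid> elements. The function name
is hypothetical.

```python
import xml.etree.ElementTree as ET

# Event details node abbreviated from the Merge listing.
MERGE_DETAILS = """<mergeEvent>
  <sourceSystemName>CRM</sourceSystemName>
  <sourceKey>PK126566</sourceKey>
  <rowid>65 </rowid>
  <mergedRowid>62 </mergedRowid>
</mergeEvent>"""


def merge_summary(details):
    """Return (affected rowid, list of merged rowids), with padding trimmed."""
    affected_rowid = (details.findtext("rowid") or "").strip()
    # Assumption: each losing record is reported in its own <mergedRowid>.
    merged = [(m.text or "").strip() for m in details.findall("mergedRowid")]
    return affected_rowid, merged


print(merge_summary(ET.fromstring(MERGE_DETAILS)))
```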

Merge Update Message

The following is an example of a Merge Update message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>Merge Update</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
<messageId>269</messageId>
<messageDate>2008-09-10T17:25:42.000-07:00</messageDate>
</eventMetadata>
<mergeUpdateEvent>
<sourceSystemName>CRM</sourceSystemName>
<sourceKey>P45678</sourceKey>
<eventDate>2008-09-10T17:25:42.000-07:00</eventDate>
<rowid>83 </rowid>
<xrefKey>
<systemName>CRM</systemName>
<sourceKey>P45678</sourceKey>
</xrefKey>
<mergedRowid>58 </mergedRowid>
<contactPkg>
<rowidObject>83 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-10T16:44:56.000-07:00</createDate>
<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-10T17:25:42.000-07:00</lastUpdateDate>
<consolidationInd>1</consolidationInd>
<lastRowidSystem>CRM </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>Thomas</firstName>

<lastName>Jones</lastName>
</contactPkg>
</mergeUpdateEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

No Action Message

The following is an example of a No Action message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>No Action</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
<messageId>267</messageId>
<messageDate>2008-09-10T17:25:42.000-07:00</messageDate>
</eventMetadata>
<noActionEvent>
<sourceSystemName>CRM</sourceSystemName>
<sourceKey>P45678</sourceKey>
<eventDate>2008-09-10T17:25:42.000-07:00</eventDate>
<rowid>83 </rowid>
<xrefKey>
<systemName>CRM</systemName>
<sourceKey>P45678</sourceKey>
</xrefKey>
<xrefKey>
<systemName>CRM</systemName>
<sourceKey>P45678</sourceKey>
</xrefKey>
<xrefKey>
<systemName>CRM</systemName>
<sourceKey>P45678</sourceKey>
</xrefKey>
<contactPkg>
<rowidObject>83 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-10T16:44:56.000-07:00</createDate>

<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-10T17:25:42.000-07:00</lastUpdateDate>
<consolidationInd>1</consolidationInd>
<lastRowidSystem>CRM </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>Thomas</firstName>
<lastName>Jones</lastName>
</contactPkg>
</noActionEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

PendingInsert Message

The following is an example of a PendingInsert message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>Pending Insert</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
<messageId>302</messageId>
<messageDate>2008-09-19T13:57:10.000-07:00</messageDate>
</eventMetadata>
<pendingInsertEvent>
<sourceSystemName>Admin</sourceSystemName>
<sourceKey>SVR1.2V3</sourceKey>
<eventDate>2008-09-19T13:57:10.000-07:00</eventDate>
<rowid>102 </rowid>
<xrefKey>
<systemName>Admin</systemName>
<sourceKey>SVR1.2V3</sourceKey>
</xrefKey>
<contactPkg>
<rowidObject>102 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-19T13:57:09.000-07:00</createDate>
<updatedBy>admin</updatedBy>

<lastUpdateDate>2008-09-19T13:57:09.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>SYS0 </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>John</firstName>
<lastName>Smith</lastName>
<hubStateInd>0</hubStateInd>
</contactPkg>
</pendingInsertEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

PendingUpdate Message

The following is an example of a PendingUpdate message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>Pending Update</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
<messageId>306</messageId>
<messageDate>2008-09-19T14:01:36.000-07:00</messageDate>
</eventMetadata>
<pendingUpdateEvent>
<sourceSystemName>CRM</sourceSystemName>
<sourceKey>CPK125</sourceKey>
<eventDate>2008-09-19T14:01:36.000-07:00</eventDate>
<rowid>102 </rowid>
<xrefKey>
<systemName>CRM</systemName>
<sourceKey>CPK125</sourceKey>
</xrefKey>
<xrefKey>
<systemName>Admin</systemName>
<sourceKey>SVR1.2V3</sourceKey>
</xrefKey>
<contactPkg>
<rowidObject>102 </rowidObject>

<creator>admin</creator>
<createDate>2008-09-19T13:57:09.000-07:00</createDate>
<updatedBy>sifuser</updatedBy>
<lastUpdateDate>2008-09-19T14:01:36.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>CRM </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>John</firstName>
<lastName>Smith</lastName>
<hubStateInd>1</hubStateInd>
</contactPkg>
</pendingUpdateEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

PendingUpdateXref Message

The following is an example of a PendingUpdateXref message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>Pending Update XREF</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
<messageId>306</messageId>
<messageDate>2008-09-19T14:01:36.000-07:00</messageDate>
</eventMetadata>
<pendingUpdateXrefEvent>
<sourceSystemName>CRM</sourceSystemName>
<sourceKey>CPK125</sourceKey>
<eventDate>2008-09-19T14:01:36.000-07:00</eventDate>
<rowid>102 </rowid>
<xrefKey>
<systemName>CRM</systemName>
<sourceKey>CPK125</sourceKey>
</xrefKey>
<xrefKey>
<systemName>Admin</systemName>

<sourceKey>SVR1.2V3</sourceKey>
</xrefKey>
<contactPkg>
<rowidObject>102 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-19T13:57:09.000-07:00</createDate>
<updatedBy>sifuser</updatedBy>
<lastUpdateDate>2008-09-19T14:01:36.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>CRM </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>John</firstName>
<lastName>Smith</lastName>
<hubStateInd>1</hubStateInd>
</contactPkg>
</pendingUpdateXrefEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

Unmerge Message

The following is an example of an Unmerge message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>UnMerge</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
<messageId>145</messageId>
<messageDate>2008-09-08T16:24:36.000-07:00</messageDate>
</eventMetadata>
<unmergeEvent>
<sourceSystemName>CRM</sourceSystemName>
<sourceKey>PK1265</sourceKey>
<eventDate>2008-09-08T16:24:36.000-07:00</eventDate>
<rowid>65 </rowid>
<xrefKey>
<systemName>CRM</systemName>
<sourceKey>PK1265</sourceKey>
</xrefKey>
<mergedRowid>64 </mergedRowid>
<contactPkg>
<rowidObject>65 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-08T15:49:17.000-07:00</createDate>
<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-08T16:24:35.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>SYS0 </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>Joe</firstName>
<lastName>Brown</lastName>
</contactPkg>
</unmergeEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

Update Message

The following is an example of an Update message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>Update</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
<messageId>120</messageId>
<messageDate>2008-09-08T16:05:13.000-07:00</messageDate>
</eventMetadata>
<updateEvent>
<sourceSystemName>CRM</sourceSystemName>
<sourceKey>PK12658</sourceKey>
<eventDate>2008-09-08T16:05:13.000-07:00</eventDate>
<rowid>66 </rowid>
<xrefKey>
<systemName>CRM</systemName>
<sourceKey>PK12658</sourceKey>
</xrefKey>
<contactPkg>
<rowidObject>66 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-08T16:02:11.000-07:00</createDate>
<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-08T16:05:13.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>CRM </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>Joe</firstName>
<lastName>Black</lastName>
</contactPkg>
</updateEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

Update XREF Message

The following is an example of an Update XREF message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>Update XREF</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
<messageId>121</messageId>
<messageDate>2008-09-08T16:05:13.000-07:00</messageDate>
</eventMetadata>
<updateXrefEvent>
<sourceSystemName>CRM</sourceSystemName>
<sourceKey>PK12658</sourceKey>
<eventDate>2008-09-08T16:05:13.000-07:00</eventDate>
<rowid>66 </rowid>
<xrefKey>
<systemName>CRM</systemName>
<sourceKey>PK12658</sourceKey>
</xrefKey>
<contactPkg>
<rowidObject>66 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-08T16:02:11.000-07:00</createDate>
<updatedBy>admin</updatedBy>
<lastUpdateDate>2008-09-08T16:05:13.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>CRM </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<firstName>Joe</firstName>
<lastName>Black</lastName>
</contactPkg>
</updateXrefEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

XRefDelete Message

The following is an example of an XRefDelete message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>XREF Delete</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
<messageId>314</messageId>
<messageDate>2008-09-19T14:14:51.000-07:00</messageDate>
</eventMetadata>
<XrefDeleteEvent>
<sourceSystemName>CRM</sourceSystemName>
<sourceKey>CPK1256</sourceKey>
<eventDate>2008-09-19T14:14:51.000-07:00</eventDate>
<rowid>102 </rowid>
<xrefKey>
<systemName>CRM</systemName>
<sourceKey>CPK1256</sourceKey>
</xrefKey>
<contactPkg>
<rowidObject>102 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-19T13:57:09.000-07:00</createDate>
<updatedBy>sifuser</updatedBy>
<lastUpdateDate>2008-09-19T14:14:54.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>CRM </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<hubStateInd>1</hubStateInd>
</contactPkg>
</XrefDeleteEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

XRefSetToDelete Message

The following is an example of an XRefSetToDelete message:


<?xml version="1.0" encoding="UTF-8"?>
<siperianEvent>
<eventMetadata>
<eventType>XREF set to Delete</eventType>
<baseObjectUid>BASE_OBJECT.C_CONTACT</baseObjectUid>
<packageUid>PACKAGE.CONTACT_PKG</packageUid>
<orsId>localhost-mrm-CMX_ORS</orsId>
<triggerUid>MESSAGE_QUEUE_RULE.ContactUpdate</triggerUid>
<messageId>314</messageId>
<messageDate>2008-09-19T14:14:51.000-07:00</messageDate>
</eventMetadata>
<XrefSetToDeleteEvent>
<sourceSystemName>CRM</sourceSystemName>
<sourceKey>CPK1256</sourceKey>
<eventDate>2008-09-19T14:14:51.000-07:00</eventDate>
<rowid>102 </rowid>
<xrefKey>
<systemName>CRM</systemName>
<sourceKey>CPK1256</sourceKey>
</xrefKey>
<contactPkg>
<rowidObject>102 </rowidObject>
<creator>admin</creator>
<createDate>2008-09-19T13:57:09.000-07:00</createDate>
<updatedBy>sifuser</updatedBy>
<lastUpdateDate>2008-09-19T14:14:54.000-07:00</lastUpdateDate>
<consolidationInd>4</consolidationInd>
<lastRowidSystem>CRM </lastRowidSystem>
<dirtyInd>1</dirtyInd>
<hubStateInd>1</hubStateInd>
</contactPkg>
</XrefSetToDeleteEvent>
</siperianEvent>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
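
Because every message carries its type in the eventType element, a consumer can route messages to type-specific handlers. The Python sketch below shows one way to do this. The functions event_type and route, the HANDLERS table, and the strings the handlers return are all hypothetical. Only the eventType values are taken from the examples in this section.

```python
import xml.etree.ElementTree as ET


def event_type(xml_text):
    """Read eventMetadata/eventType from a siperianEvent message."""
    root = ET.fromstring(xml_text)
    return (root.findtext("eventMetadata/eventType") or "").strip()


def route(xml_text, handlers):
    """Call the handler registered for the message's event type."""
    etype = event_type(xml_text)
    handler = handlers.get(etype)
    if handler is None:
        raise KeyError(f"no handler registered for event type {etype!r}")
    return handler(xml_text)


# Abbreviated XRefSetToDelete message based on the example above.
SAMPLE = (
    "<siperianEvent><eventMetadata>"
    "<eventType>XREF set to Delete</eventType>"
    "<messageId>314</messageId>"
    "</eventMetadata><XrefSetToDeleteEvent>"
    "<sourceSystemName>CRM</sourceSystemName>"
    "<sourceKey>CPK1256</sourceKey>"
    "</XrefSetToDeleteEvent></siperianEvent>"
)

# Hypothetical handlers; real ones would update the consuming application.
HANDLERS = {
    "Update XREF": lambda msg: "handled Update XREF",
    "XREF Delete": lambda msg: "handled XREF Delete",
    "XREF set to Delete": lambda msg: "handled XREF set to Delete",
}
```

Raising an error for an unregistered event type, rather than silently dropping the message, makes it easier to notice when a new trigger starts publishing an event type that the consumer does not yet handle.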

Legacy JMS Message XML Reference


This section describes the structure of legacy Siperian Hub XML messages and provides
example messages. This section applies only if you have selected the Use Legacy XML
check box in the Message Queues tool (see “Configuring Outbound Message Queues”
on page 608). Use this option only when your Siperian Hub implementation requires
that you use the legacy XML message format (Siperian Hub XU version) instead of the
current version of the XML message format (described in “JMS Message XML
Reference” on page 622).

Message Fields for Legacy XML


The contents of the data area of the message are determined by the package specified
in the trigger. The data area can contain the following fields:
Message Fields
Field                          Description
ACTION                         Action type: Insert, Update, Update XREF, Accept as Unique,
                               Merge, Unmerge, or Merge Update.
MESSAGE_DATE                   Time when the event was generated.
TABLE_NAME                     Name of the base object table or cross-reference object table
                               affected by this action.
RULE_NAME                      Name of the rule that triggered the event that generated this
                               message.
RULE_ID                        ID of the rule that triggered the event that generated this message.
ROWID_OBJECT                   Unique key for the base object affected by this action.
MERGED_OBJECTS                 List of ROWID_OBJECT values for the losing records in the
                               merge. This field is included in messages for MERGE events only.
SOURCE_XREF                    The SYSTEM and PKEY_SRC_OBJECT values for the
                               cross-reference that triggered the UPDATE event. This field is
                               included in messages for UPDATE events only.
XREFS                          List of SYSTEM and PKEY_SRC_OBJECT values for all of the
                               cross-references in the output systems for this base object.
RELATED_PKEY_SRC_OBJECT        Applies only to an insert in or update of the relationship of
                               dependent objects.
SRC_RELATED_PKEY_SRC_OBJECT    Applies only to an update of relationship of dependent objects.

Filtering Messages for Legacy XML


You can use the custom JMS header named MessageType to filter incoming messages
based on the message type. The following message types are indicated in the message
header.

Message Type Description


SIP_EVENT Event notification message.
<serviceNameReturn> For Services Integration Framework (SIF) responses, the response
begins with the name of the SIF request, as in the following
fragment of a response to a get request:
<getReturn>
<message>The GET was executed successfully -
retrieved 1 records</message>
<recordKey>
<ROWID>2</ROWID>
</recordKey>
...
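
In a JMS client, this kind of filtering is normally done with a message selector on the consumer. The Python sketch below only simulates the idea, with each message's headers modeled as a dictionary. The selector string follows standard JMS selector syntax. The functions classify and only_events are hypothetical, and the assumption that SIF response types end in "Return" (for example, getReturn) is inferred from the table above rather than from a documented rule.

```python
# Selector a JMS consumer could use to receive only event notifications.
EVENT_SELECTOR = "MessageType = 'SIP_EVENT'"


def classify(headers):
    """Classify a message by its MessageType header (simulation only)."""
    message_type = headers.get("MessageType", "")
    if message_type == "SIP_EVENT":
        return "event"
    # Assumption: SIF responses are named <serviceName>Return, e.g. getReturn.
    if message_type.endswith("Return"):
        return "sif_response"
    return "unknown"


def only_events(messages):
    """Keep the bodies of (headers, body) pairs that are event notifications."""
    return [body for headers, body in messages if classify(headers) == "event"]
```

Filtering with a selector on the broker side avoids delivering SIF responses to a consumer that only processes event notifications. The client-side check shown here is a fallback for consumers that cannot set a selector.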

Example Messages for Legacy XML


This section provides listings of example messages.

Accept as Unique Message

The following is an example of an accept as unique message:


<SIP_EVENT>
<CONTROLAREA>
<ACTION>Accept as Unique</ACTION>
<MESSAGE_DATE>2005-07-21 16:37:00.0</MESSAGE_DATE>
<TABLE_NAME>C_CUSTOMER</TABLE_NAME>
<RULE_NAME>CustomerRule1</RULE_NAME>
<RULE_ID>SVR1.8EO</RULE_ID>
<ROWID_OBJECT>74 </ROWID_OBJECT>
<XREFS>
<XREF>
<SYSTEM>CRM</SYSTEM>
<PKEY_SRC_OBJECT>196 </PKEY_SRC_OBJECT>
</XREF>
<XREF>
<SYSTEM>SFA</SYSTEM>
<PKEY_SRC_OBJECT>49 </PKEY_SRC_OBJECT>
</XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>74 </ROWID_OBJECT>
<CONSOLIDATION_IND>1</CONSOLIDATION_IND>
<FIRST_NAME>Jimmy</FIRST_NAME>
<MIDDLE_NAME>Neville</MIDDLE_NAME>
<LAST_NAME>Darwent</LAST_NAME>
<SUFFIX>Jr</SUFFIX>
<GENDER>M </GENDER>
<BIRTH_DATE>1938-06-22</BIRTH_DATE>
<SALUTATION>Mr</SALUTATION>
<SSN_TAX_NUMBER>659483774</SSN_TAX_NUMBER>
<FULL_NAME>Jimmy Darwent, Stony Brook Ny</FULL_NAME>
</DATA>
</DATAAREA>
</SIP_EVENT>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
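
Legacy messages separate routing information (CONTROLAREA) from the package record (DATAAREA). The Python sketch below reads both from an abbreviated copy of the Accept as Unique example. The function parse_legacy_event and the keys of the dictionary it returns are illustrative assumptions. The element names come from the example, and the fields under DATA depend on your package.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Accept as Unique example above.
SAMPLE = """<SIP_EVENT>
<CONTROLAREA>
<ACTION>Accept as Unique</ACTION>
<TABLE_NAME>C_CUSTOMER</TABLE_NAME>
<ROWID_OBJECT>74 </ROWID_OBJECT>
<XREFS>
<XREF><SYSTEM>CRM</SYSTEM><PKEY_SRC_OBJECT>196 </PKEY_SRC_OBJECT></XREF>
<XREF><SYSTEM>SFA</SYSTEM><PKEY_SRC_OBJECT>49 </PKEY_SRC_OBJECT></XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>74 </ROWID_OBJECT>
<FIRST_NAME>Jimmy</FIRST_NAME>
<LAST_NAME>Darwent</LAST_NAME>
</DATA>
</DATAAREA>
</SIP_EVENT>"""


def parse_legacy_event(xml_text):
    """Split a legacy SIP_EVENT message into control fields and the record."""
    root = ET.fromstring(xml_text)
    control = root.find("CONTROLAREA")
    if root.tag != "SIP_EVENT" or control is None:
        raise ValueError("not a legacy SIP_EVENT message")

    # Key values are space-padded in the examples, so strip every value.
    xrefs = [
        (
            xref.findtext("SYSTEM", "").strip(),
            xref.findtext("PKEY_SRC_OBJECT", "").strip(),
        )
        for xref in control.findall("XREFS/XREF")
    ]

    data = root.find("DATAAREA/DATA")
    record = {} if data is None else {
        child.tag: (child.text or "").strip() for child in data
    }

    return {
        "action": control.findtext("ACTION", "").strip(),
        "table": control.findtext("TABLE_NAME", "").strip(),
        "rowid_object": control.findtext("ROWID_OBJECT", "").strip(),
        "xrefs": xrefs,
        "record": record,
    }
```

Empty elements such as <PKEY_SRC_OBJECT />, which appear in the delete examples later in this section, come back as empty strings with this approach.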

BO Delete Message

The following is an example of a BO delete message:


<?xml version="1.0" encoding="UTF-8"?>
<SIP_EVENT>
<CONTROLAREA>
<ACTION>BO Delete</ACTION>
<MESSAGE_DATE>2008-09-19 14:35:53.0</MESSAGE_DATE>
<TABLE_NAME>C_CONTACT</TABLE_NAME>
<PACKAGE>CONTACT_PKG</PACKAGE>
<RULE_NAME>ContactUpdateLegacy</RULE_NAME>
<RULE_ID>SVR1.28D</RULE_ID>
<ROWID_OBJECT>107 </ROWID_OBJECT>
<DATABASE>localhost-mrm-CMX_ORS</DATABASE>
<XREFS>
<XREF>
<SYSTEM>CRM</SYSTEM>
<PKEY_SRC_OBJECT />
</XREF>
<XREF>
<SYSTEM>Admin</SYSTEM>
<PKEY_SRC_OBJECT />
</XREF>
<XREF>
<SYSTEM>WEB</SYSTEM>
<PKEY_SRC_OBJECT />
</XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>107 </ROWID_OBJECT>
<CREATOR>sifuser</CREATOR>
<CREATE_DATE>19 Sep 2008 14:35:28</CREATE_DATE>
<UPDATED_BY>admin</UPDATED_BY>
<LAST_UPDATE_DATE>19 Sep 2008 14:35:53</LAST_UPDATE_DATE>
<CONSOLIDATION_IND>4</CONSOLIDATION_IND>
<DELETED_IND />
<DELETED_BY />
<DELETED_DATE />
<LAST_ROWID_SYSTEM>CRM </LAST_ROWID_SYSTEM>
<DIRTY_IND>1</DIRTY_IND>
<INTERACTION_ID />
<FIRST_NAME>John</FIRST_NAME>
<LAST_NAME>Smith</LAST_NAME>
<HUB_STATE_IND>-1</HUB_STATE_IND>
</DATA>
</DATAAREA>
</SIP_EVENT>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
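
The legacy examples use two different timestamp layouts: MESSAGE_DATE in the control area (2008-09-19 14:35:53.0) and date columns such as CREATE_DATE in the data area (19 Sep 2008 14:35:28). The Python sketch below parses both. The format strings are inferred from the examples rather than from a documented specification, the helper names are hypothetical, and the month abbreviation is assumed to be English.

```python
from datetime import datetime

# Layouts inferred from the example messages (assumptions, not a specification).
MESSAGE_DATE_FORMAT = "%Y-%m-%d %H:%M:%S.%f"   # e.g. 2008-09-19 14:35:53.0
DATA_DATE_FORMAT = "%d %b %Y %H:%M:%S"         # e.g. 19 Sep 2008 14:35:28


def parse_message_date(value):
    """Parse a CONTROLAREA MESSAGE_DATE value."""
    return datetime.strptime(value.strip(), MESSAGE_DATE_FORMAT)


def parse_data_date(value):
    """Parse a DATA date column; empty elements such as <DELETED_DATE /> give None."""
    value = (value or "").strip()
    if not value:
        return None
    return datetime.strptime(value, DATA_DATE_FORMAT)
```

Neither layout carries a time zone offset, so the resulting values are naive datetimes. Which time zone they represent is not stated in the examples.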

BO set to Delete

The following is an example of a BO set to delete message:


<?xml version="1.0" encoding="UTF-8"?>
<SIP_EVENT>
<CONTROLAREA>
<ACTION>BO set to Delete</ACTION>
<MESSAGE_DATE>2008-09-19 14:21:03.0</MESSAGE_DATE>
<TABLE_NAME>C_CONTACT</TABLE_NAME>
<PACKAGE>CONTACT_PKG</PACKAGE>
<RULE_NAME>ContactUpdateLegacy</RULE_NAME>
<RULE_ID>SVR1.28D</RULE_ID>
<ROWID_OBJECT>102 </ROWID_OBJECT>
<DATABASE>localhost-mrm-CMX_ORS</DATABASE>
<XREFS>
<XREF>
<SYSTEM>CRM</SYSTEM>
<PKEY_SRC_OBJECT />
</XREF>
<XREF>
<SYSTEM>Admin</SYSTEM>
<PKEY_SRC_OBJECT />
</XREF>
<XREF>
<SYSTEM>WEB</SYSTEM>
<PKEY_SRC_OBJECT />
</XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>102 </ROWID_OBJECT>
<CREATOR>admin</CREATOR>
<CREATE_DATE>19 Sep 2008 13:57:09</CREATE_DATE>
<UPDATED_BY>admin</UPDATED_BY>
<LAST_UPDATE_DATE>19 Sep 2008 14:21:03</LAST_UPDATE_DATE>
<CONSOLIDATION_IND>4</CONSOLIDATION_IND>
<DELETED_IND />
<DELETED_BY />
<DELETED_DATE />
<LAST_ROWID_SYSTEM>SYS0 </LAST_ROWID_SYSTEM>
<DIRTY_IND>1</DIRTY_IND>
<INTERACTION_ID />
<FIRST_NAME />
<LAST_NAME />
<HUB_STATE_IND>-1</HUB_STATE_IND>
</DATA>
</DATAAREA>
</SIP_EVENT>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

Delete Message

The following is an example of a delete message:


<?xml version="1.0" encoding="UTF-8"?>
<SIP_EVENT>
<CONTROLAREA>
<ACTION>Delete</ACTION>
<MESSAGE_DATE>2008-09-19 14:35:53.0</MESSAGE_DATE>
<TABLE_NAME>C_CONTACT</TABLE_NAME>
<PACKAGE>CONTACT_PKG</PACKAGE>
<RULE_NAME>ContactUpdateLegacy</RULE_NAME>
<RULE_ID>SVR1.28D</RULE_ID>
<ROWID_OBJECT>107 </ROWID_OBJECT>
<DATABASE>localhost-mrm-CMX_ORS</DATABASE>
<XREFS>
<XREF>
<SYSTEM>CRM</SYSTEM>
<PKEY_SRC_OBJECT />
</XREF>
<XREF>
<SYSTEM>Admin</SYSTEM>
<PKEY_SRC_OBJECT />
</XREF>
<XREF>
<SYSTEM>WEB</SYSTEM>
<PKEY_SRC_OBJECT />
</XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>107 </ROWID_OBJECT>
<CREATOR>sifuser</CREATOR>
<CREATE_DATE>19 Sep 2008 14:35:28</CREATE_DATE>
<UPDATED_BY>admin</UPDATED_BY>
<LAST_UPDATE_DATE>19 Sep 2008 14:35:53</LAST_UPDATE_DATE>
<CONSOLIDATION_IND>4</CONSOLIDATION_IND>
<DELETED_IND />
<DELETED_BY />
<DELETED_DATE />
<LAST_ROWID_SYSTEM>CRM </LAST_ROWID_SYSTEM>
<DIRTY_IND>1</DIRTY_IND>
<INTERACTION_ID />
<FIRST_NAME>John</FIRST_NAME>
<LAST_NAME>Smith</LAST_NAME>
<HUB_STATE_IND>-1</HUB_STATE_IND>
</DATA>
</DATAAREA>
</SIP_EVENT>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

Insert Message

The following is an example of an insert message:


<SIP_EVENT>
<CONTROLAREA>
<ACTION>Insert</ACTION>
<MESSAGE_DATE>2005-07-21 16:07:26.0</MESSAGE_DATE>
<TABLE_NAME>C_CUSTOMER</TABLE_NAME>
<RULE_NAME>CustomerRule1</RULE_NAME>
<RULE_ID>SVR1.8EO</RULE_ID>
<ROWID_OBJECT>33 </ROWID_OBJECT>
<XREFS>
<XREF>
<SYSTEM>CRM</SYSTEM>
<PKEY_SRC_OBJECT>49 </PKEY_SRC_OBJECT>
</XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>33 </ROWID_OBJECT>
<CONSOLIDATION_IND>4</CONSOLIDATION_IND>
<FIRST_NAME>James</FIRST_NAME>
<MIDDLE_NAME>Neville</MIDDLE_NAME>
<LAST_NAME>Darwent</LAST_NAME>
<SUFFIX>Unknown</SUFFIX>
<GENDER>M </GENDER>
<BIRTH_DATE>1938-06-22</BIRTH_DATE>
<SALUTATION>Mr</SALUTATION>
<SSN_TAX_NUMBER>216275400</SSN_TAX_NUMBER>
<FULL_NAME>James Darwent,Stony Brook Ny</FULL_NAME>
</DATA>
</DATAAREA>
</SIP_EVENT>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

Merge Message

The following is an example of a merge message:


<SIP_EVENT>
<CONTROLAREA>
<ACTION>Merge</ACTION>
<MESSAGE_DATE>2005-07-21 16:34:28.0</MESSAGE_DATE>
<TABLE_NAME>C_CUSTOMER</TABLE_NAME>
<RULE_NAME>CustomerRule1</RULE_NAME>
<RULE_ID>SVR1.8EO</RULE_ID>
<ROWID_OBJECT>74 </ROWID_OBJECT>
<XREFS>
<XREF>
<SYSTEM>CRM</SYSTEM>
<PKEY_SRC_OBJECT>196 </PKEY_SRC_OBJECT>
</XREF>
<XREF>
<SYSTEM>SFA</SYSTEM>
<PKEY_SRC_OBJECT>49 </PKEY_SRC_OBJECT>
</XREF>
</XREFS>
<MERGED_OBJECTS>
<ROWID_OBJECT>7 </ROWID_OBJECT>
</MERGED_OBJECTS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>74 </ROWID_OBJECT>
<CONSOLIDATION_IND>4</CONSOLIDATION_IND>
<FIRST_NAME>Jimmy</FIRST_NAME>
<MIDDLE_NAME>Neville</MIDDLE_NAME>
<LAST_NAME>Darwent</LAST_NAME>
<SUFFIX>Jr</SUFFIX>
<GENDER>M </GENDER>
<BIRTH_DATE>1938-06-22</BIRTH_DATE>
<SALUTATION>Mr</SALUTATION>
<SSN_TAX_NUMBER>659483774</SSN_TAX_NUMBER>
<FULL_NAME>Jimmy Darwent, Stony Brook Ny</FULL_NAME>
</DATA>
</DATAAREA>
</SIP_EVENT>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.
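
For merge events, a downstream system often needs to know which records were absorbed. The Python sketch below extracts the ROWID_OBJECT values under MERGED_OBJECTS from an abbreviated copy of the merge example. The function merge_summary is hypothetical. Treating the control-area ROWID_OBJECT as the surviving record is an inference from the field descriptions, not a documented guarantee.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Merge example above.
SAMPLE = """<SIP_EVENT>
<CONTROLAREA>
<ACTION>Merge</ACTION>
<TABLE_NAME>C_CUSTOMER</TABLE_NAME>
<ROWID_OBJECT>74 </ROWID_OBJECT>
<MERGED_OBJECTS>
<ROWID_OBJECT>7 </ROWID_OBJECT>
</MERGED_OBJECTS>
</CONTROLAREA>
</SIP_EVENT>"""


def merge_summary(xml_text):
    """Return (affected_rowid, [merged_rowids]) for a legacy merge message."""
    control = ET.fromstring(xml_text).find("CONTROLAREA")
    if control is None:
        raise ValueError("message has no CONTROLAREA")

    # A plain tag path matches direct children only, so this does not pick up
    # the ROWID_OBJECT elements nested under MERGED_OBJECTS.
    affected = control.findtext("ROWID_OBJECT", "").strip()

    merged = [
        (element.text or "").strip()
        for element in control.findall("MERGED_OBJECTS/ROWID_OBJECT")
    ]
    return affected, merged
```

A consumer could use the second value to retire or redirect its own references to the merged records.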

Merge Update Message

The following is an example of a merge update message:


<SIP_EVENT>
<CONTROLAREA>
<ACTION>Merge Update</ACTION>
<MESSAGE_DATE>2005-07-21 16:34:28.0</MESSAGE_DATE>
<TABLE_NAME>C_CUSTOMER</TABLE_NAME>
<RULE_NAME>CustomerRule1</RULE_NAME>
<RULE_ID>SVR1.8EO</RULE_ID>
<ROWID_OBJECT>74 </ROWID_OBJECT>
<XREFS>
<XREF>
<SYSTEM>CRM</SYSTEM>
<PKEY_SRC_OBJECT>196 </PKEY_SRC_OBJECT>
</XREF>
<XREF>
<SYSTEM>SFA</SYSTEM>
<PKEY_SRC_OBJECT>49 </PKEY_SRC_OBJECT>
</XREF>
</XREFS>
<MERGED_OBJECTS>
<ROWID_OBJECT>7 </ROWID_OBJECT>
</MERGED_OBJECTS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>74 </ROWID_OBJECT>
<CONSOLIDATION_IND>4</CONSOLIDATION_IND>
<FIRST_NAME>Jimmy</FIRST_NAME>
<MIDDLE_NAME>Neville</MIDDLE_NAME>
<LAST_NAME>Darwent</LAST_NAME>
<SUFFIX>Jr</SUFFIX>
<GENDER>M </GENDER>
<BIRTH_DATE>1938-06-22</BIRTH_DATE>
<SALUTATION>Mr</SALUTATION>
<SSN_TAX_NUMBER>659483774</SSN_TAX_NUMBER>
<FULL_NAME>Jimmy Darwent, Stony Brook Ny</FULL_NAME>
</DATA>
</DATAAREA>
</SIP_EVENT>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

Pending Insert Message

The following is an example of a pending insert message:


<?xml version="1.0" encoding="UTF-8"?>
<SIP_EVENT>
<CONTROLAREA>
<ACTION>Pending Insert</ACTION>
<MESSAGE_DATE>2008-09-19 13:57:10.0</MESSAGE_DATE>
<TABLE_NAME>C_CONTACT</TABLE_NAME>
<PACKAGE>CONTACT_PKG</PACKAGE>
<RULE_NAME>ContactUpdateLegacy</RULE_NAME>
<RULE_ID>SVR1.28D</RULE_ID>
<ROWID_OBJECT>102 </ROWID_OBJECT>
<DATABASE>localhost-mrm-CMX_ORS</DATABASE>
<XREFS>
<XREF>
<SYSTEM>Admin</SYSTEM>
<PKEY_SRC_OBJECT>SVR1.2V3</PKEY_SRC_OBJECT>
</XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>102 </ROWID_OBJECT>
<CREATOR>admin</CREATOR>
<CREATE_DATE>19 Sep 2008 13:57:09</CREATE_DATE>
<UPDATED_BY>admin</UPDATED_BY>
<LAST_UPDATE_DATE>19 Sep 2008 13:57:09</LAST_UPDATE_DATE>
<CONSOLIDATION_IND>4</CONSOLIDATION_IND>
<DELETED_IND />
<DELETED_BY />
<DELETED_DATE />
<LAST_ROWID_SYSTEM>SYS0 </LAST_ROWID_SYSTEM>
<DIRTY_IND>1</DIRTY_IND>
<INTERACTION_ID />
<FIRST_NAME>John</FIRST_NAME>
<LAST_NAME>Smith</LAST_NAME>
<HUB_STATE_IND>0</HUB_STATE_IND>
</DATA>
</DATAAREA>
</SIP_EVENT>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

Pending Update Message

The following is an example of a pending update message:


<?xml version="1.0" encoding="UTF-8"?>
<SIP_EVENT>
<CONTROLAREA>
<ACTION>Pending Update</ACTION>
<MESSAGE_DATE>2008-09-19 14:01:36.0</MESSAGE_DATE>
<TABLE_NAME>C_CONTACT</TABLE_NAME>
<PACKAGE>CONTACT_PKG</PACKAGE>
<RULE_NAME>ContactUpdateLegacy</RULE_NAME>
<RULE_ID>SVR1.28D</RULE_ID>
<ROWID_OBJECT>102 </ROWID_OBJECT>
<DATABASE>localhost-mrm-CMX_ORS</DATABASE>
<XREFS>
<XREF>
<SYSTEM>CRM</SYSTEM>
<PKEY_SRC_OBJECT>CPK125</PKEY_SRC_OBJECT>
</XREF>
<XREF>
<SYSTEM>Admin</SYSTEM>
<PKEY_SRC_OBJECT>SVR1.2V3</PKEY_SRC_OBJECT>
</XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>102 </ROWID_OBJECT>
<CREATOR>admin</CREATOR>
<CREATE_DATE>19 Sep 2008 13:57:09</CREATE_DATE>
<UPDATED_BY>sifuser</UPDATED_BY>
<LAST_UPDATE_DATE>19 Sep 2008 14:01:36</LAST_UPDATE_DATE>
<CONSOLIDATION_IND>4</CONSOLIDATION_IND>
<DELETED_IND />
<DELETED_BY />
<DELETED_DATE />
<LAST_ROWID_SYSTEM>CRM </LAST_ROWID_SYSTEM>
<DIRTY_IND>1</DIRTY_IND>
<INTERACTION_ID />
<FIRST_NAME>John</FIRST_NAME>
<LAST_NAME>Smith</LAST_NAME>
<HUB_STATE_IND>1</HUB_STATE_IND>
</DATA>
</DATAAREA>
</SIP_EVENT>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

Pending Update XREF Message

The following is an example of a pending update XREF message:


<?xml version="1.0" encoding="UTF-8"?>
<SIP_EVENT>
<CONTROLAREA>
<ACTION>Pending Update XREF</ACTION>
<MESSAGE_DATE>2008-09-19 14:01:36.0</MESSAGE_DATE>
<TABLE_NAME>C_CONTACT</TABLE_NAME>
<PACKAGE>CONTACT_ADDRESS_PKG</PACKAGE>
<RULE_NAME>ContactAM</RULE_NAME>
<RULE_ID>SVR1.1VU</RULE_ID>
<ROWID_OBJECT>102 </ROWID_OBJECT>
<DATABASE>localhost-mrm-CMX_ORS</DATABASE>
<XREFS>
<XREF>
<SYSTEM>CRM</SYSTEM>
<PKEY_SRC_OBJECT>CPK125</PKEY_SRC_OBJECT>
</XREF>
<XREF>
<SYSTEM>Admin</SYSTEM>
<PKEY_SRC_OBJECT>SVR1.2V3</PKEY_SRC_OBJECT>
</XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_CONTACT>102 </ROWID_CONTACT>
<CREATOR>admin</CREATOR>
<CREATE_DATE>19 Sep 2008 13:57:09</CREATE_DATE>
<UPDATED_BY>sifuser</UPDATED_BY>
<LAST_UPDATE_DATE>19 Sep 2008 14:01:36</LAST_UPDATE_DATE>
<CONSOLIDATION_IND>4</CONSOLIDATION_IND>
<DELETED_IND />
<DELETED_BY />
<DELETED_DATE />
<LAST_ROWID_SYSTEM>CRM </LAST_ROWID_SYSTEM>
<DIRTY_IND>1</DIRTY_IND>
<INTERACTION_ID />
<FIRST_NAME>John</FIRST_NAME>
<LAST_NAME>Smith</LAST_NAME>
<HUB_STATE_IND>1</HUB_STATE_IND>
<CITY />
<STATE />
</DATA>
</DATAAREA>
</SIP_EVENT>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

Update Message

The following is an example of an update message:


<SIP_EVENT>
<CONTROLAREA>
<ACTION>Update</ACTION>
<MESSAGE_DATE>2005-07-21 16:44:53.0</MESSAGE_DATE>
<TABLE_NAME>C_CUSTOMER</TABLE_NAME>
<RULE_NAME>CustomerRule1</RULE_NAME>
<RULE_ID>SVR1.8EO</RULE_ID>
<ROWID_OBJECT>74 </ROWID_OBJECT>
<SOURCE_XREF>
<SYSTEM>Admin</SYSTEM>
<PKEY_SRC_OBJECT>196 </PKEY_SRC_OBJECT>
</SOURCE_XREF>
<XREFS>
<XREF>
<SYSTEM>CRM</SYSTEM>
<PKEY_SRC_OBJECT>196 </PKEY_SRC_OBJECT>
</XREF>
<XREF>
<SYSTEM>SFA</SYSTEM>
<PKEY_SRC_OBJECT>49 </PKEY_SRC_OBJECT>
</XREF>
<XREF>
<SYSTEM>Admin</SYSTEM>
<PKEY_SRC_OBJECT>74 </PKEY_SRC_OBJECT>
</XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>74 </ROWID_OBJECT>
<CONSOLIDATION_IND>1</CONSOLIDATION_IND>
<FIRST_NAME>Jimmy</FIRST_NAME>
<MIDDLE_NAME>Neville</MIDDLE_NAME>
<LAST_NAME>Darwent</LAST_NAME>
<SUFFIX>Jr</SUFFIX>
<GENDER>M </GENDER>
<BIRTH_DATE>1938-06-22</BIRTH_DATE>
<SALUTATION>Mr</SALUTATION>
<SSN_TAX_NUMBER>659483773</SSN_TAX_NUMBER>
<FULL_NAME>Jimmy Darwent, Stony Brook Ny</FULL_NAME>
</DATA>
</DATAAREA>
</SIP_EVENT>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

Update XREF Message

The following is an example of an update XREF message:


<SIP_EVENT>
<CONTROLAREA>
<ACTION>Update XREF</ACTION>
<MESSAGE_DATE>2005-07-21 16:44:53.0</MESSAGE_DATE>
<TABLE_NAME>C_CUSTOMER</TABLE_NAME>
<RULE_NAME>CustomerRule1</RULE_NAME>
<RULE_ID>SVR1.8EO</RULE_ID>
<ROWID_OBJECT>74 </ROWID_OBJECT>
<SOURCE_XREF>
<SYSTEM>Admin</SYSTEM>
<PKEY_SRC_OBJECT>196 </PKEY_SRC_OBJECT>
</SOURCE_XREF>
<XREFS>
<XREF>
<SYSTEM>CRM</SYSTEM>
<PKEY_SRC_OBJECT>196 </PKEY_SRC_OBJECT>
</XREF>
<XREF>
<SYSTEM>SFA</SYSTEM>
<PKEY_SRC_OBJECT>49 </PKEY_SRC_OBJECT>
</XREF>
<XREF>
<SYSTEM>Admin</SYSTEM>
<PKEY_SRC_OBJECT>74 </PKEY_SRC_OBJECT>
</XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>74 </ROWID_OBJECT>
<CONSOLIDATION_IND>1</CONSOLIDATION_IND>
<FIRST_NAME>Jimmy</FIRST_NAME>
<MIDDLE_NAME>Neville</MIDDLE_NAME>
<LAST_NAME>Darwent</LAST_NAME>
<SUFFIX>Jr</SUFFIX>
<GENDER>M </GENDER>
<BIRTH_DATE>1938-06-22</BIRTH_DATE>
<SALUTATION>Mr</SALUTATION>
<SSN_TAX_NUMBER>659483773</SSN_TAX_NUMBER>
<FULL_NAME>Jimmy Darwent, Stony Brook Ny</FULL_NAME>
</DATA>
</DATAAREA>
</SIP_EVENT>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

Unmerge Message

The following is an example of an unmerge message:


<SIP_EVENT>
<CONTROLAREA>
<ACTION>UnMerge</ACTION>
<MESSAGE_DATE>2006-11-07 21:37:56.0</MESSAGE_DATE>
<TABLE_NAME>C_CONSUMER</TABLE_NAME>
<PACKAGE>CONSUMER_PKG</PACKAGE>
<RULE_NAME>Unmerge</RULE_NAME>
<RULE_ID>SVR1.97S</RULE_ID>
<ROWID_OBJECT>10</ROWID_OBJECT>
<DATABASE>edsel-edselsp2-CMX_AT</DATABASE>
<XREFS>
<XREF>
<SYSTEM>Retail System</SYSTEM>
<PKEY_SRC_OBJECT>8</PKEY_SRC_OBJECT>
</XREF>
</XREFS>
<MERGED_OBJECTS>
<ROWID_OBJECT>0</ROWID_OBJECT>
</MERGED_OBJECTS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>10</ROWID_OBJECT>
<CONSOLIDATION_IND>4</CONSOLIDATION_IND>
<LAST_ROWID_SYSTEM>SVR1.7NK</LAST_ROWID_SYSTEM>
<DIRTY_IND>1</DIRTY_IND>
<INTERACTION_ID />
<CONSUMER_ID>8</CONSUMER_ID>
<FIRST_NAME>THOMAS</FIRST_NAME>
<MIDDLE_NAME>L</MIDDLE_NAME>
<LAST_NAME>KIDD</LAST_NAME>
<SUFFIX />
<TELEPHONE>2178952323</TELEPHONE>
<GENDER>M</GENDER>
<DOB>1940</DOB>
</DATA>
</DATAAREA>
</SIP_EVENT>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

XREF Delete Message

The following is an example of an XREF delete message:


<?xml version="1.0" encoding="UTF-8"?>
<SIP_EVENT>
<CONTROLAREA>
<ACTION>XREF Delete</ACTION>
<MESSAGE_DATE>2008-09-19 14:14:51.0</MESSAGE_DATE>
<TABLE_NAME>C_CONTACT</TABLE_NAME>
<PACKAGE>CONTACT_PKG</PACKAGE>
<RULE_NAME>ContactUpdateLegacy</RULE_NAME>
<RULE_ID>SVR1.28D</RULE_ID>
<ROWID_OBJECT>102 </ROWID_OBJECT>
<DATABASE>localhost-mrm-CMX_ORS</DATABASE>
<XREFS>
<XREF>
<SYSTEM>CRM</SYSTEM>
<PKEY_SRC_OBJECT>CPK1256</PKEY_SRC_OBJECT>
</XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>102 </ROWID_OBJECT>
<CREATOR>admin</CREATOR>
<CREATE_DATE>19 Sep 2008 13:57:09</CREATE_DATE>
<UPDATED_BY>sifuser</UPDATED_BY>
<LAST_UPDATE_DATE>19 Sep 2008 14:14:54</LAST_UPDATE_DATE>
<CONSOLIDATION_IND>4</CONSOLIDATION_IND>
<DELETED_IND />
<DELETED_BY />
<DELETED_DATE />
<LAST_ROWID_SYSTEM>CRM </LAST_ROWID_SYSTEM>
<DIRTY_IND>1</DIRTY_IND>
<INTERACTION_ID />
<FIRST_NAME />
<LAST_NAME />
<HUB_STATE_IND>1</HUB_STATE_IND>
</DATA>
</DATAAREA>
</SIP_EVENT>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.

XREF set to Delete

The following is an example of an XREF set to delete message:


<?xml version="1.0" encoding="UTF-8"?>
<SIP_EVENT>
<CONTROLAREA>
<ACTION>XREF set to Delete</ACTION>
<MESSAGE_DATE>2008-09-19 14:14:51.0</MESSAGE_DATE>
<TABLE_NAME>C_CONTACT</TABLE_NAME>
<PACKAGE>CONTACT_PKG</PACKAGE>
<RULE_NAME>ContactUpdateLegacy</RULE_NAME>
<RULE_ID>SVR1.28D</RULE_ID>
<ROWID_OBJECT>102 </ROWID_OBJECT>
<DATABASE>localhost-mrm-CMX_ORS</DATABASE>
<XREFS>
<XREF>
<SYSTEM>CRM</SYSTEM>
<PKEY_SRC_OBJECT>CPK1256</PKEY_SRC_OBJECT>
</XREF>
</XREFS>
</CONTROLAREA>
<DATAAREA>
<DATA>
<ROWID_OBJECT>102 </ROWID_OBJECT>
<CREATOR>admin</CREATOR>
<CREATE_DATE>19 Sep 2008 13:57:09</CREATE_DATE>
<UPDATED_BY>sifuser</UPDATED_BY>
<LAST_UPDATE_DATE>19 Sep 2008 14:14:54</LAST_UPDATE_DATE>
<CONSOLIDATION_IND>4</CONSOLIDATION_IND>
<DELETED_IND />
<DELETED_BY />
<DELETED_DATE />
<LAST_ROWID_SYSTEM>CRM </LAST_ROWID_SYSTEM>
<DIRTY_IND>1</DIRTY_IND>
<INTERACTION_ID />
<FIRST_NAME />
<LAST_NAME />
<HUB_STATE_IND>1</HUB_STATE_IND>
</DATA>
</DATAAREA>
</SIP_EVENT>

Your messages will not look exactly like this. The data will reflect your data, and the
fields will reflect your packages.


Part 4: Executing Siperian Hub Processes

Contents
• Chapter 17, “Using Batch Jobs”
• Chapter 18, “Writing Custom Scripts to Execute Batch Jobs”

Chapter 17: Using Batch Jobs

This chapter describes how to configure and execute Siperian Hub batch jobs using the
Batch Viewer and Batch Group tools in the Hub Console. For more information about
creating batch jobs using job execution scripts, see Chapter 18, “Writing Custom
Scripts to Execute Batch Jobs.”

Chapter Contents
• Before You Begin
• About Siperian Hub Batch Jobs
• Running Batch Jobs Using the Batch Viewer Tool
• Running Batch Jobs Using the Batch Group Tool
• Batch Jobs Reference

Before You Begin


Before you begin working with batch jobs, you must have completed the following
prerequisites:
• installed Siperian Hub and created the Hub Store according to the instructions in
the Siperian Hub Installation Guide for your platform
• built the schema; see “About the Schema” on page 82

About Siperian Hub Batch Jobs


In Siperian Hub, a batch job is a program that, when executed, completes a discrete unit
of work (a process). For example, the Match job carries out the match process: it
generates search keys for a base object, searches through the data for match candidates
(records that are possible matches), applies the match rules to the match candidates,
generates the matches, and then queues the matches for either automatic or manual
consolidation. For merge-style base objects, automatic consolidation is handled by the
Automerge job, and manual consolidation is handled by the Manual Merge job.

Ways to Execute Batch Jobs


You can execute batch jobs in the following ways:
• Hub Console tools:
  • Batch Viewer tool: Execute batch jobs individually. For more information,
    see “Running Batch Jobs Using the Batch Viewer Tool” on page 674.
  • Batch Group tool: Execute batch jobs in a group. The Batch Group tool
    allows you to configure the execution sequence for batch jobs and to execute
    batch jobs in parallel. For more information, see “Running Batch Jobs Using
    the Batch Group Tool” on page 688.
• Stored procedures: Execute public Siperian Hub processes (batch jobs and
batch groups) through stored procedures using any job scheduling software (such
as Tivoli or CA Unicenter). For more information, see “About Executing
Siperian Hub Batch Jobs” on page 750. You can also create and run stored
procedures using the SIF API (using Java, SOAP, or HTTP/XML). For more
information, see the Siperian Services Integration Framework Guide.
• Services Integration Framework (SIF) requests: Applications can invoke the
SIF ExecuteBatchGroupRequest request to execute batch groups directly. For
more information, see the Siperian Services Integration Framework Guide.

Support Tables Used By Batch Jobs


The following graphic shows the various support tables used by Siperian Hub batch
jobs:

Running Batch Jobs in Sequence


Certain batch jobs require that other batch jobs be completed first. For example, the
landing tables for a base object must be populated before running any batch jobs.
Similarly, before you can run a Match job for a base object, you must run its
corresponding Stage and Load jobs. Finally, when a base object has dependencies (for
example, it is the child of a parent table, or it has foreign key relationships that point to
other base objects), batch jobs must be run first for the tables on which the base object
depends. As a best practice, develop an administration or operations plan that specifies
which batch jobs and dependencies must be completed before each batch job is run.

Populating Landing Tables Before Running Batch Jobs

One of the tasks Siperian Hub batch jobs perform is to move data from landing tables
to the appropriate target location in Siperian Hub. Therefore, before you run Siperian
Hub batch jobs, you must first have your source systems or an ETL tool write data into
the landing tables. The landing tables are Siperian Hub’s interface for batch loads. You
deliver the data to the landing tables, and Siperian Hub batch procedures manipulate
the data and copy it to the appropriate location(s). For more information, see the
description of the Siperian Hub data management process in the Siperian Hub Overview.

Match Jobs and Subsequent Consolidation Jobs

Batch jobs need to be executed in a certain sequence. For example, a Match job must
be run for a base object before running the consolidation process. For merge-style base
objects, you can run the Auto Match and Merge job, which executes the Match job and
then the Automerge job repeatedly until either all records in the base object have been
checked for matches or the maximum number of records queued for manual
consolidation is reached (see “Maximum Matches for Manual Consolidation” on
page 490).
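
The control flow of the Auto Match and Merge job can be pictured with a small simulation. The Python sketch below is not Siperian Hub code. The class ToyBaseObject, its counters, and the function auto_match_and_merge are hypothetical stand-ins that only model the two stop conditions described above: no unchecked records remain, or the manual consolidation queue reaches its limit.

```python
def auto_match_and_merge(base_object, max_manual_matches):
    """Repeat Match then Automerge until one of the two stop conditions holds."""
    cycles = 0
    while (base_object.unchecked > 0
           and base_object.manual_queue < max_manual_matches):
        base_object.run_match()
        base_object.run_automerge()
        cycles += 1
    return cycles


class ToyBaseObject:
    """Stand-in for a merge-style base object; all numbers are made up."""

    def __init__(self, unchecked, batch_size, manual_per_cycle):
        self.unchecked = unchecked              # records not yet checked for matches
        self.batch_size = batch_size            # records checked per Match run
        self.manual_per_cycle = manual_per_cycle
        self.manual_queue = 0                   # matches queued for manual consolidation
        self.auto_queue = 0                     # matches queued for automatic consolidation
        self.merged = 0

    def run_match(self):
        # Check one batch and queue one automatic match plus some manual matches.
        self.unchecked -= min(self.batch_size, self.unchecked)
        self.manual_queue += self.manual_per_cycle
        self.auto_queue += 1

    def run_automerge(self):
        # Consolidate everything in the automatic queue.
        self.merged += self.auto_queue
        self.auto_queue = 0
```

With 250 unchecked records, a batch size of 100, and no manual matches, the loop runs until all records are checked. If each cycle queues manual matches, the loop can stop early once the limit is reached, leaving some records unchecked.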

Loading Data from Parent Tables First

The general rule of thumb is that all parent tables (tables that other tables reference)
must be loaded first.

Loading Data for Objects With Foreign Key Relationships

If two tables have a foreign key relationship between them, you must load the table that
is being referenced first, and then load the table that does the referencing.
The following foreign key relationships can exist in Siperian Hub:
• from one base object (child with foreign key) to another base object (parent with
primary key)
• from a dependent object to the base object that owns it

In most cases, you will schedule these jobs to run on a regular basis.
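
When many tables reference one another, the load order can be derived rather than maintained by hand. The Python sketch below (Python 3.9 or later, for the standard graphlib module) orders tables so that every referenced table comes before the tables that reference it, and then lists the Stage and Load steps per table. The function plan_batch_jobs, the table names other than C_CONTACT, and the idea of emitting (step, table) pairs are illustrative assumptions. The sketch does not call Siperian Hub.

```python
from graphlib import TopologicalSorter

# Per-table job sequence taken from the text: Stage runs before Load.
JOB_STEPS = ("Stage", "Load")


def plan_batch_jobs(references):
    """Order jobs so that referenced (parent) tables are processed first.

    `references` maps each table to the set of tables it references through
    foreign keys. graphlib raises CycleError if the references form a cycle.
    """
    table_order = TopologicalSorter(references).static_order()
    return [(step, table) for table in table_order for step in JOB_STEPS]


# Hypothetical schema: two child tables that reference C_CONTACT.
EXAMPLE_REFERENCES = {
    "C_CONTACT": set(),
    "C_ADDRESS": {"C_CONTACT"},
    "C_PHONE": {"C_CONTACT"},
}
```

The order of tables that do not depend on each other (C_ADDRESS and C_PHONE here) is not significant. A plan like this could feed a job scheduler or serve as a checklist when configuring a batch group.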

Best Practices for Working With Batch Jobs


While you design and plan your batch jobs, consider the following issues:
• Define your schema.
The schema is fundamental to all your Siperian Hub tasks. Without a schema, your
batch jobs have nothing to do. For more information about defining the schema,
see “About the Schema” on page 82.
• Define mappings before executing Stage jobs.
Mappings define the transformations performed in Stage jobs. If you have no
mappings defined, then the Stage job will not perform any transformations in the
staging process. For more information about mappings, see “Mapping Columns
Between Landing and Staging Tables” on page 380.
• Define match rules before executing Match jobs.
If you have no match rules, then the Match job will produce no matches. For more
information, see “Configuring Primary Key Match Rules” on page 578.
• Before running production jobs:
• Run tests with small data sets.
• Run tests of your cleanse engine and other components to determine whether
each component is working as expected.
• After testing each of the components separately, test the integrated system in
its entirety to determine whether the overall system is working as expected.

Batch Job Creation


Batch jobs are created in either of two ways:
• automatically when you configure Hub Store, or
• when certain changes occur in your Siperian Hub configuration, such as changes to
trust settings for a base object

Batch Jobs That Are Created Automatically

When you configure your Hub Store, the following types of batch jobs are
automatically created:
• Auto Match and Merge Jobs
• Autolink Jobs
• Automerge Jobs
• BVT Snapshot Jobs
• External Match Jobs
• Generate Match Tokens Jobs
• Load Jobs
• Manual Link Jobs
• Manual Merge Jobs
• Manual Unlink Jobs
• Manual Unmerge Jobs
• Match Jobs
• Match Analyze Jobs
• Migrate Link Style To Merge Style Jobs
• Promote Jobs
• Reset Links Jobs
• Stage Jobs

Batch Jobs That Are Created When Changes Occur

The following batch jobs are created when you make changes to the match and merge
setup, set properties, or enable trust settings after initial loads:
• Accept Non-Matched Records As Unique
• Key Match Jobs
• Reset Links Jobs
• Reset Match Table Jobs
• Revalidate Jobs (that is, if you enable validation for a column)
• Synchronize Jobs

Information-Only Batch Jobs (Not Run in the Hub Console)


The following batch jobs are for information only and cannot be manually run from
the Hub Console.
• Accept Non-Matched Records As Unique
• BVT Snapshot Jobs
• Manual Link Jobs
• Manual Merge Jobs
• Manual Unlink Jobs
• Manual Unmerge Jobs
• Migrate Link Style To Merge Style Jobs
• Multi Merge Jobs
• Reset Match Table Jobs

Other Batch Jobs


• Hub Delete Jobs

Running Batch Jobs Using the Batch Viewer Tool


This section describes how to use the Batch Viewer tool in the Hub Console to run
batch jobs individually. To run batch jobs in a group, see “Running Batch Jobs Using
the Batch Group Tool” on page 688.

Batch Viewer Tool


The Batch Viewer tool provides a way to execute batch jobs individually and to view
the job execution logs. The Batch Viewer is useful for starting the run of a single job,
or for running jobs that do not need to run often, such as the Synchronize job that is
run after trust settings change. The job execution log shows job completion status with
any associated messages, such as success, failure, or warning. The Batch Viewer tool
also shows job statistics, if applicable.

Note: The Batch Viewer does not provide automated scheduling. For more
information about how to create custom scripts to execute batch jobs and batch
groups, see “About Executing Siperian Hub Batch Jobs” on page 750.

Starting the Batch Viewer Tool


To start the Batch Viewer tool:
• In the Hub Console, expand the Utilities workbench, and then click Batch
Viewer.


The Hub Console displays the Batch Viewer tool, as shown in the following example.

[Figure: Batch Viewer tool, showing the Navigation Tree and the Properties Pane (Selected Item)]

Grouping by Table, Data, or Procedure Type


You can change the top-level view of the navigation tree by right-clicking the Group By
control at the bottom of the tree. The grayed-out item with the check mark
represents the current selection.


Select one of the following options:

Group By Option Description


Table Displays items in the hierarchy at the following levels:
• top level: tables
• second level: procedure type
• third level: batch job
• fourth level: date / timestamp
Date Displays items in the hierarchy at the following levels:
• top level: date / timestamp
• second level: batch jobs by date/timestamp
Procedure Type Displays items in the hierarchy at the following levels:
• top level: procedure type
• second level: batch job
• third level: date / timestamp
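
The three Group By hierarchies can be pictured as three ways of nesting the same job execution records. The following is a minimal sketch only: the LogEntry record, its field names, the sample timestamps, and the Address_Stg staging table name are assumptions made for illustration and do not reflect how the Batch Viewer stores or retrieves its data.

```python
from collections import defaultdict
from typing import NamedTuple


class LogEntry(NamedTuple):
    """Hypothetical record for one execution of a batch job."""
    table: str           # table the job belongs to
    procedure_type: str  # for example "Load" or "Match"
    job: str             # batch job description
    timestamp: str       # date / timestamp of the execution


def group_by_table(entries):
    """Table view: table -> procedure type -> batch job -> timestamps."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for e in entries:
        tree[e.table][e.procedure_type][e.job].append(e.timestamp)
    return tree


def group_by_date(entries):
    """Date view: timestamp -> batch jobs executed at that timestamp."""
    tree = defaultdict(list)
    for e in entries:
        tree[e.timestamp].append(e.job)
    return tree


def group_by_procedure_type(entries):
    """Procedure Type view: procedure type -> batch job -> timestamps."""
    tree = defaultdict(lambda: defaultdict(list))
    for e in entries:
        tree[e.procedure_type][e.job].append(e.timestamp)
    return tree


# Illustrative sample data only.
entries = [
    LogEntry("Address", "Match", "Match for Address", "2009-02-20 10:00"),
    LogEntry("Address", "Match", "Match for Address", "2009-02-21 10:00"),
    LogEntry("Address", "Load", "Load from Address_Stg", "2009-02-20 10:00"),
]
```

Each function produces the same levels, in the same order, as the corresponding row of the table above.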

The following example shows batch jobs grouped by table.


Running Batch Jobs Manually


To run a batch job manually:
1. Select the batch job to run.

2. Execute the batch job.

Selecting a Batch Job

To select a batch job to run:


1. Start the Batch Viewer tool, as described in “Starting the Batch Viewer Tool” on
page 674.
In the following example, the tree displays a list of batch jobs (the list is grouped
by procedure type).

2. Expand the tree to display the batch job that you want to run, and then click it to
select it.


The Batch Viewer displays a screen for the selected batch job with properties and
command buttons.

Batch Job Properties

The following batch job properties are read-only.

Field Description
Identity Identification information for this batch job. Stored in the
C_REPOS_TABLE_OBJECT_V table.
Name Type code for this batch job. For example, Load jobs have
the CMXLD.LOAD_MASTER type code. Stored in the
OBJECT_NAME column of the C_REPOS_TABLE_OBJECT_V table.
Description Description for this batch job in the format:
JobName for | from BaseObjectName
Examples:
• Load from Consumer_Credit_Stg
• Match for Address
This description is stored in the OBJECT_DESC column of the
C_REPOS_TABLE_OBJECT_V table.
Status Status information for this batch job.
Current Status Current status of the job. Examples:
• Executing
• Incomplete
• Completed
• Not Executing
• <Batch Job> Successful
• Description of failure
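
If you read batch job descriptions in a custom script, the “JobName for | from BaseObjectName” format can be split into its parts. The following is a minimal sketch, not a Siperian Hub API: the parse_description function is hypothetical, and it assumes that the object name contains no spaces, as in the two examples above.

```python
import re

# Job name, then the word "for" or "from", then an object name without spaces.
_DESCRIPTION = re.compile(r"^(?P<job>.+?) (?P<link>for|from) (?P<target>\S+)$")


def parse_description(text):
    """Split a batch job description into (job name, linking word, object name)."""
    match = _DESCRIPTION.match(text)
    if match is None:
        raise ValueError(f"unrecognized batch job description: {text!r}")
    return match.group("job"), match.group("link"), match.group("target")


print(parse_description("Load from Consumer_Credit_Stg"))
print(parse_description("Match for Address"))
```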

Options to Set Before Executing Batch Jobs

Certain types of batch jobs have additional fields that you can configure before running
the batch job.

Field: Re-generate All Match Tokens
Only For: Generate Match Tokens Jobs
Description: Controls the scope of match token generation: tokenizes the entire
base object (checked) or tokenizes only those records that are flagged in the
base object as requiring re-tokenization (unchecked). For more information, see
“Regenerating All Match Tokens” on page 726.

Field: Force Update
Only For: Load Jobs
Description: If selected, the Load job forces a refresh and loads records from
the staging table to the base object (or dependent object) regardless of whether
the records have already been loaded. For more information, see “Forcing Updates
in Load Jobs” on page 730.

Field: Match Set
Only For: Match Jobs
Description: Enables you to choose which match rule set to use for this match
job. To learn more, see “Selecting a Match Rule Set” on page 737.

Command Buttons for Batch Jobs

After you have selected a batch job, you can click the following command buttons.

Button Description
Execute Batch Executes the selected batch job.
Clear History Clears the job execution history in the Batch Viewer. To learn
more, see “Clearing the Job Execution History” on page 687.
Set Status to Incomplete Sets the status of the currently-executing batch job to
Incomplete. For more information, see “Setting the Job
Status to Incomplete” on page 681.
Refresh Status Refreshes the status display of the currently-executing batch
job. For more information, see “Refreshing the Status” on
page 681.

Executing a Batch Job

Important: You must have the application server running for the duration of an
executing batch job.

To execute a batch job in the Batch Viewer:


1. In the Batch Viewer, select the batch job that you want to run. For more
information, see “Selecting a Batch Job” on page 677.
2. In the right panel, click Execute Batch (or right-click the job in the left panel
and select Execute from the pop-up menu).
If the current status of the job is Executing, then the Execute Batch button is
disabled. You must wait for the batch job to finish before you can run it again.


To execute batch jobs in other ways, see “Ways to Execute Batch Jobs” on page 668.

Refreshing the Status

While a batch job is running, you can click Refresh Status to check if the status has
changed.

Setting the Job Status to Incomplete

In very rare circumstances, you might want to change the status of a running job by
clicking Set Status to Incomplete and execute the job again. Only do this if the batch
job has stopped executing (due to an error, such as a server reboot or crash) but
Siperian Hub has not detected that the job has stopped due to a job application lock in
the metadata. You will know this is a problem if the current status is Executing but
the database, application server, and logs show no activity. If this occurs, click this
button to clear the job application lock so that you can run the batch job again;
otherwise, you will not be able to execute the batch job. Setting the status to
Incomplete just updates the status of the batch job—it does not abort the job.

Note: This option is available only if your user ID has Siperian Administrator rights.


Viewing Job Execution Logs


Siperian Hub creates a job execution log each time that it executes a batch job.

Job Execution Status

Each job execution log entry has one of the following status values:

Icon Description
Batch job is currently running.

Batch job completed successfully.

Batch job completed successfully, but additional information is available. For
example, for Stage and Load jobs, this can indicate that some records were rejected
(see “Viewing Rejected Records” on page 685). For Match jobs, this can indicate that
the base object is empty or that there are no more records to match.
Batch job failed. For more information, see “Handling the Failed Execution of a
Batch Job” on page 686.
Batch job status was manually changed from “Executing” to “Incomplete.” For more
information, see “Setting the Job Status to Incomplete” on page 681.

Viewing the Job Execution Log for a Batch Job

To view the job execution log for a batch job:


1. Start the Batch Viewer tool, as described in “Starting the Batch Viewer Tool” on
page 674.
2. Expand the tree to display the job execution log that you want to view, and then
click it.


The Batch Viewer displays a screen for the selected job execution log.

Job Execution Log Entry Properties

For each job execution log entry, the Batch Viewer displays the following information:

Field Description
Identity Identification information for this batch job. Stored in the
C_REPOS_TABLE_OBJECT_V table.
Name Name of this job execution log. Date / time when the batch job started.
Description Description for this batch job in the format:
JobName for / from BaseObjectName
Examples:
• Load from Consumer_Credit_Stg
• Match for Address

Source system One of the following:
• source system of the processed data
• Admin
Source table Source table of the processed data.
Status Status information for this batch job
Current Status Current status of this batch job. If an error occurred, displays
information about the error. For more information, see “Job Execution
Status” on page 682.
Metrics Metrics for this batch job
[Various] Statistics collected during the execution of the batch job (if applicable).
For more information, see:
• “Auto Match and Merge Metrics” on page 716
• “Automerge Metrics” on page 718
• “Load Job Metrics” on page 731
• “Match Job Metrics” on page 737
• “Match Analyze Job Metrics” on page 739
• “Stage Job Metrics” on page 746
• “Promote Job Metrics” on page 743
Time Timestamp for this batch job
Start Date / time when this batch job started.
Stop Date / time when this batch job ended.
Elapsed time Elapsed time for the execution of this batch job.


Viewing Rejected Records

For Stage jobs or Load jobs only, if the batch job resulted in records being written to
the rejects table, then the job execution log displays a View Rejects button.

Note: Records are rejected if the HUB_STATE_IND value is not valid.

To view the rejected records and the reason why each was rejected:
1. Click the View Rejects button.


The Batch Viewer displays a table of rejected records.

2. Click Close.

Handling the Failed Execution of a Batch Job

If executing a batch job failed, perform the following steps:


• Display the execution log entry for this batch job.
• Read the error text in the Current Status field for diagnostic information.
• Take corrective action as necessary.

Copying the Current Status to the Windows Clipboard

To copy the current status of a batch job to the Windows Clipboard (to paste into a
document or e-mail, for example):
• Click the button.


Deleting Job Execution Log Entries

To delete the selected job execution log:


• Click the button in the top right-hand corner of the job properties page.

Clearing the Job Execution History


After running batch jobs over time, the list of executed jobs can become very large.
You should periodically remove the extraneous job execution logs from this list.

Note: The actual procedure steps to clear job history will be slightly different
depending on the view (By Table, By Date, or By Procedure Type); the following
procedure assumes you are using the By Table view.

To clear the job history:


1. Start the Batch Viewer tool, as described in “Starting the Batch Viewer Tool” on
page 674.
2. In the Batch Viewer, expand the tree underneath your base object.
3. Expand the tree under the type of batch job.
4. Select the job for which you want to clear the history. The top of the properties
screen looks like the following example.

5. Click Clear History.


6. Click Yes to confirm that you want to delete all the execution history for this batch
job.

Running Batch Jobs Using the Batch Group Tool


This section describes how to use the Batch Group tool in the Hub Console to run
batch jobs in groups. To run batch jobs individually, see “Running Batch Jobs Using
the Batch Viewer Tool” on page 674.

The Batch Viewer does not provide automated scheduling. For more information
about how to create custom scripts to execute batch jobs and batch groups, see
Chapter 18, “Writing Custom Scripts to Execute Batch Jobs.”

About Batch Groups


A batch group is a collection of individual batch jobs (for example, Stage, Load, and
Match jobs) that can be executed with a single command. Each batch job in a batch
group can be executed sequentially or in parallel with other jobs. You use the Batch
Group tool to configure and run batch groups. For more information about batch jobs,
see “Batch Jobs Reference” on page 713.

For more information about developing custom batch jobs and batch groups that can
be made available in the Batch Group tool, see “Developing Custom Stored
Procedures for Batch Jobs” on page 806.

Note: If you delete an object from the Hub Console (for example, if you delete a
mapping), the Batch Group tool highlights any batch jobs that depend on that object
(for example, a stage job) in red. You must resolve this issue prior to re-executing the
batch group.

Sequential and Parallel Execution

Batch jobs can be executed in the following ways:

Execution Approach Description


sequentially Only one batch job in the batch group is executed at a time.
parallel Multiple batch jobs in the batch group are executed concurrently.


Execution Paths

An execution path is the sequence in which batch jobs are executed when the entire batch
group is executed. The execution path begins with the Start node and ends with the
End node. The Batch Group tool does not validate the execution sequence for you—it
is up to you to ensure that the execution sequence is correct. For example, the Batch
Group tool would not notify you of an error if you incorrectly specified the Load job
for a base object ahead of its Stage job, or if you specified the Load job for a
dependent object ahead of the Load job for the base object on which it depends.

Levels

In a batch group, the execution path consists of a series of one or more levels that are
executed in sequence (see “Running Batch Jobs in Sequence” on page 670).

[Figure: execution path of a batch group, showing the Start Node, the Levels with their Batch Jobs, and the End Node]

A level is a collection of one or more batch jobs.


• If a level contains multiple batch jobs, then these batch jobs are executed in
parallel.
• If a level contains only a single batch job, then this batch job is executed singly.

All batch jobs in the level must complete before the batch group proceeds to the next
task in the sequence.
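
The level semantics described above can be sketched in a few lines of code. This is an illustration only, not how Siperian Hub is implemented: the run_batch_group function and the run_job callback are hypothetical names, and threads stand in for whatever mechanism the Hub uses to run jobs in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch_group(levels, run_job):
    """Run levels one after another; run the jobs within a level concurrently.

    levels  -- list of levels in execution order, each a list of jobs
    run_job -- hypothetical callback that executes a single job
    """
    results = []
    for level in levels:
        with ThreadPoolExecutor(max_workers=max(1, len(level))) as pool:
            # list() waits for every job in this level to finish, so the
            # next level cannot start until the whole level has completed.
            results.append(list(pool.map(run_job, level)))
    return results


# Two Stage jobs run in parallel, then one Load job, then one Match job.
levels = [["Stage A", "Stage B"], ["Load A"], ["Match A"]]
print(run_batch_group(levels, lambda job: f"{job} done"))
```

A level that contains a single job behaves like sequential execution, which matches the second bullet above.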


Note: Because all of the batch jobs in a level are executed in parallel, none of the batch
jobs in the same level should have any dependencies. For example, the Stage and Load
jobs for a base object should be in separate levels that are executed in the proper
sequence. For more information, see “Running Batch Jobs in Sequence” on page 670.
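
Because the Batch Group tool does not validate the execution sequence, you might want to review a planned batch group before you configure it. The following is a minimal sketch of one such check, under stated assumptions: the find_sequence_problems function and the (job type, object name) tuple layout are hypothetical, and the check covers only the Stage-before-Load rule, not dependent objects or other job types.

```python
def find_sequence_problems(levels):
    """Return warnings for Load jobs that do not run after their Stage job.

    levels is a list of levels in execution order; each level is a list of
    (job_type, object_name) tuples. This layout is hypothetical.
    """
    # Remember the first level in which each object is staged.
    stage_level = {}
    for index, level in enumerate(levels):
        for job_type, object_name in level:
            if job_type == "Stage":
                stage_level.setdefault(object_name, index)

    problems = []
    for index, level in enumerate(levels):
        for job_type, object_name in level:
            if job_type != "Load" or object_name not in stage_level:
                continue
            if index < stage_level[object_name]:
                problems.append(f"Load for {object_name} runs before its Stage job")
            elif index == stage_level[object_name]:
                problems.append(f"Load for {object_name} shares a level with its Stage job")
    return problems


plan = [[("Stage", "Address"), ("Load", "Address")], [("Match", "Address")]]
print(find_sequence_problems(plan))
```

An empty result means only that this one rule is satisfied; it does not guarantee that the whole execution sequence is correct.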

Other Ways to Execute Batch Groups

In addition to using the Batch Group tool, you can execute batch groups in the
following ways:
• Services Integration Framework (SIF) requests—Applications can invoke the
SIF ExecuteBatchGroupRequest request to execute batch groups directly. For
more information, see the Siperian Services Integration Framework Guide.
• Stored procedures—Execute batch groups through stored procedures using any
job scheduling software (such as Tivoli, CA Unicenter, and so on). For more
information, see “Executing Batch Groups Using Stored Procedures” on page 798.

Starting the Batch Group Tool


To start the Batch Group tool:
• In the Hub Console, expand the Utilities workbench, and then click Batch Group.

The Hub Console displays the Batch Group tool:

[Figure: Batch Group tool, showing the Navigation Tree and the Properties Pane (Selected Item)]


The Batch Group tool consists of the following areas:

Area Description
Navigation Tree Hierarchical list of batch groups and execution logs.
Properties Pane Properties and command buttons for the selected item.

Configuring Batch Groups


This section describes how to add, edit, and delete batch groups. For more
information, see “About Batch Groups” on page 688.

Adding Batch Groups

To add a batch group:


1. Start the Batch Group tool. For more information, see “Starting the Batch Group
Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page
30.
3. Right-click the Batch Groups node in the Batch Group tree and choose Add
Batch Group from the pop-up menu.


The Batch Group tool adds a “New Batch Group” to the Batch Group tree.
[Figure: new batch group, showing the Batch Group Properties and the Execution Sequence (Start / Finish Nodes)]


Note the empty execution sequence. You will configure this after adding the new
batch group. For more information, see “Configuring Levels for Batch Groups”
on page 694.
4. Specify the following information:

Field Description
Name Specify a unique, descriptive name for this batch group.
Description Enter a description for this batch group.

5. Click the button to save your changes.


The Batch Group tool saves your changes and updates the navigation tree.
To add batch jobs to the new batch group, see “Assigning Batch Jobs to Batch
Group Levels” on page 698.


Editing Batch Group Properties

To edit batch group properties:


1. Start the Batch Group tool. For more information, see “Starting the Batch Group
Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page
30.
3. In the navigation tree, expand the Batch Group node to show the batch group that
you want to edit.
4. Specify a different batch group name, if you want.
5. Specify a different description, if you want.
6. Click the button to save your changes.

Deleting Batch Groups

To delete a batch group:


1. Start the Batch Group tool. For more information, see “Starting the Batch Group
Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page
30.
3. In the navigation tree, expand the Batch Group node to show the batch group that
you want to delete.
4. Right-click the batch group that you want to delete, and then click Delete Batch
Group.
The Batch Group tool prompts you to confirm deletion.
5. Click Yes.
The Batch Group tool removes the deleted batch group from the navigation tree.


Configuring Levels for Batch Groups

As described in “About Batch Groups” on page 688, a batch group contains one or
more levels that are executed in sequence. This section describes how to specify the
execution sequence by configuring the levels in a batch group.

Adding Levels to a Batch Group

To add a level to a batch group:


1. Start the Batch Group tool. For more information, see “Starting the Batch Group
Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page
30.
3. In the navigation tree, expand the Batch Group node to show the batch group that
you want to configure.
4. In the batch groups tree, right-click any level, and then choose one of the following
options:

Command Description
Add Level Above Add a level to this batch group above the selected item.
Add Level Below Add a level to this batch group below the selected item.
Move Level Up Move this batch group level above the prior level.
Move Level Down Move this batch group level below the next level.
Remove this Level Remove this batch group level.


The Batch Group tool displays the Choose Jobs to Add to Batch Group dialog.

5. Expand the base object(s) for the job(s) that you want to add.

6. Select the job(s) that you want to add.


To select jobs that you want to execute in parallel, hold down the CTRL key and
click each job that you want to select.
7. Click OK. The Batch Group tool adds the selected job(s) to the batch group.

8. Click the button to save your changes.

Removing Levels From a Batch Group

To remove a level from a batch group:


1. Start the Batch Group tool. For more information, see “Starting the Batch Group
Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page
30.
3. In the navigation tree, expand the Batch Group node to show the batch group that
you want to configure.
4. In the batch group, right-click the level that you want to delete, and then choose
Remove this Level.


Siperian Hub displays the delete confirmation dialog.

5. Click Yes.
The Batch Group tool removes the deleted level from the batch group.

To Move a Level Up Within a Batch Group

To move a level up within a batch group:


1. Start the Batch Group tool. For more information, see “Starting the Batch Group
Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page
30.
3. In the navigation tree, expand the Batch Group node to show the batch group that
you want to configure.
4. In the batch groups tree, right-click the level that you want to move up, and then
choose Move Level Up.
The Batch Group tool moves the level up within the batch group.

To Move a Level Down Within a Batch Group

To move a level down within a batch group:


1. Start the Batch Group tool. For more information, see “Starting the Batch Group
Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page
30.
3. In the navigation tree, expand the Batch Group node to show the batch group that
you want to configure.


4. In the batch groups tree, right-click the level that you want to move down, and
then choose Move Level Down.
The Batch Group tool moves the level down within the batch group.

Assigning Batch Jobs to Batch Group Levels

In the Batch Group tool, a job is a Siperian Hub batch job. Each level contains one or
more batch jobs. If a level contains multiple batch jobs, then all of those batch jobs are
executed in parallel.

Adding a Batch Job to a Batch Group Level

To add a batch job to a batch group:


1. Start the Batch Group tool. For more information, see “Starting the Batch Group
Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page
30.
3. In the navigation tree, expand the Batch Group node to show the batch group that
you want to configure.
4. In the batch groups tree, right-click the level to which you want to add jobs, and
then choose Add jobs to this level....
The Batch Group tool displays the Choose Jobs to Add to Batch Group dialog.


5. Expand the base object(s) for the job(s) that you want to add.

6. Select the job(s) that you want to add.


To select multiple jobs at once (to execute them in parallel), hold down the CTRL
key while clicking jobs.
7. Click OK.
8. Save your changes.
The Batch Group tool adds the selected jobs to the target level box. Siperian Hub
executes all batch jobs in a group level in parallel.

Configuring Options for Batch Jobs

When configuring a batch group, you can configure job options for certain kinds of
batch jobs. For more information about these job options, see “Options to Set Before
Executing Batch Jobs” on page 679.


Removing a Batch Job From a Level

To remove a batch job from a level:


1. Start the Batch Group tool. For more information, see “Starting the Batch Group
Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page
30.
3. In the navigation tree, expand the Batch Group node to show the batch group that
you want to configure.
4. In the batch group, right-click the job that you want to delete, and then choose
Remove Job.
The Batch Group tool displays the delete confirmation dialog.

5. Click Yes to delete the selected job.


The Batch Group tool removes the deleted job from this level in the batch group.

To Move a Batch Job Up a Level

To move a batch job up a level:


1. Start the Batch Group tool. For more information, see “Starting the Batch Group
Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page
30.
3. In the navigation tree, expand the Batch Group node to show the batch group that
you want to configure.
4. In the batch group, right-click the job that you want to move up, and then choose
Move job up.


The Batch Group tool moves the selected job up one level in the batch group.

To Move a Batch Job Down a Level

To move a batch job down a level:


1. Start the Batch Group tool. For more information, see “Starting the Batch Group
Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page
30.
3. In the navigation tree, expand the Batch Group node to show the batch group that
you want to configure.
4. In the batch group, right-click the job that you want to move down, and then choose
Move job down.
The Batch Group tool moves the selected job down one level in the batch group.

Refreshing the Batch Groups List


To refresh the batch groups list:
• Right-click anywhere in the navigation pane and choose Refresh.

Executing Batch Groups Using the Batch Group Tool


This section describes how to manage batch group execution in the Batch Group tool.
For more information about executing batch jobs in other ways, such as using stored
procedures or the Siperian Services Integration Framework, see “Ways to Execute
Batch Jobs” on page 668.

Important: You must have the application server running for the duration of an
executing batch group.

Note: If you delete an object from the Hub Console (for example, if you delete a
mapping), the Batch Group tool highlights any batch jobs that depend on that object
(for example, a stage job) in red. You must resolve this issue prior to re-executing the
batch group.


Navigating to the Control & Logs Screen

The Control & Logs screen is where you can control the execution of a batch group
and view its execution logs.

To navigate to the Control & Logs screen for a batch group:


1. Start the Batch Group tool. For more information, see “Starting the Batch Group
Tool” on page 690.
2. Expand the Batch Group tree to display the batch group that you want to execute.

3. Expand the batch group and click the Control & Logs node.


The Batch Group tool displays the Control & Logs screen for this batch group.
[Figure: Control & Logs screen, showing the toolbar, the execution logs for this batch group, and the execution logs for individual batch jobs in this batch group]

Components of the Control & Logs Screen

This screen contains the following components:

Component Description
Toolbar Command buttons for managing batch group execution.
To learn more, see “Command Buttons for Batch Groups” on
page 703.
Logs for the Batch Group Execution logs for this batch group.
Logs for Batch Jobs Execution logs for individual batch jobs in this batch group.

Command Buttons for Batch Groups

Use the following command buttons to manage batch group execution.

Button Description
Executes this batch group.

Sets the execution status of a failed batch group to restart. To learn more, see
“Restarting a Batch Group That Failed Execution” on page 707.

Sets the execution status of a running batch group to incomplete. To learn more,
see “Handling Incomplete Batch Group Execution” on page 708.

Removes the selected group or job execution log.

Removes all group and job execution logs.

Refreshes the screen for this batch group.

Executing a Batch Group

To execute a batch group:


1. Navigate to the Control & Logs screen for the batch group.

For more information, see “Navigating to the Control & Logs Screen” on page
702.
2. Click on the node and then select Batch Group > Execute, or click on the
Execute button.
The Batch Group tool executes the batch group and updates the logs panel with
the status of the batch group execution.
3. Click the Refresh button to see the execution result.


The Batch Group tool displays progress information.

When finished, the Batch Group tool adds entries to:


• the group execution log for this batch group
• the job execution log for individual batch jobs

Group Execution Status

Each execution log has one of the following status values:

Icon Description
Processing. The batch group is currently running.

Batch group execution completed successfully.

Batch group execution completed with additional information. For example, for
Stage and Load jobs, this can indicate that some records were rejected (see “Viewing
Rejected Records” on page 710). For Match jobs, this can indicate that the base
object is empty or that there are no more records to match.

Batch group execution failed. For more information, see “Restarting a Batch Group
That Failed Execution” on page 707.
Batch group execution is incomplete. For more information, see “Handling
Incomplete Batch Group Execution” on page 708.
Batch group execution has been reset to start over. For more information, see
“Restarting a Batch Group That Failed Execution” on page 707.

Viewing the Group Execution Log for a Batch Group

Each time that it executes a batch group, the Batch Group tool generates a group
execution log entry. Each log entry has the following properties:

Field | Description
Status | Current status of this batch job. If batch group execution failed, displays a description of the problem. For more information, see “Group Execution Status” on page 705.
Start | Date / time when this batch job started.
End | Date / time when this batch job ended.
Message | Any messages regarding batch group execution.

Viewing the Job Execution Log for a Batch Job

Each time that it executes a batch job within a batch group, the Batch Group tool
generates a job execution log entry.

Each log entry has the following properties:

Field | Description
Job Name | Name of this batch job.
Status | Current status of this batch job. For more information, see “Job Execution Status” on page 682.
Start | Date / time when this batch job started.
End | Date / time when this batch job ended.
Message | Any messages regarding batch group execution.

Note: If you want to view the metrics for a completed batch job, you can use the Batch
Viewer. For more information, see “Viewing Job Execution Logs” on page 682.

Restarting a Batch Group That Failed Execution

If batch group execution fails, you can resolve the problems that caused the failure
and then restart the batch group from the beginning.

To execute the batch group again:


1. In the Logs for My Batch Group list, select the execution log entry for the batch
group that failed.

2. Click Set to Restart.

The Batch Group tool changes the status of this batch job to Restart.

3. Resolve any problems that may have caused the failure to occur and execute the
batch group again. For more information, see “Executing a Batch Group” on page
704.
The Batch Group tool executes the batch group and creates a new execution log
entry.

Note: If a batch group fails and you do not click either the Set to Restart button (see
“Restarting a Batch Group That Failed Execution” on page 707) or the Set to
Incomplete button (see “Handling Incomplete Batch Group Execution” on page 708)
in the Logs for My Batch Group list, Siperian Hub restarts the batch job from the prior
failed level.

Handling Incomplete Batch Group Execution

In very rare circumstances, you might want to change the status of a running batch
group.
• If the batch group status says it is still executing, you can click Set to
Incomplete and execute the batch group again. Do this only if the batch group
has stopped executing (due to an error, such as a server reboot or crash) but
Siperian Hub has not detected that it has stopped because of a job application
lock in the metadata.
You will know this is a problem if the current status is Executing but the
database, application server, and logs show no activity. If this occurs, click
Set to Incomplete to clear the job application lock so that you can run the batch
group again; otherwise, you will not be able to execute the batch group. Setting
the status to Incomplete only updates the status of the batch group (and of all
batch jobs within the batch group); it does not terminate processing.
Note that, if the job status is Incomplete, you cannot set the job status to Restart.
• If the job status is Failed, you can click Set to Restart. Note that, if the job status
is Restart, you cannot set the job status to Incomplete.

Changing the status allows you to continue doing something else while the batch group
completes.
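
The status rules above can be summarized in a small lookup. The following Python sketch is illustrative only: the status names come from this section, but the names DOCUMENTED_CHANGES, FORBIDDEN_CHANGES, and classify_status_change are hypothetical and are not part of Siperian Hub.

```python
# Illustrative model of the manual status changes described in this section.
# All names here are hypothetical; this is not a Siperian Hub API.

# Changes that this section explicitly describes as available.
DOCUMENTED_CHANGES = {
    ("Executing", "Incomplete"),  # clears a stale job application lock
    ("Failed", "Restart"),        # restart after a failed execution
}

# Changes that this section explicitly rules out.
FORBIDDEN_CHANGES = {
    ("Incomplete", "Restart"),
    ("Restart", "Incomplete"),
}


def classify_status_change(current: str, requested: str) -> str:
    """Return 'documented', 'forbidden', or 'unspecified' for a status change."""
    change = (current, requested)
    if change in FORBIDDEN_CHANGES:
        return "forbidden"
    if change in DOCUMENTED_CHANGES:
        return "documented"
    return "unspecified"
```

Any combination that this section does not mention is reported as unspecified rather than guessed.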

To set the status of a running batch group to incomplete:


1. In the Logs for My Batch Group list, select the execution log entry for the running
batch group that you want to mark as incomplete.

2. Click Set to Incomplete.


The Batch Group tool changes the status of this batch job to Incomplete.

3. Execute the batch group again. For more information, see “Executing a Batch
Group” on page 704.

Note: If a batch group fails and you do not click either the Set to Restart button (see
“Restarting a Batch Group That Failed Execution” on page 707) or the Set to
Incomplete button (see “Handling Incomplete Batch Group Execution” on page 708)
in the Logs for My Batch Group list, Siperian Hub restarts the batch job from the prior
failed level.

Viewing Rejected Records

If batch group execution resulted in records being written to the rejects table (during
the execution of Stage jobs or Load jobs), then the job execution log enables the View
Rejects button.

To view rejected records:


1. Click the View Rejects button.

The Batch Group tool displays the Rejects window.

2. Navigate and inspect the rejected records as needed.


3. Click Close.

Filtering Execution Logs By Status


You can view history logs across all batch groups, filtered by execution status, by
clicking the appropriate node under the Logs By Status node.

To filter execution logs by status:


1. Start the Batch Group tool. For more information, see “Starting the Batch Group
Tool” on page 690.
2. In the Batch Group tree, expand the Logs by Status node.
The Batch Group tool displays the log status list.

3. Click the particular batch group log entry you want to review in the upper half of
the logs panel.
Siperian Hub displays the detailed job execution logs for that batch group in the
lower half of the panel. For additional information, see:
• “Group Execution Status” on page 705
• “Viewing the Group Execution Log for a Batch Group” on page 706
• “Viewing the Job Execution Log for a Batch Job” on page 706

Note: Batch group logs can be deleted by selecting a batch group log and clicking the
Clear Selected button. To delete all logs shown in the panel, click the Clear All
button.

Deleting Batch Groups


To delete a batch group:
1. Start the Batch Group tool. For more information, see “Starting the Batch Group
Tool” on page 690.
2. Acquire a write lock. For more information, see “Acquiring a Write Lock” on page
30.
3. In the navigation tree, expand the Batch Group node to show the batch group that
you want to delete.
4. Right-click the batch group that you want to delete, and choose
Delete Batch Group (or select Batch Group > Delete Batch Group).

Batch Jobs Reference


This section describes each Siperian Hub batch job.

Alphabetical List of Batch Jobs


Batch Job | Description
Accept Non-Matched Records As Unique | For records that have undergone the match process but had no matching data, sets the consolidation indicator to 1 (consolidated), meaning that the record was unique and did not require consolidation.
Autolink Jobs | Automatically links records that have qualified for autolinking during the match process and are flagged for autolinking (Automerge_ind=1).
Auto Match and Merge Jobs | Executes a continual cycle of a Match job, followed by an Automerge job, until there are no more records to match, or until the number of matches ready for manual consolidation exceeds the configured threshold. Used with merge-style base objects only.
Automerge Jobs | Automatically merges records that have qualified for automerging during the match process and are flagged for automerging (Automerge_ind = 1). Used with merge-style base objects only.
BVT Snapshot Jobs | Generates a snapshot of the best version of the truth (BVT) for a base object. Used with link-style base objects only.
External Match Jobs | Matches “externally managed/prepared” records with an existing base object, yielding the results based on the current match settings, all without actually modifying the data in the base object.
Generate Match Tokens Jobs | Prepares data for matching by generating match tokens according to the current match settings. Match tokens are strings that encode the columns used to identify candidates for matching.
Hub Delete Jobs | Deletes data from the Hub based on BO / XREF level input.
Key Match Jobs | Matches records from two or more sources when these sources use the same primary key. Compares new records to each other and to existing records, and identifies potential matches based on the comparison of source record keys as defined by the match rules.
Load Jobs | Copies records from a staging table to the corresponding target table in the Hub Store (a base object or dependent object). During the load process, applies the current trust and validation rules to the records.
Manual Link Jobs | Shows logs for records that have been manually linked in the Merge Manager tool. Used with link-style base objects only.
Manual Merge Jobs | Shows logs for records that have been manually merged in the Merge Manager tool. Used with merge-style base objects only.
Manual Unlink Jobs | Shows logs for records that have been manually unlinked in the Merge Manager tool. Used with link-style base objects only.
Manual Unmerge Jobs | Shows logs for records that have been manually unmerged in the Merge Manager tool.
Match Jobs | Finds duplicate records in the base object, based on the current match rules.
Match Analyze Jobs | Conducts a search to gather match statistics but does not actually perform the match process. If areas of data with the potential for huge match requirements are discovered, Siperian Hub moves the records to a hold status, which allows a data steward to review the data manually before proceeding with the match process.
Match for Duplicate Data Jobs | For data with a high percentage of duplicate records, compares new records to each other and to existing records, and identifies exact duplicates. The maximum number of exact duplicates is based on the Duplicate Match Threshold setting for this base object.
Migrate Link Style To Merge Style Jobs | Used with link-style base objects only. Migrates link-style base objects to merge-style base objects.
Multi Merge Jobs | Allows the merge of multiple records in one job.
Promote Jobs | Reads the PROMOTE_IND column from an XREF table and changes to ACTIVE the state on all rows where the column’s value is 1.
Recalculate BO Jobs | Recalculates all base objects identified by ROWID_OBJECT column in the table/inline view if you include the ROWID_OBJECT_TABLE parameter. If you do not include the parameter, this batch job recalculates all records in the BO, in batches of MATCH_BATCH_SIZE or 1/4 the number of the records in the table, whichever is less.
Recalculate BVT Jobs | Recalculates the BVT for the specified ROWID_OBJECT.
Reset Links Jobs | Updates the records in the _LINK table to account for changes in the data. Used with link-style base objects only.
Reset Match Table Jobs | Shows logs of the operation where all matched records have been reset to be queued for match.
Revalidate Jobs | Executes the validation logic/rules for records that have been modified since the initial validation during the Load Process.
Stage Jobs | Copies records from a landing table into a staging table. During execution, cleanses the data according to the current cleanse settings.
Synchronize Jobs | Updates metadata for base objects. Used after a base object has been loaded but not yet merged, and subsequent trust configuration changes (such as enabling trust) have been made to columns in that base object. This job must be run before merging data for this base object.

Accept Non-Matched Records As Unique


Accept Non-matched Records As Unique jobs change the status of records that have
undergone the match process but had no matching data. This job sets the
consolidation indicator to 1, meaning that the record is consolidated or (in this case)
did not require consolidation. The Automerge job adheres to this setting and treats
these as unique records.

The Accept Non-matched Records As Unique job is created:


• only if the base object has Accept All Unmatched Rows as Unique enabled (set
to Yes) in the Match / Merge Setup configuration. For more information, see
“Accept All Unmatched Rows as Unique” on page 492.
• only after a merge job is run, as described in “Batch Jobs That Are Created When
Changes Occur” on page 673.

Note: This job cannot be executed from the Batch Viewer.

Autolink Jobs
For link-style base objects only, after the Match job has been run, you can run the
Autolink job to automatically link any records that qualified for autolinking during the
match process.

Auto Match and Merge Jobs


Auto Match and Merge batch jobs execute a continual cycle of a Match job, followed
by an Automerge job, until there are no more records to match, or until the maximum
number of records for manual consolidation limit is reached (see “Maximum Matches
for Manual Consolidation” on page 490). The match batch size parameter (see
“Number of Rows per Match Job Batch Cycle” on page 491) controls the number of
records per cycle that this process goes through to finish the match and merge cycles.
To learn more, see “Match Jobs” on page 734 and “Automerge Jobs” on page 717.
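
The cycle described above can be pictured as a simple loop. The following Python sketch is a simplified model, not the Siperian Hub implementation: the function name, the three callbacks, and the parameter names are hypothetical, and the exact stop condition for the manual consolidation limit (reached versus exceeded) is an assumption.

```python
from typing import Callable


def auto_match_and_merge(
    run_match: Callable[[int], int],
    run_automerge: Callable[[], int],
    manual_queue_size: Callable[[], int],
    match_batch_size: int,
    max_manual_matches: int,
) -> int:
    """Run Match/Automerge cycles and return the number of completed cycles."""
    cycles = 0
    while True:
        # One Match cycle over at most match_batch_size records; the
        # hypothetical callback returns how many records it processed.
        processed = run_match(match_batch_size)
        if processed == 0:
            break  # no more records to match
        run_automerge()
        cycles += 1
        # Assumption: the cycle stops once the manual queue reaches the limit.
        if manual_queue_size() >= max_manual_matches:
            break
    return cycles
```

In this model, a smaller match batch size means more, shorter cycles for the same number of records.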

Important: Do not run an Auto Match and Merge job on a base object that is used to
define relationships between records in inter-table or intra-table match paths. Doing so
will change the relationship data, resulting in the loss of the associations between
records. For more information, see “Relationship Base Objects” on page 498.

Second Jobs Shown After Application Server Restart

If you execute an Auto Match and Merge job, it completes successfully with one job
shown in the status. However, if you stop and restart the application server and return
to the Batch Viewer, you see a second job (listed under Match jobs) with a warning a
few seconds later. The second job is to ensure that either the base object is empty or
there are no more records to match.

Auto Match and Merge Metrics

After running an Auto Match and Merge job, the Batch Viewer displays the following
metrics (if applicable) in the job execution log:

The following table describes these metrics.

Metric | Description
Matched records | Number of records that were matched by the Auto Match and Merge job.
Records tokenized | Number of records that were tokenized prior to the Auto Match and Merge job.
Automerged records | Number of records that were merged by the Auto Match and Merge job.
Accepted as unique records | Number of records that were accepted as unique records by the Auto Match and Merge job. For more information, see “Automerge Jobs” on page 717. Applies only if this base object has Accept All Unmatched Rows as Unique enabled (set to Yes) in the Match / Merge Setup configuration. For more information, see “Accept All Unmatched Rows as Unique” on page 492.
Queued for automerge | Number of records that were queued for automerge by a Match job that was executed by the Auto Match and Merge job. For more information, see “Automerge Jobs” on page 717.
Queued for manual merge | Number of records that were queued for manual merge. Use the Merge Manager in the Hub Console to process these records. For more information, see the Siperian Hub Data Steward Guide.

Automerge Jobs
For merge-style base objects only, after the Match job has been run, you can run the
Automerge job to automatically merge any records that qualified for automerging
during the match process. When an Automerge job is run, it processes all matches in
the MATCH table that are flagged for automerging (Automerge_ind=1).

Note: For state-enabled objects only, records that are PENDING (source and target
records) or DELETED are never automerged. When a record is deleted, it is removed
from the match table and its consolidation_ind is reset to 4. For more information
regarding how to manage the state of base object or XREF records, refer to
“Configuring State Management for Base Objects” on page 211.

Automerge Jobs and Auto Match and Merge

Auto Match and Merge batch jobs execute a continual cycle of a Match job, followed
by an Automerge job, until there are no more records to match, or until the maximum
number of records for manual consolidation limit is reached (see “Maximum Matches
for Manual Consolidation” on page 490). For additional information, see “Auto Match
and Merge Jobs” on page 716.

Automerge Jobs and Trust-Enabled Columns

An Automerge job will fail if there is a large number of trust-enabled columns. The
exact number of columns that cause the job to fail is variable and based on the length
of the column names and the number of trust-enabled columns. Long column names
are at—or close to—the maximum allowable length of 26 characters. To avoid this
problem, keep the number of trust-enabled columns below 40 and/or the length of the
column names short.
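
As a rough aid, the guideline above can be turned into an advisory check. The following Python sketch is a heuristic only: the limits 40 and 26 come from this section, but the function name and the near_limit_margin parameter are hypothetical, and the check cannot predict the exact point at which an Automerge job fails.

```python
from typing import List

MAX_COLUMN_NAME_LENGTH = 26   # maximum column name length stated in this section
TRUST_COLUMN_GUIDELINE = 40   # keep the number of trust-enabled columns below this


def automerge_trust_warnings(
    trust_columns: List[str], near_limit_margin: int = 2
) -> List[str]:
    """Return advisory warnings for a list of trust-enabled column names."""
    warnings = []
    if len(trust_columns) >= TRUST_COLUMN_GUIDELINE:
        warnings.append(
            f"{len(trust_columns)} trust-enabled columns; "
            f"guideline is fewer than {TRUST_COLUMN_GUIDELINE}"
        )
    # near_limit_margin is an assumed tolerance for "close to" the maximum.
    long_names = [
        name
        for name in trust_columns
        if len(name) >= MAX_COLUMN_NAME_LENGTH - near_limit_margin
    ]
    if long_names:
        warnings.append(
            f"{len(long_names)} column name(s) at or near "
            f"{MAX_COLUMN_NAME_LENGTH} characters"
        )
    return warnings
```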

Automerge Metrics

After running an Automerge job, the Batch Viewer displays the following metrics (if
applicable) in the job execution log:

The following table describes these metrics.

Metric | Description
Automerged records | Number of records that were automerged by the Automerge job.
Accepted as unique records | Number of records that were accepted as unique records by the Automerge job. Applies only if this base object has Accept All Unmatched Rows as Unique enabled (set to Yes) in the Match / Merge Setup configuration. For more information, see “Accept All Unmatched Rows as Unique” on page 492.

BVT Snapshot Jobs


For a base object table, the best version of the truth (BVT) is a record that has been
consolidated with the best cells of data from the source records. For more information,
see “Best Version of the Truth” on page 340.

Note: For state-enabled base objects only, the BVT logic uses the HUB_STATE_IND
to ignore the non-contributing base objects where the HUB_STATE_IND is 0 or -1
(PENDING or DELETED state). For the online BUILD_BVT call, provide the
INCLUDE_PENDING_IND parameter.

Possible scenarios include:


1. If this parameter is 0 then include only ACTIVE base object records.

2. If this parameter is 1 then include ACTIVE and PENDING base object records.
3. If this parameter is 2 then calculate based on ACTIVE and PENDING XREF
records to provide “what-if ” functionality.
4. If this parameter is 3 then calculate based on ACTIVE XREF records to provide the
current BVT based on XREFs, which may differ from scenario 1.
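
The four scenarios can be summarized as a lookup from the parameter value to the record level and states that contribute to the BVT. The following Python sketch only restates the list above; the dictionary and function names are hypothetical and do not represent the BUILD_BVT interface.

```python
# Hypothetical summary of the INCLUDE_PENDING_IND scenarios listed above.
# Maps the parameter value to (record level, contributing states).
INCLUDE_PENDING_IND_SCENARIOS = {
    0: ("base object", ("ACTIVE",)),
    1: ("base object", ("ACTIVE", "PENDING")),
    2: ("XREF", ("ACTIVE", "PENDING")),   # "what-if" calculation
    3: ("XREF", ("ACTIVE",)),             # current BVT based on XREFs
}


def describe_include_pending_ind(value: int) -> str:
    """Return a one-line description of the scenario for a parameter value."""
    if value not in INCLUDE_PENDING_IND_SCENARIOS:
        raise ValueError(f"unsupported INCLUDE_PENDING_IND value: {value}")
    level, states = INCLUDE_PENDING_IND_SCENARIOS[value]
    return f"BVT from {' and '.join(states)} {level} records"
```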

For more information regarding how to manage the state of base object or XREF
records, refer to Chapter 7, “State Management.”

External Match Jobs


External match jobs match “externally managed/prepared” records with an existing
base object, yielding the results based on the current match settings—all without
actually loading the data from the input table into the base object, changing data in the
base object in any way, or changing the match table associated with the base object.
You can use external matching to pretest data, test match rules, and inspect the results
before running the actual Match job. The base object for External Match jobs must be
a fuzzy-match base object, as described in “Exact-match and Fuzzy-match Base
Objects” on page 320.

The External Match job executes as a batch job only—there is no corresponding SIF
request that external applications can invoke. For more information, see “Running
External Match Jobs” on page 724.

Input and Output Tables Used for External Match Jobs

In addition to the base object and its associated match key table, the External Match
job uses the following input and output tables.

External Match Input (EMI) Table

Each base object has an External Match Input (EMI) table for External Match jobs.
This table uses the following naming pattern:

C_BaseObject_EMI

where BaseObject is the name of the base object associated with this External Match job.

When you create a base object, the Schema Manager automatically creates the
associated EMI table, and automatically adds the following system columns:

Column Name | Data Type | Size | Not Null | Description
SOURCE_KEY | VARCHAR | 50 | | Used as part of a three-column composite primary key to uniquely identify this record and to map to records in the C_BaseObject_EMO table.
SOURCE_NAME | VARCHAR | 50 | | Used as part of a three-column composite primary key to uniquely identify this record and to map to records in the C_BaseObject_EMO table.
FILE_NAME | VARCHAR | 50 | | Used as part of a three-column composite primary key to uniquely identify this record and to map to records in the C_BaseObject_EMO table.

When populating the EMI table (see “Populating the Input Table” on page 724), at
least one of these columns must contain data. Note that the column names are
non-restrictive—they can contain any identifying data, as long as the composite
three-column primary key is unique.

In addition, when you configure match rules for a particular column (for example,
Person_Name, Address_Part1, or Exact_Cust_ID), the Schema Manager adds that
column automatically to the C_BaseObject_EMI table.

You can view the columns of an external match table in the Schema Manager by
expanding the External Match Table node.

The records in the EMI table are analogous to the match batch used in Match jobs. As
described in “Flagging the Match Batch” on page 329, the match batch contains the set
of records that are matched against the rest of records in the base object. The
difference is that, for Match jobs, the match batch records reside in the base object,
while for External Match, these records reside in a separate input table.

External Match Output (EMO) Table

Each base object has an External Match Output (EMO) table that contains the output
data for External Match jobs. This table uses the following naming pattern:

C_BaseObject_EMO

where BaseObject is the name of the base object associated with this External Match job.

Before the External Match job is executed, Siperian Hub drops and re-creates this
table.
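
The naming patterns for the input and output tables can be expressed as a small helper. The following Python sketch simply applies the C_BaseObject_EMI and C_BaseObject_EMO patterns described in this section; the function name and the example base object name PARTY are hypothetical.

```python
from typing import Tuple


def external_match_tables(base_object: str) -> Tuple[str, str]:
    """Return the (EMI, EMO) table names for a base object name."""
    return f"C_{base_object}_EMI", f"C_{base_object}_EMO"


# Example with a hypothetical base object named PARTY.
emi_table, emo_table = external_match_tables("PARTY")
```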

An EMO table contains the following columns:

Column Name | Data Type | Size | Not Null | Description
SOURCE_KEY | VARCHAR | 50 | | Used as part of a three-column composite primary key to uniquely identify this record. Maps back to the source record in the C_BaseObject_EMI table.
SOURCE_NAME | VARCHAR | 50 | | Used as part of a three-column composite primary key to uniquely identify this record. Maps back to the source record in the C_BaseObject_EMI table.
FILE_NAME | VARCHAR | 50 | | Used as part of a three-column composite primary key to uniquely identify this record. Maps back to the source record in the C_BaseObject_EMI table.
ROWID_OBJECT_MATCHED | CHAR | 14 | X | ROWID_OBJECT of the record in the base object that matched the record in the EMI table.
ROWID_MATCH_RULE | CHAR | 14 | | Identifies the match rule that was used to determine whether the two rows matched.
AUTOMERGE_IND | NUMBER | 38 | X | Specifies whether a record qualifies for automatic consolidation during the match process. One of the following values: Zero (0): Record does not qualify for automatic consolidation. One (1): Record qualifies for automatic consolidation. The Automerge or Autolink job processes any records with an AUTOMERGE_IND of 1. For more information, see “Automerge Jobs” on page 717.
CREATOR | VARCHAR2 | 50 | | User or process responsible for creating the record.
CREATE_DATE | DATE | | | Date on which the record was created.

Instead of populating the match table for the base object, the External Match job
populates this EMO table with match pairs. Each row in the EMO represents a pair of
matched records—one from the EMI table and one from the base object:

• The primary key (SOURCE_KEY + SOURCE_NAME + FILE_NAME) uniquely identifies
the record in the EMI table.
• ROWID_OBJECT_MATCHED uniquely identifies the record in the base object.
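
To inspect match pairs, the EMO rows can be joined back to the EMI rows on the three-column key. The following Python sketch uses an in-memory SQLite database purely as a stand-in for the Hub Store: the table names C_PARTY_EMI and C_PARTY_EMO, the PERSON_NAME column, and all sample values are hypothetical, and the SQL would need to be adapted to your database platform. If any key column is left empty, a plain equality join such as this one does not match those rows.

```python
import sqlite3

# In-memory stand-in for the EMI and EMO tables of a hypothetical PARTY base object.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE C_PARTY_EMI (
        SOURCE_KEY TEXT, SOURCE_NAME TEXT, FILE_NAME TEXT, PERSON_NAME TEXT
    );
    CREATE TABLE C_PARTY_EMO (
        SOURCE_KEY TEXT, SOURCE_NAME TEXT, FILE_NAME TEXT,
        ROWID_OBJECT_MATCHED TEXT, ROWID_MATCH_RULE TEXT, AUTOMERGE_IND INTEGER
    );
    INSERT INTO C_PARTY_EMI VALUES ('K1', 'CRM', 'batch1.csv', 'Ann Lee');
    INSERT INTO C_PARTY_EMO VALUES ('K1', 'CRM', 'batch1.csv', '100', 'R1', 1);
    """
)

# Each result row pairs an input record with the base object record it matched.
pairs = conn.execute(
    """
    SELECT emi.PERSON_NAME, emo.ROWID_OBJECT_MATCHED, emo.AUTOMERGE_IND
    FROM C_PARTY_EMO emo
    JOIN C_PARTY_EMI emi
      ON emi.SOURCE_KEY = emo.SOURCE_KEY
     AND emi.SOURCE_NAME = emo.SOURCE_NAME
     AND emi.FILE_NAME = emo.FILE_NAME
    """
).fetchall()
```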

Populating the Input Table

Before running an External Match job, the EMI table must be populated with records
to match against the records in the base object. The process of loading data into an
EMI table is external to Siperian Hub—you must use a data loading tool that works
with your database platform (such as SQL*Loader).

Important: When you populate this table, you must supply data for at least one of the
system columns (SOURCE_KEY, SOURCE_NAME, and FILE_NAME) to help link
back from the _EMI table. In addition, the C_BaseObject_EMI table must contain flat
records—like the output of a JOIN, with unique source keys and no foreign keys to
other tables.
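
Before loading, you might check candidate rows against the two requirements above: at least one system column populated, and a unique composite key. The following Python sketch is one possible pre-load check that operates on rows held in memory as dictionaries; the function name and the row representation are hypothetical and independent of any particular data loading tool.

```python
from typing import Dict, List, Optional

KEY_COLUMNS = ("SOURCE_KEY", "SOURCE_NAME", "FILE_NAME")


def check_emi_rows(rows: List[Dict[str, Optional[str]]]) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    seen = set()
    for index, row in enumerate(rows):
        key = tuple(row.get(column) for column in KEY_COLUMNS)
        if not any(key):
            # None and empty strings both count as "no data".
            problems.append(f"row {index}: no system column populated")
        elif key in seen:
            problems.append(f"row {index}: duplicate composite key {key}")
        seen.add(key)
    return problems
```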

Running External Match Jobs

To run an external match job for a base object:


1. Populate the data in the C_BaseObject_EMI table using a data loading process that
is external to Siperian Hub. For requirements, see “Populating the Input Table” on
page 724.
2. In the Hub Console, start either of the following tools:
• Batch Viewer according to the instructions in “Starting the Batch Viewer
Tool” on page 674
• Batch Group according to the instructions in “Starting the Batch Group Tool”
on page 690
3. Select the External Match job for the base object.
4. Select the match rule set that you want to use for external match.
The default match rule set is automatically selected. For more information, see
“Configuring Match Rule Sets” on page 531.

5. Execute the External Match job according to the instructions in “Running Batch
Jobs Manually” on page 677 or “Executing Batch Groups Using the Batch Group
Tool” on page 701.
• The External Match job matches all records in the C_BaseObject_EMI table
against the records in the base object. There is no concept of a consolidation
indicator in the input or output tables.
• The Build Match Group is not run for the results.
6. Inspect the results in the C_BaseObject_EMO table using a data management tool
(external to Siperian Hub).
7. If you want to save the results, make a backup copy of the data before running the
External Match job again.
Note: The C_BaseObject_EMO table is dropped and recreated after every External
Match Job execution.

Generate Match Tokens Jobs


Before you can run the Match job for a given base object, you must first generate the
match tokens. The Generate Match Tokens job generates the match tokens for the base
object according to the current match settings. If you change a match rule, Siperian
Hub might need to regenerate the tokens for the new match criteria, so Siperian Hub
automatically creates a Key Match job, as described in “Batch Jobs That Are Created
When Changes Occur” on page 673. For more information about configuring match
token generation, see “Match Keys and the Tokenization Process” on page 322.

Note: For state-enabled base objects only, the tokenize batch process skips records
that are in the DELETED state. These records can be tokenized through the Tokenize
API, but will be ignored in batch processing. PENDING records can be matched on a
per base object basis by setting the MATCH_PENDING_IND (default off). For more
information regarding how to manage the state of base object or XREF records, refer
to “Configuring State Management for Base Objects” on page 211.

Regenerating All Match Tokens

Before you run a Generate Match Tokens job, you can use the Re-generate All Match
Tokens check box to specify the scope of match token generation.

Do one of the following:


• Check (select) this check box to have the Generate Match Tokens job truncate the
match key table and then tokenize the entire base object.
• Uncheck (clear) this check box to have the Generate Match Tokens job generate
only tokens that are missing from the match key table based on the changed match
criteria.
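
The effect of the check box can be modeled as follows. This Python sketch is a deliberate simplification: it represents the match key table as a dictionary keyed by record ID and treats "missing" as "no entry for that record", which ignores how Siperian Hub detects tokens affected by changed match criteria. The function and parameter names are hypothetical.

```python
from typing import Callable, Dict, Iterable


def generate_match_tokens(
    base_object_rowids: Iterable[str],
    match_key_table: Dict[str, str],
    tokenize: Callable[[str], str],
    regenerate_all: bool,
) -> int:
    """Fill match_key_table in place and return the number of tokens generated."""
    if regenerate_all:
        # Check box selected: start from an empty match key table.
        match_key_table.clear()
    generated = 0
    for rowid in base_object_rowids:
        if rowid not in match_key_table:
            match_key_table[rowid] = tokenize(rowid)
            generated += 1
    return generated
```

With the check box cleared, existing entries are left untouched, so the job does less work on a table that is already mostly tokenized.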

After Generating Match Tokens

After the match tokens are generated, you can run the Match job for a base object.

Hub Delete Jobs


Hub Delete jobs remove data from the Hub based on base object / XREFs input to
the cmxdm.hub_delete_batch stored procedure. You can use the Hub Delete job to
remove an entire source system from the Hub.

Note: Hub Delete jobs execute as a batch-only stored procedure. You cannot call a
Hub Delete job from the Batch Viewer or Batch Group tools, and there is no
corresponding SIF request that external applications can invoke. For more
information, see “Hub Delete Jobs” on page 769.

Key Match Jobs


Key Match jobs match records from two or more sources when these sources use the
same primary key. Key Match jobs compare new records to each other and to existing
records, and then identify potential matches based on the comparison of source record
keys as defined by the primary key match rules. A Key Match job is automatically
created for a base object after a primary key match rule has been created or changed in
the Match / Merge Setup configuration for this base object. For more information, see
“Configuring Primary Key Match Rules” on page 578.

Load Jobs
Load jobs move data from a staging table to the corresponding target table (base object
or dependent object) in the Hub Store. Load jobs also calculate trust values for base
objects with defined trusted columns, and they apply validation rules (if defined) to
determine the final trust values. For more information about loading data, including
trust, validation, and delta detection, see “Configuration Tasks for Loading Data” on
page 454.

For state-enabled base objects, the load batch process can load records in any state.
The state is specified as an input column on the staging table. The input state can be
specified in the mapping via a landing table column, or it can be derived. If an input
state is not specified in the mapping, then the state is assumed to be ACTIVE. For
more information regarding how to manage the state of base object or XREF records,
refer to “Configuring State Management for Base Objects” on page 211.

The following table describes how input states affect the states of existing XREFs.

Incoming XREF State | Existing XREF State: ACTIVE | Existing XREF State: PENDING | Existing XREF State: DELETED | No XREF (Load by rowid) | No Base Object
ACTIVE | Update | Update + Promote | Update + Restore | Insert | Insert
PENDING | Pending Update | Pending Update | Pending Update + Restore | Pending Update | Pending Insert
DELETED | Soft Delete | Hard Delete | Hard Delete | Error | Error
Undefined | Treat as Active | Treat as Pending | Treat as Deleted | Treat as Active | Treat as Active

Note: Records are rejected if the HUB_STATE_IND value is not valid.
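
The matrix above can be read as a two-key lookup. The following Python sketch encodes it as such for illustration: the outcome labels are taken from the table, while the constant and function names, and the use of None for an undefined incoming state, are hypothetical choices made for this sketch.

```python
from typing import Optional

# Column order of the matrix: existing XREF state (or its absence).
EXISTING_STATES = ("ACTIVE", "PENDING", "DELETED", "NO_XREF", "NO_BASE_OBJECT")

# One row per incoming XREF state; None stands for an undefined incoming state.
LOAD_STATE_MATRIX = {
    "ACTIVE": ("Update", "Update + Promote", "Update + Restore", "Insert", "Insert"),
    "PENDING": (
        "Pending Update",
        "Pending Update",
        "Pending Update + Restore",
        "Pending Update",
        "Pending Insert",
    ),
    "DELETED": ("Soft Delete", "Hard Delete", "Hard Delete", "Error", "Error"),
    None: (
        "Treat as Active",
        "Treat as Pending",
        "Treat as Deleted",
        "Treat as Active",
        "Treat as Active",
    ),
}


def load_action(incoming: Optional[str], existing: str) -> str:
    """Look up the outcome for an incoming state and an existing state."""
    if incoming not in LOAD_STATE_MATRIX:
        # Mirrors the note above: records with an invalid state are rejected.
        raise ValueError(f"invalid incoming state: {incoming!r}")
    if existing not in EXISTING_STATES:
        raise ValueError(f"unknown existing state: {existing!r}")
    return LOAD_STATE_MATRIX[incoming][EXISTING_STATES.index(existing)]
```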

The following table provides a matrix of how Siperian Hub processes records (for
state-enabled base objects) during Load (and Put) for certain operations based on the
record state:

Operation | Incoming Record State | Existing Record State | Notes
Update the XREF record when: | ACTIVE | ACTIVE |
Update the XREF record when: | DELETED | ACTIVE |
Update the XREF record when: | PENDING | PENDING |
Update the XREF record when: | ACTIVE | PENDING |
Update the XREF record when: | DELETED | DELETED |
Update the XREF record when: | PENDING | DELETED |
Update the XREF record when: | DELETED | | When a base object rowid delete record comes in, Siperian Hub updates the base object and all XREF records (regardless of ROWID_SYSTEM) to DELETED state.
Insert the XREF record when: | PENDING | ACTIVE | The second record for the pair is created.
Insert the XREF record when: | ACTIVE | No Record |
Insert the XREF record when: | PENDING | No Record |
Delete the XREF record when: | ACTIVE | PENDING (for paired records) | Delete the ACTIVE record in the pair, the PENDING record is then updated. Paired records are two records with the same PKEY_SRC_OBJECT and ROWID_SYSTEM.
Delete the XREF record when: | DELETED | PENDING |
Siperian Hub displays an error when: | PENDING | ACTIVE (for paired records) | Paired records are two records with the same PKEY_SRC_OBJECT and ROWID_SYSTEM.

Additional notes:
• If the incoming state is not specified (for a Load update), then the incoming state
is assumed to be the same as the current state. For example, if the incoming state is
null and the existing state of the XREF or base object to update is PENDING,
then the incoming state is assumed to be PENDING instead of null.
• Siperian Hub deletes XREF records using the Hub Delete batch job. The Hub
Delete batch job removes specified data—up to and including an entire source
system—from Siperian Hub based on your base object/XREF input to the
cmxdm.hub_delete_batch stored procedure. For more information, see “Hub
Delete Jobs” on page 769.

For more information regarding how to manage the state of base object or XREF
records, refer to “Configuring State Management for Base Objects” on page 211.
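
As an illustration only, the first matrix in this section can be restated as a lookup table. The following Python sketch is not part of Siperian Hub: the names LOAD_ACTIONS, NO_XREF, NO_BASE_OBJECT, and resolve_load_action are hypothetical. It assumes that an undefined incoming state is treated as the existing XREF state, or as ACTIVE when no XREF or no base object exists, as the Undefined row of the table indicates.

```python
# Hypothetical restatement of the Load state matrix; not a Siperian Hub API.
NO_XREF = "NO_XREF"            # base object exists, no XREF (load by rowid)
NO_BASE_OBJECT = "NO_BO"       # no base object record exists

# LOAD_ACTIONS[incoming_state][existing_state] -> action taken by the Load job
LOAD_ACTIONS = {
    "ACTIVE": {
        "ACTIVE": "Update",
        "PENDING": "Update + Promote",
        "DELETED": "Update + Restore",
        NO_XREF: "Insert",
        NO_BASE_OBJECT: "Insert",
    },
    "PENDING": {
        "ACTIVE": "Pending Update",
        "PENDING": "Pending Update",
        "DELETED": "Pending Update + Restore",
        NO_XREF: "Pending Update",
        NO_BASE_OBJECT: "Pending Insert",
    },
    "DELETED": {
        "ACTIVE": "Soft Delete",
        "PENDING": "Hard Delete",
        "DELETED": "Hard Delete",
        NO_XREF: "Error",
        NO_BASE_OBJECT: "Error",
    },
}


def resolve_load_action(incoming, existing):
    """Look up the action for an incoming state (None = undefined)."""
    if incoming is None:
        # Undefined row: treat as the existing state, or as ACTIVE when
        # there is no existing XREF or base object.
        incoming = existing if existing in LOAD_ACTIONS else "ACTIVE"
    if incoming not in LOAD_ACTIONS:
        # Mirrors the note that records with an invalid state are rejected.
        raise ValueError("invalid incoming state: %r" % (incoming,))
    if existing not in LOAD_ACTIONS[incoming]:
        raise ValueError("invalid existing state: %r" % (existing,))
    return LOAD_ACTIONS[incoming][existing]
```

For example, resolve_load_action("ACTIVE", "PENDING") looks up the cell where an ACTIVE record arrives for a PENDING XREF.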

Rules for Running Load Jobs

The following rules apply to Load jobs:


• Run a Load job only if the Stage job that loads the staging table used by the Load
job has completed successfully.
• Run the Load job for a parent table before you run the Load job for a child table.

• Run the Load job for a parent base object before you run the Load job for a
dependent object.
• If a lookup on the child object is not defined (the lookup table and column were
not populated), in order to successfully load data, you must repeat the Stage job on
the child object prior to running the Load job.
• Only one Load job at a time can be run for the same base object or dependent
object. Multiple Load jobs for the same base object or dependent object cannot be
run concurrently.
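
The ordering rules above can be sketched in a scheduling script. The following Python example is only a sketch under stated assumptions: plan_load_order is a hypothetical helper, the table name C_CUSTOMER_ADDRESS is invented for illustration, and the parent/child relationships are supplied by the caller rather than read from Siperian Hub. It refuses to plan when a Stage job has not completed and returns an order in which every parent precedes its children, to be run one Load job at a time.

```python
from graphlib import TopologicalSorter


def plan_load_order(parents_of, stage_succeeded):
    """Return table names so that each parent is loaded before its children.

    parents_of      -- dict mapping a table name to the set of its parent tables
    stage_succeeded -- set of table names whose Stage job completed successfully
    """
    # Collect every table mentioned, whether as a child or as a parent.
    tables = set(parents_of)
    for parents in parents_of.values():
        tables.update(parents)

    # Rule: run a Load job only after its Stage job has completed successfully.
    not_staged = sorted(tables - set(stage_succeeded))
    if not_staged:
        raise RuntimeError("Stage job has not completed for: " + ", ".join(not_staged))

    # TopologicalSorter yields predecessors (parents) before dependents.
    return list(TopologicalSorter(parents_of).static_order())
```

Running the returned list sequentially also respects the rule that Load jobs for the same base object must not run concurrently.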

Forcing Updates in Load Jobs

Before you run a Load job, you can use the Force Update check box to configure how
the Load job loads data from the staging table to the target base object or dependent
object. By default, Siperian Hub checks the Last Update Date for each record in the
staging table to ensure that it has not already loaded the record. To override this
behavior, check (select) the Force Update check box, which ignores the Last Update
Date, forces a refresh, and loads each record regardless of whether it might have
already been loaded from the staging table. Use this approach prudently, however.
Depending on the volume of data to load, forcing updates can carry a price in
processing time.
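
The effect of the Force Update check box can be pictured with a small sketch. The Python function below is hypothetical and is not how Siperian Hub is implemented: it assumes that the default check compares each staging record's Last Update Date with the date of the last load already applied, and that forcing an update bypasses that comparison.

```python
from datetime import datetime


def should_load(record_last_update, last_loaded, force_update=False):
    """Decide whether a staging record is loaded (illustrative assumption only)."""
    if force_update:
        # Force Update ignores the Last Update Date and always loads.
        return True
    if last_loaded is None:
        # Nothing has been loaded yet for this record.
        return True
    # Default behavior assumed here: load only records changed since the last load.
    return record_last_update > last_loaded
```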

Generating Match Tokens During Load Jobs

When configuring the advanced properties of a base object in the Schema tool, you can
check (select) the Generate Match Tokens on Load check box to generate match
tokens during Load jobs, after the records have been loaded into the base object. By
default, this check box is unchecked (cleared), and match tokens are generated during
the Match process instead. For more information, see “Editing Base Object
Properties” on page 108 and “Run-time Execution Flow of the Load Process” on page
304.

Load Job Metrics

After running a Load job, the Batch Viewer displays the following metrics (if
applicable) in the job execution log:

The following table describes these metrics.

Metric                    Description
Total records             Number of records processed by the Load job.
Inserted                  Number of records inserted by the Load job into the target object.
Updated                   Number of records updated by the Load job in the target object.
No action                 Number of records on which no action was taken (the records
                          already existed in the base object).
Updated XREF              Number of records that updated the cross-reference table for this
                          base object. If you are loading a record during an incremental load,
                          that record has already been consolidated (exists only in the XREF
                          and not in the base object).
Records tokenized         Number of records tokenized by the Load job. Applies only if the
                          Generate Match Tokens on Load check box is selected in the Schema
                          tool. For more information, see “Generating Match Tokens During
                          Load Jobs” on page 730.
Unmerged source records   Number of source records that were not merged by the Load job.
Missing Lookup /          Number of source records that were missing lookup information or
Invalid rowid_object      had invalid rowid_object records.
records

Manual Link Jobs


For link-style base objects only, after the Match job has been run, data stewards can use
the Merge Manager to process records that have been queued by a Match job for
manual linking.

Manual Merge Jobs


After the Match job has been run, data stewards can use the Merge Manager to process
records that have been queued by a Match job for manual merge. Manual Merge jobs
are run in the Merge Manager—not in the Batch Viewer. The Batch Viewer only allows
you to inspect job execution logs for Manual Merge jobs that were run in the Merge
Manager.

Maximum Matches for Manual Consolidation

In the Schema Manager, you can configure the maximum number of matches ready for
manual consolidation to prevent data stewards from being overwhelmed with
thousands of manual merges for processing. Once this limit is reached, the Match jobs
and the Auto Match and Merge jobs will not run until the number of matches has been
reduced. For more information, see “Maximum Matches for Manual Consolidation” on
page 490.

Executing a Manual Merge Job in the Merge Manager

When you start a Manual Merge job, the Merge Manager displays a dialog with a
progress indicator. A manual merge can take some time to complete. If problems occur
during processing, an error message is displayed on completion. This error also shows
up in the job execution log for the Manual Merge job in the Batch Viewer.

In the Merge Manager, the process dialog includes a button labeled Mark process as
incomplete that updates the status of the Manual Merge job but does not abort the
Manual Merge job. If you click this button, the merge process continues in the
background. At this point, there will be an entry in the Batch Viewer for this process.
When the process completes, the success or failure is reported. For more information
about the Merge Manager, see the Siperian Hub Data Steward Guide.

Manual Unlink Jobs


For link-style base objects only, after a Manual Link job has been run, data stewards
can use the Data Manager to manually unlink records that have been manually linked.

Manual Unmerge Jobs


For merge-style base objects only, after a Manual Merge job has been run, data
stewards can use the Data Manager to manually unmerge records that have been
manually merged. Manual Unmerge jobs are run in the Data Manager—not in the
Batch Viewer. The Batch Viewer only allows you to inspect job execution logs for
Manual Unmerge jobs that were run in the Data Manager. For more information about
the Data Manager, see the Siperian Hub Data Steward Guide.

Executing a Manual Unmerge Job in the Data Manager

When you start a Manual Unmerge job, the Data Manager displays a dialog with a
progress indicator. A manual unmerge can take some time to complete, especially when
a record in question is the product of many constituent records. If problems occur
during processing, an error message is displayed on completion. This error also shows
up in the job execution log for the Manual Unmerge in the Batch Viewer.

In the Data Manager, the process dialog includes a button labeled Mark process as
incomplete that updates the status of the Manual Unmerge job but does not abort the
Manual Unmerge job. If you click this button, the unmerge process continues in the
background. At this point, there will be an entry in the Batch Viewer for this process.
When the process completes, the success or failure is reported.

Match Jobs
A Match job generates search keys for a base object, searches through the data for match
candidates (records that are possible matches), applies the match rules to the match
candidates, generates the matches, and then queues the matches for either automatic or
manual consolidation. For an introduction, see “Match Process” on page 317.

When you create a new base object in an ORS, Siperian Hub automatically creates its
Match job. Each Match job compares new or updated records in a base object with all
records in the base object. For a detailed description, see “Run-Time Execution Flow
of the Match Process” on page 329.

After running a Match job, the matched rows are queued for automatic and manual
consolidation. Siperian Hub creates jobs that automatically consolidate the appropriate
records (automerge or autolink). If a record is flagged for manual consolidation
(manual merge or manual link), data stewards must use the Merge Manager to perform
the manual consolidation. For more information about manual consolidation, see the
Siperian Hub Data Steward Guide. For more information about consolidation, see “About
the Consolidate Process” on page 335.

You configure Match jobs in the Match / Merge Setup node in the Schema Manager.
For more information, see “Configuration Tasks for the Match Process” on page 484.

Important: Do not run a Match job on a base object that is used to define
relationships between records in inter-table or intra-table match paths. Doing so will
change the relationship data, resulting in the loss of the associations between records.
For more information, see “Relationship Base Objects” on page 498.

Match Tables

When a Siperian Hub Match job runs for a base object, it populates its match table.
Match tables are usually named as Base_Object_MTCH. For more information, see
“Populating the Match Table with Match Pairs” on page 330.

Match Jobs and State-enabled Base Objects

The following table describes the details of the match batch process behavior given the
incoming states for state-enabled base objects:

Source Base   Target Base
Object State  Object State  Operation Result
ACTIVE        ACTIVE        The records are analyzed for matching.
PENDING       ACTIVE        Whether PENDING records are ignored in Batch Match is a
                            table-level parameter. If set, then batch match will include
                            PENDING records for the specified Base Object. But the
                            PENDING records can only be the source record in a match.
DELETED       Any state     DELETED records are ignored in Batch Match.
ANY           PENDING       PENDING records cannot be the target of a match.

Note: For Build Match Group (BMG), do not build groups with PENDING records.
PENDING records are left as individual matches. PENDING matches will have
automerge_ind=2. For more information regarding how to manage the state of base
object or XREF records, refer to “Configuring State Management for Base Objects”
on page 211.
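
The table above can be read as an eligibility rule for a source/target pair. The following Python sketch is illustrative only: can_match and its include_pending argument are hypothetical names (include_pending stands in for the table-level parameter mentioned above, whose actual name is not given here). Combinations that the table does not list, such as a DELETED target, are treated as ineligible, which is an assumption.

```python
def can_match(source_state, target_state, include_pending=False):
    """Return True if the pair would be analyzed by batch match (sketch)."""
    if source_state == "DELETED":
        # DELETED records are ignored in Batch Match.
        return False
    if target_state == "PENDING":
        # PENDING records cannot be the target of a match.
        return False
    if target_state != "ACTIVE":
        # Not covered by the table; assumed ineligible.
        return False
    if source_state == "PENDING":
        # PENDING records may only be the source, and only if the
        # table-level parameter includes them.
        return include_pending
    return source_state == "ACTIVE"
```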

Auto Match and Merge Jobs

For merge-style base objects only, you can run the Auto Match and Merge job for a
base object. Auto Match and Merge batch jobs execute a continual cycle of a Match
job, followed by an Automerge job, until there are no more records to match, or until
the maximum number of records for manual consolidation limit is reached (see
“Maximum Matches for Manual Consolidation” on page 490). For more information,
see “Auto Match and Merge Jobs” on page 716.
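
The cycle described above can be sketched as a loop with two stopping conditions. The Python function below is a hypothetical illustration, not the Siperian Hub implementation: the callables run_match, run_automerge, unmatched_count, and manual_queue_size are placeholders that the caller would supply.

```python
def auto_match_and_merge(run_match, run_automerge, unmatched_count,
                         manual_queue_size, manual_limit):
    """Repeat Match then Automerge until nothing is left to match or the
    manual consolidation limit is reached. Returns the number of cycles."""
    cycles = 0
    while unmatched_count() > 0 and manual_queue_size() < manual_limit:
        run_match()       # one Match job batch cycle
        run_automerge()   # merge whatever the Match job queued for automerge
        cycles += 1
    return cycles
```

Passing the counts as callables lets the loop re-read them after every cycle, since both values change as the jobs run.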

Match Stored Procedure

When executing the MATCH job stored procedure:


• CMXMA.MATCH runs only one batch.
• The Match job is dependent on the successful completion of all tokenization jobs
for the base object and any child tables used in intertable match. For more
information about the tokenization job, see “Generate Match Tokens Jobs” on
page 725. For more information about tokens for match, see “About the
Consolidate Process” on page 335.
• The Generate Match Tokens job need not be scheduled. Siperian Hub automatically
runs it.

Setting Limits for Batch Jobs

The Match job for a base object does not attempt to match every record in the base
object against every other record in the base object. Instead, you specify (in the Schema
tool):
• how many records the job should match each time it runs. For more information,
see “Number of Rows per Match Job Batch Cycle” on page 491.
• how many matches are allowed for manual consolidation.
This feature helps to prevent data stewards from being overwhelmed with manual
merges for processing. Once this limit is reached, the Match job will not run until
the number of matches ready for manual consolidation has been reduced. For
more information, see “Maximum Matches for Manual Consolidation” on page
490.

Selecting a Match Rule Set

For Match jobs, before executing the job, you can select the match rule set that you
want to use for evaluating matches.

The default match rule set for this base object is automatically selected. To choose any
other match rule set, click the drop-down list and select any other match rule set that
has been defined for this base object. For more information, see “Configuring Match
Rule Sets” on page 531.

Match Job Metrics

After running a Match job, the Batch Viewer displays the following metrics (if
applicable) in the job execution log:

The following table describes these metrics.

Metric                    Description
Matched records           Number of records that were matched by the Match job.
Records tokenized         Number of records that were tokenized by the Match job.
Queued for automerge      Number of records that were queued for automerge by the Match
                          job. Use the Automerge job to process these records. For more
                          information, see “Automerge Jobs” on page 717.
Queued for manual merge   Number of records that were queued for manual merge by the Match
                          job. Use the Merge Manager in the Hub Console to process these
                          records. For more information, see the Siperian Hub Data Steward
                          Guide.

Match Analyze Jobs


Match Analyze jobs perform a search to gather metrics but do not conduct any actual
matching. If areas of data with the potential for huge match requirements (hot spots)
are discovered, Siperian Hub moves these records to an on-hold status to prevent
overmatching. Records that are on hold have a consolidation indicator of 9, which
allows a data steward to review the data manually in the Data Manager tool before
proceeding with the match and consolidation. Match Analyze jobs are typically used to
tune match rules or simply to determine whether data for a base object is overly
“matchy” or has large intersections of data (“hot spots”) that will result in
overmatching.

Dependencies for Match Analyze Jobs

Each Match Analyze job is dependent on new / updated records in the base object that
have been tokenized and are thus queued for matching. For base objects that have
intertable match enabled, the Match Analyze job is also dependent on the successful
completion of the data tokenization jobs for all child tables, which in turn is dependent
on successful Load jobs for the child tables.

Limiting the Number of On-Hold Records

You can limit the number of records that the Match Analyze job moves to the on-hold
status. By default, no limit is set. To configure a limit, edit the cmxcleanse.properties
file and add the following setting:
cmx.server.match.threshold_to_move_range_to_hold = n

where n is the maximum number of records that the Match Analyze job can move to
the on-hold status. For more information about the cmxcleanse.properties file, see the
Siperian Hub Installation Guide for your platform.
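
As a hedged illustration of the setting above, the following Python sketch reads the limit from the text of a properties file. The function on_hold_limit is hypothetical, the value 500000 used in the usage note is an arbitrary example rather than a recommendation, and the parsing covers only simple name = value lines.

```python
def on_hold_limit(properties_text):
    """Return the configured on-hold limit, or None when no limit is set."""
    key = "cmx.server.match.threshold_to_move_range_to_hold"
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return int(value.strip())
    # Default: no limit is set.
    return None
```

For instance, a file containing the line cmx.server.match.threshold_to_move_range_to_hold = 500000 yields a limit of 500000, while a file without the setting yields None.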

Match Analyze Job Metrics

After running a Match Analyze job, the Batch Viewer displays the following metrics (if
applicable) in the job execution log.

Metrics in Execution Log

Metric                              Description
Records moved to Hold Status        Number of records moved to Hold.
Records analyzed (to be matched)    Number of records analyzed for match.
Match comparisons required          Number of actual matches that would be required to
                                    process this base object.

Statistics

Statistic                           Description
Top 10 range count                  Top ten number of records in a given search range.
Top 10 range comparison count       Top ten number of match comparisons that will need to
                                    be performed for a given search range.
Total records moved to hold         Count of the records moved to hold.
Total matches moved to hold         Total number of matches required by the records moved
                                    to hold.
Total ranges processed              Number of ranges required to process all the matches
                                    in the base object.
Total candidates                    Total number of match candidates required to process
                                    all matches for this base object.
Time for analyze                    Amount of time required to run the analysis.

Match for Duplicate Data Jobs


Match for Duplicate Data jobs search for exact duplicates to consider them matched.
The maximum number of exact duplicates is based on the base object columns defined
in the Duplicate Match Threshold property in the Schema Manager for each base
object. For more information, see “Duplicate Match Threshold” on page 103. For
more information, see also “Matching for Duplicate Data” on page 326.

Note: The Match for Duplicate Data job does not display in the Batch Viewer when
the duplicate match threshold is set to 1 and non-equal matches are enabled on the
base object.

To match for duplicate data:


1. Execute the Match for Duplicate Data job right after the Load job is finished.

2. Once the Match for Duplicate Data job is complete, run the Automerge job to
process the duplicates found by the Match for Duplicate Data job.
3. Once the Automerge job is complete, run the regular match and merge process
(Match job and then Automerge job, or the Auto Match and Merge job).
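
The sequence above can be captured in a job execution script. The Python sketch below is illustrative only: run_sequence and DUPLICATE_MATCH_SEQUENCE are hypothetical names, and run_job stands for whatever mechanism your scheduler uses to execute a batch job and report success. The sketch uses the Match job followed by the Automerge job for step 3.

```python
# Job order taken from the numbered steps above, starting with the Load job.
DUPLICATE_MATCH_SEQUENCE = [
    "Load",
    "Match for Duplicate Data",
    "Automerge",
    "Match",
    "Automerge",
]


def run_sequence(run_job, sequence=DUPLICATE_MATCH_SEQUENCE):
    """Run jobs in order; stop at the first job that does not succeed.

    run_job -- caller-supplied callable taking a job name, returning True on success
    Returns the list of jobs that completed.
    """
    completed = []
    for name in sequence:
        if not run_job(name):
            break  # later jobs depend on this one having completed
        completed.append(name)
    return completed
```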

Migrate Link Style To Merge Style Jobs


For link-style base objects only, the Migrate Link Style To Merge Style job migrates
link-style base objects to merge-style base objects.

Multi Merge Jobs


A Multi Merge job allows the merge of multiple records in a single job—essentially
incorporating the entire set of records to be merged as one batch. This batch job is
initiated only by external applications that invoke the SIF MultiMergeRequest
request. For more information, see the Siperian Services Integration Framework Guide.

Promote Jobs
For state-enabled objects, the Promote job reads the PROMOTE_IND column from
an XREF table and changes the system state to ACTIVE for all rows where the
column’s value is 1. Siperian Hub resets PROMOTE_IND after the Promote job has
run.

Note: The PROMOTE_IND column on a record is not changed to 0 during the
promote batch process if the record is not promoted.

Here are the behavior details for the Promote batch job:

XREF State Before Promote:         PENDING
Base Object State Before Promote:  ACTIVE
Hub Action on XREF:                Promote
Hub Action on BO:                  Update
Refresh BVT?:                      Yes
Resulting BO State:                ACTIVE
Operation Result:                  Siperian Hub promotes the pending XREF and recalculates
                                   the BVT to include the promoted XREF.

XREF State Before Promote:         PENDING
Base Object State Before Promote:  PENDING
Hub Action on XREF:                Promote
Hub Action on BO:                  Promote
Refresh BVT?:                      Yes
Resulting BO State:                ACTIVE
Operation Result:                  Siperian Hub promotes the pending XREF and base object.
                                   The BVT is then calculated based on the promoted XREF.

XREF State Before Promote:         DELETED
Base Object State Before Promote:  This operation behaves the same way regardless of the
                                   state of the base object record.
Hub Action on XREF:                None
Hub Action on BO:                  None
Refresh BVT?:                      No
Resulting BO State:                The state of the resulting base object record is
                                   unchanged by this operation.
Operation Result:                  Siperian Hub ignores DELETED records in Batch Promote.
                                   This scenario can only happen if a record that had been
                                   flagged for promotion is deleted prior to running the
                                   Promote batch process.

XREF State Before Promote:         ACTIVE
Base Object State Before Promote:  This operation behaves the same way regardless of the
                                   state of the base object record.
Hub Action on XREF:                None
Hub Action on BO:                  None
Refresh BVT?:                      No
Resulting BO State:                The state of the resulting base object record is
                                   unchanged by this operation.
Operation Result:                  Siperian Hub ignores ACTIVE records in Batch Promote.
                                   This scenario can only happen if a record that had been
                                   flagged for promotion is made ACTIVE prior to running
                                   the Promote batch process.
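
The Promote behavior described above can be summarized in a few lines of logic. The following Python sketch is a hypothetical restatement of the table, not Siperian Hub code: promote_xref is an invented name, and the combination of a PENDING XREF with a base object state other than ACTIVE or PENDING is not covered by the table, so the sketch raises an error for it.

```python
def promote_xref(xref_state, bo_state, promote_ind):
    """Return (xref_action, bo_action, refresh_bvt, resulting_bo_state)."""
    unchanged = ("None", "None", False, bo_state)
    if promote_ind != 1:
        # Only rows flagged with PROMOTE_IND = 1 are considered.
        return unchanged
    if xref_state != "PENDING":
        # DELETED and ACTIVE XREFs are ignored, whatever the base object state.
        return unchanged
    if bo_state == "ACTIVE":
        # Promote the XREF, update the base object, recalculate the BVT.
        return ("Promote", "Update", True, "ACTIVE")
    if bo_state == "PENDING":
        # Promote both the XREF and the base object.
        return ("Promote", "Promote", True, "ACTIVE")
    raise ValueError("combination not covered by the Promote behavior table")
```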

You can run the Promote job using the following methods:
• Using the Hub Console; for more information, see “Running Promote Jobs Using
the Hub Console”.
• Using the CMXSM.AUTO_PROMOTE stored procedure; for more information,
see “Promote Jobs” on page 790.
• Using the Services Integration Framework (SIF) API (and the associated
SiperianClient Javadoc); for more information, see the Siperian Services Integration
Framework Guide.

Running Promote Jobs Using the Hub Console

To run a Promote job:

1. In the Hub Console, start either of the following tools:
• Batch Viewer according to the instructions in “Starting the Batch Viewer
Tool” on page 674
• Batch Group according to the instructions in “Starting the Batch Group Tool”
on page 690
2. Select the Promote job for the desired base object.
3. Execute the Promote job according to the instructions in “Running Batch Jobs
Manually” on page 677 or “Executing Batch Groups Using the Batch Group Tool”
on page 701.
4. Display the results of the Promote job according to the instructions in “Viewing
Job Execution Logs” on page 682.

Siperian Hub displays the results of the Promote job:

Promote Job Metrics

After running a Promote job, the Batch Viewer displays the following metrics (if
applicable) in the job execution log.

Once the Promote job has run, you can view these statistics on the job summary page
in the Batch Viewer.

Recalculate BO Jobs
There are two versions of Recalculate BO:
• Using the ROWID_OBJECT_TABLE Parameter—Recalculates all base
objects identified by ROWID_OBJECT column in the table/inline view (note that
brackets are required around inline view).
• Without the ROWID_OBJECT_TABLE Parameter—Recalculates all records
in the base object, in batches of MATCH_BATCH_SIZE or 1/4 the number of
the records in the table, whichever is less.

For more information, see “Recalculate BO Jobs” on page 791.
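
The batch size rule for the second version can be written as a one-line calculation. The Python sketch below is illustrative: recalculate_batch_size is a hypothetical name, and rounding one quarter of the record count down to a whole number, with a minimum of one record, is an assumption that the text above does not state.

```python
def recalculate_batch_size(match_batch_size, record_count):
    """Smaller of MATCH_BATCH_SIZE and one quarter of the table's records."""
    quarter = record_count // 4  # assumed to round down
    return max(1, min(match_batch_size, quarter))
```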

Recalculate BVT Jobs


The Recalculate BVT job recalculates the BVT for the specified ROWID_OBJECT.

For more information, see “Recalculate BVT Jobs” on page 792.

Reset Links Jobs


For link-style base objects only, the Reset Links job allows you to remove links for an
existing base object.

Reset Match Table Jobs


The Reset Match Table job is created automatically when both of the following
conditions exist after you run a Match job: records have been updated to
consolidation_ind = 2, and you then change your match rules, as described in
“Configuring Match Column Rules for Match Rule Sets” on page 542.

If you change your match rules after matching, you are prompted to reset your
matches. When you reset matches, everything in the match table is deleted. In addition,
the Reset Match Table job then sets consolidation_ind to 4 wherever it is 2. To learn
more, see “About the Consolidate Process” on page 335.
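
The effect of resetting matches can be pictured on in-memory data. The following Python sketch is an illustration under simplifying assumptions: reset_matches is a hypothetical name, base object rows are modeled as dictionaries with a consolidation_ind key, and the match table is modeled as a list.

```python
def reset_matches(base_object_rows, match_table):
    """Empty the match table and move rows with consolidation_ind 2 to 4."""
    match_table.clear()  # everything in the match table is deleted
    changed = 0
    for row in base_object_rows:
        if row["consolidation_ind"] == 2:
            row["consolidation_ind"] = 4
            changed += 1
    return changed
```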

When you save changes to the schema match columns, the following message box is
displayed.

Click Yes to reset the existing matches and create a Reset Match Table job in the Batch
Viewer.

Note: If you do not reset the existing matches, your next Match job will take longer to
execute because Siperian Hub will need to regenerate the match tokens before running
the Match job.

Note: This job cannot be run from the Batch Viewer.

Revalidate Jobs
Revalidate jobs execute the validation logic/rules for records that have been modified
since the initial validation during the Load process. You can run a Revalidate job when
records change after the validation step of the initial Load process. If no records have
changed, no records are updated. If some records have changed and are caught by the
existing validation rules, the metrics show the results.

Note: Revalidate jobs can only be run if validation is enabled on a column after an
initial load and prior to merge on base objects that have validation rules set up.

Revalidate is executed manually using the Batch Viewer for base objects. For more
information, see “Running Batch Jobs Using the Batch Viewer Tool” on page 674.

Stage Jobs
Stage jobs move data from a landing table to a staging table, performing any cleansing
that has been configured in the Siperian Hub mapping between the tables (see
“Mapping Columns Between Landing and Staging Tables” on page 380). Stage jobs
have parallel cleanse jobs that you can run (see “About Data Cleansing in Siperian
Hub” on page 406). The stage status indicates which Cleanse Match Server is hit
during a stage. For more information about staging data, see “Configuration Tasks for
the Stage Process” on page 364.

For state-enabled base objects, records are rejected if the HUB_STATE_IND value is
not valid. For more information regarding how to manage the state of base object or
XREF records, refer to “About State Management in Siperian Hub” on page 206.

Note: If the Stage job is grayed out, then the mapping has become invalid due to
changes in the staging table, in a column mapping, or in a cleanse function. Open the
specific mapping using the Mappings tool, verify it, and then save it. For more
information, see “Mapping Columns Between Landing and Staging Tables” on page
380.

Stage Job Stored Procedure

When executing the Stage job stored procedure:


• Run the Stage job only if the ETL process responsible for loading the landing table
used by the Stage job completes successfully.
• Make sure that there are no dependencies between Stage jobs.
• You can run multiple Stage jobs simultaneously if there are multiple Cleanse Match
Servers set up to run the jobs.

For more information, see “Stage Jobs” on page 795.

Stage Job Metrics

After running a Stage job, the Batch Viewer displays the following metrics in the job
execution log:

The following table describes these metrics.

Metric          Description
Total records   Number of records processed by the Stage job.
Inserted        Number of records inserted by the Stage job into the target object.
Rejected        Number of records rejected by the Stage job. For more information,
                see “Viewing Rejected Records” on page 685.

Synchronize Jobs
You must run the Synchronize job after any changes are made to the schema trust
settings. The Synchronize job is created when any changes are made to the schema
trust settings, as described in “Batch Jobs That Are Created When Changes Occur” on
page 673. For more information, see “Configuring Trust for Source Systems” on page
455.

Reminder Prompt for Running Synchronize Jobs

When you save changes to schema column trust settings in the Systems and Trust tool,
the following message box is displayed.

Clicking OK does not synchronize the column trust settings. This is just an
information box that tells you to run the Synchronize job.

Running Synchronize Jobs

To run the Synchronize job, navigate to the Batch Viewer, find the correct Synchronize
job for the base object, and run it. Siperian Hub updates the metadata for the base
objects that have trust enabled after initial load has occurred.

Considerations for Running Synchronize Jobs


• If you do not run the Synchronize job, you will not be able to run a Load job.
• This job can be run from the Batch Viewer only when a trust update is required
for the base object. For more information, see “Running Synchronize Batch Jobs
After Changes to Trust Settings” on page 467.
• A Synchronize job fails if a large number of trust-enabled columns are defined.
The exact number of columns that cause the job to fail is variable and is based on
the length of the column names and the number of trust-enabled columns. Long
column names are at, or close to, the maximum allowable length of 26
characters. To avoid this problem, keep the number of trust-enabled columns
below 48 and/or the length of the column names short. A workaround is to enable
all trust/validation columns before saving the base object to avoid running the
Synchronize job.

Chapter 18: Writing Custom Scripts to Execute Batch Jobs

This chapter explains how to create custom scripts to execute batch jobs and batch
groups in a Siperian Hub implementation. The information in this chapter is intended
for implementation teams and system administrators. For information about how to
configure and execute Siperian Hub batch jobs using the Batch Viewer and Batch
Group tools in the Hub Console, see “About Siperian Hub Batch Jobs” on page 668.

Important: You must have the application server running for the duration of a batch
job.

Chapter Contents
• About Executing Siperian Hub Batch Jobs
• Setting Up Job Execution Scripts
• Monitoring Job Results and Statistics
• Stored Procedure Reference
• Executing Batch Groups Using Stored Procedures
• Developing Custom Stored Procedures for Batch Jobs

About Executing Siperian Hub Batch Jobs


A Siperian Hub batch job is a program that, when executed, completes a discrete unit of
work (a process). All public batch jobs in Siperian Hub can be executed as database
stored procedures. For more information about batch jobs, see “Using Batch Jobs”
on page 667.

In the Hub Console, the Siperian Hub Batch Viewer and Batch Group tools provide
simple mechanisms for executing Siperian Hub batch jobs. However, they do not
provide a means for executing and managing jobs on a scheduled basis. To execute and
manage jobs according to a schedule, you need to execute stored procedures that do
the work of batch jobs or batch groups. Most organizations have job management
tools that are used to control IT processes. Any such tool capable of executing Oracle
PL/SQL or DB2 SQL commands can be used to schedule and manage Siperian Hub
batch jobs.

Setting Up Job Execution Scripts


This section describes how to set up job execution scripts for running Siperian Hub
stored procedures.

About Job Execution Scripts


Execution scripts enable you to run stored procedures on a scheduled basis to execute
and manage jobs.

Use job execution scripts to perform the following tasks:


• determine whether stored procedures can be run using job scheduling tools; for
more information, see “Determining Available Execution Scripts” on page 754
• retrieve identifiers for scripts that execute stored procedures; for more
information, see “Retrieving Values from C_REPOS_TABLE_OBJECT_V at
Execution Time” on page 755
• determine which batch jobs are available to be executed using stored procedures;
for more information, see “Determining Available Execution Scripts” on page 754
• schedule stored procedures to run synchronously or asynchronously; for more
information, see “Running Scripts Asynchronously” on page 755

Siperian Hub provides information regarding stored procedures, such as whether a
stored procedure can be run using job scheduling tools, or how to retrieve identifiers
that execute stored procedures in the C_REPOS_TABLE_OBJECT_V view.

About the C_REPOS_TABLE_OBJECT_V View


The C_REPOS_TABLE_OBJECT_V view contains metadata and identifiers for the
Siperian Hub stored procedures.

Metadata in the C_REPOS_TABLE_OBJECT_V View

Siperian Hub populates the C_REPOS_TABLE_OBJECT_V view with metadata
about its stored procedures. You use this metadata to:
• determine whether a stored procedure can be run using job scheduling tools, as
described in “Determining Available Execution Scripts” on page 754
• retrieve identifiers in the job execution scripts that execute Siperian Hub stored
procedures, as described in “Retrieving Values from C_REPOS_TABLE_
OBJECT_V at Execution Time” on page 755
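
As a minimal sketch of the first use, the following Python function filters rows that have already been fetched from C_REPOS_TABLE_OBJECT_V. It is illustrative only: schedulable_jobs is a hypothetical name, each row is assumed to be a dictionary keyed by the column names described in the table that follows, and the ROWID_TABLE_OBJECT values in the usage note are invented. The filter relies on two statements from that table: an OBJECT_TYPE_CODE of “P” marks a procedure that can potentially be executed by a scheduling tool, and a procedure must not be executed unless VALID_IND equals 1.

```python
def schedulable_jobs(rows):
    """Keep only rows for procedures that a scheduling tool may execute."""
    return [
        row for row in rows
        if row["OBJECT_TYPE_CODE"] == "P"   # procedure that can be scheduled
        and row["VALID_IND"] == 1           # repository settings still valid
    ]
```

For example, given one valid Load row (OBJECT_NAME CMXLD.LOAD_MASTER) and one Stage row whose VALID_IND is 0, only the Load row is returned.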

C_REPOS_TABLE_OBJECT_V has the following columns:


C_REPOS_TABLE_OBJECT_V Columns
Column Name Description
ROWID_TABLE_OBJECT Uniquely identifies a batch job.
ROWID_TABLE Depending on the type of batch job, this is the table identifier for either the
table affected by the job (target table) or the table providing the data for the job
(source table).
• For Stage jobs, ROWID_TABLE refers to the target table (staging table).
• For Load jobs, ROWID_TABLE refers to the source table (staging table).
• For Match, Match Analyze, Autolink, Automerge, Auto Match and Merge,
External Match, Generate Match Tokens, and Key Match jobs, ROWID_TABLE
refers to the base object table, which is both source and target for
the jobs.
OBJECT_NAME Description of the type of batch job. Examples include:
• Stage jobs: CMX_CLEANSE.EXE.
• Load jobs: CMXLD.LOAD_MASTER.
• Match and Match Analyze jobs: CMXMA.MATCH.
OBJECT_DESC Description of the batch job, including the type of batch job as well as the
object affected by the batch job. Examples include:
• Stage for C_STG_CUSTOMER_CREDIT
• Load from C_STG_CUSTOMER_CREDIT
• Match and Merge for C_CUSTOMER
OBJECT_TYPE_CODE Together with OBJECT_FUNCTION_TYPE_CODE, this is a foreign key to
C_REPOS_OBJ_FUNCTION_TYPE.
An OBJECT_TYPE_CODE of “P” indicates a procedure that can potentially
be executed by a scheduling tool.
OBJECT_FUNCTION_TYPE_CODE Indicates the actual procedure type (stage, load, match, and so on).
PUBLIC_IND Indicates whether the procedure can be displayed in the Batch Viewer.
PARAMETER Describes the parameter list for the procedure. Where specific ROWID_TABLE
values are required for the procedure, these are shown in the parameter list.
Otherwise, only the name of the parameter is displayed in the parameter list.
An exception to this is the parameter list for Stage jobs (where OBJECT_NAME =
CMX_CLEANSE.EXE). In this case, the full parameter list is not shown. For a list
of parameters, see “Stage Jobs” on page 795.
VALID_IND If VALID_IND is not equal to 1, do not execute the procedure. A value other
than 1 means that repository settings affecting the procedure have changed. This
usually applies to Stage jobs whose mappings have not been checked and saved again
after the change. For more information, see “Determining Available Execution
Scripts” on page 754.

Identifiers in the C_REPOS_TABLE_OBJECT_V View

Use the following identifier values in C_REPOS_TABLE_OBJECT_V to execute
stored procedures.

OBJECT_NAME | OBJECT_DESC | OBJECT_TYPE_CODE | OBJECT_FUNCTION_TYPE_CODE | OBJECT_FUNCTION_TYPE_DESC
CMXUT.ACCEPT_NON_MATCH_UNIQUE | Change the status of records that have undergone the match process but had no matching data. | P | U | Accept Non-matched Records As Unique
CMXMM.AUTOLINK | Link data in BaseObjectName | P (Procedure) | I | Autolink
CMXMM.AUTOMERGE | Merge data in BaseObjectName | P (Procedure) | G | Automerge
CMXMM.BUILD_BVT | Generate BVT snapshot for BaseObjectName | P | V | BVT snapshot
CMXMA.EXTERNAL_MATCH | External Match for BaseObjectName | P | E | External match
CMXMA.GENERATE_MATCH_TOKENS | Generate Match Tokens for BaseObjectName | P | N | Generate match tokens
CMXMA.KEY_MATCH | Key Match for BaseObjectName | P | K | Key match
CMXLD.LOAD_MASTER | Load from Link BaseObjectName | P | L | Load
CMXMM.MERGE | Process records that have been queued by a Match job for manual merge. | P | Y | Manual merge
CMXMA.MATCH | Match Analyze for BaseObjectName | P | Z | Match analyze
CMXMA.MATCH | Match for BaseObjectName | P | M | Match
CMXMA.MATCH_AND_MERGE | Match and Merge for BaseObjectName | P | B | Auto match and merge
CMXMA.MATCH_FOR_DUPS | Match for Duplicate Data for BaseObjectName | P | D | Match for duplicate data
CMXMM.MLINK | Manual Link for BaseObjectName | P | O | Manual link
CMXMA.MIGRATE_LINK_STYLE_TO_MERGE_STYLE | Migrate Link Style to Merge Style for BaseObjectName | P | J | Migrate link style to merge style
CMXMM.MULTI_MERGE | Multi Merge for BaseObjectName | P | P | Multi merge
CMXSM.AUTO_PROMOTE | Reads the PROMOTE_IND column from an XREF table and for all rows where the column’s value is 1, changes the ACTIVE state to on. | P | PR | Promote
CMXMM.MUNLINK | Manual Unlink for BaseObjectName | P | Q | Manual unlink
CMXMA.RESET_LINKS | Reset Links for BaseObjectName | P | W | Reset links
CMXMA.RESET_MATCH | Reset Match table for BaseObjectName | P | R | Reset match table
CMXUT.REVALIDATE_BO | Revalidate BaseObjectName | P | H | Revalidate BO
CMXCL.START_CLEANSE | Stage for TargetStagingTableName | P | C | Stage
CMXUT.SYNC | Synchronize after changes are made to the schema trust settings. | P | S | Synchronize
CMXMM.UNMERGE | Unmerge for BaseObjectName | P | X | Manual unmerge

Determining Available Execution Scripts


To determine which batch jobs are available to be executed using stored procedures,
run a query using the standard Siperian Hub view called
C_REPOS_TABLE_OBJECT_V, as shown in the following example:
SELECT *
FROM C_REPOS_TABLE_OBJECT_V
WHERE PUBLIC_IND = 1;
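
A scheduling script usually needs only the procedures that are executable and currently valid. The following query is a minimal sketch of such a filter, not a Siperian-supplied script. It uses only columns described in the C_REPOS_TABLE_OBJECT_V column list, and it assumes that OBJECT_TYPE_CODE holds the literal 'P' and that VALID_IND is numeric.

```sql
-- Sketch: list procedures that a scheduling tool could run right now.
-- Assumes OBJECT_TYPE_CODE = 'P' marks schedulable procedures and that
-- VALID_IND = 1 marks procedures unaffected by repository changes.
SELECT ROWID_TABLE_OBJECT,
       ROWID_TABLE,
       OBJECT_NAME,
       OBJECT_DESC,
       OBJECT_FUNCTION_TYPE_CODE,
       PARAMETER
  FROM C_REPOS_TABLE_OBJECT_V
 WHERE OBJECT_TYPE_CODE = 'P'
   AND VALID_IND = 1
 ORDER BY OBJECT_NAME, OBJECT_DESC;
```

Rows that this query leaves out because of VALID_IND typically point to Stage jobs whose mappings need to be checked and saved again.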

Retrieving Values from C_REPOS_TABLE_OBJECT_V at Execution Time
Use SQL statements to retrieve values from C_REPOS_TABLE_OBJECT_V when
executing scripts at run time. The following example code retrieves the
ROWID_TABLE of the staging table and the ROWID_TABLE_OBJECT for cleanse jobs.
SELECT A.ROWID_TABLE, A.ROWID_TABLE_OBJECT
INTO IN_STG_ROWID_TABLE, IN_ROWID_TABLE_OBJECT
FROM C_REPOS_TABLE_OBJECT_V A, C_REPOS_TABLE B
WHERE A.OBJECT_NAME = 'CMX_CLEANSE.EXE'
AND B.ROWID_TABLE = A.ROWID_TABLE
AND B.TABLE_NAME = 'C_HMO_ADDRESS'
AND A.VALID_IND = 1;
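
The SELECT ... INTO statement above runs only inside a PL/SQL block that declares the target variables. The following anonymous block is a minimal sketch of one way to wrap it. The variable names follow the example; the exception handling and the output messages are illustrative assumptions, not part of the Siperian Hub scripts.

```sql
DECLARE
    -- %TYPE anchors each variable to the view column's data type,
    -- so no column sizes are assumed here.
    IN_STG_ROWID_TABLE    C_REPOS_TABLE_OBJECT_V.ROWID_TABLE%TYPE;
    IN_ROWID_TABLE_OBJECT C_REPOS_TABLE_OBJECT_V.ROWID_TABLE_OBJECT%TYPE;
BEGIN
    SELECT A.ROWID_TABLE, A.ROWID_TABLE_OBJECT
      INTO IN_STG_ROWID_TABLE, IN_ROWID_TABLE_OBJECT
      FROM C_REPOS_TABLE_OBJECT_V A, C_REPOS_TABLE B
     WHERE A.OBJECT_NAME = 'CMX_CLEANSE.EXE'
       AND B.ROWID_TABLE = A.ROWID_TABLE
       AND B.TABLE_NAME = 'C_HMO_ADDRESS'
       AND A.VALID_IND = 1;

    DBMS_OUTPUT.PUT_LINE( 'ROWID_TABLE: ' || IN_STG_ROWID_TABLE );
    DBMS_OUTPUT.PUT_LINE( 'ROWID_TABLE_OBJECT: ' || IN_ROWID_TABLE_OBJECT );
EXCEPTION
    -- SELECT ... INTO raises these when zero rows or several rows match.
    WHEN NO_DATA_FOUND THEN
        DBMS_OUTPUT.PUT_LINE( 'NO VALID STAGE JOB FOUND FOR C_HMO_ADDRESS' );
    WHEN TOO_MANY_ROWS THEN
        DBMS_OUTPUT.PUT_LINE( 'MORE THAN ONE STAGE JOB FOUND FOR C_HMO_ADDRESS' );
END;
/
```

Handling NO_DATA_FOUND separately lets the script report an invalid or missing Stage job instead of stopping with an unhandled exception.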

Running Scripts Asynchronously


By default, the execution scripts run synchronously (IN_RUN_SYNCH = 'TRUE' or
IN_RUN_SYNCH = NULL). To run the execution scripts asynchronously, specify
IN_RUN_SYNCH = 'FALSE'. Note that these Boolean values are case-sensitive and
must be specified in uppercase characters.

Monitoring Job Results and Statistics


This section describes how to monitor the results and view the associated statistics of
batch jobs run in job execution scripts.

Error Messages and Return Codes


Siperian Hub stored procedures return an error message and return code.

Returned Parameter Description
OUT_ERROR_MSG Error message if an error occurred.
OUT_RETURN_CODE Return code. Zero (0) if no errors occurred, or one (1) if an error occurred.

Error handling code in job execution scripts can look for return codes and trap any
associated error messages.
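
The following block is a minimal sketch of such error handling, using the CMXMM.AUTOMERGE signature documented later in this chapter. The table name C_CUSTOMER, the user name ADMIN, the error number -20001, and the decision to treat a NULL return code as a failure are illustrative assumptions.

```sql
DECLARE
    V_ROWID_TABLE   CHAR( 14 );
    V_ERROR_MESSAGE VARCHAR2( 2000 );
    V_RETURN_CODE   NUMBER;
BEGIN
    -- Look up the ROWID_TABLE of the base object (placeholder table name).
    SELECT ROWID_TABLE
      INTO V_ROWID_TABLE
      FROM C_REPOS_TABLE
     WHERE TABLE_NAME = 'C_CUSTOMER';

    CMXMM.AUTOMERGE( V_ROWID_TABLE, 'ADMIN', V_ERROR_MESSAGE, V_RETURN_CODE );

    -- A non-zero return code signals an error. Raising an exception passes
    -- the failure to the calling tool instead of only printing it.
    IF V_RETURN_CODE IS NULL OR V_RETURN_CODE <> 0 THEN
        RAISE_APPLICATION_ERROR( -20001,
            'AUTOMERGE FAILED: ' || SUBSTR( V_ERROR_MESSAGE, 1, 1900 ) );
    END IF;

    DBMS_OUTPUT.PUT_LINE( 'AUTOMERGE COMPLETED, RETURN CODE: ' || V_RETURN_CODE );
END;
/
```

Raising an exception rather than only printing the message gives a scheduling tool a failed exit status to react to, provided the tool is configured to detect database errors.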

Job Execution Status


Siperian Hub stored procedures log their job execution status and statistics in the
Siperian Hub repository. The following figure illustrates the repository tables that can
be used for monitoring job results and statistics:

The following table describes the various repository tables.


Repository Tables Used for Monitoring Job Results and Statistics
Table Name Description
C_REPOS_JOB_CONTROL As soon as a job starts to run, it registers itself in
C_REPOS_JOB_CONTROL with a RUN_STATUS of 2 (Running/Processing). Once the
job completes, its status is updated to one of the following values:
• 0 (Completed Successfully): Completed without any errors or warnings.
• 1 (Completed with Errors): Completed, but with some warnings or data
rejections. See the RETURN_CODE for any error code and the STATUS_MESSAGE
for a description of the error/warning.
• 2 (Running / Processing)
• 3 (Failed): The job did not complete. Corrective action must be taken and
the job must be run again. See the RETURN_CODE for any error code and the
STATUS_MESSAGE for the reason for failure.
• 4 (Incomplete): The job failed before updating its job status and has been
manually marked as incomplete. Corrective action must be taken and the job
must be run again. RETURN_CODE and STATUS_MESSAGE will not provide any
useful information. A job is marked as incomplete by clicking the Set Status
to Incomplete button in the Batch Viewer.
C_REPOS_JOB_METRIC When a batch job has completed, it registers its statistics in
C_REPOS_JOB_METRIC. There can be multiple statistics for each job.
Join to C_REPOS_JOB_METRIC_TYPE to get a description for each
statistic.
C_REPOS_JOB_METRIC_TYPE Stores the descriptions of the types of metrics that can be registered in
C_REPOS_JOB_METRIC.
C_REPOS_JOB_STATUS_TYPE Stores the descriptions of the RUN_STATUS values that can be
registered in C_REPOS_JOB_CONTROL.
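
The following query is a minimal sketch of a monitoring check against C_REPOS_JOB_CONTROL. It uses only the RUN_STATUS, RETURN_CODE, and STATUS_MESSAGE columns named above and assumes that RUN_STATUS is numeric. The status text is hardcoded from the list above because the join columns of C_REPOS_JOB_STATUS_TYPE are not described in this section.

```sql
-- Sketch: list jobs that finished with errors, failed, or were marked
-- incomplete, so that an operator can take corrective action.
SELECT RUN_STATUS,
       CASE RUN_STATUS
           WHEN 1 THEN 'Completed with Errors'
           WHEN 3 THEN 'Failed'
           WHEN 4 THEN 'Incomplete'
       END AS RUN_STATUS_DESC,
       RETURN_CODE,
       STATUS_MESSAGE
  FROM C_REPOS_JOB_CONTROL
 WHERE RUN_STATUS IN ( 1, 3, 4 );
```

For rows with a RUN_STATUS of 4, RETURN_CODE and STATUS_MESSAGE carry no useful information, so only the status itself is meaningful.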

Stored Procedure Reference


This section provides a reference for the stored procedures that represent Siperian Hub
batch jobs. Siperian Hub provides these stored procedures, in compiled form, for each
Operational Record Store (ORS), for Oracle databases. You can use any job scheduling
software (such as Tivoli, CA Unicenter, and so on) to execute these stored procedures.

Note: All the input parameters that need a delimited list require a trailing “~”
character.

Alphabetical List of Batch Jobs


Batch Job Description
Accept Non-matched Records As Unique For records that have undergone the match process
but had no matching data, sets the consolidation indicator to 1 (consolidated), meaning
that the record was unique and did not require consolidation.
Autolink Jobs Automatically links records that have qualified for autolinking during the match process
and are flagged for autolinking (Autolink_ind=1). Used with link-style base objects only.
Auto Match and Merge Jobs Executes a continual cycle of a Match job, followed by an
Automerge job, until there are no more records to match, or until the size of the manual
merge queue exceeds the configured threshold. Used with merge-style base objects only.
Automerge Jobs Automatically merges records that have qualified for automerging during the match
process and are flagged for automerging (Automerge_ind=1). Used with merge-style
base objects only.
BVT Snapshot Jobs Generates a snapshot of the best version of the truth (BVT) for a base object. Used with
link-style base objects only.
Execute Batch Group Jobs Constructs an XML message and sends it to the MRM Server SIF API
(ExecuteBatchGroupRequest), which performs the operation. For more information, see
“Stored Procedures for Batch Groups” on page 799.
External Match Jobs Matches “externally managed/prepared” records with an existing base
object, yielding the results based on the current match settings, all without actually
modifying the data in the base object.
Generate Match Token Jobs Prepares data for matching by generating match tokens according
to the current match settings. Match tokens are strings that encode the columns used to
identify candidates for matching.
Get Batch Group Status Jobs Returns the status of a batch group. For more information, see
“Stored Procedures for Batch Groups” on page 799.
Hub Delete Jobs Deletes data from the Hub based on base object / XREF level input.
Key Match Jobs Matches records from two or more sources when these sources use the same primary key.
Compares new records to each other and to existing records, and identifies potential
matches based on the comparison of source record keys as defined by the match rules.
Load Jobs Copies records from a staging table to the corresponding target table in the Hub Store (a
base object or dependent object). During the load process, it also applies the current
trust and validation rules to the records.
Manual Link Jobs Shows logs for records that have been manually linked in the Merge Manager tool. Used
with link-style base objects only.
Manual Merge Jobs Shows logs for records that have been manually merged in the Merge Manager tool. Used
with merge-style base objects only.
Manual Unlink Jobs Shows logs for records that have been manually unlinked in the Data Manager tool. Used
with link-style base objects only.
Manual Unmerge Jobs Shows logs for records that have been manually unmerged in the Merge Manager tool.
Used with merge-style base objects only.
Match Jobs Finds duplicate records in the base object, based on the current match rules.
Match Analyze Jobs Conducts a search to gather match statistics but does not actually perform the match
process. If areas of data with the potential for huge match requirements are discovered,
Siperian Hub moves the records to a hold status, which allows a data steward to review
the data manually before proceeding with the match process.
Match for Duplicate Data Jobs For data with a high percentage of duplicate records,
compares new records to each other and to existing records, and identifies exact
duplicates. The maximum number of exact duplicates is based on the Duplicate Match
Threshold setting for this base object.
Note: The Match for Duplicate Data batch job has been deprecated.
Promote Jobs Reads the PROMOTE_IND column from an XREF table and changes the state to
ACTIVE on all rows where the column’s value is 1.
Recalculate BO Jobs Recalculates all base objects identified by ROWID_OBJECT column in the table/inline
view if you include the ROWID_OBJECT_TABLE parameter.
If you do not include the parameter, this batch job recalculates all records in the BO, in
batches of MATCH_BATCH_SIZE or 1/4 the number of the records in the table,
whichever is less.
Recalculate BVT Jobs Recalculates the BVT for the specified ROWID_OBJECT.
Reset Batch Group Status Jobs Resets a batch group. For more information, see “Stored
Procedures for Batch Groups” on page 799.
Reset Links Jobs Updates the records in the _LINK table to account for changes in the data.
Used with link-style base objects only.
Reset Match Table Jobs Shows logs of the operation where all matched records have been
reset to be queued for match.
Revalidate Jobs Executes the validation logic/rules for records that have been modified since the initial
validation during the Load Process. You can run Revalidate if/when records change after
the initial Load process’s validation step. If no records change, no records are updated. If
some records have changed and get caught by the existing validation rules, the metrics
will show the results.
Stage Jobs Copies records from a landing table into a staging table. During execution, cleanses the
data according to the current cleanse settings.
Synchronize Jobs Updates metadata for base objects. Used after a base object has been loaded but not yet
merged, and subsequent trust configuration changes (such as enabling trust) have been
made to columns in that base object. This job must be run before merging data for this
base object.

Accept Non-matched Records As Unique


Accept Non-matched Records As Unique jobs change the status of records that have
undergone the match process but had no matching data. This job sets the
consolidation indicator to 1, meaning that the record is consolidated or (in this case)
did not require consolidation. The Automerge job adheres to this setting and treats
these as unique records.

The Accept Non-matched Records As Unique job is created:


• only if the base object has Accept All Unmatched Rows as Unique enabled (set
to Yes) in the Match / Merge Setup configuration. For more information, see
“Accept All Unmatched Rows as Unique” on page 492.
• only after a merge job is run, as described in “Batch Jobs That Are Created When
Changes Occur” on page 673.

Note: This job cannot be executed from the Batch Viewer.

Stored Procedure Definition for Accept Non-matched Records As Unique Jobs
PROCEDURE CMXUT.ACCEPT_NON_MATCH_UNIQUE (
IN_ROWID_TABLE IN CHAR(14)
,IN_ROWID_USER IN CHAR(14)
,IN_ASSIGNMENT_IND IN INT
,OUT_ACCEPT_UNIQUE_CNT OUT INT
,OUT_ERROR_MSG OUT VARCHAR2(1024)
,RC OUT INT
)

Sample Job Execution Script for Accept Non-matched Records As Unique

-- ACCEPT RECORDS ASSIGNED TO ALL USERS
DECLARE
    V_ROWID_TABLE CHAR( 14 );
    OUT_ACCEPT_UNIQUE_CNT INTEGER;
    OUT_ERROR_MESSAGE VARCHAR2( 1024 );
    OUT_RETURN_CODE INTEGER;
BEGIN
    -- Look up the ROWID_TABLE of the base object
    SELECT ROWID_TABLE
    INTO V_ROWID_TABLE
    FROM C_REPOS_TABLE
    WHERE TABLE_NAME = 'C_CUSTOMER';

    CMXUT.ACCEPT_NON_MATCH_UNIQUE( V_ROWID_TABLE, NULL, 0,
        OUT_ACCEPT_UNIQUE_CNT, OUT_ERROR_MESSAGE, OUT_RETURN_CODE );
    DBMS_OUTPUT.PUT_LINE( 'NUMBER OF RECORDS ACCEPTED AS UNIQUE: '
        || OUT_ACCEPT_UNIQUE_CNT );
    DBMS_OUTPUT.PUT_LINE( 'RETURN MESSAGE: ' || SUBSTR( OUT_ERROR_MESSAGE, 1, 255 ));
    DBMS_OUTPUT.PUT_LINE( 'RETURN CODE: ' || OUT_RETURN_CODE );
END;
/
-- ACCEPT ONLY RECORDS ASSIGNED TO SPECIFIC USER
DECLARE
    V_ROWID_TABLE CHAR( 14 );
    V_ROWID_USER CHAR( 14 );
    OUT_ACCEPT_UNIQUE_CNT INTEGER;
    OUT_ERROR_MESSAGE VARCHAR2( 1024 );
    OUT_RETURN_CODE INTEGER;
BEGIN
    -- Look up the ROWID_TABLE of the base object
    SELECT ROWID_TABLE
    INTO V_ROWID_TABLE
    FROM C_REPOS_TABLE
    WHERE TABLE_NAME = 'C_CUSTOMER';

    -- Look up the ROWID_USER of the user whose records are accepted
    SELECT ROWID_USER
    INTO V_ROWID_USER
    FROM C_REPOS_USER
    WHERE USER_NAME = 'ADMIN';

    CMXUT.ACCEPT_NON_MATCH_UNIQUE( V_ROWID_TABLE, V_ROWID_USER, 1,
        OUT_ACCEPT_UNIQUE_CNT, OUT_ERROR_MESSAGE, OUT_RETURN_CODE );
    DBMS_OUTPUT.PUT_LINE( 'NUMBER OF RECORDS ACCEPTED AS UNIQUE: '
        || OUT_ACCEPT_UNIQUE_CNT );
    DBMS_OUTPUT.PUT_LINE( 'RETURN MESSAGE: ' || SUBSTR( OUT_ERROR_MESSAGE, 1, 255 ));
    DBMS_OUTPUT.PUT_LINE( 'RETURN CODE: ' || OUT_RETURN_CODE );
    COMMIT;
END;
/

Autolink Jobs
Autolink jobs automatically link records that have qualified for autolinking during the
match process and are flagged for autolinking (Autolink_ind = 1).

Auto Match and Merge Jobs


Auto Match and Merge batch jobs execute a continual cycle of a Match job, followed
by an Automerge job, until there are no more records to match, or until the size of the
manual merge queue exceeds the configured threshold. Auto Match and Merge jobs are
used with merge-style base objects only. For more information, see “Auto Match and
Merge Jobs” on page 716.

Important: Do not run an Auto Match and Merge job on a base object that is used to
define relationships between records in inter-table or intra-table match paths. Doing so
will change the relationship data, resulting in the loss of the associations between
records. For more information, see “Relationship Base Objects” on page 498.

Identifiers for Executing Auto Match and Merge Jobs

To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.

Dependencies for Auto Match and Merge Jobs

The Auto Match and Merge jobs for a target base object can either be run on
successful completion of each Load job, or on successful completion of all Load jobs
for the object.

Successful Completion of Auto Match and Merge Jobs

Auto Match and Merge jobs must complete with a RUN_STATUS of 0 (Completed
Successfully) or 1 (Completed with Errors) to be considered successful.

Stored Procedure Definition for Auto Match and Merge Jobs


PROCEDURE CMXMA.MATCH_AND_MERGE (
IN_ROWID_TABLE IN CHAR(14) --Rowid of a table
,IN_USER_NAME IN VARCHAR2(200) --User name
,IN_MATCH_SET_NAME IN VARCHAR2(500) DEFAULT NULL
,OUT_ERROR_MSG OUT VARCHAR2(2000) --Error message, if any
,RC OUT INT
,IN_JOB_GRP_CTRL IN CHAR(14) DEFAULT NULL
,IN_JOB_GRP_ITEM IN CHAR(14) DEFAULT NULL
)

Sample Job Execution Script for Auto Match and Merge Jobs
DECLARE
    IN_ROWID_TABLE CHAR(14);
    IN_USER_NAME VARCHAR2(200);
    IN_MATCH_SET_NAME VARCHAR2(500);
    OUT_ERROR_MSG VARCHAR2(2000);
    OUT_RETURN_CODE NUMBER;
BEGIN
    IN_ROWID_TABLE := 'SVR1.188';
    IN_USER_NAME := 'CMX_ORS';
    IN_MATCH_SET_NAME := 'MRS2';
    OUT_ERROR_MSG := NULL;
    OUT_RETURN_CODE := NULL;
    CMXMA.MATCH_AND_MERGE ( IN_ROWID_TABLE, IN_USER_NAME,
        IN_MATCH_SET_NAME, OUT_ERROR_MSG, OUT_RETURN_CODE );
    -- Print the values returned by the procedure
    DBMS_OUTPUT.Put_Line('OUT_ERROR_MSG = ' || OUT_ERROR_MSG);
    DBMS_OUTPUT.Put_Line('OUT_RETURN_CODE = ' || TO_CHAR(OUT_RETURN_CODE));
    COMMIT;
END;
/

Automerge Jobs
Automerge jobs automatically merge records that have qualified for automerging
during the match process and are flagged for automerging (Automerge_ind = 1).
Automerge jobs are used with merge-style base objects only. For more information, see
“Automerge Jobs” on page 717.

Identifiers for Executing Automerge Jobs

To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.

Dependencies for Automerge Jobs

Each Automerge job is dependent on the successful completion of the match process,
and the queuing of records for automerge.

Successful Completion of Automerge Jobs

Automerge jobs must complete with a RUN_STATUS of 0 (Completed Successfully)
or 1 (Completed with Errors) to be considered successful.

Stored Procedure Definition for Automerge Jobs


PROCEDURE CMXMM.AUTOMERGE (
IN_ROWID_TABLE IN CHAR(14) --Rowid of a table
,IN_USER_NAME IN VARCHAR2(200) --User name
,OUT_ERROR_MESSAGE OUT VARCHAR2(2000) --Error message, if any
,OUT_RETURN_CODE OUT NUMBER --Return code (if no errors, 0 is returned)
)

Sample Job Execution Script for Automerge Jobs


DECLARE
    IN_ROWID_TABLE CHAR(14);
    IN_USER_NAME VARCHAR2(200);
    OUT_ERROR_MESSAGE VARCHAR2(2000);
    OUT_RETURN_CODE NUMBER;
BEGIN
    -- Replace the NULL placeholders with the ROWID_TABLE of the base
    -- object and the name of the user running the job
    IN_ROWID_TABLE := NULL;
    IN_USER_NAME := NULL;
    OUT_ERROR_MESSAGE := NULL;
    OUT_RETURN_CODE := NULL;

    CMXMM.AUTOMERGE ( IN_ROWID_TABLE, IN_USER_NAME, OUT_ERROR_MESSAGE,
        OUT_RETURN_CODE );
    DBMS_OUTPUT.Put_Line('OUT_ERROR_MESSAGE = ' || OUT_ERROR_MESSAGE);
    DBMS_OUTPUT.Put_Line('OUT_RETURN_CODE = ' || TO_CHAR(OUT_RETURN_CODE));
    COMMIT;
END;
/

BVT Snapshot Jobs


The BVT Snapshot stored procedure generates a snapshot of the best version of the
truth (BVT) for a base object.

Execute Batch Group Jobs


Execute Batch Group jobs (CMXBG.EXECUTE_BATCHGROUP) execute a batch
group. Note that there are two other related batch group stored procedures:
• Reset Batch Group Jobs (CMXBG.RESET_BATCHGROUP)
• Get Batch Group Status Jobs (CMXBG.GET_BATCHGROUP_STATUS)

For more information, see “Stored Procedures for Batch Groups” on page 799.

External Match Jobs


External Match jobs match “externally managed/prepared” records with an existing base
object and yield results based on the current match settings. They do so without
loading the data from the input table into the base object, changing data in the base
object in any way, or changing the match table associated with the base object. You can
use external matching to pretest data, test match rules, and inspect the results before
running the actual Match job. For more information, see “External Match Jobs” on page 719.

Note: The External Match job executes as a batch job only; there is no corresponding
SIF request that external applications can invoke.

Stored Procedure Definition for External Match Jobs


PROCEDURE CMXMA.EXTERNAL_MATCH(
IN_ROWID_TABLE IN CHAR(14)
, IN_USER_NAME IN VARCHAR2(50)
, IN_MATCH_SET_NAME IN VARCHAR2(500) DEFAULT NULL
, OUT_ERROR_MSG OUT VARCHAR2(1024)
, RC OUT INT
, IN_JOB_GRP_CTRL IN CHAR(14) DEFAULT NULL
, IN_JOB_GRP_ITEM IN CHAR(14) DEFAULT NULL
)

Sample Job Execution Script for External Match Jobs


DECLARE
    IN_ROWID_TABLE CHAR(14);
    IN_USER_NAME VARCHAR2(200);
    IN_MATCH_SET_NAME VARCHAR2(200);
    OUT_ERROR_MSG VARCHAR2(2000);
    RC NUMBER;
    IN_JOB_GRP_CTRL CHAR(14);
    IN_JOB_GRP_ITEM CHAR(14);
BEGIN
    -- Replace the NULL placeholders with values for your environment
    IN_ROWID_TABLE := NULL;
    IN_USER_NAME := NULL;
    IN_MATCH_SET_NAME := NULL;
    OUT_ERROR_MSG := NULL;
    RC := NULL;
    IN_JOB_GRP_CTRL := NULL;
    IN_JOB_GRP_ITEM := NULL;

    CMXMA.EXTERNAL_MATCH ( IN_ROWID_TABLE, IN_USER_NAME,
        IN_MATCH_SET_NAME, OUT_ERROR_MSG, RC, IN_JOB_GRP_CTRL,
        IN_JOB_GRP_ITEM );
    DBMS_OUTPUT.Put_Line('OUT_ERROR_MSG = ' || OUT_ERROR_MSG);
    DBMS_OUTPUT.Put_Line('RC = ' || TO_CHAR(RC));
    COMMIT;
END;
/

Generate Match Token Jobs


Generate Match Tokens jobs prepare data for matching by generating match tokens
according to the current match settings. Match tokens are strings that encode the
columns used to identify candidates for matching. For more information, see
“Generate Match Tokens Jobs” on page 725.

Schedule Generate Match Tokens jobs if you run the load process without data
tokenization, or if match failed during tokenization. The Generate Match Tokens job
generates the match tokens for the entire base object (when
IN_FULL_RESTRIP_IND is set to 1).

Note: Check (select) the Re-generate All Match Tokens check box in the Batch Viewer
to populate the IN_FULL_RESTRIP_IND parameter.

Identifiers for Executing Generate Match Token Jobs

To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.

Dependencies for Generate Match Token Jobs

Each Generate Match Tokens job is dependent on the successful completion of the
Load job responsible for loading data into the base object.

Successful Completion of Generate Match Token Jobs

Generate Match Tokens jobs must complete with a RUN_STATUS of 0 (Completed
Successfully).

Stored Procedure Definition for Generate Match Token Jobs


PROCEDURE CMXMA.GENERATE_MATCH_TOKENS (
IN_ROWID_TABLE IN CHAR(14) --Rowid of a table
,IN_USER_NAME IN VARCHAR2(200) --User name
,OUT_ERROR_MSG OUT VARCHAR2(2000) --Error message, if any
,OUT_RETURN_CODE OUT NUMBER --Return code (if no errors, 0 is returned)
,IN_JOB_GRP_CTRL IN CHAR(14) DEFAULT NULL
,IN_JOB_GRP_ITEM IN CHAR(14) DEFAULT NULL
,IN_FULL_RESTRIP_IND IN NUMBER --Default 0, retokenize entire table if set to 1 (strip_truncate_insert)
)

Sample Job Execution Script for Generate Match Token Jobs


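DECLARE
    IN_ROWID_TABLE CHAR(14);
    IN_USER_NAME VARCHAR2(200);
    OUT_ERROR_MSG VARCHAR2(2000);
    OUT_RETURN_CODE NUMBER;
    IN_FULL_RESTRIP_IND NUMBER;
BEGIN
    -- Replace the NULL placeholders with values for your environment
    IN_ROWID_TABLE := NULL;
    IN_USER_NAME := NULL;
    OUT_ERROR_MSG := NULL;
    OUT_RETURN_CODE := NULL;
    IN_FULL_RESTRIP_IND := 0; -- Set to 1 to retokenize the entire table

    -- IN_FULL_RESTRIP_IND is the seventh parameter in the definition, so it
    -- is passed by name; IN_JOB_GRP_CTRL and IN_JOB_GRP_ITEM keep their
    -- default values
    CMXMA.GENERATE_MATCH_TOKENS ( IN_ROWID_TABLE, IN_USER_NAME,
        OUT_ERROR_MSG, OUT_RETURN_CODE,
        IN_FULL_RESTRIP_IND => IN_FULL_RESTRIP_IND );
    DBMS_OUTPUT.Put_Line('OUT_ERROR_MSG = ' || OUT_ERROR_MSG);
    DBMS_OUTPUT.Put_Line('OUT_RETURN_CODE = ' || TO_CHAR(OUT_RETURN_CODE));
    COMMIT;
END;
/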

Get Batch Group Status Jobs


Get Batch Group Status jobs return the status of a batch group. Note that there are
two other related batch group stored procedures:
• Execute Batch Group Jobs (CMXBG.EXECUTE_BATCHGROUP)
• Reset Batch Group Jobs (CMXBG.RESET_BATCHGROUP)

For more information, see “Stored Procedures for Batch Groups” on page 799.

Hub Delete Jobs


The Hub Delete job removes specified data—up to and including an entire source
system—from Siperian Hub based on your base object / XREF input to the
CMXDM.HUB_DELETE_BATCH stored procedure.

Although the Hub Delete job deletes the XREF record, a pointer to the deleted record
(actually to the parent base object of this XREF) could potentially be present on the
_HMXR table (on column ORIG_TGT_ROWID_OBJECT). The Match Tree tool
displays REMOVED (ID#: xxxx) for the removed record(s).

Important:
• The Hub Delete batch job will not delete the data if there are records queued for
an Automerge job.
• Do not run a Hub Delete job when there are automerge records in the match
table. Run the Hub Delete job after the automerge matches are processed.

Cascade Delete

The Hub Delete job performs a cascade delete if you set the parameter
IN_ALLOW_CASCADE_DELETE_IND=1 for a base object in the stored procedure. With
cascade delete, when records in the parent object are deleted, Hub Delete also removes
the affected records in the child base object. Hub Delete checks each child BO table
for related data that should be deleted given the removal of the parent BO record.

Important: With cascade delete, the Hub Delete job may potentially delete XREF
records from other source systems. To ensure that Hub Delete does not delete XREF
records from other systems, do not use cascade delete.
IN_ALLOW_CASCADE_DELETE_IND forces Hub Delete to delete the child base objects and
cross-references (regardless of system) when the parent base object is being deleted.

Notes:
• If you do not set IN_ALLOW_CASCADE_DELETE_IND=1, Siperian Hub generates an
error message if there are child base objects referencing the deleted base object
record; Hub Delete fails, and Siperian Hub performs a rollback operation for the
associated data.
• IN_CASCADE_CHILD_SYSTEM_XREF=1 is not supported in XU SP1. If you need to
selectively cascade deletes to child records, perform the child deletes first, and
then perform the parent deletes with the cascade delete feature disabled.

Hub Delete Impact on History Tables

If you set IN_OVERRIDE_HISTORY_IND=1, Hub Delete does not write to history
tables when deleting.

If you set IN_OVERRIDE_HISTORY_IND=1 and set IN_PURGE_HISTORY_IND=1, then
Hub Delete removes history tables to delete all traces of the data.

If IN_PURGE_HISTORY_IND=1 and IN_OVERRIDE_HISTORY_IND=0, there is no effect.

Note: Siperian Hub sets the HUB_STATE_IND to -9 in the HXRF when XREFs are
deleted. The HIST table will be set to -9 if the BO record is deleted.

Hub Delete Impact on Records on Hold

The Hub Delete job removes “records on hold” or records that have had their
CONSOLIDATION_IND column set to 9.

Stored Procedure Definition for Hub Delete Jobs


PROCEDURE CMXDM.HUB_DELETE_BATCH (
IN_BO_TABLE_NAME IN VARCHAR2(30)
,IN_XREF_LIST_TO_BE_DELETED IN VARCHAR2(30)
,OUT_DELETED_XREF_COUNT OUT INT
,OUT_DELETED_BO_COUNT OUT INT
,OUT_ERROR_MSG OUT VARCHAR2(1024)
,OUT_RETURN_CODE OUT INT
,OUT_TMP_TABLE_LIST IN OUT VARCHAR2(32000)
,IN_RECALCULATE_BVT IN INT DEFAULT 1
,IN_ALLOW_CASCADE_DELETE IN INT DEFAULT 1
,IN_CASCADE_CHILD_SYSTEM_XREF IN INT DEFAULT 0
,IN_OVERRIDE_HISTORY_IND IN INT DEFAULT 0
,IN_PURGE_HISTORY_IND IN INT DEFAULT 0
,IN_USER_NAME IN VARCHAR2(50) DEFAULT NULL
,IN_ALLOW_COMMIT_IND IN INT DEFAULT 1
)

Parameters

Parameter Description
IN_BO_TABLE_NAME Name of the table that contains the list of base
objects to delete.
IN_XREF_LIST_TO_BE_DELETED Name of the table that contains the list of XREFs to
delete.
IN_RECALCULATE_BVT_IND If set to one (1), recalculates BVT following BO
and/or XREF delete.
IN_ALLOW_CASCADE_DELETE_IND If set to one (1), specifies that when records in the
parent object are deleted, Hub Delete also removes
the affected records in the child base object. Hub
Delete checks each child BO table for related data
that should be deleted given the removal of the
parent BO record.
IN_CASCADE_CHILD_SYSTEM_XREF Not supported in XU SP1. Leave the value for this
parameter as the default (0) when executing the
procedure.
IN_OVERRIDE_HISTORY_IND If set to one (1), Hub Delete does not write to
history tables when deleting. If you set IN_OVERRIDE_HISTORY_IND=1 and set
IN_PURGE_HISTORY_IND=1, then Hub Delete removes history tables to delete all
traces of the data.
IN_PURGE_HISTORY_IND If set to one (1) Hub Delete

Returns

Parameter Description
OUT_DELETED_XREF_COUNT Number of deleted XREFs.
OUT_DELETED_BO_COUNT Number of deleted BOs.
OUT_ERROR_MSG Error message text.
OUT_RETURN_CODE Error code. If zero (0), then the stored procedure
completed successfully.
The procedure will return a non-zero value in case
of an error.

Sample Job Execution Script for Hub Delete Jobs


DECLARE
    IN_BO_TABLE_NAME VARCHAR2( 40 );
    IN_XREF_LIST_TO_BE_DELETED VARCHAR2( 40 );
    IN_RECALCULATE_BVT_IND NUMBER;
    IN_ALLOW_CASCADE_DELETE_IND NUMBER;
    IN_CASCADE_CHILD_SYSTEM_XREF NUMBER;
    IN_OVERRIDE_HISTORY_IND NUMBER;
    IN_PURGE_HISTORY_IND NUMBER;
    IN_USER_NAME VARCHAR2( 100 );
    IN_ALLOW_COMMIT_IND NUMBER;
    OUT_DELETED_XREF_COUNT NUMBER;
    OUT_DELETED_BO_COUNT NUMBER;
    OUT_ERROR_MESSAGE VARCHAR2( 1024 );
    OUT_RETURN_CODE NUMBER;
    OUT_TMP_TABLE_LIST VARCHAR2( 32000 ); -- IN OUT parameter of the procedure
BEGIN
    IN_BO_TABLE_NAME := 'C_CUSTOMER';
    IN_XREF_LIST_TO_BE_DELETED := 'TMP_DELETE_KEYS';
    OUT_DELETED_XREF_COUNT := NULL;
    OUT_DELETED_BO_COUNT := NULL;
    OUT_ERROR_MESSAGE := NULL;
    OUT_RETURN_CODE := NULL;
    OUT_TMP_TABLE_LIST := NULL;
    IN_RECALCULATE_BVT_IND := 1;
    IN_ALLOW_CASCADE_DELETE_IND := 1;
    IN_CASCADE_CHILD_SYSTEM_XREF := 0;
    IN_OVERRIDE_HISTORY_IND := 0;
    IN_PURGE_HISTORY_IND := 0;
    IN_USER_NAME := 'ADMIN';
    IN_ALLOW_COMMIT_IND := 0;

    -- Rebuild the list of XREF keys to delete
    DELETE TMP_DELETE_KEYS;

    INSERT INTO TMP_DELETE_KEYS
    SELECT PKEY_SRC_OBJECT, ROWID_SYSTEM
    FROM C_CUSTOMER_XREF
    WHERE ROWID_SYSTEM = 'SALES';
    COMMIT;
    --
    CMXDM.HUB_DELETE_BATCH( IN_BO_TABLE_NAME,
        IN_XREF_LIST_TO_BE_DELETED, OUT_DELETED_XREF_COUNT,
        OUT_DELETED_BO_COUNT, OUT_ERROR_MESSAGE, OUT_RETURN_CODE,
        OUT_TMP_TABLE_LIST, IN_RECALCULATE_BVT_IND,
        IN_ALLOW_CASCADE_DELETE_IND, IN_CASCADE_CHILD_SYSTEM_XREF,
        IN_OVERRIDE_HISTORY_IND, IN_PURGE_HISTORY_IND, IN_USER_NAME,
        IN_ALLOW_COMMIT_IND );
    DBMS_OUTPUT.PUT_LINE( ' RETURN CODE IS ' || OUT_RETURN_CODE );
    DBMS_OUTPUT.PUT_LINE( ' MESSAGE IS ' || OUT_ERROR_MESSAGE );
    DBMS_OUTPUT.PUT_LINE( ' XREF RECORDS DELETED: ' || OUT_DELETED_XREF_COUNT );
    DBMS_OUTPUT.PUT_LINE( ' BO RECORDS DELETED: ' || OUT_DELETED_BO_COUNT );
    COMMIT;
END;
/

Key Match Jobs


Key Match jobs are used to match records from two or more sources when these
sources use the same primary key. Key Match jobs compare new records to each other
and to existing records, and identify potential matches based on the comparison of
source record keys as defined by the match rules. For more information, see “Key
Match Jobs” on page 727.

Identifiers for Executing Key Match Jobs

To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.


Dependencies for Key Match Jobs

Key Match jobs are dependent on the successful completion of the Load job
responsible for loading data into the base object. A Key Match job cannot be run
after any changes have been made to the data.

Successful Completion of Key Match Jobs

Key Match jobs must complete with a RUN_STATUS of 0 (Completed Successfully).

Stored Procedure Definition for Key Match Jobs


PROCEDURE CMXMA.KEY_MATCH (
IN_ROWID_TABLE IN CHAR(14) --Rowid of a table
,IN_USER_NAME IN VARCHAR2(200) --User name
,OUT_ERROR_MSG OUT VARCHAR2(2000)--Error message, if any
,OUT_RETURN_CODE OUT NUMBER --Return code (if no errors, returns 0)
)

Sample Job Execution Script for Key Match Jobs


DECLARE
    IN_ROWID_TABLE    VARCHAR2(200);
    IN_USER_NAME      VARCHAR2(200);
    OUT_ERROR_MESSAGE VARCHAR2(2000); -- sized to match OUT_ERROR_MSG in the definition
    OUT_RETURN_CODE   NUMBER;
BEGIN
    IN_ROWID_TABLE    := NULL;
    IN_USER_NAME      := 'myusername';
    OUT_ERROR_MESSAGE := NULL;
    OUT_RETURN_CODE   := NULL;

    DBMS_OUTPUT.Put_Line(' Row id table = ' || IN_ROWID_TABLE);
    CMXMA.KEY_MATCH ( IN_ROWID_TABLE, IN_USER_NAME, OUT_ERROR_MESSAGE,
        OUT_RETURN_CODE);
    DBMS_OUTPUT.Put_Line('OUT_ERROR_MESSAGE = ' || OUT_ERROR_MESSAGE);
    DBMS_OUTPUT.Put_Line('OUT_RETURN_CODE = ' || TO_CHAR(OUT_RETURN_CODE));
    COMMIT;
END;


Load Jobs
Load jobs move data from staging tables to the final target objects, and apply any trust
and validation rules where appropriate. For more information about Load jobs and the
load process, see “Load Jobs” on page 727.

Identifiers for Executing Load Jobs

To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.

Dependencies for Load Jobs

Each Load job is dependent on the success of the Stage job that precedes it.
In addition, each Load job is governed by the demands of referential integrity
constraints and is dependent on the successful completion of all other Load jobs
responsible for populating tables referenced by the table that is the target of the load.

For                  Run
Base Objects         Run the loads for parent tables before the loads for child tables.
Dependent Objects    Run the loads for all referenced base objects before the load for
                     the dependent object.
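
The ordering rule above can be sketched in a script that loads a parent table and then, only if that load reports no error, a child table. This is an illustrative outline based on the CMXLD.LOAD_MASTER definition given in this section. The two staging table rowids are placeholders left as NULL, the user name 'ADMIN' is an assumption, and checking OUT_RETURN_CODE for zero is one possible way to gate the second load.

```plsql
DECLARE
    -- Placeholder staging table rowids (assumptions); look up the real
    -- values in the C_REPOS_TABLE_OBJECT_V view.
    v_parent_stg CHAR(14) := NULL;
    v_child_stg  CHAR(14) := NULL;
    v_error_msg  VARCHAR2(2000);
    v_rc         NUMBER;
BEGIN
    -- Load the parent table first.
    CMXLD.LOAD_MASTER( v_parent_stg, 'ADMIN', NULL, NULL,
        v_error_msg, v_rc, 0 );

    IF v_rc = 0 THEN
        -- Load the child table only after the parent load succeeded.
        CMXLD.LOAD_MASTER( v_child_stg, 'ADMIN', NULL, NULL,
            v_error_msg, v_rc, 0 );
    END IF;

    DBMS_OUTPUT.PUT_LINE('Return code = ' || TO_CHAR(v_rc));
    DBMS_OUTPUT.PUT_LINE('Message     = ' || v_error_msg);
    COMMIT;
END;
/
```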

Successful Completion of Load Jobs

A Load job must complete with a RUN_STATUS of 0 (Completed Successfully) or 1
(Completed with Errors) to be considered successful. The Auto Match and Merge
jobs for a target base object can either be run on successful completion of each Load
job, or on successful completion of all Load jobs for the base object.


Stored Procedure Definition for Load Jobs


PROCEDURE CMXLD.LOAD_MASTER (
IN_STG_ROWID_TABLE IN CHAR(14) --Rowid of staging table
,IN_USER_NAME IN VARCHAR2(200) --Database user name
,IN_ROWID_JOB_GRP_CTRL IN CHAR(14)
,IN_ROWID_JOB_GRP_ITEM IN CHAR(14)
,OUT_ERROR_MSG OUT VARCHAR2(2000) --Error message, if any
,OUT_RETURN_CODE OUT NUMBER --Return code (if no errors, 0 is returned)
,IN_FORCE_UPDATE_IND IN NUMBER --Forced update value: default 0, 1 for forced update
)

Sample Job Execution Script for Load Jobs


DECLARE
    IN_STG_ROWID_TABLE    CHAR(14);
    IN_USER_NAME          VARCHAR2(200);
    IN_ROWID_JOB_GRP_CTRL CHAR(14);
    IN_ROWID_JOB_GRP_ITEM CHAR(14);
    OUT_ERROR_MSG         VARCHAR2(2000);
    OUT_RETURN_CODE       NUMBER;
    IN_FORCE_UPDATE_IND   NUMBER;
BEGIN
    IN_STG_ROWID_TABLE    := NULL;
    IN_USER_NAME          := NULL;
    IN_ROWID_JOB_GRP_CTRL := NULL;
    IN_ROWID_JOB_GRP_ITEM := NULL;
    OUT_ERROR_MSG         := NULL;
    OUT_RETURN_CODE       := NULL;
    IN_FORCE_UPDATE_IND   := 0; -- Default 0, 1 for forced update

    CMXLD.LOAD_MASTER ( IN_STG_ROWID_TABLE, IN_USER_NAME,
        IN_ROWID_JOB_GRP_CTRL, IN_ROWID_JOB_GRP_ITEM, OUT_ERROR_MSG,
        OUT_RETURN_CODE, IN_FORCE_UPDATE_IND );
    DBMS_OUTPUT.Put_Line('OUT_ERROR_MSG = ' || OUT_ERROR_MSG);
    DBMS_OUTPUT.Put_Line('OUT_RETURN_CODE = ' || TO_CHAR(OUT_RETURN_CODE));
    COMMIT;
END;


Manual Link Jobs


Manual Link jobs execute manual linking in the Merge Manager tool. Manual Link
jobs are used with link-style base objects only. Results are stored in the _LINK table.
To learn more, see “Manual Link Jobs” on page 732.

Manual Merge Jobs


After the Match job has been run, data stewards can use the Merge Manager to process
records that have been queued by a Match job for manual merge. Manual Merge jobs
are run in the Merge Manager—not in the Batch Viewer. The Batch Viewer only allows
you to inspect job execution logs for Manual Merge jobs that were run in the Merge
Manager. For more information, see “Executing a Manual Merge Job in the Merge
Manager” on page 732.

Stored Procedure Definition for Manual Merge Jobs


PROCEDURE CMXMM.MERGE(
IN_ROWID_TABLE IN CHAR(14)
,IN_SRC_ROWID_OBJECT IN CHAR(14)
,IN_TGT_ROWID_OBJECT IN CHAR(14)
,IN_ROWID_MATCH_RULE IN CHAR(14)
,IN_AUTOMERGE_IND IN INT
,IN_PROMOTE_STRING IN VARCHAR2(4000)
,IN_ROWID_JOB_CTL IN CHAR(14)
,IN_INTERACTION_ID IN INT
,IN_USER_NAME IN VARCHAR2(50)
,OUT_MERGED_IS_UNIQUE_IND OUT INT
,OUT_ERROR_MESSAGE OUT VARCHAR2(1024)
,OUT_RETURN_CODE OUT INT
,CALLED_MANUALLY_IND IN INT DEFAULT 1
,OUT_TMP_TABLE_LIST OUT NOCOPY VARCHAR2(32000)
)


Sample Job Execution Script for Manual Merge Jobs


DECLARE
    V_ROWID_TABLE            CHAR( 14 );
    V_SRC_ROWID_OBJECT       CHAR( 14 );
    V_TGT_ROWID_OBJECT       CHAR( 14 );
    V_PROMOTE_STRING         VARCHAR2( 2000 );
    V_INTERACTION_ID         INT := NULL;
    OUT_MERGE_COUNT          INT;
    OUT_MERGED_IS_UNIQUE_IND INT;
    OUT_ERROR_MESSAGE        VARCHAR2( 2000 );
    OUT_RETURN_CODE          INT;
BEGIN
    SELECT ROWID_TABLE
    INTO V_ROWID_TABLE
    FROM C_REPOS_TABLE
    WHERE TABLE_NAME = 'C_CUSTOMER';
    V_TGT_ROWID_OBJECT := 1;
    V_SRC_ROWID_OBJECT := 2;
    -- V_PROMOTE_STRING contains Rowid_column~winner~ for trusted columns,
    -- to force the winning cell for that column. Winner can be either
    -- "s"ource or "t"arget. Example: 'svr1.7sv~t~svr1.7sw~s~'
    V_PROMOTE_STRING := NULL;
    V_INTERACTION_ID := NULL;

    CMXMM.MANUAL_MERGE( V_ROWID_TABLE, V_SRC_ROWID_OBJECT,
        V_TGT_ROWID_OBJECT, V_PROMOTE_STRING, V_INTERACTION_ID, 'ADMIN',
        OUT_MERGED_IS_UNIQUE_IND, OUT_ERROR_MESSAGE, OUT_RETURN_CODE );
    DBMS_OUTPUT.PUT_LINE( 'MERGED IS UNIQUE IND: ' ||
        OUT_MERGED_IS_UNIQUE_IND );
    DBMS_OUTPUT.PUT_LINE( 'RETURN MESSAGE: ' ||
        SUBSTR( OUT_ERROR_MESSAGE, 1, 255 ));
    DBMS_OUTPUT.PUT_LINE( 'RETURN CODE: ' || OUT_RETURN_CODE );
    COMMIT;
END;


Manual Unlink Jobs


Manual Unlink jobs execute the manual unlinking of records that were previously linked
manually in the Merge Manager tool or through one of these stored procedure jobs.

Manual Unmerge Jobs


The Unmerge job can unmerge already-consolidated records, whether those records
were consolidated using Automerge, Manual Merge, manual edit, Load by
Rowid_Object, or Put Xref. The Unmerge job succeeds or fails as a single transaction:
if the server fails while the Unmerge job is executing, the unmerge process is rolled
back.

Cascade Unmerge

The Unmerge job performs a cascade unmerge if this feature is enabled for this base
object in the Schema Manager in the Hub Console. With cascade unmerge, when
records in the parent object are unmerged, Siperian Hub also unmerges affected
records in the child base object.

This feature applies to unmerging records across base objects. This is configured per
base object (using the Unmerge Child When Parent Unmerges check box on the
Merge Settings tab in the Schema Manager). Cascade unmerge applies only when a
foreign-key relationship exists between two base objects.

For example: Customer A record (parent) in the Customer base object has multiple
address records (children) in the Address base object. The two tables are linked by a
unique key (Customer_ID).
• When cascade unmerge is enabled—Unmerging the parent record (Customer
A) in the Customer base object also unmerges Customer A's child address records
in the Address base object.
• When cascade unmerge is disabled—Unmerging the parent record (Customer
A) in the Customer base object has no effect on Customer A's child records in the
Address base object; they are NOT unmerged.


Unmerging All Records or One Record

In your job execution script, you can specify the scope of records to unmerge by
setting IN_UNMERGE_ALL_XREFS_IND.
• IN_UNMERGE_ALL_XREFS_IND=0: Default setting. Unmerges the single
record identified in the specified XREF to its state prior to the merge.
• IN_UNMERGE_ALL_XREFS_IND=1: Unmerges all XREFs to their state prior
to the merge. Use this option to quickly unmerge all XREFs for a single
consolidated record in a single operation.

Linear and Tree Unmerge

These features apply to unmerging contributing records from within a single base
object. There is a hierarchy of merges consisting of a root (top of the tree, or BVT),
branches (merged records), and leaves (the original contributing records at end of the
branches). This hierarchy can be many levels deep.

In your job execution script, you can specify the type of unmerge (linear or tree
unmerge) by setting IN_TREE_UNMERGE_IND:
• IN_TREE_UNMERGE_IND=0: Default setting. Linear Unmerge
• IN_TREE_UNMERGE_IND=1: Tree Unmerge

Linear Unmerge

Linear unmerge is the default behavior. During a linear unmerge, a base object record is
unmerged and taken out of the existing merge tree structure. Only the unmerged base
object record itself comes out of the merge tree structure; all base object records
below it in the merge tree stay in the original merge tree.

Tree Unmerge

Tree unmerge is an optional alternative. A tree of merged base object records is a
hierarchical structure of the merge history, reflecting the sequence of merge operations
that have occurred. Merge history is kept during the merge process in these tables:
• HMXR provides the current state view of merges
• HMRG table provides a hierarchical view of the merge history, a tree of merged
base object records, as well as an interactive unmerge history.

During a tree unmerge, you unmerge a tree of merged base object records as an intact
sub-structure. A sub-tree that has the unmerged base object record as its root comes
out of the original merge tree structure. (For example, merge a1 and a2 into a, then
merge b1 and b2 into b, and then finally merge a and b into c. If you then perform a
tree unmerge on a, the sub-tree consisting of a with a1 and a2 comes out of the
original tree c. As a result, a is the root of the tree after the unmerge.)
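
As an illustration, a tree unmerge can be requested by passing IN_TREE_UNMERGE_IND => 1 to the CMXMM.UNMERGE procedure defined later in this section. The sketch below uses named notation so that the two indicators are easy to see. The rowids, the source key, and the user name 'ADMIN' are placeholder assumptions and must be replaced with values from your own Hub Store.

```plsql
DECLARE
    v_rowid_table    CHAR(14) := 'SVR1.8ZC';  -- placeholder base object rowid
    v_rowid_system   CHAR(14) := 'SVR1.7NJ';  -- placeholder source system rowid
    v_unmerged_rowid CHAR(14);
    v_tmp_table_list VARCHAR2(32000);
    v_error_message  VARCHAR2(1024);
    v_rc             NUMBER;
BEGIN
    CMXMM.UNMERGE(
        IN_ROWID_TABLE           => v_rowid_table,
        IN_ROWID_SYSTEM          => v_rowid_system,
        IN_PKEY_SRC_OBJECT       => '6',      -- placeholder source key
        IN_TREE_UNMERGE_IND      => 1,        -- 1 = tree unmerge, 0 = linear
        IN_ROWID_JOB_CTL         => NULL,
        IN_INTERACTION_ID        => NULL,
        IN_USER_NAME             => 'ADMIN',
        OUT_UNMERGED_ROWID       => v_unmerged_rowid,
        OUT_TMP_TABLE_LIST       => v_tmp_table_list,
        OUT_ERROR_MESSAGE        => v_error_message,
        RC                       => v_rc,
        IN_UNMERGE_ALL_XREFS_IND => 0 );      -- 0 = only the specified XREF

    DBMS_OUTPUT.PUT_LINE('Return code    = ' || TO_CHAR(v_rc));
    DBMS_OUTPUT.PUT_LINE('Message        = ' || v_error_message);
    DBMS_OUTPUT.PUT_LINE('Unmerged rowid = ' || v_unmerged_rowid);
    COMMIT;
END;
/
```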

Identifiers for Executing Manual Unmerge Jobs

To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.

Dependencies for Manual Unmerge Jobs

Each Manual Unmerge job is dependent on data having already been merged.

Successful Completion of Manual Unmerge Jobs

A Manual Unmerge job must complete with a RUN_STATUS of 0 (Completed
Successfully) or 1 (Completed with Errors) to be considered successful.

Stored Procedure Definition for Manual Unmerge Jobs


PROCEDURE CMXMM.UNMERGE (
IN_ROWID_TABLE IN CHAR(14)
,IN_ROWID_SYSTEM IN CHAR(14)
,IN_PKEY_SRC_OBJECT IN VARCHAR2(255)
,IN_TREE_UNMERGE_IND IN INT
,IN_ROWID_JOB_CTL IN CHAR(14)
,IN_INTERACTION_ID IN INT
,IN_USER_NAME IN VARCHAR2(50)
,OUT_UNMERGED_ROWID OUT CHAR(14)
,OUT_TMP_TABLE_LIST OUT VARCHAR2(32000)
,OUT_ERROR_MESSAGE OUT VARCHAR2(1024)
,RC OUT INT
,IN_UNMERGE_ALL_XREFS_IND IN INT DEFAULT 0 )


Sample Job Execution Script for Manual Unmerge Jobs


DECLARE
    IN_ROWID_TABLE           CHAR (14);
    IN_ROWID_SYSTEM          CHAR (14);
    IN_PKEY_SRC_OBJECT       VARCHAR2 (255);
    IN_TREE_UNMERGE_IND      NUMBER;
    IN_ROWID_JOB_CTL         CHAR (14);
    IN_INTERACTION_ID        NUMBER;
    IN_USER_NAME             VARCHAR2 (50);
    OUT_UNMERGED_ROWID       CHAR (14);
    OUT_TMP_TABLE_LIST       VARCHAR2 (32000);
    OUT_ERROR_MESSAGE        VARCHAR2 (1024);
    RC                       NUMBER;
    IN_UNMERGE_ALL_XREFS_IND NUMBER;
BEGIN
    IN_ROWID_TABLE           := 'SVR1.8ZC ';
    IN_ROWID_SYSTEM          := 'SVR1.7NJ ';
    IN_PKEY_SRC_OBJECT       := '6';
    IN_TREE_UNMERGE_IND      := 0; -- Default 0, 1 for tree unmerge
    IN_ROWID_JOB_CTL         := NULL;
    IN_INTERACTION_ID        := NULL;
    IN_USER_NAME             := 'XHE';
    OUT_UNMERGED_ROWID       := NULL;
    OUT_TMP_TABLE_LIST       := NULL;
    OUT_ERROR_MESSAGE        := NULL;
    RC                       := NULL;
    IN_UNMERGE_ALL_XREFS_IND := 0; -- Default 0, 1 for unmerge_all

    CMXMM.UNMERGE ( IN_ROWID_TABLE, IN_ROWID_SYSTEM,
        IN_PKEY_SRC_OBJECT, IN_TREE_UNMERGE_IND, IN_ROWID_JOB_CTL,
        IN_INTERACTION_ID, IN_USER_NAME, OUT_UNMERGED_ROWID,
        OUT_TMP_TABLE_LIST, OUT_ERROR_MESSAGE, RC,
        IN_UNMERGE_ALL_XREFS_IND );
    DBMS_OUTPUT.PUT_LINE (' Return Code = ' || TO_CHAR(RC));
    DBMS_OUTPUT.PUT_LINE (' Message is = ' || OUT_ERROR_MESSAGE);
    COMMIT;
END;


Match Jobs
Match jobs find duplicate records in the base object, based on the current match rules.
For more information about Match jobs and the match process, see “Match Jobs” on
page 734.

Important: Do not run a Match job on a base object that is used to define
relationships between records in inter-table or intra-table match paths. Doing so will
change the relationship data, resulting in the loss of the associations between records.
For more information, see “Relationship Base Objects” on page 498.

Identifiers for Executing Match Jobs

For a complete list of the identifiers used to execute the stored procedure associated
with this batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on
page 753.

Dependencies for Match Jobs

Each Match job is dependent on new / updated records in the base object that have
been tokenized and are thus queued for matching. For parent base objects that have
children, the Match job is also dependent on the successful completion of the data
tokenization jobs for all child tables, which in turn is dependent on successful Load
jobs for the child tables.

Successful Completion of Match Jobs

Match jobs must complete with a RUN_STATUS of 0 (Completed Successfully) or 1
(Completed with Errors) to be considered successful.


Stored Procedure for Match Jobs


PROCEDURE CMXMA.MATCH (
IN_ROWID_TABLE IN CHAR(14) --Rowid of a table
,IN_USER_NAME IN VARCHAR2(50) --User name
,OUT_ERROR_MSG OUT VARCHAR2(1024) --Error message, if any
,OUT_RETURN_CODE OUT NUMBER --Return code (if no errors, 0 is returned)
,IN_VALIDATE_TABLE_NAME IN VARCHAR2(200) --Validate table name
,IN_MATCH_ANALYZE_IND IN NUMBER --Match analyze to check for match data
)

Sample Job Execution Script for Match Jobs


DECLARE
    IN_ROWID_TABLE         CHAR(14);
    IN_USER_NAME           VARCHAR2(50);
    OUT_ERROR_MSG          VARCHAR2(1024);
    RC                     NUMBER;
    IN_VALIDATE_TABLE_NAME VARCHAR2(30);
    IN_MATCH_ANALYZE_IND   NUMBER;
    IN_MATCH_SET_NAME      VARCHAR2(500);
    IN_JOB_GRP_CTRL        CHAR(14);
    IN_JOB_GRP_ITEM        CHAR(14);
BEGIN
    IN_ROWID_TABLE         := NULL;
    IN_USER_NAME           := NULL;
    OUT_ERROR_MSG          := NULL;
    RC                     := NULL;
    IN_VALIDATE_TABLE_NAME := NULL;
    IN_MATCH_ANALYZE_IND   := NULL;
    IN_MATCH_SET_NAME      := NULL;
    IN_JOB_GRP_CTRL        := NULL;
    IN_JOB_GRP_ITEM        := NULL;

    CMXMA.MATCH ( IN_ROWID_TABLE, IN_USER_NAME, OUT_ERROR_MSG, RC,
        IN_VALIDATE_TABLE_NAME, IN_MATCH_ANALYZE_IND, IN_MATCH_SET_NAME,
        IN_JOB_GRP_CTRL, IN_JOB_GRP_ITEM );
    DBMS_OUTPUT.Put_Line('OUT_ERROR_MSG = ' || OUT_ERROR_MSG);
    DBMS_OUTPUT.Put_Line('RC = ' || TO_CHAR(RC));
    COMMIT;
END;


Match Analyze Jobs


Match Analyze jobs perform a search to gather metrics about matching without
conducting any actual matching. Match Analyze jobs are typically used to fine-tune
match rules. For more information, see “Match Analyze Jobs” on page 738.

Identifiers for Executing Match Analyze Jobs

For a complete list of the identifiers used to execute the stored procedure associated
with this batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on
page 753.

Dependencies for Match Analyze Jobs

Each Match Analyze job is dependent on new / updated records in the BO that have
been tokenized and are thus queued for matching. For parent BOs, the Match Analyze
job is also dependent on the successful completion of the data tokenization jobs for all
child tables, which in turn is dependent on successful Load jobs for the child tables.

Successful Completion of Match Analyze Jobs

Match Analyze jobs must complete with a RUN_STATUS of 0 (Completed
Successfully) or 1 (Completed with Errors) to be considered successful.

Stored Procedure for Match Analyze Jobs


PROCEDURE CMXMA.MATCH (
IN_ROWID_TABLE IN CHAR(14) --Rowid of a table
,IN_USER_NAME IN VARCHAR2(200) --User name
,OUT_ERROR_MSG OUT VARCHAR2(2000) --Error message, if any
,OUT_RETURN_CODE OUT NUMBER --Return code (if no errors, 0 is returned)
,IN_VALIDATE_TABLE_NAME IN VARCHAR2(200) --Validate table name
,IN_MATCH_ANALYZE_IND IN NUMBER --Match analyze to check for match data
)


Sample Job Execution Script for Match Analyze Jobs


DECLARE
    IN_ROWID_TABLE         CHAR(14);
    IN_USER_NAME           VARCHAR2(200);
    OUT_ERROR_MSG          VARCHAR2(2000);
    OUT_RETURN_CODE        NUMBER;
    IN_VALIDATE_TABLE_NAME VARCHAR2(200);
    IN_MATCH_ANALYZE_IND   NUMBER;
BEGIN
    IN_ROWID_TABLE         := NULL;
    IN_USER_NAME           := NULL;
    OUT_ERROR_MSG          := NULL;
    OUT_RETURN_CODE        := NULL;
    IN_VALIDATE_TABLE_NAME := NULL;
    IN_MATCH_ANALYZE_IND   := 1; -- 1 runs the match in analyze mode

    CMXMA.MATCH ( IN_ROWID_TABLE, IN_USER_NAME, OUT_ERROR_MSG,
        OUT_RETURN_CODE, IN_VALIDATE_TABLE_NAME, IN_MATCH_ANALYZE_IND );
    DBMS_OUTPUT.Put_Line('OUT_ERROR_MSG = ' || OUT_ERROR_MSG);
    DBMS_OUTPUT.Put_Line('OUT_RETURN_CODE = ' || TO_CHAR(OUT_RETURN_CODE));
    COMMIT;
END;

Match for Duplicate Data Jobs


A Match for Duplicate Data job searches for exact duplicates to consider them
matched. Use it to manually run the Match for Duplicate Data process when you want
to use your own rule as the match for duplicates criteria instead of all the columns in
the base object. The maximum number of exact duplicates is based on the base object
columns defined in the Duplicate Match Threshold property in the Schema Manager
for each base object. To learn more, see “Match for Duplicate Data Jobs” on page 737.

Note: The Match for Duplicate Data batch job has been deprecated.

Identifiers for Executing Match for Duplicate Jobs

To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.


Dependencies for Match for Duplicate Data Jobs

Match for Duplicate Data jobs require the existence of unconsolidated data in the BO.

Successful Completion of Match for Duplicate Data Jobs

Match for Duplicate Data jobs must complete with a RUN_STATUS of 0 (Completed
Successfully).

Stored Procedure Definition for Match for Duplicate Data Jobs


PROCEDURE CMXMA.MATCH_FOR_DUPS (
IN_ROWID_TABLE IN CHAR(14) --Rowid of a table
,IN_USER_NAME IN VARCHAR2(200) --User name
,OUT_ERROR_MSG OUT VARCHAR2(2000) --Error message, if any
,OUT_RETURN_CODE OUT INT --Return code (if no errors, 0 is returned)
)

Sample Job Execution Script for Match for Duplicate Data Jobs
DECLARE
    IN_ROWID_TABLE  CHAR(14);
    IN_USER_NAME    VARCHAR2(200);
    OUT_ERROR_MSG   VARCHAR2(2000);
    OUT_RETURN_CODE NUMBER;
BEGIN
    IN_ROWID_TABLE  := NULL;
    IN_USER_NAME    := NULL;
    OUT_ERROR_MSG   := NULL;
    OUT_RETURN_CODE := NULL;

    CMXMA.MATCH_FOR_DUPS ( IN_ROWID_TABLE, IN_USER_NAME, OUT_ERROR_MSG,
        OUT_RETURN_CODE);
    DBMS_OUTPUT.Put_Line('OUT_ERROR_MSG = ' || OUT_ERROR_MSG);
    DBMS_OUTPUT.Put_Line('OUT_RETURN_CODE = ' || TO_CHAR(OUT_RETURN_CODE));
    COMMIT;
END;


Multi Merge Jobs


Multi Merge jobs allow the merge of multiple records in a single job. This batch job is
initiated only by external applications that invoke the SIF MultiMergeRequest. For
more information, see Siperian Services Integration Framework Guide.

The Multi Merge stored procedure:


• calls group_merge based on the incoming list of base object records
• uses PUT_XREF to process user-selected winning values (new XREF record from
CMX Admin + merge lineage) into the base object record

When executing the Multi Merge stored procedure:

• The rowid_objects in IN_MEMBER_ROWID_LIST are merged into
IN_SURVIVING_ROWID, and the column values provided in IN_VAL_LIST become
the base object’s winning cell values. Values are delimited by ~. For example:
val1~val2~val3~
• The first rowid_object in IN_MEMBER_ROWID_LIST will be selected as the
surviving rowid_object if the IN_SURVIVING_ROWID is not provided.
• If IN_MEMBER_ROWID_LIST is NULL, IN_SURVIVING_ROWID will be
considered as group_id in the link table. In this case, all active member
rowid_objects belonging to this group_id will be merged into
IN_SURVIVING_ROWID.
• Values in the IN_MEMBER_ROWID_LIST, IN_COL_LIST, and IN_VAL_LIST
parameters are delimited by ‘~’. For example: value1~value2~value3~
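
The delimited format described above, with a ~ after every value including the last, can be built from a collection in a few lines. The following is a minimal, self-contained sketch. The collection type T_VALUES and the variable names are illustrative assumptions and are not part of the Siperian API.

```plsql
DECLARE
    -- Illustrative collection of values to pass in one list parameter.
    TYPE t_values IS TABLE OF VARCHAR2(4000);
    v_values t_values := t_values('val1', 'val2', 'val3');
    v_list   VARCHAR2(32000);
BEGIN
    -- Append each value followed by the ~ delimiter, so the finished
    -- list ends with a trailing ~ as in the examples above.
    FOR i IN 1 .. v_values.COUNT LOOP
        v_list := v_list || v_values(i) || '~';
    END LOOP;

    DBMS_OUTPUT.PUT_LINE(v_list);  -- prints val1~val2~val3~
END;
/
```

The same loop can be used to build IN_MEMBER_ROWID_LIST, IN_COL_LIST, and IN_VAL_LIST, provided that the entries in IN_COL_LIST and IN_VAL_LIST are kept in the same order.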

Identifiers for Executing Multi Merge Jobs

To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.

Dependencies for Multi Merge Jobs

Each Multi Merge job is dependent on the successful completion of the match process
for this base object.


Successful Completion of Multi Merge Jobs

Multi Merge jobs must complete with a RUN_STATUS of 0 (Completed Successfully)
or 1 (Completed with Errors) to be considered successful.

Stored Procedure Definition for Multi Merge Jobs


PROCEDURE CMXMM.MULTI_MERGE (
IN_ROWID_TABLE IN CHAR(14)
,IN_SURVIVING_ROWID IN CHAR(14)
,IN_MEMBER_ROWID_LIST IN VARCHAR2(32000) --delimited by '~'
,IN_ROWID_MATCH_RULE IN CHAR(14)
,IN_COL_LIST IN VARCHAR2(32000) --delimited by '~'
,IN_VAL_LIST IN VARCHAR2(32000) --delimited by '~'
,IN_INTERACTION_ID IN INT
,IN_USER_NAME IN VARCHAR2(50)
,OUT_ERROR_MESSAGE OUT VARCHAR2(1024)
,OUT_RETURN_CODE OUT INT
)

Sample Job Execution Script for Multi Merge Jobs


DECLARE
    IN_ROWID_TABLE       CHAR(14);
    IN_SURVIVING_ROWID   CHAR(14);
    IN_MEMBER_ROWID_LIST VARCHAR2(4000);
    IN_ROWID_MATCH_RULE  VARCHAR2(4000);
    IN_COL_LIST          VARCHAR2(4000);
    IN_VAL_LIST          VARCHAR2(4000);
    IN_INTERACTION_ID    NUMBER;
    IN_USER_NAME         VARCHAR2(200);
    OUT_ERROR_MESSAGE    VARCHAR2(1024); -- sized to match the procedure definition
    OUT_RETURN_CODE      NUMBER;
BEGIN
    IN_ROWID_TABLE       := 'SVR1.CP4 ';
    IN_SURVIVING_ROWID   := '40 ';
    IN_MEMBER_ROWID_LIST := '42 ~44 ~45 ~47 ~48 ~49 ~';
    IN_ROWID_MATCH_RULE  := NULL;
    IN_COL_LIST          := 'SVR1.CSB ~SVR1.CSE ~SVR1.CSG ~SVR1.CSH ~SVR1.CSA ~';
    IN_VAL_LIST          := 'INDU~THOMAS~11111111111~F~1000~';
    IN_INTERACTION_ID    := 0;
    IN_USER_NAME         := 'INDU';
    OUT_ERROR_MESSAGE    := NULL;
    OUT_RETURN_CODE      := NULL;

    CMXMM.MULTI_MERGE ( IN_ROWID_TABLE, IN_SURVIVING_ROWID,
        IN_MEMBER_ROWID_LIST, IN_ROWID_MATCH_RULE, IN_COL_LIST, IN_VAL_LIST,
        IN_INTERACTION_ID, IN_USER_NAME, OUT_ERROR_MESSAGE, OUT_RETURN_CODE);
    DBMS_OUTPUT.Put_Line('OUT_ERROR_MESSAGE = ' || OUT_ERROR_MESSAGE);
    DBMS_OUTPUT.Put_Line('OUT_RETURN_CODE = ' || TO_CHAR(OUT_RETURN_CODE));
    COMMIT;
END;

Promote Jobs
For state-enabled objects, a Promote job reads the PROMOTE_IND column from an
XREF table and, for all rows where the column’s value is 1, changes the state to
ACTIVE. Siperian Hub resets PROMOTE_IND after the Promote job has run. For more
information regarding how to manage the state of base object or XREF records, refer
to “About State Management in Siperian Hub” on page 206.

Note: The PROMOTE_IND column on a record is not changed to 0 during the
Promote batch process if the record is not promoted.

Stored Procedure Definition for Promote Jobs


PROCEDURE CMXSM.AUTO_PROMOTE(
IN_ROWID_TABLE IN CHAR(14)
,IN_USER_NAME IN VARCHAR2(50)
,OUT_ERROR_MESSAGE OUT VARCHAR2(1024)
,OUT_RETURN_CODE OUT INT
,IN_ROWID_JOB_GRP_CTRL IN CHAR(14) DEFAULT NULL
,IN_ROWID_JOB_GRP_ITEM IN CHAR(14) DEFAULT NULL
)
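
Sample Job Execution Script for Promote Jobs (illustrative sketch)

The following script is a sketch modeled on the other sample scripts in this section, using only the CMXSM.AUTO_PROMOTE definition shown above. The base object name C_CUSTOMER and the user name 'ADMIN' are assumptions. The two job group parameters are omitted so that their NULL defaults apply.

```plsql
DECLARE
    V_ROWID_TABLE     CHAR( 14 );
    OUT_ERROR_MESSAGE VARCHAR2( 1024 );
    OUT_RETURN_CODE   INT;
BEGIN
    -- Look up the rowid of the state-enabled base object
    -- (C_CUSTOMER is an assumed table name).
    SELECT ROWID_TABLE
    INTO V_ROWID_TABLE
    FROM C_REPOS_TABLE
    WHERE TABLE_NAME = 'C_CUSTOMER';

    CMXSM.AUTO_PROMOTE( V_ROWID_TABLE, 'ADMIN', OUT_ERROR_MESSAGE,
        OUT_RETURN_CODE );
    DBMS_OUTPUT.PUT_LINE( 'RETURN MESSAGE: ' ||
        SUBSTR( OUT_ERROR_MESSAGE, 1, 255 ));
    DBMS_OUTPUT.PUT_LINE( 'RETURN CODE: ' || OUT_RETURN_CODE );
    COMMIT;
END;
/
```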


Recalculate BO Jobs
There are two versions of Recalculate BO:
• Using the ROWID_OBJECT_TABLE Parameter—Recalculates all BOs
identified by ROWID_OBJECT column in the table/inline view (note that
brackets are required around inline view).
• Without the ROWID_OBJECT_TABLE Parameter—Recalculates all records
in the BO, in batches of MATCH_BATCH_SIZE or 1/4 the number of the
records in the table, whichever is less.
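
To illustrate the first version, the sketch below passes an inline view, enclosed in brackets, in place of a table name. It relies on the CMXBV.RECALCULATE_BO definition given in this section. The base object C_CUSTOMER, the rowid values in the inline view, and the user name 'ADMIN' are assumptions, and the exact form of inline view that the procedure accepts should be confirmed against your installation.

```plsql
DECLARE
    OUT_TMP_TABLE_LIST VARCHAR2( 32000 );
    OUT_ERROR_MESSAGE  VARCHAR2( 1024 );
    OUT_RETURN_CODE    NUMBER;
BEGIN
    CMXBV.RECALCULATE_BO(
        'C_CUSTOMER',
        -- Inline view: note the enclosing brackets. The rowid values
        -- are placeholders.
        '(SELECT ROWID_OBJECT FROM C_CUSTOMER WHERE ROWID_OBJECT IN (''1'', ''2''))',
        'ADMIN',
        OUT_TMP_TABLE_LIST,
        OUT_ERROR_MESSAGE,
        OUT_RETURN_CODE );
    DBMS_OUTPUT.PUT_LINE( ' RETURN CODE = ' || OUT_RETURN_CODE );
    DBMS_OUTPUT.PUT_LINE( ' MESSAGE IS = ' || OUT_ERROR_MESSAGE );
    COMMIT;
END;
/
```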

Stored Procedure Definition for Recalculate BO Jobs

Note: If you include the ROWID_OBJECT_TABLE parameter, the Recalculate BO
batch job recalculates all BOs identified by ROWID_OBJECT column in the
table/inline view. If you do not include the parameter, this batch job recalculates all
records in the BO, in batches of MATCH_BATCH_SIZE or 1/4 the number of the
records in the table, whichever is less.

PROCEDURE CMXBV.RECALCULATE_BO(
IN_TABLE_NAME IN VARCHAR2(128)
,IN_ROWID_OBJECT_TABLE IN VARCHAR2(128)
,IN_USER_NAME IN VARCHAR2(50)
,OUT_TMP_TABLE_LIST OUT VARCHAR2(32000)
,OUT_ERROR_MESSAGE OUT VARCHAR2(1024)
,OUT_RETURN_CODE OUT INT
)

Sample Job Execution Script for Recalculate BO Jobs


DECLARE
    OUT_TMP_TABLE_LIST VARCHAR2( 32000 ); -- required by the procedure definition
    OUT_ERROR_MESSAGE  VARCHAR2( 1024 );
    OUT_RETURN_CODE    NUMBER;
BEGIN
    -- Rebuild the list of ROWID_OBJECT values to recalculate
    DELETE TEST_RECALC_BO;
    INSERT INTO TEST_RECALC_BO
        SELECT ROWID_OBJECT
        FROM C_CUSTOMER;

    CMXBV.RECALCULATE_BO( 'C_CUSTOMER', 'TEST_RECALC_BO', 'TNEFF',
        OUT_TMP_TABLE_LIST, OUT_ERROR_MESSAGE, OUT_RETURN_CODE );
    COMMIT;
    DBMS_OUTPUT.PUT_LINE( ' RETURN CODE = ' || OUT_RETURN_CODE );
    DBMS_OUTPUT.PUT_LINE( ' MESSAGE IS = ' || OUT_ERROR_MESSAGE );
END;

Recalculate BVT Jobs


Recalculate BVT jobs recalculate the BVT for the specified ROWID_OBJECT.

Stored Procedure Definition for Recalculate BVT Jobs


PROCEDURE CMXBV.RECALCULATE_BVT(
IN_TABLE_NAME IN VARCHAR2(128)
,IN_ROWID_OBJECT IN CHAR(14)
,IN_USER_NAME IN VARCHAR2(50)
,OUT_TMP_TABLE_LIST OUT VARCHAR2(32000)
,OUT_ERROR_MESSAGE OUT VARCHAR2(1024)
,OUT_RETURN_CODE OUT INT
)
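
Sample Job Execution Script for Recalculate BVT Jobs (illustrative sketch)

The following script is a sketch modeled on the Recalculate BO sample, using only the CMXBV.RECALCULATE_BVT definition shown above. The base object C_CUSTOMER, the ROWID_OBJECT value, and the user name 'ADMIN' are placeholder assumptions.

```plsql
DECLARE
    V_ROWID_OBJECT     CHAR( 14 ) := '1';  -- placeholder ROWID_OBJECT
    OUT_TMP_TABLE_LIST VARCHAR2( 32000 );
    OUT_ERROR_MESSAGE  VARCHAR2( 1024 );
    OUT_RETURN_CODE    NUMBER;
BEGIN
    -- Recalculate the BVT for a single base object record.
    CMXBV.RECALCULATE_BVT( 'C_CUSTOMER', V_ROWID_OBJECT, 'ADMIN',
        OUT_TMP_TABLE_LIST, OUT_ERROR_MESSAGE, OUT_RETURN_CODE );
    DBMS_OUTPUT.PUT_LINE( ' RETURN CODE = ' || OUT_RETURN_CODE );
    DBMS_OUTPUT.PUT_LINE( ' MESSAGE IS = ' || OUT_ERROR_MESSAGE );
    COMMIT;
END;
/
```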

Reset Batch Group Status Jobs


Reset Batch Group Status jobs (CMXBG.RESET_BATCHGROUP) reset a batch
group. Note that there are two other related batch group stored procedures:
• Execute Batch Group Jobs (CMXBG.EXECUTE_BATCHGROUP)
• Get Batch Group Status Jobs (CMXBG.GET_BATCHGROUP_STATUS)

For more information, see “Stored Procedures for Batch Groups” on page 799.

Reset Links Jobs


Reset Links jobs update the records in the _LINK table to account for changes in the
data. Reset Links jobs are used with link-style base objects only.


Reset Match Table Jobs


The Reset Match Table job is created automatically after you run a Match job when
both of the following conditions exist: records have been updated to
CONSOLIDATION_IND=2, and you then change your match rules, as described in
“Configuring Match Column Rules for Match Rule Sets” on page 542.

Note: This job cannot be run from the Batch Viewer. For more information, see
“Reset Match Table Jobs” on page 744.

Stored Procedure Definition for Reset Match Table Jobs


PROCEDURE CMXMA.RESET_MATCH(
IN_ROWID_TABLE IN CHAR(14)
,IN_USER_NAME IN VARCHAR2(50)
,OUT_ERROR_MSG OUT VARCHAR2(1024)
,RC OUT INT
,IN_JOB_GRP_CTRL IN CHAR(14) DEFAULT NULL
,IN_JOB_GRP_ITEM IN CHAR(14) DEFAULT NULL
)

Sample Job Execution Script for Reset Match Table Jobs


DECLARE
    V_ROWID_TABLE     CHAR( 14 );
    OUT_ERROR_MESSAGE VARCHAR2( 1024 );
    OUT_RETURN_CODE   INTEGER;
BEGIN
    SELECT ROWID_TABLE
    INTO V_ROWID_TABLE
    FROM C_REPOS_TABLE
    WHERE TABLE_NAME = 'C_CUSTOMER';
    CMXMA.RESET_MATCH( V_ROWID_TABLE, 'ADMIN', OUT_ERROR_MESSAGE,
        OUT_RETURN_CODE );
    DBMS_OUTPUT.PUT_LINE( 'RETURN MESSAGE: ' ||
        SUBSTR( OUT_ERROR_MESSAGE, 1, 255 ));
    DBMS_OUTPUT.PUT_LINE( 'RETURN CODE: ' || OUT_RETURN_CODE );
    COMMIT;
END;


Revalidate Jobs
Revalidate jobs execute the validation logic/rules for records that have been modified
since the initial validation during the Load process. You can run Revalidate when
records change after the initial Load process’s validation step. If no records change, no
records are updated. If some records have changed and get caught by the existing
validation rules, the metrics will show the results. Revalidate is executed manually using
the Batch Viewer for base objects. For more information, see “Running Batch Jobs
Using the Batch Viewer Tool” on page 674.

Note: Revalidate can only be run after an initial load and prior to merge on base
objects that have validation rules set up.

Stored Procedure Definition for Revalidate Jobs


PROCEDURE CMXUT.REVALIDATE_BO(
IN_TABLE_NAME IN CMXUT.CMX_OBJECT_NAME
,OUT_ERROR_MSG OUT CMXUT.CMX_MESSAGE
,RC OUT INT
)

Sample Job Execution Script for Revalidate Jobs


DECLARE
    IN_TABLE_NAME     VARCHAR2(200);
    OUT_ERROR_MESSAGE VARCHAR2(200);
    RC                NUMBER;
BEGIN
    IN_TABLE_NAME     := UPPER('&TBL');
    OUT_ERROR_MESSAGE := NULL;
    RC                := NULL;

    CMXUT.REVALIDATE_BO(IN_TABLE_NAME, OUT_ERROR_MESSAGE, RC);
    DBMS_OUTPUT.PUT_LINE ( 'OUT_ERROR_MESSAGE= ' ||
        SUBSTR(OUT_ERROR_MESSAGE,1,200) );
    DBMS_OUTPUT.Put_Line('RC = ' || TO_CHAR(RC));
    COMMIT;
END;


Stage Jobs
Stage jobs copy records from a landing table to a staging table. During execution, Stage jobs
optionally cleanse data according to the current cleanse settings. For more information
about Stage jobs and the stage process, see “Stage Jobs” on page 745.

Identifiers for Executing Stage Jobs

To learn about the identifiers used to execute the stored procedure associated with this
batch job, see “Identifiers in the C_REPOS_TABLE_OBJECT_V View” on page 753.

Dependencies for Stage Jobs

Each Stage job is dependent on the successful completion of the Extract, Transform,
Load (ETL) process responsible for loading the landing table used by the Stage job.
There are no dependencies between Stage jobs.

Successful Completion of Stage Jobs

A Stage job must complete with a RUN_STATUS of 0 (Completed Successfully) or 1
(Completed with Errors) to be considered successful. On successful completion of a
Stage job, the Load job for the target staging table can be run, provided that all other
dependencies for the Load job have been met.
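
A job scheduling script can apply this rule before it starts the dependent Load job.
The following is a minimal sketch and not part of the Siperian Hub API: the
STAGE_JOB_SUCCEEDED function and the V_RUN_STATUS variable are hypothetical names, and
how the script obtains the RUN_STATUS of the finished Stage job is deliberately left
out.

```sql
DECLARE
  -- Hypothetical value; a real script would use the RUN_STATUS
  -- reported for the Stage job that just finished.
  V_RUN_STATUS INTEGER := 1;

  -- Returns TRUE for the two statuses that count as successful:
  -- 0 (Completed Successfully) and 1 (Completed with Errors).
  FUNCTION STAGE_JOB_SUCCEEDED( P_RUN_STATUS IN INTEGER ) RETURN BOOLEAN
  IS
  BEGIN
    RETURN P_RUN_STATUS IN ( 0, 1 );
  END STAGE_JOB_SUCCEEDED;
BEGIN
  -- A NULL status evaluates to NULL, so it falls through to the ELSE branch.
  IF STAGE_JOB_SUCCEEDED( V_RUN_STATUS ) THEN
    DBMS_OUTPUT.PUT_LINE( 'Stage job succeeded; the Load job can be run.' );
  ELSE
    DBMS_OUTPUT.PUT_LINE( 'Stage job did not succeed; do not start the Load job.' );
  END IF;
END;
/
```

Keeping the check in one function means the scheduler logic does not repeat the
status values in several places.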

Stored Procedure Definition for Stage Jobs


PROCEDURE CMXCL.START_CLEANSE(
IN_ROWID_TABLE_OBJECT IN VARCHAR2(500) --From the view
,IN_USER_NAME IN VARCHAR2(50)
,OUT_ERROR_MSG OUT VARCHAR2(1024)
,OUT_ERROR_CODE OUT INT
,IN_STG_ROWID_TABLE IN VARCHAR2(500) --rowid_table_object
,IN_RUN_SYNCH IN VARCHAR2(500) --Set to true, else runs asynch
,IN_ROWID_JOB_GRP_CTRL IN CHAR(14) DEFAULT NULL
,IN_ROWID_JOB_GRP_ITEM IN CHAR(14) DEFAULT NULL
)


Sample Job Execution Script for Stage Jobs


DECLARE
IN_STG_ROWID_TABLE VARCHAR2(200);
IN_ROWID_TABLE_OBJECT VARCHAR2(200);
IN_RUN_SYNCH VARCHAR2(200);
OUT_ERROR_MSG VARCHAR2(2000);
OUT_ERROR_CODE NUMBER;
BEGIN
IN_STG_ROWID_TABLE := NULL;
IN_ROWID_TABLE_OBJECT := NULL;
IN_RUN_SYNCH := NULL; -- Set to true to run synchronously; otherwise runs asynchronously
OUT_ERROR_MSG := NULL;
OUT_ERROR_CODE := NULL;

SELECT A.ROWID_TABLE, A.ROWID_TABLE_OBJECT
INTO IN_STG_ROWID_TABLE, IN_ROWID_TABLE_OBJECT
FROM C_REPOS_TABLE_OBJECT_V A, C_REPOS_TABLE B
WHERE A.OBJECT_NAME = 'CMX_CLEANSE.EXE'
AND B.ROWID_TABLE = A.ROWID_TABLE
AND B.TABLE_NAME = 'C_HMO_ADDRESS'
AND A.VALID_IND = 1;

-- Named notation binds each argument to the parameter of the same name
-- in the stored procedure definition, regardless of position.
CMXCL.START_CLEANSE(
IN_ROWID_TABLE_OBJECT => IN_ROWID_TABLE_OBJECT
,IN_USER_NAME => 'ADMIN'
,OUT_ERROR_MSG => OUT_ERROR_MSG
,OUT_ERROR_CODE => OUT_ERROR_CODE
,IN_STG_ROWID_TABLE => IN_STG_ROWID_TABLE
,IN_RUN_SYNCH => IN_RUN_SYNCH
);
DBMS_OUTPUT.PUT_LINE('OUT_ERROR_MSG = ' || OUT_ERROR_MSG);
DBMS_OUTPUT.PUT_LINE('OUT_ERROR_CODE = ' || TO_CHAR(OUT_ERROR_CODE));
COMMIT;
END;

Synchronize Jobs
You must run the Synchronize job after any changes are made to the schema trust
settings. The Synchronize job is created when any changes are made to the schema
trust settings, as described in “Batch Jobs That Are Created When Changes Occur” on
page 673. For more information, see “Configuring Trust for Source Systems” on page
455.


Running Synchronize Jobs

To run the Synchronize job, navigate to the Batch Viewer, find the correct Synchronize
job for the base object, and run it. Siperian Hub updates the metadata for the base
objects that have trust enabled after initial load has occurred. For more information,
see “Synchronize Jobs” on page 747.

Stored Procedure Definition for Synchronize Jobs


PROCEDURE CMXUT.SYNC(
IN_ROWID_TABLE IN CHAR(14)
,IN_USER_NAME IN VARCHAR2(50)
,OUT_ERROR_MSG OUT VARCHAR2(1024)
,OUT_RETURN_CODE OUT INT
,IN_JOB_GRP_CTRL IN CHAR(14) DEFAULT NULL
,IN_JOB_GRP_ITEM IN CHAR(14) DEFAULT NULL
)

Sample Job Execution Script for Synchronize Jobs


DECLARE
V_ROWID_TABLE CHAR( 14 );
OUT_ERROR_MESSAGE VARCHAR2( 1024 );
OUT_RETURN_CODE INTEGER;
BEGIN
SELECT ROWID_TABLE
INTO V_ROWID_TABLE
FROM C_REPOS_TABLE
WHERE TABLE_NAME = 'C_CUSTOMER';

CMXUT.SYNC( V_ROWID_TABLE, 'ADMIN', OUT_ERROR_MESSAGE,
OUT_RETURN_CODE );
DBMS_OUTPUT.PUT_LINE( 'RETURN MESSAGE: ' || SUBSTR(
OUT_ERROR_MESSAGE, 1, 255 ));
DBMS_OUTPUT.PUT_LINE( 'RETURN CODE: ' || OUT_RETURN_CODE );
COMMIT;
END;


Executing Batch Groups Using Stored Procedures


This section describes how to execute batch groups for your Siperian Hub
implementation.

About Executing Batch Groups


A batch group is a collection of individual batch jobs (for example, Stage, Load, and
Match jobs) that can be executed with a single command. Depending on the
configuration, some jobs run sequentially and some run in parallel. When one job has
an error, the group stops: no more jobs are started, but jobs that are already running
run to completion. To learn important background information about batch groups, see
“Running Batch Jobs Using the Batch Group Tool” on page 688.

This section describes how to execute batch groups using stored procedures and job
scheduling software (such as Tivoli, CA Unicenter, and so on). Siperian Hub provides
stored procedures for managing batch groups, as described in “Stored Procedures for
Batch Groups” on page 799. Siperian Hub also allows you to create and run custom
stored procedures for batch groups, as described in “Developing Custom Stored
Procedures for Batch Jobs” on page 806. You can also create and run stored
procedures using the SIF API (using Java, SOAP, or HTTP/XML).

You can also use the Batch Group tool in the Hub Console to configure and run batch
groups. However, to schedule batch groups, you need to do so using stored procedures,
as described in this section. For more information about the Batch Group tool, see
“Running Batch Jobs Using the Batch Group Tool” on page 688.

Note: If a batch group fails and you do not click either the Set to Restart button (see
“Restarting a Batch Group That Failed Execution” on page 707) or the Set to
Incomplete button (see “Handling Incomplete Batch Group Execution” on page 708)
in the Logs for My Batch Group list, Siperian Hub restarts the batch job from the prior
failed level.


Stored Procedures for Batch Groups


Siperian Hub provides the following stored procedures for managing batch groups:

Stored Procedure                Description
CMXBG.EXECUTE_BATCHGROUP        Performs an HTTP POST to the SIF
                                ExecuteBatchGroupRequest. For more information,
                                see “CMXBG.EXECUTE_BATCHGROUP” on page 799.
CMXBG.RESET_BATCHGROUP          Performs an HTTP POST to the SIF
                                ResetBatchGroupRequest. For more information,
                                see “CMXBG.RESET_BATCHGROUP” on page 802.
CMXBG.GET_BATCHGROUP_STATUS     Performs an HTTP POST to the SIF
                                GetBatchGroupStatusRequest. For more information,
                                see “CMXBG.GET_BATCHGROUP_STATUS” on page 803.

In addition to using parameters that are associated with the corresponding SIF request,
these stored procedures require the following parameters:
• URL of the Hub Server (for example, http://localhost:7001/cmx/request)
• username and password
• target ORS

Note: These stored procedures construct an XML message, perform an HTTP POST
to a server URL using SIF, and return the results.

CMXBG.EXECUTE_BATCHGROUP

Execute Batch Group jobs execute a batch group. They can execute asynchronously, but
this stored procedure cannot receive a JMS response for asynchronous execution. If you
need to use asynchronous execution and need to know when execution is finished, poll
with the CMXBG.GET_BATCHGROUP_STATUS stored procedure. Alternatively, if you need to
receive a JMS response for asynchronous execution, execute the batch group directly
from an external application (instead of a job execution script) by invoking the SIF
ExecuteBatchGroup request, which is described in the Siperian Services Integration
Framework Guide.

Signature
FUNCTION CMXBG.EXECUTE_BATCHGROUP(
IN_MRM_SERVER_URL IN VARCHAR2(500)
, IN_USERNAME IN VARCHAR2(500)
, IN_PASSWORD IN VARCHAR2(500)
, IN_ORSID IN VARCHAR2(500)
, IN_BATCHGROUP_UID IN VARCHAR2(500)
, IN_RESUME IN VARCHAR2(500)
, IN_ASYNCRONOUS IN VARCHAR2(500)
, OUT_ROWID_BATCHGROUP_LOG OUT VARCHAR2(500)
, OUT_ERROR_MSG OUT VARCHAR2(500)
) RETURN NUMBER --Return the error code

Parameters

Name                  Description
IN_MRM_SERVER_URL     Hub Server SIF URL.
IN_USERNAME           User account with role-based permissions to execute batch groups.
IN_PASSWORD           Password for the user account with role-based permissions to
                      execute batch groups.
IN_ORSID              ORS ID as shown in Console > Configuration > Databases.
                      To learn more, see “Configuring Operational Record Stores” on
                      page 62.
IN_BATCHGROUP_UID     Siperian Object UID of the batch group to execute.
IN_RESUME             One of the following values:
                      • true: if the previous execution failed, resume at that point
                      • false: regardless of previous execution, start from the beginning
IN_ASYNCRONOUS        Specifies whether to execute asynchronously or synchronously. One
                      of the following values:
                      • true: start execution and return immediately (asynchronous
                        execution).
                      • false: return when group execution is complete (synchronous
                        execution).


Returns

Parameter                  Description
OUT_ROWID_BATCHGROUP_LOG   c_repos_job_group_control.rowid_job_group_control
OUT_ERROR_MSG              Error message text.
NUMBER                     Error code. If zero (0), then the stored procedure completed
                           successfully. If one (1), then the stored procedure returns an
                           explanation in out_error_msg.

Sample Job Execution Script for Execute Batch Group Jobs


DECLARE
OUT_ROWID_BATCHGROUP_LOG CMXLB.CMX_SMALL_STR;
OUT_ERROR_MSG CMXLB.CMX_SMALL_STR;
RET_VAL INT;
BEGIN
RET_VAL := CMXBG.EXECUTE_BATCHGROUP(
'http://localhost:7001/cmx/request/process/'
, 'ADMIN'
, 'ADMIN'
, 'LOCALHOST-MRM-XU_3009'
, 'BATCH_GROUP.MYBATCHGROUP'
, 'TRUE' -- OR 'FALSE'
, 'TRUE' -- OR 'FALSE'
, OUT_ROWID_BATCHGROUP_LOG
, OUT_ERROR_MSG
);
CMXLB.DEBUG_PRINT('EXECUTE_BATCHGROUP: ' || ' CODE='|| RET_VAL ||
' MESSAGE='|| OUT_ERROR_MSG ||
' | OUT_ROWID_BATCHGROUP_LOG='|| OUT_ROWID_BATCHGROUP_LOG);
COMMIT;
END;


CMXBG.RESET_BATCHGROUP

Reset Batch Group Status jobs reset a batch group.

Note: In addition to this stored procedure, the same operation is available through the
Services Integration Framework (SIF) using the Java API or the SOAP and HTTP/XML
protocols. The corresponding SIF API request is ResetBatchGroup. For more information
about this SIF API request, see the Siperian Services Integration Framework Guide.

Signature
FUNCTION CMXBG.RESET_BATCHGROUP(
IN_MRM_SERVER_URL IN VARCHAR2(500)
, IN_USERNAME IN VARCHAR2(500)
, IN_PASSWORD IN VARCHAR2(500)
, IN_ORSID IN VARCHAR2(500)
, IN_BATCHGROUP_UID IN VARCHAR2(500)
, OUT_ROWID_BATCHGROUP_LOG OUT VARCHAR2(500)
, OUT_ERROR_MSG OUT VARCHAR2(500)
) RETURN NUMBER --Return the error code

Parameters

Name                  Description
IN_MRM_SERVER_URL     Hub Server SIF URL.
IN_USERNAME           User account with role-based permissions to execute batch groups.
IN_PASSWORD           Password for the user account with role-based permissions to
                      execute batch groups.
IN_ORSID              ORS ID as specified in the Database tool in the Hub Console.
                      To learn more, see “Configuring Operational Record Stores” on
                      page 62.
IN_BATCHGROUP_UID     Siperian Object UID of the batch group to reset.


Returns

Parameter                  Description
OUT_ROWID_BATCHGROUP_LOG   c_repos_job_group_control.rowid_job_group_control
OUT_ERROR_MSG              Error message text.
NUMBER                     Error code. If zero (0), then the stored procedure completed
                           successfully. If one (1), then the stored procedure returns an
                           explanation in out_error_msg.

Sample Job Execution Script for Reset Batch Group Jobs


DECLARE
OUT_ROWID_BATCHGROUP_LOG CMXLB.CMX_SMALL_STR;
OUT_ERROR_MSG CMXLB.CMX_SMALL_STR;
RET_VAL INT;
BEGIN
RET_VAL := CMXBG.RESET_BATCHGROUP(
'http://localhost:7001/cmx/request/process/'
, 'ADMIN'
, 'ADMIN'
,'LOCALHOST-MRM-XU_3009'
, 'BATCH_GROUP.MYBATCHGROUP'
, OUT_ROWID_BATCHGROUP_LOG
, OUT_ERROR_MSG
);
CMXLB.DEBUG_PRINT('RESET_BATCHGROUP: CODE=' || RET_VAL ||
' MESSAGE=' || OUT_ERROR_MSG ||
' OUT_ROWID_BATCHGROUP_LOG=' || OUT_ROWID_BATCHGROUP_LOG);
END;
/

CMXBG.GET_BATCHGROUP_STATUS

Get Batch Group Status jobs return the batch group status.

Note: In addition to this stored procedure, the same operation is available through the
Services Integration Framework (SIF) using the Java API or the SOAP and HTTP/XML
protocols. The corresponding SIF API request is GetBatchGroupStatus. For more
information about this SIF API request, see the Siperian Services Integration
Framework Guide.


Signature
FUNCTION CMXBG.GET_BATCHGROUP_STATUS(
IN_MRM_SERVER_URL IN VARCHAR2(500)
, IN_USERNAME IN VARCHAR2(500)
, IN_PASSWORD IN VARCHAR2(500)
, IN_ORSID IN VARCHAR2(500)
, IN_BATCHGROUP_UID IN VARCHAR2(500)
, IN_ROWID_BATCHGROUP_LOG IN VARCHAR2(500)
, OUT_ROWID_BATCHGROUP OUT VARCHAR2(500)
, OUT_ROWID_BATCHGROUP_LOG OUT VARCHAR2(500)
, OUT_START_RUNDATE OUT VARCHAR2(500)
, OUT_END_RUNDATE OUT VARCHAR2(500)
, OUT_RUN_STATUS OUT VARCHAR2(500)
, OUT_STATUS_MESSAGE OUT VARCHAR2(500)
, OUT_ERROR_MSG OUT VARCHAR2(500)
) RETURN NUMBER --Return the error code

Parameters

Name                      Description
IN_MRM_SERVER_URL         Hub Server SIF URL.
IN_USERNAME               User account with role-based permissions to execute batch
                          groups.
IN_PASSWORD               Password for the user account with role-based permissions to
                          execute batch groups.
IN_ORSID                  ORS ID as specified in the Database tool in the Hub Console.
                          To learn more, see “Configuring Operational Record Stores”
                          on page 62.
IN_BATCHGROUP_UID         Siperian Object UID of the batch group to get the status of.
                          If IN_ROWID_BATCHGROUP_LOG is null, the most recent log for
                          this group is used.
IN_ROWID_BATCHGROUP_LOG   c_repos_job_group_control.rowid_job_group_control
                          Either IN_BATCHGROUP_UID or IN_ROWID_BATCHGROUP_LOG is
                          required.


Returns

Parameter                  Description
OUT_ROWID_BATCHGROUP       c_repos_job_group.rowid_job_group
OUT_ROWID_BATCHGROUP_LOG   c_repos_job_group_control.rowid_job_group_control
OUT_START_RUNDATE          Date / time when this batch job started.
OUT_END_RUNDATE            Date / time when this batch job ended.
OUT_RUN_STATUS             Job execution status code that is displayed in the Batch
                           Group tool. For more information, see “Executing Batch
                           Groups Using the Batch Group Tool” on page 701.
OUT_STATUS_MESSAGE         Job execution status message that is displayed in the Batch
                           Group tool. For more information, see “Executing Batch
                           Groups Using the Batch Group Tool” on page 701.
OUT_ERROR_MSG              Error message text for this stored procedure call, if
                           applicable.
NUMBER                     Error code. If zero (0), then the stored procedure completed
                           successfully. If one (1), then the stored procedure returns an
                           explanation in out_error_msg.

Sample Job Execution Script for Get Batch Group Status Jobs
DECLARE
OUT_ROWID_BATCHGROUP CMXLB.CMX_SMALL_STR;
OUT_ROWID_BATCHGROUP_LOG CMXLB.CMX_SMALL_STR;
OUT_START_RUNDATE CMXLB.CMX_SMALL_STR;
OUT_END_RUNDATE CMXLB.CMX_SMALL_STR;
OUT_RUN_STATUS CMXLB.CMX_SMALL_STR;
OUT_STATUS_MESSAGE CMXLB.CMX_SMALL_STR;
OUT_ERROR_MSG CMXLB.CMX_SMALL_STR;
OUT_RETURNCODE INT;
RET_VAL INT;
BEGIN
RET_VAL := CMXBG.GET_BATCHGROUP_STATUS(
'http://localhost:7001/cmx/request/process/'
, 'ADMIN'
, 'ADMIN'
, 'LOCALHOST-MRM-XU_3009'
, 'BATCH_GROUP.MYBATCHGROUP'
, NULL
, OUT_ROWID_BATCHGROUP
, OUT_ROWID_BATCHGROUP_LOG
, OUT_START_RUNDATE
, OUT_END_RUNDATE
, OUT_RUN_STATUS
, OUT_STATUS_MESSAGE
, OUT_ERROR_MSG
);
CMXLB.DEBUG_PRINT('GET_BATCHGROUP_STATUS: CODE='|| RET_VAL ||
' MESSAGE='|| OUT_ERROR_MSG || ' STATUS=' || OUT_STATUS_MESSAGE ||
' | OUT_ROWID_BATCHGROUP_LOG='|| OUT_ROWID_BATCHGROUP_LOG);
END;
/
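
When a batch group is executed asynchronously, a job execution script can poll
CMXBG.GET_BATCHGROUP_STATUS until the group finishes. The following sketch shows one
way to do this under stated assumptions: the connection values are placeholders
modeled on the samples in this section, the polling interval and limit are arbitrary,
and the test that treats a non-empty OUT_END_RUNDATE as "finished" is an assumption
that this guide does not confirm. The script also assumes that the caller has EXECUTE
privilege on the Oracle DBMS_LOCK package.

```sql
DECLARE
  OUT_ROWID_BATCHGROUP     CMXLB.CMX_SMALL_STR;
  OUT_ROWID_BATCHGROUP_LOG CMXLB.CMX_SMALL_STR;
  OUT_START_RUNDATE        CMXLB.CMX_SMALL_STR;
  OUT_END_RUNDATE          CMXLB.CMX_SMALL_STR;
  OUT_RUN_STATUS           CMXLB.CMX_SMALL_STR;
  OUT_STATUS_MESSAGE       CMXLB.CMX_SMALL_STR;
  OUT_ERROR_MSG            CMXLB.CMX_SMALL_STR;
  RET_VAL                  INT;
  MAX_POLLS    CONSTANT PLS_INTEGER := 60;  -- arbitrary limit on status checks
  POLL_SECONDS CONSTANT PLS_INTEGER := 30;  -- arbitrary wait between checks
BEGIN
  FOR I IN 1 .. MAX_POLLS LOOP
    RET_VAL := CMXBG.GET_BATCHGROUP_STATUS(
        'http://localhost:7001/cmx/request/process/'  -- placeholder URL
      , 'ADMIN'                                       -- placeholder user
      , 'ADMIN'                                       -- placeholder password
      , 'LOCALHOST-MRM-XU_3009'                       -- placeholder ORS ID
      , 'BATCH_GROUP.MYBATCHGROUP'                    -- placeholder group UID
      , NULL                          -- NULL: use the most recent log for the group
      , OUT_ROWID_BATCHGROUP
      , OUT_ROWID_BATCHGROUP_LOG
      , OUT_START_RUNDATE
      , OUT_END_RUNDATE
      , OUT_RUN_STATUS
      , OUT_STATUS_MESSAGE
      , OUT_ERROR_MSG
    );

    -- Stop if the status call itself failed; OUT_ERROR_MSG explains why.
    EXIT WHEN RET_VAL <> 0;

    -- ASSUMPTION: an end date is reported only after the group has finished.
    EXIT WHEN OUT_END_RUNDATE IS NOT NULL;

    DBMS_LOCK.SLEEP( POLL_SECONDS );
  END LOOP;

  DBMS_OUTPUT.PUT_LINE( 'CODE=' || RET_VAL
                        || ' RUN_STATUS=' || OUT_RUN_STATUS
                        || ' STATUS=' || OUT_STATUS_MESSAGE
                        || ' ERROR=' || OUT_ERROR_MSG );
END;
/
```

The loop is bounded so that a scheduler is never blocked indefinitely; if the limit
is reached, the last status values are still printed for inspection.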

Developing Custom Stored Procedures for Batch Jobs

This section describes how to create and register custom stored procedures for batch
jobs that can be added to batch groups for your Siperian Hub implementation.

About Custom Stored Procedures


Siperian Hub allows you to create and run custom stored procedures for batch
jobs. After developing a custom stored procedure, you must register it in order to
make it available to users as a batch job in the Batch Viewer and Batch Group tools in
the Hub Console. For more information about these tools, see “About Siperian
Hub Batch Jobs” on page 658.


Required Execution Parameters for Custom Batch Jobs


The following parameters are required for custom batch jobs. During its execution, a
custom batch job can call cmxut.set_metric_value to register metrics.

Signature
PROCEDURE EXAMPLE_JOB(
IN_ROWID_TABLE_OBJECT IN CHAR(14) --C_REPOS_TABLE_OBJECT.ROWID_TABLE_OBJECT,
                                  --result of CMXUT.REGISTER_CUSTOM_TABLE_OBJECT
,IN_USER_NAME IN VARCHAR2(50) --Username calling the function
,IN_ROWID_JOB IN CHAR(14) --C_REPOS_JOB_CONTROL.ROWID_JOB, for reference;
                          --do not update status
,OUT_ERR_MSG OUT VARCHAR --Message about success or error
,OUT_ERR_CODE OUT INT -- >=0: Completed successfully. <0: Error
)

Parameters
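Name                                       Description
in_rowid_table_object IN cmxlb.cmx_rowid   c_repos_table_object.rowid_table_object
                                           Result of cmxut.REGISTER_CUSTOM_TABLE_OBJECT
in_user_name IN cmxlb.cmx_user_name        User name calling the function.
in_rowid_job IN cmxlb.cmx_rowid            c_repos_job_control.rowid_job, for reference.
                                           Do not update the job status.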

Returns
Parameter Description
out_err_msg Error message text.
out_err_code Error code.


Registering a Custom Stored Procedure


You must register a custom stored procedure with Siperian Hub in order to make it
available to users in the Batch Group tool in the Hub Console. You can register the
same custom job multiple times for different tables (in_rowid_table). To register a
custom stored procedure, call the following stored procedure, which registers it in
c_repos_table_object:
CMXUT.REGISTER_CUSTOM_TABLE_OBJECT

Signature
PROCEDURE REGISTER_CUSTOM_TABLE_OBJECT(
IN_ROWID_TABLE IN CHAR(14)
, IN_OBJ_FUNC_TYPE_CODE IN VARCHAR
, IN_OBJ_FUNC_TYPE_DESC IN VARCHAR
, IN_OBJECT_NAME IN VARCHAR
)

Parameters
Name                               Description
IN_ROWID_TABLE (CMXLB.CMX_ROWID)   Foreign key to c_repos_table.rowid_table.
                                   When the Hub Server calls the custom job in a batch
                                   group, this value is passed in.
IN_OBJ_FUNC_TYPE_CODE              Job type code. Must be 'A' for batch group custom jobs.
IN_OBJ_FUNC_TYPE_DESC              Display name for the custom batch job in the Batch
                                   Groups tool in the Hub Console.
IN_OBJECT_NAME                     package.procedure name of the custom job.

Example
BEGIN
cmxut.REGISTER_CUSTOM_TABLE_OBJECT (
'SVR1.RS1B ' -- c_repos_table.rowid_table
,'A' -- Job type, must be 'A' for batch group
,'CMXBG_EXAMPLE.UPDATE_TABLE EXAMPLE' -- Display name
,'CMXBG_EXAMPLE.UPDATE_TABLE' -- Package.procedure
);
END;
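
Instead of typing the rowid_table value by hand, a registration script can look it up
in C_REPOS_TABLE, in the same way that the job execution scripts earlier in this
chapter do. The following is a minimal sketch under stated assumptions: 'C_CUSTOMER'
is a placeholder table name, the display name and package.procedure values are taken
from the example above, and the COMMIT is added as a precaution because this guide
does not state whether the procedure commits on its own.

```sql
DECLARE
  V_ROWID_TABLE CHAR( 14 );
BEGIN
  -- Look up the ROWID_TABLE of the table that the custom job is registered for.
  SELECT ROWID_TABLE
    INTO V_ROWID_TABLE
    FROM C_REPOS_TABLE
   WHERE TABLE_NAME = 'C_CUSTOMER';          -- placeholder table name

  CMXUT.REGISTER_CUSTOM_TABLE_OBJECT(
      V_ROWID_TABLE                          -- c_repos_table.rowid_table
    , 'A'                                    -- Job type; must be 'A' for batch group
    , 'CMXBG_EXAMPLE.UPDATE_TABLE EXAMPLE'   -- Display name
    , 'CMXBG_EXAMPLE.UPDATE_TABLE'           -- package.procedure
  );
  COMMIT;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    -- The table name is not present in the repository; nothing was registered.
    DBMS_OUTPUT.PUT_LINE( 'No C_REPOS_TABLE entry found for C_CUSTOMER.' );
END;
/
```

To register the same job for several tables, repeat the lookup and the call once per
table name.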


Registering a Custom Index


Your implementation might include user-defined indexes that were created but never
registered in the repository. If you create your own indexes, register them in the
repository. Some batch processes drop and re-create indexes based on the repository
information, so indexes that are not registered are at risk of being dropped.

Example
DECLARE
IN_ROWID_TABLE CHAR(14);
IN_ROWID_COL_LIST VARCHAR2(2000);
IN_USER_NAME VARCHAR2(50);
IN_INDEX_TYPE VARCHAR2(200);
BEGIN
-- rowid_table from c_repos_table where table_name = 'your table name'
IN_ROWID_TABLE := '<ROWID_TABLE>';

-- List of rowid_column values from c_repos_column
-- where rowid_table = '<rowid_table value for your table>'
-- Notes:
-- 1. Trailing spaces in the rowid_column values are significant
-- 2. Separate each rowid_column with a ~ character and end the
--    list with a ~ character, e.g. '123 ~456 ~'
IN_ROWID_COL_LIST := NULL;

-- Your name / identifier; does not have to be a Siperian user name
IN_USER_NAME := NULL;

-- FK, PK, NI (non-unique index), UI (unique index).
-- You should ONLY create and register indexes of type NI.
IN_INDEX_TYPE := NULL;

CMXUT.REGISTER_CUSTOM_INDEX ( IN_ROWID_TABLE, IN_ROWID_COL_LIST,
IN_USER_NAME, IN_INDEX_TYPE );
COMMIT;
END;
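
The column list can also be assembled from the repository rather than typed by hand,
which avoids losing the significant trailing spaces. The following is a minimal
sketch, assuming that C_REPOS_COLUMN exposes a COLUMN_NAME column (this guide does not
confirm that) and using placeholder table, column, and user names. It registers an
index that is assumed to exist already; it does not create the index.

```sql
DECLARE
  V_ROWID_TABLE    CHAR( 14 );
  V_ROWID_COL_LIST VARCHAR2( 2000 );
BEGIN
  SELECT ROWID_TABLE
    INTO V_ROWID_TABLE
    FROM C_REPOS_TABLE
   WHERE TABLE_NAME = 'C_CUSTOMER';                   -- placeholder table name

  -- ASSUMPTION: C_REPOS_COLUMN has a COLUMN_NAME column.
  -- The ORDER BY determines the order of the entries in the list.
  FOR COL IN ( SELECT ROWID_COLUMN
                 FROM C_REPOS_COLUMN
                WHERE ROWID_TABLE = V_ROWID_TABLE
                  AND COLUMN_NAME IN ( 'LAST_NAME', 'FIRST_NAME' )  -- placeholders
                ORDER BY COLUMN_NAME )
  LOOP
    -- Do not trim: trailing spaces in rowid_column are significant.
    -- Every entry, including the last one, is followed by a ~ character.
    V_ROWID_COL_LIST := V_ROWID_COL_LIST || COL.ROWID_COLUMN || '~';
  END LOOP;

  DBMS_OUTPUT.PUT_LINE( 'IN_ROWID_COL_LIST = [' || V_ROWID_COL_LIST || ']' );

  IF V_ROWID_COL_LIST IS NULL THEN
    DBMS_OUTPUT.PUT_LINE( 'No matching columns found; nothing registered.' );
  ELSE
    -- 'ADMIN' is a placeholder identifier; 'NI' is a non-unique index.
    CMXUT.REGISTER_CUSTOM_INDEX( V_ROWID_TABLE, V_ROWID_COL_LIST, 'ADMIN', 'NI' );
    COMMIT;
  END IF;
END;
/
```

Printing the list inside brackets makes the trailing spaces visible when you check
the value before registering.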


Removing Data from a Base Object and Supporting Metadata Tables

Use the CMXUT.CLEAN_TABLE procedure to remove all data from a base object and its
supporting metadata tables. If a base object is referenced by a foreign key in another
base object, then the referencing base object must be empty before you run
cmxut.clean_table for the referenced base object.

Example
DECLARE
IN_TABLE_NAME VARCHAR2(30);
OUT_ERROR_MESSAGE VARCHAR2(1024);
RC NUMBER;
BEGIN
IN_TABLE_NAME := 'C_BO_TO_CLEAN'; --Name of the BO table
OUT_ERROR_MESSAGE := NULL; --Return msg; output parameter
RC := NULL; --Return code; output parameter
CMXUT.CLEAN_TABLE ( IN_TABLE_NAME, OUT_ERROR_MESSAGE, RC );
COMMIT;
END;
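
Because a referencing base object must be empty before the base object it references
can be cleaned, a script that cleans several related base objects needs to process
them in dependency order. The following is a minimal sketch with hypothetical table
names, in which C_ORDER is assumed to have a foreign key to C_CUSTOMER. This guide
does not state which RC values indicate failure, so the sketch only prints the return
code and message for each table.

```sql
DECLARE
  TYPE T_TABLE_NAMES IS TABLE OF VARCHAR2( 30 );
  -- Hypothetical names, listed child first, parent last.
  V_TABLES          T_TABLE_NAMES := T_TABLE_NAMES( 'C_ORDER', 'C_CUSTOMER' );
  OUT_ERROR_MESSAGE VARCHAR2( 1024 );
  RC                NUMBER;
BEGIN
  FOR I IN 1 .. V_TABLES.COUNT LOOP
    OUT_ERROR_MESSAGE := NULL;
    RC := NULL;
    CMXUT.CLEAN_TABLE( V_TABLES( I ), OUT_ERROR_MESSAGE, RC );
    DBMS_OUTPUT.PUT_LINE( V_TABLES( I ) || ': RC = ' || TO_CHAR( RC )
                          || ', MESSAGE = ' || OUT_ERROR_MESSAGE );
  END LOOP;
  COMMIT;
END;
/
```

Keeping the table names in one ordered collection makes the dependency order explicit
and easy to review before the data is removed.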

Writing Messages to Siperian Hub Database Debug Log


Use the CMXLB.DEBUG_PRINT procedure to write your own messages to the Siperian Hub
database debug log file. The message is written to the log only if logging is enabled
and configured correctly; for details, see the Siperian Hub Installation Guide.

Example
DECLARE
IN_DEBUG_TEXT VARCHAR2(32000);
BEGIN
IN_DEBUG_TEXT := NULL; --String that you want to print in the log file
CMXLB.DEBUG_PRINT ( IN_DEBUG_TEXT );
COMMIT;
END;


Example Custom Stored Procedure


CREATE OR REPLACE PACKAGE CMXBG_EXAMPLE
AS

PROCEDURE UPDATE_TABLE(
IN_ROWID_TABLE_OBJECT IN CMXLB.CMX_ROWID
,IN_USER_NAME IN CMXLB.CMX_USER_NAME
,IN_ROWID_JOB IN CMXLB.CMX_ROWID
,OUT_ERR_MSG OUT VARCHAR
,OUT_ERR_CODE OUT INT

);
END CMXBG_EXAMPLE;
/
CREATE OR REPLACE PACKAGE BODY CMXBG_EXAMPLE
AS
PROCEDURE UPDATE_TABLE(
IN_ROWID_TABLE_OBJECT IN CMXLB.CMX_ROWID
,IN_USER_NAME IN CMXLB.CMX_USER_NAME
,IN_ROWID_JOB IN CMXLB.CMX_ROWID
,OUT_ERR_MSG OUT VARCHAR
,OUT_ERR_CODE OUT INT
)
AS
BEGIN
DECLARE
CUTOFF_DATE DATE;
RECORD_COUNT INT;
RUN_STATUS INT;
STATUS_MESSAGE VARCHAR2 (2000);
START_DATE DATE := SYSDATE;
MRM_ROWID_TABLE CMXLB.CMX_ROWID;
OBJ_FUNC_TYPE CHAR (1);
JOB_ID CHAR (14);
SQL_STMT VARCHAR2 (2000);
TABLE_NAME VARCHAR2(30);
RET_CODE INT;
REGISTER_JOB_ERR EXCEPTION;
BEGIN
-- Use a fixed date format so that the date concatenated into the
-- dynamic UPDATE statement below converts back to a DATE predictably.
SQL_STMT :=
'ALTER SESSION SET NLS_DATE_FORMAT=''DD MON YYYY HH24:MI:SS''';
EXECUTE IMMEDIATE SQL_STMT;

CMXUT.DEBUG_PRINT ('START OF CUSTOM BATCH JOB...');
OBJ_FUNC_TYPE := 'A';

-- Table that this custom job was registered for
SELECT ROWID_TABLE
INTO MRM_ROWID_TABLE
FROM C_REPOS_TABLE_OBJECT
WHERE ROWID_TABLE_OBJECT = IN_ROWID_TABLE_OBJECT;

-- Start date of this job run; fall back to one week ago if it is not set
SELECT START_RUN_DATE
INTO CUTOFF_DATE
FROM C_REPOS_JOB_CONTROL
WHERE ROWID_JOB = IN_ROWID_JOB;

IF CUTOFF_DATE IS NULL THEN
CUTOFF_DATE := SYSDATE - 7;
END IF;

SELECT RT.TABLE_NAME
INTO TABLE_NAME
FROM C_REPOS_TABLE RT, C_REPOS_TABLE_OBJECT RTO
WHERE RTO.ROWID_TABLE_OBJECT = IN_ROWID_TABLE_OBJECT
AND RTO.ROWID_TABLE = RT.ROWID_TABLE;

-- THE REAL WORK!
SQL_STMT :=
'UPDATE ' || TABLE_NAME || ' SET ZIP4 = ''0000'', LAST_UPDATE_DATE = '''
|| CUTOFF_DATE
|| ''''
|| ' WHERE ZIP4 IS NULL';
CMXUT.DEBUG_PRINT (SQL_STMT);
EXECUTE IMMEDIATE SQL_STMT;
RECORD_COUNT := SQL%ROWCOUNT;
COMMIT;
-- For testing, sleep to make the procedure take longer
-- dbms_lock.sleep(5);
-- Set zero or many metrics about the job
CMXUT.SET_METRIC_VALUE (IN_ROWID_JOB, 1, RECORD_COUNT,
OUT_ERR_CODE, OUT_ERR_MSG);
COMMIT;

IF RECORD_COUNT <= 0 THEN
OUT_ERR_MSG := 'FAILED TO UPDATE RECORDS.';
OUT_ERR_CODE := -1;
ELSE
IF OUT_ERR_CODE >= 0 THEN
OUT_ERR_MSG := 'COMPLETED SUCCESSFULLY.';
END IF;
-- Else keep the error code and message from set_metric_value
END IF;
EXCEPTION
WHEN OTHERS
THEN
OUT_ERR_CODE := SQLCODE;
OUT_ERR_MSG := SUBSTR (SQLERRM, 1, 200);
END;
END UPDATE_TABLE;
END CMXBG_EXAMPLE;
/



Part 5
Configuring Application Access

Contents
• Chapter 19, “Generating ORS-specific APIs and Message Schemas”
• Chapter 20, “Setting Up Security”
• Chapter 21, “Viewing Registered Custom Code”
• Chapter 22, “Auditing Siperian Hub Services and Events”

Chapter 19: Generating ORS-specific APIs and Message Schemas

This chapter describes how to use the SIF Manager tool to generate ORS-specific APIs
and how to use the JMS Event Schema Manager tool to generate ORS-specific JMS
Event Message objects.

Chapter Contents
• Before You Begin
• Generating ORS-specific APIs
• Generating ORS-specific Message Schemas


Before You Begin


The SIF SDK requires a Java Development Kit (JDK) and the Apache Jakarta Ant
build system. It can build client applications and custom web services, but only for
supported application servers. Refer to the Siperian Hub Release Notes for information
about the specific versions of JDK, Ant, and supported application servers. For more
information about the Siperian SIF SDK, see the Siperian Services Integration Framework
Guide.

Note: Use of the ORS-specific API does not imply that you must use the SIF SDK.
Alternatively, you can use the ORS-specific API as SOAP web services.

Generating ORS-specific APIs


You use the SIF Manager tool to generate and deploy the code to support SIF APIs for
packages, remote packages, mappings, and cleanse functions in an ORS database. Once
generated, the ORS-specific APIs will be available with SiperianClient by using the
client jar and also as a web service. For more information about the SiperianClient, see
the Siperian Services Integration Framework Guide.

About ORS-specific Schemas


The ORS-specific message schema is an XML Schema (XSD) file that defines the
structure of the JMS data change event messages. For more information regarding JMS
event messages, see “JMS Message XML Reference” on page 622.

About the SIF Manager Tool


Use the SIF Manager tool in the Hub Console to produce ORS-specific APIs.


Starting the SIF Manager Tool


To start the SIF Manager tool:
1. In the Hub Console, connect to an Operational Record Store (ORS). To learn
more, see “Changing the Target Database” on page 31.
2. Expand the Siperian Utilities workbench and then click SIF Manager.
The Hub Console displays the SIF Manager tool, as shown in the following
example.


The SIF Manager tool displays the following areas:

Area                    Description
SIF ORS-Specific APIs   Shows the logical name, Java name, WSDL URL, and API
                        generation time for the SIF ORS-specific APIs.
                        Use this function to generate and deploy SIF APIs for
                        packages, remote packages, mappings, and cleanse functions in
                        an ORS database. Once generated, the ORS-specific APIs will
                        be available with SiperianClient by using the client jar and also
                        as a web service. The logical name is used to name the
                        components of the deployment.
Out of Sync Objects     Shows the database objects in the schema that are out of sync
                        with the generated schema.

Generating and Deploying ORS-specific SIF APIs


This operation requires access to a Java compiler on the application server machine.
The Java software development kit (SDK) includes a compiler in tools.jar. The Java
runtime environment (JRE) does not contain a compiler. If the SDK is not available,
you will need to add the tools.jar file to the classpath of the application server.

Note: The following procedure assumes that you have already configured the base
objects and packages of the ORS. If you subsequently change any of these, regenerate
the ORS-specific APIs.

Note: SIF API generation requires at least one secure package, remote package,
cleanse function or mapping.

To produce and use ORS-specific APIs:


1. Start the SIF Manager. To learn more, see “Starting the SIF Manager Tool” on
page 819.
The Hub Console displays the SIF Manager in the right pane.
2. Acquire a write lock.
In order to make any changes to the schema, you must have a write lock. To learn
more, see “Acquiring a Write Lock” on page 30.


3. Enter a value in the Logical Name field.


You can keep the default value, which is the name of the ORS. If you change the
logical name, it must be different from the logical name of any other ORS
registered on this server.
4. Click Generate and Deploy ORS-specific SIF APIs.
SIF Manager generates the APIs. The time this requires depends on the size of the
ORS schema. When the generation is complete, SIF Manager deploys the
ORS-specific APIs and displays their URL. You can use the URL to access the
WSDL descriptions from your development environment.

Note: To prevent running out of heap space for the associated SIF API Javadocs, you
may need to increase the size of the heap. The default heap size is 256M. You can
override this default using the SIF.JVM.HEAP.SIZE parameter.

Renaming ORS-specific SIF APIs

To rename the ORS-specific APIs:


1. Start the SIF Manager. To learn more, see “Starting the SIF Manager Tool” on
page 819.
The Hub Console displays the SIF Manager in the right pane.
2. Acquire a write lock.
In order to make any changes to the schema, you must have a write lock. To learn
more, see “Acquiring a Write Lock” on page 30.
3. Enter a new value in the Logical Name field and save it.
You can keep the default value, which is the name of the ORS. If you change the
logical name, it must be different from the logical name of any other ORS
registered on this server.
4. Click Generate and Deploy ORS-specific SIF APIs.
SIF Manager generates the APIs. The time this requires depends on the size of the
ORS schema. When the generation is complete, SIF Manager deploys the
ORS-specific web services and displays their URL. Note that these are not
required for Java ORS-specific APIs to work. Java and web services ORS-specific
APIs have no dependencies on each other, so you can use one while the other is
not in use.
You can use the resulting URL to access the WSDL descriptions from your
development environment.

Note: To prevent running out of heap space for any associated SIF API Javadocs, you
may need to increase the size of the heap. The default heap size is 256M. You can
override this default using the SIF.JVM.HEAP.SIZE parameter.

Downloading ORS-specific Client JAR Files

You can download the ORS-specific client JAR file at any point after the APIs have
been generated.

To download client JAR files:


1. Start the SIF Manager. To learn more, see “Starting the SIF Manager Tool” on
page 819.
The Hub Console displays the SIF Manager in the right pane.
2. Click Download ORS-specific Client JAR File.
SIF Manager downloads a file called nameClient.jar, where name is the logical name
you provided when you generated the APIs, to a location you specify on your local
machine. The JAR file includes the classes that represent your ORS-specific
configuration and their Javadoc.
Note: Use this JAR file in conjunction with the generic client JAR in the sifsdk
folder.
3. If you are using an integrated development environment (IDE) and have a project
file for building web services, add the JAR file to your build classpath.
4. Modify the SIF SDK build.xml file so that the build_war macro includes the JAR
file. For more information about the SIF SDK, see the Siperian Services Integration
Framework Guide.


Finding Out-of-Sync Objects

The SIF Manager Find Out of Sync Objects function compares the last generated
APIs to the defined objects in the ORS. The SIF Manager reports any differences
between these. If differences are found, the ORS-specific API should be regenerated.

To find the out-of-sync objects:


1. Start the SIF Manager. To learn more, see “Starting the SIF Manager Tool” on
page 819.
The Hub Console displays the SIF Manager in the right pane.
2. Click Find Out of Sync Objects.
The SIF Manager displays all out-of-sync objects in the lower panel.

Note: After you have evaluated the impact of the out-of-sync objects, you can decide
whether to regenerate the schema. External components that interact with the Hub
are typically written to work with a specific version of the generated schema, so
they may no longer work after you regenerate it.

Removing ORS-specific APIs

To remove the ORS-specific APIs:


1. Start the SIF Manager. To learn more, see “Starting the SIF Manager Tool” on
page 819.
The Hub Console displays the SIF Manager in the right pane.
2. Click Remove ORS-specific SIF APIs.

Generating ORS-specific Message Schemas


Siperian Hub now supports two formats for JMS events: the legacy XML format and
the new ORS-specific XML format. By default, the ORS-specific format is used. You
can choose to use the legacy format in the Message Queues tool.

Note: If your Siperian Hub implementation requires that you use the legacy XML
message format (Siperian Hub XU version) instead of the current version of the XML
message format (described in this section), see “Legacy JMS Message XML Reference”
on page 644 instead.

Use the JMS Event Schema Manager tool to generate and deploy ORS-specific JMS
Event Messages for the current ORS. The XML schema for these messages can be
downloaded or accessed using a URL. For more information about JMS Event
Messages, see “JMS Message XML Reference” on page 622.

About the JMS Event Schema Manager Tool


The JMS Event Schema Manager uses an XML schema that defines the message
structure the Hub uses to generate JMS messages. This XML schema is included as
part of the Siperian Hub Resource Kit. (The ORS-specific schema is available using a
URL or can be downloaded as a file.)

Note: JMS Event Schema generation requires at least one secure package or remote
package.

Important: If there are two databases that have the same schema (for example,
CMX_ORS), the logical name (which is the same as the schema name) will be
duplicated for JMS Events when the configuration is initially saved. Because the
database display name is unique, use it as the initial logical name instead of the
schema name; this is also consistent with the SIF APIs. You will need to change the
logical name before generating the schema.

Additionally, each ORS has an XSD file specific to the ORS that uses the elements
from the common XSD file (siperian-mrm-events.xsd). The ORS-specific XSD is
named as <ors-name>-siperian-mrm-event.xsd. The XSD defines two objects for
each package and remote package in the schema:

Object Name           Description
[packageName]Event    Complex type containing elements of type EventMetadata and
                      [packageName].
[packageName]Record   Complex type representing a package and its fields. Also
                      includes an element of type SipMetadata. This complex type
                      resembles the package record structures defined in the Siperian
                      Hub Services Integration Framework (SIF). For more
                      information, refer to the Siperian Services Integration Framework
                      Guide.

Note: If legacy XML event message objects are to be used, ORS-specific message
object generation is not required.
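
The naming conventions described above can be summarized in a short sketch. The function names and the sample ORS and package names are hypothetical and are not part of Siperian Hub; only the XSD file-name pattern and the Event and Record suffixes are taken from this section.

```python
from __future__ import annotations


def ors_xsd_file_name(ors_name: str) -> str:
    """Return the ORS-specific XSD file name for the given ORS name."""
    return f"{ors_name}-siperian-mrm-event.xsd"


def event_schema_type_names(package_name: str) -> tuple[str, str]:
    """Return the two complex type names the XSD defines for one package."""
    return (f"{package_name}Event", f"{package_name}Record")


if __name__ == "__main__":
    # "CMX_ORS" and "PKG_CUSTOMER" are illustrative names only.
    print(ors_xsd_file_name("CMX_ORS"))
    print(event_schema_type_names("PKG_CUSTOMER"))
```

A helper like this may be useful when an external component needs to locate the types for a given package in the generated schema.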

Starting the JMS Event Schema Manager Tool


To start the JMS Event Schema Manager tool:
1. In the Hub Console, connect to an Operational Record Store (ORS). To learn
more, see “Changing the Target Database” on page 31.
2. Expand the Siperian Utilities workbench and then click SIF Manager.
3. Click the JMS Event Schema Manager tab.


The Hub Console displays the JMS Event Schema Manager tool, as shown in the
following example.

The JMS Event Schema Manager tool displays the following areas:

Area                      Description
JMS ORS-specific Event    Shows the event message schema for the ORS.
Message Schema            Use this function to generate and deploy ORS-specific JMS
                          Event Messages for the current ORS. The logical name is used
                          to name the components of the deployment. The schema can
                          be downloaded or accessed using a URL.
                          Note: If legacy XML event message objects are to be used,
                          ORS-specific message object generation is not required.
Out of Sync Objects       Shows the database objects in the schema that are out of sync
                          with the generated API.


Generating and Deploying ORS-specific Schemas


This operation requires access to a Java compiler on the application server machine.
The Java software development kit (SDK) includes a compiler in tools.jar. The Java
runtime environment (JRE) does not contain a compiler. If the SDK is not available,
you will need to add the tools.jar file to the classpath of the application server.
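
As a preliminary check, a short script along the following lines can report whether a Java compiler appears to be present on a machine. This is only a sketch: it assumes the SDK layout in which tools.jar is located in the lib directory under JAVA_HOME, the function names are hypothetical, and it inspects the local environment rather than the classpath that the application server actually uses.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_tools_jar(java_home: Path) -> Optional[Path]:
    """Return the path to lib/tools.jar under java_home, or None if absent."""
    candidate = java_home / "lib" / "tools.jar"
    return candidate if candidate.is_file() else None


def compiler_available(java_home: Optional[Path]) -> bool:
    """Report whether a Java compiler appears to be present on this machine."""
    # A javac executable on the PATH indicates an installed SDK.
    if shutil.which("javac") is not None:
        return True
    # Otherwise, fall back to looking for tools.jar under JAVA_HOME.
    return java_home is not None and find_tools_jar(java_home) is not None


if __name__ == "__main__":
    home = os.environ.get("JAVA_HOME")
    print(compiler_available(Path(home) if home else None))
```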

Note: The following procedure assumes that you have already configured the base
objects, packages, and mappings of the ORS. If you subsequently change any of these,
regenerate the ORS-specific schemas.

Note: JMS Event Schema generation requires at least one secure package or remote
package.

Important: If there are two databases that have the same schema (for example,
CMX_ORS), the logical name (which is the same as the schema name) will be
duplicated for JMS Events when the configuration is initially saved. Because the
database display name is unique, use it as the initial logical name instead of the
schema name; this is also consistent with the SIF APIs. You will need to change the
logical name before generating the schema.

To generate and deploy ORS-specific schemas:


1. Start the JMS Event Schema Manager. To learn more, see “Starting the JMS Event
Schema Manager Tool” on page 825.
The Hub Console displays the JMS Event Schema Manager tool.
2. Enter a value in the Logical Name field for the event schema.
In order to make any changes to the schema, you must have a write lock. To learn
more, see “Acquiring a Write Lock” on page 30.
3. Click Generate and Deploy ORS-specific Schemas.

Note: There must be at least one secure package or remote package configured to
generate the schema. If there are no secure objects to generate, Siperian Hub
generates a runtime error message.


Downloading an XSD File

An XSD file defines the structure of an XML file and can also be used to validate the
XML file. For example, if an XML file contains a reference to an XSD, an XML
validation tool can be used to verify that the tags in the XML conform to the
definitions in the XSD.

To download an XSD file:


1. Start the JMS Event Schema Manager. To learn more, see “Starting the JMS Event
Schema Manager Tool” on page 825.
The Hub Console displays the JMS Event Schema Manager tool.
2. Acquire a write lock.
In order to make any changes to the schema, you must have a write lock. To learn
more, see “Acquiring a Write Lock” on page 30.
3. Click Download XSD File.
Alternatively, you can use the URL specified in the Schema URL field to access the
XSD file.
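
Because an XSD file is itself an XML document, you can inspect a downloaded file with a general-purpose XML parser. The following sketch lists the named complex types declared in a schema. The sample schema fragment and the package name in it are hypothetical, and the script only inspects the schema; it does not validate XML messages against it.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace of the W3C XML Schema language.
XSD_NS = "http://www.w3.org/2001/XMLSchema"


def complex_type_names(xsd_text: str) -> list[str]:
    """Return the names of all named complexType declarations in an XSD."""
    root = ET.fromstring(xsd_text)
    return [
        element.get("name")
        for element in root.iter(f"{{{XSD_NS}}}complexType")
        if element.get("name") is not None
    ]


# Hypothetical fragment for illustration; a real ORS-specific XSD is larger.
SAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="PKG_CUSTOMEREvent"/>
  <xs:complexType name="PKG_CUSTOMERRecord"/>
</xs:schema>
"""

if __name__ == "__main__":
    for name in complex_type_names(SAMPLE_XSD):
        print(name)
```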

Finding Out-of-Sync Objects

You use Find Out Of Sync Objects to determine if the event schema needs to be
re-generated to reflect changes in the system. The JMS Event Schema Manager displays
a list of packages and remote packages that have changed since the last schema
generation.

Note: The Out of Sync Objects function compares the generated APIs to the database
objects in the schema so both must be present to find the out-of-sync objects.

To find the out-of-sync objects:


1. Start the JMS Event Schema Manager. To learn more, see “Starting the JMS Event
Schema Manager Tool” on page 825.
The Hub Console displays the JMS Event Schema Manager tool.
2. Acquire a write lock.
In order to make any changes to the schema, you must have a write lock. To learn
more, see “Acquiring a Write Lock” on page 30.
3. Click Find Out of Sync Objects.
The JMS Event Schema Manager displays all out-of-sync objects in the lower panel.

Note: After you have evaluated the impact of the out-of-sync objects, you can decide
whether to regenerate the schema. External components that interact with the Hub
are typically written to work with a specific version of the generated schema, so
they may no longer work after you regenerate it.

If the JMS Event Schema Manager returns any out-of-sync objects, click Generate and
Deploy ORS-specific Schema to re-generate the event schema. For more
information, see “Generating and Deploying ORS-specific Schemas” on page 827.

Auto-searching for Out-of-Sync Objects

You can configure Siperian Hub to periodically search for out-of-sync objects and
regenerate the schema as needed. This auto-poll feature operates within the data
change monitoring thread, which runs at a fixed interval that you specify, in
milliseconds, using the Message Check Interval in the Message Queues tool. Each
time the monitoring thread is active, this automatic service first checks whether the
out-of-sync interval has elapsed and, if so, performs the out-of-sync check and then
regenerates the event schema as needed.

To configure the Hub to periodically search for out-of-sync objects:


1. Set the logical name of the schema to be generated in the JMS Event Schema
Manager.
For more information, see “Generating and Deploying ORS-specific Schemas” on
page 827.
Note: If you bypass this step, the Hub issues a warning in the server log asking
you to configure the schema generation.
2. Enable the Queue Status for Data Changes Monitoring message. For more
information, see “Configuring Global Message Queue Settings” on page 604.


3. Select the root node Message Queues and set the Out of sync check interval
(milliseconds). For more information, see “Configuring Global Message Queue
Settings” on page 604.
Because the out-of-sync auto-poll feature depends on the Message check
interval, set the Out of sync check interval to a value greater than or
equal to that of the Message check interval.
Note: You can disable the out-of-sync check by setting the Out of sync check
interval to 0.
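
The interaction between the two intervals can be illustrated with a small simulation. This is a sketch of the behavior described in this section, not the Hub's actual implementation; the function name and the sample values are assumptions.

```python
def out_of_sync_check_times(message_check_interval_ms, out_of_sync_interval_ms, duration_ms):
    """Return the times (in ms) at which the out-of-sync check would run."""
    if message_check_interval_ms <= 0:
        raise ValueError("message check interval must be positive")
    if out_of_sync_interval_ms == 0:
        return []  # an interval of 0 disables the out-of-sync check
    run_times = []
    last_check_ms = 0
    # The monitoring thread becomes active once per message check interval.
    for now_ms in range(message_check_interval_ms, duration_ms + 1, message_check_interval_ms):
        # The check runs only if the out-of-sync interval has elapsed.
        if now_ms - last_check_ms >= out_of_sync_interval_ms:
            run_times.append(now_ms)
            last_check_ms = now_ms
    return run_times


if __name__ == "__main__":
    print(out_of_sync_check_times(300, 1000, 3000))
```

In this model, an out-of-sync interval smaller than the message check interval cannot make the check run more often than the monitoring thread itself, which is why a value greater than or equal to the message check interval is recommended.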



20
Setting Up Security

This chapter describes how to set up security for your Siperian Hub implementation
using the Hub Console. To learn how to configure user access to the Hub Console, see
“About User Access to Hub Console Tools” on page 989.

To learn more about configuring security using the Services Integration Framework
(SIF) instead, see the Siperian Services Integration Framework Guide.

Chapter Contents
• About Setting Up Security
• Securing Siperian Hub Resources
• Configuring Roles
• Configuring Siperian Hub Users
• Configuring User Groups
• Assigning Users to the Current ORS Database
• Assigning Roles to Users and User Groups
• Managing Security Providers


About Setting Up Security


This section provides an overview of—and introduction to—Siperian Hub security.

Note: Before you begin, you must have:


• installed Siperian Hub and created the Hub Store according to the instructions in
the Siperian Hub Installation Guide for your platform
• built the schema; for more information, see “About the Schema” on page 82

Siperian Hub Security Concepts


Security is the ability to protect information privacy, confidentiality, and data integrity by
guarding against unauthorized access to, or tampering with, data and other resources in
your Siperian Hub implementation.

Before setting up security for your Siperian Hub implementation, it is important for
you to understand some key concepts.

Security Access Manager

Siperian Hub Security Access Manager (SAM) is Siperian’s comprehensive security
framework for protecting Siperian Hub resources from unauthorized access. At run
time, SAM enforces your organization’s security policy decisions for your Siperian Hub
implementation, handling user authentication and access authorization according to
your security configuration.

Note: SAM security applies primarily to users of third-party applications who want to
gain access to Siperian Hub resources. SAM applies only tangentially to Hub Console
users. The Hub Console has its own security mechanisms to authenticate users and
authorize access to Hub Console tools and resources.

Authentication

Authentication is the process of verifying the identity of a user to ensure that they are
who they claim to be. A user is an individual who wants to access Siperian Hub
resources (see “Configuring Siperian Hub Users” on page 866). In Siperian Hub, users
are authenticated based on their supplied credentials—user name / password, security
payload, or a combination of both.

Siperian Hub supports the following types of authentication:

Authentication Type        Description
Internal                   Siperian Hub’s authentication mechanism in which the user logs
                           in with a user name and password (see “Starting the Hub
                           Console” on page 19)
External Directory         User authentication using an external user directory, with native
                           support for LDAP-enabled directory servers, Microsoft Active
                           Directory, and Kerberos (see “External User Directory” on page
                           837)
External Authentication    User authentication using third-party authentication providers
Providers                  (see “Managing Security Providers” on page 889)
                           When configuring user accounts, you designate
                           externally-authenticated users by checking (selecting) the Use
                           external authentication? check box, as described in “Using
                           External Authentication” on page 872.

Siperian Hub implementations can use each type of authentication exclusively, or they
can use a combination of them. The type of authentication used in your Siperian Hub
implementation depends on how you configure security, as described in “Security
Implementation Scenarios” on page 836.

Authorization

Authorization is the process of determining whether a user has sufficient privileges to
access a requested Siperian Hub resource.

Siperian Hub provides two types of authorization:


• Internal: Siperian Hub’s internal authorization mechanism, in which a user’s access
to secure resources is determined by the privileges associated with any roles that
are assigned to their user account.
• External: Authorization using third-party authorization providers (see “Managing
Security Providers” on page 889)


Siperian Hub implementations can use either type of authorization exclusively, or they
can use a combination of both. The type of authorization used in your Siperian Hub
implementation depends on how you configure security, as described in “Security
Implementation Scenarios” on page 836.

Secure Resources and Privileges

Siperian Hub provides general types of resources that you can configure to be secure
resources: base objects, dependent objects, mappings, packages, remote packages,
cleanse functions, match rule sets, batch groups, metadata, content metadata, Metadata
Manager, HM profiles, the audit table, and the users table. You can configure security
for these resources in a highly granular way, granting access to Siperian Hub resources
according to various privileges (read, create, update, merge, and execute). Resources are
either PRIVATE (the default) or SECURE. Privileges can be granted only to secure
resources. To learn more, see “Securing Siperian Hub Resources” on page 841.

Roles

In Siperian Hub, resource privileges are allocated to roles. A role represents a set of
privileges to access secure Siperian Hub resources (see “Configuring Roles” on page
854). Users and user groups are assigned to roles. A user’s resource privileges are
determined by the roles to which they are assigned, as well as by the roles assigned to
the user group(s) to which the user belongs. Security Access Manager enforces
resource authorization for requests from external application users. Administrators and
data stewards who use the Hub Console to access Siperian Hub resources are less
directly affected by resource privileges (see “Privileges” on page 843).

Access to Hub Console Tools

For users who will be using the Hub Console to access Siperian Hub resources, you
can use the Tool Access tool in the Configuration workbench to control access
privileges to Hub Console tools. For example, data stewards typically have access to
only the Data Manager and Merge Manager tools. To learn more, see “About User
Access to Hub Console Tools” on page 989.


How Users, Roles, Privileges, and Resources Are Related


The following diagram shows how users, roles, privileges, and resources are related in
Siperian Hub’s internal security framework.

When configuring security in Siperian Hub:


• a specific resource is configured to be secure (not private).
• a specific role is configured to have access to one or more secure resources.
• each secure resource is configured with specific privileges (READ, CREATE,
UPDATE, and so on) that define that role’s access to the secure resource.
• a user is assigned one or more roles.


At run time, in order to execute a SIF request, the logged-in user must be assigned a
role that has the required privilege(s) to access the resource(s) involved with the
request. Otherwise, the user’s request will be denied.
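
The relationship among users, roles, privileges, and resources can be modeled in a few lines. The following is a simplified sketch of the run-time check described above, not the Security Access Manager implementation; the class and function names and the sample data are hypothetical, and user groups are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str
    # Maps a resource name to the set of privileges granted on it.
    privileges: dict = field(default_factory=dict)


@dataclass
class User:
    name: str
    roles: list = field(default_factory=list)


def is_authorized(user, resource, privilege, secure_resources):
    """Return True if one of the user's roles grants the privilege on a secure resource."""
    if resource not in secure_resources:
        return False  # PRIVATE resources cannot be accessed by requests
    return any(privilege in role.privileges.get(resource, set()) for role in user.roles)


if __name__ == "__main__":
    reader = Role("reader", {"PKG_CUSTOMER": {"READ"}})
    alice = User("alice", [reader])
    print(is_authorized(alice, "PKG_CUSTOMER", "READ", {"PKG_CUSTOMER"}))
```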

Security Implementation Scenarios


This section describes a range of high-level scenarios in which security can be
configured in Siperian Hub implementations. Policy decision points (PDPs) are specific
security check points that determine, at run-time, the validity of a user’s identity
(authentication), along with that user’s access to Siperian Hub resources
(authorization). These scenarios vary in the degree to which PDPs are handled
internally by Siperian Hub or externally by third-party security providers or other
security services.

Internal-only PDP

The following figure shows a security deployment in which all PDPs are handled
internally by Siperian Hub.

In this scenario, Siperian Hub makes all policy decisions based on how users, groups,
roles, privileges, and resources are configured using the Hub Console.


External User Directory

The following figure shows a security deployment in which Siperian Hub integrates
with an external directory.

In this scenario, the external user directory manages user accounts, groups, and user
profiles. The external user directory is able to authenticate users and to provide
Siperian Hub with group membership and user profile information.

Users or user groups that are maintained in the external user directory must still be
registered in Siperian Hub. Registration is required so that Siperian Hub roles—and
their associated privileges—can be assigned to these users and groups.

Roles-based Centralized PDP

The following figure shows a security deployment where role assignment—in addition
to user accounts, groups, and user profiles—is handled externally to Siperian Hub.


In this scenario, external roles are explicitly mapped to Siperian Hub roles.

Comprehensive Centralized PDP

The following figure shows a security deployment in which role definition and privilege
assignment (in addition to user accounts, groups, user profiles, and role
assignment) are handled externally to Siperian Hub.

In this scenario, Siperian Hub simply exposes the protected resources using external
proxies, which are synchronized with the internally-protected resources using SIF
requests (RegisterUsers, UnregisterUsers, and ListSiperianObjects). All policy decisions
are external to Siperian Hub.

Summary of Security Configuration Tasks


To configure security for a Siperian Hub implementation using Siperian Hub’s internal
security framework, you complete the following minimal tasks using tools in the Hub
Console:
1. Define global password policies for all users according to your organization’s
security policies and procedures. For instructions on using the Users tool to define
global password policies, see “Managing the Global Password Policy” on page 877.
2. Add user accounts for your users. For instructions on using the Users tool to
configure user accounts, see “Configuring Users” on page 869.
3. Provide users with access to the database(s) they need to use. For instructions on
using the Users tool to provide database access, see “Configuring User Access to
ORS Databases” on page 875.


4. Optionally, configure user groups and assign users to them. For
instructions on using the Users and Groups tool to configure user groups, see
“Configuring User Groups” on page 881.
5. Configure secure Siperian Hub resources and (optionally) resource groups. For
instructions on using the Secure Resources tool to configure resources and
resource groups, see “Setting the Status of a Siperian Hub Resource” on page 847.
6. Define roles and assign resource privileges to roles. For instructions on using the
Roles tool to configure roles, see “Configuring Roles” on page 854.
7. Assign roles to users and (optionally) user groups. For instructions on using the
Users and Groups tool to assign roles, see “Assigning Roles to Users and User
Groups” on page 887.
8. For non-administrator users who will interact with Siperian Hub using the Hub
Console, provide them with access to the Hub Console tools that they will need to
use, as described in “About User Access to Hub Console Tools” on page 989.
For example, data stewards typically need access to the Merge Manager and Data
Manager tools (which are described in the Siperian Hub Data Steward Guide).

If you are using external security providers instead to handle any portion of security in
your Siperian Hub implementation, you must configure them in the Hub Console, as
described in “Managing Security Providers” on page 889.

Configuration Tasks For Security Scenarios


The following table shows the security configuration tasks that pertain to each of the
scenarios described in “Security Implementation Scenarios” on page 836. If a cell does
not contain an “X”, then the associated task is handled externally to Siperian Hub.

Service / Task    Internal-only PDP    External User Directory    Roles-based Centralized PDP    Comprehensive Centralized PDP

Users and Groups
“Configuring Siperian Hub Users” on page 866                   X X
“Using External Authentication” on page 872                    X
“Assigning Users to the Current ORS Database” on page 886      X X
“Managing the Global Password Policy” on page 877              X
“Configuring User Groups” on page 881                          X X

Secure Resources
“Securing Siperian Hub Resources” on page 841                  X X X X
“Setting the Status of a Siperian Hub Resource” on page 847    X X X X

Roles
“Configuring Roles” on page 854                                X X X
“Mapping Internal Roles to External Roles” on page 859         X
“Assigning Resource Privileges to Roles” on page 859           X X X

Security Providers
“Managing Security Providers” on page 889                      X X X

Role Assignment
“Assigning Roles to Users and User Groups” on page 887         X X

Note: This document describes how to configure Siperian Hub’s internal security
framework using the Hub Console. If you are using third-party security providers to
handle any portion of security in your Siperian Hub implementation, refer to your
security provider’s configuration instructions instead.


Securing Siperian Hub Resources


This section describes how to configure Siperian Hub resources for your Siperian Hub
implementation. The instructions in this section apply to all scenarios described in
“Security Implementation Scenarios” on page 836.

About Siperian Hub Resources


The Hub Console allows you to expose Siperian Hub resources to, or hide them from,
external applications.

Types of Siperian Hub Resources

The following types of Siperian Hub resources can be configured as secure resources:

Resource Type         Notes
BASE_OBJECT           User has access to all secure base objects, columns, and content
                      metadata. For details, see “Configuring Base Objects” on page
                      92.
CLEANSE_FUNCTION      User can execute all secure cleanse functions. For details, see
                      “Using Cleanse Functions” on page 414.
DEPENDENT_OBJECT      User has access to all secure dependent objects and their
                      columns. For details, see “Configuring Dependent Objects” on
                      page 117.
HM_PROFILE            User has access to all secure HM Profiles. For details, see
                      “Deleting Relationship Types from a Profile” on page 284.
MAPPING               User has access to all secure mappings and their columns.
                      For details, see “Mapping Columns Between Landing and
                      Staging Tables” on page 380.
PACKAGE               User has access to all secure packages and their columns.
                      For details, see “Configuring Packages” on page 196.
REMOTE_PACKAGE        User has access to all secure remote packages. Applicable only to
                      Siperian Hub implementations with an Activity Manager license.


In addition, the Hub Console allows you to protect other resources that are accessible
by SIF requests, including content metadata, match rule sets, metadata, batch groups,
validate metadata, the audit table, and the users table.

Secure and Private Resources

A protected Siperian Hub resource can be configured as either secure or private.

Status Setting   Description
SECURE           Exposes this Siperian Hub resource to the Roles tool, allowing the resource
                 to be added to roles with specific privileges. When a user account is
                 assigned to a specific role, then that user account is authorized to access the
                 secure resources using SIF requests according to the privileges associated
                 with that role.
PRIVATE          Default. Hides this Siperian Hub resource from the Roles tool and prevents
                 access to it using SIF requests. When you add a new resource in Hub Console
                 (such as a new base object), it is designated a PRIVATE resource by default.

In order for external applications to access a Siperian Hub resource using SIF requests,
that resource must be configured as SECURE. Because all Siperian Hub resources are
PRIVATE by default, you must explicitly make a resource SECURE after the resource
has been added.

There are certain Siperian Hub resources that you might not want to expose to external
applications. For example, your Siperian Hub implementation might have mappings or
packages that are used only in batch jobs (not in SIF requests), so these could remain
private.

Note: Package columns are not considered to be secure resources. They inherit the
secure status and privileges from the parent base object (or dependent object) columns.
If package columns are based on system table columns (for example, C_REPOS_AUDIT),
or on columns of tables that are not based on a base object or dependent object (for
example, landing tables), there is no need to set up security for them, since they are
accessible by default.


Privileges

With Siperian Hub internal authorization, each role is assigned one or more of the
following privileges.

Privilege   Allows the User To...
READ        View but not change data.
CREATE      Create data records in the Hub Store.
UPDATE      Update data records in the Hub Store.
MERGE       Merge and unmerge data.
EXECUTE     Execute cleanse functions (see “Using Cleanse Functions” on page 414) and
            batch groups (see “Running Batch Jobs Using the Batch Group Tool” on page
            688).

Privileges determine the access that external application users have to Siperian Hub
resources. For example, a role might be configured to have READ, CREATE,
UPDATE, and MERGE privileges on particular packages.

Note: Each privilege is distinct and must be explicitly assigned. Privileges do not
aggregate other privileges. For example, having UPDATE access to a resource does
not automatically give you READ access to it as well; both privileges must be
individually assigned.

These privileges are not enforced when using the Hub Console, although the settings
still affect the use of Hub Console to some degree. For example, the only packages that
data stewards can see in the Merge Manager and Data Manager tools are those
packages to which they have READ privileges. In order for data stewards to edit and
save changes to data in a particular package, they must have UPDATE and CREATE
privileges to that package (and associated columns). If they do not have UPDATE or
CREATE privileges, then any attempts to change the data in the Data Manager will
fail. Similarly, a data steward must have MERGE privileges to merge or unmerge
records using the Merge Manager. To learn more about the Merge Manager and Data
Manager tools, see the Siperian Hub Data Steward Guide.
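
Because privileges do not aggregate other privileges, checking whether a user can perform an operation amounts to comparing the set of required privileges with the set that has been explicitly granted. The following sketch illustrates this idea; the enumeration mirrors the privileges listed in this section, while the function name is hypothetical and not part of Siperian Hub.

```python
from enum import Enum


class Privilege(Enum):
    READ = "READ"
    CREATE = "CREATE"
    UPDATE = "UPDATE"
    MERGE = "MERGE"
    EXECUTE = "EXECUTE"


def missing_privileges(required, granted):
    """Return the required privileges that were not explicitly granted."""
    return set(required) - set(granted)


if __name__ == "__main__":
    # Editing and saving data in a package requires both UPDATE and CREATE.
    edit = {Privilege.UPDATE, Privilege.CREATE}
    print(missing_privileges(edit, {Privilege.UPDATE}))
```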


Resource Groups

A resource group is a logical collection of secure resources. Using the Secure Resources
tool, you can define resource groups, and then assign related resources to them.
Resource groups simplify privilege assignment, allowing you to assign privileges to
multiple resources at once and to easily assign resource groups to a role.

Resource Group Hierarchies

A resource group can also contain other resource groups—except itself or any resource
group to which it belongs—allowing you to build a hierarchy of resource groups and to
simplify the management of a large collection of resources.
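
The containment rule for resource group hierarchies can be sketched as follows. This is an illustrative model only, not the Hub's implementation; the class, its methods, and the sample resource names are hypothetical.

```python
class ResourceGroup:
    def __init__(self, name):
        self.name = name
        self.resources = set()  # names of secure resources in this group
        self.subgroups = []     # nested ResourceGroup objects

    def contains_group(self, other):
        """Return True if other is this group or is nested anywhere beneath it."""
        return other is self or any(g.contains_group(other) for g in self.subgroups)

    def add_group(self, child):
        # Reject the group itself and any group that already contains this one.
        if child.contains_group(self):
            raise ValueError(f"adding {child.name} to {self.name} would create a cycle")
        self.subgroups.append(child)

    def all_resources(self):
        """Return the resources of this group and of every nested group."""
        found = set(self.resources)
        for group in self.subgroups:
            found |= group.all_resources()
        return found


if __name__ == "__main__":
    parent = ResourceGroup("ALL_RESOURCES")
    child = ResourceGroup("PACKAGES_READ")
    child.resources.add("PKG_CUSTOMER")
    parent.add_group(child)
    print(sorted(parent.all_resources()))
```

Assigning privileges to the parent group in such a model would then cover every resource returned by all_resources().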

SECURE Resources Only

Only SECURE resources can belong to resource groups; PRIVATE resources
cannot. If you change the status of a resource to PRIVATE, then the resource is
automatically removed from any resource groups to which it belongs. When the status
of a resource is set to SECURE, the resource is added automatically to the appropriate
resource group (ALL_* resource groups by object type, which are visible in the Roles
tool).

Guidelines for Defining Resource Groups

To simplify administration, consider the implications of creating the following kinds of
resource groups:
• Define an ALL_RESOURCES resource group that contains all secure resources,
which allows you to set minimal privileges globally.
• Define resource groups by resource type (such as PACKAGES_READ) so that
you can easily set minimal privileges to those kinds of resources.
• Define resource groups by functional area (such as TEST_ONLY or
TRAINING_RESOURCES).
• Define a catch-all resource group that can be assigned to many different roles that
have similar privileges.


About the Secure Resources Tool


You use the Secure Resources tool in the Hub Console to manage security for Siperian
Hub resources in a highly granular manner, including setting the status (secure or
private) of any Siperian Hub resource, and configuring a hierarchy of resources using
resource groups. The Secure Resources tool allows you to expose resources to, or hide
resources from, the Roles tool and SIF requests. To use the tool, you must be
connected to an ORS.

Starting the Secure Resources Tool


To start the Secure Resources tool:
• In the Hub Console, expand the Security Access Manager workbench, and then
click Secure Resources.

The Hub Console displays the Secure Resources tool, as shown in the following
example.

[Screenshot: Secure Resources tool, with callouts for Resources (Global Resources) and Resource Status (SECURE or PRIVATE)]

The Secure Resources tool contains the following tabs:

Tab               Description
Resources         Used to set the status of individual Siperian Hub resources (SECURE
                  or PRIVATE). Siperian Hub resources are organized in a hierarchy that
                  shows the relationships among resources. Global resources appear at
                  the top of the hierarchy. For details, see “Configuring Resources” on
                  page 846.
Resource Groups   Used to configure resource groups. For details, see “Configuring
                  Resource Groups” on page 849.

Configuring Resources
Use the Resources tab in the Secure Resources tool to browse and configure Siperian
Hub resources.

Navigating the Resources Tree

Resources are organized hierarchically in the navigation tree by resource type, as shown
in the following example.

To expand the hierarchy:


• Click the plus (+) sign next to a resource type or resource.
OR
• Click to expand the entire tree (if you have acquired a write lock).

To hide resources beneath a resource type:


• Click the minus (-) sign next to its resource type.
OR
• Click to collapse the entire tree (if you have acquired a write lock).

Setting the Status of a Siperian Hub Resource

You can configure the resource status (SECURE or PRIVATE) for any concrete
Siperian Hub resource.

Note: This status setting does not apply to resource groups (which contain only
SECURE resources) or to global resources (for example, BASE_OBJECT.*)—only to
the resources that they contain.

To set the status of one or more Siperian Hub resources:


1. Start the Secure Resources tool. To learn more, see “Starting the Secure Resources
Tool” on page 845.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. On the Resources tab, navigate the Resources tree to find the resource(s) that you
want to configure.
4. Do one of the following:
• Double-click the resource name to toggle between SECURE and PRIVATE.
OR
• Select one or more resource names (hold down the CTRL key to select
multiple resources at a time) and:
• Click to make all selected resources secure.
OR
• Click to make all selected resources private.

5. Click the Save button to save your changes.

Filtering Resources

To simplify changing the status of a collection of Siperian Hub resources, especially for
an implementation with a large and complex schema, you can specify a filter that
displays only the resources that you want to change.

To filter Siperian Hub resources:


1. Start the Secure Resources tool. To learn more, see “Starting the Secure Resources
Tool” on page 845.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Filter Resources button.
The Secure Resources tool displays the Resources Filter dialog.

[Screenshot: Resources Filter dialog, with callouts for Select All Resource Types and Clear All Selected Resource Types]

4. Do the following:
• Check (select) the resource type(s) that you want to include in the filter.
• Uncheck (clear) the resource type(s) that you want to exclude from the filter.

5. Click OK.
The Secure Resources tool displays the filtered Resources tree.
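
The effect of the filter can be pictured with a minimal sketch, assuming resources are represented as (resource type, name) pairs. The function name and the type labels below are hypothetical and are not Siperian Hub identifiers.

```python
def filter_resources(resources, included_types):
    """Keep only the resources whose type was checked in the filter."""
    return [(rtype, name) for rtype, name in resources if rtype in included_types]


# Hypothetical sample data for illustration.
resources = [
    ("BASE_OBJECT", "Address"),
    ("BASE_OBJECT", "Consumer"),
    ("PACKAGE", "ConsumerPackage"),
    ("MAPPING", "AddressMapping"),
]
packages_only = filter_resources(resources, {"PACKAGE"})
```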

Configuring Resource Groups


As described in “Resource Groups” on page 844, you can use the Secure Resources
tool to define resource groups and create a hierarchy of resources. You can then use
the Roles tool to assign privileges to multiple resources in a single operation.

Direct and Indirect Membership

The Secure Resources tool differentiates visually between resources that belong directly
to the current resource group (explicitly added) and resources that belong indirectly
because they are members of a resource group that belongs to this resource group
(implicitly added). For example, suppose you have two resource groups:
• Resource Group A contains the Consumer base object, which means that the
Consumer base object is a direct member of Resource Group A.
• Resource Group B contains the Address base object.
• Resource Group A contains Resource Group B, which means that the Address
base object is an indirect member of Resource Group A.

When you edit Resource Group A, the Address base object appears slightly grayed, as
shown in the following example.

[Screenshot: resource tree with callouts for Indirect Membership and Direct Membership]

In this example, you cannot change the check box for the Address base object when
you are editing Resource Group A. You can change the check box only when editing
Resource Group B.
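
The distinction between direct and indirect membership can be sketched as follows, assuming a hypothetical model with one mapping from each group to its explicitly added resources and another from each group to the groups it contains. The function name and data layout are illustrative only.

```python
def classify_members(group, direct_resources, child_groups):
    """Return (direct, indirect) resource sets for a resource group."""
    direct = set(direct_resources.get(group, ()))
    indirect = set()
    stack = list(child_groups.get(group, ()))
    seen = set()
    while stack:
        child = stack.pop()
        if child in seen:
            continue
        seen.add(child)
        # Everything reachable through a contained group is an indirect member.
        indirect |= set(direct_resources.get(child, ()))
        stack.extend(child_groups.get(child, ()))
    return direct, indirect - direct


# The example from the text: A contains Consumer and Resource Group B.
direct_resources = {"A": {"Consumer"}, "B": {"Address"}}
child_groups = {"A": {"B"}}
direct_a, indirect_a = classify_members("A", direct_resources, child_groups)
```

Here `direct_a` holds Consumer and `indirect_a` holds Address, which corresponds to the grayed check box that can be changed only while editing Resource Group B.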

Adding Resource Groups

To add a resource group:


1. Start the Secure Resources tool. To learn more, see “Starting the Secure Resources
Tool” on page 845.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Resource Groups tab.
The Secure Resources tool displays the Resource Groups tab.

4. Click the Add button.
The Secure Resources tool displays the Add Resources to Resource Group dialog.

[Screenshot: Add Resources to Resource Group dialog, with callouts for Select All Resources and Clear All Selected Resources]

5. Enter a unique, descriptive name for the resource group.


6. Click the plus (+) sign to expand the resource hierarchy as needed.
Each resource has a check box indicating membership in the resource group. If a
parent in the tree is selected, all its children are automatically selected as well. For
example, if the Base Objects item in the tree is selected, then all base objects and
their child resources are selected.
7. Check (select) the resource(s) that you want to assign to this resource group.
8. Click OK.
The Secure Resources tool adds the new resource group to the Resource Groups node.

Editing Resource Groups

To edit a resource group:


1. Start the Secure Resources tool. To learn more, see “Starting the Secure Resources
Tool” on page 845.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Resource Groups tab.
4. Select the resource group whose properties you want to edit.
5. Click the Edit button.
The Secure Resources tool displays the Assign Resources to Resource Group
dialog.

[Screenshot: Assign Resources to Resource Group dialog, with callouts for Select All Resources and Clear All Selected Resources]

6. Edit the resource group name, if you want.


7. Click the plus (+) sign to expand the resource hierarchy as needed.
8. Check (select) the Show Only Resources Selected for this Resource Group
check box, if you want.

9. Check (select) the resources that you want to assign to this resource group.
10. Uncheck (clear) the resources that you want to remove from this resource group.
11. Click OK.

Deleting Resource Groups

To delete a resource group:


1. Start the Secure Resources tool. To learn more, see “Starting the Secure Resources
Tool” on page 845.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Resource Groups tab.
4. Select the resource group that you want to remove.
5. Click the Remove button.
The Secure Resources tool prompts you to confirm deletion.
6. Click Yes.
The Secure Resources tool removes the deleted resource group from the Resource
Groups node.

Refreshing the Resources List


If a resource such as a package or mapping has been recently added, be sure to refresh
the resources list to ensure that you can make it secure.

To refresh the Resources list:


• From the Secure Resources menu, choose Refresh.

The Secure Resources tool updates the Resources list.

Refreshing Other Security Changes


You can also change the refresh interval for all other security changes.

To set the refresh rate for security changes:


• Set the following parameter in the cmxserver.properties file:
cmx.server.sam.cache.resources.refresh_interval

Note: If this parameter is not set, the default refresh interval is 5 minutes.
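
As an illustration only, the following sketch checks whether the parameter is present in the text of a properties file. It assumes simple `key=value` lines and does not handle every form that Java properties files allow. The sample value is a placeholder, because this section does not state the unit or valid range that the parameter expects.

```python
REFRESH_KEY = "cmx.server.sam.cache.resources.refresh_interval"


def read_property(text, key):
    """Return the value for `key` from key=value lines, or None if it is absent."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


# Placeholder content; the value 10 is an arbitrary example, not a recommendation.
sample = """
# security cache settings
cmx.server.sam.cache.resources.refresh_interval=10
"""
configured = read_property(sample, REFRESH_KEY)
```

If `read_property` returns None, the parameter is not set and the default interval described in the note applies.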

Configuring Roles
This section describes how to configure roles for your Siperian Hub implementation.

Note: If you are using a Comprehensive Centralized PDP security deployment (see
“Comprehensive Centralized PDP” on page 838), in which users are authorized
externally, and your external authorization provider does not require you to define
roles in Siperian Hub, then you can skip this section.

About Roles
In Siperian Hub, a role represents a set of privileges to access secure Siperian Hub
resources. In order for a user to view or manipulate a secure Siperian Hub resource,
that user must be assigned a role that grants them sufficient privileges to access the
resource. Roles determine what a user is authorized to access and do in Siperian Hub.
To learn more, see “Authorization” on page 833 and “Privileges” on page 843.

Siperian Hub roles are highly granular and flexible, allowing administrators to
implement complex security safeguards according to your organization’s unique
security policies, procedures, and requirements. Some users might be assigned to a
single role with access to everything (such as an administrator) or with
explicitly-restricted privileges (such as a data steward), while others might be assigned
to multiple roles of varying privileges.

A role can also have other roles assigned to it, thereby inheriting the access privileges
configured for those roles. Privileges are additive, meaning that, when roles are
combined, their privileges are combined as well. For example, suppose Role A has
READ privileges to an Address base object, and Role B has CREATE and UPDATE
privileges to it. If a user account is assigned Role A and Role B, then that user account
will have READ, CREATE, and UPDATE privileges to the Address base object.
A user account inherits the privileges configured for any role to which the user account
is assigned.
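
The additive behavior amounts to a set union. The sketch below is a minimal model under the assumption that privileges are stored per role and per resource as sets of privilege names; the function name and data layout are hypothetical.

```python
def effective_privileges(user_roles, role_privileges, resource):
    """Union the privileges that each of the user's roles grants on a resource."""
    combined = set()
    for role in user_roles:
        combined |= role_privileges.get(role, {}).get(resource, set())
    return combined


# The example from the text: Role A grants READ, Role B grants CREATE and UPDATE.
role_privileges = {
    "Role A": {"Address": {"READ"}},
    "Role B": {"Address": {"CREATE", "UPDATE"}},
}
granted = effective_privileges(["Role A", "Role B"], role_privileges, "Address")
```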

Resource privileges vary depending on the scope of access that is required for users to
do their jobs—ranging from broad and deep access (for example, super-user
administrators) to very narrow, focused access (for example, READ privileges on one
base object). It is generally recommended that you follow the principle of least
privilege—users should be assigned the least set of privileges needed to do their work.

Because Siperian Hub provides you with the ability to vary resource privileges per role,
and because resource privileges are additive, you can define roles in a highly-granular
manner for your Siperian Hub implementation. For example, you could define separate
roles to provide different access levels to human resources data (such as
HRAppReadOnly, HRAppCreateOnly, and HRAppUpdateOnly), and then combine
them into another aggregate role (such as HRAppAll). You would then assign to
various users just the role(s) that are appropriate for their job function.
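
The aggregate-role idea can be sketched as a recursive lookup. This is a simplified model, assuming each role has its own privilege set plus a list of roles assigned to it; the privileges attributed to each HRApp role are inferred from the role names and are not taken from a real configuration.

```python
def resolve_role(role, own_privileges, assigned_roles, _seen=None):
    """Collect a role's own privileges plus those of every role assigned to it."""
    seen = set() if _seen is None else _seen
    if role in seen:
        return set()  # defensive guard; the tool itself disallows circular assignment
    seen.add(role)
    result = set(own_privileges.get(role, ()))
    for child in assigned_roles.get(role, ()):
        result |= resolve_role(child, own_privileges, assigned_roles, seen)
    return result


own_privileges = {
    "HRAppReadOnly": {"READ"},
    "HRAppCreateOnly": {"CREATE"},
    "HRAppUpdateOnly": {"UPDATE"},
}
assigned_roles = {"HRAppAll": ["HRAppReadOnly", "HRAppCreateOnly", "HRAppUpdateOnly"]}
hr_all = resolve_role("HRAppAll", own_privileges, assigned_roles)
```

Keeping the narrow roles separate and composing them in HRAppAll means that a change to HRAppReadOnly is picked up by every role that includes it.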

Starting the Roles Tool


You use the Roles tool in the Security Access Manager workbench to configure roles
and assign access privileges to Siperian Hub resources.

To start the Roles tool:


• In the Hub Console, expand the Security Access Manager workbench, and then
click Roles.
The Hub Console displays the Roles tool, as shown in the following example.

[Screenshot: Roles tool, with callouts for the Navigation Pane and the Properties Pane]

The Roles tool contains the following tabs:

Tab                   Description
Resource Privileges   Used to assign resource privileges to roles. For details, see
                      “Assigning Resource Privileges to Roles” on page 859.
Roles                 Used to assign roles to other roles. For details, see “Assigning
                      Roles to Other Roles” on page 862.
Report                Used to generate a distilled report of resource privileges granted
                      to a given role. For details, see “Generating a Report of Resource
                      Privileges for Roles” on page 863.

Adding Roles
To add a new role:
1. Start the Roles tool. To learn more, see “Starting the Roles Tool” on page 855.

2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Point anywhere in the navigation pane, right-click the mouse, and choose Add
Role.
The Roles tool displays the Add Role dialog.

4. Specify the following information.

Field           Description
Name            Name of this role. Enter a unique, descriptive name.
Description     Optional description of this role.
External Name   External name (alias) of this role. To learn more, see “Mapping
                Internal Roles to External Roles” on page 859.

5. Click OK.
The Roles tool adds the new role to the roles list.

Editing Roles
To edit an existing role:
1. Start the Roles tool. To learn more, see “Starting the Roles Tool” on page 855.

2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Scroll the roles list and select the role that you want to edit.
4. For each property that you want to edit, click the Edit button next to it, and
specify the new value.
5. Click the Save button to save your changes.

Editing Resource Privileges

You can also assign and edit resource privileges for roles. To learn more, see “Assigning
Resource Privileges to Roles” on page 859.

Inheriting Privileges

You can also configure a role to inherit privileges from other roles. To learn more,
see “Assigning Roles to Other Roles” on page 862.

Deleting Roles
To delete an existing role:
1. Start the Roles tool. To learn more, see “Starting the Roles Tool” on page 855.

2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Scroll the roles list and select the role that you want to delete.
4. Point anywhere in the navigation pane, right-click the mouse, and choose Delete
Role.
The Roles tool prompts you to confirm deletion.
5. Click Yes.
The Roles tool removes the deleted role from the roles list.

Mapping Internal Roles to External Roles


For the Roles-Based Centralized PDP scenario (see “Roles-based Centralized PDP” on
page 837), you need to create a mapping (alias) between the Siperian Hub internal role
and the external role that is managed separately from Siperian Hub. The external role
name used by an organization (for example, APAppsUser) might be very different from
an internal role name (such as VendorReadOnly) that makes sense in the context of a
Siperian Hub environment.

Configuration details depend on the role mapping implementation of the security
provider. Role mapping is done within a configuration (XML) file. It is possible to map
one external role to more than one internal role.

Note: There is no predefined format for a configuration file. It might not be an XML
file or even a file at all. The mapping is a part of the custom user profile or
authentication provider implementation. The purpose of the mapping is to populate a
user profile object roles list with internal role IDs (rowids).
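
Because the mapping format is left to the provider implementation, the following is only a conceptual sketch of what such a mapping achieves. The dictionaries, the rowid values, and the second internal role name are hypothetical; a real provider might read the mapping from any source.

```python
def internal_role_ids(external_roles, external_to_internal, rowid_by_role):
    """Translate external role names into a list of internal role rowids."""
    ids = []
    for external in external_roles:
        # One external role may map to more than one internal role.
        for internal in external_to_internal.get(external, ()):
            rowid = rowid_by_role[internal]
            if rowid not in ids:
                ids.append(rowid)
    return ids


# Hypothetical mapping and rowids for illustration.
external_to_internal = {"APAppsUser": ["VendorReadOnly", "VendorReportViewer"]}
rowid_by_role = {"VendorReadOnly": "101", "VendorReportViewer": "102"}
profile_roles = internal_role_ids(["APAppsUser"], external_to_internal, rowid_by_role)
```

The resulting list is what a custom user profile provider would place in the roles list of the user profile object.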

Assigning Resource Privileges to Roles


To assign resource privileges to a role:
1. Start the Roles tool. To learn more, see “Starting the Roles Tool” on page 855.

2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Scroll the roles list and select the role for which you want to assign resource
privileges.
4. Click the Resource Privileges tab.

The Roles tool displays the Resource Privileges tab.

[Screenshot: Resource Privileges tab, with callouts for Secure Resources and Privileges]


This tab contains the following columns:

Column       Description
Resources    Hierarchy of secure Siperian Hub resources. Displays only those
             Siperian Hub resources whose status has been set to SECURE in
             the Secure Resources tool. To learn more, see “Setting the Status
             of a Siperian Hub Resource” on page 847.
Privileges   Privileges to assign to secure resources. To learn more, see
             “Privileges” on page 843.

5. Expand the Resources hierarchy to show the secure resources that you want to
configure for this role.

6. For each resource that you want to configure:


• Check (select) any privilege that you want to grant to this role.
• Uncheck (clear) any privilege that you want to remove from this role.

7. Click the Save button to save your changes.

Assigning Roles to Other Roles


A role can also inherit other roles, except itself or any role to which it belongs.
For example, if you assign Role B to Role A, then Role A inherits Role B’s access
privileges. To learn more, see “About Roles” on page 854.

To assign roles to a role:


1. Start the Roles tool. To learn more, see “Starting the Roles Tool” on page 855.

2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Scroll the roles list and select the role to which you want to assign other roles.
4. Click the Roles tab.
The Roles tool displays the Roles tab.

The Roles tool displays any role(s) that can be assigned to the selected role.
5. Check (select) any role that you want to assign to the selected role.
6. Uncheck (clear) any role that you want to remove from this role.
7. Click the Save button to save your changes.

Generating a Report of Resource Privileges for Roles


You can generate a report that describes only the resource privileges granted to a given
role.

To generate a report of resource privileges for a role:


1. Start the Roles tool. To learn more, see “Starting the Roles Tool” on page 855.

2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Scroll the roles list and select the role for which you want to generate a report.
4. Click the Report tab.
The Roles tool displays the Report tab.

5. Click Generate. The Roles tool generates the report and displays it on the tab.

Clearing the Report Window

To clear the report window:


• Click Clear.

Saving the Generated Report as an HTML File

To save a generated report as an HTML file:


1. Click Save.

The Roles tool prompts you to specify the target location for the saved report.

2. Navigate to the target location.


3. Click Save.
The Security Access Manager saves the report using the following naming
convention:
<ORS_Name>-<Role_Name>-RolePrivilegeReport.html
where:
• ORS_Name—Name of the target database.
• Role_Name—Role associated with the generated report.

The Roles tool saves the current report as an HTML file in the target location.
You can subsequently display this report using a browser.
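
The naming convention for the saved report can be expressed as a small helper. This is a sketch of the pattern only; the function name and the sample ORS name are hypothetical.

```python
def report_file_name(ors_name, role_name):
    """Build a file name following <ORS_Name>-<Role_Name>-RolePrivilegeReport.html."""
    return f"{ors_name}-{role_name}-RolePrivilegeReport.html"


# Hypothetical ORS and role names for illustration.
example_name = report_file_name("CMX_ORS", "HRAppAll")
```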

Configuring Siperian Hub Users


This section describes how to configure users for your Siperian Hub implementation.
Whenever Siperian Hub internal authorization is involved in an implementation, users
must be registered in the Master Database.

Before You Begin


Depending on how you have deployed security (see “Security Implementation
Scenarios” on page 836), your Siperian Hub implementation might or might not
require that you add users to the Master Database.

You must configure users in the Master Database if:


• you are using Siperian Hub’s internal authorization (see “Internal-only PDP” on
page 836)

• you are using Siperian Hub’s external authorization (see “External User Directory”
on page 837)
• multiple users will run the Hub Console using different accounts (for example,
administrators and data stewards).

About Configuring Siperian Hub Users


This section provides an overview of configuring Siperian Hub users. In Siperian Hub,
a user is an individual who can access Siperian Hub resources. For an introduction to
Siperian Hub users, see the Siperian Hub Overview.

How Users Access Siperian Hub Resources

Users can access Siperian Hub resources in the following ways:

Access using               Description
Hub Console                Users who interact with Siperian Hub by logging into the Hub
                           Console and using the tool(s) to which they have access, such as
                           administrators and data stewards.
Third-Party Applications   Users (called external application users) who interact with
                           Siperian Hub data indirectly using third-party applications that
                           use SIF classes. These users never log into Hub Console. They log
                           into Siperian Hub using the applications that they use to invoke
                           SIF classes. To learn more about the kinds of SIF requests that
                           developers can invoke, see the Siperian Services Integration
                           Framework Guide.

User Accounts

Users are represented in Siperian Hub by user accounts, which are defined in the master
database in the Hub Store. You use the Users tool in the Configuration workbench to
define and configure user accounts for Siperian Hub users, as well as to change
passwords and enable external authentication. External applications with sufficient
authorization can also register user accounts using SIF requests, as described in the
Siperian Services Integration Framework Guide. A user needs to be defined only once, even if
the same user will access more than one ORS associated with the Master Database.

A user account gains access to Siperian Hub resources using the role(s) assigned to it,
inheriting the privileges configured for each role, as described in “About Roles” on
page 854.

Siperian Hub allows for multiple concurrent SIF requests from the same user account.
For an external application in which granular auditing and user tracking are not
required, multiple users can use the same user account when submitting SIF requests.

Starting the Users Tool


To start the Users tool:
1. In the Hub Console, connect to the master database, if you have not already done
so.
2. Expand the Configuration workbench and click Users.
The Hub Console displays the Users tool, as shown in the following example:

The Users tool contains the following tabs:

Tab                      Description
User                     Displays a list of all users that have been defined, except the
                         default admin user (which is created when Siperian Hub is
                         installed). To learn more, see “Configuring Users” on page 869.
Target Database          Assign users to target databases. To learn more, see “Configuring
                         User Access to ORS Databases” on page 875.
Global Password Policy   Specify global password policies. To learn more, see “Managing
                         the Global Password Policy” on page 877.

Configuring Users
This section describes how to configure users in the Users tool. It refers to
functionality that is available on the Users tab of the Users tool.

Adding User Accounts

To add a user account:


1. Start the Users tool. To learn more, see “Starting the Users Tool” on page 868.

2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Users tab.
4. Click the button. The Users tool displays the Add User dialog.

5. Specify the following settings for this user.

Property                       Description
First name                     First name for this user.
Middle name                    Middle name for this user.
Last name                      Last name for this user.
User name                      Name of the user account for this user. This is the name that
                               this user enters to log into the Hub Console.
Default database               Default database for this user. This is the database that is
                               automatically selected when the user logs into Hub Console, as
                               described in “Starting the Hub Console” on page 19. If you want
                               to change this database later, see “Configuring User Access to
                               ORS Databases” on page 875.
Password                       Password for this user. If you want to change this password
                               later, see “Changing Password Settings for User Accounts” on
                               page 874.
Verify password                Type the password again to verify.
Use external authentication?   One of the following settings:
                               • Check (select) this option to use external authentication using
                                 a third-party security provider instead of Siperian Hub’s
                                 default authentication. To learn more, see “Managing Security
                                 Providers” on page 889.
                               • Uncheck (clear) this option to use the default Siperian Hub
                                 authentication.

6. Click OK.
The Users tool adds the new user to the list of users on the Users tab.

Editing User Accounts

For each user, you can update the name and the default login database, and specify
other settings, such as whether Siperian Hub retains a log of user logins/logouts,
whether the user can log into Siperian Hub, and whether the user has
administrator-level privileges.

To edit user account settings:


1. Start the Users tool. To learn more, see “Starting the Users Tool” on page 868.

2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Users tab.
4. Select the user account that you want to configure.

5. To change a name, double-click the cell and type a different name.


6. Select a different login database and server, if you want.
7. Change any of the following settings, if you want.

Property        Description
Administrator   One of the following settings:
                • Check (select) this option to give this user administrative
                  access, which allows them to have access to all Hub Console
                  tools and all databases.
                • Uncheck (clear) this option if you do not want to grant
                  administrative access to this user. This is the default.
Enable          One of the following settings:
                • Check (select) this option to activate this user account and
                  allow this user to log in.
                • Uncheck (clear) this option to disable this user account and
                  prevent this user from logging in.

8. Click the Save button to save your changes.

Using External Authentication

When adding or editing a user account that will be authenticated externally, you need to
check (select) the Use External Authentication check box. If unchecked (cleared),
then Siperian Hub’s default authentication will be used for this user account instead. To
learn more, see “Managing Security Providers” on page 889.

Editing Supplemental User Information

In Siperian Hub implementations that are not tied to an external user directory (see
“External User Directory” on page 837), you can use Siperian Hub to manage
supplemental information for each user, such as their e-mail address and phone
numbers. Siperian Hub does not require that you provide this information, nor does
Siperian Hub use this information in any special way.

To edit supplemental user information:


1. Start the Users tool. To learn more, see “Starting the Users Tool” on page 868.

2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Users tab.
4. Select the user whose properties you want to edit.
5. Click the Edit button.

The Users tool displays the Edit User dialog.

6. Specify any of the following properties:

Property              Description
Title                 User’s title, such as Dr. or Ms. Click the drop-down list and
                      select a title.
Initials              User’s initials.
Suffix                User’s suffix, such as MD or Jr.
Job title             User’s job title.
Email                 User’s e-mail address.
Telephone area code   Area code for user’s telephone number.
Telephone number      User’s telephone number.
Fax area code         Area code for user’s fax number.
Fax number            User’s fax number.
Mobile area code      Area code for user’s mobile phone.
Mobile number         User’s mobile phone number.
Login message         Message that the Hub Console displays after this user logs in.

7. Click OK.
8. Click the Save button to save your changes.

Deleting User Accounts

To remove a user:
1. Start the Users tool. To learn more, see “Starting the Users Tool” on page 868.

2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Users tab.
4. Select the user that you want to remove.
5. Click the button.
The Users tool prompts you to confirm deletion.
6. Click Yes to confirm deletion.
The Users tool removes the deleted user account from the list of users on the
Users tab.

Changing Password Settings for User Accounts

To change password settings for a user:


1. Start the Users tool. To learn more, see “Starting the Users Tool” on page 868.

2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Users tab.
4. Select the user whose password you want to change.

5. Click the button.


The Users tool displays the Change Password dialog for the selected user.

6. Specify the new password in both the Password and Verify password fields, if
you want.
7. Do one of the following:
• Check (select) the Use external authentication option to use external
authentication using a third-party security provider instead of Siperian Hub’s
default authentication. To learn more, see “Managing Security Providers” on
page 889.
• Uncheck (clear) the Use external authentication option to use the default
Siperian Hub authentication.
8. Click OK.

Configuring User Access to ORS Databases


Once a user account is defined in Siperian Hub, you need to explicitly provide the
account with access to one or more ORS databases.

To configure user access to databases:


1. Start the Users tool. To learn more, see “Starting the Users Tool” on page 868.

2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Target Database tab.

The Users tool displays the Target Database tab.

4. Expand each database node to see which users can access that database.
5. To change user assignments to a database, right-click on the database name and
choose Assign User.
The Users tool displays the Assign User to Database dialog.

[Screenshot: Assign User to Database dialog, with callouts for Select All Users and Clear All Selected Users]

6. Check (select) the names of any users that you want to assign to the selected
database.
7. Uncheck (clear) the names of any users that you want to unassign from the
selected database.
8. Click OK.

Configuring Password Policies


You can define password policies for all users (global password policy) as well as for
individual users (private password policies that override the global password policy).

Managing the Global Password Policy

The global password policy applies to users who do not have private password policies
specified for them (as described in “Specifying Private Password Policies for Individual
Users” on page 879).

To manage the global password policy:


1. Start the Users tool. To learn more, see “Starting the Users Tool” on page 868.

2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Global Password Policy tab.

The Global Password Policy window is displayed.

4. Specify the following password policy settings.

Policy                  Description
Password Length         Minimum and maximum length, in characters.
Password Expiry         Do one of the following:
                        • Check (select) the Password Expires check box and specify the
                          number of days before the password expires.
                        • Uncheck (clear) the Password Expires check box so that the
                          password never expires.
Login Settings          Number of grace logins and maximum number of failed logins.
Password History        Number of times that a password can be re-used.
Password Requirements   Other configuration settings, such as:
                        • enforce case-sensitivity
                        • enforce password validation
                        • enforce a minimum number of unique characters
                        • password patterns

5. Click the Save button to save your global settings.
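
The interaction between the global policy and a private policy can be sketched in Java as follows. This is an illustration only, not Siperian Hub code: the class name PasswordPolicySketch, its fields, and the effective method are all hypothetical, and only three of the settings from the table (minimum length, maximum length, minimum number of unique characters) are modelled.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a password policy; not part of Siperian Hub.
class PasswordPolicySketch {
    final int minLength;
    final int maxLength;
    final int minUniqueChars;

    PasswordPolicySketch(int minLength, int maxLength, int minUniqueChars) {
        this.minLength = minLength;
        this.maxLength = maxLength;
        this.minUniqueChars = minUniqueChars;
    }

    // A private policy, when one is defined for the user, overrides the global policy.
    static PasswordPolicySketch effective(PasswordPolicySketch globalPolicy,
                                          PasswordPolicySketch privatePolicy) {
        return privatePolicy != null ? privatePolicy : globalPolicy;
    }

    // Checks the length bounds and the minimum number of distinct characters.
    boolean accepts(String candidate) {
        if (candidate == null) {
            return false;
        }
        int length = candidate.length();
        if (length < minLength || length > maxLength) {
            return false;
        }
        Set<Character> unique = new HashSet<Character>();
        for (char c : candidate.toCharArray()) {
            unique.add(c);
        }
        return unique.size() >= minUniqueChars;
    }
}
```

In this sketch, a user without a private policy is passed as null to effective, so the global policy applies to that user unchanged.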

Specifying Private Password Policies for Individual Users

For any given user, you can specify a private password policy that overrides the global
password policy (see “Managing the Global Password Policy” on page 877).

Note: For ease of password policy maintenance, it is recommended that, whenever
possible, password policies be managed at the global policy level rather than at private
policy levels.

To specify the private password policy for a user:


1. Start the Users tool. To learn more, see “Starting the Users Tool” on page 868.

2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Users tab.
4. Select the user for whom you want to set the private password policy.
5. Click the button.

The Users tool displays the Private Password Policy window for the selected user.

6. Check (select) Private password policy enabled.


7. Specify the password policy settings you want for this user, as described in
“Managing the Global Password Policy” on page 877.
8. Click OK.
9. Click the Save button to save your changes.

Configuring Secured JDBC Data Sources


In Siperian Hub implementations, if a JDBC data source has been secured using
application server security, you need to store the application server’s user name and
password for the JDBC data source in the cmxserver.properties file. Passwords
must be encrypted—they cannot be stored as clear text. To learn more about secured
JDBC data sources, see your application server documentation.

To configure user names and passwords for a secured JDBC data source in the
cmxserver.properties file, use the following parameters:

databaseId.username=username
databaseId.password=encryptedPassword

where databaseId is the unique ID of the JDBC data source. For example:
localhost-jdbc-ds.username=weblogic
localhost-jdbc-ds.password=9C03B113CD8E4BBFD236C56D5FEA56EB

To generate an encrypted password, run the following command. The output shows the
plain-text password followed by its encrypted form:


C:\>java -cp siperian-common.jar com.siperian.common.security.Blowfish password
Plaintext Password: password
Encrypted Password: 9C03B113CD8E4BBFD236C56D5FEA56EB
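
Before restarting the application server, it can be useful to confirm that both entries are present for a data source and that the password value is not clear text. The following Java sketch shows one way to do that. It is an illustration only: the class SecuredDataSourceCheck and its methods are hypothetical, and the check assumes (based solely on the example above) that an encrypted password is an even-length string of hexadecimal digits. It does not decrypt or verify the password.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Siperian Hub.
class SecuredDataSourceCheck {

    // Parses properties-file text such as the contents of cmxserver.properties.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns true when databaseId.username and databaseId.password both exist
    // and the password looks like hex-encoded output rather than clear text.
    // Assumption: encrypted values resemble the example shown above.
    static boolean looksConfigured(Properties props, String databaseId) {
        String user = props.getProperty(databaseId + ".username");
        String password = props.getProperty(databaseId + ".password");
        if (user == null || user.trim().isEmpty() || password == null) {
            return false;
        }
        return password.length() % 2 == 0 && password.matches("[0-9A-Fa-f]+");
    }
}
```

A check of this kind only catches obvious mistakes, such as a password that was pasted in without being encrypted first.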

Configuring User Groups


This section describes how to configure user groups in your Siperian Hub
implementation.

About User Groups


A user group is a logical collection of user accounts. User groups simplify security
administration. For example, you can combine external application users into a single
user group, and then grant security privileges to the user group rather than to each
individual user. In addition to users, user groups can contain other user groups.
To learn about users and user accounts, see “Configuring Siperian Hub Users” on page
866.
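
Because user groups can contain other user groups, the set of users that a group effectively covers is the union of its own users and the users of every nested group. The following Java sketch illustrates that idea. It is not Siperian Hub code: the class UserGroupSketch and its members are hypothetical, and the guard against circular nesting is a defensive assumption rather than documented product behavior.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a user group; not part of Siperian Hub.
class UserGroupSketch {
    final String name;
    final Set<String> users = new LinkedHashSet<String>();
    final List<UserGroupSketch> subGroups = new ArrayList<UserGroupSketch>();

    UserGroupSketch(String name) {
        this.name = name;
    }

    // Returns the users of this group plus the users of all nested groups, sorted by name.
    Set<String> effectiveUsers() {
        Set<String> result = new TreeSet<String>();
        collect(this, result, new HashSet<UserGroupSketch>());
        return result;
    }

    // Depth-first walk; the visited set prevents endless recursion if groups
    // were ever nested in a circle.
    private static void collect(UserGroupSketch group, Set<String> out,
                                Set<UserGroupSketch> visited) {
        if (!visited.add(group)) {
            return;
        }
        out.addAll(group.users);
        for (UserGroupSketch sub : group.subGroups) {
            collect(sub, out, visited);
        }
    }
}
```

In this model, granting a privilege to the outer group would reach every user returned by effectiveUsers, which is the administrative saving that the paragraph above describes.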

You use the Groups tab in the Users and Groups tool in the Security Access Manager
workbench to configure user groups and assign user accounts to user groups. To use
the Users and Groups tool, you must be connected to an ORS.

Starting the Users and Groups Tool


To start the Users and Groups tool:
1. In the Hub Console, connect to an ORS, if you have not already done so.

2. Expand the Security Access Manager workbench and click Users and Groups.
The Hub Console displays the Users and Groups tool, as shown in the following
example.

The Users and Groups tool contains the following tabs:

Tab                           Description
Groups                        Used to define user groups and assign users to user groups. To learn
                              more, see “Configuring User Groups” on page 881.
Users Assigned to Database    Used to associate user accounts with a database. To learn more, see
                              “Assigning Users to the Current ORS Database” on page 886.
Assign Users/Groups to Role   Used to associate users and user groups with roles. To learn more,
                              see “Assigning Users and User Groups to Roles” on page 887.
Assign Roles to User / Group  Used to associate roles with users and user groups. To learn more,
                              see “Assigning Roles to Users and User Groups” on page 888.

Adding User Groups


To add a user group:
1. Start the Users and Groups tool. To learn more, see “Starting the Users and
Groups Tool” on page 882.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Groups tab.
4. Click the button.
The Users and Groups tool displays the Add User Group dialog.

5. Enter a descriptive name for the user group.


6. Optionally, enter a description of the user group.
7. Click OK.
The Users and Groups tool adds the new user group to the list.

Editing User Groups


To edit an existing user group:
1. Start the Users and Groups tool. To learn more, see “Starting the Users and
Groups Tool” on page 882.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Groups tab.

4. Scroll the list of user groups and select the user group that you want to edit.

5. For each property that you want to edit, click the Edit button next to it, and
specify the new value.
6. Click the Save button to save your changes.

Deleting User Groups


To delete a user group:
1. Start the Users and Groups tool. To learn more, see “Starting the Users and
Groups Tool” on page 882.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Groups tab.
4. Scroll the list of user groups and select the user group that you want to delete.
5. Click the button.
The Users and Groups tool prompts you to confirm deletion.
6. Click Yes.
The Users and Groups tool removes the deleted user group from the list.

Assigning Users and User Groups to User Groups


To assign members (users and user groups) to a user group:
1. Start the Users and Groups tool. To learn more, see “Starting the Users and
Groups Tool” on page 882.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Groups tab.
4. Scroll the list of user groups and select the user group to which you want to assign
members.
5. Right-click the selected user group and choose Assign Users and
Groups.
The Users and Groups tool displays the Assign to User Group dialog.

[Figure: Assign to User Group dialog, showing the Select All Users/User Groups and Clear All Selected Users/User Groups buttons]

6. Check (select) the names of any users and user groups that you want to assign to
the selected user group.
7. Uncheck (clear) the names of any users and user groups that you want to unassign
from the selected user group.
8. Click OK.

Assigning Users to the Current ORS Database


This section describes how to assign users to the currently-targeted ORS database.
To assign user access to other ORS databases, see “Configuring User Access to ORS
Databases” on page 875.

To assign users to the current ORS database:


1. Start the Users and Groups tool. To learn more, see “Starting the Users and
Groups Tool” on page 882.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Users Assigned to Database tab.

4. Click the button to assign users to an ORS database.


The Users and Groups tool displays the Assign User to Database dialog.

[Figure: Assign User to Database dialog, showing the Select All Users and Clear All Selected Users buttons]

5. Check (select) the names of any users that you want to assign to the selected ORS
database.
6. Uncheck (clear) the names of any users that you want to unassign from the
selected ORS database.

7. Click OK.

Assigning Roles to Users and User Groups


This section describes how to associate roles with users and user groups. The Users
and Groups tool provides two ways to define the association:
• assigning users and user groups to roles
• assigning roles to users and user groups

You can choose the way that is most expedient for your implementation.

Assigning Users and User Groups to Roles


To assign users and user groups to a role:
1. Start the Users and Groups tool. To learn more, see “Starting the Users and
Groups Tool” on page 882.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Assign Users/Groups to Role tab.

4. Select the role to which you want to assign users and user groups.
5. Click the Edit button.

The Users and Groups tool displays the Assign Users to Role dialog.

[Figure: Assign Users to Role dialog, showing the Select All Users / User Groups and Clear All Selected Users / User Groups buttons]

6. Check (select) the names of any users and user groups that you want to assign to
the selected role.
7. Uncheck (clear) the names of any users and user groups that you want to unassign
from the selected role.
8. Click OK.

Assigning Roles to Users and User Groups


To assign roles to users and user groups:
1. Start the Users and Groups tool. To learn more, see “Starting the Users and
Groups Tool” on page 882.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. Click the Assign Roles to User/Group tab.

4. Select the user or user group to which you want to assign roles.

5. Click the Edit button.


The Users and Groups tool displays the Assign Roles to User dialog.

[Figure: Assign Roles to User dialog, showing the Select All Roles and Clear All Selected Roles buttons]

6. Check (select) the roles that you want to assign to the selected user or user group.
7. Uncheck (clear) the roles that you want to unassign from the selected user or user
group.
8. Click OK.

Managing Security Providers


This section describes how to manage security providers in your Siperian Hub
implementation.

About Security Providers


A security provider is a third-party organization that provides security services for users
accessing Siperian Hub. Security providers are used in certain Siperian Hub security
deployment scenarios, as described in “Security Implementation Scenarios” on page
836.

Types of Security Providers

Siperian Hub supports the following types of security providers:

Service          Description
Authentication   Authenticates a user by validating their identity. Informs Siperian Hub only
                 that the user is who they claim to be, not whether they have access to any
                 Siperian Hub resources.
Authorization    Informs Siperian Hub whether a user has the required privilege(s) to access
                 particular Siperian Hub resources.
User Profile     Informs Siperian Hub about individual users, such as user-specific
                 attributes and the roles to which the user belongs.

Internal Providers

Siperian Hub comes with a set of default internal security providers (labeled Internal
Provider in the Security Providers tool). You can also add your own third-party
security providers. Internal security providers cannot be removed.

Starting the Security Providers Tool


You use the Security Providers tool in the Configuration workbench to register and
manage security providers for Siperian Hub. To use the Security Providers tool, you
must be connected to the master database.

To start the Security Providers tool:


• In the Hub Console, expand the Configuration workbench, and then click
Security Providers.

The Hub Console displays the Security Providers tool, as shown in the following
example.

In the Security Providers tool, the navigation tree has the following main nodes:

Node             Description
Provider Files   Expand to display the provider files that have been uploaded in your
                 Siperian Hub implementation. For more information, see “Managing
                 Provider Files” on page 892.
Providers        Expand to display the list of providers that are defined in your Siperian
                 Hub implementation. For more information, see “Managing Security
                 Provider Settings” on page 896.

Siperian Hub provides a set of default providers:


• Internal providers represent Siperian Hub’s internal implementations for
authentication, authorization, and user profile services.
• Super providers always return a positive response for authentication and
authorization requests. Super providers are useful in development environments
when you do not want to configure users, roles, privileges, and so on. For this
purpose, these should be set first in an adjudication sequence and enabled.
Super providers can also be used in a production environment in which security is
provided as a layer on top of the SIF requests for performance gains.

Managing Provider Files


If you want to use your own third-party security providers (in addition to Siperian
Hub’s default internal security providers), you must explicitly register them using the
Security Providers tool. To register a provider, you upload a provider file that contains the
profile information needed for registration.

About Provider Files

A provider file is a JAR file that contains the following information:


• A manifest that describes one or more external security provider(s). Each security
provider has the following settings:
• Provider Name
• Provider Description
• Provider Type
• Provider Factory Class Name
• Properties for configuring the provider (a list of name-value pairs: property
names with default values)
• One or more JAR files containing the provider implementation and any required
third-party libraries.

Sample Provider File

The Siperian sample installer copies a sample implementation of a provider file into the
SamSample subdirectory under the target samples directory (such as
c:\siperian\oracle\sample\SamSample). To learn more, see the Siperian Hub
Installation Guide for your platform.

Provider Files List

The Security Providers tool displays a list of provider files under the Provider Files
node in the left navigation pane. You use right-click menus in the left navigation pane
of the Security Providers tool to upload, delete, and move provider files in the Provider
Files list.

Selecting a Provider File

To select a provider file in the Security Providers tool:


1. Start the Security Providers tool. To learn more, see “Starting the Security
Providers Tool” on page 890.
2. In the left navigation pane, click the provider file that you want to select.
The Security Providers tool displays the Provider File panel for the selected
provider file, as shown in the following example.

The Provider File panel contains no editable fields.

Uploading a Provider File

To upload a provider file to add or update provider information:


1. Start the Security Providers tool. To learn more, see “Starting the Security
Providers Tool” on page 890.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.

3. In the left navigation pane, right-click Provider Files and choose Upload Provider
File.

The Security Providers tool prompts you to select the JAR file for this provider.

4. Specify the JAR file, navigating the file system as needed and selecting the JAR file
that you want to upload.

5. Click Open.

The Security Providers tool checks the selected file to determine whether it is a
valid provider file.
If the provider name from the manifest is the same as the name of an existing
provider file, then the Security Providers tool asks you whether to overwrite the
existing provider file. Click Yes to confirm.
The Security Providers tool uploads the JAR file to the application server, adds the
provider file to the list, populates the Providers list with the additional provider
information, and refreshes the left navigation pane.

[Figure: left navigation pane after the upload, showing the added provider file and the added authentication, authorization, and user profile providers]

Once the file has been uploaded, you can remove the original file from the file
system if you want. The Security Providers tool has already imported the
information and does not subsequently refer to the original file.

Deleting a Provider File

Note: Internal security providers that are shipped with Siperian Hub cannot be
removed. For internal security providers, there is no separate provider file under the
Provider Files node.

To delete a provider file:


1. Start the Security Providers tool. To learn more, see “Starting the Security
Providers Tool” on page 890.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.

3. In the left navigation pane, right-click the provider file that you want to delete, and
then choose Delete Provider File.
The Security Providers tool prompts you to confirm deletion.
4. Click Yes.
The Security Providers tool removes the deleted provider file from the list.

Managing Security Provider Settings


The Security Providers tool displays a list of registered providers under the Providers
node in the left navigation pane. This list is sorted by provider type (Authentication,
Authorization, or User Profile provider).

You use right-click menus in the left navigation pane of the Security Providers tool to
move providers up and down in the Providers list.

Sequence of the Providers List

The order of providers in the Providers list represents the order in which they are
invoked. For example, when a user attempts to log in and supplies their user name and
password, Siperian Hub submits their login credentials to each authentication provider
in the Authentication list, proceeding sequentially through the list. If authentication
succeeds with one of the providers in the list, then the user is deemed authenticated.
If authentication fails with all available authentication providers, then authentication for
that user fails. To learn about changing the processing order, see “Moving a Security
Provider Up in the Processing Order” on page 906 and “Moving a Security Provider
Down in the Processing Order” on page 907.
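
The sequential processing described above can be sketched in Java as follows. This is an illustration of the ordering rule only, not Siperian Hub's implementation: the class ProviderChainSketch and the AuthProvider interface are hypothetical names, and the list passed in is assumed to contain only the enabled authentication providers, in their configured order.

```java
import java.util.List;

// Hypothetical illustration of sequential provider processing; not part of Siperian Hub.
class ProviderChainSketch {

    // Stand-in for one authentication provider in the Providers list.
    interface AuthProvider {
        boolean authenticate(String userName, String password);
    }

    // Tries each provider in list order. The first success authenticates the user;
    // if every provider rejects the credentials, authentication fails.
    static boolean authenticate(List<AuthProvider> providersInOrder,
                                String userName, String password) {
        for (AuthProvider provider : providersInOrder) {
            if (provider.authenticate(userName, password)) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, a provider that always returns true (as the super providers do) would authenticate every request once the loop reaches it, which is why the position of each provider in the list matters.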

Selecting a Security Provider

To select a provider in the Security Providers tool:


• In the left navigation pane, click the provider that you want to select.

The Security Providers tool displays the Provider panel for the selected provider,
as shown in the following example.

Properties on the Provider Panel

The Provider panel contains the following fields:

Field           Description
Name            Name of this security provider.
Description     Description of this security provider.
Provider Type   Type of security provider. One of the following values:
                • Authentication
                • Authorization
                • User Profile
                For more information, see “About Security Providers” on page 889.
Provider File   Name of the provider file associated with this security provider, or
                Internal Provider for internal providers. For more information, see
                “Managing Provider Files” on page 892.
Enabled         Indicates whether this security provider is enabled (checked) or not
                (unchecked). Note that internal providers cannot be disabled.
Properties      Additional properties for this security provider, if defined by the security
                provider. Each property is a name-value pair. A security provider might
                require or allow unique properties that you can specify here. To learn more,
                see “Configuring Provider Properties” on page 898.

Configuring Provider Properties

A provider property is a name-value pair that a security provider might require in order to
access the service(s) that it provides. You can use the Security Providers tool to
define these properties.

Adding Provider Properties

To add provider properties:


1. Start the Security Providers tool. To learn more, see “Starting the Security
Providers Tool” on page 890.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. In the left navigation pane, select the authentication provider for which you want
to add properties.
4. Click the Add button.
The Security Providers tool displays the Add Provider Property dialog.

5. Specify the name of the property.
6. Specify the value to assign to this property.
7. Click OK.

Editing Provider Properties

To edit an existing provider property:


1. Start the Security Providers tool. To learn more, see “Starting the Security
Providers Tool” on page 890.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.

3. In the left navigation pane, select the authentication provider for which you want
to edit properties.
4. For each property that you want to edit, click the Edit button next to it, and
specify the new value.
5. Click the Save button to save your changes.

Removing Provider Properties

To remove an existing provider property:


1. Start the Security Providers tool. To learn more, see “Starting the Security
Providers Tool” on page 890.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. In the left navigation pane, select the authentication provider for which you want
to remove properties.
4. Select the property that you want to remove.
5. Click the Delete button.
The Security Providers tool prompts you to confirm deletion.
6. Click Yes.

Custom-added Providers

You can also package custom provider classes in the JAR/ZIP file. Specify the settings
for the custom providers in a properties file named providers.properties. You must
place this file within the JAR file in the META-INF directory. These settings (that is, the
name/value pairs) are then read by the loader and translated to what is displayed in the
Hub Console.
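
Before uploading a custom provider archive, you may want to confirm that it actually contains META-INF/providers.properties. The following Java sketch shows one way to check this using the standard java.util.zip classes. It is an illustration only: the class ProviderArchiveSketch and its methods are hypothetical, and the check says nothing about how the Siperian Hub loader itself reads the archive.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical pre-upload check; not part of Siperian Hub.
class ProviderArchiveSketch {
    static final String DESCRIPTOR = "META-INF/providers.properties";

    // Scans the archive (a JAR is a ZIP file) for the settings file.
    static boolean containsDescriptor(InputStream archive) {
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (DESCRIPTOR.equals(entry.getName())) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a tiny in-memory archive with one entry, used here only to try out the check.
    static byte[] singleEntryArchive(String entryName, String content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

For a real archive, you would pass a java.io.FileInputStream opened on the JAR file to containsDescriptor.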

Here are the elements of a providers.properties file:

Element Name                      Description
ProviderList                      Comma-separated list of the contained provider
                                  names.
File-Description                  Description of the package.
                                  Note that the remaining elements listed below
                                  come in groups of five (5) which correspond to
                                  each of the names in ProviderList (so for the
                                  remaining elements listed here, “XXX” represents
                                  one of the names that would be specified in
                                  ProviderList).
XXX-Provider-Name                 Display name of the provider XXX.
XXX-Provider-Description          Description of the provider XXX.
XXX-Provider-Type                 Type of the provider XXX. The allowed values are
                                  USER_PROFILE_PROVIDER, JAAS_LOGIN_MODULE,
                                  AUTHORIZATION_PROVIDER.
XXX-Provider-Factory-Class-Name   Implementation class of the provider (contained in
                                  the same JAR/ZIP file).
XXX-Provider-Properties           Comma-separated list of name/value pairs
                                  defining provider properties (name1=value1,…).

Note: The provider archive file (JAR/ZIP) must contain all the classes required for the
custom provider to be functional, as well as all of the required resources. These
resources are specific to your implementation.

Example providers.properties File

Note: All of these settings are required except for XXX-Provider-Properties.


ProviderList=ProviderOne,ProviderTwo,ProviderThree,ProviderFour
ProviderOne-Provider-Name: Sample Role Based User Profile Provider
ProviderOne-Provider-Description: Sample User Profile Provider for roled-based management
ProviderOne-Provider-Type: USER_PROFILE_PROVIDER
ProviderOne-Provider-Factory-Class-Name: com.siperian.sam.sample.userprofile.SampleRoleBasedUserProfileProviderFactory
ProviderOne-Provider-Properties: name1=value1,name2=value2
ProviderTwo-Provider-Name: Sample Login Module
ProviderTwo-Provider-Description: Sample Login Module
ProviderTwo-Provider-Type: JAAS_LOGIN_MODULE
ProviderTwo-Provider-Factory-Class-Name: com.siperian.sam.sample.authn.SampleLoginModule
ProviderTwo-Provider-Properties:
ProviderThree-Provider-Name: Sample Role Based Authorization Provider
ProviderThree-Provider-Description: Sample Role Based Authorization Provider
ProviderThree-Provider-Type: AUTHORIZATION_PROVIDER
ProviderThree-Provider-Factory-Class-Name: com.siperian.sam.sample.authz.SampleAuthorizationProviderFactory
ProviderThree-Provider-Properties:
ProviderFour-Provider-Name: Sample Comprehensive User Profile Provider
ProviderFour-Provider-Description: Sample Comprehensive User Profile Provider
ProviderFour-Provider-Type: USER_PROFILE_PROVIDER
ProviderFour-Provider-Factory-Class-Name: com.siperian.sam.sample.userprofile.SampleComprehensiveUserProfileProviderFactory
ProviderFour-Provider-Properties:
File-Description=The sample provider files
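
A file like this can be checked for missing or invalid entries before it is packaged. The following Java sketch loads the text with java.util.Properties, which accepts both = and : as the separator between a name and its value, and reports any required element that is absent. It is an illustration only: the class ProvidersPropertiesSketch and the findProblems method are hypothetical, and the sketch does not reproduce how the Siperian Hub loader reads the file. Note that java.util.Properties expects each value on the same line as its name unless the line ends with a backslash, so a class name that has wrapped onto its own line would be reported as missing.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical validator for providers.properties text; not part of Siperian Hub.
class ProvidersPropertiesSketch {
    static final List<String> ALLOWED_TYPES = Arrays.asList(
            "USER_PROFILE_PROVIDER", "JAAS_LOGIN_MODULE", "AUTHORIZATION_PROVIDER");

    // XXX-Provider-Properties is optional, so it is not listed here.
    static final String[] REQUIRED_SUFFIXES = {
        "-Provider-Name", "-Provider-Description",
        "-Provider-Type", "-Provider-Factory-Class-Name"
    };

    // Returns a list of human-readable problems; an empty list means none were found.
    static List<String> findProblems(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> problems = new ArrayList<String>();
        String providerList = props.getProperty("ProviderList");
        if (providerList == null || providerList.trim().isEmpty()) {
            problems.add("ProviderList is missing");
            return problems;
        }
        if (props.getProperty("File-Description") == null) {
            problems.add("File-Description is missing");
        }
        for (String raw : providerList.split(",")) {
            String id = raw.trim();
            for (String suffix : REQUIRED_SUFFIXES) {
                String value = props.getProperty(id + suffix);
                if (value == null || value.trim().isEmpty()) {
                    problems.add(id + suffix + " is missing");
                }
            }
            String type = props.getProperty(id + "-Provider-Type");
            if (type != null && !type.trim().isEmpty()
                    && !ALLOWED_TYPES.contains(type.trim())) {
                problems.add(id + "-Provider-Type has unsupported value " + type.trim());
            }
        }
        return problems;
    }
}
```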

Adding a Login Module

Siperian Hub supports the use of external authentication for users through the Java
Authentication and Authorization Service (JAAS). Siperian Hub provides templates for
the following types of authentication standards:
• Lightweight Directory Access Protocol (LDAP)
• Microsoft Active Directory
• Network authentication using the Kerberos protocol

These templates provide the settings (protocols, server names, ports, and so on) that
are required for these authentication standards. You can use these templates to add a
new login module and provide the settings you need. To learn more about these
authentication standards, see the applicable vendor documentation.

To add a login module:


1. Start the Security Providers tool. To learn more, see “Starting the Security
Providers Tool” on page 890.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. In the left navigation pane, right-click Authentication Providers (Login Modules)
and choose Add Login Module.
The Security Providers tool displays the Add Login Module dialog box.

4. Click the down arrow and select a template for the login module.

Template Name                       Description
OpenLDAP-template                   Based on LDAP authentication properties.
MicrosoftActiveDirectory-template   Based on Active Directory authentication
                                    properties.
Kerberos-template                   Based on Kerberos authentication properties.

5. Click OK.
The Security Providers tool adds the new login module to the list.

6. In the Properties panel, click the Edit button next to any property that you
want to edit, such as its name and description, and change the setting.
For LDAP, you can specify the following settings.

Property                      Description
java.naming.factory.initial   Required. Java class name of the JNDI implementation for
                              connecting to an LDAP server. Use the following value:
                              com.sun.jndi.ldap.LdapCtxFactory.
java.naming.provider.url      Required. URL of the LDAP server. For example:
                              ldap://localhost:389/
username.prefix               Optional. Tells Siperian Hub how to parse the LDAP
                              user name. An OpenLDAP user name looks like this:
                              cn=myopenldapuser,dc=siperian,dc=com
                              where
                              • myopenldapuser is the user name
                              • siperian is the domain name
                              • com is the top-level domain
                              In this example, the username.prefix is: cn=
username.postfix              Optional. Used in conjunction with username.prefix. Using
                              the previous example, set username.postfix to:
                              ,dc=siperian,dc=com
                              Note the comma at the beginning of the string.

For Microsoft Active Directory, you can specify the following settings:

Property                      Description
java.naming.factory.initial   Required. Java class name of the JNDI implementation for
                              connecting to an LDAP server. Use the following value:
                              com.sun.jndi.ldap.LdapCtxFactory.
java.naming.provider.url      Required. URL of the LDAP server. For example:
                              ldap://localhost:389/

For Kerberos authentication:

• To set up Kerberos authentication for a user on JBoss and WebLogic using
Sun’s JVM, use Sun’s LoginModule
(com.sun.security.auth.module.Krb5LoginModule). To learn more, see the
Kerberos documentation at http://java.sun.com.
• To set up Kerberos authentication for a user on WebSphere using IBM’s JVM,
you can use IBM’s LoginModule
(com.ibm.security.auth.module.Krb5LoginModule). To learn more, see the
Kerberos documentation on http://www.ibm.com.
• To use either of these Kerberos implementations, you must configure the JVM
of the Siperian Hub application server with winnt\krb5.ini or
JAVA_HOME\jre\lib\security\krb5.conf.

7. Click the Save button to save your changes.
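
The LDAP settings described above can be related to a standard JNDI environment as in the following Java sketch. It shows how username.prefix and username.postfix would combine with a login name to form the full LDAP user name, and how the two java.naming.* settings correspond to the javax.naming.Context constants. This is an illustration only: the class LdapLoginSettingsSketch and its methods are hypothetical, the use of simple authentication is an assumption, and the sketch does not describe how the Siperian Hub login module is implemented. No connection to an LDAP server is made.

```java
import java.util.Hashtable;
import javax.naming.Context;

// Hypothetical illustration of the LDAP login module settings; not part of Siperian Hub.
class LdapLoginSettingsSketch {

    // Wraps the login name with the optional prefix and postfix, for example
    // "cn=" + "myopenldapuser" + ",dc=siperian,dc=com".
    static String buildPrincipal(String prefix, String userName, String postfix) {
        return (prefix == null ? "" : prefix)
                + userName
                + (postfix == null ? "" : postfix);
    }

    // Builds the environment that a generic JNDI client would pass to
    // javax.naming.directory.InitialDirContext in order to bind as the user.
    static Hashtable<String, String> buildEnvironment(String providerUrl,
                                                      String principal,
                                                      String password) {
        Hashtable<String, String> env = new Hashtable<String, String>();
        // Context.INITIAL_CONTEXT_FACTORY is the key "java.naming.factory.initial".
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        // Context.PROVIDER_URL is the key "java.naming.provider.url".
        env.put(Context.PROVIDER_URL, providerUrl);
        // Assumption: simple (user name and password) authentication.
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, principal);
        env.put(Context.SECURITY_CREDENTIALS, password);
        return env;
    }
}
```

With the example values from the table, a user who logs in as myopenldapuser would be presented to the LDAP server as cn=myopenldapuser,dc=siperian,dc=com.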

Deleting a Login Module

To delete a login module:


1. Start the Security Providers tool. To learn more, see “Starting the Security
Providers Tool” on page 890.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. In the left navigation pane, right-click a login module under Authentication
Providers (Login Modules) and choose Delete Login Module.
The Security Providers tool prompts you to confirm deletion.
4. Click Yes.
The Security Providers tool removes the deleted login module from the list and
refreshes the left navigation pane.

Changing Security Provider Settings

To change the settings for a security provider:


1. Start the Security Providers tool. To learn more, see “Starting the Security
Providers Tool” on page 890.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.

3. Select the security provider whose properties you want to change, as described in
“Selecting a Security Provider” on page 896.
4. In the Properties panel, click the Edit button next to any property that you
want to edit.
5. Click the Save button to save your changes.

Enabling and Disabling Security Providers


1. Acquire a write lock, if you have not already done so.
2. Select the security provider that you want to enable or disable, as described in
“Selecting a Security Provider” on page 896.
3. Do one of the following:
• Check the Enabled check box to enable a disabled security provider.
• Uncheck the Enabled check box to disable a security provider.
Once disabled, the provider name appears greyed out and at the end of the
Providers list. Disabled providers cannot be moved.
4. Click the Save button to save your changes.

Moving a Security Provider Up in the Processing Order

As described in “Sequence of the Providers List” on page 896, Siperian Hub processes
security providers in the order in which they appear in the Providers list.

To move a security provider up the list:


1. Start the Security Providers tool. To learn more, see “Starting the Security
Providers Tool” on page 890.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. In the left navigation pane, select the provider (not the first one in the list, nor any
disabled providers) that you want to move up.
4. In the left navigation pane, right-click and choose Move Provider Up.
The Security Providers tool moves the provider ahead of the previous one in the
Providers list, and then refreshes the left navigation pane.

Moving a Security Provider Down in the Processing Order

As described in “Sequence of the Providers List” on page 896, Siperian Hub processes
security providers in the order in which they appear in the Providers list.

To move a provider down the list:


1. Start the Security Providers tool. To learn more, see “Starting the Security
Providers Tool” on page 890.
2. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.
3. In the left navigation pane, click the provider (not the last one in the list, nor any
disabled providers) that you want to move down.
4. In the left navigation pane, right-click and choose Move Provider Down.
The Security Providers tool moves the provider after the subsequent one in the
Providers list, and then refreshes the left navigation pane.



Chapter 21: Viewing Registered Custom Code

This chapter describes how to use the User Object Registry tool to view registered
custom code.

Chapter Contents
• About User Objects
• About the User Object Registry Tool
• Starting the User Object Registry Tool
• Viewing User Exits
• Viewing Custom Stored Procedures
• Viewing Custom Java Cleanse Functions
• Viewing Custom Button Functions


About User Objects


User objects are user-defined functions or procedures that are registered with the Siperian
Hub to extend its functionality. There are four types of user objects:

User Object Description


User Exits A user-customized, unencrypted stored procedure that
includes a set of fixed, pre-defined parameters. The procedure
is configured, on a per-base object basis, to execute at a
specific point during a Siperian batch process run. For more
information, see “Viewing User Exits” on page 912.
Custom Stored Procedures Stored procedures that are registered in table
C_REPOS_TABLE_OBJECT and can be invoked from Batch Manager.
For more information, see “Viewing Custom Stored
Procedures” on page 913.
Custom Java Cleanse Functions Java cleanse functions that supplement the standard
cleanse libraries with customer logic. These functions are JAR files
stored as BLOBs in the database. For more
information, see “Viewing Custom Java Cleanse Functions”
on page 915.
Custom Button Functions Custom UI functions that supply additional icons and logic in
Data Manager, Merge Manager and Hierarchy Manager. For
more information, see “Viewing Custom Button Functions”
on page 916.

About the User Object Registry Tool


The User Object Registry Tool is a read-only tool that keeps track of user objects that
have been developed for use in the Siperian Hub.

Note: To view custom user code in the User Object Registry tool, you must have
registered the following types of objects:
• Custom Stored Procedures; for more information regarding stored procedures, see
“Developing Custom Stored Procedures for Batch Jobs” on page 806
• Custom Java Cleanse Functions; for more information regarding Java cleanse
functions, see “Using Cleanse Functions” on page 414

• Custom Button Functions; for more information regarding custom buttons, see
“About Custom Buttons in the Hub Console” on page 978

Note: You do not have to pre-configure user exit procedures to view them in the User
Object Registry tool.

Starting the User Object Registry Tool


To start the User Object Registry tool:
1. In the Hub Console, connect to an Operational Record Store (ORS), according to
the instructions in “Changing the Target Database” on page 31.
2. Expand the Siperian Utilities workbench and then click User Object Registry.
The Hub Console displays the User Object Registry tool, as shown in the
following example.

[Figure: User Object Registry tool, with the Registered User Objects and User Object Properties areas]


The User Object Registry tool displays the following areas:

Column Description
Registered User Object Types Hierarchical tree of user objects registered in the selected
ORS, organized by the following categories:
• User Exits
• Custom Stored Procedures
• Custom Java Cleanse Functions
• Custom Button Functions
User Object Properties Properties for the selected user object.

Viewing User Exits


This section describes how to view user exits in the User Object Registry tool.

About User Exits


A user exit is an unencrypted stored procedure that includes a set of fixed, pre-defined
parameters. The procedure is configured, on a per-base object basis, to execute at a
specific point during a Siperian batch process run. User exits are triggered by
Siperian Hub back-end processes and provide a mechanism to integrate custom
operations with Hub Server processes such as POST_LOAD, POST_MERGE, and
POST_MATCH. For more information, see “About User Exits” on page 956.

Note: The User Object Registry tool displays the types of pre-existing user exits.


Viewing User Exits


To view the Siperian Hub user exits in the User Object Registry tool:
1. Start the User Object Registry tool. For more information, see “Starting the User
Object Registry Tool” on page 911.
2. In the list of user objects, select User Exits.
The User Object Registry tool displays the user exits, as shown in the following
example.

Viewing Custom Stored Procedures


This section describes how to view registered custom stored procedures in the User
Object Registry tool.

About Custom Stored Procedures


In the Hub Console, the Siperian Hub Batch Viewer and Batch Group tools provide
simple mechanisms for executing Siperian Hub batch jobs. To execute and manage jobs
according to a schedule, you need to execute stored procedures that do the work of
batch jobs or batch groups. For more information, see “About Siperian Hub Batch
Jobs” on page 668.

Siperian Hub also allows you to create and run custom stored procedures for batch
jobs. For more information, see “Developing Custom Stored Procedures for Batch
Jobs” on page 806. You can also create and run stored procedures using the SIF API
(using Java, SOAP, or HTTP/XML). For more information, see the Siperian Services
Integration Framework Guide.

How Custom Stored Procedures Are Registered


You must register a custom stored procedure with Siperian Hub in order to make it
available to users in the Batch Viewer and Batch Group tools in the Hub Console. For
more information, see “Registering a Custom Stored Procedure” on page 808.

Viewing Registered Custom Stored Procedures


To view the registered custom stored procedures in the User Object Registry tool:
1. Start the User Object Registry tool. For more information, see “Starting the User
Object Registry Tool” on page 911.
2. In the list of user objects, select Custom Stored Procedures.
The User Object Registry tool displays registered custom stored procedures, as
shown in the following example.


Viewing Custom Java Cleanse Functions


This section describes how to view registered custom Java cleanse functions in the
User Object Registry tool.

About Custom Java Cleanse Functions


The User Object Registry exposes the details of custom cleanse functions that have
been added to Java libraries (not user libraries). In Siperian Hub, you can build and
execute cleanse functions that cleanse data. A cleanse function is a function that is applied
to a data value in a record to standardize or verify it. For example, if your data has a
column for salutation, you could use a cleanse function to standardize all instances of
“Doctor” to “Dr.” You can apply cleanse functions successively, or simply assign the
output value to a column in the staging table. For more information, see “About
Cleanse Functions” on page 414 and “Configuring Java Libraries” on page 419.

How Custom Java Cleanse Functions Are Registered


Cleanse functions are configured using the Cleanse Functions tool in the Hub Console.
For more information, see “Configuring Java Libraries” on page 419.

Viewing Registered Custom Java Cleanse Functions


To view the registered custom Java cleanse functions in the User Object Registry tool:
1. Start the User Object Registry tool. For more information, see “Starting the User
Object Registry Tool” on page 911.
2. In the list of user objects, select Custom Java Cleanse Functions.
The User Object Registry tool displays the registered custom Java cleanse
functions, as shown in the following example.

Viewing Custom Button Functions


This section describes how to view registered custom button functions in the User
Object Registry tool.

About Custom Button Functions


In your Siperian Hub implementation, you can provide Hub Console users with
custom buttons that can be used to extend your Siperian Hub implementation. Custom
buttons can give users the ability to invoke a particular external service (such as
retrieving data or computing results), perform a specialized operation (such as
launching a workflow), and perform other tasks. Custom buttons can be added to any of the
following tools in the Hub Console: Merge Manager, Data Manager, and Hierarchy
Manager. For more information, see “About Custom Buttons in the Hub Console” on
page 978.

Server-based and client-based custom functions are visible in the User Object Registry. For
more information, see “Server-Based and Client-Based Custom Functions” on page
982.


How Custom Button Functions Are Registered


To add a custom button to the Hub Console in your Siperian Hub implementation,
complete the following tasks:
1. Determine the details of the external service that you want to invoke, such as the
format and parameters for request and response messages.
2. Write and package the business logic that the custom button will execute, as
described in “Writing a Custom Function” on page 981.
3. Deploy the package so that it appears in the applicable tool(s) in the Hub Console,
as described in “Deploying Custom Buttons” on page 986.

Viewing Registered Custom Button Functions


To view the registered custom button functions in the User Object Registry tool:
1. Start the User Object Registry tool. For more information, see “Starting the User
Object Registry Tool” on page 911.
2. Select Custom Button Functions.
The User Object Registry tool displays the registered custom button functions, as
shown in the following example.



Chapter 22: Auditing Siperian Hub Services and Events

This chapter describes how to set up auditing and debugging in the Hub Console.

Chapter Contents
• About Integration Auditing
• Starting the Audit Manager
• Auditing SIF API Requests
• Auditing Message Queues
• Auditing Errors
• Using the Audit Log


About Integration Auditing


Your Siperian Hub implementation has a variety of different log files that track
activities in various components—MRM log, application server log, database server
log, and so on. The auditing covered in this chapter can be described as integration
auditing to track activities associated with the exchange of data between Siperian Hub
and external systems. To learn more about the other types of log files, see the Siperian
Hub Installation Guide for your platform.

Auditing is configured separately for each Operational Record Store (ORS) in your
Siperian Hub implementation.

Auditable Events
Integration with external applications often involves complexity. Multiple applications
interact with each other, exchange data synchronously or asynchronously, use data
transformations back and forth, and engage various business rules to execute business
processes across applications.

To expose the details of application integration to application developers and system
integrators, Siperian Hub provides the ability to create an audit trail whenever:
• an external application interacts with Siperian Hub by invoking a Services
Integration Framework (SIF) request. To learn more, see the Siperian Services
Integration Framework Guide.
• Siperian Hub sends a message (using JMS) to a message queue for the purpose of
distributing data changes to other systems. To learn more, see Chapter 16,
“Configuring the Publish Process.”

The Siperian Hub audit mechanism is optional and configurable. It tracks invocations
of SIF requests that are audit-enabled, collects data about what occurred when, and
provides some contextual information as to why certain actions were fired. It stores
audit information in an audit log table (C_REPOS_AUDIT) that you can subsequently
view using TOAD or another compatible, external data management tool.

Note: Auditing is in effect whether metadata caching is enabled (on) or disabled (off).


Audit Manager Tool


Auditing is configured using the Audit Manager tool in the Hub Console. The Audit
Manager allows administrators to select:
• which SIF requests to audit, and on which systems (Admin, defined source
systems, or no system).
• which message queues to audit (those assigned for use with message triggers) as outbound
messages are sent to JMS queues.

To learn more, see “Starting the Audit Manager” on page 922.

Capturing XML for Requests and Responses


For thorough debugging of specific SIF requests or JMS events, users can optionally
capture the request and response XML in the audit log, which can be especially useful
for write operations. Because auditing at this granular level collects extensive
information with a possible performance trade-off, it is recommended for debugging
purposes but not for ongoing use in a production environment.

Auditing Must Be Explicitly Enabled


By default, the auditing of SIF requests and events is disabled. You must use the Audit
Manager tool to explicitly enable auditing for each SIF request and event that you want
to audit.

Auditing Occurs After Authentication


Any SIF request invocation can be audited once the user credentials associated with the
invocation have been authenticated by the Hub Server. Therefore, a failed login
attempt is not audited. For example, if a third-party application attempts to invoke a
SIF request but provides invalid login credentials, that information will not be captured
in the C_REPOS_AUDIT table. Auditing begins only after authentication succeeds.

Auditing Occurs for Invocations With Valid, Well-formed XML

Only SIF request invocations with valid and well-formed XML will be audited. SIF
requests with invalid XML or XML that is not well-formed will not be audited.

Auditing Password Changes


For invocations of the Siperian Hub change password service, the user’s default
database determines whether the SIF request is audited or not.
• If the user’s default database is an Operational Record Store (ORS), then the
Siperian Hub change password service is audited. To learn more, see “Changing
Passwords” on page 72.
• If the user’s default database is the Master Database, then the change password
service invocation is not audited.

Starting the Audit Manager


To start the Audit Manager:
• In the Hub Console, scroll to the Utilities workbench, and then click Audit
Manager.
The Hub Console displays the Audit Manager, as shown in the following example.

[Figure: Audit Manager, with the Navigation pane and the Properties pane]


The Audit Manager is divided into two panes.

Pane Description
Navigation pane Shows (in a tree view) the following information:
• auditing types for this Siperian Hub implementation (see “Auditable
API Requests and Message Queues” on page 923)
• the systems to audit (see “Systems to Audit” on page 923)
• message queues to audit (see “Auditing Message Queues” on page
928)
Properties pane Shows the properties for the selected auditing type or system.

Auditable API Requests and Message Queues


In the Audit Manager, the navigation pane displays a list of the following types of items
to audit, along with any available systems.

Type Description
API Requests Request invocations made by external applications using the Services
Integration Framework (SIF) Software Development Kit (SDK).
Message Queues Message queues used for message triggers. To learn more, see
Chapter 16, “Configuring the Publish Process”.
Note: Message queues are defined at the CMX_SYSTEM level.
These settings apply only to messages for this Operational Record
Store (ORS).

Systems to Audit
For each type of item to audit, the Audit Manager displays the list of systems that can
be audited, along with the SIF requests that are associated with that system.

System Description
No System Services that are not—or not necessarily—associated with a specific system
(such as merge operations).
Admin Services that are associated with the Admin system.
Defined Source Systems Services that are associated with predefined source systems. To learn more,
see “About the Databases Tool” on page 60.

Note: The same API request or message queue can appear in multiple source systems
if, for example, its use is optional on one of those source systems.

Audit Properties
Note: A write lock is not required to configure auditing.

When you select an item to audit, the Audit Manager displays properties in the
properties pane with the following configurable settings.

Field Description
System Name Name of the selected system. Read-only.
Description Description of the selected system. Read-only.
API Request List of API requests that can be audited.
Message Queue List of message queues that can be audited.
Enable Audit? By default, auditing is not enabled.
• Select (check) to enable auditing for the item.
• Clear (uncheck) to disable auditing for the item.
Include XML? This check box is available only if auditing is enabled for this item. By
default, XML is not captured in the audit log. To learn more, see
“Capturing XML for Requests and Responses” on page 921.
• Check (select) to include XML in the audit log for this item.
• Uncheck (clear) to exclude XML from the audit log for this item.
Note: Passwords are never stored in the audit log. If a password exists in
the XML stream (whether encrypted or not), Siperian Hub replaces the
password with asterisks, as shown in the following example:
...<get>
<username>admin</username>
<password>
<encrypted>false</encrypted>
<password>******</password>
</password>
...
Important: Selecting this option can cause the audit log table to grow very
large very quickly. To learn more, see “Periodically Purging the Audit Log” on
page 935.

For the Enable Audit? and Include XML? check boxes, you can use the following
buttons.

Button Name Description


Select All Check (select) all items in the list.

Clear All Uncheck (clear) all selected items in the list.


Auditing SIF API Requests


You can audit Services Integration Framework (SIF) requests made by external
applications. Once auditing for a particular SIF API request is enabled, Siperian Hub
captures each SIF request invocation and response in the audit log.

For more information regarding SIF API requests, see the Siperian Services Integration
Framework Guide.

To audit SIF API requests:


1. Start the Audit Manager. To learn more, see “Starting the Audit Manager” on page
922.
2. In the navigation tree, select a system beneath API Requests.
Select No System to configure global auditing settings across all systems.
In the properties pane, the Audit Manager displays the configurable API requests for the
selected system. To learn more, see “Audit Properties” on page 924.

3. For each SIF request that you want to audit, select (check) the Enable Audit check
box.
4. If auditing is enabled for a particular API request and you also want to include
XML associated with that API request in the audit log, then select (check) the
Include XML check box.
5. Click the Save button to save your changes.
Note: Your saved settings might not take effect in the Hub Server for up to 60
seconds.
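
Once the saved settings have taken effect, one way to confirm that SIF requests are being captured is to count recent entries per request in the audit log. The following is a minimal sketch rather than a prescribed procedure. It assumes an Oracle ORS and the C_REPOS_AUDIT columns described in “Audit Log Table” on page 931; the one-hour window is an arbitrary choice.

```sql
-- Count audit entries per SIF request over the past hour.
-- ACTION holds the SIF request name (or the message queue name).
SELECT ACTION, COUNT(*) AS ENTRY_COUNT
FROM C_REPOS_AUDIT
WHERE CREATE_DATE >= SYSDATE - 1/24   -- 1/24 of a day is one hour
  AND USERNAME IS NOT NULL            -- USERNAME is null for message queue entries
GROUP BY ACTION
ORDER BY ENTRY_COUNT DESC;
```

If a request that you enabled does not appear, wait for the settings to take effect and invoke the request again before rerunning the query.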


Auditing Message Queues


You can configure auditing for message queues for which message triggers have been
assigned. Message queues that do not have configured message triggers are not
available for auditing.

To audit message queues:


1. Start the Audit Manager. To learn more, see “Starting the Audit Manager” on page
922.
2. In the navigation tree, select a system beneath Message Queues.
In the properties pane, the Audit Manager displays the configurable message queues for
the selected system. To learn more, see “Audit Properties” on page 924.

3. For each message queue that you want to audit, select (check) the Enable Audit
check box.
4. If auditing is enabled for a particular message queue and you also want to include
XML associated with that message queue in the audit log, then select (check) the
Include XML check box.
5. Click the Save button to save your changes.
Note: Your saved settings might not take effect in the Hub Server for up to 60
seconds.
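
To check that outbound messages are being audited, you can look for recent message queue entries in the audit log. This sketch assumes an Oracle ORS and relies on the schema description in “Audit Log Table” on page 931, which states that USERNAME is null for message queues. Entries inserted by external applications might also have a null USERNAME, so treat the result as a starting point.

```sql
-- List audit entries from the past day that appear to be message queue events.
-- For these entries, ACTION holds the message queue name.
SELECT ROWID_AUDIT, ACTION, STATUS, TABLE_NAME, ROWID_OBJECT, CREATE_DATE
FROM C_REPOS_AUDIT
WHERE USERNAME IS NULL
  AND CREATE_DATE >= SYSDATE - 1
ORDER BY CREATE_DATE DESC;
```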


Auditing Errors
You can capture error information for any SIF request invocation that triggers the
error mechanism in the Web service—such as syntax errors, run-time errors, and so
on. You can enable auditing for all errors associated with SIF requests.

Auditing errors is a feature that you enable globally. Even when auditing is not
currently enabled for a particular SIF request, if an error occurs during that SIF request
invocation, then the event is captured in the audit log.

Configuring Global Error Auditing


To audit errors:
1. Start the Audit Manager. To learn more, see “Starting the Audit Manager” on page
922.
2. In the navigation tree, select API Requests to configure auditing for SIF errors.
In the properties pane, the Audit Manager displays the configuration page for errors, as
shown in the following example.

3. Do one of the following:


• Select (check) the Enable Audit check box to audit errors.
• Clear (uncheck) the Enable Audit check box to stop auditing errors.


4. If you select Enable Audit and you also want to include XML associated with
errors in the audit log, then select (check) the Include XML check box.
Note: If you select only Enable Audit, Siperian Hub records the associated audit
information in C_REPOS_AUDIT.
If you also select Include XML, Siperian Hub populates the DATA_XML column of
C_REPOS_AUDIT with detailed log data for the audited event.
If you select both check boxes, when you run an Insert, Update, or Delete job in
the Data Manager, or run the associated batch job, Siperian Hub includes the audit
data in the DATA_XML column of C_REPOS_AUDIT.
5. Click the Save button to save your changes.
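
With error auditing enabled, captured errors can be reviewed directly in the audit log table. The query below is an illustrative sketch that assumes an Oracle ORS. The stored case of the STATUS values is an assumption, so the filter normalizes it with UPPER().

```sql
-- Show error-level audit entries from the past day, newest first.
-- CONTEXT_XML is included because, when an error occurs, the request XML
-- is placed in that column (populated only if Include XML is checked).
SELECT ROWID_AUDIT, CREATE_DATE, USERNAME, ACTION, STATUS, CONTEXT_XML
FROM C_REPOS_AUDIT
WHERE UPPER(STATUS) IN ('ERROR', 'FATAL')
  AND CREATE_DATE >= SYSDATE - 1
ORDER BY CREATE_DATE DESC;
```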

Using the Audit Log


Once you have configured auditing for SIF requests and events, you can use the
populated audit log table (C_REPOS_AUDIT) as needed—for analysis, exception
reporting, debugging, and so on.

About the Audit Log


The C_REPOS_AUDIT table is stored in the Operational Record Store (ORS). If
auditing is enabled for a given SIF request or event, whenever that SIF request is
invoked or that event is triggered on the Siperian Hub, then the audit mechanism
captures the relevant information and stores it in the C_REPOS_AUDIT table. To
learn more about the data stored in this table, see “Audit Log Table” on page 931.

Note: The SIF Audit request allows an external application to insert new records in
the C_REPOS_AUDIT table. You would use this request to report activity involving one
or more records in Siperian Hub that occurs at a higher level, or that carries more
information, than the Hub itself can record. For example, you could audit an update to a
complex object before transforming and decomposing it into Hub objects. To learn more,
see the Siperian Services Integration Framework Guide.


Audit Log Table


The C_REPOS_AUDIT table has the following columns.
Schema for the Audit Log Table (C_REPOS_AUDIT)
Name Oracle Type DB2 Type Description
ROWID_AUDIT CHAR(14) CHARACTER(14) Unique ID for this record. Primary key.
CREATE_DATE DATE TIMESTAMP Record creation date. Defaults to the system
date.
CREATOR VARCHAR2(50) VARCHAR(50) User associated with the audit event.
LAST_UPDATE_DATE DATE TIMESTAMP Same as CREATE_DATE.
UPDATED_BY VARCHAR2(50) VARCHAR(50) Same as CREATOR.
COMPONENT VARCHAR2(50) VARCHAR(50) Component involved:
• SIF.sif.api
ACTION VARCHAR2(50) VARCHAR(50) One of the following:
• SIF request name
• message queue name
STATUS VARCHAR2(50) VARCHAR(50) One of the following values:
• debug
• info
• warn
• error
• fatal
ROWID_OBJECT CHAR(14) CHARACTER(14) The rowid_object, if known.
DATA_XML CLOB CLOB XML associated with the auditable event:
request, response, or JMS message. Populated
only if the Include XML option is enabled
(checked).
Note: Passwords are never stored in the audit
log. If a password exists in the XML stream
(whether encrypted or not), Siperian Hub
replaces the password with the text “******”.
CONTEXT_XML CLOB CLOB XML that might contain contextual
information, such as configuration data, the
URL that was invoked, trace for the
execution of a match rule, and so on. If an
error occurs, the request XML is always put
in this column to ensure its capture in case
auditing was not enabled for the SIF request
that was invoked. Populated only if the
Include XML option is enabled (checked).
ROWID_AUDIT_PREVIOUS CHAR(14) CHARACTER(14) Reference to the ROWID_AUDIT of the
related previous entry. For example, links a
response entry to its corresponding request
entry.
INTERACTION_ID NUMBER(19) BIGINT(8) Interaction ID. May be NULL since
INTERACTION_ID is optional.
USERNAME VARCHAR2(50) VARCHAR(50) User that invoked the SIF request. Null for
message queues.
FROM_SYSTEM VARCHAR2(50) VARCHAR(50) Source system for a SIF request, or Admin
for message queues.
TO_SYSTEM VARCHAR2(50) VARCHAR(50) System to which the audited event is related.
For example, API Requests to Hub set this to
“Admin” and the responses are the system or
null if not known (and vice-versa for
Responses). Note that Activity Manager
Actions set this value.
TABLE_NAME VARCHAR2(100) VARCHAR(100) Table in the Hub Store that is associated with
this audited event.
CONTEXT VARCHAR2(255) VARCHAR(255) Metadata (for example, pkeySource).
This is null for audits from Hub, but may
have values for Activity Manager and audits
done through the SIF API.
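
Because ROWID_AUDIT_PREVIOUS links an entry to the related previous entry, a self-join on C_REPOS_AUDIT can pair each response with its request. The following sketch assumes an Oracle ORS; the table aliases REQ and RESP and the output column aliases are illustrative names only.

```sql
-- Pair each entry with the earlier entry it references.
-- RESP.ROWID_AUDIT_PREVIOUS points at the ROWID_AUDIT of the related request.
SELECT REQ.ROWID_AUDIT  AS REQUEST_ID,
       RESP.ROWID_AUDIT AS RESPONSE_ID,
       REQ.ACTION,
       REQ.USERNAME,
       REQ.CREATE_DATE  AS REQUEST_DATE,
       RESP.STATUS      AS RESPONSE_STATUS
FROM C_REPOS_AUDIT RESP
JOIN C_REPOS_AUDIT REQ
  ON REQ.ROWID_AUDIT = RESP.ROWID_AUDIT_PREVIOUS
ORDER BY REQ.CREATE_DATE;
```

Entries whose ROWID_AUDIT_PREVIOUS is null have no earlier entry and are omitted by the inner join.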


Viewing the Audit Log


You can view the audit log using an external data management tool (not included with
Siperian Hub), such as TOAD. The following example shows viewing the contents of
the DATA_XML column in TOAD:

If available in the data management tool you use to view the log file, you can focus
your viewing by filtering entries—by audit level (view only debug-level or info-level
entries), by time (view entries within the past hour), by operation success / failure
(show error entries only), and so on.


The following SQL statement is just one example:


-- List audit entries created on or after the given timestamp, oldest first.
SELECT ROWID_AUDIT, FROM_SYSTEM, TO_SYSTEM, USERNAME, COMPONENT,
       ACTION, STATUS, TABLE_NAME, ROWID_OBJECT, ROWID_AUDIT_PREVIOUS,
       DATA_XML, CREATE_DATE
FROM C_REPOS_AUDIT
WHERE CREATE_DATE >= TO_DATE('07/06/2006 12:23:00', 'MM/DD/YYYY HH24:MI:SS')
ORDER BY CREATE_DATE;
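
The filters mentioned earlier (audit level and time window) can be combined in the same way. This is a hedged sketch for an Oracle ORS. The stored case of the STATUS values is an assumption, so the comparison uses UPPER().

```sql
-- View only debug-level and info-level entries from the past hour.
SELECT ROWID_AUDIT, CREATE_DATE, USERNAME, ACTION, STATUS
FROM C_REPOS_AUDIT
WHERE UPPER(STATUS) IN ('DEBUG', 'INFO')
  AND CREATE_DATE >= SYSDATE - 1/24   -- 1/24 of a day is one hour
ORDER BY CREATE_DATE;
```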

Sample Audit Log Entries


Here is an example C_REPOS_AUDIT with audit log entries. For this example, the
XML data was not included.


Here is an example C_REPOS_AUDIT with audit log entries that includes the XML
column. For this example, both the Enable Audit and Include XML check boxes were
selected.

Periodically Purging the Audit Log


The audit log table can grow very large rapidly, particularly when capturing XML
request and response information (when the Include XML option is enabled). Using
tools provided by your database management system, consider setting up a scheduled
job that periodically deletes records matching a particular filter (such as entries created
more than 60 minutes ago).

The following SQL statement is just one example:


-- Delete info-level entries older than one day (UPPER() matches 'info' or 'INFO').
DELETE FROM C_REPOS_AUDIT
WHERE CREATE_DATE < (SYSDATE - 1) AND UPPER(STATUS) = 'INFO';
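
On Oracle 10g or later, one way to schedule such a purge is the DBMS_SCHEDULER package. The block below is a sketch under several assumptions: it is run by the ORS schema owner, that user has the CREATE JOB privilege, and the job name PURGE_AUDIT_LOG_JOB, the daily 2:00 a.m. schedule, and the one-day retention filter are all hypothetical choices to adapt to your site.

```sql
-- Create a daily job that purges info-level audit entries older than one day.
-- PURGE_AUDIT_LOG_JOB is a hypothetical job name.
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'PURGE_AUDIT_LOG_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN
                          DELETE FROM C_REPOS_AUDIT
                          WHERE CREATE_DATE < (SYSDATE - 1)
                            AND UPPER(STATUS) = ''INFO'';
                          COMMIT;
                        END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2; BYMINUTE=0; BYSECOND=0',
    enabled         => TRUE,
    comments        => 'Periodically purge the Siperian Hub audit log');
END;
/
```

Keeping error-level entries out of the filter preserves them for later analysis; widen the filter only if those entries are archived elsewhere.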



Part 6: Appendixes

Contents
• Appendix A, “Configuring International Data Support”
• Appendix B, “Backing Up and Restoring Siperian Hub”
• Appendix C, “Configuring User Exits”
• Appendix D, “Viewing Configuration Details”
• Appendix E, “Implementing Custom Buttons in Hub Console Tools”
• Appendix F, “Configuring Access to Hub Console Tools”

Appendix A: Configuring International Data Support

This topic explains how to configure character sets in a Siperian Hub implementation.
The database needs to support the character set you want to use, the terminal must be
configured to support the character set you want to use, and the NLS_LANG
environment variable must include the Oracle name for the character set used by your
client terminal.

Appendix Contents
• Configuring Unicode in Siperian Hub
• Configuring the ANSI Code Page (Windows Only)
• Configuring NLS_LANG


Configuring Unicode in Siperian Hub


This section explains how to configure Siperian Hub to use Unicode Transformation
Format (UTF8) encoding.

Creating and Configuring the Database


The Oracle database used for your Siperian Hub implementation must be created and
configured to support the character set that you want to use. If your implementation
will use mixed locale information (for example, data from multiple countries with
different character sets or display requirements), in order for match to work correctly,
you must set up a UTF8 Oracle database. If, however, the database will contain data
from a single locale, a UTF8 database is probably not required.

To set up a UTF8 Oracle database, complete the following steps:


1. Create a UTF8 database and choose the following settings:

• database character set: AL32UTF8


• national character set: AL16UTF16
Note: Oracle recommends using AL32UTF8 as the database character set for
Oracle 10g. For previous Oracle releases, refer to your Oracle documentation.
2. Set NLS_LANG on both the server and the client:
AMERICAN_AMERICA.AL32UTF8

Notes:
• The NLS_LANG setting should match the database character set.
• The language_territory portion of the NLS_LANG setting (represented as
“AMERICAN_AMERICA” in the above example) is locale-specific and might not be
suitable for all Siperian Hub implementations. For example, a Japanese
implementation might need to use the following setting instead:
NLS_LANG=JAPANESE_JAPAN.AL32UTF8

• If you use AL32UTF8 (or even UTF8) as the database character set, then it is
highly recommended that you set NLS_LENGTH_SEMANTICS to CHAR
(in the Oracle init.ora file) when you instantiate the database. Doing so forces
Oracle to default to CHAR (not BYTE) for variable length definitions.
The NLS_LENGTH_SEMANTICS setting affects all character-related
variable types: VARCHAR, VARCHAR2, and CHAR.
3. Ensure that the Regional Font Settings are correctly configured on the client.
For East Asian data, be sure to install East Asian fonts.
4. When editing data, the regional font settings should match the language being
used.
5. If you are using a multi-byte character set in your Oracle database, you must
change the following setting in the C_REPOS_DB_RELEASE table to zero (0):
column_length_in_bytes_ind = 0

By default, this setting is one (1), which means that column lengths are declared as
byte values. Changing this to zero (0) means that column lengths are declared as
CHAR values in support of Unicode values.
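
The difference between byte and character length semantics can be illustrated outside the database. The following Python sketch is only an illustration: the function names and sample strings are assumptions made for this example and are not part of Siperian Hub or Oracle. It counts Unicode code points as characters, which approximates CHAR semantics for common text.

```python
def utf8_lengths(value):
    """Return (character count, UTF-8 byte count) for a string."""
    return len(value), len(value.encode("utf-8"))


def fits_column(value, width, semantics="CHAR"):
    """Report whether value fits a column declared with the given width.

    semantics="CHAR" counts characters; semantics="BYTE" counts the bytes
    of the UTF-8 encoding. Both names mirror the Oracle keywords, but this
    function is a hypothetical helper, not an Oracle API.
    """
    chars, nbytes = utf8_lengths(value)
    if semantics == "CHAR":
        return chars <= width
    if semantics == "BYTE":
        return nbytes <= width
    raise ValueError("semantics must be 'CHAR' or 'BYTE'")
```

With these definitions, the two-character string 東京 occupies six bytes in UTF-8, so it fits a column that is four characters wide but not a column that is four bytes wide.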

Configuring Match Settings for Non-US Populations


This section describes how to configure match settings for non-United States
populations. For an introduction, see “Population Sets” on page 326.

Configuring Populations

By default, Siperian Hub supports the population for the United States (a
usa.ysp file is provided in the default installation). If your implementation needs
to use a population other than the US population, then additional analysis of the
data is required.
• If the data is exclusively from a different country, and Siperian provides a
population for that country, then use that population. Contact Siperian Support to
obtain the population.ysp file that is appropriate for your implementation, along
with instructions to enable the population.
• If the data is mostly from one country with very small amounts of mixed data from
one or more other populations, consider using the majority population. Contact
Siperian Support to obtain the population.ysp file for the majority population,
along with any instructions.
• If large quantities of data from different countries are mixed, consider whether it is
meaningful to match across such a disparate set of data. If so, then consider using
the “international” population. Contact Siperian Support to obtain the appropriate
population.ysp file and instructions to enable the population.

• For all other situations, contact Siperian Support.

To configure match settings for UTF8:


1. In the C_REPOS_SSA_POPULATION metadata table, enable the appropriate
SSA_POPULATION.

Contact Siperian Support to obtain the appropriate means to enable the population
you want to use. The SSA_POPULATION defines the Standard Population Set to use
for match purposes. A Standard Population Set contains the rules that define how
the Key Building, Search Strategies, and Match Purposes operate on a particular
population of data. There is one Standard Population set for each supported
country, language, or population.
2. Copy the appropriate population.ysp file obtained from Siperian Support to the
following location.
Windows
SIP_HOME\cleanse\resources\match

For example:
C:\siperian\hub\cleanse\resources\match

Unix
SIP_HOME/hub/cleanse/
Note: Siperian ships the usa.ysp file by default. If you need to use the population
set for a different country, contact Siperian Support to obtain the population.ysp
file that is appropriate for your implementation, along with instructions to enable
the population.

Configuring Encoding for Match Processing

To configure encoding for match processing, edit the cmxcleanse.properties file
and add the following setting:
cmx.server.match.server_encoding = 1


This setting helps with the processing of UTF8 characters during match, ensuring that
all data is represented in UTF16 (although its representation in the database is still
UTF8).
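
If you script this change rather than editing the file by hand, the edit amounts to replacing an existing key or appending a new line. The following Python sketch shows one way to do that on the text of a properties file. The function name is an assumption made for this example, and the sketch handles only simple key=value lines (no line continuations or escapes).

```python
def set_property(text, key, value):
    """Return properties text with key set to value.

    An existing 'key=value' or 'key = value' line is replaced;
    otherwise a new line is appended. Comment lines are left alone.
    """
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue
        name = stripped.split("=", 1)[0].strip()
        if name == key:
            lines[i] = key + "=" + value
            replaced = True
    if not replaced:
        lines.append(key + "=" + value)
    return "\n".join(lines) + "\n"
```

For example, calling set_property(text, "cmx.server.match.server_encoding", "1") on the contents of cmxcleanse.properties returns the updated text, which you would then write back to the file after taking a backup copy.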

Using Multiple Populations Within a Single Base Object

Siperian Hub provides you with the ability to use multiple populations within a single
base object. This is useful if data in a base object comes from different
populations—for example, 70% of the records from the United States and 30% from
China. Populations can vary on a record-by-record basis.

To use multiple (two or more) populations within a base object:


1. Contact Siperian Support to obtain the applicable population.ysp file(s) for your
implementation, along with instructions for enabling the population.
2. For each population that you want to use, enable it in the
C_REPOS_SSA_POPULATION metadata table (c_repos_ssa_population.enabled_ind=1).

3. Copy the applicable population.ysp file(s) obtained from Siperian Support to the
following location.
Windows
SIP_HOME\cleanse\resources\match

For example:
C:\siperian\hub\cleanse\resources\match

Unix
SIP_HOME/hub/cleanse/
4. Restart the application server.
5. In the Schema Manager, add a column to the base object that will contain the
population to use for each record.


This must be a VARCHAR column with the physical name of SIP_POP.

Note: The width of the VARCHAR column must fit the largest population name
in use. A width of 30 is probably sufficient for most implementations.
6. Configure the match column as an exact match column with the name of
SIP_POP, according to the instructions in “Configuring Match Columns” on page 515.
7. For each record in the base object that will use a non-default population, provide (in
the SIP_POP column) the name of the population to use instead.
• You can specify values for the SIP_POP column in any number of ways: by
adding the data in the landing tables, by using cleanse functions that
calculate the values during the stage process, by invoking SIF requests from
external applications, or even by manually editing the cells using the Data
Manager tool. The only requirement is that the SIP_POP cells must contain
this data for all non-default populations just prior to executing the Generate
Match Tokens process.
• The data in the SIP_POP column can be in any case (upper, lower, or mixed)
because all alphabetic characters will be converted to lowercase in the match
key table. For example, Us, US, and us are all valid values for this column.
• Invalid values in this column will be processed using the default population.
Invalid values include NULLs, empty strings, and any string that does not
match a population name as defined in
c_repos_ssa_population.population_name.

8. Execute the Generate Match Tokens process on this base object to update the
match key table.
9. Execute the match process on this base object.


Note: The match process compares only records that share the same population.
For example, it will compare Chinese records with Chinese records, and American
records with American records. Any resulting match pairs will be between records
that share the same population.
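
The rules for the SIP_POP column described above (case-insensitive values, fallback to the default population for invalid values, and matching only within a population) can be modeled as follows. This Python sketch is not Siperian Hub code: the function names and the population names used in the usage note are hypothetical, and the actual processing happens inside the Generate Match Tokens and match processes.

```python
def resolve_population(sip_pop, known_populations, default_population):
    """Return the population that applies to one record.

    sip_pop is the raw SIP_POP value (None models a NULL).
    known_populations models the names in
    c_repos_ssa_population.population_name.
    """
    known = {name.lower() for name in known_populations}
    if sip_pop is None or sip_pop == "":
        return default_population
    candidate = sip_pop.lower()  # values may be in any case
    return candidate if candidate in known else default_population


def group_by_population(records, known_populations, default_population):
    """Group (rowid, sip_pop) pairs by resolved population.

    Records are compared only with other records in the same group.
    """
    groups = {}
    for rowid, sip_pop in records:
        population = resolve_population(
            sip_pop, known_populations, default_population
        )
        groups.setdefault(population, []).append(rowid)
    return groups
```

For instance, with the hypothetical population names us and china and a default of us, the values Us, US, and us all resolve to us, while a NULL, an empty string, or an unknown name also falls back to us.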

Cleanse Settings for Unicode


• If you are using the Address Doctor cleanse libraries, ensure that you have the
right database and the unlock code for Address Doctor. You will need to obtain
the Address Doctor database for all countries needed for your implementation.
Contact Siperian Support for details.
• If you are using Trillium, make sure that you use the right template to create the
project. Refer to the Trillium installation documentation to determine which
countries are supported. Obtain country-specific projects from Trillium directly.

Data in Landing Tables


Make sure that the data that is pushed into the landing table is UTF8. This should be
taken care of during the ETL process.

Hub Console
In the Hub Console, menus, warnings, and so on are in English. Current Siperian Hub
UTF support applies only to business data—not metadata or the interface. The Hub
Console will have UTF8 support in a future release.

Locale Recommendations for UNIX When Using UTF8


Many UNIX systems use incompatible character encodings to represent their local
alphabets as binary data. This means that, for example, one string of text written on a
Korean system will not work in a Chinese setting. However, you can make UNIX
systems use UTF-8 encoding for any language. UTF-8 text encoding supports many
languages so that one language does not interfere with another.

You can configure the system locale settings (which define settings for the system
language) to use UTF-8 by completing the following steps:


1. Run the following command:


locale -a

2. Determine whether you can find a locale for your language with a name ending
in .utf8. If no such locale is listed, you can create one with localedef. For
example, for US English:
localedef -f UTF-8 -i en_US en_US.utf8

3. Once you know whether you have a locale that allows you to use UTF-8, instruct
the UNIX system to use that locale.
export LC_ALL="en_US.utf8"
export LANG="en_US.utf8"
export LANGUAGE="en_US.utf8"
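
The check in step 2 can also be scripted. The following Python sketch filters the output of locale -a for UTF-8 locales of a given language. The function names are assumptions made for this example, and the sample output in the usage note is illustrative rather than taken from a real system.

```python
import subprocess


def find_utf8_locales(locale_output, language):
    """Return the locale names for a language that end in .utf8 or .UTF-8.

    locale_output is the text printed by 'locale -a', one name per line.
    language is a prefix such as 'en_US' or 'ja_JP'.
    """
    matches = []
    for name in locale_output.splitlines():
        name = name.strip()
        if name.startswith(language) and name.lower().endswith((".utf8", ".utf-8")):
            matches.append(name)
    return matches


def list_system_locales():
    """Run 'locale -a' and return its output (UNIX systems only)."""
    result = subprocess.run(
        ["locale", "-a"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a UNIX system, find_utf8_locales(list_system_locales(), "en_US") returns the UTF-8 locales available for US English. An empty list indicates that a locale needs to be created before you export LC_ALL, LANG, and LANGUAGE.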

Configuring the ANSI Code Page (Windows Only)


This section explains how to determine and configure the ANSI code page (ACP) in
Windows.

Determining the ANSI Code Page


Like almost all Windows settings, the ACP is stored in the registry. To determine the
ACP:
1. From the Start menu, choose Run.

2. In the Run dialog box, type regedit and then click OK.


3. Browse to the following registry entry:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\ACP

Note: There are many registry entries with very similar names, so be sure to look at the
right place in the registry.


Changing the ANSI Code Page


To change the ANSI Code Page in Windows, you need to configure locale and
language settings in the Control Panel. The instructions differ for Windows XP and
Windows 2003 systems. For instructions, refer to your Microsoft Windows
documentation.

Note: On Windows XP systems, you might need to install support for non-Western
languages.

Configuring NLS_LANG
To specify the locale behavior of your client Oracle software, you need to set your
NLS_LANG setting, which specifies the language, territory, and the character set of your client.
This section describes several ways in which to configure the NLS_LANG setting.

Syntax for NLS_LANG


The NLS_LANG setting uses the following format:
NLS_LANG = LANGUAGE_TERRITORY.CHARACTERSET

where:

Setting        Description
LANGUAGE       Specifies the language used for Oracle messages, as well as the names of
               days and months.
TERRITORY      Specifies monetary and numeric formats, as well as territory and
               conventions for calculating week and day numbers.
CHARACTERSET   Specifies the character set used by the client application. Typically,
               this matches your Windows code page, or it is set to UTF8 for a
               Unicode application.

Note: The character set defined with the NLS_LANG parameter does not change your
client's character set. Instead, it is used to let Oracle know which character set you are
using on the client side so that Oracle can perform the proper conversion.
The character set part of the NLS_LANG parameter is never inherited from the server.
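
To make the format concrete, the following Python sketch splits an NLS_LANG value into its three parts and checks the character set against the database character set. The class and function names are assumptions made for this example. The sketch also assumes that all three parts are present, which is the form used throughout this appendix.

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str
    territory: str
    charset: str


def parse_nls_lang(value):
    """Split LANGUAGE_TERRITORY.CHARACTERSET into its three parts."""
    locale_part, dot, charset = value.partition(".")
    language, underscore, territory = locale_part.partition("_")
    if not (dot and underscore and language and territory and charset):
        raise ValueError(
            "expected LANGUAGE_TERRITORY.CHARACTERSET, got %r" % value
        )
    return NlsLang(language, territory, charset)


def charset_matches_database(nls_lang, database_charset):
    """Check that the NLS_LANG character set equals the database character set."""
    return parse_nls_lang(nls_lang).charset.upper() == database_charset.upper()
```

For example, parse_nls_lang("JAPANESE_JAPAN.AL32UTF8") yields the language JAPANESE, the territory JAPAN, and the character set AL32UTF8.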


Configuring NLS_LANG in the Windows Registry


On Windows systems, you should make sure that you have set an NLS_LANG registry
subkey for each of your Oracle Homes.

You can modify this subkey using the Windows Registry Editor:
1. From the Start menu, choose Run.

2. In the Run dialog box, type regedit, and then click OK.


3. Edit the following registry entry:
For Oracle 10g:
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\KEY_<oracle_home_name>

There you should have an entry with the name NLS_LANG.

When starting an Oracle tool (such as sqlplusw), the tool will read the contents of the
oracle.key file located in the same directory to determine which registry tree will be
used (therefore, which NLS_LANG subkey will be used).

Configuring NLS_LANG as an Environment Variable


Although the Windows Registry is the primary repository for settings in Windows
systems, and it is the recommended way to configure NLS_LANG, there are alternatives.
You can set NLS_LANG as a System or User Environment Variable in the System
properties, although this is not the recommended approach. The configured setting will
be used for all Oracle homes.

To check and modify system or user environment variables:


1. Right-click the My Computer icon and choose Properties.

2. Click the Advanced tab.


3. Click Environment Variables.
• The User Variables list contains the settings for the currently logged-in
Windows user.
• The System Variables list contains system-wide variables for all users.


4. Change settings as needed.

Because these environment variables take precedence over the parameters specified in
your Windows Registry, you should not set Oracle parameters at this location unless
you have a very good reason. In particular, note that the ORACLE_HOME parameter
is set on Unix but not on Windows.



B
Backing Up and Restoring Siperian Hub

This appendix explains how to back up and restore a Siperian Hub implementation.

Appendix Contents
• Backing Up Siperian Hub
• Backup and Recovery Strategies for Siperian Hub

Backing Up Siperian Hub


This appendix describes backup and recovery strategies for Master Reference Manager
(MRM) tables (permanent Hub tables) that are operated on by logging or non-logging
operations.

Non-logging operations (such as CTAS, Direct Path SQL Load, and Direct Insert) are
occasionally performed on permanent Hub tables to speed up batch processes. These
operations are not recorded in the redo logs and, as such, are not generally recoverable.
However, recovery is possible if a backup is made immediately after the operations are
completed.

Backup and recovery strategies are dependent on the value of the
GLOBAL_NOLOGGING_IND column (in the C_REPOS_DB_RELEASE table), which turns
non-logging operations on or off.

The GLOBAL_NOLOGGING_IND column has two possible values:


• GLOBAL_NOLOGGING_IND = 1 (default), which indicates that non-logging
operations are enabled.
• GLOBAL_NOLOGGING_IND = 0, which indicates that non-logging operations are
disabled.

Note: GLOBAL_NOLOGGING_IND controls non-logging operations for permanent Hub
tables only, not for transient tables that are used in Hub batch processes.

Backup and Recovery Strategies for Siperian Hub


Different backup and recovery strategies are required depending on whether
non-logging operations are occurring, that is, depending on the
GLOBAL_NOLOGGING_IND column value. This section describes the two different kinds
of backup and recovery strategies: backup and recovery with non-logging operations,
and backup and recovery without non-logging operations.


Backup and Recovery With Non-Logging Operations


When non-logging operations on permanent Hub tables are enabled
(GLOBAL_NOLOGGING_IND = 1), the following Siperian Hub processes perform
non-logging operations on permanent tables:
• Staging with Delta Detection and Raw Detection
• Tokenization
• Match
• Merge

To recover changes that the non-logging operations make, you must perform a
backup immediately after these operations complete.

Backup and Recovery Without Non-Logging Operations


If non-logging operations on permanent Hub tables are disabled
(GLOBAL_NOLOGGING_IND = 0), redo logs can be used to ensure database recoverability.

To ensure database recoverability:


1. Log on to sqlplus as the ors user.

2. Use the following command to update the C_REPOS_DB_RELEASE table to disable
non-logging operations:
Run sql:
update c_repos_db_release set GLOBAL_NOLOGGING_IND = 0;
COMMIT;

3. Use the following command to disable index creation with the non-logging option:
Run sql:
update c_repos_table set NOLOGGING_IND = 0;
COMMIT;

4. Make sure that the database is running in archive log mode.
5. Perform a database backup.
6. If recovery is needed, apply redo logs on the backup.
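
The choice between the two strategies can be summarized as a simple decision. The following Python sketch is only a planning aid: the function name and the short process labels are assumptions made for this example, and the set of processes is taken from the list of non-logging processes earlier in this appendix. Processes outside that list are treated here as not requiring an immediate backup, which is also an assumption.

```python
# Hypothetical labels for the processes that perform non-logging
# operations on permanent tables when GLOBAL_NOLOGGING_IND = 1.
NON_LOGGING_PROCESSES = {"stage", "tokenization", "match", "merge"}


def requires_immediate_backup(global_nologging_ind, process_name):
    """Return True if a backup is needed right after the process completes.

    global_nologging_ind models C_REPOS_DB_RELEASE.GLOBAL_NOLOGGING_IND:
    1 = non-logging operations enabled, 0 = disabled.
    """
    if global_nologging_ind not in (0, 1):
        raise ValueError("GLOBAL_NOLOGGING_IND must be 0 or 1")
    if global_nologging_ind == 0:
        # Changes are recorded in the redo logs, so a backup plus
        # redo logs is sufficient for recovery.
        return False
    return process_name.lower() in NON_LOGGING_PROCESSES
```

With non-logging operations disabled, the function returns False for every process, which corresponds to the redo log strategy described in the steps above.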



C
Configuring User Exits

This appendix provides reference information for the various predefined Siperian Hub
user exit procedures.

Appendix Contents
• About User Exits
• Types of User Exits

About User Exits


A user exit is an unencrypted stored procedure that includes a set of fixed, pre-defined
parameters. The procedure is configured, on a per-base object basis, to execute at a
specific point during execution of a Siperian Hub batch job. For more information on
how to view user exits with the User Object Registry Tool, see “Viewing User Exits”
on page 912.

Note: The POST_LANDING, PRE_STAGE, and POST_STAGE user exits are only
called from the batch Stage process. For more information, see “Stage Jobs” on page
745.

Siperian Hub automatically provides the appropriate input parameter values when it
calls a user exit procedure. In addition, Siperian Hub automatically checks the return
code returned by a user exit procedure. A negative return code causes the Hub process
to terminate with an error condition.

A user exit must perform its own transaction handling. COMMITs / ROLLBACKs
must be explicitly issued for any data manipulation operation(s) in a user exit, or in
stored procedures called from user exits. However, this is not true for the Siperian SIF
API requests (for example, Merge, Unmerge, and so on). Transactions for the API
requests are handled by Java code. Any COMMITs / ROLLBACKs in such a case may
cause a Java distributed transaction error.

Note: Dynamic SQL is recommended for all DML/DDL statements, as a user exit
could access objects that only exist at run time.

Note: For Oracle databases, all user exit procedures are located in the cmxue package.


Types of User Exits


Here are the various types of user exit procedures:

User Exit Name               Description
POST_LANDING                 Data in a Landing table can be refined using this user exit after the
                             Landing table has been populated using an ETL process. For more
                             information, see “POST_LANDING User Exit” on page 958.
PRE_STAGE                    Called before loading the data into a Staging table. For more
                             information, see “PRE_STAGE User Exit” on page 959.
POST_STAGE                   Called after a Staging table has been populated. For more
                             information, see “POST_STAGE User Exit” on page 959.
POST_LOAD                    Called after a Load batch job and after a Put API call. For more
                             information, see “POST_LOAD User Exit” on page 961.
PRE_MATCH                    Called before a Match batch job.
POST_MATCH                   Called after a Match batch job. For more information, see
                             “POST_MATCH User Exit” on page 962.
PRE_USER_MERGE_ASSIGNMENT    Called just before records to be merged are assigned to a user. For
                             more information, see “PRE_USER_MERGE_ASSIGNMENT”
                             on page 965.
POST_MERGE                   Called after a Merge or a Multi-Merge batch job and after a Merge
                             API call. For more information, see “POST_MERGE User Exit”
                             on page 963.
POST_UNMERGE                 Called after an Unmerge API call. For more information, see
                             “POST_UNMERGE User Exit” on page 964.

User Exits for the Stage Process


The POST_LANDING, PRE_STAGE, and POST_STAGE user exits are only called
from the batch Stage process. For more information, see “Stage Jobs” on page 745.


POST_LANDING User Exit

Use a POST_LANDING user exit for custom work on the landing table prior to delta
detection. For example:
• Hard delete detection
• Replace control characters with printable characters
• Perform any special pre-cleansing processes on Addresses

POST_LANDING Parameters

Parameter Name            Description
IN_ROWID_JOB              Job id for the Stage job, as registered in
                          C_REPOS_JOB_CONTROL.
IN_LANDING_TABLE_NAME     Source table for the Stage job.
IN_STAGING_TABLE_NAME     Target table for the Stage job.
IN_PRL_TABLE_NAME         Previous Landing table name; that is, the copy of the
                          source data mapped to the staging table from the
                          previous time the Stage job ran.
OUT_ERROR_MESSAGE         Error message.
OUT_RETURN_CODE           Return code.


PRE_STAGE User Exit

Use a PRE_STAGE user exit for any special handling of delta processes. For example,
use a PRE_STAGE user exit to check delta volumes and determine whether they
exceed pre-defined allowable delta volume limits (for example, “stop process if source
system is System A and the number of deltas is greater than 500,000”).
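
The decision logic of such a volume check can be sketched as follows. An actual user exit is a stored procedure that receives the parameters listed below. This Python sketch models only the decision and the convention that a negative return code causes the Hub process to terminate. The function name, the limits dictionary, and the specific return code values are assumptions made for this example.

```python
def check_delta_volume(source_system, delta_count, limits):
    """Return (return_code, error_message) for a delta volume check.

    limits maps a source system name to its maximum allowed number of
    deltas. A negative return code models the value that makes the Hub
    process stop with an error condition.
    """
    limit = limits.get(source_system)
    if limit is not None and delta_count > limit:
        message = "%s: %d deltas exceed the limit of %d" % (
            source_system, delta_count, limit
        )
        return -1, message
    return 0, ""
```

For the example above, check_delta_volume("System A", 500001, {"System A": 500000}) returns a negative code with a message, while exactly 500,000 deltas passes the check.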

PRE_STAGE Parameters

Parameter Name            Description
IN_ROWID_JOB              Job id for the Stage job, as registered in
                          C_REPOS_JOB_CONTROL.
IN_LANDING_TABLE_NAME     Source table for the Stage job.
IN_STAGING_TABLE_NAME     Target table for the Stage job.
IN_DLT_TABLE_NAME         Delta table name; that is, the table containing the
                          records identified as deltas.
OUT_ERROR_MESSAGE         Error message.
OUT_RETURN_CODE           Return code.

POST_STAGE User Exit

Use a POST_STAGE user exit for any special processing at the end of a Stage job. For
example, use a POST_STAGE user exit for special handling of rejected records from
the Stage job (for example, to automatically delete rejects for known, non-critical
conditions).


POST_STAGE Parameters

Parameter Name            Description
IN_ROWID_JOB              Job id for the Stage job, as registered in
                          c_repos_job_control.
IN_LANDING_TABLE_NAME     Source table for the Stage job.
IN_STAGING_TABLE_NAME     Target table for the Stage job.
IN_PRL_TABLE_NAME         Previous Landing table name; that is, the copy of the
                          source data mapped to the staging table from the
                          previous time the Stage job ran.
OUT_ERROR_MESSAGE         Error message.
OUT_RETURN_CODE           Return code.


User Exits for the Load Process


POST_LOAD User Exit

Use a POST_LOAD user exit after an update or after an insert from Load.

For the Load process, the IN_ACTION_TABLE has the name of the work table
containing the ROWID_OBJECT values to be inserted/updated.

POST_LOAD Parameters
Parameter Name            Description
IN_ROWID_JOB              Job id for the Load job, as registered in
                          c_repos_job_control (blank for the PUT).
IN_TABLE_NAME             Name of the target table (Base Object / Relationship Table
                          / Dependent Object) for the Load job.
IN_STAGE_TABLE            Name of the source table for the Load job.
IN_ACTION_TABLE           For the Load job, this is the name of the table containing
                          the rows to be inserted or updated
                          (staging_table_name_TINS for inserts,
                          staging_table_name_TOPT for updates).
OUT_ERROR_MESSAGE         Error message.
OUT_RETURN_CODE           Return code.
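
The naming convention for the IN_ACTION_TABLE value can be expressed as a small helper. The following Python sketch only illustrates the convention described in the table. The function name and the staging table name in the usage note are hypothetical.

```python
def action_table_name(staging_table_name, operation):
    """Build the Load work table name for an operation.

    operation is 'insert' (suffix _TINS) or 'update' (suffix _TOPT).
    """
    suffixes = {"insert": "_TINS", "update": "_TOPT"}
    if operation not in suffixes:
        raise ValueError("operation must be 'insert' or 'update'")
    return staging_table_name + suffixes[operation]
```

For a hypothetical staging table named C_STG_CUSTOMER, the helper returns C_STG_CUSTOMER_TINS for inserts and C_STG_CUSTOMER_TOPT for updates.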

User Exits for the Match Process


POST_MATCH User Exit

Use a POST_MATCH user exit for custom work on the match table.

For example, use a POST_MATCH user exit to manipulate matches in the match
queue.

POST_MATCH Parameters
Parameter Name            Description
IN_ROWID_JOB              Job id for the Match job, as registered in
                          c_repos_job_control.
IN_TABLE_NAME             Base Object that the Match job is running on.
IN_MATCH_SET_NAME         Match ruleset.
OUT_ERROR_MESSAGE         Error message.
OUT_RETURN_CODE           Return code.


User Exits for the Merge Process


POST_MERGE User Exit

Use a POST_MERGE user exit to perform custom work after the Merge process.

For example, use a POST_MERGE user exit to automatically match and merge child
records affected by the match and merge of a parent record.

POST_MERGE Parameters
Parameter Name            Description
IN_ROWID_JOB              Job id for the Merge job, as registered in
                          c_repos_job_control.
IN_TABLE_NAME             Base Object that the Merge job is running on.
IN_ROWID_OBJECT_TABLE     For a bulk merge, the action table.
                          For an on-line merge, an inline view.
OUT_ERROR_MESSAGE         Error message.
OUT_RETURN_CODE           Return code.


User Exits for the Unmerge Process


POST_UNMERGE User Exit

Use a POST_UNMERGE user exit for custom work after the Unmerge process.

POST_UNMERGE Parameters
Parameter Name            Description
IN_ROWID_JOB              Job id for the Unmerge transaction, as registered in
                          c_repos_job_control.
IN_TABLE_NAME             Base Object that the Unmerge job is running on.
IN_ROWID_OBJECT           Re-instated rowid_object.
OUT_ERROR_MESSAGE         Error message.
OUT_RETURN_CODE           Return code.


Additional User Exits


PRE_USER_MERGE_ASSIGNMENT

Use this user exit to override or extend user assignment lists. This user exit procedure
runs before the user merge assignment is updated. Note that user assignment lists are
stored in C_REPOS_USER_MERGE_ASSIGNMENTS.



D
Viewing Configuration Details

This appendix explains how to view the configuration details in a Siperian Hub
implementation using the Enterprise Manager in the Hub Console.

Appendix Contents
• About the Enterprise Manager
• Starting the Enterprise Manager
• Enterprise Manager Properties

About the Enterprise Manager


The Enterprise Manager tool allows you to view properties and version histories for
the Hub server, the cleanse servers, the ORS databases, and the Master Database.

Starting the Enterprise Manager


To start the Enterprise Manager:
1. Launch Siperian Hub.

2. In the Change Database window, choose Master Database.


3. In the Siperian Hub Console:
a. Click the Workbenches tab.
b. Expand the Configuration tree.
c. Select Enterprise Manager.
The Enterprise Manager screen is displayed.


Enterprise Manager Properties


This section explains how to choose the different servers or databases to view, and lists
the properties that the Enterprise Manager displays for the Hub server, cleanse server,
and Master Database.

Choosing Properties to View


Before you can choose servers or databases to view, you must first start Enterprise
Manager. See “Starting the Enterprise Manager” on page 968.

In the Enterprise Manager screen, from the Select a hub component menu, choose
the type of information you want to view: Hub Servers, Cleanse Servers, Master
database, or ORS databases. The screen displays properties that are specific for your
choice.

When you click Version History, you see version information for the choice you made
in the Select a hub component field. Version history is sorted in descending order
of install start time. All version histories of hub components are similar to the graphic
shown below.


Hub Server Properties


When you choose Hub Server from the Select field, the Hub Server properties are
displayed. Please see the cmxserver.properties file for more information.

To view more information about each property, slide your cursor or mouse over the
property.

The following table describes Hub Server properties that the Enterprise Manager can
display in the Properties tab, depending on your preference. These properties are found
in the cmxserver.properties file (in the hub server installation directory), and are
not configurable.

Property Name: Installation directory
Explanation:   Installation directory of the Siperian Hub server
Property:      cmx.home=C:/siperian/hub/server

Property Name: Master database type
Explanation:   Type of Master database
Property:      cmx.server.masterdatabase.type=ORACLE

Property Name: Application server type
Explanation:   Type of application server: JBoss, Websphere, WebLogic
Property:      cmx.appserver.type=<application_server_name>

Property Name: Application server hostname
Explanation:   Optional property used to deploy MRM into the EJB cluster.
Property:      cmx.appserver.hostname=Clustername

Property Name: RMI port
Explanation:   Application server port (depends on the appserver type).
               Default settings: 2809 for Websphere, 1099 for JBoss,
               7001 for WebLogic
Property:      cmx.appserver.rmi.port=<port_#>

Property Name: Naming protocol
Explanation:   Naming protocol for the application server type:
               iiop for Websphere, jnp for JBoss, t3 for WebLogic
Property:      cmx.appserver.naming.protocol=Jnp

Property Name: Initial heap size for Java web start JVM
Explanation:   Initial heap size for Java web start JVM
Property:      jnlp.initial-heap-size=128m

Property Name: Maximum heap size for Java web start JVM
Explanation:   Maximum heap size for Java web start JVM
Property:      jnlp.max-heap-size=512m

Property Name: Refresh interval for SAM resources in clock ticks
Explanation:   Refresh interval for SAM resources. Properties are specific to the
               security access manager component within the hub server and used
               to manage cached resources for user profiles.
Property:      cmx.server.sam.cache.resources.refresh_interval=5
               cmx.server.sam.cache.user_profile.refresh_interval=1
               cmx.server.clock.tick_interval=60000

Property Name: Refresh interval for SAM user profiles in clock ticks
Explanation:   Refresh interval for SAM user profiles
Property:      cmx.server.provider.userprofile.cacheable=false
               cmx.server.provider.userprofile.expiration=60000
               cmx.server.provider.userprofile.lifespan=60000

Property Name: Clock tick duration in ms
Explanation:   Clock tick duration
Property:      60000

Property Name: Cache user profiles
Property:      false

Property Name: User profile expiration duration in ms
Property:      60000

Property Name: User profile lifespan in ms
Property:      60000

Property Name: Lookup dropdown limit
Explanation:   Number of entries that will be populated in a dropdown
               menu in the Data Manager and Merge Manager tools
Property:      sip.lookup.dropdown.limit=100

Property Name: Java runtime environment vendor
Property:      Sun Microsystems Inc.
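
When reviewing the values that the Enterprise Manager displays, it can help to compare them with the documented defaults for each application server type. The following Python sketch encodes the default RMI port and naming protocol from the table above. The dictionary and function names are assumptions made for this example and are not part of Siperian Hub.

```python
# Defaults as listed in the Hub Server properties table.
APP_SERVER_DEFAULTS = {
    "websphere": {"rmi_port": 2809, "naming_protocol": "iiop"},
    "jboss": {"rmi_port": 1099, "naming_protocol": "jnp"},
    "weblogic": {"rmi_port": 7001, "naming_protocol": "t3"},
}


def default_connection_properties(app_server_type):
    """Return the documented default property values for an application server."""
    defaults = APP_SERVER_DEFAULTS[app_server_type.lower()]
    return {
        "cmx.appserver.rmi.port": str(defaults["rmi_port"]),
        "cmx.appserver.naming.protocol": defaults["naming_protocol"],
    }
```

For example, default_connection_properties("JBoss") returns port 1099 and the jnp protocol, which you can compare against the values shown in the Properties tab.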

Cleanse Server Properties


When you choose Cleanse Server from the Select field, a list of the cleanse servers is
displayed. When you select a cleanse server, Enterprise Manager displays its properties.
If you place your mouse over the properties, the property values and their source are
displayed.


The following table describes Cleanse Server properties that the Enterprise Manager
can display in the Properties tab, depending on your preference. These properties are
found in the cmxcleanse.properties file.

Property Name: Siperian MRM Cleanse properties
Explanation:   Installation directory of the cleanse files
Property:      cmx.server.datalayer.cleanse.working_files.location=C:/siperian/hub/cleanse/tmp
               cmx.server.datalayer.cleanse.working_files=KEEP
               cmx.server.datalayer.cleanse.execution=LOCAL

Property Name: Installation directory
Explanation:   Installation directory of the Siperian Hub server
Property:      cmx.home=C:/siperian/hub/server

Property Name: Application server type
Explanation:   Type of application server: JBoss, Websphere, WebLogic.
Property:      cmx.appserver.type=<application_server_name>

Property Name: Default port
Explanation:   Application server port: 8880 for Websphere (this property is not
               applicable for JBoss and WebLogic)
Property:      cmx.appserver.soap.connector.port=<port_#>

Property Name: Match properties
Explanation:   Number of threads used
Property:      cmx.server.match.num_of_threads=1
Explanation:   Number
Property:      cmx.server.match.server_encoding=0
Explanation:   Number of records per match ranger node (limits memory use)
Property:      cmx.server.match.max_records_per_ranger_node=300
Explanation:   Number of threads used during cleaning activities
Property:      cmx.server.cleanse.num_of_threads=1

Property Name: Address Doctor properties
Explanation:   Address Doctor cleanse library unlock code
               Address Doctor cleanse library database path
               Address Doctor optimization
               Address Doctor memory setting
               Address Doctor correction type
               Address Doctor certified preload part
               Address Doctor certified preload full
               Address Doctor correction preload part
               Address Doctor correction preload full

Property Name: Trillium properties
Explanation:   Trillium cleanse library config file 1
               Trillium cleanse library config file 2
               Trillium cleanse library config file 3

Property Name: Group 1 software
Explanation:   Group One Enterprise Server cleanse library property configuration file
               Group One cleanse library CDQ property configuration file

Property Name: First Logic object
Explanation:   FirstLogic cleanse library property configuration file

Master Database Properties


When you choose Master Database from the Select field, the Master Database
properties are displayed. The only database properties displayed are the database
vendor and version.


ORS Database Properties


When you choose ORS Database from the Select field, a list of the ORS databases is
displayed. When a specific ORS is selected, the list of properties for that ORS is
displayed.

The top panel contains a list of ORS databases that are registered with the Master
Database. The bottom panel displays the properties and the version history of the ORS
that is selected in the top panel. Properties of ORS include database vendor and
version, as well as information from the C_REPOS_DB_RELEASE table. Version
history is also kept in the C_REPOS_DB_VERSION table.

The following table describes the C_REPOS_DB_RELEASE properties that the
Enterprise Manager displays for the ORS databases, depending on your preference.

Column Name Property Explanation


DEBUG_LEVEL_STR Debug level for the ORS database
ENVIRONMENT_ID
DEBUG_FILE_PATH Path to the location of the debug log
DEBUG_FILE_NAME Name of the ORS database debug log
DEBUG_IND Flag that indicates whether debug is
enabled or not.
0 = debug is not enabled
1 = debug is enabled
TNSNAME TNS name of the ORS database
CONNECTION_PORT Port on which the ORS database listens
ORACLE_SID Oracle database identifier
DATABASE_HOST Host on which the database is installed
INTER_SYSTEM_TIME_DELTA_SEC Delta-detection value, in seconds, which
determines if the incoming data is in the
future
COLUMN_LENGTH_IN_BYTES_IND Flag that the SQLLoader uses to
determine if the database it is loading
into is a UTF-8 database
A default value of 1 means that the
database is UTF-8.
LOAD_TEMPLATE
MTIP_REGENERATION_REQUIRED_IND Flag that indicates that the MTIP views
will be regenerated before the
match/merge process.
The default value of 0 (zero) means that
views will not be regenerated.
GLOBAL_NOLOGGING_IND This is used when tables are created to
enable logging for DB recovery.
The default of 1 means no logging.
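
To make the indicator columns concrete, the following sketch translates the flag values that the table documents into readable text. It assumes the column values have already been read from the C_REPOS_DB_RELEASE table (for example, through a JDBC query, which is not shown). The class OrsReleaseFlags is hypothetical and is not part of Siperian Hub; values that the table does not describe are reported as such rather than guessed.

```java
// Hypothetical helper, not part of Siperian Hub.
class OrsReleaseFlags {

    static final String UNDOCUMENTED = "value not described in the table above";

    // Translates an indicator value into the meaning given in the table above.
    static String describe(String column, int value) {
        switch (column) {
            case "DEBUG_IND":
                if (value == 0) return "debug is not enabled";
                if (value == 1) return "debug is enabled";
                return UNDOCUMENTED;
            case "COLUMN_LENGTH_IN_BYTES_IND":
                return value == 1 ? "database is UTF-8" : UNDOCUMENTED;
            case "MTIP_REGENERATION_REQUIRED_IND":
                return value == 0 ? "MTIP views will not be regenerated" : UNDOCUMENTED;
            case "GLOBAL_NOLOGGING_IND":
                return value == 1 ? "no logging" : UNDOCUMENTED;
            default:
                return UNDOCUMENTED;
        }
    }

    public static void main(String[] args) {
        System.out.println("DEBUG_IND=1: " + describe("DEBUG_IND", 1));
        System.out.println("GLOBAL_NOLOGGING_IND=1: " + describe("GLOBAL_NOLOGGING_IND", 1));
    }
}
```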

Environment Report
When you choose Environment from the Select field, the Enterprise Manager displays
a summary of the properties of all the other choices, along with any associated error
messages. This report can be downloaded in HTML format to a file system.

Appendix E: Implementing Custom Buttons in Hub Console Tools

This appendix explains how, in a Siperian Hub implementation, you can add custom
buttons to tools in the Hub Console that allow you to invoke external services on
demand.

Appendix Contents
• About Custom Buttons in the Hub Console
• Adding Custom Buttons


About Custom Buttons in the Hub Console


In your Siperian Hub implementation, you can provide Hub Console users with
custom buttons that can be used to extend your Siperian Hub implementation. Custom
buttons can provide users with on-demand, real-time access to specialized data
services. Custom buttons can be added to Merge Manager and Hierarchy Manager.

Custom buttons can give users the ability to invoke a particular external service (such
as retrieving data or computing results), perform a specialized operation (such as
launching a workflow), and carry out other tasks. Custom buttons can be designed to
access data services from a wide range of service providers, including (but not limited
to) enterprise applications (such as CRM or ERP applications), external service
providers (such as foreign exchange calculators, publishers of financial market indexes,
or government agencies), and even Siperian Hub itself (for more information, see the
Siperian Services Integration Framework Guide).

For example, you could add a custom button that invokes a specialized cleanse
function, offered as a Web service by a vendor, that cleanses data in the customer
record that is currently selected in the Merge Manager screen. When the user clicks the
button, the underlying code would capture the relevant data from the selected record,
create a request (possibly including authentication information) in the format expected
by the Web service, and then submit that request to the Web service for processing.
When the results are returned, the Hub displays the information in a separate Swing
dialog (if you created one and if you implemented this as a client custom function) with
the customer rowid_object from Siperian Hub.

Custom buttons are not installed by default, nor are they required for every Siperian
Hub implementation. For each custom button you need to implement a Java interface,
package the implementation in a JAR file, and deploy it by running a command-line
utility. To control the appearance of the custom button in the Hub Console, you can
supply either text or an icon graphic in any Swing-compatible graphic format (such as
JPG, PNG, or GIF).

What Happens When a User Clicks a Custom Button


When a user selects a customer record and then clicks a custom button in the Hub
Console, the Hub Console invokes the request, passing content and context to the Java
external (custom) service. Examples of the data passed include record keys and other
data from a base object, package information, and so on. Execution is
asynchronous: the user can continue to work in the Hub Console while the request is
processed.

The custom code can process the service response as appropriate—log the results,
display the data to the user in a separate Swing dialog (if custom-coded and the custom
function is client-side), allow users to copy and paste the results into a data entry field,
execute real-time PUT statements of the data back into the correct business objects,
and so on.

[Figure: When a user clicks a custom button, the Hub Console passes user requests,
base-object data, and packages to the Java external service. Results can go to a log
file, a Swing dialog, data entry fields, or, through a PUT command, to business objects
in the Hub.]

How Custom Buttons Appear in the Hub Console


This section shows how custom buttons, once implemented, will appear in the Merge
Manager and Hierarchy Manager tools of the Hub Console.

Custom Buttons in the Merge Manager

Custom buttons are displayed to the right of the top panel of the Merge Manager, in
the same location as the regular Merge Manager buttons. This example shows a button
called fx.

[Figure: Custom button in the Merge Manager]

Custom Buttons in the Hierarchy Manager

Custom buttons are displayed in the top part of the top panel of the Hierarchy
Manager screen, in the same location as other Hierarchy Manager buttons. This
example shows a button called fx.

[Figure: Custom button in the Hierarchy Manager]

Adding Custom Buttons


To add a custom button to the Hub Console in your Siperian Hub implementation,
complete the following tasks:
1. Determine the details of the external service that you want to invoke, such as the
format and parameters for request and response messages.
2. Write and package the business logic that the custom button will execute, as
described in “Writing a Custom Function” on page 981.
3. Deploy the package so that it appears in the applicable tool(s) in the Hub Console,
as described in “Deploying Custom Buttons” on page 986.

Once an external service button is visible in the Hub Console, users can click the
button to invoke the service.

Writing a Custom Function


To build an external service invocation, you write a custom function that executes the
application logic when a user clicks the custom button in the Hub Console.
The application logic implements the following Java interface:

com.siperian.mrm.customfunctions.api.CustomFunction

To learn more about this interface, see the Javadoc that accompanies your Siperian
Hub distribution.

Server-Based and Client-Based Custom Functions

The application logic executes in one of the following environments:

Environment Description
Client      UI-based custom function. Recommended when you want to display
            elements in the user interface, such as a separate dialog that displays
            response information. To learn more, see “Example Client-Based Custom
            Function” on page 982.
Server      Server-based custom function. Recommended when it is preferable to call
            the external service from the server for network or performance reasons.
            To learn more, see “Example Server-Based Function” on page 984.

Example Custom Functions

This section provides the Java code for two example custom functions that implement
the com.siperian.mrm.customfunctions.api.CustomFunction interface. The
code simply prints (on standard error) information to the server log or the Hub
Console log.

Example Client-Based Custom Function

The name of the client function class for the following sample code is
com.siperian.mrm.customfunctions.test.TestFunctionClient.

//=====================================================================
//project: Siperian Master Reference Manager, Hierarchy Manager
//---------------------------------------------------------------------
//copyright: Siperian Inc. (c) 2008-2009. All rights reserved.
//=====================================================================

package com.siperian.mrm.customfunctions.test;

import java.awt.Frame;
import java.util.Properties;

import javax.swing.Icon;

import com.siperian.mrm.customfunctions.api.CustomFunction;

/**
 * This is a sample custom function that is executed on the client.
 */
public class TestFunctionClient implements CustomFunction {

    // Called on the client because getExecutionType() returns CLIENT_FUNCTION.
    public void executeClient(Properties properties, Frame frame, String username,
            String password, String orsId, String baseObjectRowid, String baseObjectUid,
            String packageRowid, String packageUid, String[] recordIds) {
        System.err.println("Called custom test function on the client with the following parameters:");
        System.err.println("Username/Password: '" + username + "'/'" + password + "'");
        System.err.println(" ORS Database ID: '" + orsId + "'");
        System.err.println("Base Object Rowid: '" + baseObjectRowid + "'");
        System.err.println(" Base Object UID: '" + baseObjectUid + "'");
        System.err.println(" Package Rowid: '" + packageRowid + "'");
        System.err.println(" Package UID: '" + packageUid + "'");
        System.err.println(" Record Ids: ");
        for (int i = 0; i < recordIds.length; i++) {
            System.err.println(" '" + recordIds[i] + "'");
        }
        System.err.println(" Properties: " + properties.toString());
    }

    public void executeServer(Properties properties, String username, String password,
            String orsId, String baseObjectRowid, String baseObjectUid,
            String packageRowid, String packageUid, String[] recordIds) {
        System.err.println("This method will never be called because getExecutionType() returns CLIENT_FUNCTION");
    }

    // Text for the button label.
    public String getActionText() { return "Test Client"; }

    public int getExecutionType() { return CLIENT_FUNCTION; }

    // This sample supplies no icon.
    public Icon getGuiIcon() { return null; }
}

Example Server-Based Function

The name of the server function class for the following code is
com.siperian.mrm.customfunctions.test.TestFunction.

//=====================================================================
//project: Siperian Master Reference Manager, Hierarchy Manager
//---------------------------------------------------------------------
//copyright: Siperian Inc. (c) 2008-2009. All rights reserved.
//=====================================================================

package com.siperian.mrm.customfunctions.test;

import java.awt.Frame;
import java.util.Properties;

import javax.swing.Icon;

import com.siperian.mrm.customfunctions.api.CustomFunction;

/**
 * This is a sample custom function that is executed on the Server.
 * To deploy this function, put it in a jar file and upload the jar file
 * to the DB using DeployCustomFunction.
 */
public class TestFunction implements CustomFunction {

    // Text for the button label.
    public String getActionText() {
        return "Test Server";
    }

    // This sample supplies no icon.
    public Icon getGuiIcon() {
        return null;
    }

    public void executeClient(Properties properties, Frame frame, String username,
            String password, String orsId, String baseObjectRowid, String baseObjectUid,
            String packageRowid, String packageUid, String[] recordIds) {
        System.err.println("This method will never be called because getExecutionType() returns SERVER_FUNCTION");
    }

    // Called on the server because getExecutionType() returns SERVER_FUNCTION.
    public void executeServer(Properties properties, String username, String password,
            String orsId, String baseObjectRowid, String baseObjectUid,
            String packageRowid, String packageUid, String[] recordIds) {
        System.err.println("Called custom test function on the server with the following parameters:");
        System.err.println("Username/Password: '" + username + "'/'" + password + "'");
        System.err.println(" ORS Database ID: '" + orsId + "'");
        System.err.println("Base Object Rowid: '" + baseObjectRowid + "'");
        System.err.println(" Base Object UID: '" + baseObjectUid + "'");
        System.err.println(" Package Rowid: '" + packageRowid + "'");
        System.err.println(" Package UID: '" + packageUid + "'");
        System.err.println(" Record Ids: ");
        for (int i = 0; i < recordIds.length; i++) {
            System.err.println(" '" + recordIds[i] + "'");
        }
        System.err.println(" Properties: " + properties.toString());
    }

    public int getExecutionType() {
        return SERVER_FUNCTION;
    }
}

Controlling the Custom Button Appearance


To control the appearance of the custom button in the Hub Console, you implement
one of the following methods in the
com.siperian.mrm.customfunctions.api.CustomFunction interface:

Method Description
getActionText Specify the text for the button label. Uses the default visual appearance
for custom buttons.
getGuiIcon Specify the icon graphic in any Swing-compatible graphic format (such as
JPG, PNG, or GIF). This image file can be bundled with the JAR file for
this custom function.

Custom buttons are displayed alphabetically by name in the Hub Console.
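
As an illustration of the getGuiIcon option, the following sketch loads an image that is bundled in the same JAR file as the custom function. It is a sketch under stated assumptions: the class name IconButtonSketch and the resource path /icons/fx.png are hypothetical, and the class deliberately does not declare the CustomFunction interface so that the fragment compiles without the Siperian Hub libraries. A real custom function would implement the interface and supply the remaining methods shown in the examples above.

```java
import java.net.URL;

import javax.swing.Icon;
import javax.swing.ImageIcon;

// Sketch only: shows the two appearance methods in isolation.
class IconButtonSketch {

    // Hypothetical location of an image packaged inside the custom function JAR.
    static final String ICON_RESOURCE = "/icons/fx.png";

    // Text for the button label.
    public String getActionText() {
        return "fx";
    }

    // Loads the bundled image from the classpath. Returns null if the image
    // is not found, which is also what the example functions return.
    public Icon getGuiIcon() {
        URL url = IconButtonSketch.class.getResource(ICON_RESOURCE);
        return (url == null) ? null : new ImageIcon(url);
    }
}
```

Loading the image through the class loader rather than from a file path means the icon travels with the JAR file when the button is deployed.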

Deploying Custom Buttons


Before you can see the custom buttons in the Hub Console, you need to explicitly add
them using the DeployCustomFunction utility from the command line.

To deploy custom buttons:


1. Open a command prompt.

2. Run the DeployCustomFunction utility at the command prompt.
3. At the respective prompts, specify the database connection information:
a. Database host
b. Port
c. Service
4. When prompted, specify the database login information:
a. Login username
b. Login password
5. The DeployCustomFunction tool displays a menu of the following options.

Label Description
(L)ist Displays a list of currently-defined custom buttons.
(A)dd Adds a new custom button. The DeployCustomFunction tool prompts
you to specify:
• the JAR file for your custom button
• the name of the custom function class that implements the
com.siperian.mrm.customfunctions.api.CustomFunction interface
• the type of the custom button: m for Merge Manager, h for Hierarchy
Manager (you can specify one or two letters)
(U)pdate Updates the JAR file for an existing custom button.
The DeployCustomFunction tool prompts you to specify:
• the rowID of the custom button to update
• the JAR file for your custom button
• the name of the custom function class that implements the
com.siperian.mrm.customfunctions.api.CustomFunction interface
• the type of the custom button: m for Merge Manager, h for Hierarchy
Manager (you can specify one or two letters)
(C)hange Type Changes the type of an existing custom button. The
DeployCustomFunction tool prompts you to specify:
• the rowID of the custom button to update
• the type of the custom button: m for Merge Manager and/or
h for Hierarchy Manager (you can specify one or two letters)
(S)et Properties Specify a properties file, which defines name/value pairs that the
custom function requires at execution time (name=value).
The DeployCustomFunction tool prompts you to specify the
properties file to use.
(D)elete Deletes an existing custom button. The DeployCustomFunction tool
prompts you to specify the rowID of the custom button to delete.
(Q)uit Exits the DeployCustomFunction tool.

6. When you have finished choosing your actions, choose (Q)uit.


7. Refresh the browser window to display the custom button you just added.
8. Test your custom button to ensure that it works properly.
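
The (S)et Properties option supplies name/value pairs that the custom function requires at execution time, and the example functions in this appendix receive a java.util.Properties argument. The following sketch assumes that the pairs from the properties file arrive in that argument, and shows one way a custom function might read a required setting. The class FunctionSettings and the property name service.url are invented for this illustration and are not part of Siperian Hub.

```java
import java.util.Properties;

// Hypothetical helper, not part of Siperian Hub.
class FunctionSettings {

    // Returns a required setting, or fails with a message that names the missing property.
    static String required(Properties properties, String name) {
        String value = properties.getProperty(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing custom function property: " + name);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        // "service.url" is an invented property name used only in this sketch.
        properties.setProperty("service.url", "http://localhost:8080/cleanse");
        System.err.println("Service URL: " + required(properties, "service.url"));
    }
}
```

Failing early with the property name in the message makes a missing or misspelled entry in the properties file easier to spot when you test the button.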

Appendix F: Configuring Access to Hub Console Tools

Appendix Contents
• About User Access to Hub Console Tools
• Starting the Tool Access Tool
• Granting User Access to Tools and Processes
• Revoking User Access to Tools and Processes

About User Access to Hub Console Tools


For users who will be using the Hub Console in their jobs, you can control access
privileges to Hub Console tools. For example, data stewards typically have access to
only the Data Manager and Merge Manager tools.

You use the Tool Access tool in the Configuration workbench to configure access to
Hub Console tools. To use the Tool Access tool, you must be connected to the master
database.

Note: The Tool Access tool applies only to Siperian Hub users who are not configured
as administrators (users who do not have the Administrator check box selected in the
Users tool, as described in “Editing User Accounts” on page 870).

Starting the Tool Access Tool


To start the Tool Access tool:
1. In the Hub Console, connect to the master database, if you have not already done
so.
2. Expand the Configuration workbench and click Tool Access.
The Hub Console displays the Tool Access tool.

In the above example, the cmx_global user account exists only to store the global
password policy, which is described in “Managing the Global Password Policy” on page
877.

Granting User Access to Tools and Processes


To grant user access to Hub Console tools and processes for a specific Siperian Hub
user:
1. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.

2. In the Tool Access tool, scroll the User list and select the user that you want to
configure.
3. Do one of the following:
• In the Available processes list, select a process to which you want to grant
access.
• In the Available workbenches list, select a workbench containing the tool(s)
to which you want to grant access.
4. Click the button.
The Tool Access tool adds the selected tool or process to the Accessible tools
and processes list. Granting access to a process automatically grants access to any
tool that the process uses. Granting access to a tool automatically grants access to
any process that uses the tool.

The user will have access to these processes and tools for every ORS to which they
have access. You cannot give a user access to one tool for one ORS and another tool
for a different ORS.

Note: If you want to grant access to only some of the tools in a workbench, then
expand the associated workbench in the Accessible tools and processes list, select
the tool, and revoke access according to the instructions in the next section, “Revoking
User Access to Tools and Processes” on page 992.

Revoking User Access to Tools and Processes


To revoke user access to Hub Console tools and processes for a specific Siperian Hub
user:
1. Acquire a write lock. To learn more, see “Acquiring a Write Lock” on page 30.

2. In the Tool Access tool, scroll the User list and select the user that you want to
configure.
3. Scroll the Accessible tools and processes list and select the process, workbench, or
tool for which you want to revoke access.
To select a tool, expand the associated workbench.
4. Click the button.
The Tool Access tool prompts you to confirm that you want to remove the access.
5. Click Yes.
The Tool Access tool removes the selected item from the Accessible tools and
processes list. Revoking access to a process automatically revokes access to any
tool that the process uses. Revoking access to a tool automatically revokes access
to any process that uses the tool.

Glossary

accept limit

A number that determines the acceptability of a match. The accept limit is defined by
Siperian within a population in accordance with its match purpose.

active state (records)

This is a state associated with a base object or cross reference record. A base object
record is active if at least one of its cross reference records is active. A cross reference
record contributes to the consolidated base object only if it is active.

Active records participate in Hub processes by default. These are the records that are
available to participate in any operation. If records are required to go through an
approval process, then these records have been through that process and have been
approved.

Activity Manager

Siperian Activity Manager (AM) evaluates data events, synchronizes master data, and
delivers unified views of reference and activity data from disparate sources. AM builds
upon the extensible, template-driven schema of Siperian Hub and uses the rules-based,
configurable approach to combining reference, relationship, and activity data.
It conducts a rules evaluation using a combination of reference, transactional, and
analytical data from disparate sources. It also conducts an event-driven, rules-based
orchestration of data write-backs to selected sources, and performs other event-driven,
rules-based actions to centralize data integration and delivery of relevant data to
subscribing users and applications. AM has an intuitive, powerful UI for defining,
designing, delivering, and managing unified views to downstream applications and
systems, as well as built-in lineage, history, and audit functionality.

Admin source system

Default source system. Used for manual trust overrides and data edits from the Data
Manager or Merge Manager tools. See source system.

administrator

Siperian Hub user who has the primary responsibility for configuring the Siperian Hub
system. Administrators access Siperian Hub through the Hub Console, and use
Siperian Hub tools to configure the objects in the Hub Store, and create and modify
Siperian Hub security.

authentication

Process of verifying the identity of a user to ensure that they are who they claim to be.
In Siperian Hub, users are authenticated based on their supplied credentials—user
name / password, security payload, or a combination of both. Siperian Hub provides
an internal authentication mechanism and also supports user authentication via
third-party authentication providers. See credentials, security payload.

authorization

Process of determining whether a user has sufficient privileges to access a requested
Siperian Hub resource. In Siperian Hub, resource privileges are allocated to roles. Users
and user groups are assigned to roles. A user’s resource privileges are determined by
the roles to which they are assigned, as well as by the roles assigned to the user
group(s) to which the user belongs. See user, user group, role, resource, and privilege.
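
The relationship between users, user groups, roles, and privileges can be sketched as a simple union: collect the roles assigned to the user and to each of the user's groups, then collect the privileges of those roles. The class below is a hypothetical model for illustration only. It is not the Siperian Hub security API, and the actual resolution rules in the Hub may be more involved.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model for illustration; not the Siperian Hub security API.
class EffectivePrivileges {

    // Collects the privileges of every role held directly or through a user group.
    static Set<String> resolve(Set<String> userRoles,
                               Collection<Set<String>> rolesOfUserGroups,
                               Map<String, Set<String>> privilegesByRole) {
        // Roles assigned to the user, plus roles assigned to the user's groups.
        Set<String> roles = new TreeSet<>(userRoles);
        for (Set<String> groupRoles : rolesOfUserGroups) {
            roles.addAll(groupRoles);
        }
        // Privileges are allocated to roles, so gather them role by role.
        Set<String> privileges = new TreeSet<>();
        for (String role : roles) {
            Set<String> granted = privilegesByRole.get(role);
            if (granted != null) {
                privileges.addAll(granted);
            }
        }
        return privileges;
    }
}
```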

automerge

Process of merging records automatically. For merge-style base objects only. Match
rules can result in automatic merging or manual merging. A match rule that instructs
Siperian Hub to perform an automerge will combine two or more records of a base
object table automatically, without manual intervention. See manual merge.

base object

A table that contains information about an entity that is relevant to your business, such
as customer or account.

batch group

A collection of individual batch jobs (for example, Stage, Load, and Match jobs) that
can be executed with a single command. Each batch job in a group can be executed
sequentially or in parallel to other jobs. See also batch job.

batch job

A program that, when executed, completes a discrete unit of work (a process).
For example, the Match job carries out the match process, checking the specified
match condition for the records of a base object table and then queueing the matched
records for either automerge (Automerge job) or manual merge (Manual Merge job).
See also batch group.

batch mode

Way of interacting with Siperian Hub via batch jobs, which can be executed in the Hub
Console or using third-party management tools to schedule and execute batch jobs (in
the form of stored procedures) on the database server. See also real-time mode, batch
job, batch group, stored procedure.

best version of the truth

A record that has been consolidated with the best cells of data from the source records.
Sometimes abbreviated as BVT.
• For merge-style base objects, the base object record is the BVT record, and is built
by consolidating the most-trustworthy cell values from the corresponding source
records.

BI vendor

A company that produces Business Intelligence software products.

Build Match Group (BMG)

The process for removing redundant matching in advance of the consolidate process.
For example, suppose a base object had the following match pairs:
• record 1 matches to record 2
• record 2 matches to record 3
• record 3 matches to record 4

After running the match process and creating build match groups, and before
running the consolidation process, you might see the following records:
• record 2 matches to record 1
• record 3 matches to record 1
• record 4 matches to record 1
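
The rewrite in this example, from a chain of pairs to pairs that all point at one record, can be sketched with a small union-find structure. This is an illustration of the idea only and not Siperian's implementation. Choosing the lowest-numbered record as the representative of each group is an assumption drawn from the example above, and the class name MatchGroupSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.TreeMap;

// Illustration only; not the Siperian Hub implementation.
class MatchGroupSketch {

    // Follows parent links to the representative record of a group.
    static int find(Map<Integer, Integer> parent, int record) {
        parent.putIfAbsent(record, record);
        int root = record;
        while (parent.get(root) != root) {
            root = parent.get(root);
        }
        return root;
    }

    // Rewrites chained match pairs so that every record points at the
    // lowest record id in its group. The representative itself is omitted.
    static Map<Integer, Integer> group(int[][] pairs) {
        Map<Integer, Integer> parent = new TreeMap<>();
        for (int[] pair : pairs) {
            int a = find(parent, pair[0]);
            int b = find(parent, pair[1]);
            if (a != b) {
                // The lower id becomes the representative (assumption).
                parent.put(Math.max(a, b), Math.min(a, b));
            }
        }
        Map<Integer, Integer> result = new TreeMap<>();
        for (int record : new ArrayList<>(parent.keySet())) {
            int root = find(parent, record);
            if (root != record) {
                result.put(record, root);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] pairs = { {1, 2}, {2, 3}, {3, 4} };
        // Prints {2=1, 3=1, 4=1}
        System.out.println(group(pairs));
    }
}
```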

bulk merge

See automerge.

BVT

See best version of the truth.

cascade delete

When the Delete stored procedure deletes records in the parent object, it also removes
the affected records in the child base object. To enable a cascade delete operation, set
the CASCADE_DELETE_IND parameter to 1. The Delete job checks each child BO table
for related data that should be deleted given the removal of the parent BO record.

If you do not set this parameter, Siperian Hub generates an error message if there are
child base object records referencing the deleted base object record; the Delete job
fails, and Siperian Hub performs a rollback operation for the associated data.

cascade unmerge

When records in a parent object are unmerged, Siperian Hub also unmerges affected
records in the child base object.

See also: linear unmerge, tree unmerge.

cell

Intersection of a column and a record in a table. A cell contains a data value or null.

change list

List of changes to make to a target repository. A change is an operation in the change
list (such as adding a base object or updating properties in a match rule) that is
executed against the target repository. Change lists represent the list of differences
between Hub repositories. See also creation change list, comparison change list,
Metadata Manager.

cleanse

See data cleansing.

cleanse engine

A cleanse engine is a third-party product used to perform data cleansing with
Siperian Hub.

cleanse function

Code that changes the incoming data during Stage jobs, converting each input string to an
output string. Typically, these functions are used to standardize data and thereby
optimize the match process. By combining multiple cleanse functions, you can perform
complex filtering and standardization. See also data cleansing, internal cleanse.

cleanse list

A logical grouping of rules for replacing parts of an input string during the cleanse
process. See cleanse function, data cleansing.

Cleanse Match Server

The Cleanse Match Server run-time component is a servlet that handles cleanse
requests. This servlet is deployed in an application server environment. The servlet
contains two server components:
• a cleanse server that handles data cleansing operations
• a match server that handles match operations

The Cleanse Match Server is multi-threaded so that each instance can process multiple
requests concurrently. It can be deployed on a variety of application servers.

The Cleanse Match Server interfaces with any of the supported cleanse engines, such as
the Trillium Director cleanse engine. The Cleanse Match Server and the cleanse engine
work to standardize the data. This standardization works closely with the Siperian
Consolidation Engine (formerly referred to as the Merge Engine) to optimize the data
for consolidation.

column

In a table, a set of data values of a particular type, one for each row of the table. See
system column, user-defined column.

comparison change list

A change list that is the result of comparing the contents of two repositories and
generating the list of changes to make to the target repository. Comparison change lists
are used in Metadata Manager when promoting or importing design objects. See also
change list, creation change list, Metadata Manager.

complete match tracking

The display of the complete or original match chain that caused two records to be
matched through intermediate records.

conditional mapping

A mapping between a column in a landing table and a staging table that uses a SQL
WHERE clause to conditionally select only those records in the landing table that meet
the filter condition. See mapping, distinct mapping.

Configuration workbench

Includes tools for configuring a variety of Hub objects, including the ORS, users,
security, message queues, and metadata validation.

consolidation process

Process of merging or linking duplicate records into a single record. The goal in
Siperian Hub is to identify and eliminate all duplicate data and to merge or link them
together into a single, consolidated record while maintaining full traceability.

consolidation indicator

Represents the consolidation state of a record in a base object. Stored in the
CONSOLIDATION_IND column. The consolidation indicator is one of the
following values:

Indicator
Value State Name Description
1 CONSOLIDATED Indicates the record has been through the
match and merge process.
2 UNMERGED Indicates that the record has gone through the
match process.
3 QUEUED_FOR_MATCH Indicates that the record is ready to be put
through the match process against the rest of
the records in the base object.
4 NEWLY_LOADED Indicates that the record has been newly loaded
into the base object and has not gone through
the match process.
9 ON_HOLD Indicates that the Data Steward has put the
record on hold, to deal with later.
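
Client code that reads the CONSOLIDATION_IND column might map the numeric values in this table to named states. The enum below is a minimal sketch for that purpose. The type name ConsolidationState and its methods are hypothetical and are not part of any Siperian Hub API; only the values and state names come from the table.

```java
// Hypothetical enum for illustration; values and names are taken from the table above.
enum ConsolidationState {
    CONSOLIDATED(1),
    UNMERGED(2),
    QUEUED_FOR_MATCH(3),
    NEWLY_LOADED(4),
    ON_HOLD(9);

    private final int indicator;

    ConsolidationState(int indicator) {
        this.indicator = indicator;
    }

    // The numeric value stored in the CONSOLIDATION_IND column.
    int indicator() {
        return indicator;
    }

    // Looks up the state for a CONSOLIDATION_IND value; rejects values not listed in the table.
    static ConsolidationState fromIndicator(int value) {
        for (ConsolidationState state : values()) {
            if (state.indicator == value) {
                return state;
            }
        }
        throw new IllegalArgumentException("Unknown CONSOLIDATION_IND value: " + value);
    }
}
```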

control table

A type of system table in an ORS that Siperian Hub automatically creates for a base
object. Control tables are used in support of the load, merge, and unmerge processes.
For each trust-enabled column in a base object, Siperian Hub maintains a record (the
last update date and an identifier of the source system) in a corresponding control
table.

creation change list

A change list that is the result of exporting the contents of a repository. Creation
change lists are used in Metadata Manager for importing design objects. See also
change list, comparison change list, Metadata Manager.

credentials

What a user supplies at login time to gain access to Siperian Hub resources. Credentials
are used during the authentication process to determine whether a user is who they
claim to be. Login credentials might be a user name and password, a security payload
(such as a security token or some other binary data), or a combination of user
name/password and security payload. See authentication, security payload.

cross-reference table

A type of system table in an ORS that Siperian Hub automatically creates for a base
object. For each record of the base object, the cross-reference table contains zero to n
(0-n) records per source system. This record contains the primary key from the source
system and the most recent value that the source system has provided for each cell in
the base object table.

Customer Data Integration (CDI)

A discipline within Master Data Management (MDM) that focuses on customer master
data and its related attributes. See master data.

Data Access Services

These application server level capabilities enable Siperian Hub to support multiple
modes of data access and expose numerous Siperian Hub data services via the Siperian
Services Integration Framework (SIF). This facilitates both real-time synchronous
integration, as well as asynchronous integration.

database

Organized collection of data in the Hub Store. Siperian Hub supports two types of
databases: a Master Database and an Operational Record Store (ORS).
See Master Database, Operational Record Store (ORS), and Hub Store.

data cleansing

The process of standardizing data content and layout, decomposing and parsing text
values into identifiable elements, verifying identifiable values (such as zip codes) against
data libraries, and replacing incorrect values with correct values from data libraries. See
cleanse function.

Data Manager

Tool used to review the results of all merges (including automatic merges) and to
correct data content if necessary. It provides you with a view of the data lineage for
each base object record. The Data Manager also allows you to unmerge previously
merged records, and to view different types of history on each consolidated record.

Use the Data Manager tool to search for records, view their cross-references, unmerge
records, unlink records, view history records, create new records, edit records, and
override trust settings. The Data Manager displays all records that meet the search
criteria you define.

datasource

In the application server environment, a datasource is a JDBC resource that identifies
information about a database, such as the location of the database server, the database
name, the database user ID and password, and so on. Siperian Hub needs this
information to communicate with an ORS.

data steward

Siperian Hub user who has the primary responsibility for data quality. Data stewards
access Siperian Hub through the Hub Console, and use Siperian Hub tools to
configure the objects in the Hub Store.

Data Steward workbench

Part of the Siperian Hub UI used to review consolidated data as well as matched data
queued for exception handling by data analysts or stewards who understand the data
semantics and are guardians of data reliability in an organization.

Includes tools for using the Data Manager, Merge Manager, and Hierarchy Manager.

data type

Defines the characteristics of permitted values in a table column: characters, numbers,
dates, binary data, and so on. Siperian Hub uses a common set of data types for
columns that map directly to data types for the database platform (Oracle or DB2) used
in your Siperian Hub implementation.

decay curve

Visually shows the way that trust decays over time. Its shape is determined by the
configured decay type and decay period. See decay period, decay type.

decay period

The amount of time (days, weeks, months, quarters, and years) that it takes for the trust
level to decay from the maximum trust level to the minimum trust level. See decay
curve, decay type.

decay type

The way that the trust level decreases during the decay period. See linear decay, RISL
decay, SIRL decay, decay curve, decay period.

deleted state (records)

Deleted records are records that are no longer wanted as part of the Hub’s data.
These records are not used in processes (unless specifically requested). Records can be
deleted only explicitly and, once deleted, can be restored if desired. When a record that is
Pending is deleted, it is permanently deleted and cannot be restored.

delta detection

During the stage process, Siperian Hub only processes new or changed records when
this feature is enabled. Delta detection can be done either by comparing entire records
or via a date column.
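
The two approaches can be illustrated with a short Python sketch. This is not Siperian Hub
code: the function names, the record layout, and the last_update_date column are
hypothetical and serve only to show the difference between comparing entire records and
comparing a date column.

```python
from datetime import date


def delta_by_full_compare(incoming, previous):
    """Keep records that are new or that differ from the previously processed copy."""
    # previous maps a record key to the record as it looked on the last run
    return [rec for rec in incoming if previous.get(rec["pkey"]) != rec]


def delta_by_date_column(incoming, last_run_date):
    """Keep records whose (hypothetical) last_update_date is later than the last run."""
    return [rec for rec in incoming if rec["last_update_date"] > last_run_date]


previous = {
    "A1": {"pkey": "A1", "name": "Ann", "last_update_date": date(2009, 1, 5)},
    "B2": {"pkey": "B2", "name": "Bob", "last_update_date": date(2009, 1, 5)},
}
incoming = [
    {"pkey": "A1", "name": "Ann", "last_update_date": date(2009, 1, 5)},     # unchanged
    {"pkey": "B2", "name": "Robert", "last_update_date": date(2009, 2, 1)},  # changed
    {"pkey": "C3", "name": "Cy", "last_update_date": date(2009, 2, 2)},      # new
]

changed_full = delta_by_full_compare(incoming, previous)
changed_date = delta_by_date_column(incoming, date(2009, 1, 31))
```

In this sample both approaches select records B2 and C3. The date-based approach is cheaper
but relies on the source system maintaining the date column reliably.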

dependent object

A table that is used to store detail information about the records in a base object (for
example, supplemental notes). One record in a base object table can map to multiple
records in a dependent object table.

design object

Parts of the metadata used to define the schema and other configuration settings for an
implementation. Design objects include instances of the following types of Siperian
Hub objects: base objects and columns, landing and staging tables, columns, indexes,
relationships, mappings, cleanse functions, queries and packages, trust settings,
validation and match rules, Security Access Manager definitions, Hierarchy Manager
definitions, and other settings. See metadata, Metadata Manager.

distinct mapping

A mapping between a column in a landing table and a staging table that selects only the
distinct records from the landing table. Using distinct mapping is useful in situations in
which you have a single landing table feeding multiple staging tables and the landing
table is denormalized (for example, it contains both customer and address data). See
mapping, conditional mapping.
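
As a rough sketch of the idea, the following Python example feeds two staging sets from one
denormalized landing table, keeping only distinct rows for each. The table layout, column
names, and the distinct_rows helper are hypothetical; the example does not reproduce how
Siperian Hub implements distinct mapping.

```python
# Denormalized landing rows: customer and address data share one table.
landing_rows = [
    {"cust_id": 1, "cust_name": "Acme", "addr_line": "1 Main St"},
    {"cust_id": 1, "cust_name": "Acme", "addr_line": "9 Side Rd"},
    {"cust_id": 2, "cust_name": "Zenith", "addr_line": "4 Elm Ave"},
]


def distinct_rows(rows, columns):
    """Project rows onto the given columns and drop repeats, keeping first-seen order."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


# The customer columns repeat across address rows, so only distinct rows are kept.
customer_staging = distinct_rows(landing_rows, ["cust_id", "cust_name"])
address_staging = distinct_rows(landing_rows, ["cust_id", "addr_line"])
```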

distinct source system

A source system that provides data that gets inserted into the base object without being
consolidated. See source system.

distribution

Process of distributing the master record data to other applications or databases after
the best version of the truth has been established via reconciliation. See reconciliation,
publish.

downgrade

Operation that occurs when inserting or updating data using the load process or using
the CleansePut and Put APIs when a validation rule reduces the trust for a record by a
percentage.
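
The arithmetic can be sketched as follows. The exact calculation is not given in this
glossary, so the example assumes a simple proportional reduction; the function name and
parameters are hypothetical.

```python
def downgrade_trust(trust, downgrade_percent):
    """Reduce a trust score by a percentage (assumed proportional reduction)."""
    if not 0 <= downgrade_percent <= 100:
        raise ValueError("downgrade_percent must be between 0 and 100")
    return trust * (100 - downgrade_percent) / 100


# A value with trust 80 that fails a validation rule carrying a 25% downgrade.
downgraded = downgrade_trust(80, 25)
```

Under this assumption, a trust score of 80 with a 25 percent downgrade becomes 60.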

duplicate

One or more records in which the data in certain columns (such as name, address, or
organization data) is identical or nearly identical. Match rules executed during the
match process determine whether two records are sufficiently similar to be considered
duplicates for consolidation purposes.

entity

In Hierarchy Manager, an entity is a typed object that can be related to other entities.
Examples of entities are: individual, organization, product, and household. See entity
type.

entity base object

An entity base object is a base object used to store information about Hierarchy
Manager entities. See entity type and entity.

entity type

In Hierarchy Manager, entity types define the kinds of objects that can be related using
Hierarchy Manager. Examples are individual, organization, product, and household. All
entities with the same entity type are stored in the same entity base object. In the HM
Configuration tool, entity types are displayed in the navigation tree under the Entity
Object with which the Type is associated. See entity.

exact match

A match / search strategy that matches only records that are identical. If you specify an
exact match, you can define only exact match columns for this base object
(exact-match base objects cannot have fuzzy match columns). A base object that uses
the exact match / search strategy is called an exact-match base object. See also match /
search strategy, fuzzy match.

exclusive lock

In the Hub Console, a lock that is required in order to make exclusive changes to the
underlying schema. An exclusive lock prevents all other Hub Console users from
making changes to the target database at the same time. An exclusive lock must be
released by the user with the exclusive lock; it cannot be cleared by another user. See
write lock.

execution path

The sequence in which batch jobs are executed when the entire batch group is
executed in the Siperian Hub. The execution path begins with the Start node and ends
with the End node. The Batch Group tool does not validate the execution sequence for
you—it is up to you to ensure that the execution sequence is correct.

export process

In Metadata Manager, the process of exporting metadata in a repository to a portable
change list XML file, which can then be used to import design objects into another
repository or to save it in a source control system for archival purposes. The export
process copies all supported design objects to the change list XML file. See also
Metadata Manager, validation process, import process, promotion process, change list.

external application user

Siperian Hub user who accesses Siperian Hub data indirectly via third-party applications.

external cleanse

The process of cleansing data prior to populating the landing tables. External cleansing
is typically performed outside of Siperian Hub using an extract-transform-load (ETL)
tool or some other data cleansing utility. See also data cleansing, extract-transform-load
(ETL) tool, internal cleanse.

external match

Process that allows you to match new data (stored in a separate input table) with
existing data in a fuzzy-match base object, test for matches, and inspect the results—all
without actually changing data in the base object in any way, or changing the match
table associated with the base object.

extract-transform-load (ETL) tool

A software tool (external to Siperian Hub) that extracts data from a source system,
transforms the data (using rules, lookup tables, and other functionality) to convert it to
the desired state, and then loads (writes) the data to a target database. For Siperian Hub
implementations, ETL tools are used to extract data from source systems and populate
the landing tables. See also data cleansing, external cleanse.

foreign key

In a relational database, a column (or set of columns) whose value corresponds to a
primary key value in another table (or, in rare cases, the same table). The foreign key
acts as a pointer to the other table. For example, the Department_Number column in
the Employee table would be a foreign key that points to the primary key of the
Department table.
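
The Employee and Department example above can be tried out with a minimal Python sketch.
It uses an in-memory SQLite database only because SQLite ships with Python; it is not one
of the database platforms used by Siperian Hub, and the column definitions beyond
Department_Number are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE Department (Department_Number INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Employee ("
    " Employee_Id INTEGER PRIMARY KEY,"
    " Name TEXT,"
    " Department_Number INTEGER REFERENCES Department (Department_Number))"
)
conn.execute("INSERT INTO Department VALUES (10, 'Finance')")
conn.execute("INSERT INTO Employee VALUES (1, 'Ann', 10)")  # points at Department 10

try:
    conn.execute("INSERT INTO Employee VALUES (2, 'Bob', 99)")  # there is no Department 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Follow the foreign key from the employee to the department it points to.
dept_name = conn.execute(
    "SELECT d.Name FROM Employee e JOIN Department d"
    " ON e.Department_Number = d.Department_Number WHERE e.Employee_Id = 1"
).fetchone()[0]
```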

fuzzy match

A match / search strategy that uses probabilistic matching, which takes into account
spelling variations, possible misspellings, and other differences that can make matching
records non-identical. If selected, Siperian Hub adds a special column (Fuzzy Match
Key) to the base object. This column is the primary field used during searching and
matching to generate match candidates for this base object. All fuzzy base objects have
one and only one Fuzzy Match Key. A base object that uses the fuzzy match / search
strategy is called a fuzzy-match base object. Using fuzzy match requires a selected
population. See also match / search strategy, exact match, and population.

global business identifier (GBID)

A column that contains common identifiers (key values) that allow you to uniquely and
globally identify a record based on your business needs. Examples include:
• Identifiers defined by applications external to Siperian Hub, such as ERP or CRM
systems.
• Identifiers defined by external organizations, such as industry-specific codes (AMA
numbers, DEA numbers, and so on), or government-issued identifiers (social
security number, tax ID number, driver’s license number, and so on).

hard delete

A base object or XREF record is physically removed from the database. See soft delete.

Hierarchies Tool

Siperian Hub administrators use the design-time Siperian Hierarchies tool (previously
called the “Hierarchy Manager Configuration Tool”) to set up the structures
required to view and manipulate data relationships in Hierarchy Manager. Use the
Hierarchies tool to define Hierarchy Manager components (such as entity types,
hierarchies, relationship types, packages, and profiles) for your Siperian Hub
implementation. See Hierarchy Manager.

The Hierarchies tool is accessible via the Model workbench.

Hierarchy Manager

Part of the Siperian Hub UI used to set up the structures required to view and
manipulate data relationships. Siperian Hierarchy Manager (Hierarchy Manager or HM)
builds on Siperian Master Reference Manager (MRM) and the repository managed by
Siperian Hub for reference and relationship data. Hierarchy Manager gives you visibility
into how relationships correlate between systems, enabling you to discover
opportunities for more effective customer service, to maximize profits, or to enact
compliance with established standards.

The Hierarchy Manager tool is accessible via the Data Steward workbench.

hierarchy

In Hierarchy Manager, a set of relationship types. These relationship types are not
ranked based on the place of the entities of the hierarchy, nor are they necessarily
related to each other. They are merely relationship types that are grouped together for
ease of classification and identification. See hierarchy type, relationship, relationship
type.

hierarchy type

In Hierarchy Manager, a logical classification of hierarchies. The hierarchy type is the
general class of hierarchy under which a particular relationship falls. See hierarchy.

history table

A type of table in an ORS that contains historical information about changes to an
associated table. History tables provide detailed change-tracking options, including
merge and unmerge history, history of the pre-cleansed data, history of the base object,
and history of the cross-reference.

HM package

A Hierarchy Manager package represents a subset of an MRM package and contains
the metadata needed by Hierarchy Manager.

hotspot

In business data, a group of records representing overmatched data: a large
intersection of matches.

Hub Console

Siperian Hub user interface that comprises a set of tools for administrators and data
stewards. Each tool allows users to perform a specific action, or a set of related actions,
such as building the data model, running batch jobs, configuring the data flow,
configuring external application access to Siperian Hub resources, and other
system configuration and operation tasks.

hub object

A generic term for various types of objects defined in the Hub that contain
information about your business entities. Some examples include: base objects,
dependent objects, cross-reference tables, and any object in the hub that you can
associate with reporting metrics.

Hub Server

A run-time component in the middle tier (application server) used for core and
common services, including access, security, and session management.

Hub Store

In a Siperian Hub implementation, the database that contains the Master Database and
one or more Operational Record Store (ORS) databases. See Master Database,
Operational Record Store (ORS).

immutable source

A data source that always provides the best, final version of the truth for a base object.
Records from an immutable source will be accepted as unique and, once a record from
that source has been fully consolidated, it will not be changed—even in the event of a
merge. Immutable sources are also distinct systems. For all source records from an
immutable source system, the consolidation indicator for Load and PUT is always 1
(consolidated record).

implementer

Siperian Hub user who has the primary responsibility for designing, developing, testing,
and deploying Siperian Hub according to the requirements of an organization. Tasks
include (but are not limited to) creating design objects, building the schema, defining
match rules, performance tuning, and other activities.

import process

In Metadata Manager, the process of adding design objects from a library or change list
to a repository. The design object does not already exist in the target repository.
See also Metadata Manager, validation process, promotion process, change list.

incremental load

Any load process that occurs after a base object has undergone its initial data load.
Called incremental loading because only new or updated data is loaded into the base
object. Duplicate data is ignored. See initial data load.

initial data load

The very first time that data is loaded into an empty base object. During the initial
data load, all records in the staging table are inserted into the base object as new
records.

Insight Manager

The Insight Manager is a Siperian Hub product that generates reporting metadata for
data in the Hub Store, including information about data quality, hub performance, and
data steward productivity. Insight Manager uses this reporting metadata to create
reports and metrics for this data. In addition, third-party reporting tools can be
integrated into the Siperian Hub for report generation.

internal cleanse

The process of cleansing data during the stage process, when data is copied from
landing tables to the appropriate staging tables. Internal cleansing occurs inside
Siperian Hub using configured cleanse functions that are executed by the Cleanse
Match Server in conjunction with a supported cleanse engine. See also data cleansing,
cleanse engine, external cleanse.

job execution log

In the Batch Viewer and Batch Group tools, a log that shows job completion status
with any associated messages, such as success, failure, or warning.

job execution script

For Siperian Hub implementations, a script that is used in job scheduling software
(such as Tivoli or CA Unicenter) that executes Siperian Hub batch jobs via stored
procedures.

key match job

A Siperian Hub batch job that matches records from two or more sources when these
sources use the same primary key. Key Match jobs compare new records to each other
and to existing records, and then identify potential matches based on the comparison
of source record keys as defined by the primary key match rules. See primary key match
rule, match process.

key type

Identifies important characteristics about the match key to help Siperian Hub generate
keys correctly and conduct better searches. Siperian Hub provides the following match
key types: Person_Name, Organization_Name, and Address_Part1. See match process.

key width

Determines how fast searches are during match, the number of possible
match candidates returned, and how much disk space the keys consume. Key width
options are Standard, Extended, Limited, and Preferred. Key widths apply to fuzzy
match objects only. See match process.

land process

Process of populating landing tables from a source system. See source system, landing
table.

landing table

A table where a source system puts data that will be processed by Siperian Hub.

linear decay

The trust level decreases in a straight line from the maximum trust to the minimum
trust. See decay type, trust.
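
A straight-line decay can be written as a simple interpolation between the maximum and
minimum trust. The following Python sketch is an illustration based on the definitions of
maximum trust, minimum trust, and decay period in this glossary; the function name and
parameters are hypothetical, and age and decay_period are assumed to be expressed in the
same unit.

```python
def linear_trust(max_trust, min_trust, decay_period, age):
    """Trust of a value of the given age under straight-line decay."""
    if decay_period <= 0:
        raise ValueError("decay_period must be positive")
    if min_trust > max_trust:
        raise ValueError("min_trust must be less than or equal to max_trust")
    # Fraction of the decay period that has elapsed, held between 0 and 1.
    fraction = min(max(age / decay_period, 0.0), 1.0)
    return max_trust - (max_trust - min_trust) * fraction


# Maximum trust 90, minimum trust 30, decay period of 12 months.
just_changed = linear_trust(90, 30, 12, 0)    # the value has just been changed
half_way = linear_trust(90, 30, 12, 6)        # half of the decay period has elapsed
fully_decayed = linear_trust(90, 30, 12, 24)  # the decay period has elapsed
```

When the maximum and minimum trust are equal, the function returns the same value for
every age, which corresponds to a flat decay curve.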

linear unmerge

A base object record is unmerged and taken out of the existing merge tree structure.
Only the unmerged base object record itself will come out of the merge tree structure,
and all base object records below it in the merge tree will stay in the original merge
tree.

See also: cascade unmerge, tree unmerge.

load insert

When records are inserted into the target table (base object or dependent object).
During the load process, if a record in the staging table does not already exist in the
target table, then Siperian Hub inserts the record into the target table. See load process,
load update.

load process

Process of loading data from a staging table into the corresponding base object or
dependent object in the Hub Store. If the new data overlaps with existing data in the
Hub Store, Siperian Hub uses trust settings and validation rules to determine which
value is more reliable. See trust, validation rule, load insert, load update.
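
The decision made for each staging record can be sketched in Python as follows. This is a
heavily simplified illustration: it treats trust as a single score per record, whereas the
definitions in this glossary describe trust per column and also involve validation rules.
The load function, the trust_of callback, and the record layout are hypothetical.

```python
def load(staging, target, trust_of):
    """Apply staging records to target (both dicts keyed by record key).

    trust_of is a hypothetical function that returns a trust score for a record.
    Returns the number of records inserted and the number updated.
    """
    inserted = updated = 0
    for key, record in staging.items():
        if key not in target:
            target[key] = record   # no existing record: insert it
            inserted += 1
        elif trust_of(record) > trust_of(target[key]):
            target[key] = record   # overlapping data: the more trusted value wins
            updated += 1
    return inserted, updated


target = {"K1": {"phone": "555-1234", "trust": 40}}
staging = {
    "K1": {"phone": "555-4321", "trust": 70},  # overlaps with existing data
    "K2": {"phone": "555-0000", "trust": 50},  # not yet in the target
}
counts = load(staging, target, lambda rec: rec["trust"])
```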

load update

When records are updated in the target table (base object or dependent object).
During the load process, if a record in the staging table already exists in the
target table, then Siperian Hub updates the record in the target table. See load process,
load insert.

lock

See write lock, exclusive lock.

lookup

Process of retrieving a data value from a parent table during Load jobs. In Siperian
Hub, when configuring a staging table associated with a base object, if a foreign key
column in the staging table (as the child table) is related to the primary key in a parent
table, you can configure a lookup to retrieve data from that parent table.
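
Conceptually, a lookup translates the foreign key value carried by a staging record into a
value held by the parent table. The Python sketch below illustrates this with a plain
dictionary; the table contents, column names, and the resolve_foreign_key helper are
hypothetical and do not represent the Siperian Hub implementation.

```python
# Hypothetical parent table: source key of the parent -> primary key in the parent table.
parent_lookup = {"CUST-001": 1001, "CUST-002": 1002}


def resolve_foreign_key(staging_row, lookup_table, fk_column):
    """Return a copy of the row with its foreign key column replaced by the looked-up value.

    The column is set to None when the parent table has no matching entry.
    """
    resolved = dict(staging_row)
    resolved[fk_column] = lookup_table.get(staging_row[fk_column])
    return resolved


address_row = {"addr_line": "1 Main St", "customer_fk": "CUST-002"}
resolved_row = resolve_foreign_key(address_row, parent_lookup, "customer_fk")
```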

manual merge

Process of merging records manually. Match rules can result in automatic merging or
manual merging. A match rule that instructs Siperian Hub to perform a manual merge
identifies records that have enough points of similarity to warrant attention from a data
steward, but not enough points of similarity to allow the system to automatically merge
the records. See automerge.

manual unmerge

Process of unmerging records manually. See manual merge.

mapping

Defines a set of transformations that are applied to source data. Mappings are used
during the stage process (or using the SiperianClient CleansePut API request) to
transfer data from a landing table to a staging table. A mapping identifies the source
column in the landing table and the target column to populate in the staging table,
along with any intermediate cleanse functions used to clean the data. See conditional
mapping, distinct mapping.

master data

A collection of common, core entities, along with their attributes and their values,
that are considered critical to a company's business, and that are required for
use in two or more systems or business processes. Examples of master data include
customer, product, employee, supplier, and location data. See Master Data
Management (MDM), Customer Data Integration (CDI).

Master Data Management (MDM)

The controlled process by which the master data is created and maintained as the
system of record for the enterprise. MDM is implemented in order to ensure that the
master data is validated as correct, consistent, and complete,
and—optionally—circulated in context for consumption by internal or external
business processes, applications, or users. See master data, Customer Data Integration
(CDI).

Master Database

Database that contains the Siperian Hub environment configuration settings: user
accounts, security configuration, ORS registry, message queue settings, and so on.
A given Siperian Hub environment can have only one Master Database. The default
name of the Master Database is CMX_SYSTEM. See also Operational Record Store
(ORS).

Master Reference Manager (MRM)

Master Reference Manager (MRM) is the foundation product of Siperian Hub. Siperian
MRM consists of the following major components: Hierarchy Manager, Security
Access Manager, Metadata Manager, Services Integration Framework (SIF), Insight
Manager, and Activity Manager. Its purpose is to build an extensible and manageable
system-of-record for all master reference data. It provides the platform to consolidate
and manage master reference data across all data sources—internal and external—of
an organization, and acts as a system-of-record for all downstream applications.

match

The process of determining whether two records should be automatically merged or
should be candidates for manual merge because the two records have identical or
similar values in the specified columns. See match process.

match candidate

For fuzzy-match base objects only, any record in the base object that is a possible
match.

match column

A column that is used in a match rule for comparison purposes. Each match column is
based on one or more columns from the base object. See match process.

match column rule

Match rule that is used to match records based on the values in columns you have
defined as match columns, such as last name, first name, address1, and address2. See
primary key match rule, match process.

match key table

When you specify a match column, Siperian Hub creates a special key called a match
key (also known as a token string) on a special table called the match key table
(formerly referred to as the token table or strip table). Before the Siperian Hub Match
batch job runs, it first ensures that the correct match keys have been generated in the
match key table. The match job compares the match keys according to the match rules
that have been defined to determine which records are duplicates. See also tokenizing.

match list

Define custom-built standardization lists. Functions are pre-defined functions that
provide access to specialized cleansing functionality such as address verification or
address decomposition. See match process.

match path

Allows you to traverse the hierarchy between records—whether that hierarchy exists
between base objects (inter-table paths) or within a single base object (intra-table paths).
Match paths are used for configuring match column rules involving related records in
either separate tables or in the same table.

match process

Process of comparing two records for points of similarity. If sufficient points of
similarity are found to indicate that two records probably are duplicates of each other,
Siperian Hub flags those records for merging.

match purpose

For fuzzy-match base objects, defines the primary goal behind a match rule. For
example, if you're trying to identify matches for people where address is an important
part of determining whether two records are for the same person, then you would use
the Match Purpose called Resident. Each match purpose contains knowledge about
how best to compare two records to achieve the purpose of the match. Siperian Hub
uses the selected match purpose as a basis for applying the match rules to determine
matched records. The behavior of the rules is dependent on the selected purpose. See
match process.

match rule

Defines the criteria by which Siperian Hub determines whether records might be
duplicates. Match columns are combined into match rules to determine the conditions
under which two records are regarded as being similar enough to merge. Each match
rule tells Siperian Hub the combination of match columns it needs to examine for
points of similarity. See match process.

match rule set

A logical collection of match rules that allow users to execute different sets of rules at
different stages in the match process. Match rule sets include a search level that dictates
the search strategy, any number of automatic and manual match rules, and optionally, a
filter that allows you to selectively include or exclude records during the match process.
Match rule sets are used to execute match column rules but not primary key match
rules. See match process.

match subtype

Used with base objects that contain different types of data, such as an Organization
base object containing customer, vendor, and partner records. Using match subtyping,
you can apply match rules to specific types of data within the same base object. For
each match rule, you specify an exact match column that will serve as the “subtyping”
column to filter out the records that you want to ignore for that match rule. See match
process.

match table

Type of system table, associated with a base object, that supports the match process.
During the execution of a Match job for a base object, Siperian Hub populates its
associated match table with the ROWID_OBJECT values for each pair of matched
records, as well as the identifier for the match rule that resulted in the match, and an
automerge indicator. See match process.

match token

Strings that encode data in the columns used to identify candidates for matching.
Match tokens are fixed length, compressed, and encoded values built from a
combination of the words and numbers in a name or address such that relevant
variations have the same key value. For each record being matched, the match process
stores a generated match token in the tokenization table associated with the base
object. See match process.

match type

Each match column has a match type that determines how the match column will be
tokenized in preparation for the match comparison. See match process.

match / search strategy

Specifies the reliability of the match versus the performance you require: fuzzy or
exact. An exact match / search strategy is faster, but an exact match will miss some
matches if the data is imperfect. See fuzzy match, exact match, match process.

maximum trust

The trust level that a data value will have if it has just been changed. For example, if
source system A changes a phone number field from 555-1234 to 555-4321, the new
value will be given system A’s maximum trust level for the phone number field. By
setting the maximum trust level relatively high, you can ensure that changes in the
source systems will usually be applied to the base object.

merge process

Process of combining two or more records of a base object table because they have the
same value (or very similar values) in the specified match columns. See consolidation
process, automerge, manual merge, manual unmerge.

Merge Manager

Tool used to review and take action on the records that are queued for manual
merging.

message

In Siperian Hub, refers to a Java Message Service (JMS) message. A message queue
server handles two types of JMS messages:
• Inbound messages are used for the asynchronous processing of Siperian Hub
service invocations.
• Outbound messages provide a communication channel to distribute data changes
via JMS to source systems or other systems.

message queue

A mechanism for transmitting data from one process to another (for example, from
Siperian Hub to an external application).

message queue rule

A mechanism for identifying base object events and transferring the affected records to
the internal system for update. Message queue rules are supported for updates, merges,
and records accepted as unique.

message queue server

In Siperian Hub, a Java Message Service (JMS) server, defined in your application
server environment, that Siperian Hub uses to manage incoming and outgoing JMS
messages.

message trigger

A rule that gets fired when a particular action occurs within Siperian Hub. When
an action occurs for which a rule is defined, a JMS message is placed in the outbound
message queue. A message trigger identifies the conditions which cause the message to
be generated (what action on which object) and the queue on which messages are
placed.

metadata

Data that is used to describe other data. In Siperian Hub, metadata is used to describe
the schema (data model) that is used in your Siperian Hub implementation, along with
related configuration settings. See also Metadata Manager, design object, schema.

Metadata Manager

The Metadata Manager tool in the Hub Console is used to validate metadata for a
repository, promote design objects from one repository to another, import design
objects into a repository, and export a repository to a change list. See also metadata,
design object, validation process, import process, promotion process, export process,
change list.

metadata validation

See validation process.

minimum trust

The trust level that a data value will have when it is “old” (after the decay period has
elapsed). This value must be less than or equal to the maximum trust. If the maximum
and minimum trust are equal, the decay curve is a flat line and the decay period and
decay type have no effect. See also decay period.

Model workbench

Part of the Siperian Hub UI used to configure the solution during deployment by the
implementers, and for on-going configuration by data architects of the various types of
metadata and rules in response to changing business needs.

Includes tools for creating query groups, defining packages and other schema objects,
and viewing the current schema.

non-contributing cross reference

A cross-reference (XREF) record that does not contribute to the BVT (best version of
the truth) of the BO record. As a consequence, the values in XREF will never show up
in the BO record. Note that this is for state-enabled records only.

non-equal matching

When configuring match rules, prevents equal values in a column from matching each
other. Non-equal matching applies only to exact match columns.

null value

The absence of a value in a column of a record. Null is not the same as blank or zero.

operation

Deprecated term. See request.

Operational Record Store (ORS)

Database that contains the rules for processing the master data, the rules for managing
the set of master data objects, along with the processing rules and auxiliary logic used
by the Siperian Hub in defining the BVT. A Siperian Hub configuration can have one
or more ORS databases. The default name of an ORS is CMX_ORS. See also Master
Database.

overmatching

For fuzzy-match base objects only, a match that results in too many matches, including
matches that are not relevant. When configuring match, the goal is to find the optimal
number of matches for your data. See undermatching.

package

A package is a public view of one or more underlying tables in Siperian Hub. Packages
represent subsets of the columns in those tables, along with any other tables that are
joined to the tables. A package is based on a query. The underlying query can select a
subset of records from the table or from another package.

password policy

Specifies password characteristics for Siperian Hub user accounts, such as the password
length, expiration, login settings, password re-use, and other requirements. You can
define a global password policy for all user accounts in a Siperian Hub implementation,
and you can override these settings for individual users.

path

See match path.

pending state (records)

Pending records are records that have not yet been approved for general usage in the
Hub. These records can have most operations performed on them, but operations have
to specifically request Pending records. If records are required to go through an
approval process, then these records have not yet been approved and are in the midst
of an approval process.

policy decision points (PDPs)

Specific security check points that determine, at run time, the validity of a user’s
identity (authentication), along with that user’s access to Siperian Hub resources
(authorization).

policy enforcement points (PEPs)

Specific security check points that enforce, at run time, security policies for
authentication and authorization requests.

population

Defines certain characteristics about data in the records that you are matching. By
default, Siperian Hub comes with the US population, but Siperian provides a standard
population per country. Populations account for the inevitable variations and errors
that are likely to exist in name, address, and other identification data; specify how
Siperian Hub builds match tokens; and specify how search strategies and match
purposes operate on the population of data to be matched. Used only with the Fuzzy
match/search strategy.

primary key

In a relational database table, a column (or set of columns) whose value uniquely
identifies a record. For example, the Department_Number column would be the
primary key of the Department table.

primary key match rule

Match rule that is used to match records from two systems that use the same primary
keys for records. See also match column rule.

private resource

A Siperian Hub resource that is hidden from the Roles tool, preventing its access via
Services Integration Framework (SIF) operations. When you add a new resource in
Hub Console (such as a new base object), it is designated a PRIVATE resource by
default. See also secure resource, resource.

privilege

Permission to access a Siperian Hub resource. With Siperian Hub internal
authorization, each role is assigned one or more of the following privileges.

Privilege   Allows the User To
READ        View data.
CREATE      Create data records in the Hub Store.
UPDATE      Update data records in the Hub Store.
MERGE       Merge and unmerge data.
EXECUTE     Execute cleanse functions and batch groups.

Privileges determine the access that external application users have to Siperian Hub
resources. For example, a role might be configured to have READ, CREATE,
UPDATE, and MERGE privileges on particular packages and package columns. These
privileges are not enforced when using the Hub Console, although the settings still
affect the use of Hub Console to some degree. See secure resource, role.
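
The relationship between roles, resources, and privileges can be pictured as a simple lookup. The following Python sketch is illustrative only: the role names, the resource name PKG_CUSTOMER, and the is_allowed function are hypothetical and do not correspond to any Siperian Hub API.

```python
# Hypothetical role definitions: role -> resource name -> set of privileges.
ROLE_PRIVILEGES = {
    "data_entry": {"PKG_CUSTOMER": {"READ", "CREATE", "UPDATE"}},
    "steward": {"PKG_CUSTOMER": {"READ", "CREATE", "UPDATE", "MERGE"}},
}


def is_allowed(role, resource, privilege):
    """Return True if the role holds the privilege on the resource."""
    return privilege in ROLE_PRIVILEGES.get(role, {}).get(resource, set())


print(is_allowed("steward", "PKG_CUSTOMER", "MERGE"))     # True
print(is_allowed("data_entry", "PKG_CUSTOMER", "MERGE"))  # False
```

An unknown role or resource falls through to an empty set, so the check denies access by default.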

profile

In Hierarchy Manager, describes what fields and records an HM user may display, edit,
or add. For example, one profile can allow full read/write access to all entities and
relationships, while another profile can be read-only (no add or edit operations
allowed).

promotion process

Meaning depends on the context:


• Metadata Manager: Process of copying changes in design objects from one
repository to another. Promotion is used to copy incremental changes between
repositories. See also Metadata Manager, validation process, import process,
change list.
• State Management: Process of changing the system state of individual records in
Siperian Hub (for example from PENDING state to ACTIVE state).

provider

See security provider.

provider property

A name-value pair that a security provider might require in order to provide access to
the service(s) that it offers.

publish

Process of submitting a Siperian Hub message to a message queue for distribution to
other applications, databases, and so on. See distribution.

query

A request to retrieve data from the Hub Store. Siperian Hub allows administrators to
specify the criteria used to retrieve that data. Queries can be configured to return
selected columns, filter the result set with a WHERE clause, use complex query syntax
(such as GROUP BY, ORDER BY, and HAVING clauses), and use aggregate
functions (such as SUM, COUNT, and AVG).
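
A query that combines these features might look like the following sketch. It uses Python's built-in sqlite3 module as a stand-in for the Hub Store; the table c_order and its columns are hypothetical and exist only for this illustration.

```python
import sqlite3

# In-memory database standing in for a table in the Hub Store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE c_order (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO c_order VALUES (?, ?, ?)",
    [("Acme", "US", 100.0), ("Acme", "US", 250.0),
     ("Bolt", "US", 40.0), ("Cord", "EU", 500.0)],
)

# Selected columns, a WHERE filter, GROUP BY / HAVING / ORDER BY,
# and the aggregate functions COUNT and SUM.
rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS orders, SUM(amount) AS total
    FROM c_order
    WHERE region = 'US'
    GROUP BY customer
    HAVING SUM(amount) > 50
    ORDER BY total DESC
    """
).fetchall()
print(rows)  # [('Acme', 2, 350.0)]
```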

query group

A logical group of queries. A query group is simply a mechanism for organizing
queries. See query.

raw table

A table that archives data from a landing table.

real-time mode

Way of interacting with Siperian Hub via third-party applications, which invoke
Siperian Hub operations via the Services Integration Framework (SIF) interface. SIF
provides operations for various services, such as reading, cleansing, matching, inserting,
and updating records. See also batch mode, Services Integration Framework (SIF).

reconciliation

For a given entity, Siperian Hub obtains data from one or more source systems, then
reconciles “multiple versions of the truth” to arrive at the master record—the best
version of the truth—for that entity. Reconciliation can involve cleansing the data
beforehand to optimize the process of matching and consolidating records for a base
object. See distribution.

record

A row in a table that represents an instance of an object. For example, in an Address
table, a record contains a single address. See also source record.

referential integrity

Enforcement of parent-child relationship rules among tables based on configured
foreign key relationships.

regular expression

A computational expression that is used to match and manipulate text data according
to commonly-used syntactic conventions and symbolic patterns. In Siperian Hub, a
regular expression function allows you to use regular expressions for cleanse
operations. To learn more about regular expressions, including syntax and patterns,
refer to the Javadoc for java.util.regex.Pattern.

reject table

A table that contains records that Siperian Hub could not insert into a target table,
such as:
• staging table (stage process) after performing the specified cleansing on a record of
the specified landing table
• Hub store table (load process)

A record could be rejected because the value of a cell is too long, or because the
record’s update date is later than the current date.
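
The two rejection reasons mentioned above can be sketched as a simple check. This Python example is illustrative only; the function reject_reason, the column names, and the length limits are hypothetical and do not reflect how Siperian Hub implements the stage or load process.

```python
from datetime import datetime


def reject_reason(record, max_lengths, now):
    """Return a reason string if the record should be rejected, else None."""
    for column, limit in max_lengths.items():
        value = record.get(column)
        if value is not None and len(str(value)) > limit:
            return f"value too long for column {column}"
    if record["last_update_date"] > now:
        return "update date is later than the current date"
    return None


now = datetime(2009, 2, 20)
limits = {"name": 10}  # hypothetical maximum column length
good = {"name": "Acme", "last_update_date": datetime(2009, 1, 5)}
late = {"name": "Acme", "last_update_date": datetime(2010, 1, 1)}
long_name = {"name": "A very long company name",
             "last_update_date": datetime(2009, 1, 5)}

print(reject_reason(good, limits, now))       # None
print(reject_reason(late, limits, now))
print(reject_reason(long_name, limits, now))
```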

relationship

In Hierarchy Manager, describes the affiliation between two specific entities. Hierarchy
Manager relationships are defined by specifying the relationship type, hierarchy type,
attributes of the relationship, and dates for when the relationship is active. See
relationship type, hierarchy.

relationship base object

A relationship base object is a base object used to store information about Hierarchy
Manager relationships.

relationship type

Describes general classes of relationships. The relationship type defines:


• the types of entities that a relationship of this type can include
• the direction of the relationship (if any)
• how the relationship is displayed in the Hub Console

See relationship, hierarchy.

repository

An Operational Record Store (ORS). The ORS stores metadata about its own schema
and related property settings. In Metadata Manager, when copying metadata between
repositories, there is always a source repository that contains the design object to copy, and
a target repository that is the destination for the design object. See also Metadata Manager,
validation process, import process, promotion process, export process, change list.

request

Siperian Hub request (API) that allows external applications to access specific Siperian
Hub functionality using the Services Integration Framework (SIF), a request/response
API model.

resource

Any Siperian Hub object that is used in your Siperian Hub implementation. Certain
resources can be configured as secure resources: base objects, dependent objects,
mappings, packages, remote packages, cleanse functions, HM profiles, the audit table,
and the users table. In addition, you can configure secure resources that are accessible
by SIF operations, including content metadata, match rule sets, metadata, batch groups,
the audit table, and the users table. See private resource, secure resource, resource
group.

resource group

A collection of secure resources that simplifies privilege assignment, allowing you to
assign privileges to multiple resources at once, such as by assigning a resource group
to a role. See resource, privilege.

Resource Kit

The Siperian Hub Resource Kit is a set of utilities, examples, and libraries that provide
examples of Siperian Hub functionality that can be expanded on and implemented.

RISL decay

Rapid Initial Slow Later decay puts most of the decrease at the beginning of the decay
period. The trust level follows a concave parabolic curve. If a source system has this
decay type, a new value from the system will probably be trusted but this value will
soon become much more likely to be overridden.

role

Defines a set of privileges to access secure Siperian Hub resources. See user, user
group, privilege.

row

See record.

rule

See match rule.

rule set

See match rule set.

rule set filtering

Ability to exclude records from being processed by a match rule set. For example, if
you had an Organization base object that contained multiple types of organizations
(customers, vendors, prospects, partners, and so on), you could define a match rule set
that selectively processed only vendors. See match process.

schema

The data model that is used in a customer’s Siperian Hub implementation. Siperian
Hub does not impose or require any particular schema. The schema is independent of
the source systems.

Schema Manager

The Schema Manager is a design-time component in the Hub Console used to define
the schema, as well as define the staging and landing tables. The Schema Manager is
also used to define rules for match and merge, validation, and message queues.

Schema Viewer Tool

The Schema Viewer tool is a design-time component in the Hub Console used to
visualize the schema configured for your Siperian Hub implementation. The Schema
Viewer is particularly helpful for visualizing a complex schema.

search levels

Defines how stringently Siperian Hub searches for matches: narrow, typical, exhaustive,
or extreme. The goal is to find the optimal number of matches for your data—not too
few (undermatching), which misses significant matches, or too many (overmatching),
which generates too many matches, including insignificant ones. See overmatching,
undermatching.

secure resource

A protected Siperian Hub resource that is exposed to the Roles tool, allowing the
resource to be added to roles with specific privileges. When a user account is assigned
to a specific role, then that user account is authorized to access the secure resources via
SIF according to the privileges associated with that role. In order for external
applications to access a Siperian Hub resource using SIF operations, that resource must
be configured as SECURE. Because all Siperian Hub resources are PRIVATE by
default, you must explicitly make a resource SECURE after the resource has been
added. See also private resource, resource.

Status Setting  Description
SECURE          Exposes this Siperian Hub resource to the Roles tool, allowing the
                resource to be added to roles with specific privileges.
PRIVATE         Default. Hides this Siperian Hub resource from the Roles tool and prevents
                its access via Services Integration Framework (SIF) operations. When you
                add a new resource in Hub Console (such as a new base object), it is
                designated a PRIVATE resource by default.

security

The ability to protect information privacy, confidentiality, and data integrity by
guarding against unauthorized access to, or tampering with, data and other resources in
your Siperian Hub implementation. See also authentication, authorization, privilege,
resource.

Security Access Manager (SAM)

Security Access Manager (SAM) is Siperian’s comprehensive security framework for
protecting Siperian Hub resources from unauthorized access. At run-time, SAM
enforces your organization’s security policy decisions for your Siperian Hub
implementation, handling user authentication and access authorization according to
your security configuration.

Security Access Manager workbench

Includes tools for managing users, groups, resources, and roles.

security provider

A third-party application that provides security services (authentication, authorization,
and user profile services) for users accessing Siperian Hub.

security payload

Raw binary data supplied to a Siperian Hub operation request that can contain
supplemental data required for further authentication and/or authorization.

segment matching

Way of limiting match rules to specific subsets of data. For example, you could define
different match rules for customers in different countries by using segment matching
to limit certain rules to specific country codes. Segment matching is configured on a
per-rule basis and applies to both exact-match and fuzzy-match base objects.

Services Integration Framework (SIF)

The part of Siperian Hub that interfaces with client programs. Logically, it serves as a
middle tier in the client/server model. It enables you to implement the
request/response interactions using any of the following architectural variations:
• Loosely coupled Web services using the SOAP protocol.
• Tightly coupled Java remote procedure calls based on Enterprise JavaBeans (EJBs)
or XML.
• Asynchronous Java Message Service (JMS)-based messages.
• XML documents going back and forth via Hypertext Transfer Protocol (HTTP).

SIRL decay

Slow Initial Rapid Later decay puts most of the decrease at the end of the decay period.
The trust level follows a convex parabolic curve. If a source system has this decay type,
it will be relatively unlikely for any other system to override the value that it sets until
the value is near the end of its decay period.
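
The difference between the RISL and SIRL decay types can be illustrated with two parabolic curves. The formulas below are assumptions chosen only to reproduce the described shapes (most of the decrease early for RISL, most of it late for SIRL); they are not taken from the Siperian Hub implementation, and the function names are hypothetical.

```python
def risl_trust(max_trust, min_trust, decay_period, age):
    """Rapid Initial Slow Later: steep drop early, flattening near the end."""
    t = min(max(age / decay_period, 0.0), 1.0)  # fraction of period elapsed
    return min_trust + (max_trust - min_trust) * (1.0 - t) ** 2


def sirl_trust(max_trust, min_trust, decay_period, age):
    """Slow Initial Rapid Later: nearly flat early, steep drop near the end."""
    t = min(max(age / decay_period, 0.0), 1.0)  # fraction of period elapsed
    return max_trust - (max_trust - min_trust) * t ** 2


# Halfway through a 100-day decay period (maximum trust 90, minimum trust 10):
print(risl_trust(90, 10, 100, 50))  # 30.0
print(sirl_trust(90, 10, 100, 50))  # 70.0
```

Under these assumed formulas, a RISL value has already lost three quarters of its decay range at the halfway point, while a SIRL value has lost only one quarter.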

soft delete

A base object or an XREF record is marked as deleted in a user attribute or in the
HUB_STATE_IND. See hard delete.

source record

A raw record from a source system. See also record, source system.

source system

An external system that provides data to Siperian Hub. See distinct source system,
source record.

stage process

Process of reading the data from the landing table, performing any configured
cleansing, and moving the cleansed data into the corresponding staging table. If you
enable delta detection, Siperian Hub only processes new or changed records. See
staging table, landing table.

staging table

A table where cleansed data is temporarily stored before being loaded into base objects
and dependent objects via load jobs. See stage process, load process.

state-enabled base object

A base object for which state management is enabled.

state management

The process for managing the system state of base object and XREF records to affect
the processing logic throughout the MRM data flow. You can assign a system state to
base object and XREF records at various stages of the data flow using the Hub tools
that work with records. In addition, you can use the various Hub tools for managing
your schema to enable state management for a base object, or to set user permissions
for controlling who can change the state of a record.

State management is limited to the following states: ACTIVE, PENDING, and
DELETED.

state transition rules

Rules that determine whether and when a record can change from one state to another.
State transition rules differ for base object and cross-reference records.

stored procedure

A named set of Structured Query Language (SQL) statements that are compiled and
stored on the database server. Siperian Hub batch jobs are encoded in stored
procedures so that they can be run using job execution scripts in job scheduling
software (such as Tivoli or CA Unicenter).

stripping

Deprecated term. See tokenizing.

strip table

Deprecated term. See match key table.

system column

A column in a table that Siperian Hub automatically creates and maintains. System
columns contain metadata. Common system columns for a base object include
ROWID_OBJECT, CONSOLIDATION_IND, and LAST_UPDATE_DATE. See
column, user-defined column.

system state

Describes how base object records are supported by Siperian Hub. The following states
are supported: ACTIVE, PENDING, and DELETED. See state management.

Systems and Trust Tool

The Systems and Trust tool is a design-time tool used to name the source systems that can
provide data for consolidation in Siperian Hub. You use this tool to define the trust
settings associated with each source system for each trust-enabled column in a base
object.

table

In a database, a collection of data that is organized in rows (records) and columns.


A table can be seen as a two-dimensional set of values corresponding to an object.
The columns of a table represent characteristics of the object, and the rows represent
instances of the object. In the Hub Store, the Master Database and each Operational
Record Store (ORS) represent a collection of tables. Base objects and dependent
objects are stored as tables in an ORS.

target database

In the Hub Console, the Master Database or an Operational Record Store (ORS) that
is the target of the current tool. Tools that manage data stored in the Master Database,
such as the Users tool, require that your target database is the Master Database. Tools
that manage data stored in an ORS require that you specify which ORS to

tokenizing

Specialized form of data standardization that is performed before the match
comparisons are done. For the most basic match types, tokenizing simply removes
“noise” characters like spaces and punctuation. The more complex match types result
in the generation of sophisticated match codes—strings of characters representing the
contents of the data to be compared—based on the degree of similarity required. See
also match key table, match token.
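
For the most basic case described above, removing noise characters can be sketched as follows. This Python example is a simplified illustration: the function basic_token is hypothetical, and it does not attempt to reproduce the match codes that Siperian Hub generates for the more complex match types.

```python
import string


def basic_token(value):
    """Remove spaces and punctuation ("noise" characters) from a value."""
    noise = set(string.punctuation) | set(string.whitespace)
    return "".join(ch for ch in value if ch not in noise)


print(basic_token("O'Brien & Sons, Inc."))  # OBrienSonsInc
# Two differently punctuated spellings reduce to the same string.
print(basic_token("OBrien and Sons Inc") == basic_token("O'Brien and Sons, Inc."))  # True
```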

token table

Deprecated term. See match key table.

traceability

The maintenance of data so that you can determine which systems—and which records
from those systems—contributed to consolidated records.

transactional data

Represents the actions performed by an application, typically captured or generated by
an application as part of its normal operation. It is usually maintained by only one
system of record, and tends to be accurate and reliable in that context. For example,
your bank probably has only one application for managing transactional data resulting
from withdrawals, deposits, and transfers made on your checking account.

tree unmerge

Unmerge a tree of merged base object records as an intact sub-structure. A sub-tree
having unmerged base object records as root will come out from the original merge
tree structure. (For example, merge a1 and a2 into a, then merge b1 and b2 into b, and
then finally merge a and b into c. If you then perform a tree unmerge on a, and then
unmerge a from a1, a2 is a sub tree and will come out from the original tree c. As a
result, a is the root of the tree after the unmerge.)

See also: cascade unmerge, linear unmerge.

trust

Mechanism for measuring the confidence factor associated with each cell based on its
source system, change history, and other business rules. Trust takes into account the
age of data, how much its reliability has decayed over time, and the validity of the data.

trust level

For a source system that provides records to Siperian Hub, a number between 0 and
100 that assigns a level of confidence and reliability to that source system, relative to
other source systems. The trust level has meaning only when compared with the trust
level of another source system.

trust score

The current level of confidence in a given record. During load jobs, Siperian Hub
calculates the trust score for each record. If validation rules are defined for the base
object, then the Load job applies these validation rules to the data, which might further
downgrade trust scores. During the consolidation process, when two records are
candidates for merge or link, the values in the record with the higher trust score win.
Data stewards can manually override trust scores in the Merge Manager tool.
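
The idea that the value with the higher trust score wins can be sketched as a cell-by-cell comparison. This Python example is illustrative only: the function merge_by_trust, the record layout, and the sample scores are hypothetical, and the sketch ignores validation rules and manual overrides.

```python
def merge_by_trust(record_a, record_b):
    """Hypothetical merge: each record maps column -> (value, trust_score),
    and the value with the higher trust score survives for each column."""
    merged = {}
    for column in record_a.keys() | record_b.keys():
        candidates = [r[column] for r in (record_a, record_b) if column in r]
        merged[column] = max(candidates, key=lambda cell: cell[1])[0]
    return merged


crm = {"phone": ("555-0100", 80.0), "email": ("a@example.com", 40.0)}
erp = {"phone": ("555-0199", 60.0), "email": ("b@example.com", 75.0)}
print(sorted(merge_by_trust(crm, erp).items()))
# [('email', 'b@example.com'), ('phone', '555-0100')]
```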

undermatching

For fuzzy-match base objects only, a match that results in too few matches, which
misses relevant matches. When configuring match, the goal is to find the optimal
number of matches for your data. See overmatching.

unmerge

Process of unmerging previously-merged records. For merge-style base objects only.
See manual unmerge.

user

An individual (person or application) who can access Siperian Hub resources. Users are
represented in Siperian Hub by user accounts, which are defined in the Master Database.
See user group, Master Database.

user-defined column

Any column in a table that is not a system column. User-defined columns are added in
the Schema Manager and usually contain business data. See column, system column.

user exit

An unencrypted stored procedure that includes a set of fixed, pre-defined parameters.
The procedure is configured, on a per-base object basis, to execute at a specific point
during a Siperian batch process run.

Developers can extend Siperian Hub batch processes by adding custom code to the
appropriate user exit procedure for pre- and post-batch job processing. See stored
procedure.

user group

A logical collection of user accounts. See user.

user object

User-defined functions or procedures that are registered with the Siperian Hub to
extend its functionality. There are four types of user objects:

User Object                    Description
User Exits                     A user-customized, unencrypted stored procedure that
                               includes a set of fixed, pre-defined parameters. The procedure
                               is configured, on a per-base object basis, to execute at a
                               specific point during a Siperian batch process run.
Custom Stored Procedures       Stored procedures that are registered in table
                               C_REPOS_TABLE_OBJECT and can be invoked from Batch Manager.
Custom Java Cleanse Functions  Java cleanse functions that supplement the standard cleanse
                               libraries with customer logic. These functions are basically
                               JAR files and are stored as BLOBs in the database.
Custom Button Functions        Custom UI functions that supply additional icons and logic in
                               Data Manager, Merge Manager, and Hierarchy Manager.

Utilities workbench

Includes tools for auditing application events, configuring and running batch groups,
and generating the SIF APIs.

validation process

Process of verifying the completeness and integrity of the metadata that describes a
repository. The validation process compares the logical model of a repository with its
physical schema. If any issues arise, the Metadata Manager generates a list of issues
requiring attention. See also Metadata Manager.

validation rule

Rule that tells Siperian Hub the condition under which a data value is not valid. When
data meets the criteria specified by the validation rule, the trust value for that data is
downgraded by the percentage specified in the validation rule. If the Reserve Minimum
Trust flag is set for the column, then the trust cannot be downgraded below the
column’s minimum trust.
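
The downgrade arithmetic can be illustrated with a short Python sketch. It assumes that the downgrade percentage is applied to the current trust value; that interpretation, the function name downgrade_trust, and the sample numbers are assumptions made for illustration only.

```python
def downgrade_trust(trust, downgrade_pct, min_trust, reserve_minimum_trust):
    """Apply a validation-rule downgrade as a percentage of the current trust
    (assumed), optionally refusing to go below the column's minimum trust."""
    downgraded = trust * (1 - downgrade_pct / 100)
    if reserve_minimum_trust:
        downgraded = max(downgraded, min_trust)
    return downgraded


# A 75% downgrade of a trust value of 80, with a column minimum trust of 30:
print(downgrade_trust(80, 75, 30.0, reserve_minimum_trust=False))  # 20.0
print(downgrade_trust(80, 75, 30.0, reserve_minimum_trust=True))   # 30.0
```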

workbench

In the Hub Console, a mechanism for grouping similar tools. A workbench is a logical
collection of related tools. For example, the Cleanse workbench contains
cleanse-related tools: Cleanse Match Server, Cleanse Functions, and Mappings.

write lock

In the Hub Console, a lock that is required in order to make changes to the underlying
schema. All non-data steward tools (except the ORS security tools) are in read-only
mode unless you acquire a write lock. Write locks allow multiple, concurrent users to
make changes to the schema. See exclusive lock.

Index
A
Accept Non-Matched Records As Unique jobs 715, 760
ACTIVE system state, about 206
Address match purpose 553
Address_Part1 key type 521
Admin source system
    about the Admin source system 349
    renaming 353
allow null foreign key 371
allow null update 370
ANSI Code Page 946
asynchronous batch jobs 755
audience xxv
Audit Manager
    about the Audit Manager 921
    starting 922
    types of items to audit 923
audit trails, configuring 399
auditing
    about integration auditing 920
    API requests 926
    audit log 930
    audit log table 931
    Audit Manager tool 921
    authentication and 921
    configurable settings 924
    enabling 921
    errors 929
    events 920
    log entries, examples of 934
    message queues 928
    password changes 922
    purging the audit log 935
    systems to audit 923
    viewing the audit log 933
    XML 921
authentication
    about authentication 832
    external authentication providers 833
    external directory authentication 833
    internal authentication 833
authorization
    about authorization 833
    external authorization 833
    internal authorization 833
Auto Match and Merge jobs 716, 762
Autolink jobs 715, 762
Automerge jobs 717, 764
Auto Match and Merge jobs 718

B
base object style 106
base objects
    adding columns 90
    converting to entity base objects 244
    creating 107
    defined 995
    defining 94
    deleting 116
    described 83
    editing 108
    exact match base objects 320
    fuzzy match base objects 320
    history table 101
    impact analysis 115
    load inserts 307
    load updates 309
    overview of 94
    record survivorship, state management 211
    relationship base objects 498
    reserved suffixes 88
    reverting from relationship base objects 264
    style 106
    system columns 95
batch groups
    about batch groups 798
    adding 691
    cmxbg.execute_batchgroup stored procedure 799
    cmxbg.get_batchgroup_status stored procedure 803
    cmxbg.reset_batchgroup stored procedure 802
    deleting 693
    editing 693
    executing 701
    executing with stored procedures 798
    levels, configuring 694
    stored procedures for 799
batch jobs
    about batch jobs 668
    Accept Non-Matched Records As Unique 715, 760
    asynchronous execution 755
    Auto Match and Merge jobs 716, 762
    Autolink jobs 715, 762
    automatically-created batch jobs 672
    Automerge jobs 717, 764
    BVT Snapshot jobs 719
    C_REPOS_JOB_CONTROL table 757
    C_REPOS_JOB_METRIC table 757
    C_REPOS_JOB_METRIC_TYPE table 757
    C_REPOS_JOB_STATUS_TYPEC table 757
    C_REPOS_TABLE_OBJECT_V table 754
    clearing history 687
    command buttons 680
    configurable options 679
    configuring 667
    design considerations 671
    executing 680
    executing, about 750
    execution scripts 750
    External Match jobs 719, 766
    foreign key relationships and 671
    Generate Match Token jobs 767
    Generate Match Tokens jobs 725
    Hub Delete jobs 726
    job execution logs 682
    job execution status 682
    Key Match jobs 727, 773
    Load jobs 727, 775
    Manual Link jobs 732
    Manual Merge jobs 732
    Manual Unlink jobs 733
    Manual Unmerge jobs 733
    Match Analyze jobs 738, 785
    Match for Duplicate Data jobs 740, 786
    Match jobs 734, 783
    Migrate Link Style to Merge Style jobs 740
    Multi Merge jobs 741
    Promote jobs 741, 790
    properties of 678
    refreshing the status 681
    rejected records 685
    Reset Links jobs 744
    Reset Match Table jobs 744
    results monitoring 755
    Revalidate jobs 745, 794
    running manually 677
    scheduling 749
    selecting 677
    sequencing batch jobs 670
    setting job status to incomplete 681
    status, setting 681, 708
    supporting tables 669
    Synchronize jobs 467, 747, 796
    Unmerge jobs 779
    when changes occur 673
Batch Viewer tool
    about 674
    starting 674
best version of the truth 340
best version of the truth (BVT) 92, 343
build match groups (BMGs) 327, 822
BVT Snapshot jobs 719

C
C_REPOS_AUDIT table 931
C_REPOS_JOB_CONTROL table 757
C_REPOS_JOB_METRIC table 757
C_REPOS_JOB_METRIC_TYPE table 757
C_REPOS_JOB_STATUS_TYPEC table 757
C_REPOS_SYSTEM table 99, 349
C_REPOS_TABLE_OBJECT_V
    about 751
C_REPOS_TABLE_OBJECT_V table 751, 754
cascade delete, about 769
cascade unmerge 597, 779
cell update 369
cleanse functions
    about cleanse functions 414
    aggregation 383
    availability of 415
    Cleanse Functions tool 415
    cleanse lists 440
    conditional execution components 438
    configuration overview 417
    constants 433
    decomposition 382
    function modes 432
    graph functions 424
    inputs 434
    Java libraries 419
    libraries 415, 418
    logging 432
    mappings 381
    outputs 435
    properties of 416
    regular expression functions 422
    secure resources 415
    testing 437
    types 416
    types of 414
    user libraries 418
    using 414
    workspace buttons 433
    workspace commands 432
Cleanse Functions tool
    starting 415
    workspace buttons 433
    workspace commands 432
cleanse lists
    about cleanse lists 440
    adding 441
    editing 442
    exact match 445
    match output strings, importing 449
    match strings, importing 445
    regular expression match 445
    SQL match 445
    string matches 445
Cleanse Match Servers
    about Cleanse Match Servers 407
    adding 411
    batch jobs 408
    Cleanse Match Server tool 409
    cleanse requests 408
    configuring 407
    deleting 413
    distributed 408
    editing 412
    modes 407
    on-line operations 408
    properties of 410
    testing 413
cleansing data
    about cleansing data 406
    setup tasks 406
    Unicode settings 945
clearing history of batch jobs 687
cmxbg.execute_batchgroup 799
cmxbg.get_batchgroup_status 803
cmxbg.reset_batchgroup 802
cmxue package
    user exits for Oracle databases 956
CMXUT.CLEAN_TABLE
    removing BO data 810
color choices window 267
columns
    adding to tables 125
    data types 126
    properties of 127
    reserved names 89
command buttons 43
complete tokenize ratio 102
concepts 3
conditional execution components
    about conditional execution components 438
    adding 439
    when to use 438
conditional mapping 394
configuration requirements
    User Object Registry, for custom code 910
Configuration workbench 48, 990
consolidated record 92 adding 986
consolidation appearance of 980
best version of the truth 340 clicking 979
indicator 999 custom functions, writing 981
state 999 deploying 986
consolidation indicator examples of 982
about the consolidation indicator 289 icons 985
sequence 290 listing 986
values 289 properties file 987
consolidation process text labels 985
data flow 337 type change 987
managing 341 updating 987
options 339 custom functions
overview 335 client-based 982
CONSOLIDATION_IND column 95 deleting 987
constants 433 server-based 982
constraints, allowing to be disabled 103 writing 981
Contact match purpose 556 custom indexes
control tables 457 about custom indexes 111
Corporate_Entity match purpose 557 adding 112
CREATE_DATE column 95, 121, 366 deleting 114
CREATOR column 95, 121, 366 editing 114
cross-reference tables navigating to the node 112
about cross-reference tables 97 custom Java cleanse functions
columns 99 viewing 915
defined 84, 1001 custom java cleanse functions
described 84 viewing and registering 915
history table 101 custom queries
relationship to base objects 98 about custom queries 190
ROWID_XREF 99 adding 190
custom button functions deleting 193
registering 916 editing 193
viewing 917 custom stored procedures
custom buttons about 913
about custom buttons 978 about custom stored procedures 806

example code 811 decay periods 1003
index, registering 809 decay types
parameters of 807 defined 1003
registering 808 linear 460
viewing 914 RISL 460
SIRL 460
DELETED system state, about 207
D DELETED_BY column 95, 99, 365
data cleansing DELETED_DATE column 96, 99, 365
about data cleansing 406 DELETED_IND column 95, 99, 365
Cleanse Match Servers 407 delta detection
defined 1002 configuring 401
Data Manager 1002 considerations for using 403
Data Steward workbench 50 defined 1003
data stewards how handled 403
tools for 50 landing table configuration 357
data types 126 DEP_PKEY_SRC_OBJECT column 120
Database Debug Log DEP_ROWID_SYSTEM column 120
writing messages to 810 dependent objects
database object name, constraints 88 creating 119, 121
databases defined 1004
database ID 70 deleting 125
selecting 23 described 83, 119
target database 21 editing 123
Unicode, configuring 940 load inserts 308
user access 875 load updates 312
Databases tool 61 system columns 120
about the Databases tool 60 DIRTY_IND column 96
starting 61 display packages 197
datasources distinct mapping 393
about datasources 77 distinct source systems 596
creating 77 Division match purpose 555
JDBC datasources 77 documentation xxviii
removing 78 duplicate data
decay curve 459 eliminating 999

match for 326 exclusive write lock
duplicate match threshold 103 acquiring 30
execution scripts 750
exhaustive search level 535
E extended key widths 522
encrypting passwords 74 external application users 867, 1006
entities External Match jobs 766
about 240 about External Match jobs 719
display options 250 input table 720
entity base objects output table 722
about 240 running 724
adding 242 external match tables
converting from base objects 244 system columns 721
reverting to base objects 251 extreme search level 535
entity icons
adding 239
deleting 240 F
editing 239 Family match purpose 551
uploading default 237 Fields match purpose 557
entity icons, configuring 238 filters
entity objects about filters 511
about 240 adding 512
entity types deleting 514
about 240, 241 editing 513
adding 246 properties of 511
assigning HM packages to 275 foreign key relationship base object
deleting 250 creating 259
editing 249 foreign key relationships
errors, auditing 929 creating 140, 143
exact matches defined 143
exact match / search strategy 544 lookups 376
exact match base objects 320 supported 671
exact match columns 515, 563 foreign keys, defined 1007
exact match strategy 493 foreign-key relationships
exclusive locks 28 about foreign-key relationships 140

adding 143 H
deleting 147
hierarchies
editing 145
about 253
virtual relationships 144
adding 253
fuzzy matches
configuring 253
fuzzy match / search strategy 544
deleting 254
fuzzy match base objects 320, 519
editing 254
fuzzy match columns 515
Hierarchies tool
fuzzy match strategy 493
configuration overview 225
starting 234
G Hierarchy Manager
configuration overview 225
GBID columns 129
entity icons, uploading 237
Generate Match Tokens jobs 725, 767
prerequisites 224
Generate Match Tokens on Load 730
repository base object tables 235
generating match tokens on load 104
sandboxes 281
global
upgrading from previous versions 237
password policy 877
highest reserved key 369
roles 1008
history
Global Identifier (GBID) columns 129
enabling 102
glossary 993
history tables
graph functions
base object history tables 101
about graph functions 424
cross-reference history tables 101
adding 425
defined 84, 1009
adding functions to 427
enabling 108
conditional execution components 438
HM packages
inputs 425
about HM packages 269
outputs 425 adding 270
group execution logs assigning to entity types 275
status values 705
configuring 269
viewing 706 deleting 275
editing 275
Household match purpose 550
Hub
installation details 47

tools 48 I
version information 47
immutable rowid object 594
Hub Console
importing table column definitions 135
about the Hub Console 18
IN_OVERRIDE_HISTORY_IND 771
accessing 19
IN_PURGE_HISTORY_IND 772
customizing the interface 45
incremental loads 302
login 20
index
organization of 24
custom, registering 809
Processes view 26
Individual match purpose 549
Quick Launch tab 46
initial data loads (IDLs) 302
target database
inputs 434
connecting to 23
Insight Manager
selecting 21
overview 1011
toolbar 46
integration auditing 920
window sizes and positions 46
intended audience xxv
wizard welcome screens 46
interaction ID column, about 208
Hub Delete jobs 726
intertable matching
history tables, impact on 770
described 564
IN_OVERRIDE_HISTORY_IND 771
inter-table paths 498
IN_PURGE_HISTORY_IND 772
intra-table paths 502
records on hold (CONSOLIDATION_IND=9), impact on 770
stored procedure, about 769 J
Hub Store 83
creating databases in 58 JAR files
databases 56 ORS-specific, downloading 822
Master Database 56 Java archive (JAR) files
Operational Record Store (ORS) 56 tools.jar 827, 820
properties of 70 Java compilers 820, 827
schema 82 JDBC data sources
table types 83 security, configuring 880
HUB_STATE_IND column 366 JMS Event Schema Manager
HUB_STATE_IND column, about 207 about 824
auto-searching for out-of-sync objects 829

finding out-of-sync objects 828 LAST_ROWID_SYSTEM column 96
starting 825 LAST_UPDATE_DATE column 95, 121, 356, 366
JMS Event Schema Manager tool
about 824 limited key widths 522
linear decay 460, 1013
linear unmerge 780
K Load jobs 727, 775
Key Match jobs 727, 773 forced updates, about 730
key types 521 Generate Match Tokens on Load 730
key widths 522 load batch size 103
rejected records 685
rules for running 729
L load process
land process data flow 300
C_REPOS_SYSTEM table 349 load inserts 306
configuration tasks 348 load updates 306
data flow 293 overview 299
external batch process 294 steps for managing data 304
extract-transform-load (ETL) tool 294 tables, associated 301
landing tables 292 loading by rowid 394
managing 294 loading data 453
overview 292 incremental loads 302
real-time processing (API calls) 294 initial data loads (IDLs) 302
source systems 292 locking
ways to populate landing tables 294 expiration 29
landing tables locks
about landing tables 355 about locks 28
adding 358 types of 28
columns 356 login
defined 83, 1013 changing 32
editing 360 entering 20
properties of 351, 357 lookups
removing 361 about lookups 376
Unicode 945 configuring 377

M child records 564
defined 1016
manual dynamic match analysis threshold setting 495
batch jobs 677
Manual Merge jobs 732 fuzzy population 494
Manual Unmerge jobs 733 Match Analyze jobs 738
merges 1014 match batch 329
Manual Link jobs 732 Match for Duplicate Data jobs 740
Manual Unlink jobs 733 match lists 1016
mapping match minutes, maximum elapsed 103
between staging and landing tables 83 match only once setting 495
defined 1014 match only previous rowid objects setting 494
diagrams 385
removing 398 match output strings, importing 449
testing 397 match pool 329
mappings match strings, importing 445
about mappings 380 match subtype 559
adding 386 match table, resetting 744
cleansed 381 match tables 734
conditional mapping 394 match tokens, generate on PUT 105
configuring 380, 389 match/search strategy 493
copying 387 maximum matches for manual consolidation 490
distinct mapping 393
editing 388 non-equal matching 560
jumping to a schema 395 NULL matching 561
loading by rowid 394 path 497
passed through 381 populations 941
properties of 386 populations for fuzzy matches 494
query parameters 392 properties
Mappings tool 384, 746 about match properties 490
Master Database 56 setting 488, 505
creating 58 segment matching 562
password, changing 72 strategy
match exact matches 493
accept all unmatched rows as unique 492 fuzzy matches 493
check for missing children 508

string matches in cleanse lists 445 data flow 319
types in match columns 1018 exact match base objects 320
Match Analyze jobs 785 execution sequence 329
match column rules fuzzy match base objects 320
adding 565 managing 333
deleting 572 match key table 321
editing 570 match pairs 331
match columns 1016 match rules 320
about match columns 515 match table 331
exact match base objects 527 match tables 321
exact match columns 515 overview 317
fuzzy match base objects 519 populations 326
fuzzy match columns 515 support tables 321
key widths 522 transitive matches 327
match key types 521 match purposes
missing children 507 field types 517
Match for Duplicate Data jobs 786 match rule sets
Match jobs 734, 783 about match rule sets 531
state-enabled BOs 735 adding 538
match key tables deleting 542
defined 84 editing 539
match link editing the name 541
Autolink jobs 715 filters 536
Manual Link jobs 732 properties of 534
Manual Unlink jobs 733 search levels 534
Migrate Link Style to Merge Style jobs 740 match rules 1017
about match rules 320
Reset Links jobs 744 accept limit 558
match paths defining 542
about match paths 497 exact match columns 563
inter-table paths 498 match / search strategy 544
intra-table paths 502 match levels 558
relationship base objects 498 match purposes
match process about 545
build match groups (BMGs) 327 Address match purpose 553

Contact match purpose 556 enabling for state changes 215
Corporate_Entity match purpose 557 message queues
Division match purpose 555 about message queues 344, 608
Family match purpose 551 adding 608, 609
Fields match purpose 557 auditing 928
Household match purpose 550 defined 1020
Individual match purpose 549 deleting 611
Organization match purpose 554 editing 611
Person_Name match purpose 548 message check interval 604
Resident match purpose 549 Message Queues tool 603
Wide_Contact match purpose 557 properties of 608
Wide_Household match purpose 552 receive batch size 604
primary key match rules receive timeout 604
about 578 rules 1020
adding 578 status of 604
editing 581, 582 message schema
properties of 542 ORS-specific, generating 823
Reset Match Table jobs 744 ORS-specific, generating and deploying 827
types of 320
match subtype 559 message triggers
matching about message triggers 612
defined 1017 adding 615
duplicate data 326 considerations for 614
maximum trust 459, 1019 deleting 622
merge editing 621
defined 1019 types of 612
Manual Merge jobs 732 messages
Manual Unmerge jobs 733 elements in 623
Merge Manager tool 336, 1019 examples
message queue servers accept as unique message 625
about message queue servers 605 AmRule message 626
adding 606 BoDelete message 628
deleting 608 BoSetToDelete message 629
editing 607 delete message 630
message queue triggers insert message 631

merge message 632, 652 Multi Merge jobs 741
merge update message 633
no action message 634
PendingInsert message 635 N
PendingUpdate message 636 narrow search level 534
PendingUpdateXref message 637 New Query Wizard 167, 191
unmerge message 639 NLS_LANG 947
update message 640 non-equal matching 560
update XREF message 641 non-exclusive locks 28
XRefDelete message 642 NULL matching 561
XRefSetToDelete message 643 null values
examples (legacy) allowing null values in a column 127
accept as unique message 646
bo delete message 647
bo set to delete message 648 O
delete message 649 object record stores (ORSs)
insert message 651 logical names 818, 824
merge update message 653 OBJECT_FUNCTION_TYPE_DESC
pending insert message 654 list of values 753
pending update message 655 Operational Record Stores (ORS)
pending update XREF message 656 about ORSs 56
unmerge message 660 assigning users to 886
update message 657 configuring 62
update XREF message 658 connection testing 71
XREF delete message 661 creating 58
XREF set to Delete message 662 editing 69
filtering 625, 645 editing registration properties 67
message fields 644 GETLIST limit (rows) 70
metadata JNDI data source name 70
synchronizing 138 password, changing 73
trust 138 registering 62
Migrate Link Style to Merge Style jobs 740 unregistering 76
minimum trust 459, 1021 Oracle databases
missing children, checking for 507 user exits located in cmxue package 956
Model workbench 49 Organization match purpose 554

Organization_Name key type 521 properties of 507
ORS-specific operations PENDING system state
using SIF Manager tool 818 enabling match 214
outputs 435 PENDING system state, about 207
Person_Name key type 521
Person_Name match purpose 548
P PKEY_SRC_OBJECT column 99, 365
Package Wizard 199 populations
packages configuring 941
about packages 196 multiple populations 943
creating 199 non-US populations 941
defined 1022 selecting 494
deleting 204 POST_LANDING
display packages 197 parameters 958
HM packages 269 user exit, using 958
join queries 204 POST_LOAD
Package Wizard 199 user exit, using 961
properties of 201 POST_MATCH
PUT-enabled packages 197 parameters 962
queries and packages 162 user exit, using 962
refreshing after query change 203 POST_MERGE
when to create 197 parameters 963
parallel degree 104 user exit, using 963
password policies POST_STAGE
global password policies 877 parameters 960
private password policies 879 user exit, using 959
passwords POST_UNMERGE
changing 32 parameters 964
encrypting 74 user exit, using 964
global password policy 877 PRE_STAGE
private passwords 880 parameters 959
path components user exit, using 959
adding 509 PRE_USER_MERGE_ASSIGNMENT
deleting 510 user exit, using 965
editing 509 preface xxv

preferred key widths 522 run-time flow 345
preserving source system keys 368 XSD file 344
primary key match rules purposes, match 545
about 578 PUT_UPDATE_MERGE_IND column 99
adding 578 PUT-enabled packages 197
deleting 582
editing 581
primary keys 1024 Q
Private password policy 880 queries
Processes view 26 about queries 162
product support xxxi adding 166
profiles columns 174
about profiles 278 conditions 178
adding 278 custom queries 190
copying 282 deleting 195
deleting 283 editing 168
editing 280 impact analysis, viewing 194
validating 280 join queries 204
Promote batch job New Query Wizard 167, 191
promoting records 218 overview of 162
Promote jobs 741 packages and queries 162
about 790 Queries tool 162, 164, 198
providers results, viewing 193
custom-added 899 sort order for results 183
providers.properties file SQL, viewing 190
example 901 tables 170
publish process Queries tool 164, 198
distribution flow 343 query groups
managing 346 about query groups 164
message queues 344 adding 165
message triggers 343 deleting 166
optional 343 editing 165
ORS-specific schema file 344
overview 342

R roles
about roles 854
rapid slow initial later (RISL) decay 460 adding 857
raw tables 1026 assigning resource privileges to roles 859
regular expression functions defined 1029
about regular expression functions 422 deleting 858
adding 422 editing 858
reject tables 383, 1027 global roles 1008
relationship base objects 498 Roles tool 855
about 255 row locking, enabling during batch 105
converting to 261 ROWID_OBJECT column 95, 99, 120, 365
creating 256 ROWID_SYSTEM column 99
reverting to base objects 264 ROWID_XREF column 99, 120
relationship types
about 256
adding 265 S
deleting 268
sandboxes 281
editing 268
schema
relationships
ORS-specific, generating 818
about relationships 255
Schema Manager
foreign key relationships 140
adding columns to tables 125
repository base object (RBO) tables 235
base objects 92
requeue on parent merge 104
dependent objects 117
Reset Links jobs 744
filtering items 35
Reset Match Table jobs 744
foreign key relationships 140
Resident match purpose 549
searching for items 40
resource groups
show public Siperian system tables 39
adding 850
sorting display names 35
deleting 853
starting 90
editing 852
schema match columns 744
Resource Kit
schema objects 87
definition 1029
schema trust columns 747
resource privileges, assigning to roles 859
Schema Viewer
Revalidate jobs 745, 794
column names 156
RISL decay 1029
command buttons 150

context menu 155 uploading 893
Diagram pane 149 security provider, defined 1032
hierarchic view 153 Security Providers tool
options 156 about security providers 889
orientation 156 provider files 892
orthogonal view 154 starting 890
Overview pane 149 segment matching 562
panes 149 sequencing batch jobs 670
printing 158 SIF API
saving as JPG 157 ORS-specific, generating 818
starting 148 ORS-specific, removing 823
toggling views 154 ORS-specific, renaming 821
zooming all 152 SIF Manager
zooming in 150 generating ORS-specific APIs 818
zooming out 151 out-of-sync objects, finding 823
schemas SIF Manager tool
about schemas 82 about 818
search levels for match rule sets 534 SIRL decay 1032
secure resources 1031 source systems
security about source systems 348
authentication 832 adding 352
authorization 833 Admin source system 349
concepts 832 defined 1033
configuring 831 defining 348
defined 1031 distinct source systems 596
JDBC data sources, configuring 880 highest reserved key 369
roles 854 immutable source systems 594
tools 50 preserving keys 368
Security Access Manager (SAM) 832 removing 354
Security Access Manager workbench 50 renaming 353
security provider files system repository table (C_REPOS_SYSTEM) 349
about security provider files 892
deleting 895 Systems and Trust tool, starting 350
list of provider files 892 SRC_LUD column 99
selecting 893 SRC_ROWID column 366

Stage jobs 745, 795 enabling 211
lookups 376 enabling match on pending records 214
rejected records 685 history of XREF promotion, enabling 213
Stage process HUB_STATE_IND column 207
user exits 957 interaction ID column 208
stage process Load jobs 727
data flow 296 Match jobs 735
managing 298 message queue triggers, enabling 215
overview 295 modifying record states 216
tables, associated 297 Promote batch job 218
stages 1033 promoting records 216
staging data rules for loading data 221
prerequisites 364 state transition rules, about 208
setup tasks 364 state-enabled base object
staging tables defined 1033
about staging tables 364 stored procedures
adding 371 batch groups 799
allow null foreign key 371 batch jobs, list 758
allow null update 370 C_REPOS_TABLE_OBJECT_V, about 751
cell update 369
column properties 370 custom stored procedures 806
columns 130, 365 executing batch groups 798
columns, creating 130 Hub Delete jobs 769
defined 83, 1033 OBJECT_FUNCTION_TYPE_DESC 753
editing 374
highest reserved key 369 removing BO data 810
jumping to source system 376 support xxxi
lookups 376 survivorship 291
preserve source system keys 368 Synchronize jobs 467, 747, 796
properties of 367 synchronizing metadata 138
removing 380 system columns
standard key widths 522 base objects 95
state management dependent objects 120
about 206 described 126, 141
base object record survivorship 211 external match tables 721

system repository table 349 reject tables 383
system states staging tables 83
about 206 supporting tables used by batch process 669
system tables, showing 39
Systems and Trust tool 350 system repository table (C_REPOS_SYSTEM) 349
target database
T changing 31
table columns selecting 21
about table columns 126 technical support xxxi
adding 134 tokenization 1035
deleting 139 tokenization process
editing 137 about the tokenization process 322
Global Identifier (GBID) columns 129 DIRTY_IND column 323
importing from another table 135 match keys 322
staging tables 130 match tokens 322
tables when to execute 322
adding columns to 125 Tool Access tool 990
base objects 83 tools
C_REPOS_AUDIT table 931 Batch Viewer tool 674
C_REPOS_JOB_CONTROL table 757 Cleanse Functions tool 415
C_REPOS_JOB_METRIC table 757 Data Steward tools 50
C_REPOS_JOB_METRIC_TYPE table 757 Databases tool 61
described 48
C_REPOS_JOB_STATUS_TYPE table 757 Mappings tool 746
Merge Manager tool 336
C_REPOS_TABLE_OBJECT_V table 751 Queries tool 162, 164, 198
Schema Manager 90
control tables 457 security tools 50
cross-reference tables 84 Tool Access tool 990
dependent objects 83 user access to 989
history tables 84 Users tool 868
Hub Store 83 utilities tools 51
landing tables 83 write locks 28
match key tables 84 traceability 337

training xxx unmerge child when parent unmerges 597
tree unmerge 780 Unmerge jobs 779
trust 460 cascade unmerge 779
about trust 455 linear unmerge 780
assigning trust levels 462 tree unmerge 780
calculations 456 unmerge all 780
considerations for setting 460 UPDATED_BY column 95, 121, 366
decay curve 459 user exits
decay graph types 460 about 912, 956
decay periods 455 cmxue package (Oracle) 956
defined 1036 POST_LANDING 958
defining 462 POST_LOAD 961
enabling 461 POST_MATCH 962
levels 455 POST_MERGE 963
maximum trust 459 POST_STAGE 959
minimum trust 459 POST_UNMERGE 964
properties of 459 PRE_STAGE 959
slow initial rapid later (SIRL) decay 460 PRE_USER_MERGE_ASSIGNMENT 965
synchronizing trust settings 747
Systems and Trust tool 350 Stage process 957
trust levels, defined 455 types 957
typical search level 534 viewing 913
user groups
about user groups 881
U adding 883
Unicode assigning users to 885
ANSI Code Page 946 defined 1038
cleanse settings 945 deleting 884
configuring 940 editing 883
Hub Console 945 User Object Registry
NLS_LANG 947 about 910
Unix and locale recommendations 945 configuration requirements for custom code 910
unmerge
cascade unmerge 597 custom button functions, viewing 917
manual unmerge 733

custom Java cleanse functions, viewing 915 custom validation rules 473, 477
defined 468, 1039
custom stored procedures, viewing 914 defining 468
starting 911 domain checks 473
user exits, viewing 913 downgrade percentage 474
user objects editing 480, 481
about 910 enabling columns for validation 470
users examples of 476
about users 867 execution sequence 471
adding 869 existence checks 473
assigning to Operational Record Stores (ORS) 886 pattern validation 473
properties of 473
database access 875 referential integrity 473
deleting 874 removing 482
editing 870 required columns 469
external application users 867 reserve minimum trust 474
global password policies 877 rule column properties 474
password settings 874 rule name 473
private password policies 879 rule SQL 474
properties of 868 rule types 473
supplemental information 872 state-enabled base objects 469
tool access 989 validation checks 468
types of users 867
user accounts 867
Users and Groups tool 882 W
Users tool 868 Web Services Description Language (WSDL)
utilities tools 51
Utilities workbench 51 ORS-specific APIs 821
Wide_Contact match purpose 557
Wide_Household match purpose 552
V Workbenches view 25
validation checks 468 workbenches, defined 24
validation rules write lock
about validation rules 468 acquiring 30
adding 478 releasing 30

tools that require 28 X
write locks
XSD file
exclusive locks 28
downloading 828
non-exclusive locks 28
