
Proceedings of the 2012 9th International Pipeline Conference, IPC2012, September 24-28, 2012, Calgary, Alberta, Canada

IPC2012-90177

EXPERIENCES IN ESTABLISHING TRUSTWORTHY DIGITAL REPOSITORIES WITHIN A LARGE MULTI-NATIONAL PIPELINE COMPANY
David A. Weir, Enbridge Pipelines Inc., Edmonton, Alberta, Canada
Pankaj Bhawnani, Enbridge Inc., Calgary, Alberta, Canada
Stephen Murray, Enbridge Inc., Calgary, Alberta, Canada
Douglas Rosenberg, Enbridge Pipelines Inc., Edmonton, Alberta, Canada

ABSTRACT
Traditionally, business areas within an organization have individually managed the data essential for their operation. This data may be incorporated into specialized software applications, MS Excel or MS Access files, e-mail filing, and hardcopy documents. These applications and data stores support the local business area's decision-making and add to its knowledge. There have been problems with this approach. Data, knowledge, and decisions are captured only locally within the business area, and in many cases this information is not easily identifiable or available for enterprise-wide sharing. Furthermore, individuals within the business areas often keep shadow files of data and information. The accuracy, completeness, and timeliness of the data contained within these files is often questionable. Information created and managed at a local business level can be lost when a staff member leaves his or her role. This is especially significant given ongoing changes in today's workforce. Data must be properly managed and maintained to retain its value within the organization. The development and execution of a "single version of the truth" or master data management requires a partnership between the business areas, records management, legal, and information technology groups of an organization. Master data management is expected to yield significant gains in staff effectiveness, efficiency, and productivity.

In 2011, Enbridge Pipelines applied the principles of master data management and trusted digital repositories to a widely used, geographically dispersed small database (less than 10,000 records) that had noted data shortcomings such as incomplete or incorrect data, multiple shadow files, and inconsistent usage throughout the organization of the application that stewards the data. This paper provides an overview of best practices in developing an authoritative single source of data and Enbridge's experience in applying these practices to a real-world example. Challenges of the approach used by Enbridge and lessons learned are examined and discussed.

INTRODUCTION
Although some sharing of data or information occurs through enterprise-wide or business unit managed applications or databases, traditionally business areas within a pipeline company have individually managed the data, information, and records essential for their operation. Data and information required by a business area is often incorporated into specialized software applications, Microsoft Excel or Access, Lotus Notes, other desktop software, and hardcopy documents. These applications and data or information stores support business area decision making and constitute local corporate knowledge captured within the business areas, or silos. In many cases, this corporate knowledge is neither easily identifiable nor trusted, and is not available for sharing with other business areas, such that business areas may be making decisions with incomplete or inaccurate data. Lastly, multiple data sets with differing versions of the truth may exist across business areas in the pipeline organization. As a result, the same inquiry presented to each business area may yield different results. Efforts to reconcile this disparate data into one trusted, authoritative, and consistent data set may require significant effort. Some data management or coordination efforts are expended in an attempt to tie applications and data stores together across business areas; but unless supported by long-term business area and individual staff objectives, these efforts tend to involve a short-term expenditure of effort, often leading to failure and management disillusionment with the promise of data management. Management of data is an ongoing requirement, with procedures and processes in place to ensure that it is sustained.

Transfer of data between business areas can involve manual handling of data or information, which leads to issues with completeness and accuracy. In addition, shadow files of data and information are often kept by individuals within the business areas. These shadow files can be redundant within the business area or across the organization. The accuracy, completeness, and timeliness of the data and information stored within these files is often questionable. Value-add changes and enhancements to the data and information contained within shadow files are physically separated from the original data or information source, and thus from the organization as a whole, such that decisions made using this data are often not reproducible. Managing and maintaining the unique data and information requirements of all applications, spreadsheets, other software, and occasionally documentation requires redundant (often manual) effort within the organization.

PURPOSE
In 2011, Enbridge Liquid Pipelines applied the principles of trustworthy digital repositories and master data management to a widely used, geographically dispersed small database (less than 10,000 records) that had noted data shortcomings such as incomplete and incorrect data, multiple shadow files, and inconsistent usage throughout the organization. The development and execution of these principles required a partnership between Enbridge business areas, records management, legal staff, and the information technology groups. The purpose of this paper is to describe Enbridge's initiative with a view to showing how to manage oil and gas data integrity with the same rigor used to manage pipeline integrity. It starts with a discussion of definitions, followed by a review of data quality issues and data management drivers. It then discusses data management fundamentals and challenges to provide a context for the consideration of best practices and benefits of proper data management. Future trends are noted prior to the introduction and discussion of the case study. Case study future work, lessons learned, and challenges encountered conclude the paper.

DEFINITIONS
The current data management environment is a confusing mixture of terms and definitions. It is therefore prudent to define terms before entering into discussion and debate about trustworthy or authoritative repositories and master data management.

Data, Information, and Knowledge - Data in its most basic form is numbers without context. For example, 326,000 is a number, and it is data. When context is added to a number, it becomes information. For example, throughput for Line X is 326,000 barrels per day. This information in turn is used to make decisions and is corporate knowledge.

Records - Electronic and hard copy data, information, and other documents that are retained for reference and as a means of preserving knowledge. Often records have a retention period, that is, a date after which the records have no value and can be deleted or destroyed.

Data Management - The process of managing data as a resource that is valuable to an organization or business, including the governance and stewardship overseeing the data lifecycle. Data management recognizes data as an asset that must be properly managed to retain its value. Data must also be accessible. Because data may be accessed and/or transformed many times within an organization, strong data management governance and stewardship policies must be in place to ensure that data is as accurate, up to date, and complete as possible.

Data Lifecycle - Figure 1 illustrates the data lifecycle.

[Figure 1 shows the lifecycle stages: data acquisition or creation; data access (use) or transform (value add); data archive or data delete; permanent archive, with an associated retention period.]

Figure 1 Data Management Lifecycle

Master Data - The data that has been cleansed, rationalized, and integrated into an enterprise-wide system of record for core business activities. Master data includes the key data that matters most and is used in different applications across the organization [2].

Master Data Management - The implementation of data governance policies, procedures, and infrastructure that support the capture, integration, and subsequent shared use of accurate, timely, consistent, and complete master data, as illustrated in Figure 2 [2].
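To make the Data Lifecycle and Records definitions above concrete, here is a minimal Python sketch. It is illustrative only: the record fields, stage names, and seven-year retention period are assumptions for the example, not part of any Enbridge system.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum, auto
from typing import Optional

class Stage(Enum):
    """Lifecycle stages from Figure 1."""
    ACQUIRED = auto()   # data acquisition or creation
    IN_USE = auto()     # data access (use) or transform (value add)
    ARCHIVED = auto()   # data archive or data delete
    PERMANENT = auto()  # permanent archive

@dataclass
class Record:
    record_id: str
    stage: Stage
    archived_on: Optional[date] = None
    retention: timedelta = timedelta(days=7 * 365)  # hypothetical retention period

    def disposition_due(self, today: date) -> bool:
        """Per the Records definition above: after the retention period
        lapses, an archived record can be deleted or destroyed."""
        return (self.stage is Stage.ARCHIVED
                and self.archived_on is not None
                and today >= self.archived_on + self.retention)

rec = Record("LINE-X-0001", Stage.ARCHIVED, archived_on=date(2004, 6, 1))
print(rec.disposition_due(date(2012, 9, 24)))  # True: retention has lapsed
```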


[Figure 2 depicts three essential practice areas: Governance (policies, procedures, infrastructure); Action (capture, integration, sharing); and Quality (accuracy, timeliness, conciseness, completeness).]

Figure 2 Essential Master Data Management Practices

Data Governance & Data Stewardship - Data governance is the organization and implementation of policies, procedures, structure, roles, and responsibilities that outline and enforce rules of engagement, decision rights, and accountabilities for effective management of data assets [3]. Data stewardship formalizes accountability for managing data and information resources on behalf of others and in the best interests of the organization, as outlined by the data governance.

Authoritative Data - Officially recognized data that can be certified and is provided by an authoritative source [4].

Authoritative Source - An entity that is authorized by an executive or legal authority to develop or manage data for a specific business purpose. The data this entity creates is the authoritative data. An entity can be a person, organization, publication, or data store [4].

Data Steward - An organization within an authoritative source that is charged with the collection and maintenance of authoritative data. The term data steward is often confused with the term authoritative source [4].

Trusted Source and Trusted Data - A service provider, agency, person, data store, or publication that publishes data from a number of authoritative sources. These publications are often compilations and subsets of the data from more than one authoritative source. It is trusted because there is an official process for compiling the data from authoritative sources, and the limitations, currency, and attributes of the data are known and documented [4].

Trustworthiness - Trustworthiness means that a system does exactly what it claims to do [11]. A trustworthy digital store is defined as the authoritative source that yields accurate and complete data for shared and consistent use across different business contexts. Master data is stored in a trustworthy digital repository.

DEFINITION OF DATA
For the purposes of this paper, data includes all the data, information, knowledge, documents, and records contained in databases, e-mails, presentations, spreadsheets, video and photographs, recorded voice and voice messages, drawings, filing cabinets, etc. that are essential for day to day and continued operation, irrespective of electronic or physical format. These are widely distributed in various business areas throughout an organization.

DATA MANAGEMENT DRIVERS
A number of forces, both internal and external to a pipeline company, are driving the need to change the way that the organization manages and maintains its electronic and hardcopy data, information, and records. This includes the establishment of trustworthy business data. Business indicators (clues) that suggest the need for establishing trustworthy business data include [7]:
- Federated business units and autonomous operating divisions that run separate business applications that can generate data in different ways, formats, and measurement units.
- A number of mergers and acquisitions, or new business development, that brings disparate data into corporate information systems.
- Changes in the best mix of technology tools, creating data quality issues when migrating data from old to new systems.
- Complex reporting requiring significant manual staff work gathering data from different sources.

An analysis of business indicators reveals seven broadly categorized drivers for establishing data management: 1) staff access to current, accurate, and complete data, information, and records; 2) staff effectiveness and retention; 3) organizational agility as a competitive advantage; 4) capture and retention of corporate knowledge; 5) management of change; 6) the ability to respond successfully to external scrutiny and audit; and 7) records management standards and support in response to stricter legal requirements related to data preservation and protection.

DATA MANAGEMENT BENEFITS
Implementation of data management and master data management in trustworthy and authoritative data stores has a number of immediate benefits, including:
1. Identified Master Data Owners and Sources - The identification of business areas responsible for the upkeep of master data reduces the time required by all staff in locating and accessing data, allowing staff to focus on value-add analysis and decision-making activities rather than expending effort on ensuring that data is current, complete, and accurate. This in turn yields gains in staff efficiency and effectiveness.
2. Development of Sustained Master Data Management - Implementation of sustained corporate-wide master data management governance and stewardship facilitates staff flexibility and movement. In addition, legal and regulatory requirements and expectations regarding complete, accurate, and up to date data can be met. In the long term, a potential reduction in the number of redundant data stores and/or IT applications or systems used by the business areas may be realized.
3. Implementation of Records Management Policies and Practices - The implementation of records management policies and practices facilitates the organization's ability to be audited, answer regulatory and legal information requests, and be prepared for potential litigation.

DATA MANAGEMENT CHALLENGES
Creating a trustworthy, single, and authoritative view of key information across the organization presents significant challenges. The greatest complexity comes from business process changes and process integration issues, as no organization remains stagnant over time. Some of the key challenges worth discussing are:
1. Ensuring organization-wide participation: Data management should be an organizational initiative and should not be driven by a single department or business unit. Single departments or units include pilot users or first adopters, but organization-wide holistic and strategic thinking is required for successful execution of data management initiatives. Given the cross-system and cross-business unit nature of key data elements, a comprehensive data management program brings together business leaders, IT, records management, legal, and operational and field staff to create data management strategies and solutions. Bringing together all these stakeholders, defining their roles and responsibilities, ensuring their sponsorship and participation, and keeping all actively involved is a daunting task.
2. Aligning business process with data management: Business units need to spend more time and effort to change their business processes and applications to align with a centralized data governance model. This may require significant investment in business process re-design, technology, training, and organizational change management.
3. Defining and implementing data governance: It is necessary to define policies to drive data management projects. A data management group often needs to be set up with adequate representation from all business units and senior management. Clear ownership should be defined for data and data management activities. This is often possible to create as a tightly run, focused project, but very difficult to sustain over a longer period of time with staff turnover, priority changes, etc., so a best practice is to always create a parallel audit program with a clear mandate.
4. Selecting the right technology: Tools purchased from vendors or developed in-house should work with multiple applications, tools, and platforms, including legacy systems, to capture, integrate, share, and synchronize important or master data periodically, at a fixed interval, or on a real-time basis. Oftentimes, it is difficult to get sufficient resources and budgetary support to achieve continued integration.
5. Controlling time and cost: Bringing all of a large organization's master data under the umbrella of a single master data environment might take a few years and millions of dollars in investment. Challenges include budget overruns and possible cancellation of the initiative due to lack of interest and poor sponsorship. As a result, the data management initiative must be properly planned, and cost and time overruns carefully monitored.
6. Following a robust, well-defined process: Ownership should be defined for each activity involving data management (for example, each piece of master data must have an identified owner). There should be a robust authorization process in place to verify any change in key identified data elements and/or their governance or stewardship.
7. Creating a culture of data management: The business community drives the data management initiative and plays a leading role, especially in the identification and standardization of important data elements and data ownership. Data management should never follow a big-bang approach. A big organization may have hundreds of key entities, reporting structures, and business rules; however, bringing all of them into master data at once is never practical. A priority list must be made and a timeline drawn for it. For example, in a banking scenario, customers may be the most important entity and should be brought into the master data management umbrella first.

BEST PRACTICES
Manage Data as an Important Asset
Data is a corporate asset. It has value, and it must be properly managed and maintained to retain its value within the organization. It must also be accessible. Successful management of data by an organization is a component of management systems and Operational Excellence. Data is owned by the business areas. Information Technology (IT) groups within organizations manage the data repositories and provide the tools and technology that enable effective business ownership, stewardship, and use of data across the organization. Data management is a matter of standards adoption, process design, and cultural change before implementation of technology. To be properly managed, data owners and data customers are identified for all data elements. Owners are responsible for executing the data management requirements outlined in a given company's data governance and stewardship policies. Customers use the data managed by the data owners in their business processes. Value added to this data may yield revisions, which must be communicated back to the data owners, or may yield new data for which the customer is then accountable.

Partner around the Management of Master Data
Master data is the sole, single source of complete, accurate, and up to date data. It is the single version of the truth. The development and execution of master data management requires a partnership between the business areas, information technology, legal, and the records management groups of the organization. The incorporation of master data management is expected to yield significant long-term gains in staff effectiveness, efficiency, and productivity through improvements to cross-functional collaboration. A key partner has become corporate legal staff, who can advise the partnership on how legal recognition of electronic records (as evidence, for example) has increased corporate responsibility to manage data in a methodical and systematic way.

Take a Standards-Based, Long-Term Perspective
Oil and gas pipeline companies require long-term data storage and access to electronic data that match the life of their pipeline assets. A trustworthy repository needs to be able to provide preservation, storage, and access to master data over the long term. ISO 14721:2003 [8] outlines a reference model that summarizes the data lifecycle requirements into a number of processes and roles that influence the extent to which data can be trusted (see Figure 3).

Figure 3 Processes that influence the extent data can be trusted [8]

Designate the Authoritative Source
For data to be trustworthy over the long term, there is a need to formally designate the data store as the authoritative source. Such a designation ensures high-priority consideration of the data store during annual deliberations on budget priorities and enterprise architectures. The working definition of authoritative source is: "A managed and verified repository of accurate and complete data that is designated by a governance body, supports a specific business need and complies with records management policy" [9]. Once a data store is designated as an authoritative source, the appropriate business manager has the enterprise authority to enforce data lifecycle requirements covering the capture, storage, access, service management, and disposition processes. The designation of an authoritative source is an important element in linking the abstract characteristics of a trustworthy repository with the processes, standards, and criteria that are enabled by enterprise architectures.

Manage Data as Authoritative and Trusted
Authoritative data comes directly from the creator of the authoritative source. It is the most current and accurate and has been vetted according to official data governance rules and policy. The accuracy and source (and usage) of the data are known and can be verified and certified by data stewards as the authoritative source. Trusted data describes a situation where data sets are published by someone other than the authoritative source. It is trusted because there is an official process for compiling data from authoritative sources, and the data's limitations, currency, and attributes are known. The trusted source is recognized by the authoritative source as an official publisher of one or several data subsets.
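The distinction between authoritative and trusted sources can be captured in a simple registry model. The sketch below is a hypothetical illustration (the class names and the is_trusted rule are ours, not an Enbridge design): a trusted source is valid only when it draws on designated authoritative sources through a documented compilation process.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AuthoritativeSource:
    """A managed, verified repository designated by a governance body."""
    name: str
    business_purpose: str
    designated_by: str  # the governance body that made the designation

@dataclass
class TrustedSource:
    """Publishes compilations or subsets of one or more authoritative sources."""
    name: str
    compiled_from: List[AuthoritativeSource] = field(default_factory=list)
    process_documented: bool = False  # is the official compilation process documented?

    def is_trusted(self) -> bool:
        # Trusted only if every input is authoritative and the process is official.
        return bool(self.compiled_from) and self.process_documented

domain_db = AuthoritativeSource("case_study_db", "pipeline domain records",
                                "governance steering committee")
rollup = TrustedSource("quarterly_rollup", [domain_db], process_documented=True)
print(rollup.is_trusted())  # True
```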


FUTURE TRENDS
In response to the growing body of health knowledge, U.S. health care specialists have chosen to develop criteria for trustworthy procedures [12]. According to network experts, the future internet should allow a user to make judgments about the trustworthiness of an information service and its likelihood of fulfilling its claims; work in this area is at an early stage [13]. Finally, criteria for trustworthy digital repositories have recently been published for space data systems, based on requirements for audit and certification of management systems [14]. Recent U.S. regulatory rulings and communications have emphasized the requirement for pipeline asset records that are traceable, verifiable, and complete [15]. Trustworthy data stores and master data management are key enablers to ensure traceable, verifiable, and complete records over the long term. Given these drivers, current academic studies suggest there is some interest in a specific standard covering all relevant aspects of a trustworthy digital repository; but at this time there are barriers to creating a specific standards document, given the level of work effort and the standards that are already available [11].

CASE STUDY
As noted in the introduction, Enbridge Liquid Pipelines applied the principles of trustworthy digital repositories and master data management to a widely used, geographically dispersed small database (less than 10,000 records). This case study describes this effort.

DATA QUALITY ISSUES
The case study was initiated based on the importance of data to the organization and the perceived fundamental and uniform distrust of the data contained within the database throughout Enbridge Pipelines. Any usage of this data for analysis, tracking, or reporting purposes was associated with extensive effort to confirm the currency, completeness, and accuracy of the data. Since each inquiry was slightly different and might involve different business areas, redundant effort was being expended to ensure that the data used in each inquiry was correct. There are seven common data quality issues identified by Redman [5] that can be alleviated with proper data management initiatives, processes, procedures, and policy:
1. People can't find the data they need: Knowledge workers spend 15%-30% of their time searching for data they need [6].
2. Incorrect data: Inaccuracies in the data contained within accessed data stores.
3. Poor data definition: Data frequently misinterpreted; data not easily connected from one department to the next.
4. Data privacy / data security: All data is subject to loss, either through misuse or maliciousness.
5. Data inconsistency across sources: Multiple data sources for the same set of data, often with differing versions of the truth.
6. Too much data: Uncontrolled redundancy in data; capture of data that has no use in the company or organization.
7. Organizational confusion: Organizations are unable to answer basic questions such as 1) how much data is created each day, and 2) of this data, what is the most important?

The day-in, day-out costs of poor data quality are enormous, up to 10-20% of revenue [5]. Of the common data quality issues noted above, five (1, 2, 3, 5, and 7) were often encountered using the case study data. These issues in turn could lead to additional costs being incurred through correction of data errors and from undoing or repeating work originally done with older non-master data.
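Several of these issues lend themselves to simple automated profiling. The sketch below is a minimal illustration, with hypothetical record layouts and field names, of how missing data (a proxy for issue 2) and inconsistency across sources (issue 5) can be surfaced:

```python
from typing import Dict, List

# Hypothetical rows keyed by a business identifier; None marks a missing value.
store_a = {"REC-001": {"status": "closed", "barrels": 326000},
           "REC-002": {"status": "open",   "barrels": None}}
store_b = {"REC-001": {"status": "open",   "barrels": 326000}}  # conflicts with store_a

def completeness(store: Dict[str, dict]) -> float:
    """Share of non-missing field values across all rows."""
    values = [v for row in store.values() for v in row.values()]
    return sum(v is not None for v in values) / len(values)

def cross_source_conflicts(a: Dict[str, dict], b: Dict[str, dict]) -> List[str]:
    """Issue 5: same key present in both stores, with differing versions of the truth."""
    return [k for k in a.keys() & b.keys() if a[k] != b[k]]

print(f"completeness(store_a) = {completeness(store_a):.0%}")  # 75%
print("conflicts:", cross_source_conflicts(store_a, store_b))  # ['REC-001']
```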
PROCESS OVERVIEW
Trustworthy, long-term preservation of data requires technical as well as organizational resources and is therefore implemented through a combination of technology, processes, standards, and criteria that are further described in this case study. The process used in the case study to develop a trustworthy digital store is illustrated in Figure 4 and described in the following sections.

[Figure 4 shows four sequential steps: A. Review Business Context and Project Scope; B. Identify Criteria for Success; C. Implement Trustworthy Repository; D. Communicate Results and Promote Awareness.]

Figure 4 Development of Trustworthy Data Store

A. REVIEW BUSINESS CONTEXT AND PROJECT SCOPE
Issues related to developing a trustworthy digital store within the framework of the larger Enbridge business strategy are addressed using the following three components:
a. Dedicated ongoing business programs and projects that take action to implement the strategy; establishing trustworthy repositories is part of a three-year organizational level records management initiative.
b. A culture and workforce that accepts strategic objectives; for example, the media attention given to significant data management events has created a general understanding of the need for quality data management.
c. Effectively functioning IT business systems with a strong focus on strategic objectives; trustworthy repositories are an information/records management activity that, along with communication networks, operating standards, and metrics to measure success, is a key IT business initiative.

The project goals are similar to those that can be applied to other data and master data management activities, and include:
- Review and validate the accuracy of current data management processes
- Perform a cleanup of an existing data store with the intent of designating it as an authoritative source of reconciled, high-quality master data
- Provide a consolidated 360-degree view (past, present, and future) of information about important business activities, particularly for more effective reporting and analytics
- Lower costs and complexity through data reuse and leveraging of policy and standards
- Support future business intelligence and information integration efforts

The stakeholders of the case study included the Enbridge Records Management Program (which is part of Corporate IT and is responsible for Change Management, e-Discovery, and Records Management Projects); Enbridge Pipelines Operational Risk Management, who acted as business manager; the Records Management Department, which provided business management and compliance leadership on behalf of Law; the IT Department, specifically the Data Quality Management Group and the IT staff responsible for the operations of the data store; and the Law Department, who provided subject matter expert input.

The case study data management activity is part of a larger business strategy. Clear ownership of issues was already in place through governance steering committees consisting of senior executives, supported by operating committees of business and subject matter experts. These committees approve plans, provide funding, and endorse deliverables produced by individual projects led by the Records Management Department.

B. IDENTIFY CRITERIA FOR SUCCESS
A number of criteria were critical to the success of the trustworthy data repository case study; they are described below.

1. Policy and Standards

In the year leading up to the project, Enbridge promulgated a Records Management Policy. This Policy establishes the principles and rules for the management of Enbridge records across the entire data lifecycle, including data capture, active use, inactive retention, and final disposition. There are many standards that are useful in providing guidance. However, care must be exercised to ensure that productivity is not lost by selecting standards that force a time-consuming excellent solution when a good solution would have created more value. For the case study, Enbridge selected the following standards to guide the solution:
a. Internal Operations and Maintenance Procedures;
b. Canadian General Standards Board (CGSB) 72.34-2005, Electronic Records as Documentary Evidence [16];
c. ISO 14721:2003, Reference Model for an Open Archival Information System (OAIS) [8].

The nature of the project did not require reference to industry-specific standards related to data architecture such as the Pipeline Open Data Standard (PODS) [17].

2. Skill and Expertise
Subject matter experts were available from within the business unit to state business requirements; from within the Records Management Program to provide project management and associated business analysts; and from within the IT department to provide data architects, data analysts, and application architects. An IT professional with specific expertise in master data management was consulted. Reference to specific compliance regulations was not necessary. Law subject matter experts were consulted to assure that legal imperatives were being met.

3. Data Providers and Consumers
Enbridge's operations and maintenance procedures clearly outline the responsibility of field operations and other staff in providing data into the case study data store. The importance of the data to pipeline operations meant that there were interested and often demanding data consumers who used the case study data as part of their daily business activities.

4. Business and Data Management Cohesion
Gartner research indicates that there is a lack of cohesion between business management and data management [18]. This is being addressed at the enterprise level by an ongoing data management strategy that has prepared business managers to take on data management roles.


The importance of a data governance capability and associated roles such as Data Governance Council and Business Data Steward were discussed and facilitated. The case study data store is subject to the Enbridge Records Management Policy, which establishes that business unit managers are responsible for implementing detailed procedures to ensure compliant management of records. The Enbridge Records Management Policy was developed based on generally accepted standards [19] that allow part of a Records Management System Program to be delegated, provided that the responsibilities of the delegated agent are clearly specified.

5. Tools
Master data management tools include meta-data repositories; data profiling, cleansing, and integrating tools; business process and rules engines; and change management tools. In the case study, the focus was on the alignment of processes, standards, and criteria to establish the trustworthiness of the data store that contains master data; therefore, there was no requirement to use master data management tools. The project did use common data and project management tools such as Mindjet (for brainstorming and work planning), Open Text LiveLink Content Management System, MS Project (for scheduling), Visio (for business process mapping), MS Excel (for data manipulation), and SharePoint (for management of project documents).

C. IMPLEMENT TRUSTWORTHY REPOSITORY
A number of elements or steps were included in the approach applied to the establishment of a trustworthy data repository for the case study data. These steps are illustrated in Figure 5 and described below.

Figure 5 Trustworthy Data Repository Approach

1. Understand trustworthy master data management needs
The most difficult scenario in which to create trustworthy data is across several applications or across the entire enterprise. Fortunately, this was not the situation for the case study. Previous work had already been done during a migration from Lotus Notes, and an enterprise-level Oracle data store and associated interfaces, supported by detailed requirements gathering and ongoing IT support, had been established in 2008. After this rollout, it was identified that there might be data quality issues. The case study is a relatively comprehensive data management problem that deals with only one established application that by 2011 had been operating for only three years. During project definition, nine master data entities that required higher levels of accuracy and completeness were identified from amongst 40 transactional data entities.

2. Identify producers and consumers of master data
Once the master data and the associated process requirements were recognized, it was necessary to trace the data lineage to identify the original sources and individual roles that produce and contribute the data. Producers and contributors have a large impact on data quality. The data originated in paper-based documents and forms produced by Enbridge staff during the daily conduct of business. These were managed as paper-based records within several business areas. At periodic intervals these business areas would enter up to 40 key transactional data entities into the data store. Any of Enbridge Liquid Pipelines' 2000 employees could enter data into the store, and up to 250 supervisors and managers reviewed the data for completeness and accuracy before approving it for release. All Enbridge employees have access to and/or consume the case study data. Key consumers include those who use the data store as a source to satisfy analysis and/or management reporting requirements.

3. Define and maintain a trustworthy data integration architecture
A trustworthy data integration architecture controls the shared access, replication, and flow of data to ensure data integrity, accuracy, and completeness. Without this architecture, data is created in business and application silos, resulting in the inconsistent data problems discussed herein. A logical view of the components of a trustworthy data integration architecture is shown in Figure 6.

Figure 6 Logical View of a Trustworthy Data Integration Architecture [10]

Official recognition of the authoritative source originates in Enbridge's procedures and policies. The authoritative source is usually the primary source from which extracts are aggregated at the enterprise level for business efficiency purposes. The trusted source compiles data from multiple authoritative sources through official processes prior to publishing reports. Trusted sources are designated as such because the processes by which data is moved into them are transparent and understood. Metadata remains constant as it moves between the repositories. Once the foundation was established, requirements identified, and an architecture developed, the case study project moved to implementation.

4. Evolve the trustworthy solution
Trustworthy solutions cannot be implemented overnight; they are implemented iteratively through several project stages. As well, since trustworthy solutions may be perceived to already exist, the implementation becomes an evolution of existing capability. The following describes the stages that were used in the case study.

Stage 1: The previous 2008 project was the first stage. It gathered copies of all the relevant domain records and information from shadow files.

Stage 2: An important step in building trustworthy data stores is to baseline the current maturity level of the case study data management within the organization. The project therefore immediately conducted a review to validate the current system against records management criteria, with a view to making recommendations on a future roadmap. Roadmap recommendations for the case study included:
- Create enterprise-wide business requirements to ensure that the case study database can be designated as a trustworthy repository.
- Build collaborative teams to facilitate dedicated governance decisions in terms of financial resources, personnel, policies, procedures, oversight, and compliance.
- Clarify accountability and implement IT standards-based service management of the data store.
- Survey data consumers to ascertain how well business needs are being satisfied, as a first step in the change management that may be necessary to ensure the successful long-term operation of a trustworthy repository.
- Develop enterprise-level risk considerations to evaluate the data store as adequately trustworthy to hold official business records.

In addition, during this stage a clone of the data store was created. This allowed improvements to be made to the data without impacting the day to day use of the data store. At the end of the project, the data from the clone was migrated into production, concurrent with supporting communications advertising the results of the project.

Stage 3: This stage dealt with planning and conducting the remedial work on the existing data. One of the greatest quality challenges was matching and merging data from multiple systems about the same activity. Matching attempts to remove redundancy, improve data completeness, and provide detail that is accurate. From the Stage 1 work, several sources of data were identified that needed to be matched and merged into a single spreadsheet. Data entries on the spreadsheet were matched with data entries in the data store to ensure the latter was complete and accurate, especially as it pertained to the nine master data entities.
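A minimal sketch of a Stage 3 style match-and-merge step is shown below. The match key, field names, and survivorship rule (prefer the data store value, fill gaps from the spreadsheet) are illustrative assumptions, not the project's actual rules:

```python
from typing import Dict, Optional

Row = Dict[str, Optional[str]]

def merge_rows(store_row: Row, sheet_row: Row) -> Row:
    """Keep the data store value; fill gaps from the matched spreadsheet entry."""
    merged = dict(store_row)
    for field_name, value in sheet_row.items():
        if merged.get(field_name) in (None, ""):
            merged[field_name] = value
    return merged

def match_and_merge(store: Dict[str, Row], sheet: Dict[str, Row]) -> Dict[str, Row]:
    """Match on a shared business key; flag spreadsheet-only rows for review."""
    result = {key: merge_rows(row, sheet[key]) if key in sheet else dict(row)
              for key, row in store.items()}
    for key in sheet.keys() - store.keys():
        # Rows with no match in the store need business approval before entry.
        result[key] = {**sheet[key], "_needs_business_approval": "yes"}
    return result

store = {"REC-001": {"location": "Edmonton", "inspector": None}}
sheet = {"REC-001": {"location": "Edmonton", "inspector": "J. Smith"},
         "REC-777": {"location": "Regina", "inspector": "A. Jones"}}
merged = match_and_merge(store, sheet)
print(merged["REC-001"]["inspector"])                 # J. Smith (gap filled)
print(merged["REC-777"]["_needs_business_approval"])  # yes
```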


Stage 4: New issues arose as a result of the matching and merging. The remedial work uncovered over 1100 records that had not been entered into the data store. This created the requirement to revisit several business areas to check these records and to receive their approval to enter them into the data store. The visits to the business areas revealed a further 1000 records that had not been processed to date. Different records management practices within each business area required the project to recreate data matching and merging processes after each visit. For example, some business areas managed information as case files; others managed data by time period. One business area had already created a business area specific authoritative source in partnership with external contractors. An assessment of this work had to be done before this data could be adopted as part of the larger solution.

Stage 5: Scanning documents into the data store. Master data itself cannot provide the complete 360-degree past, present, and future perspectives needed for comprehensive problem solving. Data consumers strongly identified the need to use the master data to drill down into as much of the transactional detail as possible. Since the transactional source documents were paper-based, it was decided to attach scanned copies of paper documents to the corresponding master data in the data store. To achieve this, a detailed and thorough discussion was undertaken on the metadata that future consumers would use to search for and retrieve scanned electronic documents, and during the scanning process this metadata was tagged to the scanned file. The tagged metadata and scanned file were then entered into the enterprise official electronic record repository to ensure the long-term preservation of the official records. The data store architects created a new data table so that the location links appeared on the dashboard of the data store user interface. Results of the data conversion were confirmed through user acceptance testing. The data store therefore yielded golden data to users on the normal interface, and significant transactional detail through links to source records in the official repository.

5. Plan and control each stage of work
Due to the long timelines of this project, there was a continuing requirement to ensure correct focus at the end of each stage, with a view to redirecting the work if necessary. This is characteristic of architecture-driven work, wherein the vision presented by the architecture can only be implemented through coordinated stages. Work at the junction between stages included:
- Gathering the results of the current stage, including the outcome of quality control reviews
- Seeking endorsement for deliverables and management guidance on the next stage
- Developing a plan for the next stage, including the provision of new human resources
- Considering schedule risk due to late deliverables and delayed approvals
- Ensuring cost objectives are staying within approved budgets
- Updating existing project documentation to reflect the details of the finalized stage plan

6. Establish Golden Records
Golden data values are the data values thought to be the most accurate, current, complete, and relevant for shared and consistent use. By the end of Stage 5, approximately 10,000 records had been reviewed. This included ensuring the accuracy and completeness of 41,000 master data entries. With the data completeness measure estimated at 95% and the repository designated as an authoritative source, the data in the nine designated master data fields could be considered golden. Of the nine, seven require master designation at the enterprise level; two are business process specific.
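As an illustration of how a completeness measure of this kind might be computed, the following hypothetical sketch checks populated master-data entries against a threshold. The field names and the 95% golden threshold are assumptions for the example; the paper does not identify the nine master data fields or the exact rule used.

```python
from typing import Dict, List, Optional

# Placeholder names for the nine master data fields.
MASTER_FIELDS = [f"master_field_{i}" for i in range(1, 10)]

def master_completeness(records: List[Dict[str, Optional[str]]]) -> float:
    """Fraction of populated entries over (records x master fields)."""
    total = len(records) * len(MASTER_FIELDS)
    filled = sum(1 for rec in records for f in MASTER_FIELDS if rec.get(f))
    return filled / total if total else 0.0

def is_golden(records: List[Dict[str, Optional[str]]],
              designated_authoritative: bool,
              threshold: float = 0.95) -> bool:
    """Assumed rule: golden = authoritative designation plus completeness at threshold."""
    return designated_authoritative and master_completeness(records) >= threshold

sample = [{f: "value" for f in MASTER_FIELDS} for _ in range(95)]
sample += [{f: None for f in MASTER_FIELDS} for _ in range(5)]
print(f"{master_completeness(sample):.0%}")              # 95%
print(is_golden(sample, designated_authoritative=True))  # True
```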

The process, standards, and criteria work conducted by the project ensured master data were retained in a data store of sufficient transparency, measurability, and documentation to be adequately considered trustworthy within the Enbridge business context.

D. COMMUNICATE RESULTS AND PROMOTE AWARENESS
The case study established three objectives for communications to allow stakeholders an opportunity to assess the project work and to establish the trustworthy and authoritative nature of the data store in the eyes of the executive, stakeholders, data providers, and data consumers. These objectives include:
a. Building additional cohesion with the business areas by reviewing support for the Records Management Agent role and taking action on the feedback provided.
b. Providing weekly feedback on project activities to the business area sponsors and monthly feedback in conferences with the Records Management Operating Committee.
c. Working with the program communications staff on specific communication messages and aligning communications with IT department work in promoting data quality awareness.

Three key messages were developed focusing on: how the data had been improved; the designation of the data store as trustworthy and authoritative; and the need to sustain quality data, especially during data capture. These messages were promulgated first through the law department, then with key business managers (who were designated Records Management Agents) and their immediate supervisors, up to project governance committees and senior management. Concurrently, the work was coordinated with IT staff. Finally, communication with data producers and consumers reinforced the trustworthy and authoritative nature of the data store as the sole, single source of complete and accurate domain-related data.

KEY RESULTS
The following key results were achieved:
- Nine master data entities identified from amongst 40 transactional data entities.
- Documentation on the trustworthy nature of the data store provided through project deliverables such as the Current Situation Report, Process Diagram, and Project Closure Report.
- 41,000 master data elements measured to be 95% complete.
- The data store designated as the authoritative source.
- Golden record data lineage established in as-is architectures that provide the basis for future improvements.
- Steps taken to ensure data quality through communication of results and promotion of the trustworthy data store.

Records Management Agent
The case study identified the requirement to build cohesion within the business areas and, supported by the Records Management Policy, established the role of Records Management Agent. This role was approved by the key business managers and has the following responsibilities:
- contributes to the development, implementation, and maintenance of detailed procedures to ensure the efficient and compliant management of the domain data;
- defines business requirements and specifications for the case study data store;
- monitors and reports on the status of quality assurance and audit programs created by the Records Management Department;
- ensures that records and other relevant recorded information are retained for as long as necessary to comply with the Enbridge Records Retention Schedule.

COST AND SCHEDULE OBJECTIVES
The project was initiated in April 2011 and completed in February 2012. Schedule risk originated from two sources: a) the need to visit regional offices to validate some data, and b) the availability of staff at field sites to review records. These were mitigated by hiring, for a short period, an experienced records analyst who visited sites to augment local staff resources. The cost of the project was $250,000, spent largely on professional services provided by the project manager and information analyst (technology staff and supporting systems and software were already in place). The cost of outsourcing scanning to an Edmonton-based contractor was not significant.

VALUE OF RESULTS
A hard return on investment of 100% was calculated over a five-year period. A significant reduction was realized immediately in the time required for staff to conduct data quality checks or look for further data during the preparation of reports. As well, work was reduced for one major ongoing IT project that will be able to leverage the data management requirements developed under this project. Anxiety has been reduced around the corporate knowledge that would be lost due to a generation of operations staff retiring. Near-term scrap and rework, as well as other cascading secondary costs, are expected to be reduced by having the right data available at the start of problem solving. Finally, implementation of improved records management standards will reduce costs associated with ongoing changes in U.S. and Canadian regulatory expectations. In the medium term, a reduction in intangible risk is expected as better data leads to improved problem solving and forecasting outcomes. Organizational agility is expected to increase as trustworthy data is now shared. Improved transparency for stakeholders will occur through the provision of the record lineage for key information. Finally, an improved ability to respond successfully to both internal and external audit is expected.

FUTURE WORK
Master Data Management Methodology
For the implementation of master data management in more complex situations, the following steps are necessary:
1. Redefine business processes to exclude actions that can decrease the quality of data
2. Define and maintain hierarchies and affiliations between the data entities
3. Plan and implement integration of new data sources
4. Replicate and distribute master data (see the sketch following this list)
5. Consider reference data
6. Manage changes to reference and master data
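For step 4 above, a minimal publish/subscribe sketch suggests one way master-data changes could be replicated to downstream stores. The hub class and callback interface are hypothetical, not a description of Enbridge's systems:

```python
from typing import Callable, Dict, List

class MasterDataHub:
    """Toy publisher: the authoritative store pushes master-record changes
    to subscribing downstream copies."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str, Dict], None]] = []

    def subscribe(self, callback: Callable[[str, Dict], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, key: str, record: Dict) -> None:
        for notify in self._subscribers:
            notify(key, record)  # push the change to every subscriber

reporting_copy: Dict[str, Dict] = {}
hub = MasterDataHub()
hub.subscribe(lambda key, rec: reporting_copy.update({key: dict(rec)}))
hub.publish("REC-001", {"status": "approved"})
print(reporting_copy)  # {'REC-001': {'status': 'approved'}}
```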

Data Quality Metrics and Reports
Transparency and measurability are important characteristics of a trustworthy data store. Future work is therefore required to provide stakeholders with pull-based quality metrics and reports that reinforce the trustworthy nature of master data and the data store used to manage it over the long term. This work will be conducted in conjunction with future records management governance and with the implementation of a data management strategy.

Future Technology Systems
To contribute to improved master data management in future systems, the project created over 150 general and specific requirements for a data management capability covering major areas of business operations, data quality, records management, authoritative source, training, stewardship, and service management. A data migration plan was drafted that overcomes data migration risks such as lack of IT expertise, lack of knowledge about data design details, staff overtime, and unplanned events. Finally, it is necessary to turn off the use of shadow files that act as data stores outside of the trusted data integration architecture. Low-trust data stores that remain will reduce organizational agility because they make getting things done more difficult.

Mergers and Acquisitions
During acquisitions, corporations may procure data that is inadequate in terms of the latest regulatory requirements. However, the confidential nature of acquisitions and divestitures can limit the assurance that data being passed between corporate entities is complete and accurate. Enbridge therefore created a business standard to provide mergers and acquisitions staff with records management guidance that can be applied across a broad range of acquisitions, divestitures, and mergers.

LESSONS LEARNED AND CHALLENGES
Data management projects should not make operational decisions. It is important to understand who owns the information and ensure that these managers are making the decisions on how the information should be managed. At the same time, the project team has been assembled to perform master data management work; master data management work should not be passed onto operational staff.

Master data management work has steps with iterations. It is therefore important to establish a quality plan and conduct quality control at the end of each major milestone. Quality control techniques and reviews take significantly less time than the rework required to redo activities due to an initial faulty deliverable. This is especially true in doing data migrations between systems. What appears to be a simple movement of data between data stores can involve several data conversions, each of which requires a quality review to ensure the final product yields the accurate and complete data it purports to yield.

An external observer may note a lack of synergy between significant IT strategies such as business process management, disaster recovery, and service-oriented architecture. This can be attributed to the different organizational perspectives and vendor solutions from which these strategies originated. It was learned that master data management implements integrating concepts to overcome these synergy roadblocks. The need for this has been foreseen in at least one ISO standard [8] (page 30). A project that includes a master data management objective forces the project manager to include data priorities and data migration activities, which in turn forces integration of business processes and different vendor solutions. Master data management therefore has the potential to create synergy amongst other IT strategies, becoming the catalyst that allows the benefits of these strategies to be realized over time.

As with other business initiatives, active executive support is required. Attempts to improve data quality without proper support will mean that many of the issues that created poor quality data in the first place will not be addressed. Data will, after a short period of time, revert back to lower levels of quality despite the many drivers that push for data quality improvements.

A number of challenges remain that are discussed through the following questions.
1. How can we ensure that the recommendations are sustained and sustainable? Develop (as appropriate) and implement enterprise data management governance and stewardship policies. These policies will authoritatively guide and govern data acquisition, use, change, deletion, and archiving with retention. Through ongoing continuous improvement, identify innovative new techniques, processes, approaches, and best practices for data management service provision. Leverage available and emerging data management technologies. Benefits: 1) identifies and defines roles and responsibilities to ensure recommendations are followed and sustained, 2) identifies roles and responsibilities in the management and upkeep of data, 3) propagates data changes throughout all of the relevant data stores.
2. How do we ensure that cultural change is achieved? Develop a communication plan that facilitates cultural change to accompany the recommendations. The process and procedure changes resulting from implementation of the recommendations need to become second nature to company staff. Develop an incentive program that encourages early adoption and long-term participation, and that makes business managers accountable for sustaining the desired level of data quality. Benefits: 1) sustained cultural change ensures that the recommendations are successful.
3. How do we show that the changes have been successful (implementation) and continue to be successful (sustainment)? Develop key performance indicators (KPIs) for the recommendations (for example, the number of future data entries into the database that do not have additional hard copy files or information). In partnership with corporate audit, develop an audit plan that focuses on specific data quality areas of the business. As well, establish less formal quality assurance roles for data quality specialists within day to day business processes so that their intervention in daily operations becomes well accepted. The coverage of the data steward / Records Management Agent role can also be expanded to increase the number of data entities covered by the role. Records Management Agent coverage can then become a simple metric to measure future progress of improvements in data quality. Benefits: 1) provides simple measures of the success of the implementation of the project deliverables and, in the longer term, how well they are being sustained, 2) KPI results identify areas for improvement. Continuous improvement and innovation will be required to retain sustainability.

CONCLUSION
This paper has been about managing data integrity with the same rigor used to manage pipeline integrity. It has stressed the understanding of terms in what is a relatively new discipline. With properly understood terms, data quality issues and drivers can be recognized and communicated within the corporate hierarchy, and the benefits of trustworthy master data can be explained with sufficient clarity to overcome the many challenges that exist in this area. Understanding best practices, future trends, and the necessity for iterative solutions can assist in the proper scoping of data integrity management initiatives. Achieving criteria for success and establishing a trustworthy data integration architecture will lay the foundation for solution evolution, benefit exploitation, and cost savings. As in other areas worthy of persistent endeavor, lessons remain to be learned and challenges remain to be overcome. When combined with ongoing strategic programs, a knowledgeable workforce, and effectively functioning IT business systems, advances in data integrity management can significantly contribute to the demanding expectations public and private stakeholders have for successful oil and gas corporations.

ACKNOWLEDGEMENT
The authors of this paper are passionate about data and data management. This means that we subject our spouses and families to a barrage of day to day and organizational issues and the means at our disposal to solve or fix them. It is to them that we all acknowledge their support and understanding.

CITED REFERENCES
[1] Inmon, William H., and Nesavich, Anthony, Tapping into Unstructured Data: Integrating Unstructured Data and Textual Analytics into Business Intelligence, Prentice Hall, 2008.
[2] Loshin, David, Master Data Management, Morgan Kaufmann OMG Press, Burlington, Mass., 2009.
[3] McGilvray, Danette, Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information, Morgan Kaufmann, 2008.
[4] U.S. Federal Geographic Data Committee (FGDC) Subcommittee for Cadastral Data, Authority and Authoritative Sources: Clarification of Terms and Concepts for Cadastral Data, Version 1.1, August 2008.
[5] Redman, Thomas C., Data Driven: Profiting from Your Most Important Business Asset, Harvard Business Press, 2008.
[6] Inmon, William, O'Neil, Bonnie, and Fryman, Lowell, Business Metadata: Capturing Enterprise Knowledge, Morgan Kaufmann, 2008.
[7] Hawker, M.T., MDM that Works: A Real-World Guide to Making Data Quality a Successful Element of Your Cloud Strategy (presentation), Pivotal IT Consulting, 28 April 2011.
[8] ISO 14721:2003, Space Data and Information Transfer Systems - Open Archival Information System (OAIS) - Reference Model, 2003.
[9] Westman, Roger, What Constitutes an Authoritative Source, Case: 09-3-3195, MITRE Corporation, 2 September 2009.
[10] U.S. Federal Geographic Data Committee (FGDC) Subcommittee for Cadastral Data, Authority and Authoritative Sources: Clarification of Terms and Concepts for Cadastral Data, Version 1.1, August 2008.
[11] Dobratz, Susanne, The Use of Quality Management Standards in Trustworthy Digital Archives, The International Journal of Digital Curation, Issue 1, Volume 5, 2010.
[12] Institute of Medicine of the National Academies, Standards for Developing Trustworthy Clinical Practice Guidelines, 23 March 2011.
[13] Meland, P.H., The Challenges of Secure and Trustworthy Service Composition in the Future Internet, System of Systems Engineering (SoSE), 2011 6th International Conference, 27-30 June 2011.
[14] Consultative Committee for Space Data Systems, Requirements for Bodies Providing Audit and Certification of Candidate Trustworthy Digital Repositories, CCSDS 652.1-M-1, November 2011.
[15] Pipeline and Hazardous Materials Safety Administration (PHMSA), Pipeline Safety: Establishing Maximum Allowable Operating Pressure or Maximum Operating Pressure Using Record Evidence, and Integrity Management Risk Identification, Assessment, Prevention and Mitigation, Advisory Bulletin, Docket No. PHMSA-2010-0381, January 2011.
[16] Government of Canada, Canadian General Standards Board, CAN/CGSB-72.34-2005, Electronic Records as Documentary Evidence.
[17] Pipeline Open Data Standard (PODS), PODS Association, Sand Springs, Oklahoma.
[18] Beyer, Mark A., and Robison, Lyn, Advancing Data Management Maturity Key Initiative Overview, Gartner, July 22, 2011.
[19] Pearce, Jason, personal interview, 19 January 2012.
