
Data Administration and Distributed Data Processing

Copyright 1992 CAUSE. From _CAUSE/EFFECT_ Volume 14, Number 4, Winter 1992. Permission to copy or disseminate all or part of this material is granted provided that the copies are not made or distributed for commercial advantage, the CAUSE copyright and its date appear, and notice is given that copying is by permission of CAUSE, the association for managing and using information resources in higher education. To disseminate otherwise, or to republish, requires written permission. For further information, contact CAUSE, 4840 Pearl East Circle, Suite 302E, Boulder, CO 80301, 303-449-4430, e-mail info@CAUSE.colorado.edu

DATA ADMINISTRATION AND DISTRIBUTED DATA PROCESSING

by Gerald Bernbom

************************************************************************
Gerald Bernbom is Assistant Director, Data Administration and Access, at Indiana University. As part of University Computing Services, his unit is responsible for data administration, database administration, security administration, data dictionary management, campus-wide information systems, and the information center. He was previously Associate Registrar at Indiana, with responsibilities in the area of student information systems. He currently serves on the CAUSE Editorial Committee and the CUMREC Communications Committee, and previously chaired AACRAO committees for Data Communications and Systems Development.
************************************************************************

ABSTRACT: This article explores the relationship between data administration and distributed data processing. It begins by focusing attention on the technology of distributed data processing, its capabilities, and its limitations with respect to providing a logically unified structure for an institution's data resources. The article then looks at the role of data administration as an ally and enabler in support of distributed data processing.

Distributed data processing offers the potential to radically transform our institutions in the 1990s. Imagine high-powered, low-cost workstations and desktop computers communicating with each other across high-speed networks and presenting each user with a friendly, intuitive, graphical user interface--all of this made possible through the application of distributed computing technology.

Or the result might be different. Consider multiple computers keeping multiple copies of the same data, stored in incompatible formats, and accessed by unlike database management systems; no way to combine or aggregate these data for institution-wide reporting; no one able to reconcile the conflicting analyses and reports produced by these myriad systems. This, too, is possible with distributed technology.

The payoff of richer functionality and lower cost that distributed data processing promises is balanced against the risk of fragmentation or dis-integration of an institution's information resources. To guard against this risk requires an understanding of distributed computing, its current state, and its limitations.

This article addresses how to minimize these risks by examining two integral components of distributed data processing: (1) the technologies of client/server and distributed database management systems; and (2) the standards, procedures, and practices used to establish and maintain coherence in our information systems. In the first section of the article--an extended discussion of client/server technology and distributed database management systems--we examine the benefits of these technologies and their limitations in maintaining an integrated and logically unified information architecture. Where technology reaches its limits, especially in the heterogeneous, mixed-vendor environment of distributed computing, we turn to human intervention--standards, procedures, and practices--to establish and maintain coherence in our information systems. Actions that can be taken to limit the risk of data dis-integration rely heavily on the consistent application of standards and procedures, and on the formal methods of data management. This is the domain of data administration, whose role as an ally and enabler of distributed data processing is the focus of the second section of this article.

DISTRIBUTED DATA PROCESSING

Distributed computing. Distributed systems. Distributed processing. Distributed database. Sometimes the term is simply "distributed"--as in "I hear you're thinking about going distributed"--a modifier with nothing modified. The focus of this discussion is on distributed technology as it applies to what is traditionally called administrative data processing. The choice of an appropriate noun to attach to "distributed" and an associated definition can help provide some structure and a starting point for discussion of this topic:

_Distributed data processing._ "The dispersion of computing functions and data at nodes electronically interconnected on a coordinated basis, geographic dispersion not being a requirement in every case."[1]

Admittedly, the language in this definition is dense. However, its virtue is that it addresses both the distribution of computing functions (the "process" side of data processing) and the distribution of data (the "data" side). It also directs attention to the importance of integrating functions and data across computing platforms, while deemphasizing the importance of geographic distribution.

The technology marketplace confirms this definition in that there are two threads of technology developing in parallel. Distributed database management system (DBMS) technology is working on the problem of distributing data. Distributed processing technology--and its implementation in the client/server architecture--is working on the problem of distributing computing functions. These two developments may often coexist in the same product or implementation, addressing the two dimensions of distributed data processing.

A distributed database management system is "a collection of centralized [or local] database management systems that are linked through a communications network and integrated in their operations."[2] In its ideal form, a distributed DBMS has several defining characteristics, among them:

* It supports database implementations which are logically centralized but physically distributed. Regardless of its physical implementation or location, a database always appears to the user and the application software as a single entity.

* Data which are logically related, including data from a single database, may be fragmented and distributed to multiple locations without the user or application needing to know the location of any piece of data. This is referred to as location transparency.

* Copies of data may be made and distributed to multiple locations--for example, to improve the performance or reliability of an application--without the user or application needing to be aware that multiple copies of the data exist. This is referred to as replication transparency.

* And finally, these characteristics are achieved within the functions of the DBMS software itself, and do not rely on custom-written application programs and inter-application "bridges" to maintain the integrity of the database or the appearance of logical integration.

Distributed processing refers to the deployment of "related computing tasks across two or more discrete computing systems."[3] Implementations of distributed processing vary in their sophistication and complexity. Most are simply host-based systems in which basic functions of data display control or input editing are distributed to intelligent terminals or terminal controllers. Somewhat more complex are local area network (LAN) implementations which permit the distributed sharing of centrally managed resources and devices among multiple independent personal computers and workstations. More complex yet are implementations based on the client/server model--sometimes referred to as "cooperative processing"--in which a single application process is divided into two or more pieces and executed on separate computers, typically using some form of remote procedure call (RPC) for program-to-program communication.

In the client/server model, there is a service requestor (the client), a service provider (the server), and a set of services that the client might request: data, computing cycles, mail and message routing, or user identity authentication, to mention only a few. Among the characteristics which define the client/server model are:

* Network communication between the client and server.

* Client-initiated interactions with the server.

* Restriction by the server over the client's authority to request services or data from the server.

* Arbitration by the server over conflicting requests from multiple clients.

* Division of application processing between the client and the server.[4]

And, as it applies to database processing, the client/server architecture is "the logical division of database application work between user functions (the client) and database-management functions (the server)."[5]
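
To make this division of labor concrete, the sketch below shows a deliberately minimal client/server exchange in Python. The server owns the database-management function (here, a trivial in-memory table), restricts which requests it will honor, and replies over the network; the client owns the user-facing function and initiates the interaction. The table contents, wire format, and all names are illustrative assumptions, not a depiction of any particular product.

    # A minimal client/server exchange: the server owns the data and arbitrates
    # requests; the client handles the user-facing side and asks over the network.
    # Table contents, the wire format, and all names are illustrative assumptions.
    import json
    import socket
    import threading

    FACULTY_TABLE = [
        {"id": 101, "name": "A. Jones", "dept": "HIST"},
        {"id": 102, "name": "B. Smith", "dept": "CHEM"},
    ]

    ready = threading.Event()

    def serve_once(host="127.0.0.1", port=5050):
        """Server role: accept one request, check its authority, send a reply."""
        with socket.socket() as srv:
            srv.bind((host, port))
            srv.listen()
            ready.set()                                  # signal: ready for clients
            conn, _ = srv.accept()
            with conn:
                request = json.loads(conn.recv(4096).decode())
                if request.get("table") == "faculty":    # restriction: only known
                    rows = [r for r in FACULTY_TABLE     # tables may be requested
                            if r["dept"] == request.get("dept")]
                    reply = {"ok": True, "rows": rows}
                else:
                    reply = {"ok": False, "error": "request refused"}
                conn.sendall(json.dumps(reply).encode())

    def client_query(dept, host="127.0.0.1", port=5050):
        """Client role: format a request, send it, and return the server's reply."""
        with socket.socket() as cli:
            cli.connect((host, port))
            cli.sendall(json.dumps({"table": "faculty", "dept": dept}).encode())
            return json.loads(cli.recv(4096).decode())

    if __name__ == "__main__":
        threading.Thread(target=serve_once, daemon=True).start()
        ready.wait()                                     # client-initiated interaction
        print(client_query("HIST"))                      # rows for HIST only

The essential point is the division: the client never touches the table directly, and the server decides what it will and will not supply.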

Why distributed?

Almost as varied as the ways in which the term is used are the reasons associated with "going distributed." In many organizations, low cost features prominently in the justification for pursuing a distributed solution to an information systems problem. Underlying this rationale is a cost analysis of mainframe processing (and its "expensive MIPS") as compared to desktop and distributed processing (and its "cheap MIPS"). Despite this apparent cost advantage, few organizations have near-term plans to completely dismantle their mainframes and replace them with inexpensive microprocessors. But many organizations are looking at the nature and structure of their mainframe systems to identify discrete applications which can be migrated to smaller computing systems (downsized) and to identify some general application functions, such as user-interface processing, which can be better implemented on distributed desktop computers. In a simple, three-tier schema of application architecture--database and I/O functions, logic and process functions, and user interface functions--the first step toward distributed processing is often a client/server model which locates the user interface function on a client workstation while using the mainframe as a server for database and process functions.

This division-of-labor approach capitalizes on another of the frequently mentioned reasons for implementing a distributed solution--the ability to provide an enhanced user interface which gives the user a perception that all services are local to the desktop workstation. As a general rule (and excepting the minimalist "C:\>" prompt of native DOS), most desktop applications are easier to learn and use than most mainframe applications. Increasingly, desktop applications are acquiring a common look and feel, built around a handful of de facto graphical user interface (GUI) standards: Apple Macintosh, Microsoft Windows, and OSF Motif among them. Some implementations of distributed data processing also permit the integration of data from a central or shared server with familiar, off-the-shelf desktop software like Lotus 1-2-3, Excel, dBase, and others.

The last decade has brought about a welcome demystification and democratization of information technology; end users are active participants in the technology process, not only as decision-makers but also as developers and implementers of their own technology solutions. Distributed data processing offers the promise of increased end-user control by relocating a larger portion of the information systems application from a remote, central site to a local, distributed site. This reasoning is especially relevant in a college or university whose business plan includes some measure of commitment to local autonomy at the level of campus, school, or department.

A factor related to both low cost and increased end-user control is the expectation that distributed data processing will support the interconnection of a rich mix of heterogeneous components in a single, integrated information system solution. Heterogeneity is a fact of life in most institutions: a wide variety of desktop computers and desktop software; an equally wide variety of departmental systems and LAN servers; and in some organizations a similar mixed-vendor environment of central timeshare systems.
Heterogeneity and the "open standards" movement also form a strategic objective for some organizations, offering freedom from the constraints of a single-vendor or proprietary information system architecture by allowing the organization and its users greater choice in the marketplace for future technology acquisitions.

Finally, a factor not often recognized in organizations with a history of mainframe computing is that a significant measure of decentralization has already occurred, and distributed data processing offers an opportunity to re-integrate fragmented data resources. During the 1980s, while personal computers proliferated and data from mainframe systems remained difficult to access, a large number of stand-alone PC applications were created to replicate the functions of certain key central systems. These "shadow systems" were developed by individual departments to support a variety of business functions, such as budget and account management. These stand-alone systems carried the same or similar data as was found in the central systems, periodically downloaded from the mainframe or sometimes even re-keyed into local systems from central system reports. The payoff in organizations where "shadow systems" have proliferated is the potential of distributed data processing to provide a structure for increased coherence and cohesion in the organization's scattered data resources.

What distributed isn't

Most important, distributed data processing is not decentralized data processing. If the objective of an organization is simply to decentralize its business functions or operations, it would be difficult to justify the cost or complexity of client/server technology and distributed database management systems. The underlying rationale for these distributed technologies is to maintain a logical unity or coherence in an information system while permitting the physical distribution of processing functions, data resources, or both. Decentralization, on the other hand, emphasizes independence rather than integration, and typically involves both the physical and logical fragmentation and distribution of process and data.

By way of comparison, distributed data processing is similar to, but distinct from, a parallel trend in information technology--downsizing. Both trends start with the viewpoint that "smaller might be better." Both also subscribe to the notion that local control should be expanded in a fashion consistent with an organization's overall business plan. The underlying premise of downsizing is that there are central system applications that would be better deployed on smaller computing platforms: personal computers, local area networks, or departmental computers. The distinction between distributed systems and downsizing is that the downsized system typically has a low measure of integration, either in related process or shared data, with other components of an organization's overall information systems. Downsizing, in these terms, is often the result of recognizing that a stand-alone or decentralized application has been running on a central computing resource and might be more efficiently deployed elsewhere.

The limits of distributed technology

In a somewhat different vein, distributed data processing is also not a full and complete set of mature technologies. More progress has been made on the process side, where there do exist representative implementations of client/server technology that have been successfully deployed. On the data side, however, the ideal distributed database management system described earlier is still as much vision as it is reality.

Within the more narrow constraints of single-vendor and proprietary solutions, the technical problems of location transparency, replication transparency, and full logical integration (including concurrency) across multiple computing platforms are beginning to be solved. In the heterogeneous, multi-vendor environment that is often expected of distributed data processing, distributed database management systems have not yet achieved this same level of integration.

While software developers and vendors pursue the ideal of full logical integration at the level of data definition within the DBMS itself, various mediating strategies are being offered by DBMS and other third-party software vendors to achieve some of the same objectives. Four of these strategies are worth noting in this context.

First, client/server technology and its remote procedure call (RPC) facility is being employed to achieve some measure of enforcement for concurrency and integrity constraints across multiple, distributed databases. In this strategy, an RPC is defined as a stored procedure on a local database server. Selected database activities on the local server will trigger the RPC to communicate with and take action on a remote database server. This strategy is most effective in a single-vendor DBMS environment. It may also be employed, with more limited success, in a heterogeneous DBMS environment, using gateways to pass RPCs among the unlike database managers. The most serious limitation in a gatewayed implementation is the inability to guarantee synchronized update processing across unlike DBMS products.

Second, standardization on relational database management systems (RDBMS) and on Structured Query Language (SQL) for data access, manipulation, and management can provide the necessary shared infrastructure to permit some level of integration among heterogeneous database management systems. The basic strategy for integration in a heterogeneous RDBMS environment is the SQL gateway, a facility for translation of SQL dialects among unlike database management systems. This strategy is most effective at providing users with a unified view of distributed databases and offering a single, SQL-based interface to the data.

A third strategy may be employed in the form of data-driven application development tools. In this strategy, data definitions are introduced to the development tool as a prerequisite to the construction of any application. The tool then enforces the rules and constraints of this data definition in all applications developed, including applications which may be deployed across unlike DBMSs and computing systems. This strategy is most effective at enforcing domain constraints and other rules regarding data content, type, and format.

A fourth strategy may be employed in the context of fourth-generation language (4GL) end-user query tools. Some such tools, which are capable of accessing multiple databases across an environment of heterogeneous DBMS products, can create for the user the perception of a unified data structure even when the underlying data are not, in fact, fully integrated.

Each of these is a partial solution which addresses a subset of the need for logical integration of an organization's data. Each is also limited in that it operates entirely or primarily at the application level, and is not integrated in the data definition facility of the DBMS itself. Logical integration and the reality of its limits in current state-of-the-art DBMS technology comprise a significant portion of data administration's agenda as a partner and participant in the distributed data processing effort of an organization.
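
The first of these mediating strategies can be sketched in a few lines of Python. The sketch is an illustration only: LocalServer, RemoteServer, and the trigger wiring are invented names standing in for a vendor's stored-procedure and RPC facilities, and no real DBMS product is shown. The point is simply that an update on the local server fires a procedure that takes a corresponding action on the remote server, which is how some measure of integrity is kept across the two databases.

    # Sketch of strategy one: a local update triggers a stored-procedure-style
    # call against a remote server. All class, table, and account names are
    # hypothetical.

    class RemoteServer:
        """Stands in for a second database server reachable over the network."""
        def __init__(self):
            self.accounts = {}                           # account_id -> balance

        def rpc_post_balance(self, account_id, balance):
            # In a real product this would be a stored procedure invoked via
            # RPC or through a gateway; here it is just a method call.
            self.accounts[account_id] = balance

    class LocalServer:
        """Stands in for the local database server with a trigger defined on it."""
        def __init__(self, remote):
            self.accounts = {}
            self.remote = remote                         # RPC/gateway connection

        def update_balance(self, account_id, balance):
            self.accounts[account_id] = balance          # the local update ...
            # ... fires the "trigger": propagate the change to the remote copy.
            self.remote.rpc_post_balance(account_id, balance)

    if __name__ == "__main__":
        remote = RemoteServer()
        local = LocalServer(remote)
        local.update_balance("19-4401-55", 2500.00)
        assert remote.accounts["19-4401-55"] == 2500.00  # replica kept in step

What the sketch cannot show is the hard part noted above: if the remote call fails partway through, nothing here guarantees that the two servers agree, which is exactly the synchronized-update limitation of gatewayed, multi-vendor implementations.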

DATA ADMINISTRATION

What is data administration?

Most simply, data administration is the application of formal rules and methods to the management of an organization's data resources. As an entity in an organization, data administration may take on various forms: as a line function within the IS organization, as a staff function reporting to a senior administrator in such areas as Information Technology or Administration and Finance, or as a "virtual" function where responsibilities of data administration are shared among several areas of the organization. More important than its organizational form--though this form can have a great deal to do with the effectiveness of the function--are the principles of data resource management which data administration brings to bear on the organization's information systems activities.

First among these is the principle that data are facts about objects which are of interest to the organization.[6] This premise underlies much of what data administration attempts to accomplish; it says that an organization cares about data primarily because it wants to know facts about the real world and its objects. These objects might be people, places, things, events, or concepts. The significance of any data item derives from the significance of the object represented in the data, and the importance of knowing some fact about that object.

A second principle of data administration is that data are a valuable organizational resource. As has so often been pointed out, data are part of an organization's intellectual resource--the facts and knowledge which allow it to operate effectively and efficiently, and to decide wisely.

These principles form the basis for six imperatives which apply to the data administration enterprise.

Data should be authentic. The data maintained by an organization should be a faithful model of the real world and its objects. The data and their structure should permit the full representation of any meaningful fact about any object of interest. To take a simple example, if a faculty member may hold appointments in more than one academic department, then the data structure should allow this fact to be represented unambiguously. Conversely, the data structure should prohibit representation of facts which have no meaning to the organization. For example, if a faculty member cannot be appointed without assignment to at least one academic department, then the data structure should be defined in a way that requires this assignment.

Data should be authoritative. There should exist a single, authoritative source which may be interrogated to determine any fact about an object of interest. In physical database terms, this means there should be a single, primary storage location for each discrete data item, though it does not preclude multiple secondary storage locations where data may be replicated for various performance, reliability, or other application-related reasons.

Data should be accurate. The value of data decreases rapidly in proportion to any decrease in its quality. All available data management facilities, such as semantic integrity or domain constraints, should be employed to maintain the accuracy of an organization's data.

Data should be shared. The value of data to an organization increases in proportion to its appropriate use. Once the investment has been made in defining and capturing data, its application to other uses allows repeated multiplication of benefit while adding only marginally to the organization's cost.

Data should be secure. Like any asset of the organization, data should be protected from intentional or accidental corruption or destruction. The degree of protection should be commensurate with the value of the resource, and balanced against the cost to the organization, including the cost of impeded access to the data.

Data should be intelligible. If the value of data to an organization increases in proportion to its appropriate use, its value also decreases through misinterpretation and misuse. It is important to provide adequate description, definition, and documentation of the data in support of those in the organization who will be using this valuable resource.

Data administration tools and methods

The "three-schema architecture" is often used as a starting point for discussion of the tools and methods of data administration.[7]

* The conceptual schema, the one most often identified with the data administration function, represents a unified and logically integrated view of the organization's entire collection of data resources. It is often referred to as the "logical view" of the data.

* The external schema represents the ways in which different groups of users view those portions of an organization's data resources which are of interest to them. This is often referred to as the "user view" of the data, and multiple user views are typically maintained for a single set of data, each tailored to the information needs of a specific user population.

* Finally, the internal schema represents the physical storage of data, which may entail storage on multiple, distributed databases. This is referred to as the "physical view" of the data.

Data administration is concerned primarily with establishing the conceptual schema for an organization's data and then, secondarily, with the mapping of this schema to the other two views of the data.

Logical data modeling is the data administrator's chief methodology for construction of the conceptual schema. Most widely used of the data modeling techniques is the entity-relationship (E-R) data model.[8] A variety of graphical representation and notational systems have been developed for E-R modeling, but the underlying structure is fairly constant: the objects of interest to an organization can be characterized either as entities (objects), relationships among these objects, or attributes (properties or characteristics or, more simply, data values) associated with an entity or a relationship. The result of an E-R data modeling process is a conceptual data model, a graphical representation of the conceptual schema which presents a logically integrated view of the organization's data independent of any physical representation of the data or application-defined use of the data.
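
The faculty/department example used earlier can make the E-R ideas concrete. The sketch below is a hypothetical fragment of a conceptual model expressed in Python rather than in graphical notation: Faculty and Department are entities, Appointment is the relationship between them, and the check that every faculty member carries at least one appointment is the kind of structural rule the "authentic data" imperative calls for. None of this reflects any actual institutional model.

    # Hypothetical E-R fragment: two entities, one relationship, and a structural
    # rule (every faculty member must hold at least one departmental appointment).
    from dataclasses import dataclass, field
    from typing import List

    @dataclass(frozen=True)
    class Department:                 # entity
        dept_code: str                # attribute (e.g., "HIST")
        name: str

    @dataclass(frozen=True)
    class Appointment:                # relationship: Faculty <-> Department
        dept: Department
        fte: float                    # attribute of the relationship itself

    @dataclass
    class Faculty:                    # entity
        faculty_id: int
        name: str
        appointments: List[Appointment] = field(default_factory=list)

        def __post_init__(self):
            # "Authentic" structure: an appointment-less faculty record is not
            # a meaningful fact in this model, so it is rejected at creation.
            if not self.appointments:
                raise ValueError("a faculty member requires at least one department")

    history = Department("HIST", "History")
    folklore = Department("FOLK", "Folklore")

    # Multiple appointments are representable, unambiguously, as multiple
    # instances of the relationship rather than as one overloaded field.
    jones = Faculty(101, "A. Jones", [Appointment(history, 0.5),
                                      Appointment(folklore, 0.5)])

A real conceptual model would of course live in a modeling tool or data dictionary rather than in application code; the point here is only the three-way distinction among entity, relationship, and attribute.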

Data administration is concerned also with recording the facts about the conceptual schema which it learns or discovers through the data modeling process. The data dictionary is the "mechanism to collect, maintain, and publish information about data. It is a central repository for metadata (information about data)."[9] Data dictionaries can also be more than simply a passive storehouse of information about the organization's data resources. Integration with application development and CASE tools can give the dictionary active control over application software, ensuring the enforcement of the organization's rules and constraints. Integration with user query tools can allow the dictionary to serve as an information resource directory, providing end users with the documentation and support needed to render the organization's data more useful and intelligible. Also, the dictionary can be the repository for information about the entire three-schema architecture itself, storing descriptions of how the logical view of the data maps to the physical and user views.
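
As a very small illustration of what "metadata" means in practice, the sketch below implements a toy dictionary in Python: each entry records a definition, a steward, and a domain rule for one data element, and the dictionary can both answer documentation queries and validate a value against the registered rule. The element name, definition, and rule are invented for the example; a production dictionary or repository product would carry far more (ownership, schema mappings, lineage, and so on).

    # Toy data dictionary: a registry of metadata (facts about data elements),
    # usable both for documentation and for simple domain-rule checks.
    # Element names, definitions, and rules here are illustrative only.

    class DataDictionary:
        def __init__(self):
            self._entries = {}

        def register(self, name, definition, steward, domain_check):
            """Record metadata for one data element."""
            self._entries[name] = {"definition": definition,
                                   "steward": steward,
                                   "domain_check": domain_check}

        def describe(self, name):
            """Documentation role: what an element means and who owns it."""
            e = self._entries[name]
            return f"{name}: {e['definition']} (steward: {e['steward']})"

        def validate(self, name, value):
            """Active role: test a value against the registered domain rule."""
            return self._entries[name]["domain_check"](value)

    dd = DataDictionary()
    dd.register("term_code",
                "Academic term in YYYYT form, where T is 1=Spring, 2=Summer, 3=Fall",
                "Office of the Registrar",
                lambda v: len(v) == 5 and v[:4].isdigit() and v[4] in "123")

    print(dd.describe("term_code"))
    print(dd.validate("term_code", "19923"))   # True
    print(dd.validate("term_code", "FALL92"))  # False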

The data administration agenda for distributed data processing

The principles, imperatives, tools, and methods of data administration described above have a single theme in common: they are concerned primarily with data, independent of specific technologies of data processing. So, from a "top of the mountain" point of view, data administration would have no special agenda for working with the implementation of distributed data processing technology. And in a sense this is true. Data administration views data essentially as facts about objects of interest to the organization and treats data as one of the organization's most valuable resources; it promotes the values of authenticity, authority, accuracy, accessibility, security, and intelligibility with respect to data; and it continues to focus attention on understanding and describing the conceptual data model of the organization using the methods and tools of E-R data modeling and data dictionaries.

But the advent of distributed data processing, and especially distributed database management systems, should cause data administration to re-evaluate its priorities and consider new tools or methods for effectively discharging its responsibilities in the distributed environment. A preliminary review identifies several such adjustments which might appropriately make their way onto data administration's agenda for distributed data processing.

First is recognition of the increased importance of the conceptual schema and a corresponding increase in the priority given to data modeling. In the distributed data processing model, the dispersion of processing and data across multiple computing platforms creates the potential for rapidly accelerated dis-integration and fragmentation of the organization's data resources. In a central system model, data administration had two strong allies in maintaining the logical integration of the organization's data resources: application inertia and the historical perspective of a central systems development staff. In fact, even where strong formal data management discipline was lacking, the existence of certain key, central application systems provided a point of reference and focus for subsequent application development: code values, data formats, or business rule algorithms were often "cloned" from one application to another, ensuring at least some measure of coherence across applications. More active support for logical integration was provided by the central application development staff, capable of providing a high degree of intelligence and historical perspective about the organization's data structures and their meaning, and whose intervention could be relied on to re-use data or application code, thus limiting redundancy or divergence. Although these factors have substituted for the creation of a formal data administration function in some computing organizations, they cannot be counted on in a distributed data processing environment. With applications migrating to diverse, distributed computing platforms, and application development sometimes following this trend and being dispersed as well, the conceptual data model becomes a necessary blueprint for logical integration.

The second item for the agenda of data administration is development of an institutional data architecture, or data deployment plan.[10] The first step of this approach is to categorize data by their primary use--operational or decision support--and by their general form--primitive (detail-level data) or derived (extracted, aggregated, or summarized data). The second step is to construct a logical model of the primitive data, since this will form the basis for all subsequent derived data, and to define a standard set of extract algorithms for construction of derived data. The final step is to determine the optimal computing platform and DBMS for deployment of each component of the data architecture. (A small sketch of such an extract step appears below, following the third item.)

Given the limits of current distributed database technology, it is not possible to locate data anywhere in a networked environment, across any mix of unlike computing systems and database managers, and still guarantee its logical integration. The data architecture methodology allows data administration to focus the greatest attention on ensuring logical coherence in the primitive and operational databases, from which all other data are derived. It also provides data administration with the opportunity to standardize the extract algorithm process, so that derived databases which are deployed across multiple, unlike systems have a known and stable relationship to the primitive, operational databases. And finally, as decisions are made about the deployment of databases, the disciplined analysis of the data model and data architecture provides the data administrator with information about where this deployment may put logical integration at risk.

A third item is the role of data administration in technology tool selection. This involvement is especially relevant to selection of application development tools, 4GL user query tools, and, if the decisions have not already been made, the database management system products themselves. If the data architecture analysis identifies areas of potential risk to the logical integration of the organization's data, and if the DBMS environment cannot support full integration across all of the organization's data resources, it is important that the data administrator provide input and guidance to the software evaluation process--helping the organization to identify and select database, development, or query products which employ some of the mediating strategies discussed earlier: (1) standardization on RDBMS and SQL, including SQL dialect translation gateways; (2) integrity enforcement via stored procedures; (3) integrity enforcement via data-driven application development products; or (4) integration at the level of the user view via multiple-DBMS query products.
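
The notion of a standard extract algorithm from the second agenda item can be shown in miniature. In the Python sketch below, the primitive, operational data are detail-level enrollment records and the derived, decision-support data are headcounts by department and term; the record layout and the summarization rule are invented for the illustration. The value of standardizing such a step is that every derived database built from it has a known, repeatable relationship to the primitive data.

    # Miniature "standard extract": primitive detail records in, derived summary
    # out. The field names and the aggregation rule are hypothetical.
    from collections import Counter

    # Primitive (operational, detail-level) data: one record per enrollment.
    primitive_enrollments = [
        {"student_id": 1, "dept": "HIST", "term": "19923"},
        {"student_id": 2, "dept": "HIST", "term": "19923"},
        {"student_id": 3, "dept": "CHEM", "term": "19923"},
    ]

    def extract_enrollment_counts(records):
        """Standard extract algorithm: derive headcounts by (dept, term).

        Because the rule is defined once and reused, a derived database deployed
        on any platform keeps a known, stable relationship to the primitive data.
        """
        counts = Counter((r["dept"], r["term"]) for r in records)
        return [{"dept": d, "term": t, "headcount": n}
                for (d, t), n in sorted(counts.items())]

    derived_summary = extract_enrollment_counts(primitive_enrollments)
    # [{'dept': 'CHEM', 'term': '19923', 'headcount': 1},
    #  {'dept': 'HIST', 'term': '19923', 'headcount': 2}]
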
A fourth item is the development of standard practices and procedures for the management of data within the institution. In a centralized environment, the storage of data within a single physical system and single DBMS provides the necessary structure for a high degree of logical integration--consistent data definitions, data access routines, integrity constraints, security and audit controls, and a single point of management for system administration and database administration. Introduction of multiple physical systems and multiple DBMS products places a greater responsibility for logical integration on the use of procedural standards. This is especially true in the areas of development methodology, system administration, security administration, database administration, and data management. Data administration should help define these standards and a means for their enforcement. Its role is to supplement the available technology-based tools for data integration and to facilitate the implementation of distributed computing technology.

A final item for the data administrator's agenda is organizational data policy. Top-level commitment is always identified as a critical success factor for data administration. This need for organizational awareness and commitment is magnified by the potentially significant costs and complexities associated with trying to achieve logical integration in a distributed data processing environment. Data administration, and all involved with the distributed data processing effort, must be certain that the organization is committed to this goal of integration--that the organization's strategy is not, in fact, decentralization. Data administration also has an obligation in this regard to provide the organization with a realistic assessment of the cost, complexity, and risk associated with this effort. A formal data management policy which affirms the organization's commitment to data authenticity, authority, accuracy, accessibility, security, and intelligibility can be one of the data administrator's most valuable resources.

CONCLUSION

The implementation of distributed data processing is an ambitious and complex undertaking. Data administration has the opportunity to significantly influence the success of this effort.

Looking at the worst-case scenarios: if the data administration function of an organization is non-existent or ineffective, the result can be a dis-integration and fragmentation of the organization's data resources--decentralization of computing instead of distribution. Or, if data administration misperceives its role in the organization and identifies its duty to data integration primarily as one of enforcing centralization, then the data administrator can become an obstacle to distributed data processing.

In the best case, however, data administration can be the function that provides the organizational "glue" to hold together the data resources of an institution. Its emphasis on the value of data as a shared resource of the institution meshes perfectly with the true goals of distributed data processing--the logical integration of processes and data across a physically distributed network of computing systems.

========================================================================

Footnotes

1 Enterprise Need for Data Administration in the '80s and Beyond, Guide Document: DP-1831 (Chicago, Ill.: Guide International Corporation, 1987).

2 E. N. Fong and B. Rosen, "A Guide to Evaluating the Distributed Environment," Data Resource Management, Spring 1990, pp. 17-25.

3 J. Buzzard, "The Client-Server Paradigm: Making Sense Out of the Claims," Data Based Advisor, August 1990, pp. 72-79.

4 Ibid.

5 S. G. Schur, "An Idea Whose Time Has Come," Database Programming and Design, August 1990, pp. 66-72.

6 This discussion is drawn, in part, from R. G. Ross, Entity Modeling: Techniques and Applications (Boston, Mass.: Database Research Group, Inc., 1988).

7 This discussion draws from B. K. Rosen and M. H. Law, Guide to Data Administration, NIST Special Publication 500-173, National Institute of Standards and Technology, 1989.

8 A primary work on E-R modeling is P. P. S. Chen, "The Entity Relationship Model--Toward a Unified View of Data," ACM Transactions on Database Systems 1, March 1976. A more recent, practical introduction to the topic is C. C. Fleming and B. Von Halle, "An Overview of Logical Data Modeling," Data Resource Management, Winter 1990, pp. 5-15. And more thorough coverage, in a practical text, can be found in R. G. Ross, Entity Modeling: Techniques and Applications (Boston, Mass.: Database Research Group, Inc., 1988).

9 W. Durell, Data Administration: A Practical Guide to Successful Data Management (New York: McGraw-Hill Book Company, 1985).

10 For an overview of this concept, see W. H. Inmon and M. L. Loper, "Integrating Information Systems Using a Unified Data Architecture," Data Resource Management, Spring 1990, pp. 43-55.

========================================================================