A Technical Report Submitted in partial fulfillment of the requirements for the Degree of Bachelor of Engineering Under Berhampur University



Roll # EI200210344

September - 2005

Under the guidance of Mrs. SANGHAMITRA PATRI


Palur Hills, Berhampur, Orissa - 761 008, India

Grid computing is poised to drastically change the economics of computing. Grid computing can dramatically lower the cost of computing, extend the availability of computing resources, increase productivity, and improve quality. The basic idea of grid computing is the notion of computing as a utility, analogous to the electric power grid or the telephone network. As a client of the grid, you do not care where your data is or where your computation is done. You want to have your computation done and to have your information delivered to you when you want it. From the server-side, the grid is about virtualization and provisioning. You pool all your resources together and provision these resources dynamically based on the needs of your business; thus achieving better resource utilization at the same time. This paper describes the fundamental attributes of a grid, and the trends in the IT industry that are moving enterprises towards grid computing. It then examines the functionality available in Oracle Database 10g that leverages these trends, and makes grid computing a reality, today.


It feels nice to have this opportunity to give vent to the unbridled feelings of gratitude imprisoned in the core of my heart. It is my proud privilege to express my deepest sense of gratitude and indebtedness to my guide, Mrs. SANGHAMITRA PATRI, for her valuable guidance, keen and sustained interest, intuitive ideas and persistent endeavor. Her inspiring assistance, laconic reciprocation and affectionate care enabled me to complete my work smoothly and successfully. I acknowledge with immense pleasure the sustained interest, encouraging attitude and constant inspiration rendered by Mr. Sangram Mudali, Director, NIST. His continued drive for better quality in everything that happens at NIST and his selfless inspiration have always helped us to move ahead.

At the nib but not neap tide, I bow my head in gratitude at the omnipresent Almighty for all his kindness. I still seek His blessings to proceed further.

SANDIP SARKAR Roll # EI200210344


ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
1. INTRODUCTION
2. DATABASES AND THE GRID
    2.1 GRID TERMINOLOGY
    2.2 GRID DATABASES: THE CURRENT STATE
    2.3 INTEGRATING DATABASES INTO THE GRID
    2.4 FEDERATING DATABASE SYSTEMS ACROSS THE GRID
3. VISION OF GRID COMPUTING
    3.1 ENTERPRISE GRIDS
    3.2 GRID COMPUTING ATTRIBUTES
    3.3 FIVE GENERATIONS OF DISTRIBUTED COMPUTING
    3.4 OPEN GRID STANDARDS
4. ORACLE DATABASE 10g
5. CONFIGURING AND INSTALLING ORACLE DATABASE 10g ON STANDARDS-BASED COMPONENTS
6. OPERATIONAL BENEFITS
7. POSITIONING FOR THE FUTURE
8. CONCLUSION
REFERENCES


Every organization around the world struggles with the very high cost of its information technology infrastructure. These very high costs arise from three primary factors:

• Excess Computing Capacity: capacity that is poorly utilized due to the need to build capacity for peaks, and the inability to use the spare capacity efficiently.
• Expensive Capacity Growth: due to the inability to add capacity quickly, when needed, and in low cost, modular units.
• High Management Costs: due to the complexity of systems; the specialized management tools, procedures, and skills required; and the large amounts of human intervention needed to manage systems.

Grid computing is a new software architecture designed to effectively pool together large amounts of low cost modular storage and servers to create a virtual computing resource across which work can be transparently distributed. Grid computing enables computing capacity to be used very efficiently, at low cost, and with very high availability. The resources in a grid can include storage, servers, database servers, application servers, and applications. By pooling resources together, grid computing can offer dependable, consistent, pervasive, and inexpensive access to these resources regardless of their location and when needed. Grid computing thereby provides the best solution to the need for computing and software capacity on demand.

While grid computing has hitherto been primarily used by the scientific community to solve very specialized problems, the rapid evolution of cost-effective networked storage; high speed, high density blade servers; high speed network interconnects; and low cost operating systems, coupled with the advances in systems software (database servers and application servers) to exploit these advances, have now made it possible for enterprises to exploit grid computing.
Recognizing the fundamental benefits grid computing offers enterprises, Oracle offers organizations a comprehensive solution to manage information and run enterprise applications on grids. Oracle Database 10g has been designed to manage information on computing grids called database grids. Oracle Application Server 10g (Oracle AS 10g) has been designed to run enterprise applications on computing grids called application server grids. Both Oracle Database 10g and Oracle Application Server 10g can be very efficiently managed in a grid computing environment using Oracle Enterprise Manager 10g Grid Control. Together these products address the information technology challenges that organizations face: eliminating excess computing capacity (through automatic workload management that distributes workloads to use spare computing capacity efficiently); enabling modular, inexpensive capacity growth (through rapid and efficient software provisioning that enables computing capacity to be added on demand in low cost modular units); and radically lowering management cost (through self-managing systems that reduce the need for costly, error-prone human intervention, and through automated software provisioning and management across many systems). Oracle Database 10g is proven to be the fastest database for transaction processing, data warehousing, and third-party applications on servers of all sizes. It is also proven to securely protect data and ensure data access 24x7, reducing the risk of data loss and system downtime, while keeping the cost of computing down.


Let us examine how databases can be integrated into the Grid. Almost all early Grid applications are file-based, and so, to date, there has been relatively little effort applied to integrating databases into the Grid. However, if the Grid is to support a wider range of applications, both scientific and otherwise, then database integration into the Grid will become important. For example many applications in the life and earth sciences, and many business applications are heavily dependent on databases. First let us consider how databases can be integrated into the Grid so that applications can access data from them. It is not possible to achieve this just by adopting or adapting the existing Grid components that handle files, as databases offer a much richer set of operations (for example queries and transactions), and there is much greater heterogeneity between different database management systems than there is between different file systems. Not only are there major differences between database paradigms (e.g. object and relational), but even within one paradigm different database products (e.g. Oracle and DB2) vary in their functionality and interfaces. This diversity makes it more difficult to design a single solution for integrating databases into the Grid, but the alternative of requiring every database to be integrated into the Grid in a bespoke fashion would result in much wasted effort. Managing the tension between the desire to support the full functionality of different database paradigms, while also trying to produce common solutions to reduce effort, is key to designing ways of integrating databases into the Grid. The diversity of database systems also has other important implications. One of the main hopes for the Grid is that it will encourage the publication of scientific data in a more open manner than is currently the case. 
If this occurs then it is likely that some of the greatest advances will be made by combining data from separate, distributed sources to produce new results. The data that applications wish to combine will have been created by a set of different researchers who will often have made local, independent decisions about the best database paradigm and design for their data. This heterogeneity presents problems when data is to be combined. If each application has to include its own bespoke solutions to federating information, then similar solutions will be re-invented in different applications, and effort wasted. Therefore, it is important to provide generic middleware support for federating Grid-enabled databases.

Yet another level of heterogeneity needs to be considered. There is also the need to build applications that access and federate other forms of data. For example, semi-structured data (e.g. XML) and relatively unstructured data (e.g. scientific papers) are valuable sources of information in many fields. Further, this type of data will often be held in files, rather than a database. Therefore, in some applications there will be a requirement to federate these types of data with structured data from databases. There are therefore two main dimensions of complexity in the problem of integrating databases into the Grid: implementation differences between server products within a database paradigm, and the variety of database paradigms. The requirement for database federation effectively creates a problem space whose complexity is abstractly the product of these two dimensions.

Unsurprisingly, existing database management systems do not currently support Grid integration. They are, however, the result of many hundreds of person-years of effort that allows them to provide a wide range of functionality, valuable programming interfaces and tools, and important properties such as security, performance and dependability. As these attributes will be required by Grid applications, we strongly believe that building new Grid-enabled database management systems from scratch is both unrealistic and a waste of effort. Instead we must consider how to integrate existing database management systems into the Grid. As is described later, this approach does have its limitations, as there are some desirable attributes of Grid-enabled databases that cannot be added in this way and need to be integrated in the underlying database management system itself. However, these are not so important as to invalidate the basic approach of building on existing technology.
The danger with this approach comes if a purely short-term view is taken. If we restrict ourselves to considering only how existing database servers can be integrated with existing Grid middleware then we may lose sight of longer-term opportunities for more powerful connectivity. Therefore, we have tried to identify both the limitations of what can be achieved in the short term solely by integrating existing components and cases where developments to the Grid middleware and database server components themselves will produce longer-term benefits. An important aspect of this will occur naturally if the Grid becomes commercially important, as the database vendors will then wish to provide “out-of-the-box” support for Grid integration, by supporting the emerging Grid standards. Similarly, it is vital that those designing standards for Grid middleware take into account the requirements for database integration. Together, these converging developments would reduce the amount of “glue” code required to integrate databases into the Grid.

In this section we briefly introduce the terminology that will be used: A database is a collection of related data. A database management system (DBMS) is responsible for the storage and management of one or more databases. Examples of DBMS are Oracle 9i, DB2, Objectivity and MySQL. A DBMS will support a particular database paradigm, for example relational, object-relational or object. A Database System (DBS) is created, using a DBMS, to manage a specific database. The DBS includes any associated application software. Many Grid applications will need to utilise more than one DBS. An application can access a set of DBS individually, but the consequence is that any integration that is required (e.g. of query results or transactions) must be implemented in the application. To reduce the effort required to achieve this, federated databases use a layer of middleware running on top of autonomous databases, to present applications with some degree of integration. This can include integration of schemas and query capability. DBS and DBMS offer a set of services that are used to manage and access the data. These include query and transaction services. A service provides a set of related operations.

In this section we consider how the current Grid middleware supports database integration. We consider Globus, the leading Grid middleware, before looking at previous work on databases in Grids. The dominant middleware used for building computational grids is Globus, which provides a set of services covering grid information, resource management and data management. Information Services allow owners to register their resources in a directory and provide, through the Monitoring and Discovery Service (MDS), mechanisms by which those resources can be dynamically discovered by applications looking for suitable resources on which to execute. From MDS, applications can determine the configuration, operational status and loading of both computers and networks. Another service, the Globus Resource Allocation Manager (GRAM), accepts requests to run applications on resources, and manages the process of moving the application to the remote resource, scheduling it and providing the user with a job control interface.

An orthogonal component that runs through all Globus services is the Grid Security Infrastructure (GSI). This addresses the need for secure authentication and communications over open networks. An important feature is the provision of “single sign-on” access to computational and data resources. The latest version of Globus (2.0) offers a core set of services (called the Globus Data Grid) for file access and management. There is no direct support for database integration and the emphasis is instead on the support for very large files, such as those that might be used to hold huge datasets resulting from scientific experiments. GridFTP is a version of FTP optimised for transferring files efficiently over high-bandwidth wide area networks, and it is integrated with the Grid Security Infrastructure.

There have been recent moves in the Grid community to adopt Web Services as the basis for Grid middleware, through the definition of the Open Grid Services Architecture (OGSA). This will allow the Grid community to exploit the high levels of investment in Web Service tools and components being developed for commercial computing.
The move also reflects the fact that there is a great deal of overlap between the Grid vision of supporting scientific computing by sharing resources, and the commercial vision of enabling Virtual Organisations: companies combining information, resources and processes to build new distributed applications.

Despite lacking direct support for database integration, Globus does have services that can assist in achieving this. The Grid Security Infrastructure could be used as the basis of a system that provides a single sign-on capability, removing the need to individually connect to each database with a separate username and password. However, mechanisms for connecting a user or application to the database in a particular role, and for delegating restricted access rights, are required but are not currently directly supported by GSI. A recent development, the Community Authorisation Service, does offer restricted delegation, and so may offer a way forward. Other Globus components could also be harnessed in order to support other aspects of database integration into the Grid. For example, GridFTP could be used both for bulk database loading and, where efficient, for the bulk transfer of query results from a DBS to another component of an application. The MDS and GRAM services can be used to locate and run database federation middleware on appropriate computational resources. In the longer term, the move towards an OGSA service-based architecture for Globus is in line with the proposed framework for integrating databases into the Grid.

In this section we describe a framework for integrating databases into Grid applications and identify the main functionality. The proposed framework is service-based. Figure 1 shows the service-based framework, with a service wrapper placed between the Grid and the DBS (we deliberately refer to DBS here rather than DBMS, as the owner of the database can choose which services to make available on the Grid, and who is allowed to access them). Initially, the service wrappers will have to be custom produced, but, in the future, if the commercial importance of the Grid increases, and standards are defined, then it is to be hoped that DBMS vendors will offer Grid-enabled service interfaces as an integral part of their products. We now discuss each of the services shown in Figure 1:

Metadata: This service provides access to technical metadata about the DBS and the set of services that it offers to Grid applications. Examples include the logical and physical name of the DBS and its contents, ownership, version numbers, the database schema and information on how the data can be accessed. The service description metadata would, for each service, describe exactly what functionality is offered. This would be used by Grid application builders, and tools that need to know how to interface to the DBS. It is particularly important for applications that are dynamically constructed: the two-step access to data means that the databases that are to take part in an application are not known until some preliminary processing of metadata has taken place. Each run of such applications may result in the need to access a different set of databases, and so mechanisms are required to dynamically construct interfaces to those DBS. If they are not all able to offer completely standard interfaces, then the metadata can be accessed to determine their functionality and interfaces, so that they can be dynamically incorporated into the application.

Query: Query languages differ across different DBMS, though the core of SQL is standard across most relational DBMS. It is therefore important that the service metadata defines the type and level of query language that is supported. To provide input to scheduling decisions, and enable the efficient planning of distributed Grid applications, an operation that provides an estimate of the cost of executing a query is highly desirable. As described in the requirements section, the query service should also be able to exploit a variety of communications mechanisms in order to transfer results over the Grid, including streaming (with associated flow control) and transfer as a single block of data. Finally, it is important that the results of a query can be delivered to an arbitrary destination, rather than just to the sender of the query. This allows the creation of distributed systems with complex communications structures, rather than just simple client-server request-response.

Transaction: These operations would support transactions involving only a single DBS and also allow a DBS to participate in application-wide distributed transactions, where the DBS supports it. There are a variety of types of transactions that are supported by DBMS (for example, some but not all support nested transactions), and so a degree of heterogeneity between DBS is inevitable. In the longer term, there may also be a need for loosely co-ordinated, long-running transactions between multiple enterprises, and so support for alternative protocols (e.g. the Business Transaction Protocol, BTP) may become important. Given the variety of support that could be offered by a transaction service, the service-description metadata must make clear what is available at this DBS.

Bulk Loading: Support for the bulk loading of data over the Grid into the database will be important in some systems. For large amounts of data, the service should be able to exploit Grid communication protocols that are optimised for the transfer of large datasets (e.g. GridFTP).

Notification: This would allow clients to register some interest in a set of data, and receive a message when a change occurred. Supporting this function requires both a mechanism that allows the client to specify exactly what it is interested in (e.g. additions, updates, deletions, perhaps further filtered by a query) and a method for notifying the client of a change.

Scheduling: This would allow users to schedule the use of the DBS. It should support the emerging Grid scheduling service, for example allowing a DBS and a supercomputer to be co-scheduled, so that large datasets retrieved from the DBS can be processed by the supercomputer. Bandwidth on the network connecting them might also need to be pre-allocated. As providing exclusive access to a DBS is impractical, mechanisms are needed to dedicate sufficient resources (disks, CPUs, memory, network) to a particular task. This requires the DBS to provide resource pre-allocation and management, something that is not well supported by existing DBMS, and cannot be implemented by wrapping the DBMS and controlling the resources at the operating system level. This is because DBMS, like most efficiently designed servers, run as a set of processes that are shared among all the users, and the management of sharing is not visible or controllable at the operating system process level.

Accounting: The DBS must be able to provide the necessary information for whatever accounting and payment scheme emerges for the Grid. This service would monitor performance against agreed service levels, and enable users to be charged for resource usage. The data collected would also provide valuable input for application capacity planning, and for optimising the usage of Grid resources. As with scheduling, because a DBS is a shared server it is important that accounting is done in terms of each individual user’s (or group’s) use of the DBS, and not just aggregated across all users.
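The service wrapper described above can be illustrated with a minimal sketch. This is not an implementation of any Grid standard: the class name `DBSServiceWrapper`, its method names, and the use of an in-memory SQLite database as a stand-in DBS are all assumptions made for illustration. It shows three ideas from the text: a metadata service that describes the schema and the services offered, a query service that can deliver results either as a single block or as a stream of blocks to an arbitrary destination, and a (deliberately crude) cost estimate.

```python
import sqlite3


class DBSServiceWrapper:
    """Hypothetical service wrapper around one DBS (SQLite is a stand-in)."""

    def __init__(self, name, conn):
        self.name = name
        self._conn = conn

    def metadata(self):
        # Metadata service: logical name, supported query language,
        # offered services, and the schema of every table.
        rows = self._conn.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return {
            "name": self.name,
            "query_language": "SQL (SQLite dialect)",
            "services": ["metadata", "query", "bulk_load"],
            "schema": {table: ddl for table, ddl in rows},
        }

    def estimate_cost(self, sql, params=()):
        # Crude stand-in for a cost estimate: the number of steps in
        # SQLite's query plan. A real DBS would expose optimiser costs.
        plan = self._conn.execute("EXPLAIN QUERY PLAN " + sql, params)
        return len(plan.fetchall())

    def query(self, sql, params=(), deliver_to=None, block_size=2):
        cursor = self._conn.execute(sql, params)
        if deliver_to is None:
            # Transfer as a single block back to the caller.
            return cursor.fetchall()
        # Streaming: push blocks of rows to an arbitrary destination.
        while True:
            block = cursor.fetchmany(block_size)
            if not block:
                return None
            deliver_to(block)

    def bulk_load(self, table, rows):
        # Bulk loading in one transaction. The table name is trusted here;
        # a real service would validate it against the metadata.
        rows = list(rows)
        placeholders = ", ".join("?" for _ in rows[0])
        with self._conn:
            self._conn.executemany(
                f"INSERT INTO {table} VALUES ({placeholders})", rows
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (id INTEGER PRIMARY KEY, symbol TEXT)")
dbs = DBSServiceWrapper("genes-db", conn)
dbs.bulk_load("genes", [(1, "BRCA1"), (2, "TP53"), (3, "EGFR")])

received = []  # stands in for a remote destination of the query results
dbs.query("SELECT symbol FROM genes ORDER BY id", deliver_to=received.append)
print(received)  # [[('BRCA1',), ('TP53',)], [('EGFR',)]]
```

Passing a callable as `deliver_to` is one simple way to model delivery to a destination other than the sender of the query; in a real Grid setting the destination would be a network endpoint rather than a local function.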

The previous section stressed the importance of being able to combine data from multiple DBS. The ability to generate new results by combining data from a set of distributed resources is one of the most exciting opportunities that the Grid will offer. In this section we consider how the service-based framework can help to achieve this. One option is for a Grid application to interface directly to the service interfaces of each of the set of DBS whose data it wishes to access. This approach is illustrated in Figure 2. However, this forces application writers to solve federation problems within the application itself. This would lead to great application complexity, and duplication of effort.


To overcome these problems we propose an alternative, in which Grid-enabled middleware is used to produce a single, federated “virtual database system” to which the application interfaces. Given the service-based approach proposed, federating a set of DBS reduces to federating each of the individual services (query, transaction etc.). This creates a Virtual DBS, which has exactly the same service interface as the DBS described in the previous section but does not actually store any data (advanced versions could however be designed to cache data in order to increase performance). Instead, calls made to the Virtual DBS services are handled by service federation middleware that interacts with the service interfaces of the individual DBS that are being federated, in order to compute the result of the service call. This approach is shown in Figure 3. Because the Virtual DBS has an identical service interface to the “real” DBS, then it is possible for a Virtual DBS to federate the services of both “real” DBS, and other Virtual DBS. Two different scenarios can be envisaged for the creation of a Virtual DBS: 1) A user decides to create a Virtual DBS that combines data and services from a specific set of DBS that they wish to work with. These may, for example, be well known as the standard authorities in their field.


2) A user wishes to find and work with data on a subject of their interest, but they do not know where it is located. A Metadata query would be used to locate appropriate datasets. These would then be federated to create a Virtual DBS that could then be queried. At the end of the work session, the Virtual DBS could be saved for future use. How can the Virtual DBS be created? The ideal situation would be for a tool to take a set of DBS and automatically create the Virtual DBS. At the other end of the scale, a set of bespoke programs could be written to implement each service of the Virtual DBS. Obviously, the former is preferable, especially if we wish to dynamically create Virtual DBS as in the second scenario above. Bearing this in mind, we now consider the issues in federating services. The service-based approach proposed assists in the process of federating services, by encouraging standardisation. However, it will not be possible to fully standardise all services, and it is the resulting heterogeneity that causes problems. A tool could attempt to create a Virtual DBS automatically as follows. For each service, the tool would query the metadata service of each of the DBS being federated in order to determine their functionality and interface. Knowing the integration middleware that was available for the service, and the requirements that this middleware had for the underlying services, the tool would determine the options for federation. If there were more than one option then one would be selected (possibly taking into account application or user preferences). If no options were available then the application or user would be informed that no integration of this service was possible. In this case, the user would either not be able to use the service, or would need to write new federation middleware to effect the integration, if that were possible. 
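The automatic procedure just described (query each DBS's metadata, match it against the available middleware, pick an option or report that none exists) can be sketched as follows. Everything here is hypothetical: the `Middleware` record, the `plan_federation` function, and capability names such as `sql92` and `two_phase_commit` are illustrative assumptions, not part of any Grid standard or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Middleware:
    """Hypothetical catalogue entry for one piece of federation middleware."""
    name: str
    service: str          # the service it federates, e.g. "query"
    requires: frozenset   # capabilities every underlying DBS must offer


def plan_federation(dbs_metadata, catalogue, preferences=()):
    """Return {service: chosen middleware name, or None if no option exists}.

    dbs_metadata maps each DBS name to {service: set of capabilities},
    as would be reported by that DBS's metadata service.
    """
    services = sorted({s for md in dbs_metadata.values() for s in md})
    plan = {}
    for service in services:
        # An option is usable only if every DBS meets its requirements.
        options = [
            m for m in catalogue
            if m.service == service
            and all(
                m.requires <= md.get(service, frozenset())
                for md in dbs_metadata.values()
            )
        ]
        # With several options, honour user preferences first.
        options.sort(key=lambda m: (m.name not in preferences, m.name))
        plan[service] = options[0].name if options else None
    return plan


metadata = {
    "dbs_a": {
        "query": frozenset({"sql92", "cost_estimate"}),
        "transaction": frozenset({"two_phase_commit"}),
    },
    "dbs_b": {
        "query": frozenset({"sql92"}),
        "transaction": frozenset(),
    },
}
catalogue = [
    Middleware("simple_sql_union", "query", frozenset({"sql92"})),
    Middleware("cost_based_optimiser", "query",
               frozenset({"sql92", "cost_estimate"})),
    Middleware("xa_coordinator", "transaction",
               frozenset({"two_phase_commit"})),
]

plan = plan_federation(metadata, catalogue)
print(plan)  # {'query': 'simple_sql_union', 'transaction': None}
```

In this example the query service can be federated, but only by the less demanding middleware, because `dbs_b` offers no cost estimates. The transaction service cannot be federated at all, which is the case where the user would be told that no integration is possible.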
Integrating each of the services proposed raises specific issues that are now described:

Query: Ideally this would present to the user a single integrated schema for the virtual DBS, and accept queries against it. A compiler and optimiser would determine how to split up the query across the set of DBS, and then combine the results of these subqueries. The major relational DBMS products already offer “Star” tools that implement distributed query middleware. Grid applications do however introduce new requirements, in particular the need for conformance with Grid standards, and the ability to query across dynamically changing sets of databases. The service-based approach to Grid-enabling databases simplifies the design of federation middleware as it promotes the standardisation of interfaces, but, as was stated in the requirements section, it does not address the higher-level problem of the semantic integration of multiple databases, which has been the subject of much attention over the past decades.

The nature of the Grid does however offer some interesting new opportunities for distributed query processing. Once a query has been compiled, Grid resources could be acquired on demand for running the distributed query execution middleware. The choice of resources could be made on the basis of the response time and price requirements of the user. For example, if a join operator was the bottleneck in a query, and performance was important, then multiple compute nodes could be acquired and utilised to run that part of the query in parallel. If the user was charged for time on the compute nodes, then a trade-off between price and performance would need to be made. Further, because query optimisers can only estimate the cost of a query before it is run, queries sometimes take much longer than expected, perhaps because a filter or join in the middle of a query has produced more data than expected. An option here for Grid-based distributed query execution is to monitor the performance at run-time and acquire more resources dynamically in order to meet the performance requirements of the user.

Transaction: The basic transaction service described already supports the creation of distributed transactions across multiple databases.

Bulk Loading: This could be implemented by middleware that takes a load file, splits it into separate files for each DBS and uses the bulk load service of each individual DBMS to carry out the loading.


Notification: A client would register an interest in the virtual DBS. Middleware would manage the distribution of the notification operations (registration, filtering and notification) across the DBS. This should ideally be done using a generic Grid-enabled event service so that a database-specific federation solution is not required.

Metadata: This would be a combination of the metadata services of the federated databases, and would describe the set of services offered by the Virtual DBS. At the semantic, data-description level (e.g. providing a unified view of the combined schema) the problems are as described above for the query service.

Scheduling: This would provide a common scheduling interface for the virtual DBS. When generic, distributed scheduling middleware is available for the Grid, the implementation of a federated service should be relatively straightforward.


Accounting: This would provide a combined accounting service for the whole virtual DBS. As a Grid accounting service will have to support distributed components, the implementation of this service should be straightforward once that Grid accounting middleware is available. As has been seen, the complexity of the service federation middleware will vary from service to service, and will, in general, increase as the degree of heterogeneity of the services being federated increases. However, we believe that the service-based approach to federating services provides a framework for the incremental development of a suite of federation middleware, by more than one supplier. Initially, it would be sensible to focus on the most commonly required forms of service federation. One obvious candidate is query integration across relational DBMS. However, over time, applications would discover the need for other types of federation. When this occurs, then the aim is that the solution would be embodied in service federation middleware that fits into the proposed framework described above, rather than it being buried in the application specific code. The former approach has the distinct advantage of allowing the federation software to be re-used by other Grid applications. Each integration middleware component could be registered in a catalogue that would be consulted by tools attempting to integrate database services. The process of writing integration components would also be simplified by each taking a set of standard service interfaces as “inputs” and presenting a single, standard federated service as “output”. This also means that layers of federation can be created, with virtual databases taking other virtual databases as inputs. To conclude, we believe that if the Grid is to become a generic platform, able to support a wide range of scientific and commercial applications, then the ability to publish and access databases on the Grid will be of great importance. 
Consequently, it is vitally important that, at this early stage in the Grid’s development, database requirements are taken into account when Grid standards are defined, and middleware is designed. In the short term, integrating databases into Grid applications will involve wrapping existing DBMS in a Grid-enabled service interface. However, if the Grid becomes a commercial success then it is to be hoped that the DBMS vendors will Grid-enable their own products by adopting emerging Grid standards.


The central idea of grid computing is that computing should be as reliable, pervasive, and transparent as a utility. It shouldn’t matter where your data or application resides, or what computer processes your request. You should be able to request information or computation and have it delivered, as much as you want, whenever you want. This is analogous to the way electric utilities work: you don’t know where the generator is or how the electric grid is wired. You just ask for electricity and you get it. The goal is to make computing a utility, a ubiquitous commodity. Hence the name “grid.”

Grid computing was conceived in the academic and research communities. Much like internet computing, which grew from the communication needs of dispersed scientific researchers, grid computing originated from the scientific community’s needs to:

1. Create a dynamic computing environment for sharing resources and results
2. Scale to accommodate petabytes of data and teraflops of computing power
3. Keep costs down

SETI@home, the Search for Extraterrestrial Intelligence, is one of the earliest examples of a scientific grid. Signals from telescopes, radio receivers, and other sources monitoring deep space are distributed to the PCs of individual science buffs via the internet. This loose network of small computers crunches numbers, looking for patterns that could suggest signs of intelligent life. Although the idea of harnessing idle computers across the internet is intellectually interesting, businesses will never want their data or their computing distributed to random computers. But, just as businesses have brought the concepts of the public internet in-house to make intranets, enterprises can bring the concepts of the scientific grids in-house to make enterprise grids. With both public grids and enterprise grids, grid computing is about harnessing the work of many small computers. The need for low-cost computing drove the SETI@home innovation. The primary benefit of grid computing to


businesses is achieving high quality of service and flexibility at lower cost. Enterprise grid computing lowers costs by:

1. Increasing hardware utilization and resource sharing
2. Enabling companies to scale out incrementally with low-cost components
3. Reducing management and administration requirements

Enterprise grid computing builds a critical software infrastructure that can run on large numbers of small, networked computers, by combining two related concepts:

Implement One from Many: Grid computing coordinates the use of clusters of machines to create a single logical entity, such as a database or an application server. By distributing work across many servers, grid computing delivers the benefits of availability, scalability, and performance using low-cost components. Because a single logical entity is implemented across many machines, companies can add or remove capacity in small increments, online. With the capability to add capacity on demand to a particular function, companies get more flexibility for adapting to peak loads, thus achieving better hardware utilization and better business responsiveness.

Manage Many as One: Grid computing allows you to manage and administer groups of machines, groups of database instances, and groups of application servers at low cost. Grid computing first removes many of the administrative costs of managing a single system by making each database and each application server adaptive to changing circumstances. Then, the model makes managing many systems simple, by allowing them to be managed as a single logical entity.

Much of what makes grid computing possible today is innovation in hardware. For example:

1. Processors. New low-cost, high-volume Intel Itanium 2, Sun SPARC, and IBM PowerPC 64-bit processors now deliver performance equal to or better than the exotic processors used in high-end SMP servers.
2. Blade Servers. Blade server technology reduces the cost of hardware and increases the density of servers, which further reduces expensive data centre real estate requirements.
3. Networked Storage. Disk storage costs continue to plummet even faster than processor costs. Network storage technologies such as Network Attached Storage (NAS) and Storage Area Networks (SANs) further reduce these costs by enabling sharing of storage across systems.
4. Network Interconnects. Gigabit Ethernet and Infiniband interconnect technologies are driving down the cost of connecting servers into clusters.

Although the newness of grid computing comes primarily from hardware, the power of the grid infrastructure must be embodied in software. The capability of a database, for example, to store and retrieve data through an abstract interface without knowing much about the underlying location or structure of that data requires software intelligence. The capability of an application server to begin distributing work to newly added blade servers without going offline can only be accomplished with software. By providing software to leverage and control new grid hardware, Oracle supplies the grid infrastructure, and powers enterprise grids.

The requirements for grid computing infrastructure can be described by the following attributes: 1. Virtualization at every layer of the computing stack 2. Provisioning of work and resources based on policies and dynamic requirements 3. Pooling of resources to increase utilization 4. Self-adaptive software that largely tunes and fixes itself 5. Unified management and provisioning Virtualization at Every Layer: Virtualization is the abstraction into a service of every physical and logical entity in a grid. Virtualization is important because it enables grid components (such as storage, processors, databases, application servers, and applications) to integrate tightly without creating rigidity and brittleness in the system. Rather than making fixed ties that determine which application server node will handle requests from a particular application, for example, or where a database physically locates its data, virtualization enables each component of the grid to react to changing circumstances more quickly and to adapt to component failures without compromising performance of the system as a whole.

Dynamic Provisioning: Provisioning simply means distributing supplies where they are needed. In the context of the grid, “supplies” may mean server requests that need to be handled, data that needs to be accessed and used, or computations that need to be performed. Provisioning in the grid environment means a grid service broker that knows the resource requirements of one element of the grid and the resource availability of another element links the two together automatically and dynamically to make efficient use of resources. Then it adjusts the associations as circumstances change. Policies, such as response time thresholds or anticipated peak demands, can be used to further optimize the associations of resource-requestors to resource providers. Resource Pooling: Consolidation and pooling of resources is required for grids to achieve better utilization of resources, a key contributor to lower costs. By pooling individual disks into storage arrays and individual servers into blade farms, the grid runtime processes that dynamically couple service consumers to service providers have more flexibility to optimize the associations. Resource sharing also happens purely in software. Web services provide the model for applications to expose re-usable functionality for discovery and invocation by unrelated applications. Self-Adaptive Software: With labour being the most significant portion of IT costs, savings due to better hardware utilization or more responsive systems become irrelevant if the everyday tasks of administrators are not automated and simplified. A grid infrastructure would be unworkable if every node required constant manual tuning and intervention. A critical grid infrastructure requirement is systems that automate the bulk of maintenance and tuning tasks traditionally performed by IT staff. More of the tasks that used to be performed by administrators must now be handled by the systems themselves. Unified Management:

Even with self-managing systems, human beings will always be involved in managing an enterprise grid, but the management tasks required by humans should be simplified with a single tool that can provision, monitor, and administer every element in the grid. Such a tool should evaluate availability and performance from the perspective of the user, such that any bottleneck in the system or any unavailable component raises alerts. Most importantly, with a grid infrastructure, IT professionals must be able to treat groups of systems as a single logical entity so that tasks can be performed once and executed on multiple machines.

Implement One from Many: Together, the attributes of virtualization, dynamic provisioning, and resource pooling form the requirements for software that implements a single logical entity using many services running on multiple servers and crossing multiple disks, an entity which delivers high quality of service from low-cost components.

Manage Many as One: Together, the attributes of self-adaptive software and a unified management model form the requirements for dramatically lowering management costs by viewing the entire enterprise grid as one simple whole.
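The dynamic provisioning and resource pooling attributes described above can be illustrated with a small sketch. It is only a toy model under stated assumptions: the `Provider` and `Broker` classes, the abstract "work units", and the "most free capacity first" policy are hypothetical names and choices made for illustration, and do not correspond to any Oracle component.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """A pooled resource such as a blade server (hypothetical model)."""
    name: str
    capacity: int                                  # abstract work units
    assigned: dict = field(default_factory=dict)   # requestor -> units

    @property
    def free(self) -> int:
        # Capacity not yet claimed by any requestor.
        return self.capacity - sum(self.assigned.values())


class Broker:
    """Links resource requestors to providers and re-links them on change."""

    def __init__(self, providers):
        self.providers = list(providers)

    def provision(self, requestor: str, units: int) -> str:
        # Assumed policy: pick the provider with the most free capacity.
        candidates = [p for p in self.providers if p.free >= units]
        if not candidates:
            raise RuntimeError(f"no capacity available for {requestor}")
        best = max(candidates, key=lambda p: p.free)
        best.assigned[requestor] = best.assigned.get(requestor, 0) + units
        return best.name

    def fail(self, name: str) -> dict:
        """Drop a failed provider and re-provision its work elsewhere."""
        failed = next(p for p in self.providers if p.name == name)
        self.providers.remove(failed)
        return {r: self.provision(r, u) for r, u in failed.assigned.items()}
```

Because requestors name only the work they need and never a specific machine, the broker is free to move the association when a node fails or when capacity is added, which is the flexibility that virtualization is meant to provide.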

Grid computing is merely the newest generation of distributed computing. The given table lists the generations. The industry is clearly entering the fifth generation now.

Table: Five generations of grid computing


Grid computing is the result of several trends coming together. Some of these are the following:

1. New standards for object-to-object communications are making it easier to build multivendor, multiapplication networks.
2. High-performance microprocessors have become available, making it possible to deploy large applications on a number of low-cost systems rather than a single midrange system.
3. High-speed networking technology is becoming both less costly and more readily available, offering higher levels of performance when deploying distributed application architectures.

As these trends combine, applications are likely to be segmented by function or by instance of a function. This approach will allow each function to be hosted on the most cost-effective platform. In some cases, both types of segmentation will be used. In the end, an organization’s systems can be considered a pool of shared resources that adapt automatically to changing conditions and failures based upon rules of the organization’s choosing.

With Oracle 10g, companies can begin implementing grid computing today, but the open standards that will make grid computing as pervasive as the internet are still under development, primarily by the Global Grid Forum (GGF). Oracle is a GGF sponsor and participates in working groups, chairing the Data Access and Integration (DAI) group. The Open Grid Services Architecture (OGSA) is a specification in active development within the GGF to define the general services-based approach to grid computing. Other working groups, such as Open Grid Services Infrastructure (OGSI) and OGSA-DAI, endeavour to define the common interfaces and protocols for various grid services. Oracle plans to actively support all grid-related open standards as they emerge.


Oracle 10g provides the first complete, integrated software infrastructure to power grid computing. Oracle 10g takes the fundamental attributes of grid computing (Implement One from Many: virtualization at every layer, dynamic provisioning, and resource pooling; Manage Many as One: self-adaptive software and unified management) and implements them throughout every element of the grid: storage, databases, application servers, and applications. The diagram visually depicts the way Oracle 10g products and features map to grid computing requirements.

The following sections describe how grid computing attributes are embodied in Oracle’s three grid infrastructure products: Oracle Database 10g, Oracle Application Server 10g, and Oracle Enterprise Manager 10g Grid Control.

Oracle Database 10g: Oracle Database 10g builds on the success of Oracle9i Database, and adds many new grid-specific capabilities. Other vendors implement certain portions of a grid infrastructure (for example, pools of virtualized storage are becoming common), but no one else can provide a true grid database. Oracle Database 10g is based on Real Application Clusters, introduced in Oracle9i. There are more than 500 production customers running Oracle’s clustering technology, helping to prove the validity of Oracle’s grid infrastructure.

Real Application Clusters: Oracle Real Application Clusters enables a single database to run across multiple clustered nodes in a grid, pooling the processing resources of several standard machines. Oracle is uniquely flexible in its ability to provision workload across machines because it is the only database technology that does not require data to be partitioned and distributed along with the work. In Oracle 10g, the database can immediately begin balancing workload across a new node with new processing capacity as it gets re-provisioned from one database to another, and can relinquish a machine when it is no longer needed; this is capacity on demand. Other databases cannot grow and shrink while running and, therefore, cannot utilize hardware as efficiently. New integrated clusterware in Oracle 10g makes clustering easy by eliminating the need to purchase, install, configure, and support third-party clusterware. Servers can be easily added to and dropped from an Oracle cluster with no downtime. Oracle has the only database technology to include clusterware for all operating systems, which dramatically reduces the opportunities for failure in a clustered environment.

Automatic Storage Management: Automatic Storage Management simplifies storage management for Oracle Databases.
By abstracting the details of storage management, Oracle improves data access performance through sophisticated data provisioning, without requiring additional work from DBAs. Instead of managing many database files, Oracle DBAs manage only a small number of disk groups. A disk group is a set of disk devices that Oracle manages as a single, logical unit. An administrator can define a particular disk group as the default disk group for a database, and Oracle automatically allocates storage for and creates or deletes the files associated with the database object. Automatic Storage Management also offers the benefits of storage technologies such as RAID or Logical Volume Managers (LVMs). Oracle can balance I/O from multiple databases across all of the devices in a disk group, and it implements striping and mirroring to improve I/O performance and data reliability. In addition, Oracle can reassign disks from node to node and cluster to cluster, automatically reconfiguring the group. Because Automatic Storage Management is written to work exclusively with Oracle, it achieves better performance than generalized storage virtualization solutions.

Information Provisioning: In addition to the provisioning of work across multiple nodes and the provisioning of data across multiple disks, another type of provisioning happens within Oracle Database 10g: the provisioning of information itself. Depending on the volume of information and the frequency of access, it may be necessary to move data from where it currently resides or to share data across multiple databases. Oracle 10g includes various facilities to provide access to information when and where it’s needed, matching information providers and information requestors. The most fine-grained and real-time of these facilities is Oracle Streams, which can migrate data from one database to another while both are online. Bulk data transfers are more suitable in some circumstances, for which Oracle provides Data Pump and Transportable Tablespaces. In Oracle 10g, all information provisioning facilities can move data to databases running on different operating systems, which is particularly useful for migrating databases to a grid environment, for example, blade servers running Linux.

Self-Managing Database: The first step toward manageability in a grid environment is making each individual system require less human attention. Oracle 10g, with the new self-managing database, reduces the maintenance and tuning tasks required of administrators. Oracle Database 10g includes an intelligent database infrastructure that takes snapshots of vital statistics and workload data to be analyzed for self-tuning and for advising administrators. The self-managing database automatically diagnoses problems such as poor connection management, lock contention, and poorly performing SQL. Oracle Database 10g fixes certain diagnosed problems and advises DBAs about simple corrective measures in other cases. Oracle’s self-managing

database enables DBAs to concentrate on more value-added work and dramatically reduces administration costs of databases. Oracle Application Server 10g: Oracle Application Server 10g provides a complete infrastructure platform for developing and deploying enterprise applications, integrating many functions including a J2EE and Web services runtime environment, an enterprise portal, an enterprise integration broker, business intelligence, web caching, and identity management services. Oracle Application Server 10g adds new grid computing features, building on the success of Oracle9i Application Server, which has hundreds of customers running production enterprise applications. Application Server Clusters: Oracle Application Server 10g run-time services can be pooled and virtualized via application server clusters. Every service within the Oracle Application Server – HTTP, J2EE, Web cache, Web Services, LDAP, portal and others – can be distributed across multiple machines in a grid. New features in Oracle 10g enable performance thresholds to be defined beyond which new application server instances can automatically be added and started (or relinquished) to process additional work on new nodes of a grid, delivering capacity on demand. With Oracle 10g, an administrator can define a set of policies or business rules that affect how individual work is provisioned across multiple machines. Specifically, workload allocation can be influenced by resource consumption metrics, such as CPU or memory usage, or application-specific metrics, such as transaction throughput or JDBC connections, or workload can be provisioned based on schedules, such as peak times of day or end of quarter. Oracle Application Server 10g provides out-of-the-box instrumentation that captures these various metrics and creates advisories based on historical and real-time information to help administrators make the best policy choices. 
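The threshold-based policy described above, in which instances are added or relinquished as a metric crosses a limit, can be pictured with a minimal sketch. The function name `desired_instances`, the 80% and 30% CPU thresholds, and the instance limits are illustrative assumptions only; they are not Oracle Application Server parameters.

```python
from typing import Sequence


def desired_instances(current: int,
                      cpu_samples: Sequence[float],
                      high: float = 0.80,
                      low: float = 0.30,
                      min_instances: int = 1,
                      max_instances: int = 8) -> int:
    """Return how many server instances should run, given recent CPU load.

    cpu_samples holds utilisation values between 0.0 and 1.0 collected
    from the running instances (a hypothetical metric feed).
    """
    average = sum(cpu_samples) / len(cpu_samples)
    if average > high and current < max_instances:
        return current + 1      # add capacity on demand
    if average < low and current > min_instances:
        return current - 1      # relinquish an idle node to the pool
    return current              # within thresholds: leave as is


# Example call: two instances under heavy load.
print(desired_instances(2, [0.90, 0.85]))
```

The same shape works for the other policy inputs the text mentions, such as transaction throughput or a time-of-day schedule: only the metric and the thresholds change, while the decision to add, relinquish, or hold stays the same.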
Oracle Application Server 10g also provides several availability enhancements. Because Oracle 10g includes clustering of every service within the application server, there is no single point of failure. Both planned and unplanned downtime of an individual instance will simply cause requests to be routed to another node. Because Application Server 10g includes efficient session replication, any type of failure (even that of a J2EE application holding state) will remain transparent to the user. Application Server 10g further improves application reliability through its

interaction with Oracle Real Application Clusters. If an instance in the back-end database goes down, Application Server 10g is notified to reconnect. Without notification from a failed instance, an application server would wait for an IP time out, which takes several minutes, but the multi-tier failover notification feature reduces recovery time in such cases to mere seconds, and both failure and recovery remain transparent to the user. Identity Management: Centralized application user administration becomes even more important in a grid environment. Identity management features within Application Server 10g simplify and centralize account creation, suspension, and deletion and privilege modification, all of which lower administration costs and reduce security vulnerabilities. Oracle provides centralized user provisioning and single sign-on for users across all applications deployed to the Oracle Application Server. Access privileges for all applications can be created and revoked through a single interface. Identities can be managed through Oracle Internet Directory, a standards-based LDAP directory that benefits from the availability and scalability of being built on the Oracle Database. Application Development Framework: Tightly integrated with Oracle Application Server 10g are the development tools that enable companies to quickly develop custom internet applications, and then easily deploy those applications to Oracle Application Server. Applications for scientific grids, such as SETI@home, must be designed explicitly to run on loosely connected grids. In contrast, enterprise applications do not need to be re-designed to exploit the availability, scalability, and performance benefits of enterprise grids. When applications are deployed to an application server in a grid, those applications benefit immediately from the transparent workload distribution, load balancing, and scheduling necessary to efficiently coordinate work across multiple servers. 
To gain additional benefits from grid computing, however, enterprise applications can expose their behaviour to other applications and to management tools through standardized interfaces in a service-oriented architecture. Oracle Developer Suite 10g, which includes JDeveloper 10g, enables developers to create dynamic Web sites, J2EE applications, and Web services and to make these services accessible through enterprise portals and wireless devices.

Applications designed to a service-oriented architecture can leverage a set of standards-based internet protocols to communicate with other applications and heterogeneous resources across a grid. Designing to a service-oriented architecture enables companies to reduce development time and integration costs.

Oracle Enterprise Manager 10g Grid Control: Oracle Enterprise Manager 10g Grid Control is the complete, integrated, central management console and underlying framework that automates administrative tasks across sets of systems in a grid environment. Grid Control helps reduce administration costs through automation and policy-based standardization. With Oracle Grid Control, IT professionals can group multiple hardware nodes, databases, application servers, and other targets into single logical entities. By executing jobs, enforcing standard policies, monitoring performance and automating many other tasks across a group of targets instead of on many systems individually, Grid Control enables IT staff to scale with a growing grid. Because of this feature, the existence of many small computers in a grid infrastructure does not increase management complexity.

Software Provisioning: Because of the potentially large number of physical nodes, it’s especially important in a grid environment that installation and configuration of the software running on those nodes is fast and requires no human intervention. Manually installing software on hundreds of nodes would be time consuming and cumbersome. Administrators would certainly find ways to work around a manual installation, but the workarounds could lead to unsupportable upgrade situations and lost information about the configuration of the system. With Grid Control, Oracle 10g automates installation, configuration, and cloning of Application Server 10g and Database 10g across multiple nodes.
Oracle Enterprise Manager provides a common framework for software provisioning and management, allowing administrators to create, configure, deploy, and utilize new servers with new instances of the application server and database as they are needed. This framework is used not only to provision new systems but also to apply patches and upgrade existing systems. In Oracle Application Server 10g, applications can be deployed once to a single application server instance, registered with the central repository, then automatically deployed to all relevant nodes in the grid. As changes are made to the application and as new nodes are added to the grid, nodes can be kept in sync.

Application Service Level Monitoring: Oracle Grid Control views the availability and performance of the grid infrastructure as a unified whole, as a user would experience it, rather than as isolated storage units, processing boxes, databases, and application servers. An administrator can trace a performance or availability problem as experienced by a user from end to end – from the user visible Web page, through external and internal networks, to application code, application server, and database access. Grid Control then allows an administrator to trace the root cause of the problem down to the individual Java class, for example, or the individual system configuration parameter.


Oracle Database 10g makes it easy for you to run your database on a grid built from standard, low-cost modular hardware components: storage, blades and interconnects.

Automatic Storage Management: It simplifies storage management for Oracle databases. By abstracting the details of storage management, Oracle improves data access performance through sophisticated data provisioning, without requiring additional work from DBAs. Instead of managing many database files, Oracle DBAs manage only a small number of disk groups. A disk group is a set of disk devices that Oracle manages as a single logical unit. An administrator can define a particular disk group as the default disk group for a database, and Oracle automatically allocates storage for and creates or deletes the files associated with the database object. Automatic Storage Management also offers the benefits of storage technologies such as RAID or Logical Volume Managers (LVMs). Oracle can balance I/O from multiple databases across all of the devices in a disk group, and it implements striping and mirroring to improve I/O performance and data reliability. In addition, Oracle can reassign disks from node to node and cluster to cluster, automatically reconfiguring the group.

Portable Clusterware: Clusterware is the software that provides clustering services for communication between servers in a cluster. New integrated clusterware in Oracle 10g makes clustering easy by eliminating the need to purchase, install, configure and support third-party clusterware. Servers can be easily added to and dropped from an Oracle cluster with no downtime. With a single install we can identify the nodes where we would like to install the portable clusterware, and Oracle Universal Installer installs portable clusterware on all these nodes. Oracle has the only database technology to include clusterware for all operating systems.

High Speed Infiniband Network Support

Oracle 10g has enhancements to provide better performance and scalability with upcoming high-speed interconnects such as Infiniband. We can use it for all network communications. It offers many benefits:

1. Infiniband offers a tremendous performance improvement over Gigabit Ethernet networks. The low latency and high bandwidth of Infiniband make it especially useful as a cluster interconnect.
2. We can use a single network infrastructure for communication between different servers and between servers and storage. This simplifies the cabling requirements of our data centre.
3. With a simplified network infrastructure we use a single network backplane, which makes network provisioning easier.
4. With Oracle Database 10g we can now use Infiniband for application server to database server communication, for server-to-server communication in a clustered database, and for server-to-storage communication. This provides an all-round improvement in performance and flexibility in our data centre.

Easy Client Install: The Easy Client Install feature simplifies deployment of applications in a grid. Clients of the database only need to download or copy a very small subset of Oracle client files and set an environment variable. We no longer need to go through the install process on the database client.

Easy Oracle Database Install: Oracle Database 10g has simplified the installation of the Oracle Database. We can install it with a single CD. Oracle Universal Installer (OUI) can also perform multinode installs of the clustered Oracle database. During the install we are required to identify the host names where we would like to install the Oracle Database. OUI then installs the Oracle Database software on all of the nodes. We can also decide to have either a single shared image of the software or a separate image on each host machine.


The dynamic nature of the grid imposes stringent operational requirements on the grid infrastructure. The grid infrastructure should be self-reliant: it should be able to tolerate system failures and adapt to changing business needs.

Self-reliant database: A truly responsive enterprise requires the grid to self-manage and to learn and adapt to changing circumstances. It should tolerate the failures of individual components and provide high availability in all circumstances.

High Availability: Oracle Database 10g brings the highest levels of reliability and availability to the grid. We get the same levels of reliability and availability on standard, low-cost modular hardware, both servers and storage. Automatic Storage Management provides reliability and availability on low-cost standard storage. RAC provides the same on low-cost standard servers. Oracle Database 10g provides robust features to protect from data errors and disasters. The new flashback database feature provides the ability to return a database to a specific point in time in order to recover from human error. The recovery time is proportional to how far back in time the database needs to go. With the flash backup feature, database administrators can now use low-cost standard disks for maintaining their backups. Oracle Database 10g also includes tools to minimize planned downtime, which is critical in a 24x7 environment. The new rolling upgrade feature enables online application of patches to the database software. We don’t need to bring down the entire database to apply a patch. We can apply patches to the clustered database one instance at a time, thus keeping the database online while applying the patch.

Self-managing: With the new self-managing features, Oracle Database 10g has taken a giant leap towards making the Oracle database self-reliant. Oracle Database 10g includes an intelligent database monitor that records data regarding all aspects of database performance. Using this information, Automatic Memory Management dynamically allocates memory to the different components of the Oracle database. Automatic health management generates alerts regarding various aspects of the database, which simplifies database monitoring for DBAs.
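The rolling upgrade idea described above, patching a clustered database one instance at a time so that the service as a whole stays online, can be sketched as a simple loop. This is a conceptual illustration only: `rolling_patch`, the instance names, and the `apply_patch` callback are hypothetical and do not represent Oracle’s patching tools.

```python
from typing import Callable, Sequence


def rolling_patch(instances: Sequence[str],
                  apply_patch: Callable[[str], None]) -> int:
    """Patch each instance in turn and return the lowest number of
    instances that were online at any moment during the roll."""
    online = set(instances)
    min_online = len(online)
    for name in instances:
        online.remove(name)                  # take only this instance down
        min_online = min(min_online, len(online))
        apply_patch(name)                    # patch it while the others serve
        online.add(name)                     # bring it back before moving on
    return min_online


# Example: a three-node cluster; the patch step just records the order.
patched = []
lowest = rolling_patch(["rac1", "rac2", "rac3"], patched.append)
```

With three instances, at most one is offline at a time, so the database remains available throughout as long as the remaining nodes can carry the workload.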

Oracle Enterprise Manager Grid Control: Oracle Enterprise Manager 10g Grid Control is the complete, integrated, central management console and underlying framework that automates administrative tasks across sets of systems in a grid environment. Grid Control helps reduce administration costs through automation and policy-based standardization. With Grid Control, IT professionals can group multiple hardware nodes, databases, application servers, and other targets into single logical entities. By executing jobs, enforcing standard policies, monitoring performance, and automating many other tasks across a group of targets instead of on many systems individually, Grid Control enables IT staff to scale with a growing grid. Because of this feature, the existence of many small computers in a grid infrastructure does not increase management complexity.

Managing security in the grid: The dynamic nature of the grid makes security extremely important. Enterprises need to make sure that their data is secure: exactly the right set of users must have access to the right set of data. At the same time, they need an easy way to manage security throughout the enterprise. Oracle Database 10g makes it easy for enterprises to manage their security needs in the grid.

Enterprise User Security: Enterprise User Security centralizes the management of user credentials and privileges in a directory. This avoids the need to create the same user in multiple databases across a grid. A directory-based user can authenticate and access all the databases that are within an enterprise domain based on the credentials and privileges specified in the directory.

Virtual Private Database (VPD): VPD provides server-enforced, fine-grained access control and a secure application context that can be used within a grid setting to enable multiple customers, partners, or departments utilizing the same database to have secure access to mission-critical data. VPD enables per-user and per-customer data access within a single database, with the assurance of physical data separation.
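The per-user and per-customer filtering that VPD performs can be illustrated with a minimal sketch. This is not Oracle's implementation or API: it uses Python's standard `sqlite3` module as a stand-in database, and the names `customer_policy`, `secure_select`, `build_demo_db`, and the `orders` table are hypothetical, introduced only for illustration. The idea shown is that a policy function returns a predicate from the caller's context, and that predicate is always appended to the query on the server side, so each customer sees only its own rows.

```python
import sqlite3


def customer_policy(context):
    """Hypothetical policy function: map a session context to a WHERE predicate."""
    if context.get("role") == "admin":
        return "1 = 1", ()                      # administrators see every row
    return "customer_id = ?", (context["customer_id"],)


def secure_select(conn, table, context, policy):
    """Select from `table` with the policy predicate always appended."""
    predicate, binds = policy(context)
    # The table name is fixed by the application here, never taken from user input.
    sql = f"SELECT * FROM {table} WHERE {predicate} ORDER BY order_id"
    return conn.execute(sql, binds).fetchall()


def build_demo_db():
    """Create an in-memory table shared by two hypothetical customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer_id TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "acme", 100.0), (2, "globex", 250.0), (3, "acme", 75.5)],
    )
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    print(secure_select(db, "orders", {"customer_id": "acme"}, customer_policy))
    print(secure_select(db, "orders", {"role": "admin"}, customer_policy))
```

Because the predicate is attached inside `secure_select` rather than in each application query, an application cannot forget to apply it, which is the property the text calls server-enforced access control.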

Oracle Label Security: Oracle Label Security gives administrators an out-of-the-box row-level, and now column-level, security solution for controlling access to data based on its sensitivity, eliminating the need to write such policies manually.
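Sensitivity-based access of the kind Label Security provides can be sketched as a comparison between a label attached to each row and a label granted to each user. The sketch below is a simplified model under stated assumptions, not Oracle's API: the level names in `LEVELS`, the `Label` class, and the functions `can_read` and `visible_rows` are all hypothetical. A user may read a row when the user's level is at least the row's level and the user holds every compartment the row requires.

```python
from dataclasses import dataclass

# Hypothetical sensitivity levels; a higher number means more sensitive.
LEVELS = {"PUBLIC": 10, "CONFIDENTIAL": 20, "SECRET": 30}


@dataclass(frozen=True)
class Label:
    """A sensitivity label: one level plus a set of compartments."""
    level: str
    compartments: frozenset = frozenset()


def can_read(user_label, row_label):
    """True when the user's level and compartments cover those of the row."""
    return (
        LEVELS[user_label.level] >= LEVELS[row_label.level]
        and row_label.compartments <= user_label.compartments
    )


def visible_rows(rows, user_label):
    """Return the data of every (data, label) row the user may read."""
    return [data for data, label in rows if can_read(user_label, label)]


if __name__ == "__main__":
    rows = [
        ("price list", Label("PUBLIC")),
        ("salary data", Label("CONFIDENTIAL", frozenset({"HR"}))),
        ("merger plan", Label("SECRET", frozenset({"FIN"}))),
    ]
    hr_analyst = Label("CONFIDENTIAL", frozenset({"HR"}))
    print(visible_rows(rows, hr_analyst))
```

The point of the model is that the access rule is written once, against labels, instead of being hand-coded as a separate policy for every table.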


Grid technologies are evolving rapidly. Oracle assures cost-conscious enterprises that their investments in Oracle today will be leveraged for future grid technologies. Oracle possesses the right architecture and has its product directions fully aligned to deliver future grid computing technologies.

Product directions aligned with the grid: Oracle's product directions are aligned with the grid. Oracle Database 10g is the first database designed for the grid. Oracle already supports more grid computing technology than any of its competitors.

Grid standards support: Oracle is committed to supporting industry standards. Oracle is working with the Global Grid Forum to help define grid standards. Just as Oracle has supported in its products, and is helping to define, other standards such as J2EE, Web Services, XQuery, and SQL, Oracle intends to fully support grid standards.

Oracle Database 10g is the world's most affordable self-managing database, eliminating many of the traditional manual administration tasks such as performance tuning and disk and memory management. Oracle Database 10g Release 2 furthers Oracle's commitment to reducing the cost of computing in all aspects of database development and deployment with:

1. Automated database administration enhancements, which include statistics collection directly from memory, eliminating the need to execute SQL queries. New administrative reports include automatic database workload repository comparison to help understand changes in workloads and their possible performance impact.

2. Application development enhancements, which include the XQuery feature for queries and mapping of XML results inside the database. Oracle HTML DB makes it easier to develop and deploy web-based database applications. Oracle's commitment to the Windows platform continues with support for CLR stored procedures and tight integration with Visual Studio.

3. A reduced cost of business intelligence, with data mining PL/SQL package support for analytic applications such as Oracle Discoverer. Improved VLDB support is available with more partitions per table, more efficient partition management, and better query optimization. Information cycle time is reduced with enhanced data loading and query processing improvements.


Many phrases have been coined to describe new computing models created by the IT industry. Grid computing is the emerging standard, and grid computing is Oracle's approach to lowering costs while improving quality. The benefits of grid computing to businesses are real: increasingly flexible systems that can largely manage themselves; better availability, performance, and scalability at lower cost; and the opportunity for incremental investment and immediate return. Grid computing will not radically change enterprise data centres, and it does not require throwing out existing investments and best practices. However, grid computing is also not just a passing fad. Enterprise grid computing, based on the Oracle 10g infrastructure, will be the foundation of information technology for the future, resulting in more cost-effective computing for running more nimble, data-driven businesses.

Grid computing is poised to change the economics of computing. Rapid innovations and new economics in hardware make grid computing possible and sensible at the hardware layer today. Only Oracle Database 10g leverages these hardware innovations and implements the fundamental attributes of grid computing. Only Oracle Database 10g, with its strong security, self-reliance, and manageability offerings, addresses the stringent operational needs of enterprise grids. With Oracle Database 10g we can realize grid benefits today and leverage our investments in Oracle for future grid computing technologies.



