
INTRANET GRID - EXPERIENCE THE STREAK OF LIGHTNING OVER THE INTERNET

CONTENTS

• Abstract
• Getting started with Grid computing
• Importance of Grid computing
• Types of Grid
• Creating our own Grid
• Employing the Globus architecture in our proposed Grid
• Accessing proposed Intranet Grid
• Proposed algorithm for Intranet Grid
• Conclusion


ABSTRACT

Fast Internet access is something we have all come to expect. But when many files are
downloaded at once, the system may hang, and the entire download must then be restarted
from the beginning. This is a serious problem that deserves the attention of researchers.
We have taken up this problem in our research, and in this paper we provide a layout
for implementing our proposed Intranet Grid, which is intended to speed up Internet
access. Using our Grid, any number of files can be downloaded quickly, with the speed
depending on the number of systems employed in the Grid. We use the concept of Grid
Computing for this purpose. Grid Computing is a technique in which the idle systems in
the network and their “wasted” CPU cycles are put to efficient use by uniting pools of
servers, storage systems and networks into a single large virtual system for dynamic
resource sharing at runtime.

The Grid we have formulated uses the standard Globus architecture, which is widely
used around the world for building Grids. We also propose an algorithm for laying out
our Intranet Grid, which we regard as a blueprint for further implementation. When
implemented in practice, our Grid should let the user experience a streak of lightning
over the Internet while downloading multiple files.

Key words:
Grid Security Infrastructure (GSI), Global Access to Secondary Storage (GASS), Monitoring
and Discovery Service (MDS), Globus Resource Allocation Manager (GRAM).

GETTING STARTED WITH GRID COMPUTING

What's Grid computing? Sometimes it's easier to start defining Grid computing
by telling you what it isn't. For instance, it's not artificial intelligence, and it's not some
kind of advanced networking technology. It's also not some kind of science-fictional
panacea to cure all of our technology ailments.

If you can think of the Internet as a network of communication, then Grid computing
is a network of computation: tools and protocols for coordinated resource sharing and
problem solving among pooled assets. These pooled assets are known as virtual
organizations. They can be distributed across the globe; they're heterogeneous (some PCs,
some servers, maybe mainframes and supercomputers); somewhat autonomous (a Grid
can potentially access resources in different organizations); and temporary.

It's really more about bringing a problem to the computer (or Grid) and getting a
solution to that problem. Grid computing is flexible, secure, coordinated resource sharing
among dynamic collections of individuals, institutions, and resources. Grid computing
enables the virtualization of distributed computing resources such as processing, network
bandwidth, and storage capacity to create a single system image, granting users and
applications seamless access to vast IT capabilities. Just as an Internet user views a
unified instance of content via the World Wide Web, a Grid user essentially sees a single,
large, virtual computer.

The Internet allows us to access remote data through the use of TCP/IP protocols,
and a Web browser displays the content on the local computer. The important parallel to note
here is that standards define document markup and communication protocols that formed
the worldwide network through integration of isolated local network islands. In the same
way, Grid computing will give worldwide access to a network of distributed resources --
CPU cycles, storage capacity, devices for input and output, services, whole applications,
and more abstract elements like licenses and certificates.

This form of Grid computing has been used by academia and research for several
years. For example, to solve a compute-intensive problem, the problem is split into
multiple tasks that are distributed over local and remote systems, and the individual
results are consolidated at the end. Viewed from another perspective, these systems are
connected to one big computing Grid. The individual nodes can have different
architectures, operating systems, and software versions. Some of the target systems can
be clusters of nodes themselves or high performance servers.

IMPORTANCE OF GRID COMPUTING:


To achieve better response times for requests from the Web, data was cached on
the edge of the network and services were distributed among multiple systems. This
resulted in the distribution of the integrated information infrastructure into heterogeneous
and fragmented systems. But there isn't an efficient way to manage these resources
because they can't be managed remotely, and because they implement inconsistent
interfaces and support proprietary protocols.
As a result, it's the burden of system integrators to combine these distributed
components, while maintaining quality of service. A lack of common architecture and
open standards makes this task more and more complex and burdensome. Administrators
of one system architecture have to communicate verbally with the staff of other systems
to change parameters to guarantee the availability and quality of service of the complete
distributed system. The Grid we have formulated uses the standard Globus architecture,
which is widely used around the world for building Grids.

TYPES OF GRID:
The three primary types of grids are summarized below. Of course, there are no
hard boundaries between these grid types and often grids may be a combination of two or
more of these. However, as you consider developing applications that may run in a grid
environment, remember that the type of grid environment that you will be using will
affect many of your decisions.

• Computational grid
A computational grid is focused on setting aside resources specifically for
computing power. In this type of grid, most of the machines are high-performance
servers.

• Scavenging grid
A scavenging grid is most commonly used with large numbers of desktop
machines. Machines are scavenged for available CPU cycles and other resources.
Owners of the desktop machines are usually given control over when their resources
are available to participate in the grid.

• Data grid
A data grid is responsible for housing and providing access to data across
multiple organizations. Users are not concerned with where this data is located as long
as they have access to the data.

CREATING OUR OWN GRID:


We use a scavenging grid for our implementation and plan to extend it later by
combining scavenging and data grids. The block diagram (Figure 1) gives an idea of the
Grid we propose. While browsing the Internet, most of us have struggled with multiple
downloads, particularly of huge files: the system can fail abruptly when a heavy task is
assigned to it. The system may hang and have to be rebooted after some percentage of
the download has completed. Rebooting forces the download to start again from the
beginning, which is a major problem for many users today.

To avoid this problem we have formulated our own Grid for accessing the Internet
via an intranet (LAN). As an example, consider a small LAN of about 20 systems, 10 of
which are idle (for our purposes), their CPU cycles wasted. Our work begins here: we
aim to turn those wasted CPU cycles into working cycles.

FIGURE 1: LAYOUT OF OUR INTRANET GRID


Now consider N files of different sizes being downloaded on a single system (a
desktop PC). Over a T1 line of normal speed this takes a long time and is a tedious task
for the user. This is where our Grid plays a major role. With our Grid, this large number
of files is distributed evenly across all the systems in the network through standard
protocols, which we discuss later when dealing with the architecture of our Grid.
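
As an illustration only, one way to read "distributed evenly" is to balance the total size handed to each idle system. The sketch below assumes the file sizes are known in advance and uses a simple greedy rule (largest file first, always to the least-loaded host). The function name, the host names and the file names are hypothetical and are not part of the Globus Toolkit.

```python
import heapq


def distribute_evenly(file_sizes, hosts):
    """Give each file to the host with the fewest megabytes assigned so far."""
    # Min-heap of (megabytes_assigned, host); host names are hypothetical.
    load = [(0, host) for host in hosts]
    heapq.heapify(load)
    plan = {host: [] for host in hosts}
    # Place the largest files first so the smaller ones can even out the totals.
    by_size = sorted(file_sizes.items(), key=lambda item: item[1], reverse=True)
    for name, size in by_size:
        assigned, host = heapq.heappop(load)
        plan[host].append(name)
        heapq.heappush(load, (assigned + size, host))
    return plan


if __name__ == "__main__":
    files = {"a.iso": 700, "b.zip": 300, "c.pdf": 200, "d.mp3": 100}  # sizes in MB
    print(distribute_evenly(files, ["pc01", "pc02"]))
```

With the sample input, the 700 MB file goes to one host and the three smaller files (600 MB in total) go to the other, so neither host carries much more than its share.
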

We maintain a database in our Grid for the whole network, and it can be extended
according to our requirements. When a file is downloaded, the work is distributed to the
idle systems on which the Globus Toolkit is installed. The download is completed on
these systems, and the file is stored virtually in the database (SAN). An authenticated
user can access this database and retrieve the files he has downloaded.

The various processes taking place in our Grid, such as authentication, resource
availability, scheduling, data management, and finally job and resource management,
follow a standard architecture: the Globus architecture.

EMPLOYING THE GLOBUS ARCHITECTURE IN OUR GRID


While planning to implement a Grid project, we must address issues like security,
managing and brokering the workload, and managing data and resources information.
Most Grid applications contain a tight integration of all these components.

The Globus Project provides open source software tools that make it easier to
build computational Grids and Grid-based applications. These tools are collectively
called the Globus Toolkit. Globus Toolkit is the open source Grid technology for
computing and data Grids. On the server side, Globus Toolkit 2.2 provides interfaces in
C. On the client side, it provides interfaces in C, Java language, and other languages. On
the client side, the Java interfaces are provided by the Java Commodity Grid (CoG) Kit.
Globus runs on Linux, AIX, HP-UX, Solaris, and also on Windows operating systems.

The Globus architecture represents a multiple-layer model. The local services
layer contains the operating system services and network services like TCP/IP. In
addition, there are the services provided by cluster scheduling software (like IBM
LoadLeveler) -- job submission, query of queues, and so forth. The cluster scheduling
software allows a better use of the existing cluster resources. The higher layers of the
Globus model enable the integration of multiple or heterogeneous clusters.

FIGURE 2: GLOBUS ARCHITECTURE

The core services layer contains the Globus toolkit building blocks for security,
job submission, data management, and resource information management. Each building
block has programmable interfaces and can be used independently of one another. Globus
itself doesn't offer any vertically integrated solution. A Grid application must integrate the
Globus building blocks itself, or it needs to access services provided by higher-level
services. When integrated with the Globus architecture our Intranet Grid (application)
will provide better utilization of these programmable building blocks available in the
architecture.

The high-level services and tools layer contains tools that integrate the lower-level
services or implement missing functionality. For example, the globusrun tool, supplied
with the Globus Toolkit, allows the transfer of the input data and executable to the
target system, execution of the job on the target system, and redirection of the output
to the local system. globusrun thereby integrates the Globus components GRAM, GASS,
and GSI (more on these components later).
Condor-G is an example of software that offers a restart mechanism and a
scheduling mechanism paired with matchmaking, which allows allocating the best
available resource to a request.
The upper-most layer contains applications such as the submission of jobs via a
Web-portal without specifying where they have to be executed. Such an application can
then -- with the help of lower layers -- find the optimal resource on which to distribute
the work, monitor the work on those systems, and represent the results in a nice user
interface. The target system itself may be a cluster with cluster job-scheduling software.

The Globus Components

• Grid Security Infrastructure (GSI)
• GridFTP and Global Access to Secondary Storage (GASS)
• Monitoring and Discovery Service (MDS)
• Globus Resource Allocation Manager (GRAM)

ACCESSING THE INTRANET GRID

A user who wants to access our proposed Intranet Grid to download multiple files
over the Internet must follow certain procedures that we consider necessary for the
security of our Grid. The main requirements for processing in a Grid environment are:

• Security: single sign-on, authentication, authorization, and secure data
transfer.
• Resource Management: remote job submission and management.
• Data Management: secure and robust data movement.
• Information Services: directory services of available resources and
their status.
• Fault Detection: Checking the intranet.
• Portability: C bindings (header files) needed to build and compile programs.

Security
A major requirement for Grid computing is security. At the base of any grid
environment, there must be mechanisms to provide security, including authentication,
authorization, data encryption, and so on. The Grid Security Infrastructure (GSI)
component of the Globus Toolkit provides robust security mechanisms. It also provides a
single sign-on mechanism, so that once a user is authenticated, a proxy certificate is
created and used when performing actions within the grid. We are using the GSI sign-in
to grant access to the portal. We can also provide alternative security mechanisms
according to user requirements.

Broker
Once authenticated, the user launches an application. Based on the application,
and possibly on other parameters provided by the user, the next step is to identify the
available and appropriate resources to use within the grid. The host on which the user
works asks the server about the availability of resources in the network, and the server
responds with details of the resources available. This task can be carried out by a
broker function. The service that supplies the information is called the Grid
Information Service (GIS), or more commonly the Monitoring and Discovery Service
(MDS). It provides information about the available resources within the grid and their
status.

Monitoring and Discovery Service (MDS) manages the static and dynamic data of
the participating Grid nodes. Static data might include number of CPUs and the operating
system. Dynamic data might include CPU utilization and free disk space. The Globus
Information Index Service (GIIS) allows you to aggregate the data and query it based on
certain search filters.
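
The kind of filtered query described above can be pictured with a small in-memory sketch. It does not use the MDS or GIIS interfaces. The record type, its field names, the thresholds and the host names are all assumptions made for illustration, with the static and dynamic fields taken from the examples in the text.

```python
from dataclasses import dataclass


@dataclass
class NodeInfo:
    host: str                # hypothetical host name
    cpus: int                # static data
    operating_system: str    # static data
    cpu_utilization: float   # dynamic data, 0.0 (idle) to 1.0 (busy)
    free_disk_mb: int        # dynamic data


def find_idle_nodes(nodes, max_utilization=0.10, min_free_disk_mb=0,
                    operating_system=None):
    """Return the nodes that pass every filter, least busy first."""
    matches = [
        node for node in nodes
        if node.cpu_utilization <= max_utilization
        and node.free_disk_mb >= min_free_disk_mb
        and (operating_system is None or node.operating_system == operating_system)
    ]
    return sorted(matches, key=lambda node: node.cpu_utilization)


if __name__ == "__main__":
    lan = [
        NodeInfo("pc01", 2, "Linux", 0.05, 5000),
        NodeInfo("pc02", 4, "Linux", 0.80, 9000),
        NodeInfo("pc03", 1, "Windows", 0.02, 300),
    ]
    for node in find_idle_nodes(lan, min_free_disk_mb=200):
        print(node.host, node.cpu_utilization)
```

In our Intranet Grid, a query of this kind would be the step that picks out the idle systems of the LAN before any files are assigned to them.
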
Scheduler
Once the resources have been identified, the next logical step is to schedule the
individual jobs to run on them. When the user downloads multiple files using the Grid,
the size of each file is first calculated and stored in a priority queue. The Grid waits
until the number of files equals the number of resources available, or until a timer that
we assign expires, before executing the processes. The file sizes stored in the queue are
then matched with the appropriate resources (hosts) available in the Grid using
Condor-G, scheduling software that works with the Globus Toolkit.
Once a resource has been allocated to each file in the queue, the execution
(downloading) of the files begins.

FIGURE 3: WORKING OF GLOBUS GRID


Data management
If any data -- including application modules -- must be moved or made accessible
to the nodes where an application's jobs will execute, then there needs to be a secure and
reliable method for moving files and data to various nodes within the grid. The Globus
Toolkit contains a data management component that provides such services. This
component, known as Global Access to Secondary Storage (GASS), includes facilities
such as GridFTP. GridFTP is built on top of the standard FTP protocol, but adds
additional functions and utilizes the GSI for user authentication and authorization.
Therefore, once a user has an authenticated proxy certificate, he can use the GridFTP
facility to move files without having to go through a login process to every node
involved. This facility provides third-party file transfer so that one node can initiate a file
transfer between two other nodes.

Job and Resource Management


We now get to the core set of services that help perform actual work in a grid
environment. The Globus Resource Allocation Manager (GRAM) provides the services to
actually launch a job on a particular resource, check its status, and retrieve its results
when it is complete.

The Globus Resource Allocation Manager (GRAM) allows a user to submit jobs.
The Globus job description language, called the Resource Specification Language (RSL),
is used to describe the job's resource requirements, such as the name of the executable
and the amount of memory required. The RSL is common to all the various cluster job
schedulers. The GRAM component processes the job request in the RSL specification.
On the target system, another GRAM component called the gatekeeper authenticates the
request and starts up a GRAM job manager. The job manager maps the RSL requirements
into the language of the local job scheduler, starts the job, and reports all status
changes back to the client. The job manager is extensible.
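
To make the job description concrete, the following sketch builds an RSL string for a single download job. The helper function is hypothetical. The attribute names (executable, arguments, count, maxMemory) are given as we recall them from the GRAM documentation and should be checked against the toolkit version in use. The use of wget as the downloader and the example URL are assumptions for illustration only.

```python
def build_rsl(attributes):
    """Render a mapping as one RSL conjunction: &(name=value)(name=value)..."""
    relations = []
    for name, value in attributes.items():
        if isinstance(value, str):
            if '"' in value:
                raise ValueError("this sketch does not handle quotes inside values")
            # Quote strings so that spaces and slashes stay inside one value.
            value = f'"{value}"'
        relations.append(f"({name}={value})")
    return "&" + "".join(relations)


if __name__ == "__main__":
    job = {
        "executable": "/usr/bin/wget",              # assumed downloader
        "arguments": "http://example.org/big.iso",  # hypothetical URL
        "count": 1,
        "maxMemory": 64,                            # assumed to be in megabytes
    }
    print(build_rsl(job))
    # &(executable="/usr/bin/wget")(arguments="http://example.org/big.iso")(count=1)(maxMemory=64)
```

A string of this form is what the client hands to GRAM. The job manager on the target system then translates it for the local scheduler.
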
Here is some pseudo code that shows how the different Globus components can be
integrated in a typical application:
Existing Algorithm for Globus Architecture

Step [1]. Create security_proxy via GSI services
Step [2]. Access an MDS-GIIS server
Step [3]. Search for required machine(s)
Step [4]. Rank the machine list based on a scheduling policy
Step [5]. Prepare the data
Step [6]. Transfer the data to the target machine by using GASS services
Step [7]. Prepare an RSL document
Step [8]. Submit the program using GRAM services
Step [9]. React to status changes from GRAM
Step [10]. Get results via GASS

DATA RETRIEVAL
Once the downloaded files have been stored in the common database, the next task
is to retrieve them. To retrieve a downloaded file, the user submits its file name as a
query to the database. The user must also be properly authenticated to access the
database, which adds a second layer of security to the Grid environment. At present we
store the files on a storage device only; we plan to extend our Grid later by adding the
database in order to form a complete Grid.

Added module:
Step [11]. Reassemble the downloaded files.
Step [12]. Store them in the common database.
Step [13]. Retrieve data from the database after proper authentication.
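
Steps 11 to 13 can be sketched as follows. This is a minimal in-memory stand-in, not the database itself. It assumes each node returns its share of a file as a numbered piece, and it replaces GSI authentication with a plain set of authorized user names. The class, method and user names are hypothetical.

```python
class FileStore:
    """In-memory stand-in for the common database (hypothetical)."""

    def __init__(self, authorized_users):
        self._authorized = set(authorized_users)
        self._files = {}

    def store(self, name, parts):
        """Steps 11 and 12: join (index, bytes) parts in index order and save them."""
        ordered = sorted(parts, key=lambda part: part[0])
        self._files[name] = b"".join(chunk for _, chunk in ordered)

    def retrieve(self, user, name):
        """Step 13: hand the file back only to an authorized user."""
        if user not in self._authorized:
            raise PermissionError(f"{user} is not authorized to read {name}")
        return self._files[name]


if __name__ == "__main__":
    store = FileStore(authorized_users={"alice"})
    # Pieces may arrive out of order from different nodes.
    store.store("report.txt", [(1, b"world"), (0, b"hello ")])
    print(store.retrieve("alice", "report.txt"))
```

Keeping the authorization check inside retrieve means that no caller can reach the stored data without passing through it.
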

Grid services, and the framework they rest on, are in many ways similar to
object-oriented programming.
PROPOSED ALGORITHM FOR OUR INTRANET GRID
The following steps perform multiple downloads on the Grid. The host details are
obtained from the server of the LAN in order to identify the various hosts. Host
information is fetched whenever needed, on a priority-queue basis.

[1]. Start lookup // look for file size and resource information
[2]. Declare nres, nfiles // number of resources available and number of files
[3]. Input nres, nfiles
[4]. Input size // the file size
[5]. Initialize P1 ← res info // store the resource information in priority queue
     P1 with highest system configuration as priority
[6]. Initialize P2 ← file size // store the file information in priority queue P2
     with maximum file size as priority
[7]. If (nfiles == nres) // check whether the number of resources is equal
     to the number of files
[8]. Initialize counter
[9]. For (counter = 1; counter <= nres; counter++) // loop to assign the files
[10]. Assign the first file of P2 to the first node in P1, removing both from their
      queues // the first node is the node with the highest configuration and the
      first file is the file with the maximum size
[11]. Start processing // files directed to the appropriate system to use its
      wasted CPU cycles
[12]. Loop
[13]. Else:
[14]. Start timer
[15]. Delay ← 1 min
[16]. Collect incoming files // the files requested during this interval
[17]. P2 ← incoming files
[18]. Goto step 8
[19]. Goto step 1
[20]. End // when the user exits from the proposed Grid
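
The assignment loop in steps [5] to [12] can be sketched in Python as follows. This is only one possible reading of the pseudocode. It assumes each resource has a single numeric configuration score and each file a known size. It leaves out the one-minute timer, and simply returns any files left over when there are more files than resources so that a later round can pick them up. All names are hypothetical.

```python
import heapq


def assign_files(resources, files):
    """Pair the largest file with the best-configured node, and so on down."""
    # heapq is a min-heap, so keys are negated to pop the largest first.
    p1 = [(-score, host) for host, score in resources.items()]
    p2 = [(-size, name) for name, size in files.items()]
    heapq.heapify(p1)
    heapq.heapify(p2)
    assignment = {}
    while p1 and p2:
        _, host = heapq.heappop(p1)   # node with the highest configuration
        _, name = heapq.heappop(p2)   # file with the maximum size
        assignment[name] = host
    # Files still queued wait for the next round, largest first.
    leftover = [name for _, name in sorted(p2)]
    return assignment, leftover


if __name__ == "__main__":
    nodes = {"pc01": 3, "pc02": 8, "pc03": 5}            # configuration scores
    downloads = {"a": 100, "b": 700, "c": 300, "d": 50}  # sizes in MB
    print(assign_files(nodes, downloads))
```

With the sample input, file b (the largest) goes to pc02 (the best-configured node), c to pc03 and a to pc01, while d is left waiting for the next round.
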
CONCLUSION
Grid computing was once said to be fading, but thanks to technological
convergence it is flourishing once again. The Intranet Grid we propose is a step toward
the wider adoption of Grid architecture and the fast computing it promises for the near
future. With our proposed Intranet Grid, multiple files can be downloaded quickly, and
security is addressed by authenticating every step that takes place in the Grid, in
particular user access to the database. Further implementation work could be carried
out in the near future.
