
Sky Computing

A Technical Seminar Report Submitted to

JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY, HYDERABAD

In Partial Fulfillment of the requirement For the Award of the Degree of

BACHELOR OF TECHNOLOGY

in

COMPUTER SCIENCE AND ENGINEERING

Submitted by V. Sindhu (H.T.No: 20N05A0512)

Under the Supervision of


MS. L.PRIYANKA

Assistant Professor

Department of Computer Science and Engineering

SREE CHAITANYA COLLEGE OF ENGINEERING


(Affiliated to JNTUH, HYDERABAD)

THIMMAPUR, KARIMNAGAR, TELANGANA-505527

2020-2023
SREE CHAITANYA COLLEGE OF ENGINEERING
(Affiliated to JNTUH, HYDERABAD)
THIMMAPUR, KARIMNAGAR, TELANGANA-505 527
Department of Computer Science and Engineering

CERTIFICATE

This is to certify that the Technical Seminar Report entitled “Sky Computing”, being
submitted by V. Sindhu, bearing hall ticket number 20N05A0512, in partial fulfillment of the
requirement for the award of the degree of Bachelor of Technology in Computer Science and
Engineering to Jawaharlal Nehru Technological University, Hyderabad during the academic
year 2019-2023, is a bonafide work carried out by her under my guidance and supervision.

The results embodied in this report have not been submitted to any other university or
institution for the award of any degree or diploma.

Guide Head of the Department

MS. L.PRIYANKA Mr. KHAJA ZIAUDDIN


Assistant Professor Associate Professor

Department of CSE Department of CSE

SREE CHAITANYA COLLEGE OF ENGINEERING
(Affiliated to JNTUH, HYDERABAD)
THIMMAPUR, KARIMNAGAR, TELANGANA-505 527
Department of Computer Science and Engineering

DECLARATION

I, V. SINDHU, a student of Bachelor of Technology in Computer Science and
Engineering during the academic year 2019-2023, hereby declare that the work presented in
this Technical Seminar Report entitled SKY COMPUTING is the outcome of my own
bonafide work, is correct to the best of my knowledge, and has been undertaken with due
regard for Engineering Ethics under the supervision of MS. L. PRIYANKA, Assistant
Professor.

It contains no material previously published or written by another person nor material


which has been accepted for the award of any other degree or diploma of the university or other
institute of higher learning, except where due acknowledgment has been made in the text.

V.SINDHU(H.T.NO:20N05A0512)

Date:

Place:

SREE CHAITANYA COLLEGE OF ENGINEERING
(Affiliated to JNTUH, HYDERABAD)
THIMMAPUR, KARIMNAGAR, TELANGANA-505 527
Department of Computer Science and Engineering

ACKNOWLEDGEMENTS

The satisfaction that accompanies the successful completion of any task would be
incomplete without mentioning the people who made it possible and whose constant
guidance and encouragement crowned all efforts with success.
I would like to express my sincere gratitude and indebtedness to my seminar supervisor
MS. L. PRIYANKA, Assistant Professor, Department of Computer Science and Engineering,
Sree Chaitanya College of Engineering, LMD Colony, Karimnagar, for her valuable
suggestions and interest throughout the course of this technical report.
I am also thankful to Head of the department Mr. KHAJA ZIAUDDIN, Associate
Professor & HOD, Department of Computer Science and Engineering, Sree Chaitanya College
of Engineering, LMD Colony, Karimnagar for providing excellent infrastructure and a nice
atmosphere for completing this report successfully.
I sincerely extend my thanks to Dr. G. VENKATESWARLU, Principal, Sree
Chaitanya College of Engineering, LMD Colony, Karimnagar, for providing all the facilities
required for the completion of this technical report.
I convey my heartfelt thanks to the lab staff for allowing me to use the required
equipment whenever needed.
Finally, I would like to take this opportunity to thank my family for their support through
the work.
I sincerely acknowledge and thank all those who gave directly or indirectly their support
in completion of this work.

V.SINDHU

ABSTRACT

Infrastructure-as-a-service (IaaS) cloud computing is revolutionizing how we approach computing.


Compute resource consumers can eliminate the expense inherent in acquiring, managing, and
operating IT infrastructure and instead lease resources on a pay-as-you-go basis. IT infrastructure
providers can exploit economies of scale to mitigate the cost of buying and operating resources
and avoid the complexity required to manage multiple customer-specific environments and
applications. The authors describe the context in which cloud computing arose, discuss its current
strengths and shortcomings, and point to an emerging computing pattern it enables that they call
sky computing. Over the past few years, cloud computing has evolved and matured, creating
favorable conditions for a new class of cloud applications called Sky Computing. Sky
Computing can reduce the overall cost of managing and operating IT infrastructure beyond
what rented cloud resources alone achieve. Companies such as Yahoo, Google, and Amazon,
which maintain huge databases, can adopt Sky Computing to reduce the cost of the software
managing those databases. Sky Computing can be described as the amalgamation of multiple
clouds into one large cloud, or as a higher version of cloud computing. The Sky Computing
architecture creates a large infrastructure by utilizing resources from numerous cloud
providers, and this type of infrastructure supports high-performance parallel computing.

NAME: V. SINDHU
H.T.NO: 20N05A0512
BRANCH: CSE-B

CONTENTS

CERTIFICATE
DECLARATION
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF FIGURES
CHAPTER 1  INTRODUCTION
CHAPTER 2  SKY COMPUTING ARCHITECTURE
CHAPTER 3  GRID AND CLOUD COMPUTING

CHAPTER-1

INTRODUCTION

The initial applications were recently built following different distributed computing
approaches:

1. A service-oriented architecture (SOA) based training system
2. A modular real-time data analyzer
3. A cluster-based simulator.

But cloud technologies are currently designed mainly for developing new applications: “early cloud
providers were focused on developers and technology startups when they designed their offerings”.
Software architects looking to build new applications can design the components, processes, and workflow
for their solution according to the new cloud-related concepts. However, building new applications
architected from scratch for the cloud is only slowly gaining traction, and only a few enterprise
applications currently take real advantage of the cloud’s elasticity. Distributing an application across
multiple clouds largely reduces the risks around data security and storage, as well as power and equipment
breakdown. This is one of the reasons that led to bringing together several clouds (owned by different
providers) to form what is known as sky computing. Infrastructure-as-a-Service (IaaS) cloud computing is
revolutionizing how we approach computing; it represents a fundamental change from grid computing,
where reconciling configuration choices between multiple user groups proved to be complex, time
consuming, and expensive. Compute resource consumers can eliminate the expense inherent in acquiring,
managing, and operating IT infrastructure and instead lease resources on a pay-as-you-go basis. IT
infrastructure providers can exploit economies of scale to mitigate the cost of buying and operating
resources and avoid the complexity required to manage multiple customer-specific environments and
applications. This context gave rise to an emerging computing pattern known as “sky computing”. The
main advantage of cloud computing is that it reduces the cost of hardware, software, and licensing for all
users, who can further benefit from low cost and high resource utilization through sky computing.

What Is Sky Computing
Sky Computing is an emerging computing model in which resources from multiple cloud providers are
leveraged to create large-scale distributed infrastructures.

Fig1: Sky Computing

• Federation of multiple clouds

• Creates large scale infrastructures

• Allows running software requiring large computational power

• Sky providers are consumers of cloud providers

• “Virtual” datacenter-less dynamic clouds

Cloud computing vs. Grid Computing vs. Sky Computing
A. Control on Resources
Grid Computing- When using remote resources for regular computing, the assumption is
that control over the resources stays with the site, but this choice is not always suitable for
remote users who might need a different OS or login access.
Cloud computing- A cloud computing infrastructure is a complex system with a large
number of shared resources. These are subject to unpredictable requests and can be affected by
external events beyond our control. Cloud resource management requires complex policies and
decisions for multi-objective optimization. The strategies for cloud resource management
associated with the three cloud delivery models, Infrastructure as a Service (IaaS), Platform as a
Service (PaaS), and Software as a Service (SaaS), differ from one another.
Sky computing- Sky computing allows users to control resources on their own, so trust
relationships within sky computing are the same as those within a traditional non-distributed
site, simplifying how remote resources interact.
B. Scalability
Grid Computing- It is hard to scale.
Cloud computing- The ability to scale on demand is one of the biggest advantages of
cloud computing. Often, when considering the range of benefits of cloud, it is difficult to
conceptualize the power of scaling on demand, but organizations of all kinds enjoy tremendous
benefits when they correctly implement auto scaling.
Sky computing- It is dynamically scalable, as resources are distributed over several clouds.
C. Security
Grid Computing- Grid systems and applications require standard security functions such as
authentication, access control, integrity, privacy, and non-repudiation. The security
architecture of a grid system should satisfy constraints such as single sign-on, protection of
credentials, and interoperability with local security solutions. With the development of recent
security technologies such as the Globus Toolkit security model, grid security has been
tightened to some extent.
Cloud computing- Here security is not as strong: user data may be disclosed to unauthorized
systems, and account hijacking is sometimes possible through unauthorized access by intruders.

Sky computing- When we deploy a single appliance with a specific provider, we rely on
the basic security and contextualization measures that provider has implemented for its
provider-specific networking and security context. Across providers, security relationships are
more complex and require provider-independent methods to establish a security and
configuration context.
D. Challenges
Grid Computing- It distributes resources over large, geographically distributed
environments and accesses heterogeneous devices and machines. Managing the administration
of the grid is therefore a major challenge, and the software enabling grid computing is scarce.
Cloud computing- Cloud service providers are faced with large, fluctuating loads
that challenge the claim of cloud elasticity. In some cases, when they can predict a spike, they
can provision resources in advance.
Sky computing- Connecting the client to a trusted networking domain and configuring
explicit trust relationships between domains, so that the client securely takes ownership of
customized infrastructure for an agreed time period, is a major challenge in sky computing.
E. Applications
Grid Computing- Grid portals, Load balancing, Resource broker etc.
Cloud computing- Big data analytics, File storage, Disaster recovery, Backup etc.
Sky computing- Seasonal e-commerce web servers, event-based alert systems, etc.

CHAPTER-2
SKY COMPUTING ARCHITECTURE
The aim is to create a turn-around model that enables intensive computing in cloud networks. This
is hoped to be achieved by enlarging the set of available resources in a way that overcomes the
problems referred to before, such as elevated latency between nodes. The architecture must also
span cloud providers in order to combine resources.

Fig 2: Architecture of Sky Computing

To achieve this, there must be a structure capable of receiving instructions, processing them, and
returning results from all the different underlying cloud systems. The upper layer, Sky Computing,
integrates the last level of Infrastructure as a Service with the next layer of Software as a Service.
It allows scheduling and distributing resources to incoming tasks and requests. This is a critical
layer, as it must be as comprehensive as possible in features and capabilities.

CREATING A SKY COMPUTING DOMAIN
Several building blocks underpin the creation of a sky environment. While leveraging
cloud computing, we can in principle trust the configuration of remote resources, but they will
typically be connected via untrusted WANs. Furthermore, they won’t be configured to recognize
or trust each other. So, we need to connect them to a trusted networking domain and configure
explicit trust and configuration relationships between them. In short, we must provide an end-
user environment that represents a uniform abstraction — such as a virtual cluster or a virtual
site — independent of any particular cloud provider and that can be instantiated dynamically. We
next examine the mechanisms involved.
CREATING A TRUSTED NETWORK ENVIRONMENT
Network connectivity is particularly challenging for both users and providers. It’s
difficult to offer APIs that reconfigure the network infrastructure to adjust to users’ needs without
giving them privileged access to core network equipment — something providers wouldn’t do
owing to obvious security risks. Without network APIs, establishing communication among
resources in distinct providers is difficult for users. Deploying a “virtual cluster” spanning
resources in different providers faces challenges in terms of network connectivity, performance,
and management:
• Connectivity: Resources in independently administered clouds are subject to different
connectivity constraints due to packet filtering and network address translations; techniques to
overcome such limitations are necessary. Due to sky computing's dynamic, distributed nature,
reconfiguring core network equipment isn’t practical because it requires human intervention in
each provider. Network researchers have developed many overlay networks to address the
connectivity problem involving resources in multiple sites, including NAT-aware network
libraries and APIs, virtual networks (VNs), and peer-to-peer (P2P) systems.
• Performance: Overlay network processing negatively affects performance. To minimize
performance degradation, compute resources should avoid overlay network processing when it’s
not necessary.

13
For example, requiring overlay network processing in every node (as with P2P systems)
slows down communication among nodes on the same LAN segment. In addition, overlay
network processing is CPU intensive and can take valuable compute cycles from applications. A
detailed study of overlay network processing performance is available elsewhere.
• Service levels: Sky computing requires on-demand creation of mutually isolated networks over heterogeneous
resources (compute nodes and network equipment) distributed across distant geographical
locations and under different administrative domains. In terms of SLAs, this has security as well
as performance implications. Core network routers and other devices are designed for a single
administrative domain, and management coordination is very difficult in multisite scenarios.
Overlay networks must be easily deployable and agnostic with respect to network equipment
vendors.
To address these issues and provide connectivity across different providers at low
performance cost, we developed the Virtual Networks (ViNe) networking overlay.6 ViNe offers
end-to-end connectivity among nodes on the overlay, even if they’re in private networks or
guarded by firewalls. We architected ViNe to support multiple, mutually isolated VNs, which
providers can dynamically configure and manage, thus offering users a well-defined security
level. In performance terms, ViNe can offer throughputs greater than 800 Mbps with sub-millisecond
latency, and can handle most traffic crossing LAN boundaries as well as Gigabit
Ethernet traffic with low overhead.
ViNe is user-level network routing software that creates overlay networks using the
Internet infrastructure. A machine running ViNe software becomes a ViNe router (VR), working
as a gateway to overlay networks for machines connected to the same LAN segment. We
recommend delegating overlay network processing to a specific machine when deploying ViNe
so that the additional network processing doesn’t steal compute cycles from compute nodes, a
scenario that can occur if all nodes become VRs. ViNe offers flexibility in deployment, as
exemplified in the following scenarios.

ViNe-enabled providers: Providers deploy a VR in each LAN segment. The ability to
dynamically and programmatically configure ViNe overlays lets providers offer APIs for virtual
networking without compromising the physical network infrastructure configuration. The cost
for a provider is one dedicated machine (which could be a VM) per LAN segment and can be a
small fraction of the network cost charged to users. IaaS providers offer VN services in this case.

Fig 3: ViNe Routing

End-user clusters: In the absence of ViNe services from providers, users can enable
ViNe as an additional VM that they start and configure to connect different cloud providers. This
user deployed VR would handle the traffic crossing the cluster nodes’ LAN boundaries. ViNe’s
cost in this case is an additional VM per user.
Isolated VMs: A VR can’t be used as a gateway by machines that don’t belong to the
same LAN segment. In this case, every isolated VM (or a physical machine, such as the user’s
client machine) must become a VR. ViNe’s cost is the additional network processing that
compute nodes perform, which can take compute cycles from applications.
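The routing rule described above (overlay processing only for traffic that crosses LAN boundaries) can be sketched as follows; the function and parameter names are illustrative and not part of the actual ViNe implementation:

```python
import ipaddress

# Sketch of a ViNe-style forwarding decision (names assumed, not from ViNe):
# traffic between nodes on the same LAN segment bypasses the overlay; only
# traffic crossing a LAN boundary is handed to the local ViNe router (VR).

def next_hop(src_lan: str, dst_ip: str, vr_ip: str) -> str:
    """Return where a packet should be delivered first."""
    lan = ipaddress.ip_network(src_lan)
    if ipaddress.ip_address(dst_ip) in lan:
        return dst_ip  # same LAN segment: direct delivery, no overlay cost
    return vr_ip       # crosses LAN boundary: hand off to the ViNe router

# Nodes in 10.0.1.0/24 reach each other directly...
assert next_hop("10.0.1.0/24", "10.0.1.7", "10.0.1.254") == "10.0.1.7"
# ...but a node in a remote cloud is reached via the VR gateway.
assert next_hop("10.0.1.0/24", "10.0.2.9", "10.0.1.254") == "10.0.1.254"
```

This captures why delegating overlay processing to one VR per LAN segment keeps intra-LAN communication at native speed.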

DYNAMIC CONFIGURATION AND TRUST
When we deploy a single appliance with a specific provider, we rely on basic security
and contextualization measures this provider has implemented to integrate the appliance into a
provider-specific networking and security context (for example, to let the appliance owner log
in). However, when we deal with a group of appliances, potentially deployed across different
providers, configuration and security relationships are more complex and require provider-
independent methods to establish a security and configuration context. In earlier work, we
describe a context broker service that dynamically establishes a security and configuration
context exchange between several distributed appliances. Orchestrating this exchange relies on
the collaboration of three parties:
• IaaS providers, who provide generic contextualization methods that securely deliver to
deployed appliances the means of contacting a context broker and authenticating themselves to it
as members of a specific context

                       University of      University of      Purdue
                       Chicago (UC)       Florida (UF)       University (PU)
Xen version            3.1.0              3.1.0              3.0.3
Guest kernel           2.6.18-x86_64      2.6.18-i686        2.6.16-i686
Nimbus version         2.2                2.1                2.1
CPU architecture       AMD Opteron 248    Intel Xeon         Intel Xeon
                                          Prestonia          Irwindale
CPU clock              2.2 GHz            2.4 GHz            2.8 GHz
CPU cache              1 Mbyte            512 Kbytes         2 Mbytes
Virtual CPUs per node  2                  2                  2
Memory                 3.5 Gbytes         3.5 Gbytes         1.5 Gbytes
Networking             Public             Private            Public

Table 1: Service-level agreements and instances at each cloud provider.

End users provide context information via a simple generic schema and method that’s
the same for every appliance used with this provider. Adopting this simple schema lets every
provider deliver basic context information to every appliance.

• Appliance providers, who provide methods that let appliances supply information to and
receive it from a context broker and integrate information conveyed by templates describing

application-specific roles. Appliances can integrate the information using any configuration
method from any appliance provider. This information in the templates is application-specific and
potentially different from appliance to appliance, but the templates themselves are uniform, and
any context broker can process them.
• Deployment orchestrators (context brokers), who provide generic methods of security-context
establishment and information exchange based on information the appliance templates provide.
A typical contextualization process works as follows. Before a user deploys appliances,
he or she registers a context object with a context broker. This object is identified by an identifier
and a secret. The IaaS provider securely conveys the identifier and secret (along with ways to
contact the context broker) on deployment. This gives the appliance a way to authenticate itself
to the context broker, which can then orchestrate security context establishment as well as
information exchange between all appliances in the context (external sources can provide
additional security and configuration information to the security broker).
Defining this exchange in terms of such roles lets any appliance contextualize with any
provider (or across providers). For example, using the Nimbus toolkit implementation of a
context broker, we could dynamically deploy clusters of appliances on Nimbus’s Science Clouds
(including multiple Science Cloud providers) as well as Amazon EC2.7
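As an illustration only (the class and method names below are hypothetical, not the Nimbus context broker API), the registration-and-exchange flow just described might look like:

```python
import secrets

# Minimal sketch of the contextualization flow: a user registers a context
# object (identifier + secret) with a broker; the IaaS provider conveys both
# to each appliance on deployment; the appliance authenticates and then
# publishes and retrieves context information. All names are assumptions.

class ContextBroker:
    def __init__(self):
        self.contexts = {}  # context_id -> {"secret": ..., "info": {...}}

    def register(self) -> tuple:
        # The user registers a context before deploying appliances.
        cid, secret = secrets.token_hex(4), secrets.token_hex(8)
        self.contexts[cid] = {"secret": secret, "info": {}}
        return cid, secret

    def publish(self, cid, secret, appliance, data):
        # An appliance authenticates with the secret, then supplies its data.
        ctx = self.contexts[cid]
        assert ctx["secret"] == secret, "appliance failed to authenticate"
        ctx["info"][appliance] = data  # e.g. its IP address or public key

    def exchange(self, cid, secret) -> dict:
        # Every appliance in the context learns about the others.
        ctx = self.contexts[cid]
        assert ctx["secret"] == secret
        return dict(ctx["info"])

broker = ContextBroker()
cid, secret = broker.register()                             # before deployment
broker.publish(cid, secret, "vm-uf-1", {"ip": "10.0.1.5"})  # conveyed via provider
broker.publish(cid, secret, "vm-uc-1", {"ip": "10.0.2.5"})
assert "vm-uc-1" in broker.exchange(cid, secret)
```

The secret plays the role the text describes: it lets an appliance prove membership in a specific context before the broker orchestrates any information exchange.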

BUILDING METACLOUDS
Next, let’s look at how we can exploit resource availability across different Science Clouds
offering different SLAs, to construct a sky environment: a virtual cluster large enough to support
an application execution. Rather than simply selecting the provider with the largest available
resources, we select IaaS allocations from a few different providers and build a sky environment on
top of those allocations using the ViNe network overlay and the Nimbus context exchange tools.

Fig 4: A virtual cluster interconnected with ViNe.


The Science Clouds test bed comprises multiple IaaS providers configured in the
academic space and providing different SLAs to users; Science Cloud providers grant access
to resources to scientific projects, free of charge and upon request. Apart from providing a
platform on which scientific applications explore cloud computing, the Science Clouds test bed
creates a laboratory in which different IaaS providers use compatible technologies to provide
offerings, letting us experiment with sky computing.
Our sky computing study uses resources on three sites: University of Chicago (UC),
University of Florida (UF), and Purdue University (PU). All sites use the same virtualization
implementation (Xen), and although the versions and kernels differ slightly, VM images are
portable across sites. All sites use Nimbus so that VM images are contextualization-compliant
across those sites.
Consequently, the sites are also API-compliant, but, as Table 1 shows, they offer
different SLAs. Although all sites offer an “immediate lease,” the provided instances (defined in
terms of CPU, memory, and so on) are different. More significantly from a usability viewpoint, the UC
and PU clouds provide public IP leases to the deployed VMs, whereas UF doesn’t. To construct
a sky virtual cluster over the testbed we just described, a user with access to the Science Clouds
testbed takes the following steps:
• Preparation: Obtain a Xen VM image configured to support an environment the
application requires as well as the ViNe VM image (the ViNe image is available from the
Science Clouds Marketplace). Make sure both images are contextualized (that is, capable of
providing and integrating context information). The user must upload both images to each
provider site.
• Deployment: Start a ViNe VM in each site (the ViNe VMs provide virtual routers for
the network overlay). In addition, start the desired number of compute VMs at each provider site.
The contextualized images are configured to automatically (and securely) contact the context
broker to provide appropriate networking and security information and to adjust network routes
so that VRs are used to reach nodes across site boundaries. The configuration exchange includes
VMs on different provider sites so that all VMs can behave as a single virtual cluster.
• Usage: Upload inputs and start the desired application (typically, by simply logging into
the virtual cluster and using a command-line interface).
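The Preparation/Deployment/Usage steps above can be sketched as a small orchestration routine; the cloud-client API shown (`upload`/`start`) is assumed for illustration and is not the real Nimbus command set:

```python
# Hypothetical orchestration of the three steps above. The FakeCloud class is a
# stand-in for an IaaS client, used only to make the sketch runnable.

SITES = ["uc", "uf", "pu"]  # the three Science Clouds sites from the text

class FakeCloud:
    def __init__(self):
        self.uploaded, self.started = [], []
    def upload(self, site, image):
        self.uploaded.append((site, image))
    def start(self, site, image, count):
        self.started.append((site, image, count))
        return (site, image, count)

def build_sky_cluster(cloud, app_image, vine_image, n_compute_per_site):
    # Preparation: upload both contextualized images to every provider site.
    for site in SITES:
        cloud.upload(site, app_image)
        cloud.upload(site, vine_image)
    # Deployment: one ViNe router VM per site, then the compute VMs; the
    # contextualized images self-configure through the context broker.
    vrs = [cloud.start(site, vine_image, count=1) for site in SITES]
    nodes = [cloud.start(site, app_image, count=n_compute_per_site) for site in SITES]
    # Usage: the user now logs into the resulting virtual cluster and runs the job.
    return vrs, nodes

cloud = FakeCloud()
vrs, nodes = build_sky_cluster(cloud, "blast.img", "vine.img", 5)
assert len(vrs) == 3 and len(nodes) == 3  # one VR + one compute pool per site
assert len(cloud.uploaded) == 6           # two images uploaded to each of 3 sites
```

The point of the sketch is the cost structure: one extra (router) VM per site, with the compute VMs unchanged.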
To experiment with the scalability of virtual clusters deployed in different settings, we
configured two clusters: a Hadoop cluster, using the Hadoop Map Reduce framework, version
0.16.2 and a message passing interface (MPI) cluster using MPICH2 version 1.0.7. We used each
virtual cluster to run parallel versions of the Basic Local Alignment Search Tool (Blast), a
popular bioinformatics application that searches for, aligns, and ranks nucleotide or protein
sequences that are similar to those in an existing database of known sequences. We configured
the Hadoop cluster with Blast version 2.2.18 and the MPI cluster with the publicly available
mpiBlast version 1.5.0beta1.
Both versions have master-slave structures with low communication-to-computation
ratios. The master coordinates sequence distribution among workers, monitoring their health and
combining the output. The runs used in the evaluation consisted of executing blast of 960

sequences averaging 1,116.82 nucleotides per sequence against a 2007 non-redundant (NR)
protein sequence database from the US National Center for Biotechnology Information (NCBI)
in 1 fragment (3.5 Gbytes of total data).

                           University of         University of        Purdue
                           Chicago               Florida              University
Sequential Execution Time  36 hours, 20 minutes  43 hours, 6 minutes  34 hours, 49 minutes
Normalization Factor       1.184                 1                    1.24

Table 2: Normalized single-processor performance at each site.

We deployed the two virtual clusters in two settings: on the UF cloud only (one-site experiment)
and on all three sites using the same number of processors (three-site experiment). For three-site
experiments, we balanced the number of hosts in each site executing Blast — that is, one host in
each site, two hosts in each site, and so on, up to five hosts in each site. (Choosing random
numbers of nodes from different sites would, in effect, weigh the three-site experiment’s
performance toward comparing the UF site and the site with the most processors). The SLAs
expressed as instances from each metacloud provider are different (PU instances outperform UC
instances which outperform UF instances), which makes it difficult to compare providers. To
establish a comparison base between the SLAs each provider offers, we used the performance of
the sequential execution on a UF processor of the Blast job described earlier to define a
normalized performance benchmark: 1 UC processor is equivalent to 1.184 UF processors,
whereas 1 PU processor is equivalent to 1.24 UF processors. For example, an experiment with 10
UF processors, 10 UC processors, and 10 PU processors should provide the performance of a
cluster with 34.24 UF processors. We used these factors to normalize the number of processors.
Figure 4 shows the speedup of Blast execution on various numbers of testbed processors in different
deployment settings versus the execution on one processor at UF. A sequential execution on one
UF processor resource that took 43 hours and 6 minutes was reduced to 1 hour and 42 minutes
using Hadoop on 15 instances (30 processors) of the UF cloud, a 25.4-fold
speedup.

It was reduced to 1 hour and 29 minutes using Hadoop on five instances in each of the three sites
(30 processors), a 29-fold speedup. Overall, the performance difference between a virtual cluster
deployed in a single cloud provider and a virtual cluster deployed in three distinct cloud providers
interconnected across a WAN through a VN is minimal for Blast executed with either Hadoop or
MPI. Also, comparison with “ideal” performance (assuming perfect parallelization — that is,
where N CPU clusters would provide N-fold speedup relative to sequential execution) shows that
the application parallelizes well.
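The normalization and speedup arithmetic above can be checked with a short calculation (site names and factors taken from the text):

```python
# Reproducing the arithmetic from the text: sequential Blast performance on a
# UF processor is the baseline, so 1 UC processor counts as 1.184 UF
# processors and 1 PU processor as 1.24 UF processors.

FACTORS = {"UF": 1.0, "UC": 1.184, "PU": 1.24}

def normalized_processors(counts: dict) -> float:
    """Express a mixed-site processor count in UF-equivalent processors."""
    return sum(FACTORS[site] * n for site, n in counts.items())

def speedup(sequential_minutes: float, parallel_minutes: float) -> float:
    return sequential_minutes / parallel_minutes

# 10 processors at each site behave like 34.24 UF processors, as stated.
assert abs(normalized_processors({"UF": 10, "UC": 10, "PU": 10}) - 34.24) < 1e-9
# 43 h 6 min sequential vs 1 h 42 min on 30 UF processors: a 25.4-fold speedup.
assert round(speedup(43 * 60 + 6, 1 * 60 + 42), 1) == 25.4
# ...and vs 1 h 29 min on the three-site cluster: about a 29-fold speedup.
assert round(speedup(43 * 60 + 6, 1 * 60 + 29)) == 29
```

The numbers confirm the text's claim: the three-site virtual cluster (29-fold) slightly outperforms the single-site one (25.4-fold) at the same processor count.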
In the data presented, we refer only to the VMs used to create the application platform
and not to those additional ones used to run VRs. Running those routers (one per site) constitutes
an additional cost in resource usage. This cost is relatively small and depends on network traffic,
as detailed elsewhere. We can further amortize this cost by sharing the router with other cloud
users (the provider could offer it as another service) or running it in one of the compute nodes.
Our experiments aimed to study the feasibility of executing a parallel application across
multiple cloud providers. In this context, our two main objectives were to demonstrate that end
users can deploy a sky computing environment with full control, and that the environment
performs well enough to execute a real-world application. We’ve successfully combined open
source and readily available cloud (Nimbus toolkit) and VN (ViNe) technologies to let users
launch virtual clusters with nodes that are automatically configured and connected through
overlays. The observed impact of network virtualization overheads was low, and we could
sustain the performance of a single-site cluster using a cluster across three sites. This illustrates
sky computing's potential: even when the necessary resources are unavailable in a single
cloud, we can use multiple clouds to obtain the required computation power.

CHAPTER-3
GRID AND CLOUD COMPUTING
Evolution of Distributed computing: Scalable computing over the Internet – Technologies for
network based systems – clusters of cooperative computers - Grid computing Infrastructures –
cloud computing - service oriented architecture – Introduction to Grid Architecture and standards
– Elements of Grid – Overview of Grid Architecture.
DISTRIBUTED COMPUTING
“Distributed computing consists of multiple autonomous computers that communicate through
a computer network.” “Distributed computing utilizes a network of many computers, each
accomplishing a portion of an overall task, to achieve a computational result much more quickly
than with a single computer.” “Distributed computing is any computing that involves multiple
computers, remote from each other, that each have a role in a computation problem or information
processing.”

Fig 5: Distribution of Computing

A distributed system is one in which hardware or software components located at networked
computers communicate and coordinate their actions only by message passing. In the term
distributed computing, the word distributed means spread out across space; thus, distributed
computing is an activity performed on a spatially distributed system. These networked
computers may be in the same room, on the same campus, in the same country, or on different
continents.
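The message-passing model described above can be sketched in Python; here two threads stand in for networked computers, each with its own mailbox, coordinating only by exchanging messages (the task format and names are illustrative):

```python
import queue
import threading

def worker(inbox, outbox):
    # The worker coordinates with the requester only through
    # messages: it receives a task and sends back a result.
    op, value = inbox.get()
    if op == "square":
        outbox.put(value * value)

def remote_square(value):
    # Each component has its own message queue; no state is shared
    # between requester and worker except the messages themselves.
    inbox, outbox = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(inbox, outbox))
    t.start()
    inbox.put(("square", value))    # request message
    result = outbox.get()           # reply message
    t.join()
    return result

print(remote_square(7))  # -> 49
```

In a real distributed system the queues would be replaced by network channels, but the coordination pattern is the same.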

Characteristics:

● Resource Sharing

● Openness

● Concurrency

● Scalability

● Fault Tolerance

● Transparency

Architecture:

● Client-server

● 3-tier architecture

● N-tier architecture

● Loose coupling or tight coupling

● Peer-to-peer

● Space based

APPLICATIONS OF DISTRIBUTED COMPUTING

 Database Management System


 Distributed computing using mobile agents
 Local intranet
 Internet (World Wide Web)
 Java Remote Method Invocation (RMI)

Distributed computing using mobile agents:

Fig 6: Distributed Computing Using Mobile Programs

Mobile agents can wander around a network, using free resources for their own computation.

Local intranet
A portion of the Internet that is separately administered and supports internal sharing of
resources (file/storage systems and printers) is called a local intranet.

Fig 7: Local Intranet


Internet (World Wide Web)

The Internet is a global system of interconnected computer networks that use the
standardized Internet Protocol Suite.

Fig 8: Messages over the Internet

Java Remote Method Invocation (RMI)

Embedded in the Java language:

 Object variant of remote procedure call
 Adds naming compared with RPC (Remote Procedure Call)
 Restricted to Java environments
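RMI itself is restricted to Java, but the core idea of invoking a remote object as if it were local can be sketched with Python's standard xmlrpc modules (the add function and addresses are illustrative, not part of RMI itself):

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b

# Bind to port 0 so the OS picks any free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The proxy makes a remote call look like a local method call,
# which is the core idea behind RMI.
proxy = ServerProxy("http://127.0.0.1:%d" % port)
result = proxy.add(2, 3)
print(result)  # -> 5
server.shutdown()
```

Java RMI adds to this picture a registry for naming remote objects and the ability to pass serialized Java objects, which the XML-RPC sketch does not model.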

Fig 9: Java RMI

GRID COMPUTING

Grid computing is a form of distributed computing whereby a "super and virtual computer" is
composed of a cluster of networked, loosely coupled computers acting in concert to perform very
large tasks. Grid computing (Foster and Kesselman, 1999) is a growing technology that facilitates
the execution of large-scale, resource-intensive applications on geographically distributed
computing resources. It facilitates flexible, secure, coordinated large-scale resource sharing among
dynamic collections of individuals, institutions, and resources, and enables communities ("virtual
organizations") to share geographically distributed resources as they pursue common goals.

Criteria for a grid

 Coordinates resources that are not subject to centralized control.


 Uses standard, open, general-purpose protocols and interfaces
 Delivers nontrivial qualities of service.

Benefits

 Exploit Underutilized resources


 Resource load Balancing
 Virtualizes resources across an enterprise
 Data Grids, Compute Grids
 Enable collaboration for virtual organizations

Grid Applications

Data and computationally intensive applications

This technology has been applied to computationally intensive scientific, mathematical, and
academic problems such as drug discovery, economic forecasting, and seismic analysis, as well as
back-office data processing in support of e-commerce.

A chemist may utilize hundreds of processors to screen thousands of compounds per hour.
Teams of engineers worldwide pool resources to analyze terabytes of structural data.

Meteorologists seek to visualize and analyze petabytes of climate data with enormous
computational demands.

Resource sharing

Computers, storage, sensors, and networks.

Sharing is always conditional: issues of trust, policy, negotiation, and payment apply.

Coordinated problem solving

Distributed data analysis, computation, and collaboration.


Grid Topologies

Intragrid

Local grid within an organization

Trust based on personal contracts

Extragrid

Resources of a consortium of organizations

connected through a (Virtual) Private Network

Trust based on Business to Business contracts

Intergrid

Global sharing of resources through the internet

Trust based on certification

COMPUTATIONAL GRID

“A computational grid is a hardware and software infrastructure that provides dependable,


consistent, pervasive, and inexpensive access to high-end computational capabilities.”

”The Grid: Blueprint for a New Computing Infrastructure”, Kesselman & Foster

Example: Science Grid (US Department of Energy).

DATA GRID

 A data grid is a grid computing system that deals with data — the controlled sharing and
management of large amounts of distributed data.
 Data Grid is the storage component of a grid environment. Scientific and engineering
applications require access to large amounts of data, and often this data is widely
distributed. A data grid provides seamless access to the local or remote data required to
complete compute intensive calculations.

Example:

Biomedical Informatics Research Network (BIRN),

Southern California Earthquake Center (SCEC).

METHODS OF GRID COMPUTING

 Distributed Supercomputing
 High-Throughput Computing
 On-Demand Computing
 Data-Intensive Computing
 Collaborative Computing
 Logistical Networking

Distributed Supercomputing

 Combining multiple high-capacity resources on a computational grid into a single, virtual


distributed supercomputer.
 Tackle problems that cannot be solved on a single system.

High-Throughput Computing
Uses the grid to schedule large numbers of loosely coupled or independent tasks, with the goal of
putting unused processor cycles to work.
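The scheduling of many loosely coupled tasks can be illustrated with a local worker pool standing in for grid nodes; the tasks below are illustrative placeholders for independent jobs:

```python
from concurrent.futures import ThreadPoolExecutor

def independent_task(n):
    # Stand-in for a loosely coupled job (e.g. screening one
    # compound, rendering one frame): no task depends on another.
    return sum(i * i for i in range(n))

tasks = [10, 100, 1000, 10000]

# The pool dispatches each independent task to whichever worker
# is idle, mimicking cycle scavenging across grid nodes.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(independent_task, tasks))

print(results)
```

Because the tasks share no state, throughput scales simply with the number of available workers, which is exactly the property high-throughput computing exploits.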

On-Demand Computing

 Uses grid capabilities to meet short-term requirements for resources that are not locally
accessible.
 Models real-time computing demands.

Collaborative Computing

● Concerned primarily with enabling and enhancing human-to-human interactions.

● Applications are often structured in terms of a virtual shared space.

Data-Intensive Computing

 The focus is on synthesizing new information from data that is maintained in


geographically distributed repositories, digital libraries, and databases.
 Particularly useful for distributed data mining.
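A minimal sketch of this kind of synthesis is a map-reduce-style word count, assuming three in-memory lists stand in for geographically distributed repositories:

```python
from collections import Counter

# Three lists stand in for distributed repositories; in a real
# grid each would live at a different site.
repositories = [
    ["grid", "data", "grid"],
    ["data", "mining"],
    ["grid", "mining", "mining"],
]

# Map phase: each site summarizes its own local data.
local_counts = [Counter(words) for words in repositories]

# Reduce phase: merge the partial results into a global view,
# synthesizing new information from the distributed sources.
global_counts = Counter()
for c in local_counts:
    global_counts.update(c)

print(dict(global_counts))
```

Only the small partial summaries cross site boundaries, not the raw data, which is what makes this pattern attractive when repositories are large and remote.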

Logistical Networking

● Logistical networks focus on exposing storage resources inside networks by optimizing


the global scheduling of data transport, and data storage.

● Contrasts with traditional networking, which does not explicitly model storage resources
in the network.

GRID ARCHITECTURE

The Hourglass Model

Focus on architecture issues

 Propose set of core services as basic infrastructure


 Used to construct high-level, domain-specific solutions (diverse)

Design principles
● Keep participation cost low

● Enable local control

● Support for adaptation

● “IP hourglass” model


Grid is the storage component of a grid environment. Scientific and engineering applications
require access to large amounts of data, and often this data is widely distributed. A grid provides
seamless access to the local or remote data required to complete compute intensive calculations.
This technology has been applied to computationally-intensive scientific, mathematical, and
academic problems like drug discovery, economic forecasting, and seismic analysis back office
data processing in support of e-commerce.Grid is a growing technology that facilitates the
executions of large-scale resource intensive applications on geographically distributed computing
resources. Facilitates flexible, secure, coordinated large scale resource sharing among dynamic
collections of individuals, institutions, and resource Enable communities (“virtual organizations”)
to share geographically distributed resources as they pursue common goals

LAYERED GRID ARCHITECTURE

“Coordinating multiple resources”: ubiquitous infrastructure services, application-specific
distributed services.

Fig 10: Internet Protocol Architecture

“Sharing single resources”: negotiating access, controlling use.

“Talking to things”: communication (Internet protocols) and security.

“Controlling things locally”: access to, and control of, resources.


DATA GRID ARCHITECTURE

App: Discipline-Specific Data Grid Application

Collective (generic): Replica catalog, replica management, co-allocation, certificate authorities,
metadata catalogs.

Resource: Access to data, access to computers, access to network performance data

Connect: Communication, service discovery, authentication, authorization, delegation

Fabric: Storage systems, clusters, networks, network caches

SIMULATION TOOLS

● GridSim – job scheduling

● SimGrid – single client multiserver scheduling

● Bricks – scheduling

● GangSim- Ganglia VO

● OptoSim – Data Grid Simulations

● G3S – Grid Security Services Simulator – security services

GridSim is a Java-based toolkit for modeling and simulation of distributed resource management
and scheduling in conventional Grid environments.

GridSim is based on SimJava, a general purpose discrete-event simulation package implemented


in Java.

All components in GridSim communicate with each other through message-passing operations
defined by SimJava.


Salient Features of GridSim

● It allows modeling of heterogeneous types of resources.

● Resources can be modeled operating under space- or time-shared mode.

● Resource capability can be defined in the form of MIPS (Million Instructions Per
Second) benchmark ratings.

● Resources can be located in any time zone.

● Weekends and holidays can be mapped depending on resource’s local time to model non-
Grid (local) workload.

● Resources can be booked for advance reservation.

● Applications with different parallel application models can be simulated.

● Application tasks can be heterogeneous and they can be CPU or I/O intensive.

● There is no limit on the number of application jobs that can be submitted to a resource.

● Multiple user entities can submit tasks for execution simultaneously in the same resource,
which may be time-shared or space-shared. This feature helps in building schedulers that
can use different market-driven economic models for selecting services competitively.

● Network speed between resources can be specified.
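GridSim itself is a Java toolkit built on SimJava, but its core discrete-event idea can be sketched in a few lines of Python. This toy scheduler (all job and resource names and numbers are illustrative) assigns jobs, measured in million instructions (MI), to resources rated in MIPS, always choosing the resource that becomes free first:

```python
import heapq

def simulate(jobs, resources):
    """Toy discrete-event scheduler in the spirit of GridSim.

    jobs: list of (name, length_in_MI)
    resources: list of (name, MIPS)
    Each resource runs jobs serially; each job goes to the
    resource that is free earliest."""
    # Priority queue of (time_resource_becomes_free, name, mips).
    free_at = [(0.0, name, mips) for name, mips in resources]
    heapq.heapify(free_at)
    finish_times = {}
    for job, length in jobs:
        t, name, mips = heapq.heappop(free_at)
        done = t + length / mips          # execution time = MI / MIPS
        finish_times[job] = (name, done)
        heapq.heappush(free_at, (done, name, mips))
    return finish_times

jobs = [("j1", 400.0), ("j2", 200.0), ("j3", 200.0)]
resources = [("slow", 100.0), ("fast", 200.0)]
schedule = simulate(jobs, resources)
print(schedule)
```

GridSim adds much more on top of this skeleton (time-shared mode, time zones, advance reservation, background workload), but the event-queue mechanism is the same.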

CHAPTER-4

SERVICE ORIENTED ARCHITECTURE

A method of design, deployment, and management of both applications and the software infrastructure
where:

All software is organized into business services that are network accessible and executable.

Service interfaces are based on public standards for interoperability.

CHARACTERISTICS OF SOA

● Quality of service, security and performance are specified.

● Software infrastructure is responsible for managing.

● Services are cataloged and discoverable.

● Data are cataloged and discoverable.

● Protocols use only industry standards.

WHAT IS A SERVICE

● A Service is a reusable component.

● A Service changes business data from one state to another.

● A Service is the only way data is accessed.

● If you can describe a component in WSDL, it is a Service.
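These properties can be sketched as a small Python class, assuming a hypothetical account service; the business data is private to the service, and every state change goes through the service interface:

```python
class AccountService:
    """Illustrative SOA-style service: the balances are hidden
    behind the interface, and calling the service is the only
    way the business data changes state or is read."""

    def __init__(self):
        self._balances = {}          # business data, private to the service

    def open_account(self, account_id):
        self._balances[account_id] = 0

    def deposit(self, account_id, amount):
        # The service moves the data from one valid state to another.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balances[account_id] += amount
        return self._balances[account_id]

    def balance(self, account_id):
        # Reads also go through the service interface.
        return self._balances[account_id]

svc = AccountService()
svc.open_account("a1")
svc.deposit("a1", 50)
print(svc.balance("a1"))  # -> 50
```

In an actual SOA deployment this interface would be published (for example, described in WSDL) so that any network client could invoke it, but the contract idea is the same.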

WHY GETTING SOA WILL BE DIFFICULT

Managing for Projects

● Software: 1 - 4 years

● Hardware: 3 - 5 years;

● Communications: 1 - 3 years;

● Project Managers: 2 - 4 years;

● Reliable funding: 1 - 4 years;

● User turnover: 30%/year;

● Security risks: 1 minute or less.

Managing for SOA

● Data: forever.

● Infrastructure: 10+ years.

WHY MANAGING BUSINESS SYSTEMS IS DIFFICULT

● The 40 million lines of code in Windows XP are unknowable.

● Exhaustively testing an application (3 million lines) requires >10^15 tests.

● The probability of correct data entry for a supply item is <65%.

● There are >100 formats that identify a person in the DoD.

● Output per office worker: >30 e-messages per day.

HOW TO VIEW ORGANIZING FOR SOA

Fig 11: Organizing for SOA

SOA MUST REFLECT TIMING

Fig 12: Timing of SOA

SOA Must Reflect Conflicting Interests

Fig 13: Conflicting Interests of SOA

ORGANIZATION OF SERVICES

1) Infrastructure Services

2) Data Services

3) Security Services

4) Computing Services

5) Communication Services

6) Application Services

ORGANIZATION OF INFRASTRUCTURE SERVICES

Fig 14: Infrastructure Services

ORGANIZATION OF DATA SERVICES

Fig 15: Data Services

Data Interoperability Policies

● Data are an enterprise resource.

● Single-point entry of unique data.

● Enterprise certification of all data definitions.

● Data stewardship defines data custodians.

● Zero defects at point of entry.

● Deconflict data at source, not at higher levels.

● Data aggregations come from source data, not from reports.


Data Concepts
Data Element Definition

Text associated with a unique data element within a data dictionary that describes
the data element, gives it a specific meaning, and differentiates it from other data
elements. A definition is precise, concise, non-circular, and unambiguous.

Data Element Registry

A label kept by a registration authority that describes a unique meaning and
representation of data elements, including registration identifiers, definitions,
names, value domains, syntax, ontology, and metadata attributes.
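A minimal sketch of such a registry, with hypothetical element names and identifiers, could look like this:

```python
from dataclasses import dataclass

@dataclass
class DataElement:
    # One registry entry: registration identifier, name, precise
    # definition, and the value domain the element may take.
    identifier: str
    name: str
    definition: str
    value_domain: list

class DataElementRegistry:
    """Illustrative registration authority: every element is
    registered under a unique identifier, exactly once."""

    def __init__(self):
        self._entries = {}

    def register(self, element):
        if element.identifier in self._entries:
            raise ValueError("identifier already registered")
        self._entries[element.identifier] = element

    def lookup(self, identifier):
        return self._entries[identifier]

registry = DataElementRegistry()
registry.register(DataElement(
    identifier="DE-0001",
    name="person_gender_code",
    definition="Code identifying the gender recorded for a person.",
    value_domain=["M", "F", "X"],
))
print(registry.lookup("DE-0001").name)
```

Real registries (for example, ISO/IEC 11179-style metadata registries) also record syntax, ontology, and stewardship metadata, which this sketch omits.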

Data and Services Deployment Principles

● Data, services and applications belong to the Enterprise.

● Information is a strategic asset.

● Data and applications cannot be coupled to each other.

● Interfaces must be independent of implementation.

● Data must be visible outside of the applications.

● Semantics and syntax are defined by a community of interest.

● Data must be understandable and trusted.

ORGANIZATION OF SECURITY SERVICES

Fig 16: Security Services

Security Services

Conduct Attack/Event Response

● Ensure timely detection and appropriate response to attacks.

● Manage measures required to minimize the network’s vulnerability.


Secure Information Exchanges

● Secure information exchanges that occur on the network with a level of
protection that is matched to the risk of compromise.

Provide Authorization and Non-Repudiation Services

● Identify and confirm a user's authorization to access the network.

ORGANIZATION OF COMPUTING SERVICES

Fig 17: Computing Services

Computing Services

Provide Adaptable Hosting Environments

● Global facilities for hosting to the “edge”.

● Virtual environments for data centers.

Distributed Computing Infrastructure

● Data storage and shared spaces for information sharing.

Shared Computing Infrastructure Resources

● Access shared resources regardless of access device.

ORGANIZATION OF COMMUNICATION SERVICES

Fig 18: Communication Services

Communication Services

Provide Information Transport

● Transport information, data and services anywhere.

● Ensure transport between end-user devices and servers.

● Expand the infrastructure for on-demand capacity.

ORGANIZATION OF APPLICATION SERVICES

Fig 19: Application Services


Application Services and Tools

Provide Common End User Interface Tools

Application generators, test suites, error identification, application components and standard
utilities.

Common end-user Interface Tools

E-mail, collaboration tools, information dashboards, Intranet portals, etc.

SOA PROTOCOLS

● Universal Description, Discovery, and Integration (UDDI) defines the publication
and discovery of web service implementations.

● The Web Services Description Language (WSDL) is an XML-based language that
defines Web Services.

● SOAP (originally the Simple Object Access Protocol) is a key SOA messaging
protocol in which a network node (the client) sends a request to another node (the server).

● The Lightweight Directory Access Protocol (LDAP) is a protocol for querying and
modifying directory services.

● Extract, Transform, and Load (ETL) is a process of moving data from a legacy system
and loading it into an SOA application.
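The three ETL stages can be sketched in a few lines of Python, assuming a hypothetical legacy format of pipe-delimited "name|amount" strings:

```python
def extract(legacy_rows):
    # Extract: pull raw records from the legacy source,
    # discarding blank lines and stray whitespace.
    return [row.strip() for row in legacy_rows if row.strip()]

def transform(rows):
    # Transform: reshape legacy "name|amount" strings into the
    # records the target application expects.
    records = []
    for row in rows:
        name, amount = row.split("|")
        records.append({"name": name, "amount": float(amount)})
    return records

def load(records, target):
    # Load: write the cleaned records into the target store.
    target.extend(records)
    return len(records)

legacy = ["alice|10.5", "  ", "bob|3.0"]
warehouse = []
loaded = load(transform(extract(legacy)), warehouse)
print(loaded, warehouse)
```

Production ETL tools add scheduling, error handling, and incremental loading, but they follow this same extract-transform-load pipeline.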

CONCLUSION
Clouds provide the components for novel types of IT systems or novel implementations
of familiar IT system architectures. Sky computing refers to such systems and their use, in
particular combined clouds capable of providing environments, workflows, enterprise IT, and
other capabilities as a service.
The design and management of combined clouds face challenges and need fundamental,
system-oriented advances. This is a new area for IT research, and it is essential for standards
and the next generation of IT business.
Sky computing creates large-scale distributed infrastructures. Our approach relies on
Nimbus for resource management, contextualization, and fast cluster instantiation; ViNe for
all-to-all connectivity; and Hadoop for dynamic cluster extension. It provides both infrastructure
and application elasticity.
Sky computing is an emerging computing model in which resources from multiple cloud
providers are leveraged to create large-scale distributed infrastructures. These infrastructures
provide resources to execute computations requiring large computational power, such as
scientific software. Establishing a sky computing system is challenging due to differences among
providers in terms of hardware, resource management, and connectivity. Furthermore,
scalability, balanced distribution of computation, and measures to recover from faults are
essential for applications to achieve good performance. This work shows how resources across
two experimental projects, the FutureGrid experimental testbed in the United States and
Grid'5000 (an infrastructure for large-scale parallel and distributed computing research
composed of nine sites in France), can be combined and used to support large-scale distributed
experiments. Several open source technologies are integrated to address these challenges.

REFERENCES

[1] Jose Fortes, Advanced Computing and Information Systems Lab and NSF Center for
Autonomic Computing.
[2] Katarzyna Keahey, Mauricio Tsugawa, Andrea Matsunaga, and Jose A. B. Fortes, Nimbus
paper, 2009; P. Singhala, D. N. Shah, and B. Patel, Temperature Control Using Fuzzy Logic,
January 2014.
[3] Harmeet Kaur and Kamal Gupta, 2013, International Journal of Scientific Research
Engineering & Technology (IJSRET).
[4] Neha Mishra, Ritu Yadav, and Saurabh Maheshwari, 2014, International Journal on
Computational Sciences & Applications (IJCSA), Vol. 4.
[5] Sky Computing: Exploring the Aggregated Cloud Resources, Part I, by Andre Monteiro,
Joaquim Sousa Pinto, Claudio Teixeira, and Tiago Batista.
[6] Sky Computing: When Multiple Clouds Become One, Jose Fortes, Advanced Computing and
Information Systems Lab and NSF Center for Autonomic Computing.
[7] Architecturing a Sky Computing Platform, Dana Petcu, Ciprian Craciun, Marian Neagul,
and Silviu Panica; Abha Tewari et al., (IJCSIT) International Journal of Computer Science and
Information Technologies, Vol. 6 (4), 2015, 3861-3864.
