Dr. Shrirang Ambaji Kulkarni
CSE Department
MODULE 1 – Cloud Computing
Defining Cloud Computing
Cloud computing refers to applications and services that run on a distributed network
using virtualized resources and are accessed by common Internet protocols and networking
standards.
It is distinguished by the notion that resources are virtual and limitless and that details of
the physical systems on which software runs are abstracted from the user.
Common SaaS services include Dropbox, Salesforce, Cisco WebEx. PaaS services include
Google App Engine, Apache Stratos and OpenShift.
Some well-known IaaS services are Amazon Web Services (AWS), Cisco Metapod and
Microsoft Azure.
Distributed networking is a form of distributed computing in which the components of a
program and its data are spread across more than one computer.
The participating computers (nodes) exchange messages over the network and depend on
one another to complete the overall task.
The goal of a distributed network is to share resources, typically to accomplish a single or
similar goal.
Client/server computing is a type of distributed computing where one computer, a client,
requests data from the server, a primary computing center, which responds to the client
directly with the requested data, sometimes through an agent. Client/server distributed
networking is also popular in web-based computing.
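To make the request/response pattern concrete, here is a minimal sketch in Python using the standard socket library; the host, port, and message contents are illustrative assumptions, not part of the text above.

import socket

HOST, PORT = "127.0.0.1", 5000   # illustrative address for a local demo

def run_server():
    # The server (primary computing center) answers a single client request.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen()
        conn, _addr = srv.accept()
        with conn:
            request = conn.recv(1024)               # the client's request
            conn.sendall(b"data for: " + request)   # respond directly with the data

def run_client():
    # The client requests data and prints the server's response.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(b"report.csv")
        print(cli.recv(1024))

Run run_server() in one process and run_client() in another to see the two roles described above.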
Cloud computing
Enterprises with rapid growth and scaling needs may find it challenging to maintain their
own distributed network under the traditional client/server computing model.
Cloud computing applies distributed computing to deliver applications, storage, and
computing services as utilities over the Internet.
A cloud is a cluster of computers or servers that are closely connected to provide scalable,
high-capacity computing or related tasks.
A computer cluster is a set of computers that work together so that they can be viewed
as a single system.
In a Beowulf cluster, the application programs never see the computational nodes (also
called slave computers) but only interact with the "Master" which is a specific computer
handling the scheduling and management of the slaves.
In a typical implementation the Master has two network interfaces, one that
communicates with the private Beowulf network for the slaves, the other for the general
purpose network of the organization.
The slave computers typically have their own version of the same operating system, and
local memory and disk space.
However, the private slave network may also have a large, shared file server that
stores global persistent data, which the slaves access as needed.
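A hedged sketch of the master/slave scheduling pattern described above, written with the mpi4py library (an assumption; any message-passing library would do). The task list and chunking scheme are illustrative; launch with something like mpiexec -n 4 python beowulf_demo.py.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()     # 0 is the master; all other ranks are slave nodes
size = comm.Get_size()

if rank == 0:
    # The master schedules the work; application users only ever talk to it.
    tasks = list(range(100))
    chunks = [tasks[i::size - 1] for i in range(size - 1)]
    for worker in range(1, size):
        comm.send(chunks[worker - 1], dest=worker)
    results = [comm.recv(source=w) for w in range(1, size)]
    print("sum of squares =", sum(results))
else:
    # Each slave runs its own OS image, computes on its chunk locally,
    # and reports the partial result back to the master.
    work = comm.recv(source=0)
    comm.send(sum(x * x for x in work), dest=0)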
Virtualization is a process that allows for more efficient utilization of physical computer
hardware and is the foundation of cloud computing.
Virtualization uses software to create an abstraction layer over computer hardware that
allows the hardware elements of a single computer—processors, memory, storage and
more—to be divided into multiple virtual computers, commonly called virtual machines
(VMs).
Each VM runs its own operating system (OS) and behaves like an independent computer,
even though it is running on just a portion of the actual underlying computer hardware.
A hypervisor is the software layer that coordinates VMs. It serves as an interface between the
VM and the underlying physical hardware, ensuring that each has access to the physical
resources it needs to execute. It also ensures that the VMs don’t interfere with each other by
impinging on each other’s memory space or compute cycles.
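As a small illustration of a hypervisor coordinating VMs, the sketch below uses the libvirt Python bindings (assumed to be installed, with a local KVM/QEMU hypervisor) to list the VMs it is currently managing and the resources each has been given.

import libvirt

conn = libvirt.open("qemu:///system")        # connect to the local hypervisor
for dom in conn.listAllDomains():            # each libvirt "domain" is one VM
    state, max_mem, mem, vcpus, cpu_time = dom.info()
    print(f"{dom.name()}: {vcpus} vCPUs, {mem // 1024} MiB RAM, state={state}")
conn.close()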
Defining Cloud Computing
Cloud computing takes the technology, services, and applications that are similar to those on
the Internet and turns them into a self-service utility.
The use of the word “cloud” makes reference to the two essential concepts:
Abstraction: Cloud computing abstracts the details of system implementation from users and
developers.
Applications run on physical systems that aren’t specified, data is stored in locations that are
unknown, administration of systems is outsourced to others, and access by users is ubiquitous.
Virtualization: Cloud computing virtualizes systems by pooling and sharing resources.
Systems and storage can be provisioned as needed from a centralized infrastructure, costs are
assessed on a metered basis, multi-tenancy is enabled, and resources are scalable with agility.
In cloud computing, multitenancy means that multiple customers of a cloud vendor are using
the same computing resources.
Despite the fact that they share resources, cloud customers aren't aware of each other, and
their data is kept totally separate.
To help clarify how cloud computing has changed the nature of commercial system
deployment, consider these three examples:
Google: In the last decade, Google has built a worldwide network of datacenters to service
its search engine.
In doing so Google has captured a substantial portion of the world’s advertising revenue.
That revenue has enabled Google to offer free software to users based on that infrastructure
and has changed the market for user-facing software.
This is the classic Software as a Service (SaaS) model.
The first and best-known offerings are an extensive set of very popular applications that
Google offers to the general public.
These applications include Google Docs, Google Health, Picasa, Google Mail, Google Earth, and
many more.
Amazon Web Services: One of the most successful cloud-based businesses is Amazon Web
Services, which is an Infrastructure as a Service offering that lets you rent virtual computers on
Amazon’s own infrastructure.
Cloud Types
Most people separate cloud computing into two distinct sets of models:
Deployment models: This refers to the location and management of the cloud’s infrastructure.
Service models: This consists of the particular types of services that you can access on a cloud
computing platform.
The NIST model
The U.S. National Institute of Standards and Technology (NIST) has a set of working definitions
(http://csrc.nist.gov/groups/SNS/cloudcomputing/cloud-def-v15.doc) that separate cloud
computing into service models and deployment models.
Those models and their relationship to essential characteristics of cloud computing are shown
in Figure 1.1.
The NIST model originally did not require a cloud to use virtualization to pool resources, nor
did it absolutely require that a cloud support multi-tenancy in the earliest definitions of cloud
computing.
Multi-tenancy is the sharing of resources among two or more clients.
The latest version of the NIST definition does require that cloud computing networks use
virtualization and support multi-tenancy.
The Cloud Cube Model
The Open Group maintains an association called the Jericho Forum
(https://www.opengroup.org/jericho/index.htm) whose main focus is how to protect cloud
networks.
The group has an interesting model that attempts to categorize a cloud network based on four
dimensions.
The four dimensions of the Cloud Cube Model are shown in Figure 1.2 and listed here:
Physical location of the data: Internal (I) / External (E) determines your organization’s
boundaries.
Ownership: Proprietary (P) / Open (O) is a measure of not only the technology ownership,
but of interoperability, ease of data transfer, and degree of vendor application lock-in.
Security boundary: Perimeterised (Per) / De-perimeterised (D-p) is a measure of whether
the operation is inside or outside the security boundary or network firewall.
Sourcing: Insourced or Outsourced means whether the service is provided by the customer
or the service provider.
Taken together, the fourth dimension corresponds to two different states in the eight possible
cloud forms: Per (IP, IO, EP, EO) and D-p (IP, IO, EP, EO).
The sourcing dimension addresses the deliverer of the service.
What the Cloud Cube Model is meant to show is that the traditional notion of a network
boundary being the network’s firewall no longer applies in cloud computing.
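The eight cloud forms fall out mechanically from the first three dimensions, with sourcing then applied to each; below is a small Python sketch that enumerates them (the label strings are my own shorthand, not from the model's documentation).

from itertools import product

location  = ["I", "E"]       # Internal / External
ownership = ["P", "O"]       # Proprietary / Open
boundary  = ["Per", "D-p"]   # Perimeterised / De-perimeterised

for per, loc, own in product(boundary, location, ownership):
    print(f"{per}({loc}{own})")
# Prints the eight forms: Per(IP) Per(IO) Per(EP) Per(EO) D-p(IP) D-p(IO) D-p(EP) D-p(EO);
# each form can additionally be insourced or outsourced.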
Deployment models
A deployment model defines the purpose of the cloud and the nature of how the cloud is
located.
The NIST definition for the four deployment models is as follows:
Public cloud: The public cloud infrastructure is made available to the general public or a large
industry group and is owned by an organization selling cloud services.
Private cloud: The private cloud infrastructure is operated for the exclusive use of an
organization. The cloud may be managed by that organization or a third party. Private
clouds may be either on- or off-premises.
Hybrid cloud: A hybrid cloud combines multiple clouds (private, community, or public) where
those clouds retain their unique identities, but are bound together as a unit.
A hybrid cloud may offer standardized or proprietary access to data and applications, as well
as application portability.
Community cloud: A community cloud is one where the cloud has been organized to serve a
common function or purpose.
It may be for one organization or for several organizations, but they share common concerns
such as their mission, policies, security, regulatory compliance needs, and so on.
Service models
In the deployment model, different cloud types are an expression of the manner in which infrastructure
is deployed.
You can think of the cloud as the boundary where a client's network, management, and responsibilities end
and the cloud service provider's begin. As cloud computing has developed, different vendors offer clouds that have
different services associated with them.
The portfolio of services offered adds another set of definitions called the service model.
Three service types have been universally accepted:
Infrastructure as a Service: IaaS provides virtual machines, virtual storage, virtual
infrastructure, and other hardware assets as resources that clients can provision.
The IaaS service provider manages all the infrastructure, while the client is responsible for all
other aspects of the deployment.
This can include the operating system, applications, and user interactions with the system.
Platform as a Service: PaaS provides virtual machines, operating systems, applications,
services, development frameworks, transactions, and control structures.
The client can deploy its applications on the cloud infrastructure or use applications that were
programmed using languages and tools that are supported by the PaaS service provider.
The service provider manages the cloud infrastructure, the operating systems, and the
enabling software.
The client is responsible for installing and managing the application that it is deploying.
Software as a Service: SaaS is a complete operating environment with applications,
management, and the user interface.
In the SaaS model, the application is provided to the client through a thin client interface (a
browser, usually), and the customer’s responsibility begins and ends with entering and
managing its data and user interaction.
Everything from the application down to the infrastructure is the vendor’s responsibility.
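To ground the IaaS model, here is a hedged sketch of renting a virtual machine on Amazon EC2 with the boto3 library; it assumes AWS credentials are already configured, and the AMI ID and instance type are placeholders rather than recommendations.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",   # placeholder machine image ID
    InstanceType="t3.micro",           # placeholder instance size
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print("Provisioned instance:", instance_id)
# The provider manages the underlying infrastructure; everything from the
# guest operating system upward is now the client's responsibility.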
A native app, or native application, is a software application built in a specific programming
language, for the specific device platform, either iOS or Android.
A web app “is an application that is accessed via a web browser over a network such as the Internet.”
A web app provides additional functionality and interactivity. E.g. Facebook
[Web applications are websites with functionality and interactive elements]
API Management
APIs–Application Programming Interfaces–are the building blocks of modern technology and
business infrastructure.
At its most basic level, an API is a piece of software that allows two applications to talk to each
other.
API management is the process of developing, designing, monitoring, testing, securing, and
analyzing APIs for organizations. API management platforms provide a robust set of software and
processes with this functionality, hosted on premises, in the cloud, or a hybrid environment.
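As a minimal illustration of two applications talking through an API, the sketch below calls a hypothetical REST endpoint with Python's requests library; the URL, token, and field names are invented for the example.

import requests

resp = requests.get(
    "https://api.example.com/v1/orders",            # hypothetical endpoint
    headers={"Authorization": "Bearer <token>"},    # placeholder credential
    timeout=10,
)
resp.raise_for_status()        # surface 4xx/5xx errors instead of ignoring them
for order in resp.json():      # the API returns structured JSON the caller can use
    print(order["id"], order["status"])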
API management can be delivered on-premises, through the cloud, or using a hybrid
on-premises and SaaS (Software as a Service) approach.
The TeraGrid integrated high-performance computers, data resources and tools, and
experimental facilities. Resources included more than a petaflops of computing capability and
more than 30 petabytes of online and archival data storage, with rapid access and retrieval
over high-performance computer network connections. Researchers could also access more
than 100 discipline-specific databases.
petaflops
A unit of computing speed equal to one thousand million million (10^15) floating-point
operations per second.
The new Intel® Core™ i9-7980XE Extreme Edition Processor runs at about 4.3 GHz (faster if
overclocked) and thus should calculate to 1.3 teraflops (10^12 floating-point operations per second).
Petabyte
An extremely large unit of digital data; one petabyte is equal to 1,000 terabytes.
1 TB equals 1,000 gigabytes.
One gigabyte (GB) is about 1 billion bytes, or 1,024 megabytes (MB).
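A back-of-the-envelope check of the figures above; the value of 16 floating-point operations per core per cycle is my assumption about the SIMD/FMA width, not a number stated in the text.

cores           = 18        # Core i9-7980XE core count
clock_hz        = 4.3e9     # ~4.3 GHz
flops_per_cycle = 16        # assumed FLOPs per core per cycle

peak_flops = cores * clock_hz * flops_per_cycle
print(f"{peak_flops / 1e12:.1f} teraflops")   # ~1.2 teraflops, close to the 1.3 quoted

petabyte_bytes = 1_000 * 1_000 * 1e9          # 1 PB = 1,000 TB = 1,000,000 GB
print(f"{petabyte_bytes:.0e} bytes in a petabyte")   # 1e+15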
Peer-to-Peer Network Families
An example of a well-established distributed system is the client-server architecture. In this
scenario, client machines (PCs and workstations) are connected to a central server for
compute, e-mail, file access, and database applications.
The P2P architecture offers a distributed model of networked systems.
First, a P2P network is client-oriented instead of server-oriented.
P2P Systems
In a P2P system, every node acts as both a client and a server, providing part of the
system resources.
Peer machines are simply client computers connected to the Internet.
All client machines act autonomously to join or leave the system freely.
This implies that no master-slave relationship exists among the peers.
No central coordination or central database is needed.
In other words, no peer machine has a global view of the entire P2P system.
The system is self-organizing with distributed control.
Figure 1.17 shows the architecture of a P2P network at two abstraction levels. Initially, the
peers are totally unrelated.
Each peer machine joins or leaves the P2P network voluntarily.
Only the participating peers form the physical network at any time.
Unlike the cluster or grid, a P2P network does not use a dedicated interconnection network.
The physical network is simply an ad hoc network formed at various Internet domains
randomly using the TCP/IP and NAI protocols.
Thus, the physical network varies in size and topology dynamically due to the free
membership in the P2P network.
Overlay Networks
Data items or files are distributed in the participating peers.
Based on communication or file-sharing needs, the peer IDs form an overlay network at the
logical level.
This overlay is a virtual network formed by mapping each physical machine to its peer ID,
logically, through a virtual mapping as shown in Figure 1.17.
When a new peer joins the system, its peer ID is added as a node in the overlay network.
When an existing peer leaves the system, its peer ID is removed from the overlay network
automatically.
Therefore, it is the P2P overlay network that characterizes the logical connectivity among the
peers.
There are two types of overlay networks: unstructured and structured. An unstructured overlay
network is characterized by a random graph.
There is no fixed route to send messages or files among the nodes.
Often, flooding is applied to send a query to all nodes in an unstructured overlay, thus resulting
in heavy network traffic and nondeterministic search results.
Structured overlay networks follow certain connectivity topology and rules for inserting and
removing nodes (peer IDs) from the overlay graph.
Routing mechanisms are developed to take advantage of the structured overlays.
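A small sketch, under assumed parameters, of how query flooding behaves in an unstructured overlay modeled as a random graph; the node count, degree, and TTL are illustrative.

import random

def build_overlay(num_peers=10, degree=3):
    # Each peer links to a few random neighbors, giving an unstructured overlay.
    peers = {p: set() for p in range(num_peers)}
    for p in peers:
        for q in random.sample([x for x in peers if x != p], degree):
            peers[p].add(q)
            peers[q].add(p)          # overlay links are bidirectional
    return peers

def flood(overlay, start, ttl=3):
    # Forward the query to every neighbor until the TTL runs out; results are
    # nondeterministic and every forwarded copy adds network traffic.
    visited, frontier = {start}, {start}
    for _ in range(ttl):
        frontier = {n for p in frontier for n in overlay[p]} - visited
        visited |= frontier
    return visited

overlay = build_overlay()
print("query reached", len(flood(overlay, start=0)), "of", len(overlay), "peers")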
P2P Application Families
Based on application, P2P networks are classified into four groups, as shown in Table 1.5.
The first family is for distributed file sharing of digital contents (music, videos, etc.) on the P2P
network.
This includes many popular P2P networks such as Gnutella, Napster, and BitTorrent, among
others. Collaboration P2P networks include MSN or Skype chatting, instant messaging, and
collaborative design, among others.
The third family is for distributed P2P computing in specific applications.
For example, SETI@home provides 25 Tflops of distributed computing power, collectively, over
3 million Internet host machines.
Other P2P platforms include JXTA, .NET, and FightingAID@home.
Is .NET a P2P platform?
The .NET Framework and Visual Studio .NET make it easier for developers to write peer
applications by offering a rich UI client programming model, support for scalable Web services,
and easy-to-use networking classes, all within the same powerful development environment.
Particularly, as P2P systems require the active collaboration of the participants beyond their
selfish interest, many systems include methods designed to lure the most resourceful users
into broader participation, to provide an overall better service.
The methods devised to attract the contribution of users are unfortunately vulnerable to a
particular class of attacks: collusion.
Collusion is broadly defined as any malicious coordinated behavior of a group of users aimed
at gaining undeserved benefits, or at damaging (some) well behaved users.
Open networking describes a network that uses open standards and commodity hardware.
In other words, open networking is the ability of a computer system to be open in terms of
component compatibility (in both hardware and software), expandability, and extendibility.
P2P Computing Challenges
P2P computing faces three types of heterogeneity problems in hardware, software, and
network requirements.
There are too many hardware models and architectures to select from; incompatibility exists
between software and the OS; and different network connections and protocols make it too
complex to apply in real applications.
P2P performance is affected by routing efficiency and self-organization by participating peers.
Fault tolerance, failure management, and load balancing are other important issues in using
overlay networks.
Lack of trust among peers poses another problem. Peers are strangers to one another.
Security, privacy, and copyright violations are major worries for those in the industry in terms
of applying P2P technology in business applications.
Data stored on failed nodes can easily be lost; replicating data across multiple peers helps
guard against such loss.
Cloud Computing over the Internet
Gordon Bell, Jim Gray, and Alex Szalay [5] have advocated: “Computational science is changing
to be data-intensive.
In the future, working with large data sets will typically mean sending the computations
(programs) to the data, rather than copying the data to the workstations.”
This reflects the trend in IT of moving computing and data from desktops to large data centers,
where there is on-demand provision of software, hardware, and data as a service.
This data explosion has promoted the idea of cloud computing.
IBM, a major player in cloud computing, has defined it as follows: “A cloud is a pool of
virtualized computer resources. A cloud can host a variety of different workloads, including
batch-style backend jobs and interactive and user-facing applications.”
Several modern supercomputers such as the Tianhe-2 use Slurm, which arbitrates contention
for resources across the system.
Slurm is open source, Linux-based, very scalable, and can manage thousands of nodes in a
computer cluster with a sustained throughput of over 100,000 jobs per hour.
Application scalability: This refers to matching problem size scalability with machine size
scalability.
Problem size affects the size of the data set or the increase in workload.
Instead of increasing machine size, users can enlarge the problem size to enhance system
efficiency or cost-effectiveness.
Technology scalability: This refers to a system that can adapt to changes in building technologies,
such as the component and networking technologies discussed in Section 3.1. When
scaling a system design with new technology one must consider three aspects: time, space, and
heterogeneity. (1) Time refers to generation scalability. When changing to new-generation processors,
one must consider the impact on the motherboard, power supply, packaging and cooling,
and so forth. Based on past experience, most systems upgrade their commodity processors every
three to five years. (2) Space is related to packaging and energy concerns. Technology scalability
demands harmony and portability among suppliers. (3) Heterogeneity refers to the use of
hardware components or software packages from different vendors. Heterogeneity may limit the
scalability.
Scalability versus OS Image Count
In Figure 1.23, scalable performance is estimated against the multiplicity of OS images in
distributed systems deployed up to 2010.
Scalable performance implies that the system can achieve higher speed by adding more
processors or servers, enlarging the physical node’s memory size, extending the disk capacity,
or adding more I/O channels.
The OS image is counted by the number of independent OS images observed in a cluster, grid,
P2P network, or the cloud.
SMP and NUMA are included in the comparison.
An SMP (symmetric multiprocessor) server has a single system image, which could be a single
node in a large cluster.
By 2010 standards, the largest shared-memory SMP node was limited to a few hundred
processors.
The scalability of SMP systems is constrained primarily by packaging and the system
interconnect used.
NUMA (nonuniform memory access) machines are often made out of SMP nodes with
distributed, shared memory.
A NUMA machine can run with multiple operating systems, and can scale to a few thousand
processors communicating with the MPI library.
For example, a NUMA machine may have 2,048 processors running 32 SMP operating systems,
resulting in 32 OS images in the 2,048-processor NUMA system.
The cluster nodes can be either SMP servers or high-end machines that are loosely coupled
together.
Therefore, clusters have much higher scalability than NUMA machines.
The number of OS images in a cluster is based on the cluster nodes concurrently in use.
The cloud could be a virtualized cluster.
As of 2010, the largest cloud was able to scale up to a few thousand VMs.
The grid node could be a server cluster, or a mainframe, or a supercomputer, or an MPP.
Therefore, the number of OS images in a large grid structure could be hundreds or thousands
fewer than the total number of processors in the grid.
A P2P network can easily scale to millions of independent peer nodes, essentially desktop
machines.
At the most basic level, Amdahl's Law is a way of showing that unless a program (or part of a
program) is 100% efficient at using multiple CPU cores, you will receive less and less benefit
from adding more cores.
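Amdahl's Law is usually written as speedup = 1 / ((1 - p) + p / n), where p is the parallel fraction of the program and n the number of cores; a quick sketch with illustrative numbers shows the diminishing return.

def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

for cores in (2, 8, 64, 1024):
    print(cores, "cores ->", round(amdahl_speedup(0.95, cores), 1), "x")
# Even with 95% of the program parallelized, speedup levels off near 1 / 0.05 = 20x.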
Network Threats and Data Integrity
Clusters, grids, P2P networks, and clouds demand security and copyright protection if they are
to be accepted in today’s digital society.
This section introduces system vulnerability, network threats, defense countermeasures, and
copyright protection in distributed or cloud computing systems.
Threats to Systems and Networks
Network viruses have threatened many users in widespread attacks.
These incidents have created a worm epidemic by pulling down many routers and servers, and
are responsible for the loss of billions of dollars in business, government, and services.
Figure 1.25 summarizes various attack types and their potential damage to users.
As the figure shows, information leaks lead to a loss of confidentiality.
Loss of data integrity may be caused by user alteration, Trojan horses, and service spoofing
attacks.
A denial of service (DoS) results in a loss of system operation and Internet connections.
Lack of authentication or authorization leads to attackers’ illegitimate use of computing
resources.
Open resources such as data centers, P2P networks, and grid and cloud infrastructures
could become the next targets.
Users need to protect clusters, grids, clouds, and P2P systems.
Otherwise, users should not use or trust them for outsourced work.