
UNIT – 3 SYLLABUS :

Parallel and Distributed Programming Paradigms – Map Reduce, Twister and Iterative Map
Reduce – CGL Map Reduce – Programming models for Aneka – Hadoop Library from Apache
– Mapping Applications – Programming Support – Google App Engine, Amazon AWS – Cloud
Software Environments – Eucalyptus, Open Nebula, Open Stack, CloudSim – SAP Labs – EMC –
Salesforce – VMware.

Parallel Computing:

➢ Parallel computing is also called parallel processing.


➢ There are multiple processors in parallel computing. Each of them performs the computations
assigned to it.
➢ In other words, in parallel computing, multiple calculations are performed simultaneously.
(Figure: examples (a) and (b) illustrate distributed computing, while (c) illustrates parallel computing.)

The systems that support parallel computing can have a shared memory or distributed memory.
➢ In shared memory systems, all the processors share the memory.
➢ In distributed memory systems, memory is divided among the processors.
➢ There are multiple advantages to parallel computing.
➢ As there are multiple processors working simultaneously, it increases the CPU utilization and
improves the performance.
➢ Moreover, failure in one processor does not affect the functionality of other processors.
➢ Therefore, parallel computing provides reliability.
➢ On the other hand, adding more processors is costly.
➢ Furthermore, if one processor depends on the instructions or results of another, it can introduce
latency.
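As a small illustration of this idea, the following is a hedged Java sketch (the class and variable names are our own, purely illustrative): it splits a single summation across several worker threads that all share the same memory, which is the shared memory style of parallel computing described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    public static void main(String[] args) throws Exception {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = 1;

        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        int chunk = data.length / workers;

        List<Future<Long>> parts = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int start = w * chunk;
            final int end = (w == workers - 1) ? data.length : start + chunk;
            // Each task runs on its own core and works on its own slice of the shared array.
            parts.add(pool.submit(() -> {
                long sum = 0;
                for (int i = start; i < end; i++) sum += data[i];
                return sum;
            }));
        }

        long total = 0;
        for (Future<Long> f : parts) total += f.get(); // combine the partial results
        pool.shutdown();
        System.out.println("Parallel sum = " + total);
    }
}
```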

What is Distributed Computing?


➢ Distributed computing divides a single task between multiple computers.
➢ Each computer can communicate with others via the network.
➢ All computers work together to achieve a common goal.
➢ Thus, they all work as a single entity.
➢ A computer in the distributed system is a node while a collection of nodes is a cluster

➢ There are multiple advantages of using distributed computing.

➢ It allows scalability and makes it easier to share resources. It also helps to perform
computation tasks efficiently.
➢ On the other hand, it is difficult to develop distributed systems.
➢ Moreover, there can be network issues.
Difference Between Parallel and Distributed Computing

Definition
➢ Parallel computing is a type of computation in which many calculations or execution of
processes are carried out simultaneously.
➢ Whereas, a distributed system is a system whose components are located on different
networked computers which communicate and coordinate their actions by passing messages to
one another.
➢ Thus, this is the fundamental difference between parallel and distributed computing.

Number of computers

➢ The number of computers involved is a difference between parallel and distributed computing.
➢ Parallel computing occurs in a single computer whereas distributed computing involves
multiple computers.

Functionality

➢ In parallel computing, multiple processors execute multiple tasks at the same time.

➢ However, in distributed computing, multiple computers perform tasks at the same time.

➢ Hence, this is another difference between parallel and distributed computing.

Memory
➢ Moreover, memory is a major difference between parallel and distributed computing.
➢ In parallel computing, the computer can have a shared memory or distributed memory.
➢ In distributed computing, each computer has its own memory.
Communication

➢ Also, one other difference between parallel and distributed computing is the method of
communication.
➢ In parallel computing, the processors communicate with each other using a bus.
➢ In distributed computing, computers communicate with each other via the network.

Usage

➢ Parallel computing helps to increase the performance of the system.


➢ In contrast, distributed computing allows scalability, sharing resources and helps to perform
computation tasks efficiently.
➢ So, this is also a difference between parallel and distributed computing.

What is MapReduce?

➢ MapReduce is a data processing tool which is used to process data in parallel in a distributed
form.
➢ It was developed in 2004, on the basis of a paper titled "MapReduce: Simplified Data Processing
on Large Clusters," published by Google.
➢ MapReduce is a paradigm which has two phases, the mapper phase and the reducer phase.
➢ In the Mapper, the input is given in the form of a key-value pair.
➢ The output of the Mapper is fed to the reducer as input.
➢ The reducer runs only after the Mapper is over. The reducer too takes input in key-value format,
and the output of reducer is the final output.
Steps in Map Reduce

o The map takes data in the form of <key, value> pairs and returns a list of <key, value> pairs. The keys will not
be unique in this case.
o Using the output of Map, sort and shuffle are applied by the Hadoop architecture. This sort and
shuffle acts on the list of <key, value> pairs and sends out unique keys and a list of values
associated with each unique key, as <key, list(values)>.
o The output of sort and shuffle is sent to the reducer phase. The reducer performs a defined function
on the list of values for each unique key, and the final output <key, value> is stored/displayed.
Sort and Shuffle

➢ The sort and shuffle occur on the output of Mapper and before the reducer.
➢ When the Mapper task is complete, the results are sorted by key, partitioned if there are multiple
reducers, and then written to disk.
➢ Using the input from each Mapper <k2,v2>, we collect all the values for each unique key k2.
➢ This output from the shuffle phase in the form of <k2, list(v2)> is sent as input to reducer phase.
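As a rough illustration of the <k2, list(v2)> grouping described above, here is a minimal in-memory word-count sketch in plain Java (not the Hadoop API; all names are illustrative) showing the map, sort/shuffle, and reduce phases in sequence:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MiniMapReduce {
    public static void main(String[] args) {
        List<String> input = Arrays.asList("the cat", "the dog", "the cat sat");

        // Map phase: emit <word, 1> for every word (keys are not unique here).
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : input)
            for (String word : line.split(" "))
                mapped.add(new AbstractMap.SimpleEntry<>(word, 1));

        // Sort/shuffle phase: group all values belonging to the same key,
        // producing <k2, list(v2)> exactly as described above.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapped)
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());

        // Reduce phase: apply a defined function (here, a sum) to each key's list of values.
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            System.out.println(e.getKey() + "\t" + sum);
        }
    }
}
```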

Usage of MapReduce

o It can be used in various applications like document clustering, distributed sorting, and web link-
graph reversal.
o It can be used for distributed pattern-based searching.
o We can also use MapReduce in machine learning.
o It was used by Google to regenerate Google's index of the World Wide Web.
o It can be used in multiple computing environments such as multi-cluster, multi-core, and
mobile environment.
TWISTER ARCHITECTURE

➢ Twister is designed to efficiently support iterative MapReduce computations.


➢ To achieve this flexibility, it reads data from the local disks of the worker nodes and handles the
intermediate data in the distributed memory of the worker nodes.

The messaging infrastructure in Twister is called the broker network, and it is responsible for performing
data transfer using publish/subscribe messaging.

Twister has three main entities:


1. Client-side driver, responsible for driving the entire MapReduce computation.
2. Twister daemon, running on every worker node.
3. The broker network.

Access Data

1. To access input data for map tasks, it either reads data from the local disks of the worker nodes, or
2. receives data directly via the broker network.
They keep all data read as files, and having data as native files allows Twister to pass data directly
to any executable.
Additionally, they allow tools to perform typical file operations like

(i) create directories, (ii) delete directories, (iii) distribute input files across worker nodes, (iv)
copy a set of resources/input files to all worker nodes, (v) collect output files from the worker
nodes to a given location, and (vi) create a partition file for a given set of data that is distributed
across the worker nodes.

Intermediate Data

The intermediate data are stored in the distributed memory of the worker nodes. Keeping the
map output in distributed memory enhances the speed of the computation by sending the
output of the map directly from this memory to the reducers.

Messaging

The use of a publish/subscribe messaging infrastructure improves the efficiency of the Twister

runtime. It uses the scalable NaradaBrokering messaging infrastructure to connect different broker
networks and reduce the load on any one of them.

Fault Tolerance

There are three assumptions for providing fault tolerance for iterative MapReduce:
(i) failure of the master node is rare, and no support is provided for that.
(ii) Independent of the Twister runtime, the communication network can be made fault tolerant.
(iii) the data is replicated among the nodes of the computation infrastructure. Based on these
assumptions, Twister tries to handle failures of map/reduce tasks, daemons, and worker node
failures.

Why Iterative
MapReduce frameworks like Hadoop and Dryad have been very successful in fulfilling the
need to analyze huge files and compute data-intensive problems.
➢ Although they take care of many problems, many data analysis techniques require
iterative computations,
➢ including PageRank, HITS (Hypertext-Induced Topic Search), recursive relational
queries, clustering, neural-network analysis, social network analysis, and network
traffic analysis.
These techniques have a common trait: data are processed iteratively until the computation
satisfies a convergence or stopping condition.
➢ Most iterative algorithms run a computation step and then combine its output with the
previous output to generate the input for the next iteration.
➢ This type of program terminates only when a fixed output is reached, i.e. the result
does not change from one iteration to another.
➢ The MapReduce framework does not directly support these iterative data analysis
applications.
➢ Instead, programmers must implement iterative programs by manually issuing
multiple MapReduce jobs and orchestrating their execution using a driver program,
in which the data flow takes the form of a directed acyclic graph of operators. These
platforms lack built-in support for iterative programs; a sketch of such a driver loop is shown below.
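The following is a minimal sketch of such a driver program, assuming the standard Hadoop Job API; the data set paths, the iteration limit, and the hasConverged() check are hypothetical placeholders, and a real application would plug in its own mapper and reducer classes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IterativeDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String input = "data-0";   // hypothetical initial data set in HDFS
        int iteration = 0;

        // Plain MapReduce has no loop construct, so the driver itself keeps
        // submitting jobs, feeding each job's output into the next one,
        // until an application-specific stopping condition is satisfied.
        while (true) {
            String output = "data-" + (iteration + 1);
            Job job = Job.getInstance(conf, "iteration-" + iteration);
            job.setJarByClass(IterativeDriver.class);
            job.setMapperClass(Mapper.class);    // identity mapper; a real app supplies its own
            job.setReducerClass(Reducer.class);  // identity reducer; a real app supplies its own
            FileInputFormat.addInputPath(job, new Path(input));
            FileOutputFormat.setOutputPath(job, new Path(output));
            job.waitForCompletion(true);

            iteration++;
            if (hasConverged(conf, input, output) || iteration >= 30) break;
            input = output;   // this iteration's output becomes the next iteration's input
        }
    }

    // Placeholder convergence test: compare the previous and current outputs and
    // stop when the result no longer changes from one iteration to another.
    static boolean hasConverged(Configuration conf, String previous, String current) {
        return false;
    }
}
```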

Aneka in Cloud Computing

o Aneka includes an extensible set of APIs associated with programming models like
MapReduce.
o These APIs support different cloud models like a private, public, or hybrid Cloud.
o Manjrasoft focuses on creating innovative software technologies to simplify the development
and deployment of private or public cloud applications.
o Its product, Aneka, plays the role of an application platform as a service for cloud
computing.
o In brief:
o Aneka is a software platform for developing cloud computing applications.
o In Aneka, cloud applications are executed.
o Aneka is a pure PaaS solution for cloud computing.
o Aneka is a cloud middleware product.
o Aneka can be deployed over a network of computers, a multicore server, a data center, a
virtual cloud infrastructure, or a combination thereof.
The services of the Aneka container can be classified into three major categories:

o Fabric Services o Foundation Services o Application Services

1. Fabric Services:

Fabric Services define the lowest level of the software stack that represents the Aneka container.
They provide access to the resource-provisioning subsystem and to the monitoring features implemented in
Aneka.

2. Foundation Services:

Foundation Services are the core services of the Aneka middleware. They are concerned with the
logical management of the distributed system built on top of the infrastructure and provide
ancillary services for delivering applications.

3. Application Services:

Application services manage the execution of applications and constitute a layer that varies
according to the specific programming model used to develop distributed applications on top of
Aneka.

There are mainly two major components in the Aneka technology:

The SDK (Software Development Kit) includes the Application Programming Interface (API)
and tools needed for the rapid development of applications. The Aneka API supports three popular
cloud programming models: Tasks, Threads and MapReduce;

A runtime engine and platform for managing the deployment and execution of applications on a
private or public cloud.

One of the notable features of Aneka PaaS is its support for provisioning private cloud resources
from desktops, clusters and virtual data centers using VMware and Citrix XenServer, and public cloud
resources such as Windows Azure, Amazon EC2, and the GoGrid cloud service.

Aneka's potential as a Platform as a Service has been successfully harnessed by its users and
customers in several areas, including engineering, life sciences, education, and business
intelligence.
Architecture of Aneka

➢ Aneka is a platform and framework for developing distributed applications on the Cloud.
➢ It can use desktop PCs and their spare CPU cycles on demand, in addition to a heterogeneous network of
servers or datacenters.

➢ It can be a public cloud available to anyone via the Internet or a private cloud formed by
nodes with restricted access.
➢ An Aneka-based computing cloud is a collection of physical and virtualized resources
connected via a network, either the Internet or a private intranet. Aneka provides a rich set of
APIs for developers to transparently exploit such resources and express the business logic of
applications using preferred programming abstractions.

➢ System administrators can leverage a collection of tools to monitor and control the deployed
infrastructure

➢ Each resource hosts an instance of the Aneka container, which represents the runtime environment
where distributed applications are executed.
➢ The container provides the basic management features of a single node and takes advantage
of all the other functions of its hosting services.
➢ Services are divided into fabric, foundation, and execution services.
➢ Foundation services identify the core system of the Aneka middleware, which provides a set of
infrastructure features to enable Aneka containers to perform specific tasks.
➢ Fabric services interact directly with nodes through the Platform Abstraction Layer (PAL)
and perform hardware profiling and dynamic resource provisioning.
➢ Execution services deal directly with scheduling and executing applications in the Cloud.
➢ One of the key features of Aneka is its ability to provide a variety of ways to express
distributed applications by offering different programming models;
➢ Execution services are mostly concerned with providing middleware with the
implementation of these models.
➢ Additional services such as persistence and security are transversal to the entire stack of services
hosted by the container.
➢ At the application level, a set of different components and tools are provided to

o Simplify the development of applications (SDKs),


o Port existing applications to the Cloud, and
o Monitor and manage Aneka clouds.
An Aneka-based cloud is formed by interconnected resources that are dynamically modified
according to user needs using resource virtualization or additional CPU cycles for desktop
machines.

A common deployment of Aneka is as follows. If the deployment identifies a private
cloud, all resources are in-house, for example, within the enterprise.

This deployment is enhanced by connecting publicly available on-demand resources or by


interacting with several other public clouds that provide computing resources connected over the
Internet.

HADOOP LIBRARY FROM APACHE :

Hadoop is an Apache open-source framework written in Java that allows distributed processing of
large datasets across clusters of computers using simple programming models.
The Hadoop framework application works in an environment that
provides distributed storage and computation across clusters of computers.
Hadoop is designed to scale up from a single server to thousands of machines, each offering local
computation and storage.

Hadoop Architecture

At its core, Hadoop has two major layers, namely −

• Processing/Computation layer (MapReduce)
• Storage layer (Hadoop Distributed File System)

MapReduce

MapReduce is a parallel programming model for writing distributed applications, devised at Google
for efficient processing of large amounts of data (multi-terabyte datasets)
on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
The MapReduce program runs on Hadoop, which is an Apache open-source framework.
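To see how the mapper and reducer phases look in code, here is a minimal word-count sketch that closely follows the standard Hadoop MapReduce API (the class name WordCount is illustrative; the input and output paths are passed on the command line):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper phase: emits <word, 1> for every word in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer phase: receives <word, list(counts)> after sort and shuffle
    // and emits the total count for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```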

Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) is based on the Google File System (GFS) and
provides a distributed file system that is designed to run on commodity hardware.
It has many similarities with existing distributed file systems. However, the differences from other
distributed file systems are significant.
It is highly fault-tolerant and is designed to be deployed on low-cost hardware.
It provides high throughput access to application data and is suitable for applications having large
datasets.
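To give a concrete flavour of how an application interacts with HDFS, here is a hedged sketch using Hadoop's Java FileSystem API; the directory and file names are placeholders, and the NameNode address is taken from the cluster configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS (e.g. an hdfs:// NameNode URI) from the cluster configuration.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path dir = new Path("/user/demo");          // placeholder directory
        if (!fs.exists(dir)) fs.mkdirs(dir);

        // Write a small file; HDFS transparently splits large files into blocks and replicates them.
        try (FSDataOutputStream out = fs.create(new Path(dir, "hello.txt"))) {
            out.writeUTF("Hello HDFS");
        }

        // Copy a local file into the distributed file system.
        fs.copyFromLocalFile(new Path("input.txt"), new Path(dir, "input.txt"));

        fs.close();
    }
}
```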
Apart from the above-mentioned two core components, Hadoop framework also includes the
following two modules −
• Hadoop Common − These are Java libraries and utilities required by other Hadoop
modules.
• Hadoop YARN − This is a framework for job scheduling and cluster resource management.

How Does Hadoop Work?

It is quite expensive to build bigger servers with heavy configurations that handle large-scale
processing; as an alternative,
you can tie together many commodity computers, each with a single CPU, as a single functional
distributed system, and practically, the clustered machines can read the dataset in parallel and
provide a much higher throughput.
Moreover, this is cheaper than one high-end server. So the first motivational factor behind
using Hadoop is that it runs across clustered, low-cost machines.
Hadoop runs code across a cluster of computers. This process includes the following core tasks that
Hadoop performs −
• Data is initially divided into directories and files. Files are divided into uniform sized blocks
of 128M and 64M (preferably 128M).
• These files are then distributed across various cluster nodes for further processing.
• HDFS, being on top of the local file system, supervises the processing.
• Blocks are replicated for handling hardware failure.
• Checking that the code was executed successfully.
• Performing the sort that takes place between the map and reduce stages.
• Sending the sorted data to a certain computer. Writing the debugging logs for each job.
Advantages of Hadoop

• Hadoop framework allows the user to quickly write and test distributed systems. It is
efficient, and it automatically distributes the data and work across the machines and, in turn,
utilizes the underlying parallelism of the CPU cores.
• Hadoop does not rely on hardware to provide fault-tolerance and high availability (FTHA),
rather Hadoop library itself has been designed to detect and handle failures at the
application layer.
• Servers can be added or removed from the cluster dynamically and Hadoop continues to
operate without interruption.
• Another big advantage of Hadoop is that apart from being open source, it is compatible with
all platforms since it is Java based.

Application Mapping Definition

➢ Application mapping refers to the process of identifying and mapping interactions and
relationships between applications and the underlying infrastructure.
➢ An application, or network map, visualizes the devices on a network and how they are
related.
➢ It gives users a sense of how the network performs in order to run analysis and avoid data
bottlenecks.
➢ For containerized applications, it depicts the dynamic connectivities and interactions
between the microservices.
What is Application Mapping?

➢ As enterprises grow, the number and complexity of applications grow as well.


➢ Application mapping helps IT teams track the interactions and relationships between
applications, software, and supporting hardware.
➢ In the past, companies mapped out interdependencies between apps using extensive
spreadsheets and manual audits of application code.
➢ Today, however, companies can rely on an application mapping tool that automatically
discovers and visualizes interactions for IT teams.
➢ Popular application mapping tools include configuration management database – CMDB
application mapping or UCMDB application mapping.
➢ Some application delivery controllers also integrate application mapping software.

Application mapping includes the following techniques:

• SNMP-Based Maps — Simple Network Management Protocol (SNMP) monitors the health
of computer and network equipment such as routers. An SNMP-based map uses data from
router and switch management information bases (MIBs).

• Active Probing — Creates a map with data from packets that report IP router and switch
forwarding paths to the destination address. The maps are used to find “peering links” between
Internet Service Providers (ISPs). The peering links allow ISPs to exchange customer traffic.

• Route Analytics — Creates a map by passively listening to layer 3 protocol exchanges


between routers. This data facilitates real-time network monitoring and routing diagnostics.

What are the Benefits of Application Mapping?

Application mapping diagrams can be helpful for the following benefits:

• Visibility – locate where exactly applications are running and plan accordingly for system
failures

• Application health – understand the health of entire application instead of analyzing


individual infrastructure silos

• Quick troubleshooting – pinpoint faulty devices or software components in seconds by


conveniently tracing connections on the app map, rather than sifting through the entire
infrastructure
How are Application Maps Used in Networking?

IT personnel use app maps to conceptualize the relationships between devices and transport
layers that provide network services. Using the application map, IT can monitor network
statuses, identify data bottlenecks, and troubleshoot when necessary.

How are Application Maps Used in DevOps?

Application owners and operations team use app maps to conceptualize the relationships between
software components and application services. Using the application map, DevOps team can
monitor application health, identify security policy breaches, and troubleshoot when necessary.

What is an Application Mapping Example?

An application map (see image below) provides visual insights into inter-app communications in
a container-based microservices application deployment. It captures the complex relationships of
containers. An application map can graph the latency, connections, and throughput information
of microservice relationships.
GOOGLE APP ENGINE :
➢ A scalable runtime environment, Google App Engine is mostly used to run Web
applications.
➢ These applications scale dynamically as demand changes over time because of Google’s vast computing
infrastructure.
➢ Because it offers a secure execution environment in addition to a number of services, App
Engine makes it easier to develop scalable and high-performance Web apps.
➢ Google’s applications will scale up and down in response to shifting demand.
➢ Cron tasks, communications, scalable data stores, work queues, and in-memory caching
are some of these services.
After creating a Cloud account, you may start building your app using, for example:
• The Go template/HTML package
• Python-based webapp2 with Jinja2
• PHP and Cloud SQL
• Java with Maven

➢ The app engine runs the programs on various servers while "sandboxing" them.

➢ The app engine allows the program to use more resources in order to handle increased
demands.

➢ The app engine powers programs like Snapchat, Rovio, and Khan Academy.

Features of App Engine

Runtimes and Languages

To create an application for App Engine, you can use Go, Java, PHP, or Python. You can develop
and test an app locally using the SDK’s deployment toolkit. Each language’s SDK and runtime
are unique. Your program is run in a:
• Java Runtime Environment version 7
• Python runtime environment version 2.7
• PHP runtime’s PHP 5.4 environment
• Go runtime 1.2 environment
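As a flavour of what an App Engine application looks like in the Java runtime mentioned above, here is a minimal, hypothetical servlet sketch (the class name and message are illustrative); App Engine standard Java apps are packaged as servlets declared in the app's deployment descriptor and deployed with the SDK toolkit.

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// A minimal request handler; App Engine routes incoming HTTP requests
// to servlets mapped in the application's web.xml.
public class HelloAppEngineServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("text/plain");
        resp.getWriter().println("Hello from Google App Engine");
    }
}
```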

Generally Usable Features

➢ These are protected by the service-level agreement and deprecation policy of the app
engine. The implementation of such a feature is often stable, and any changes made to it
are backward-compatible.
➢ These include communications, process management, computing, data storage, retrieval,
and search, as well as app configuration and management.
➢ Features like the HRD migration tool, Google Cloud SQL, logs, datastore, dedicated
Memcached, blob store, Memcached, and search are included in the categories of data
storage, retrieval, and search.

Features in Preview

➢ In a later iteration of the app engine, these functions will undoubtedly be made broadly
accessible.
➢ However, because they are in the preview, their implementation may change in ways that
are backward-incompatible.
➢ Sockets, MapReduce, and the Google Cloud Storage Client Library are a few of them.

Experimental Features

➢ These might or might not be made broadly accessible in the next app engine updates.
They might be changed in ways that are irreconcilable with the past.
➢ The “trusted tester” features, however, are only accessible to a limited user base and
require registration in order to utilize them.
➢ The experimental features include Prospective Search, Page Speed, OpenID,
Datastore Admin/Backup/Restore, Task Queue Tagging, MapReduce, the Task Queue
REST API, OAuth, and app metrics analytics.

Third-Party Services

➢ As Google provides documentation and helper libraries to expand the capabilities of the
app engine platform, your app can perform tasks that are not built into the core product
you are familiar with as app engine.
➢ To do this, Google collaborates with other organizations. Along with the helper libraries,
the partners frequently provide exclusive deals to app engine users
Advantages of Google App Engine :
The Google App Engine has a lot of benefits that can help you advance your app ideas. This
comprises:

1. Infrastructure for Security: The Internet infrastructure that Google uses is arguably the
safest in the entire world. Since the application data and code are hosted on extremely secure
servers, there has rarely been any kind of illegal access to date.
2. Faster Time to Market: For every organization, getting a product or service to market
quickly is crucial. When it comes to quickly releasing the product, encouraging the
development and maintenance of an app is essential. A firm can grow swiftly with Google
Cloud App Engine’s assistance.
3. Quick to Start: You don’t need to spend a lot of time prototyping or deploying the app to
users because there is no hardware or product to buy and maintain.
4. Easy to Use: The tools that you need to create, test, launch, and update the applications are
included in Google App Engine (GAE).
5. Rich set of APIs & Services: A number of built-in APIs and services in Google App Engine
enable developers to create strong, feature-rich apps.
6. Scalability: This is one of the deciding variables for the success of any software. When using
the Google app engine to construct apps, you may access technologies like GFS, Big Table,
and others that Google uses to build its own apps.
7. Performance and Reliability: Among international brands, Google ranks among the top
ones. Therefore, you must bear that in mind while talking about performance and reliability.
8. Cost Savings: To administer your servers, you don’t need to employ engineers or even do it
yourself. The money you save might be put toward developing other areas of your company.
9. Platform Independence: Since the app engine platform only has a few dependencies, you
can easily relocate all of your data to another environment.
What is AWS?
o AWS stands for Amazon Web Services. The AWS service is provided by Amazon, which uses

distributed IT infrastructure to provide different IT resources available on demand.


o It provides different services such as infrastructure as a service (IaaS), platform as a service
(PaaS) and packaged software as a service (SaaS).
o Amazon launched AWS, a cloud computing platform, to allow different organizations to take
advantage of reliable IT infrastructure.

Uses of AWS

o A small manufacturing organization can use its expertise to expand its business by leaving
its IT management to AWS.
o A large enterprise spread across the globe can utilize AWS to deliver training to its
distributed workforce.
o An architecture consulting company can use AWS for high-compute rendering of
construction prototypes.
o A media company can use AWS to provide different types of content such as e-books or
audio files to users worldwide.

Pay-As-You-Go

Based on the concept of Pay-As-You-Go, AWS provides the services to the customers.

AWS provides services to customers when required without any prior commitment or upfront
investment.

Pay-As-You-Go enables the customers to procure services from AWS, such as:

o Computing
o Programming models
o Database storage
o Networking
Advantages of AWS

1) Flexibility

o We can get more time for core business tasks due to the instant availability of new features
and services in AWS.
o It provides effortless hosting of legacy applications. AWS does not require learning new
technologies, and migrating applications to AWS provides advanced computing and
efficient storage.
o AWS also offers a choice that whether we want to run the applications and services together
or not. We can also choose to run a part of the IT infrastructure in AWS and the remaining
part in data centres.

2) Cost-effectiveness

AWS requires no upfront investment, long-term commitment, and minimum expense when
compared to traditional IT infrastructure that requires a huge investment.
3) Scalability/Elasticity

➢ Through AWS auto-scaling and elastic load balancing, resources are automatically scaled up
or down when demand increases or decreases, respectively.
➢ AWS techniques are ideal for handling unpredictable or very high loads.
➢ Due to this reason, organizations enjoy the benefits of reduced cost and increased user
satisfaction.

4) Security

o AWS provides end-to-end security and privacy to customers.


o AWS has a virtual infrastructure that offers optimum availability while managing full
privacy and isolation of their operations.
o Customers can expect high-level of physical security because of Amazon's several years of
experience in designing, developing and maintaining large-scale IT operation centers.
o AWS ensures the three aspects of security, i.e., Confidentiality, integrity, and availability of
user's data.

EUCALYPTUS :

➢ The open-source cloud refers to software or applications publicly available for the users in the

cloud to set up for their own purpose or for their organization.

➢ Eucalyptus is a Linux-based open-source software architecture for cloud computing and also a

storage platform that implements Infrastructure as a Service (IaaS).

➢ It provides quick and efficient computing services.

➢ Eucalyptus was designed to provide services compatible with Amazon’s EC2 cloud and Simple

Storage Service(S3).
Eucalyptus Architecture :

➢ Eucalyptus CLIs can manage both Amazon Web Services and their own private instances.
➢ Clients have the freedom to transfer instances from Eucalyptus to Amazon Elastic Compute Cloud.
➢ The virtualization layer oversees the network, storage, and computing.
➢ Instances are isolated by hardware virtualization.
Important Features are:-
1. Images: A good example is the Eucalyptus Machine Image, which is a software module bundled
and uploaded to the Cloud.
2. Instances: When we run the image and utilize it, it becomes an instance.
3. Networking: It can be further subdivided into three modes: Static mode (allocates IP addresses to
instances), System mode (assigns a MAC address and attaches the instance’s network interface to
the physical network via the NC) and Managed mode (provides a local network of instances).
4. Access Control: It is utilized to apply restrictions to users.
5. Elastic Block Storage: It provides block-level storage volumes to attach to an instance.
6. Auto-scaling and Load Balancing: It is utilized to create or destroy instances or services
depending on requirements.
Components of Architecture

• Node Controller manages the lifecycle of instances running on each node. It interacts with the operating
system, hypervisor, and Cluster Controller. It controls the working of VM instances on the host
machine.
• Cluster Controller manages one or more Node Controllers and the Cloud Controller simultaneously. It
gathers information and schedules VM execution.
• Storage Controller (Walrus) allows the creation of snapshots of volumes and provides persistent block
storage over VM instances. The Walrus Storage Controller is a simple file storage system. It stores
images and snapshots, and stores and serves files using S3 (Simple Storage Service) APIs.
• Cloud Controller is the front-end for the entire architecture. It acts as a compliant Web Service to
client tools on one side and interacts with the rest of the components on the other side.

Operation Modes Of Eucalyptus

• Managed Mode:

Provides numerous security groups to users, as the network is large. Each security group is assigned a
set or a subset of IP addresses. Ingress rules are applied through the security groups specified
by the user. The network is isolated by VLAN between Cluster Controller and Node Controller.
Assigns two IP addresses on each virtual machine.

• Managed (No VLAN) Node:

The root user on the virtual machine can snoop into other virtual machines running on the
same network layer. It does not provide VM network isolation.

• System Mode:

Simplest of all modes, least number of features. A MAC address is assigned to a virtual
machine instance and attached to Node Controller’s bridge Ethernet device.

• Static Mode:
Similar to system mode but has more control over the assignment of IP address. MAC
address/IP address pair is mapped to static entry within the DHCP server. The next set of
MAC/IP addresses is mapped.
Advantages Of The Eucalyptus Cloud
1. Eucalyptus can be utilized to benefit both the eucalyptus private cloud and the eucalyptus
public cloud.
2. Examples of Amazon or Eucalyptus machine pictures can be run on both clouds.
3. Its API is completely similar to all the Amazon Web Services.
4. Eucalyptus can be utilized with DevOps apparatuses like Chef and Puppet.
5. Although it isn’t as popular yet, it has the potential to be an alternative to OpenStack and
CloudStack.
6. It is used to build hybrid, public and private clouds.
7. It allows users to deliver their own data centers into a private cloud and hence, extend the
services to other organizations.
OPEN NEBULA :
➢ OpenNebula is a free and open source software solution for building clouds and for data
centre virtualisation.
➢ It is based on open technologies and is distributed under the Apache License 2.
➢ OpenNebula has features for scalability, integration, security and accounting.
➢ It offers cloud users and administrators a choice of interfaces.
➢ OpenNebula is an open source platform for constructing virtualised private, public and
hybrid clouds.
➢ It is a simple yet feature-rich, flexible solution to build and manage data centre
virtualisation and enterprise clouds.
➢ So, with OpenNebula, virtual systems can be administered and monitored centrally on
different hypervisors and storage systems.
➢ When a component fails, OpenNebula takes care of the virtual instances on a different
host system.
➢ The integration and automation of an existing heterogeneous landscape is highly flexible
without further hardware investments.
Benefits of OpenNebula:
Its support for a plurality of hypervisors and its platform-independent architecture make OpenNebula
the ideal solution for heterogeneous computing centre environments.

The main advantages of OpenNebula are:

• It is 100 per cent open source and offers all the features in one edition.
• It provides control via the command line or Web interface, which is ideal for a variety of user
groups and needs.
• OpenNebula is available for all major Linux distributions, thus simplifying installation.
• The long-term use of OpenNebula in large scale production environments has proven its stability
and flexibility.
• OpenNebula is interoperable and supports OCCI (Open Cloud Computing Interface) and AWS
(Amazon Web Services).

Key features of OpenNebula :

➢ OpenNebula has features for scalability, integration, security and accounting.


➢ The developers also claim that it supports standardisation, interoperability and portability.
➢ It allows cloud users and administrators to choose from several cloud interfaces.
➢ Figure 1 shows the important features of OpenNebula.

Figure 1: Key features of OpenNebula


Why OpenNebula?

• Web interface or CLI – the choice is yours

By using the OpenNebula CLI or Web interface, you can keep track of activities at any time.
There is a central directory service through which you can add new users, and those users can
be individually entitled. Managing systems, configuring new virtual systems or even targeting
the right users and groups is easy in OpenNebula.

• Availability at all times

OpenNebula not only takes care of the initial provisioning, but the high availability of its
cloud environment is much better compared to other cloud solutions. Of course, the central
OpenNebula services can be configured for high availability, but this is not absolutely
necessary. All systems continue to operate in their original condition and are automatically
included in the restored availability of the control processes.

• Easy remote access

In virtual environments, one lacks the ability to directly access the system when there are
operational problems or issues with the device. Here, OpenNebula offers an easy solution —
using the browser, one can access the system console of the host system with a VNC
integrated server.

• Full control and monitoring

All host and guest systems are constantly monitored in OpenNebula, which keeps the host
and VM dashboards up to date at all times. Depending on the configuration, a virtual machine
is to be restarted in case of the host system failing or if migrating to a different system. If a
data store is used with parallel access, the systems can of course be moved, while in
operation, on to other hardware. The maintenance window can be minimised and can often be
completely avoided.

• Open standards

OpenNebula is 100 per cent open source under the Apache License. By supporting open
standards such as OCCI and a host of other open architecture, OpenNebula provides the
security, scalability and freedom of a reliable cloud solution without vendor lock-in, which
involves considerable support and follow-up costs.
Figure 2: OpenNebula architecture

Figure 3: OpenNebula components


OpenNebula architecture :
To control a VM’s life cycle, the OpenNebula core coordinates with the following three areas of
management:
1) Image and storage technologies — to prepare disk images
2) The network fabric — to provide the virtual network environment
3) Hypervisors — to create and control VMs
Components of OpenNebula
Based on the existing infrastructure, OpenNebula provides various services and resources. You
can view the components in Figure 3.

• APIs and interfaces: These are used to manage and monitor OpenNebula components. To
manage physical and virtual resources, they work as an interface.
• Users and groups: These support authentication, and authorise individual users and groups
with the individual permissions.
• Hosts and VM resources: These are a key aspect of a heterogeneous cloud that is managed
and monitored, e.g., Xen, VMware.
• Storage components: These are the basis for centralised or decentralised template
repositories.
• Network components: These can be managed flexibly. Naturally, there is support for VLANs
and Open vSwitch.

The front-end

• The machine that has OpenNebula installed on it is known as the front-end machine, which is
also responsible for executing OpenNebula services.
• The front-end needs to have access to the image repository and network connectivity to each
node.
• It requires Ruby 1.8.7 or above.
• OpenNebula’s services are listed below:
1. Management daemon (Oned) and scheduler (mm_sched)
2. Monitoring and accounting daemon (Onecctd)
3. Web interface server (Sunstone)
4. Cloud API servers (EC2 query or OCCI)

Virtualisation hosts
• To run the VMs, we require some physical machines, which are called hosts.
• The virtualisation sub-system is responsible for communicating with the hypervisor and
taking the required action for any node in the VM life cycle.
• During the installation, the admin account should be enabled to execute commands with root
privileges.

Storage

Data stores are used to handle the VM images, and each data store must be accessible by the
front-end, using any type of storage technology.

OpenNebula has three types of data stores:

• File data store – used to store the plain files (not disk images)

• Image data store – repository for images only


• System data store – used to hold the running VM images

• The image data store type depends on the storage technology used. There are three types of
image data stores available:

• File system – stores VM images in file formats

• LVM – reduces the overhead of having the file system in place; the LVM is used to store
virtual images instead of plain files

• Ceph – stores images using Ceph blocks

OpenNebula can handle multiple storage scenarios, either centralised or decentralised.

Networking :
There must be at least two physical networks configured in OpenNebula:

• Service network – to access the hosts to monitor and manage hypervisors, and to move VM
images.
• Instance network – to offer network connectivity between the VMs across the different hosts.
Whenever any VM gets launched, OpenNebula will connect its network interfaces to the
bridge described in the virtual network definition.

OpenNebula supports four types of networking modes:

• Bridged–where the VM is directly attached to the physical bridge in the hypervisor.


• VLAN–where the VMs are connected by using 802.1Q VLAN tagging.
• Open vSwitch–which is the same as VLAN but uses an open vSwitch instead of a Linux
bridge.
• VXLAN–which implements VLAN using the VXLAN protocol.

OPEN STACK :

Introduction to OpenStack

➢ OpenStack lets users deploy virtual machines and other instances that handle different tasks
for managing a cloud environment on the fly.

➢ It makes horizontal scaling easy, which means that tasks that benefit from running
concurrently can easily serve more or fewer users on the fly by just spinning up more
instances.

➢ For example, a mobile application that needs to communicate with a remote server might
be able to divide the work of communicating with each user across many different
instances, all communicating with one another but scaling quickly and easily as the
application gains more users.

➢ And most importantly, OpenStack is open source software, which means that anyone who
chooses to can access the source code, make any changes or modifications they need, and
freely share these changes back out to the community at large.

➢ It also means that OpenStack has the benefit of thousands of developers all over the world
working in tandem to develop the strongest, most robust, and most secure product that they
can.

How is OpenStack used in a cloud environment?

➢ The cloud is all about providing computing for end users in a remote environment, where
the actual software runs as a service on reliable and scalable servers rather than on each
end-user's computer.
➢ Cloud computing can refer to a lot of different things, but typically the industry talks about
running different items "as a service"—software, platforms, and infrastructure. OpenStack
falls into the latter category and is considered Infrastructure as a Service (IaaS).

➢ Providing infrastructure means that OpenStack makes it easy for users to quickly add new
instance, upon which other cloud components can run.

➢ Typically, the infrastructure then runs a "platform" upon which a developer can create
software applications that are delivered to the end users.

What are the components of OpenStack?

OpenStack is made up of many different moving parts. Because of its open nature, anyone can add
additional components to OpenStack to help it to meet their needs.

But the OpenStack community has collaboratively identified nine key components that are a part
of the "core" of OpenStack, which are distributed as a part of any OpenStack system and officially
maintained by the OpenStack community.

• Nova is the primary computing engine behind OpenStack. It is used for deploying and
managing large numbers of virtual machines and other instances to handle computing tasks.

• Swift is a storage system for objects and files. Rather than the traditional idea of referring to
files by their location on a disk drive, developers can instead refer to a unique identifier
referring to the file or piece of information and let OpenStack decide where to store this
information. This makes scaling easy, as developers don’t have to worry about the capacity
on a single system behind the software. It also allows the system, rather than the developer, to
worry about how best to make sure that data is backed up in case of the failure of a machine or
network connection.

• Cinder is a block storage component, which is more analogous to the traditional notion of a
computer being able to access specific locations on a disk drive. This more traditional way of
accessing files might be important in scenarios in which data access speed is the most
important consideration.

• Neutron provides the networking capability for OpenStack. It helps to ensure that each of the
components of an OpenStack deployment can communicate with one another quickly and
efficiently.

• Horizon is the dashboard behind OpenStack. It is the only graphical interface to OpenStack,
so for users wanting to give OpenStack a try, this may be the first component they actually
“see.” Developers can access all of the components of OpenStack individually through an
application programming interface (API), but the dashboard provides system administrators a
look at what is going on in the cloud, and to manage it as needed.

• Keystone provides identity services for OpenStack. It is essentially a central list of all of the
users of the OpenStack cloud, mapped against all of the services provided by the cloud, which
they have permission to use. It provides multiple means of access, meaning developers can
easily map their existing user access methods against Keystone.

• Glance provides image services to OpenStack. In this case, "images" refers to images (or
virtual copies) of hard disks. Glance allows these images to be used as templates when
deploying new virtual machine instances.

• Ceilometer provides telemetry services, which allow the cloud to provide billing services to
individual users of the cloud. It also keeps a verifiable count of each user’s system usage of
each of the various components of an OpenStack cloud. Think metering and usage reporting.

• Heat is the orchestration component of OpenStack, which allows developers to store the
requirements of a cloud application in a file that defines what resources are necessary for that
application. In this way, it helps to manage the infrastructure needed for a cloud service to run.
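Since every OpenStack component is reachable through a REST API, here is a hedged Java sketch of requesting an authentication token from Keystone; the endpoint URL, user name, and password are placeholders, and the JSON body follows the Identity v3 password-authentication format.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class KeystoneTokenExample {
    public static void main(String[] args) throws Exception {
        // Identity v3 password authentication request body (placeholder credentials).
        String body = "{ \"auth\": { \"identity\": { \"methods\": [\"password\"],"
                + " \"password\": { \"user\": { \"name\": \"demo\","
                + " \"domain\": { \"id\": \"default\" }, \"password\": \"secret\" } } } } }";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://controller:5000/v3/auth/tokens")) // placeholder endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // On success, Keystone returns the token in the X-Subject-Token header;
        // that token is then passed to Nova, Neutron, Glance, etc. on later calls.
        System.out.println("Status: " + response.statusCode());
        System.out.println("Token : " + response.headers()
                .firstValue("X-Subject-Token").orElse("<none>"));
    }
}
```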

CLOUD SIM :

➢ CloudSim is an open-source framework, which is used to simulate cloud computing

infrastructure and services.

➢ It is developed by the CLOUDS Lab organization and is written entirely in Java.

➢ It is used for modelling and simulating a cloud computing environment as a means for

evaluating a hypothesis prior to software development in order to reproduce tests and results.

➢ For example, if you were to deploy an application or a website on the cloud and wanted to

test the services and load that your product can handle and also tune its performance

to overcome bottlenecks before risking deployment, then such evaluations could be performed by simply
coding a simulation of that environment with the help of various flexible and scalable classes provided by the
CloudSim package, free of cost.

Benefits of Simulation over the Actual Deployment:

Following are the benefits of CloudSim:


• No capital investment involved. With a simulation tool like CloudSim there is no installation
or maintenance cost.
• Easy to use and Scalable. You can change the requirements such as adding or deleting
resources by changing just a few lines of code.
• Risks can be evaluated at an earlier stage. In Cloud Computing utilization of real testbeds
limits the experiments to the scale of the testbed and makes the reproduction of results an
extremely difficult undertaking. With simulation, you can test your product against test cases
and resolve issues before actual deployment without any limitations.
• No need for try-and-error approaches. Instead of relying on theoretical and imprecise
evaluations which can lead to inefficient service performance and revenue generation, you can
test your services in a repeatable and controlled environment free of cost with CloudSim.

Why use CloudSim?

Below are a few reasons to opt for CloudSim:

• Open source and free of cost, so it favours researchers/developers working in the field.
• Easy to download and set-up.
• It is more generalized and extensible to support modelling and experimentation.
• Does not require any high-specs computer to work on.
• Provides pre-defined allocation policies and utilization models for managing resources, and
allows implementation of user-defined algorithms as well.
• The documentation provides pre-coded examples for new developers to get familiar with the
basic classes and functions.
• Tackle bottlenecks before deployment to reduce risk, lower costs, increase performance, and
raise revenue.
CloudSim Architecture:

CloudSim Layered Architecture

CloudSim Core Simulation Engine provides interfaces for the management of resources such as
VM, memory and bandwidth of virtualized Datacenters.
CloudSim layer manages the creation and execution of core entities such as VMs, Cloudlets,
Hosts etc.
It also handles network-related execution along with the provisioning of resources and their
execution and management.
User Code is the layer controlled by the user. The developer can write the requirements of the
hardware specifications in this layer according to the scenario. Some of the most common
classes used during simulation are listed below, followed by a minimal example that ties them together:
• Datacenter: used for modelling the foundational hardware equipment of any cloud
environment, that is the Datacenter. This class provides methods to specify the functional
requirements of the Datacenter as well as methods to set the allocation policies of the VMs
etc.
• Host: this class executes actions related to management of virtual machines. It also defines
policies for provisioning memory and bandwidth to the virtual machines, as well as allocating
CPU cores to the virtual machines.
• VM: this class represents a virtual machine by providing data members defining a VM’s
bandwidth, RAM, mips (million instructions per second), size while also providing setter and
getter methods for these parameters.
• Cloudlet: a cloudlet class represents any task that is run on a VM, like a processing task, or a
memory access task, or a file updating task etc. It stores parameters defining the
characteristics of a task such as its length, size, mi (million instructions) and provides
methods similarly to VM class while also providing methods that define a task’s execution
time, status, cost and history.
• DatacenterBroker: is an entity acting on behalf of the user/customer. It is responsible for
functioning of VMs, including VM creation, management, destruction and submission of
cloudlets to the VM.
• CloudSim: this is the class responsible for initializing and starting the simulation
environment after all the necessary cloud entities have been defined and later stopping after
all the entities have been destroyed.
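The following is a minimal sketch that wires these classes together, modelled on the classic CloudSim examples; it assumes CloudSim 3.x, and the exact constructor signatures and parameter values (MIPS, RAM, costs, cloudlet length) may differ slightly between versions.

```java
import java.util.ArrayList;
import java.util.Calendar;
import java.util.LinkedList;
import java.util.List;

import org.cloudbus.cloudsim.Cloudlet;
import org.cloudbus.cloudsim.CloudletSchedulerTimeShared;
import org.cloudbus.cloudsim.Datacenter;
import org.cloudbus.cloudsim.DatacenterBroker;
import org.cloudbus.cloudsim.DatacenterCharacteristics;
import org.cloudbus.cloudsim.Host;
import org.cloudbus.cloudsim.Pe;
import org.cloudbus.cloudsim.Storage;
import org.cloudbus.cloudsim.UtilizationModelFull;
import org.cloudbus.cloudsim.Vm;
import org.cloudbus.cloudsim.VmAllocationPolicySimple;
import org.cloudbus.cloudsim.VmSchedulerTimeShared;
import org.cloudbus.cloudsim.core.CloudSim;
import org.cloudbus.cloudsim.provisioners.BwProvisionerSimple;
import org.cloudbus.cloudsim.provisioners.PeProvisionerSimple;
import org.cloudbus.cloudsim.provisioners.RamProvisionerSimple;

public class MinimalCloudSim {
    public static void main(String[] args) throws Exception {
        // 1. Initialize the CloudSim core simulation engine.
        CloudSim.init(1, Calendar.getInstance(), false);

        // 2. Model the hardware: one host with one processing element (PE).
        List<Pe> peList = new ArrayList<>();
        peList.add(new Pe(0, new PeProvisionerSimple(1000)));          // 1000 MIPS
        List<Host> hostList = new ArrayList<>();
        hostList.add(new Host(0, new RamProvisionerSimple(2048),
                new BwProvisionerSimple(10000), 1_000_000, peList,
                new VmSchedulerTimeShared(peList)));

        // 3. Create the Datacenter with its characteristics and VM allocation policy.
        DatacenterCharacteristics characteristics = new DatacenterCharacteristics(
                "x86", "Linux", "Xen", hostList, 10.0, 3.0, 0.05, 0.001, 0.0);
        new Datacenter("Datacenter_0", characteristics,
                new VmAllocationPolicySimple(hostList), new LinkedList<Storage>(), 0);

        // 4. Create a broker that acts on behalf of the user.
        DatacenterBroker broker = new DatacenterBroker("Broker_0");

        // 5. Define one VM and one cloudlet (task) and submit them to the broker.
        Vm vm = new Vm(0, broker.getId(), 1000, 1, 512, 1000, 10000,
                "Xen", new CloudletSchedulerTimeShared());
        Cloudlet cloudlet = new Cloudlet(0, 400_000, 1, 300, 300,
                new UtilizationModelFull(), new UtilizationModelFull(), new UtilizationModelFull());
        cloudlet.setUserId(broker.getId());

        List<Vm> vmList = new ArrayList<>();
        vmList.add(vm);
        List<Cloudlet> cloudletList = new ArrayList<>();
        cloudletList.add(cloudlet);
        broker.submitVmList(vmList);
        broker.submitCloudletList(cloudletList);

        // 6. Run the simulation and print the result of each finished cloudlet.
        CloudSim.startSimulation();
        CloudSim.stopSimulation();
        List<Cloudlet> finished = broker.getCloudletReceivedList();
        for (Cloudlet c : finished) {
            System.out.println("Cloudlet " + c.getCloudletId()
                    + " finished on VM " + c.getVmId()
                    + " in " + c.getActualCPUTime() + " time units");
        }
    }
}
```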

Features of CloudSim:

CloudSim provides support for simulation and modelling of:


1. Large scale virtualized Datacenters, servers and hosts.
2. Customizable policies for provisioning host to virtual machines.
3. Energy-aware computational resources.
4. Application containers and federated clouds (joining and management of multiple public
clouds).
5. Datacenter network topologies and message-passing applications.
6. Dynamic insertion of simulation entities with stop and resume of simulation.
7. User-defined allocation and provisioning policies.
Scope :

➢ With the flexibility and generalizability of the CloudSim framework,


➢ it is easy to model heavy cloud environments which would otherwise require
experimentation on paid computing infrastructures.
➢ Extensible capabilities of scaling the infrastructure and resources to fit any scenario helps
in fast and efficient research of several topics in cloud computing.
CloudSim has been used in several areas of research such as:
• Task Scheduling
• Green Computing
• Resource Provisioning
• Secure Log Forensics.

SAP Platform :
➢ SAP Cloud Platform (SCP) is a platform-as-a-service (PaaS) product that provides a
development and runtime environment for cloud applications.
➢ Based on SAP HANA in-memory database technology, and using open source and open
standards, SCP allows independent software vendors (ISVs), startups and developers to
create and test HANA-based cloud applications.
➢ According to SAP, SCP is primarily intended to allow organizations to extend existing
on-premises or cloud-based ERP applications with next-generation technology,
➢ Such as advanced analytics, blockchain or machine learning; build and deploy new
enterprise business cloud and mobile apps; integrate and connect enterprise applications
regardless of the application location or data source; and connect enterprise applications and
data to IoT.
➢ For example, SCP facilitates the integration of SAP S/4HANA Finance with cloud
applications like SAP Ariba or SAP SuccessFactors.
➢ It can also integrate these applications with non-SAP systems and data sources, including
social media sites and other vendors' enterprise applications.
➢ SCP is based on open standards and offers developers flexibility and control over which
clouds, frameworks and applications to deploy, according to SAP.
➢ SCP uses different development environments, including Cloud Foundry and Neo, and
provides a variety of programming languages.
SAP Cloud Platform licensing models

➢ SCP is available in two commercial models: subscription based and consumption


based.
➢ These options allow companies a flexible way to match SCP services with
organizational needs, according to SAP.
➢ Under the subscription model, customers get access to SAP Cloud Platform services
for a fixed price and defined time, and can use as much of the services as they want.
➢ This model allows organizations to protect their IT investments with predictable
costs as long as they are subscribed to the service.
➢ Under the consumption model, customers can buy SCP services through credits and
use them as they see fit.
➢ This setup allows companies to start and scale up development projects quickly
whenever business requirements change.
➢ The SCP credits are paid for up front, and a cloud credit balance is kept for all the
services used.

SAP Cloud Platform use cases

➢ Although the applications developed and running on SCP provide widely divergent
functions and benefits, they share a common characteristic of enabling business
digital transformation.
➢ A number of custom use cases are available on SCP, including:

➢ Building custom, SAP Fiori-like user experience (UX) apps for SAP S/4HANA.

➢ Automating employee onboarding processes through integrating SAP Business Suite


and SAP SuccessFactors.

➢ Creating mobile apps for field service workers.

➢ Building an employee recruitment travel and expense management application that


integrates SAP API Business Hub with SAP SuccessFactors and SAP Concur.

➢ There are also a number of early SCP customers who have implemented its services
and technology in production environments, according to SAP.
➢ For example, German robotics firm Kuka AG uses SAP Cloud Platform to connect
robotics in manufacturing processes.
➢ Mitsubishi Electric Europe incorporates IoT into its industrial automation technology
via SCP.
➢ Global healthcare company Aesculap developed an Apple iOS app on SCP that
manages and simplifies the use of sterile containers in surgeries.

SAP capabilities and services :

➢ SAP Cloud Platform provides a variety of services and capabilities.


➢ As of August 2018, SAP lists 19 capabilities that generally fall under data-based
services and analytics, emerging technologies, user-based activities, and application
development and deployment.
➢ Prominent capabilities include the following:

➢ Analytics, which allows you to embed advanced analytics into applications for real-time
results.

➢ DevOps, which simplifies application development and operations.

➢ Integration, which allows you to integrate on-premises and cloud applications.

➢ Mobile, which enables mobile app development.

➢ User Experience, which lets you develop personalized and simple user interactions.

SAP Cloud Platform SDK for iOS


➢ One of the key integration tools for developers in SCP is the SAP Cloud Platform
SDK for iOS.
➢ This option allows developers to build mobile apps for iPhones and iPads that
integrate data from back-end enterprise applications with the iOS front end.
➢ The SDK uses the Apple Swift open programming language.
➢ It also includes a library of prebuilt UX components and access to iOS device
capabilities such as Touch ID, location services and notifications.
SAP Cloud Platform vs. SAP HANA Enterprise Cloud

➢ Although SAP Cloud Platform shares a similar name with SAP HANA Enterprise
Cloud (HEC), the two platforms have different intents and purposes.
➢ Both are variations of HANA cloud technology, but the two products use different
service models.
➢ While SCP offers a PaaS tool intended for developing and running cloud-based
applications,
➢ HEC is an infrastructure-as-a-service (IaaS) tool that enables companies to run
SAP-based operations in a hosted environment.
➢ SAP hosts HEC applications in several data centers located around the world and
provides ongoing application support and management, including upgrades, backups,
patches, restoration and recovery, infrastructure monitoring and event detection.

VMWARE CLOUD SERVICES :

➢ VMware Cloud services are services that enable you to integrate, manage, and secure
applications on cloud resources.
➢ These services work for any cloud service using VMware and can help you centralize
the management and maintenance of hybrid or multi-cloud environments.
➢ VMware Cloud services enable you to determine how resources are used and where
workloads are deployed while applying a single operational model.
➢ This enables you to standardize security, reduce management complexity, and
improve your ROI.
➢ You can use VMware Cloud services with either public or private clouds.
➢ When integrating these services you do not need to re-architect applications or
convert data. This can help you simplify app modernization and ensure high
performance.
➢ VMware Cloud services are available in a variety of technologies provided as part of
a VMware Cloud subscription. This subscription offers a wide range of services.
➢ The following services are particularly helpful for monitoring and managing your
cloud environments:
• VMware Cloud on AWS
• Cloud Provider Metering
• vRealize Network Insight Cloud
• vRealize Log Insight
• vRealize Automation

VMware Cloud on AWS

➢ VMware Cloud is available as a standalone service. It is also available as an

integration with Amazon Web Services (AWS).

➢ This integration was developed jointly by AWS and VMware and applies VMware

services to AWS infrastructure.

➢ You can use this integration to extend on-premises or other cloud services to AWS.

➢ When you integrate VMware Cloud with AWS you gain access to a single-tenant

infrastructure built on Elastic Compute Cloud (Amazon EC2) instances.

➢ EC2 instances are optimized for high volume input/output operations and storage

with low-latency solid-state drives (SSDs) based on Non-Volatile Memory Express

(NVMe).

➢ This infrastructure supports up to 16 vSphere host clusters on bare metal

infrastructure. In your deployment you can control scaling with options for between

three to sixteen hosts on each cluster you operate.

➢ Additionally, VMware Cloud on AWS enables you to run the VMware Software-Defined
Data Center (SDDC) stack directly on your hosts.

➢ It does not require nested virtualization, making configuration and management


simpler.
➢ When migrating workloads you have access to cold (manual), VM template, and
vMotion (live) options.
