
Dynamic Parallel Data Processing In Heterogeneous Cloud

Presented by: Anjali Sharma (7016854), Swati Khurana (7016880), Acharya Narendra Dev College

Cloud computing is a mechanism that consolidates the management of computing and IT infrastructure in one or more data centres to reduce the overall cost of operating computing facilities. The cloud, a metaphor for the Internet, is a technology that delivers common business applications online so that they can be accessed over the Internet. This technology is expected to revolutionize the way data is stored and managed. Cloud computing is rapidly making its mark in the IT industry, and there are various reasons behind its rapid popularity and the interest companies show in it. This paper's contribution is threefold: first, cloud computing is discussed as a technology; second, its benefits to customers of all sizes are described; and last, some fears that must be resolved before cloud computing is widely adopted are examined.


Define clouds, explain the benefits of cloud computing, and outline the cloud architecture and its major components.

Characterize the problems and their impact on adoption.

Show the advantages of a cloud over a cluster in terms of dynamic data processing.

What is Cloud?
A cloud is a pool of virtualized computer resources. A cloud can:

Host a variety of different workloads.
Allow workloads to be deployed and scaled out quickly through the rapid provisioning of virtual machines or physical machines.
Support redundant, self-recovering, highly scalable programming models that allow workloads to recover from many unavoidable hardware/software failures.
Monitor resource use in real time to enable rebalancing of allocations when needed.

Cloud Computing
It is the Internet-based development and use of computer technology. It is a technology in which details are abstracted from users, who no longer need knowledge of, or expertise in, the technology infrastructure in the cloud that supports them. It consists of shared computing resources that are virtualized and accessed as a service through an API. Cloud computing resources are offered as a service on an as-needed basis.

Characteristics Of Cloud Computing

Incremental scalability: allows users to access additional computer resources on demand.
Multi-tenancy: several customers share infrastructure without compromising the privacy and security of each customer's data.
Reliability: improves through the use of multiple redundant sites, which makes cloud computing suitable for disaster recovery.
Utility-based: users pay only for the services they use.
Security: could improve due to the centralization of data.
Cost: reduced because of utility computing.
Maintenance: applications are easier to maintain, since they do not have to be installed on each user's computer.

Why cloud computing?

Data centers are notoriously underutilized, often idle 85% of the time. Causes include over-provisioning, insufficient capacity planning and sizing, and an improper understanding of scalability requirements.
Cloud computing offers cost-effective solutions to key business demands.
Workloads can be moved to improve efficiency.
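
The effect of low utilization on cost can be illustrated with a little arithmetic. The sketch below compares an owned server that is paid for around the clock with a pay-per-use VM that is billed only while busy. Only the 85% idle figure comes from the slide; the hourly rates, the month length, and the function name `monthly_costs` are assumptions made up for this example, not real prices.

```python
# Hypothetical figures, chosen only to illustrate the arithmetic.
IDLE_FRACTION = 0.85          # from the slide: idle about 85% of the time
HOURS_PER_MONTH = 730         # assumed average month length in hours
OWNED_COST_PER_HOUR = 0.10    # assumed all-in hourly cost of an owned server
CLOUD_PRICE_PER_HOUR = 0.25   # assumed on-demand price of a comparable VM


def monthly_costs(idle_fraction, hours, owned_rate, cloud_rate):
    """Return (owned_cost, cloud_cost) for one month."""
    busy_hours = hours * (1.0 - idle_fraction)
    owned_cost = hours * owned_rate        # paid whether busy or idle
    cloud_cost = busy_hours * cloud_rate   # paid only while in use
    return owned_cost, cloud_cost


owned, cloud = monthly_costs(
    IDLE_FRACTION, HOURS_PER_MONTH, OWNED_COST_PER_HOUR, CLOUD_PRICE_PER_HOUR
)
print(f"owned server: {owned:.2f} per month")
print(f"pay-per-use:  {cloud:.2f} per month")
```

Under these assumed rates the pay-per-use option stays cheaper even though its hourly price is higher, because only the busy hours are billed. With a lower idle fraction the comparison can reverse.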

Types of Cloud Computing


Public Cloud, Private Cloud, Hybrid Cloud


Public cloud: Public cloud resources are dynamically provisioned on a fine-grained, self-service basis over the Internet, via web applications or web services, from an off-site third-party provider who bills on a fine-grained utility computing basis.

Private cloud: "Private cloud" and "internal cloud" have been described as neologisms, but the concepts themselves pre-date the term "cloud" by 40 years. Even within modern utility industries, hybrid models still exist despite the formation of reasonably well-functioning markets and the ability to combine multiple providers.

Hybrid cloud: A hybrid storage cloud uses a combination of public and private storage clouds. Hybrid storage clouds are often useful for archiving and backup functions, allowing local data to be replicated to a public cloud.


Services Offered by Cloud Computing





IaaS (Infrastructure as a Service): It involves utility computing.
PaaS (Platform as a Service): A set of software and development tools hosted on the provider's servers. Developers can create applications using the provider's APIs. Google App Engine is one of the best-known Platform-as-a-Service offerings.
SaaS (Software as a Service): The provider allows the customer only to use its applications. The software interacts with the user through a user interface.




Advantages
Less expensive
Pay only for what you use
Flexible
Accessible
No need to invest in multiple licenses
Improved performance
Reduced hardware equipment for end users
Lower hardware and software maintenance
Better collaboration

Disadvantages
Security concerns
Dependence on an Internet connection
Too many platforms
Time for transition
Speed
Location of servers


Data Parallelism

Parallel processing is the simultaneous processing of the same task on two or more microprocessors in order to obtain faster results. The computer resources can include a single computer with multiple processors, a number of computers connected by a network, or a combination of both. On a single multiprocessor computer, the processors access data through shared memory. With parallel processing, a number of computations can be performed at once, which reduces the time required. Highly complicated scientific problems that are otherwise extremely difficult to solve can be solved effectively. Parallel computing is effective for tasks that involve a large number of calculations, have time constraints, and can be divided into a number of smaller tasks.
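
The idea of dividing one task into smaller tasks that run at once can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the helper names `split`, `sum_of_squares`, and `parallel_sum_of_squares` are made up for the example, and a thread pool from the standard library stands in for real processors.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Divide data into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The same operation, applied independently to each chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Split the data, process the chunks concurrently, combine the results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


numbers = list(range(1, 1001))
total = parallel_sum_of_squares(numbers)
print(total)
```

The pattern is split, apply, combine. In CPython, threads do not speed up CPU-bound work like this, so a real workload would more likely use `ProcessPoolExecutor`, which offers the same `map` interface, or a distributed framework.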

Parallel Data Processing in Cloud Computing

One key feature of a cloud is that its compute resources are highly dynamic and possibly heterogeneous. New VMs can be allocated at any time through a well-defined interface. Machines that are no longer used can be terminated instantly, and the cloud customer is no longer charged for them. The scheduler of a framework must therefore be aware of the cloud environment in which a job is to be executed. It must know about the different types of available VMs as well as their cost, and be able to allocate or destroy them on behalf of the cloud customer. The paradigm used to describe jobs must be powerful enough to express dependencies between the different tasks a job consists of. The system must be aware of which task's output is required as another task's input.
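
The two requirements above, expressing task dependencies and knowing the available VM types and their cost, can be sketched together. This is an illustrative toy, not Nephele's actual API: the VM types, prices, task names, and the functions `cheapest_vm` and `plan` are all hypothetical. Dependencies are ordered with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical VM types and hourly prices (not from any real provider).
VM_TYPES = {
    "small": {"cores": 1, "price_per_hour": 0.10},
    "large": {"cores": 8, "price_per_hour": 0.80},
}

# Hypothetical job: each task lists the tasks whose output it consumes
# and the minimum number of cores it needs.
JOB = {
    "read":  {"needs": [],                "min_cores": 1},
    "sort":  {"needs": ["read"],          "min_cores": 8},
    "count": {"needs": ["read"],          "min_cores": 1},
    "write": {"needs": ["sort", "count"], "min_cores": 1},
}


def cheapest_vm(min_cores):
    """Pick the cheapest VM type with at least `min_cores` cores."""
    fitting = [
        (spec["price_per_hour"], name)
        for name, spec in VM_TYPES.items()
        if spec["cores"] >= min_cores
    ]
    if not fitting:
        raise ValueError(f"no VM type offers {min_cores} cores")
    return min(fitting)[1]


def plan(job):
    """Return (task, vm_type) pairs in an order that respects dependencies."""
    graph = {task: set(spec["needs"]) for task, spec in job.items()}
    order = TopologicalSorter(graph).static_order()
    return [(task, cheapest_vm(job[task]["min_cores"])) for task in order]


schedule = plan(JOB)
for task, vm in schedule:
    print(f"{task}: run on a {vm} VM")
```

A real scheduler would also allocate each VM just before its task starts and terminate it as soon as the task finishes, so that the customer is not charged for idle machines.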


Nephele's architecture

An example of a Job Graph in Nephele


An Execution Graph created from the original Job Graph


Nephele's job viewer provides graphical feedback on instance utilization and detected bottlenecks



The Execution Graph for Experiment 2 (MapReduce and Nephele)


The Execution Graph for Experiment 3 (DAG and Nephele)




Results of Experiment 1: MapReduce and Hadoop


Results of Experiment 2: MapReduce and Nephele


Results of Experiment 3: DAG and Nephele





Related Work

We are approaching a new framework, Nephele.