Team Members: R. Dhanalakshmi, S. Dhiviya, J. Mary Revathy

Problem Specification
• In this project we design a data processing framework that explicitly exploits dynamic resource allocation for both task scheduling and execution. The processing frameworks currently in use, however, were designed for static, homogeneous cluster setups. In our framework, particular tasks of a processing job can be assigned to different virtual machines, which are automatically instantiated and terminated during job execution.

Input Specification
The internal input being provided includes:
• VM specification: OS configuration (String), Processor (String), Memory (Long Int, expressed in MB)
• No. of datacenters (Integer)
• No. of VMs (Integer)
• No. of PEs in each VM (Integer)
• Cost per VM (Float)
• Bandwidth per datacenter (Long Int)
• RAM memory per VM (Long Int, expressed in MB)

External Input Specification
The external input being fed to the system for processing includes the following parameters:
• User Name (String)
• Password (String)
• Program Type (String)
• Program Size (Long Int)

Output Specification
The output obtained is at two levels. Initially, the User Name and Password provide authentication for using the cloud; if the user is a registered person, access to the cloud is granted. The input is then sent to the server for processing. The output consists of:
• Total cost for that particular usage of the cloud (Float), evaluated on the server side within the simulator
• Result of the processed program

Job Description
• Instance types
• Number of subtasks
• Number of subtasks per instance
• Sharing instances between tasks
• Channel types

Job Scheduling and Execution
• After receiving a valid Job Graph from the user, Nephele's Job Manager transforms it into a so-called Execution Graph.
• An Execution Graph is Nephele's primary data structure for scheduling and monitoring the execution of a Nephele job.

Types of Channels
Nephele features three different types of channels, which all put different constraints on the Execution Graph.
• Network channels: A network channel lets two subtasks exchange data via a TCP connection.
• In-memory channels: Similar to a network channel, an in-memory channel also enables pipelined processing. However, instead of using a TCP connection, the respective subtasks exchange data using the instance's main memory.
• File channels: A file channel allows two subtasks to exchange records via the local file system.
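As an illustration of the in-memory channel idea (a sketch only, not Nephele's actual implementation), two subtasks on the same instance can exchange records through a bounded in-memory queue. The bounded buffer also gives pipelined processing, since the consumer starts working before the producer has finished:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class InMemoryChannelDemo {
    static final String EOF = "__EOF__"; // marker telling the consumer the stream ended

    public static List<String> run() {
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(4); // bounded buffer in main memory
        List<String> received = new ArrayList<>();

        Thread producer = new Thread(() -> {
            try {
                for (String record : new String[]{"a", "b", "c"}) {
                    channel.put(record); // blocks when the buffer is full (back-pressure)
                }
                channel.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                String record;
                while (!(record = channel.take()).equals(EOF)) {
                    received.add(record); // records arrive while the producer may still be running
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [a, b, c]
    }
}
```

A network channel would replace the queue with a TCP socket, and a file channel would replace it with a file on the local file system.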

The scheduling process in the cloud can be generalized into three stages, namely:
• Resource discovering and filtering
• Resource selection
• Task submission

Scheduling strategies
• Task Grouping
• Prioritization
• Algorithm
  – Deadline
  – Cost
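A minimal sketch of a deadline- and cost-aware prioritization step (the ordering rule and class names are our own assumptions, not the base paper's algorithm): tasks are sorted by earliest deadline first, with ties broken in favour of the cheaper task.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DeadlineCostScheduler {
    static class Task {
        final String name;
        final int deadline; // e.g. minutes from submission (illustrative unit)
        final double cost;  // e.g. $ per run (illustrative unit)
        Task(String name, int deadline, double cost) {
            this.name = name; this.deadline = deadline; this.cost = cost;
        }
    }

    // Prioritization: earliest deadline first, then lowest cost.
    static List<Task> prioritize(List<Task> tasks) {
        List<Task> ordered = new ArrayList<>(tasks);
        ordered.sort(Comparator.comparingInt((Task t) -> t.deadline)
                               .thenComparingDouble(t -> t.cost));
        return ordered;
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(
            new Task("report", 30, 0.80),
            new Task("backup", 30, 0.10),
            new Task("index", 10, 0.50));
        for (Task t : prioritize(tasks)) System.out.println(t.name);
        // prints: index, backup, report
    }
}
```

Task grouping would be a preceding step that batches small tasks together before this ordering is applied.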

MapReduce
1. MapReduce is a framework for processing embarrassingly parallel problems across huge datasets using a large number of computers (nodes), collectively referred to as a cluster or a grid.
2. Computational processing can occur on data stored either in a file system (unstructured) or in a database (structured).

• MapReduce can take advantage of locality of data, processing data on or near the storage assets to decrease transmission of data.
• There are two steps:
  – Map step
  – Reduce step
• Distribution and reliability

Function Map
function map(String name, String document):
  // name: document name
  // document: document contents
  for each word w in document:
    emit (w, 1)

Function Reduce
function reduce(String word, Iterator partialCounts):
  // word: a word
  // partialCounts: a list of aggregated partial counts
  sum = 0
  for each pc in partialCounts:
    sum += pc
  emit (word, sum)
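The two pseudocode functions above can be sketched as a single-JVM Java word count (illustrative only; a real MapReduce runtime distributes the map, shuffle and reduce phases across many nodes):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCount {
    // map: document contents -> list of (word, 1) pairs
    static List<Map.Entry<String, Integer>> map(String document) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : document.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) pairs.add(Map.entry(w, 1));
        }
        return pairs;
    }

    // reduce: a word's partial counts -> its total
    static int reduce(List<Integer> partialCounts) {
        int sum = 0;
        for (int pc : partialCounts) sum += pc;
        return sum;
    }

    static Map<String, Integer> run(List<String> documents) {
        // shuffle: group the emitted partial counts by word
        Map<String, List<Integer>> groups = new HashMap<>();
        for (String doc : documents) {
            for (Map.Entry<String, Integer> pair : map(doc)) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> totals = new HashMap<>();
        groups.forEach((word, counts) -> totals.put(word, reduce(counts)));
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("to be or not to be")).get("to")); // 2
    }
}
```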

Cloud Components
A cloud system consists of three major components:
• Clients
• Datacenter
• Distributed servers

Clients
End users interact with the clients to manage information related to the cloud. Clients generally fall into three categories:
• Mobile: Windows Mobile smartphones, like a BlackBerry or an iPhone.
• Thin: They only display information; the servers do all the work for them. Thin clients don't have any internal memory.
• Thick: These use different browsers like IE, Mozilla Firefox or Google Chrome to connect to the Internet cloud.

DataCenter
• A datacenter is a collection of servers hosting different applications.
• A datacenter may exist at a large distance from the clients.
• Users connect to the datacenter to subscribe to different applications.
• Nowadays a concept called virtualisation is used to install software that allows multiple instances of virtual server applications.

Distributed Servers
• Distributed servers are the parts of a cloud which are present throughout the Internet, hosting different applications.
• But while using an application from the cloud, the user will feel that he is using the application from his own machine.

Virtualisation
• Virtualisation means "something which isn't real".
• It is the software implementation of a computer which will execute different programs like a real machine.
• Types of virtualization:
  – Full virtualization
  – Paravirtualization

Implementation Environment
• CloudSim is a framework developed by the GRIDS laboratory of the University of Melbourne.
• It enables seamless modelling, simulation and experimentation in designing Cloud computing infrastructures.
• CloudSim is a self-contained platform which can be used to model datacenters, service brokers, and the scheduling and allocation policies of a large-scale Cloud platform.

Cloudsim architecture .

Cloud simulation framework .

HARDWARE REQUIREMENTS
• System: Windows XP
• Hard Disk: 40 GB
• Floppy Drive: 1.44 MB
• Monitor: 15" VGA Colour
• Mouse: Logitech
• RAM: 512 MB

SOFTWARE REQUIREMENTS
• Operating system: Windows 7 Professional
• Coding language: Java
• Database: Oracle 10g

2 Intel Xeon (2.Server Configuration • • • • Processor .66 GHz ) No of Cores – 8 CPU Cores Main Memory – 32 GB Operating System – Gentoo Linux .

VM Instance Configuration
• M1.Small
  – No. of CPU cores: 1
  – RAM capacity: 1 GB
  – Hard disk: 128 GB
  – Cost: $0.10
• C1.Large
  – No. of CPU cores: 8
  – RAM capacity: 18 GB
  – Hard disk: 512 GB
  – Cost: $0.80
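A minimal sketch of how the per-instance prices above could translate into the total-cost output (the billing formula, price per hour × number of VMs × usage hours, is our assumption; the base paper may charge differently):

```java
public class VmCost {
    enum InstanceType {
        M1_SMALL(0.10),  // 1 core, 1 GB RAM, 128 GB disk
        C1_LARGE(0.80);  // 8 cores, 18 GB RAM, 512 GB disk

        final double pricePerHour; // $ per VM-hour (assumed billing unit)
        InstanceType(double pricePerHour) { this.pricePerHour = pricePerHour; }
    }

    // Assumed model: total cost = hourly price * VM count * usage hours.
    static double totalCost(InstanceType type, int vmCount, double hours) {
        return type.pricePerHour * vmCount * hours;
    }

    public static void main(String[] args) {
        // e.g. 4 M1.Small instances used for 2.5 hours
        System.out.println(totalCost(InstanceType.M1_SMALL, 4, 2.5)); // 1.0
    }
}
```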

Modular description
• Three modules have been designed; they are as follows:
  – Network Module
  – Security Module
  – VM Module

Network Module
• Server–client computing or networking is a distributed application architecture that partitions tasks or workloads between service providers (servers) and service requesters, called clients.
• Often clients and servers operate over a computer network on separate hardware.
• A server machine is a high-performance host that runs one or more server programs which share its resources with clients. A client may also share some of its resources.
• Clients therefore initiate communication sessions with servers, which await (listen for) incoming requests.
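The server–client pattern described above can be sketched with plain Java sockets (the port choice, one-line echo protocol and class name are illustrative, not part of the project): the server listens for incoming requests, and the client initiates the session.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class EchoDemo {
    public static String runOnce() {
        try (ServerSocket server = new ServerSocket(0)) { // 0 = any free port
            // Server side: awaits (listens for) one incoming request and answers it.
            Thread serverThread = new Thread(() -> {
                try (Socket conn = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
                     PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
                    out.println("echo: " + in.readLine()); // the shared "service"
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            serverThread.start();

            // Client side: initiates the session and sends one request.
            try (Socket client = new Socket("localhost", server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()))) {
                out.println("hello");
                String reply = in.readLine();
                serverThread.join();
                return reply;
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // echo: hello
    }
}
```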

Security Module
• The cloud works in a distributed environment, and hence it becomes essential to provide security to the clients who are using it.
• The client, once registered with the server, is given a valid username and ID which he can use to log in.
• Once the client is logged in, he/she can select what type of file to execute and what task to assign to the server.
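A minimal sketch of the registration/login check (the project's actual scheme is not specified; storing SHA-256 hashes is our assumption here, and production systems should prefer a salted, slow hash such as PBKDF2 or bcrypt):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

public class AuthDemo {
    private final Map<String, String> userToHash = new HashMap<>();

    // Hex-encoded SHA-256 digest of a string.
    static String sha256Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                                         .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    void register(String user, String password) {
        userToHash.put(user, sha256Hex(password)); // never store the raw password
    }

    boolean login(String user, String password) {
        String stored = userToHash.get(user);
        return stored != null && stored.equals(sha256Hex(password));
    }

    public static void main(String[] args) {
        AuthDemo auth = new AuthDemo();
        auth.register("client1", "s3cret");
        System.out.println(auth.login("client1", "s3cret")); // true
        System.out.println(auth.login("client1", "wrong"));  // false
    }
}
```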

VM Module
• In this module, the cloud environment is designed as in the base paper, with the configuration mentioned previously.
• Hence the brokers are created along with the datacenter.
• The VM can be accessed through the broker and the datacenter.
• Parameters such as cost are also given within the program so as to calculate the total cost.

Parameters for Comparison in the Existing System • CPU Utilization • Resource Utilization • Cost .

Project implementation schedule
• Literature Survey: July'12 – August'12 (4 weeks)
• Creation of Cloud Cluster: August'12 (2 weeks)
• Implementing the Scheduling Algorithm in the VmScheduler: August'12 – September'12 (3 weeks)
• Measuring the Parameters for Comparison: September'12 (2 weeks)
• Designing the Load Balancing Algorithm: October'12 – November'12 (4 weeks)
• Measuring the Parameters: December'12 (3 weeks)
• Comparison of Both Existing System and Proposed System: January'13 – February'13 (2 weeks)

References
• V. Venkatesa Kumar and S. Palaniswami, "A Dynamic Resource Allocation Method for Parallel Data Processing in Cloud Computing", Journal of Computer Science 8 (5): 780–788, 2012, ISSN 1549-3636, Science Publications.
• Monika Choudhary and Sateesh Kumar Peddoju, "A Dynamic Optimization Algorithm for Task Scheduling in Cloud Environment", International Journal of Engineering Research and Applications (IJERA), ISSN 2248-9622, Vol. 2, Issue 3, May–Jun 2012, pp. 2564–2568.
• Daniel Warneke, "Massively Parallel Data Processing on Infrastructure as a Service Platforms", doctoral dissertation, Technische Universität Berlin, Einsteinufer 17, D-10587 Berlin, Germany, 2011 (defended 28 September 2011).
• Daniel Warneke, Odej Kao, Volker Markl, et al., "Nephele/PACTs: A Programming Model and Execution Framework for Web-Scale Analytical Processing".
• Rodrigo N. Calheiros, Rajiv Ranjan, Anton Beloglazov, César A. F. De Rose and Rajkumar Buyya, "CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms".

References (contd.)
• Saurabh Kumar Garg and Rajkumar Buyya, "NetworkCloudSim: Modelling Parallel Applications in Cloud Simulations", 2011 Fourth IEEE International Conference on Utility and Cloud Computing.