Map reduce example for the Sun Grid Engine


Published by Ioannis Moutsatsos
This simple programming model helps Java developers easily and efficiently use the Sun Grid Compute Utility for the distributed execution of parallel computations.


Published by: Ioannis Moutsatsos on Jan 10, 2010
Copyright: Attribution Non-commercial







Project Description
This example illustrates one way to implement the MapReduce design pattern on the Sun Grid Compute Utility. The example leverages ComputeServer technology to simplify development of the solution using the Java programming language.
The MapReduce design pattern is widely used today to solve a set of data processing problems that involve two phases of execution: a map phase and a reduce phase. In the first phase, input key/value pairs are processed through a mapping function to produce a set of intermediate results, also as key/value pairs. Those intermediate results are then reduced in a second execution phase, wherein all of the values for a single key are consolidated to produce a final set of unique key/value pairs. This pattern was popularized by Google, which employs it to process large volumes of data, and takes its name from Google's MapReduce programming model and associated implementation, introduced by Jeffrey Dean and Sanjay Ghemawat in the 2004 Google Labs paper "MapReduce: Simplified Data Processing on Large Clusters," published at http://labs.google.com/papers/mapreduce-osdi04.pdf.
Typically, the work to be completed in both the map and reduce phases is divided into independent tasks that execute in parallel, making these solutions highly scalable and well suited to run on the Sun Grid Compute Utility. That said, the simplicity of the MapReduce design pattern is easily lost among complexities introduced by distributed systems development. For that reason, many MapReduce solutions are implemented atop a distributed systems execution layer that is customized to insulate developers from these complexities and to preserve the clarity of the MapReduce model.
In this example we leverage Sun's Compute Server technology to simplify the development of a Word Counter application, modeled as a MapReduce problem, using the Java programming language. Compute Server technology handles all of the distributed systems work for us, including the distribution and load balancing of tasks across many grid nodes, the collection of results from those distributed execution nodes, partial failure tolerance, and more, thereby allowing us to focus our energies exclusively on development of the logic for our map and reduce functions. Using the Grid Compute Server Plug-in for NetBeans™ IDE, we are also able to take advantage of the features of this popular IDE to aid in development and off-grid debugging of our application.
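The two phases described above can be illustrated with a minimal, single-process sketch in plain Java. It does not use Compute Server technology or the WordCounter example code; the class name WordCountSketch, its method names, and the sample documents are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the two MapReduce phases.
// Hypothetical names throughout; this is not the WordCounter example code.
class WordCountSketch {

    // Map phase: turn one document into intermediate (word, count) pairs.
    static Map<String, Integer> map(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reduce phase: consolidate all values for each key into one total per key.
    static Map<String, Integer> reduce(List<Map<String, Integer>> intermediate) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : intermediate) {
            partial.forEach((word, n) -> totals.merge(word, n, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> documents = List.of("the quick fox", "the lazy dog", "The end");
        List<Map<String, Integer>> intermediate = new ArrayList<>();
        for (String doc : documents) {
            intermediate.add(map(doc)); // each call is independent of the others
        }
        // prints {dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
        System.out.println(reduce(intermediate));
    }
}
```

Because each call to map depends only on its own document, the calls could run on separate nodes; that independence is what an execution layer such as Compute Server technology would exploit.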
Application Architecture
The MapReduce design pattern is easily implemented using the Compute Server programming model. This simple programming model helps Java developers easily and efficiently use the Sun Grid Compute Utility for the distributed execution of parallel computations. Any application that can be modeled as a set of independent, compute-bound tasks that execute in parallel, or as a series of such parallel execution phases, can take advantage of Compute Server technology. The MapReduce pattern fits this characterization, and is one of many design patterns supported by Compute Server technology.
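As a rough local analogy for "a series of parallel execution phases", the sketch below runs independent tasks on a thread pool from java.util.concurrent, one phase after another. The thread pool stands in for the grid purely as an assumption for illustration; the class ParallelPhasesSketch and the sum-of-squares workload are hypothetical and are not part of Compute Server technology.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Local stand-in for grid execution: a thread pool runs independent tasks, phase by phase.
// Hypothetical names; this is not the Compute Server API.
class ParallelPhasesSketch {

    // Runs every task of one phase in parallel and waits until all have finished.
    static <T> List<T> runPhase(ExecutorService pool, List<Callable<T>> tasks) {
        try {
            List<T> results = new ArrayList<>();
            for (Future<T> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("phase interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        }
    }

    static long sumOfSquares(int[][] chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Phase 1: one independent, compute-bound task per chunk.
            List<Callable<Long>> phase1 = new ArrayList<>();
            for (int[] chunk : chunks) {
                phase1.add(() -> {
                    long sum = 0;
                    for (int v : chunk) {
                        sum += (long) v * v;
                    }
                    return sum;
                });
            }
            List<Long> partials = runPhase(pool, phase1);

            // Phase 2: starts only after phase 1 completes; combines the partial results.
            List<Callable<Long>> phase2 = new ArrayList<>();
            phase2.add(() -> partials.stream().mapToLong(Long::longValue).sum());
            return runPhase(pool, phase2).get(0);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[][] chunks = { {1, 2, 3}, {4, 5}, {6} };
        System.out.println(sumOfSquares(chunks)); // prints 91
    }
}
```

The tasks within a phase share no state, so their order of execution does not affect the result; only the boundary between phases imposes ordering.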
[insert architecture diagram here]
The Compute Server interfaces implemented in the WordCounter example application are:
In the Compute Server programming model, tasks are independent units of work that are distributed across nodes on the Sun Grid Compute Utility for parallel execution. The WordCounter example includes a Map task class (WCMapTask.java) and a Reduce task class (WCReduceTask.java). WordCounter Map tasks are used to count the number of occurrences of each word in each input file. WordCounter Reduce tasks consolidate the map tasks' output to produce a total count, for each word, of all occurrences across all files.
A Compute Server task generator generates task objects. In the WordCounter example, one generator class (WCMapGenerator.java) is used during the map phase to generate Map tasks, and another generator class (WCReduceGenerator.java) is used during the reduce phase to generate Reduce tasks. The map phase has been configured to run first, followed by the reduce phase.
A JobInputProducer is used off-grid, prior to executing a job, to prepare input for use by a Compute Server application. The WordCounter example uses an input producer class (WCInputProducer) to prepare a list of files for use as input to a WordCounter job run.
An OutputProcessor is also run off-grid, and is used to retrieve and process the final results of a completed Compute Server job. The OutputProcessor used by the WordCounter example (WCOutputProcessor.java) simply prints job execution statistics and the list of word counts returned from the job run.
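To show how the four roles above might fit together, here is a small sketch with hypothetical interfaces. These are NOT the Compute Server interfaces: their names, generic parameters, and method signatures are assumptions made for illustration, and the sketch makes two simplifying assumptions of its own, namely that the input is a list of in-memory strings rather than files and that the reduce phase consists of a single task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// HYPOTHETICAL shapes for illustration only; these are NOT the Compute Server interfaces.
class RolesSketch {

    interface Task<R> { R execute(); }                            // one independent unit of work
    interface TaskGenerator<I, R> { List<Task<R>> generate(I input); }
    interface InputProducer<I> { I produce(); }                   // runs before the job
    interface OutputProcessor<R> { void process(R result); }      // runs after the job

    // Runs one phase in-process; a grid would distribute these tasks across nodes.
    static <I, R> List<R> runPhase(TaskGenerator<I, R> generator, I input) {
        List<R> results = new ArrayList<>();
        for (Task<R> task : generator.generate(input)) {
            results.add(task.execute());
        }
        return results;
    }

    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    static Map<String, Integer> mergeCounts(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> totals.merge(word, n, Integer::sum));
        }
        return totals;
    }

    static Map<String, Integer> runJob(InputProducer<List<String>> producer) {
        // Map generator: one map task per input text.
        TaskGenerator<List<String>, Map<String, Integer>> mapGenerator = texts -> {
            List<Task<Map<String, Integer>>> tasks = new ArrayList<>();
            for (String text : texts) {
                tasks.add(() -> countWords(text));
            }
            return tasks;
        };
        // Reduce generator: a single reduce task here (an assumption of this sketch).
        TaskGenerator<List<Map<String, Integer>>, Map<String, Integer>> reduceGenerator = partials -> {
            List<Task<Map<String, Integer>>> tasks = new ArrayList<>();
            tasks.add(() -> mergeCounts(partials));
            return tasks;
        };
        // The map phase runs first, followed by the reduce phase.
        List<Map<String, Integer>> mapped = runPhase(mapGenerator, producer.produce());
        return runPhase(reduceGenerator, mapped).get(0);
    }

    public static void main(String[] args) {
        InputProducer<List<String>> producer = () -> List.of("to be or not", "to be");
        OutputProcessor<Map<String, Integer>> printer = result -> System.out.println(result);
        printer.process(runJob(producer)); // prints {be=2, not=1, or=1, to=2}
    }
}
```

Separating generators from tasks keeps the decision about how to split the work apart from the work itself, which is one plausible reading of why the example uses distinct generator and task classes for each phase.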
Example Code and Documentation
The WordCounter example code and the supporting Compute Server technology infrastructure are freely available under open source license through the Compute Server Project. Both are included as part of the Grid Compute Server Plugin for NetBeans™ IDE, which can be downloaded from the project's download page. In order to run Compute Server technology, you will also need the Java™ SE platform and the NetBeans™ Integrated Development Environment. Once you have downloaded and installed the Compute Server plugin, simply open the WordCounter project (see Compute Server documentation for the location of example projects) to examine and run the code.
The WordCounter example also includes detailed documentation describing its implementation. If you would like to review the documentation prior to