Pallet Big Data - JClouds Meetup 2013

Published by: tbatchelli on Feb 07, 2013
Copyright: Attribution Non-commercial

07/10/2013

Pallet Big Data

JClouds Meetup Feb 2013
Toni Batchelli -- co-founder -- PalletOps.com

Saturday, March 16, 13

Programmable Infrastructure

• The Cloud!
• Flexible
• Powerful
• Dynamic
• ... and then what?

... and then what?

Configure all the things!

• configure the servers
• configure the local systems
• configure the distributed systems

Configuration Managers:

• build a configuration database
• wait for nodes to pull config

Programmatic Infrastructure

In a programmatic infrastructure, the systems are provisioned and configured by running a program (*)

• jclouds takes care of the provisioning part
• Pallet takes care of the configuration part

(*) as opposed to configuring a server to coordinate the config, or using templates

Why programs?

With a program you can do many things:

• Run it anywhere
• Keep it in GitHub
• Parametrize it
• Have it run by another program
• Make it a library
• Extend it
• etc.

e.g. Hadoop Clusters

[Diagram, built up over several slides: a Master node running the NameNode and JobTracker, alongside a growing number of Slave nodes, each running a TaskTracker and a DataNode.]

Caution: Major oversimplification in progress!

[Diagram, built up over several slides: the pieces of a cluster, labeled .jar, Java, Hadoop, Data Node, Task Tracker, Job Tracker, Name Node, Slave Node, Master Node, and Hadoop Cluster.]

[Diagram: the cluster again, a Master Node (NameNode, JobTracker) and Slaves (TaskTracker, DataNode), now with SSH labels on the links between nodes.]

function: authorize-node(node, group)
  (public-key, private-key) = gen-key(node)
  for target-node in nodes(group) do
    auth-key(public-key, target-node)
  done

function: auth-key(key, node)
  when-not "~/.ssh" do
    create-dir("~/.ssh")
  done
  when-not "~/.ssh/authorized_keys" do
    create-file("~/.ssh/authorized_keys")
  done
  append-to-file("~/.ssh/authorized_keys", key)
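
The authorize-node / auth-key pseudocode can be tried out locally. What follows is a minimal Python sketch, not Pallet code: each "node" is assumed to be a plain local directory standing in for a remote home directory, the key is passed in rather than generated (the gen-key step is skipped), and the conventional .ssh/authorized_keys location is used. The duplicate-key guard is an addition that is not on the slide.

```python
from pathlib import Path
from typing import Iterable


def auth_key(key: str, home: Path) -> None:
    """Append `key` to <home>/.ssh/authorized_keys, creating what is missing."""
    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)     # when-not dir exists: create it
    keys_file = ssh_dir / "authorized_keys"
    keys_file.touch(mode=0o600, exist_ok=True)   # when-not file exists: create it
    existing = keys_file.read_text().splitlines()
    if key not in existing:                      # extra guard, not on the slide
        with keys_file.open("a") as f:
            f.write(key + "\n")


def authorize_node(public_key: str, group_homes: Iterable[Path]) -> None:
    """Authorize one node's public key on every node of a group."""
    for home in group_homes:
        auth_key(public_key, home)
```

Because every step checks before it creates, running the function twice leaves the same result, which is the property that makes this kind of program safe to re-run against a live cluster.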

function: build-cluster(infra, slave-count, RAM)
  slave-spec = build-slave-spec(RAM)
  master-spec = build-master-spec(RAM)
  slaves = procure(infra, slave-spec, slave-count)
  master = procure(infra, master-spec, 1)
  master.configure()
  for slave in slaves do
    slave.configure()
  done
  authorize-node(master, slaves)
  ...

ec2c = build-cluster(ec2, 100, 8GB)
rsc = build-cluster(rackspace, 100, 16GB)
vbc = build-cluster(virtualbox, 3, 2GB)
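
To make the shape of build-cluster concrete, here is a small Python sketch under heavy assumptions: FakeProvider is a hypothetical in-memory stand-in for a real cloud provider (it is not the jclouds or Pallet API), Node.configure only flips a flag, and the "key" is a placeholder string. The role assignments follow the master/slave diagram earlier in the deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    ram_gb: int
    roles: frozenset
    configured: bool = False
    authorized_keys: List[str] = field(default_factory=list)

    def configure(self) -> None:
        # Placeholder for installing Java, Hadoop, and the role daemons.
        self.configured = True


class FakeProvider:
    """Hypothetical stand-in for a cloud provider; creates nodes in memory."""

    def __init__(self, name: str) -> None:
        self.name = name

    def procure(self, spec: dict, count: int) -> List[Node]:
        return [
            Node(f"{self.name}-{spec['group']}-{i}", spec["ram_gb"], spec["roles"])
            for i in range(count)
        ]


def build_cluster(infra: FakeProvider, slave_count: int, ram_gb: int) -> dict:
    slave_spec = {"group": "slave", "ram_gb": ram_gb,
                  "roles": frozenset({"datanode", "tasktracker"})}
    master_spec = {"group": "master", "ram_gb": ram_gb,
                   "roles": frozenset({"namenode", "jobtracker"})}
    slaves = infra.procure(slave_spec, slave_count)
    master = infra.procure(master_spec, 1)[0]
    master.configure()
    for slave in slaves:
        slave.configure()
    # authorize-node(master, slaves): each slave trusts the master's key.
    master_key = f"pubkey-of-{master.name}"  # placeholder, not a real key
    for slave in slaves:
        slave.authorized_keys.append(master_key)
    return {"master": master, "slaves": slaves}


# Same function, different infrastructure: only the arguments change.
vbc = build_cluster(FakeProvider("virtualbox"), 3, 2)
```

The point the slide makes survives the simplification: the provider, the node count, and the RAM are ordinary parameters, so one function describes clusters on any infrastructure.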

e.g. Pallet Big Data

Pallet Big Data

We decided we’d build something useful with all this power: liberating Amazon EMR users :)

• Build Hadoop clusters anywhere and everywhere
• Use your preferred Hadoop distro and version
• Build your own workflows
• I just saved a bunch of $$$ by switching to <your cloud here>

a Hadoop Cluster
{:cluster-prefix "hc1"
 :groups {:master {:node-spec
                   {:hardware
                    {:hardware-id "m1.medium"}}
                   :count 1
                   :roles #{:namenode :jobtracker}}
          :slave {:node-spec
                  {:hardware
                   {:hardware-id "m1.medium"}}
                  :count 2
                  :roles #{:datanode :tasktracker}}}
 :node-spec {:image
             {:os-family :ubuntu
              :os-version-matches "12.04"
              :os-64-bit true}}
 :hadoop-settings {:dist :cloudera}}
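
One way to read the cluster map is as a recipe that expands into a list of planned nodes. The Python sketch below mirrors the map as a dict and performs that expansion. Two things are assumptions rather than documented Pallet behavior: that the top-level :node-spec acts as a default merged into each group's :node-spec, and the node naming scheme (prefix, group, index), which is purely illustrative.

```python
CLUSTER_SPEC = {
    "cluster-prefix": "hc1",
    "groups": {
        "master": {"node-spec": {"hardware": {"hardware-id": "m1.medium"}},
                   "count": 1,
                   "roles": {"namenode", "jobtracker"}},
        "slave": {"node-spec": {"hardware": {"hardware-id": "m1.medium"}},
                  "count": 2,
                  "roles": {"datanode", "tasktracker"}},
    },
    "node-spec": {"image": {"os-family": "ubuntu",
                            "os-version-matches": "12.04",
                            "os-64-bit": True}},
    "hadoop-settings": {"dist": "cloudera"},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge two dicts; values in `override` win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def expand(spec: dict) -> list:
    """Return one planned node per group member (assumed semantics)."""
    nodes = []
    for group, cfg in spec["groups"].items():
        # Assumption: cluster-wide node-spec is the default for every group.
        node_spec = deep_merge(spec.get("node-spec", {}), cfg.get("node-spec", {}))
        for i in range(cfg["count"]):
            nodes.append({"name": f"{spec['cluster-prefix']}-{group}-{i}",
                          "roles": sorted(cfg["roles"]),
                          "node-spec": node_spec})
    return nodes
```

Under these assumptions the map above describes three nodes: one master carrying the NameNode and JobTracker roles, and two slaves carrying the DataNode and TaskTracker roles, all on 64-bit Ubuntu 12.04.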

a Hadoop workflow
{:steps [{:script-file "bootstrap/setup.sh"}
         {:script "/bin/start-daemon"}
         {:jar {:remote-file "//usr/.../image-parse.jar"}
          :main "parse"
          :input "s3n://sources/satellite-data"
          :output "hdfs://parsed-sat-img"}
         {:jar {:remote-file "//usr/.../outline-detection.jar"}
          :main "detect"
          :input "hdfs://parsed-sat-img"
          :output "s3n://results/weather-data"}]
 :on-completion :terminate-cluster}
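
Since the workflow is plain data, another program can inspect it before anything runs. The Python sketch below mirrors the workflow map as a dict, turns it into a readable plan, and checks that every hdfs:// input was produced by an earlier step. The interpretation of each step type is an assumption drawn from the key names, not from Pallet documentation, and the elided jar paths are copied as they appear on the slide.

```python
WORKFLOW = {
    "steps": [
        {"script-file": "bootstrap/setup.sh"},
        {"script": "/bin/start-daemon"},
        {"jar": {"remote-file": "//usr/.../image-parse.jar"},
         "main": "parse",
         "input": "s3n://sources/satellite-data",
         "output": "hdfs://parsed-sat-img"},
        {"jar": {"remote-file": "//usr/.../outline-detection.jar"},
         "main": "detect",
         "input": "hdfs://parsed-sat-img",
         "output": "s3n://results/weather-data"},
    ],
    "on-completion": "terminate-cluster",
}


def describe(step: dict) -> str:
    """Describe one step; the meaning of each key is an assumption."""
    if "script-file" in step:
        return f"run script file {step['script-file']}"
    if "script" in step:
        return f"run inline script {step['script']}"
    if "jar" in step:
        return (f"run {step['main']} from {step['jar']['remote-file']}: "
                f"{step['input']} -> {step['output']}")
    raise ValueError(f"unknown step type: {sorted(step)}")


def plan(workflow: dict) -> list:
    """Return the ordered list of actions the workflow implies."""
    actions = [describe(step) for step in workflow["steps"]]
    if workflow.get("on-completion") == "terminate-cluster":
        actions.append("terminate cluster")
    return actions


def missing_hdfs_inputs(workflow: dict) -> list:
    """List hdfs:// inputs that no earlier step produced."""
    produced, missing = set(), []
    for step in workflow["steps"]:
        src = step.get("input", "")
        if src.startswith("hdfs://") and src not in produced:
            missing.append(src)
        if "output" in step:
            produced.add(step["output"])
    return missing
```

Here the second jar step reads hdfs://parsed-sat-img, which the first jar step writes, so the check reports nothing missing.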

run hadoop, run!

$ bin/hadoop start
$ bin/hadoop job job_spec.clj
$ bin/hadoop destroy

PalletOps

backlog
• Feature parity with Amazon EMR
• Server Rack support
• Extended workflows
• Central Management service

Interested in giving it a try?

contact@palletops.com tbatchelli@palletops.com