Copyright ©2012 Big Logic Technologies

Major Categories

•  Clients
•  Master Nodes
•  Slave Nodes

This is the typical architecture of a Hadoop cluster.

•  Start with a small cluster (4 to 10 nodes) and grow as and when required.
   The cluster can be grown whenever there is an:
   ✓  Increase in computation power needed
   ✓  Increase in data to be stored
   ✓  Increase in amount of memory to process tasks
   ✓  Increase in data transfer between data nodes

Cluster Growth based on Storage Capacity:

Data Growth (TB/Week) | Replication Factor | Intermediate & Log Files | Overall Space Needed per Week (TB)
2                     | 3                  | 30%                      | 7.8

Two machines with 4 × 1 TB disks each are needed.
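The table's arithmetic (raw growth, times replication, plus overhead for intermediate and log files) can be sketched as a small helper; the function name is illustrative, not from any Hadoop tool:

```python
def weekly_storage_tb(data_growth_tb, replication_factor, overhead_fraction):
    """Overall space needed per week: raw data growth multiplied by the
    replication factor, plus overhead for intermediate and log files."""
    return data_growth_tb * replication_factor * (1 + overhead_fraction)

# Example from the table: 2 TB/week, replication factor 3, 30% overhead
space = weekly_storage_tb(2, 3, 0.30)
print(f"{space:.1f} TB per week")  # 7.8 TB per week
```

At 4 TB of raw disk per machine, that 7.8 TB weekly requirement is what drives the "two machines per week" figure above.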

Master Node:
•  Single point of failure
•  32 GB RAM
•  Dual Xeon E5600 or better (quad core)
•  Dual power supply for redundancy
•  4 × 500 GB 7200 rpm SATA drives
•  Dual 1 Gb Ethernet cards

Data Nodes:
•  4 × 1 TB hard disks in a JBOD (Just a Bunch Of Disks) configuration. No RAID.
•  2 quad-core CPUs, running at least 2-2.5 GHz
•  16-24 GB of RAM (24-32 GB if you're considering HBase)
•  Gigabit Ethernet

Master Node:
•  No commodity hardware
•  RAIDed hard drives
•  Back up metadata to an NFS mount
•  RAM thumb rule: 1 GB per 1 million blocks of data; 32 GB for 100 nodes.
•  If the metadata is lost, the whole cluster is lost. Use an expensive Name Node.

# of Tasks per Core:
2 cores - DataNode and TaskTracker
Thumb rule: 1 core can run 1.5 Mappers or Reducers

Amount of RAM:
Thumb rule: 1 GB per Map or Reduce task
RAM for HBase Region Server: 0.01 × <dataset size>/<number of slaves>
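The thumb rules above can be turned into a quick sizing sketch. This is only an illustration of the rules as stated on the slide (function names and the 8-core example are assumptions, not part of any Hadoop API):

```python
def max_concurrent_tasks(total_cores):
    """Reserve 2 cores for the DataNode and TaskTracker daemons;
    thumb rule: each remaining core runs ~1.5 map or reduce tasks."""
    return int((total_cores - 2) * 1.5)

def task_ram_gb(num_tasks):
    """Thumb rule: ~1 GB of RAM per map or reduce task."""
    return num_tasks * 1

def hbase_region_server_ram_gb(dataset_size_gb, num_slaves):
    """Slide's thumb rule: 0.01 x <dataset size> / <number of slaves>."""
    return 0.01 * dataset_size_gb / num_slaves

# Example: a data node with 2 quad-core CPUs (8 cores total)
tasks = max_concurrent_tasks(8)
print(tasks, "tasks,", task_ram_gb(tasks), "GB for tasks")  # 9 tasks, 9 GB for tasks
```

With ~9 GB consumed by tasks plus the OS and daemons, the slide's 16-24 GB data-node recommendation leaves comfortable headroom.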
