
Programming with Linux on the Playstation 3

FOSDEM 2008
olivier.grisel@ensta.org

Architecture overview: introducing the Cell BE
Installing Linux
SIMD programming in C/C++
Asynchronous data transfer with the DMA

Who am I

Java / Python developer at Nuxeo (FOSS document management server)
Interested in Artificial Intelligence (and need fast Support Vector Machines)
Slides to be published at: http://oliviergrisel.name

PS3 architecture overview

CPU: IBM Cell/BE @ 3.2 GHz
    218 GFLOPS
    Main RAM: 256 MB XDR (64b @ 3.2 GHz)

GPU: Nvidia RSX
    1.8 TFLOPS (SP) / 356 GFLOPS programmable
    VRAM: 256 MB GDDR3 (2 x 128b @ 700 MHz)

System Bus: 2.5 GB/s

The Cell Broadband Engine

1 PPE core @ 3.2 GHz
    64-bit hyperthreaded PowerPC
    512 KB L2 cache

8 SPE cores @ 3.2 GHz
    128-bit SIMD optimized
    256 KB SRAM

PS3 Clusters

Cheap cluster for academic researchers
Carolina State U. and U. Massachusetts at D.
8+1 cluster with ssh and MPI

PS3 GRID Computing

PS3GRID project
    based on BOINC
    30,000 atoms simulation

Folding@Home
    1 PFLOPS with 800 TFLOPS from PS3s
    BlueGene == 280 TFLOPS

Linux on the PS3

Lv1 Hypervisor shipped with the default firmware
Partition utility in the Sony GameOS menu
Choose your favorite distro:

Install a powerpc64-smp or ps3 kernel
Install gcc-spu + libspe2

Programming the Cell/BE in C

Program the PPE as a chief conductor to spread the numerical code to SPEs
Use POSIX threads to start SPE subroutines in parallel
Use SPE intrinsics to perform vector instructions
Eliminate branches as much as possible in SPE code
Align your data to 16 bytes
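The "chief conductor" pattern above can be sketched in portable C. This is only an illustration under stated assumptions: plain POSIX threads stand in for the SPE contexts that libspe2 would run, and `NUM_WORKERS`, `work_item`, `sum_slice` and `parallel_sum` are hypothetical names, not part of any Cell library.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4   /* hypothetical worker count, stands in for the SPEs */

/* One slice of the input, plus a slot for the partial result. */
typedef struct {
    const float *data;
    size_t       count;
    float        partial;
} work_item;

/* Worker body: on a real Cell this is where the SPE program would run. */
static void *sum_slice(void *arg)
{
    work_item *w = (work_item *)arg;
    float acc = 0.0f;
    for (size_t i = 0; i < w->count; i++)
        acc += w->data[i];
    w->partial = acc;
    return NULL;
}

/* Conductor: split the array, start the workers, gather the partial sums. */
float parallel_sum(const float *data, size_t n)
{
    pthread_t threads[NUM_WORKERS];
    work_item items[NUM_WORKERS];
    size_t chunk = n / NUM_WORKERS;
    float total = 0.0f;

    for (int t = 0; t < NUM_WORKERS; t++) {
        size_t start = (size_t)t * chunk;
        items[t].data = data + start;
        /* The last worker also takes the remainder. */
        items[t].count = (t == NUM_WORKERS - 1) ? n - start : chunk;
        items[t].partial = 0.0f;
        int rc = pthread_create(&threads[t], NULL, sum_slice, &items[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NUM_WORKERS; t++) {
        pthread_join(threads[t], NULL);
        total += items[t].partial;
    }
    return total;
}
```

Each worker writes only to its own `work_item`, so no locking is needed; the conductor reads the partial results only after `pthread_join` returns.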


Introduction to SIMD programming

128-bit registers (SSE2, Altivec, SPE)
    2 x double
    4 x float
    4 x int

introduce new vector types
1 vector float operation == 4 float operations
logical (and, or, cmp, ...), arithmetic (+, *, abs, ...), shuffling
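To make "1 vector float operation == 4 float operations" concrete, here is a minimal sketch that runs on an ordinary PC. It assumes the GCC/Clang `vector_size` extension rather than the SPE types, and `v4f`, `add4` and `madd4` are illustrative names only.

```c
#include <assert.h>

/* 128-bit vector holding 4 floats (GCC/Clang vector extension). */
typedef float v4f __attribute__((vector_size(16)));
static_assert(sizeof(v4f) == 16, "v4f must be 128 bits wide");

/* One vector addition performs four float additions. */
v4f add4(v4f a, v4f b)
{
    return a + b;
}

/* Multiply-add on all four lanes: a * b + c, lane by lane. */
v4f madd4(v4f a, v4f b, v4f c)
{
    return a * b + c;
}
```

With `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}`, `add4(a, b)` yields `{11, 22, 33, 44}` from a single expression; individual lanes can be read back with `result[i]`.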


SIMD programming: the big picture

Not always SIMDizable

SIMD programming with libspe2 and gcc-spu

#include <spu_intrinsics.h>
avoid scalar types, use:
    vec_float4
    vec_double2
    vec_char16 ...

d = spu_and(a, b); e = spu_madd(a, b, c);
spu-gcc pure_spe_prog.c -o pure_spe_prog.elf

Branch elimination

avoid branching (if / else)

c = spu_sel(a, b, spu_cmpgt(a, d));
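The select idiom computes `c = (a > d) ? b : a` on every lane without a branch: the comparison produces an all-ones or all-zeros mask per lane, and the mask picks bits from `b` or `a`. The following sketch emulates that with the GCC/Clang vector extension on integer lanes; `v4i`, `select4` and `select_if_greater` are hypothetical names, not SPE intrinsics.

```c
#include <assert.h>

/* 128-bit vector holding 4 ints (GCC/Clang vector extension). */
typedef int v4i __attribute__((vector_size(16)));
static_assert(sizeof(v4i) == 16, "v4i must be 128 bits wide");

/* Bitwise select: bits come from b where mask is 1, from a where it is 0. */
v4i select4(v4i a, v4i b, v4i mask)
{
    return (a & ~mask) | (b & mask);
}

/* Per lane: result = (a > d) ? b : a, computed without any branch. */
v4i select_if_greater(v4i a, v4i b, v4i d)
{
    v4i mask = a > d;   /* each lane becomes -1 (all ones) or 0 */
    return select4(a, b, mask);
}
```

Both candidate values are always computed; the cost of the discarded lane is usually smaller than the cost of a mispredicted branch on a core without branch prediction hardware.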

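A sample SPE program

    #include <spu_intrinsics.h>

    /* Accumulator viewed both as one vector and as four scalar lanes. */
    volatile union {
        vec_float4 vec;
        float part[4];
    } sum;

    /* Dot product of xp and yp. Both arrays must be 16-byte aligned;
       if size is not a multiple of 4, the trailing elements are ignored. */
    float dot_product(const float *xp, const float *yp, const int size) {
        sum.vec = (vec_float4) {0, 0, 0, 0};
        vec_float4 *xvp = (vec_float4 *) xp;
        vec_float4 *yvp = (vec_float4 *) yp;
        vec_float4 *xvp_end = xvp + size / 4;
        /* Hint to the compiler that the loop condition is usually true. */
        while (__builtin_expect(xvp < xvp_end, 1)) {
            sum.vec = spu_madd(*xvp, *yvp, sum.vec);
            xvp++;
            yvp++;
        }
        return sum.part[0] + sum.part[1] + sum.part[2] + sum.part[3];
    }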
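For experimenting away from the PS3, the same loop can be approximated in portable C. This is a sketch under assumptions, not the SPE code: it uses the GCC/Clang vector extension instead of `spu_madd`, loads through `memcpy` so the inputs need not be 16-byte aligned, and adds a scalar loop for sizes that are not a multiple of 4. The name `dot_product_portable` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 128-bit vector holding 4 floats (GCC/Clang vector extension). */
typedef float v4f __attribute__((vector_size(16)));

float dot_product_portable(const float *x, const float *y, size_t size)
{
    v4f sum = {0, 0, 0, 0};
    size_t i = 0;

    assert(size == 0 || (x != NULL && y != NULL));

    /* Vector part: four multiply-adds per iteration. */
    for (; i + 4 <= size; i += 4) {
        v4f xv, yv;
        memcpy(&xv, x + i, sizeof xv);   /* alignment-safe load */
        memcpy(&yv, y + i, sizeof yv);
        sum += xv * yv;
    }

    /* Horizontal sum of the four lanes. */
    float result = sum[0] + sum[1] + sum[2] + sum[3];

    /* Scalar tail: the 0 to 3 elements the vector loop did not cover. */
    for (; i < size; i++)
        result += x[i] * y[i];

    return result;
}
```

Because the lanes are accumulated separately and summed at the end, the result can differ in the last bits from a strictly sequential scalar loop on inputs that are not exactly representable.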

DMA with the SPUs' Memory Flow Controllers

    #include <spu_mfcio.h>

    mfc_get(&local_data, main_mem_data_ea, sizeof(local_data), DMA_TAG, 0, 0);
    mfc_put(&local_data, main_mem_data_ea, sizeof(local_data), DMA_TAG, 0, 0);
    mfc_getb(&local_data, main_mem_data_ea, sizeof(local_data), DMA_TAG, 0, 0);
    spu_mfcstat(MFC_TAG_UPDATE_ALL);


Double buffering: the problem

Double buffering: the big picture

Double buffering with MFC

1. SPU queues MFC GET to fill buffer #1
2. SPU queues MFC GET to fill buffer #2
3. SPU waits for buffer #1 to finish filling
4. SPU processes buffer #1
5. SPU queues MFC PUT back content of buffer #1
6. SPU queues MFC GETB to refill buffer #1
7. SPU waits for buffer #2 to finish filling
8. SPU processes buffer #2 (...)
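The buffer rotation in the steps above can be traced with a small portable simulation. It is only a sketch of the control flow: `fake_get` and `fake_put` are hypothetical stand-ins that copy synchronously with `memcpy`, so there is no real overlap of transfer and computation and no tag wait; on the SPU those calls would be `mfc_get`, `mfc_put` and `mfc_getb`. `CHUNK` and `process` are likewise made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4   /* floats per buffer: hypothetical, chosen for illustration */

/* Stand-ins for mfc_get / mfc_put. They copy synchronously, so this
   sketch needs no "wait for tag" step. */
static void fake_get(float *local, const float *main_mem)
{
    memcpy(local, main_mem, CHUNK * sizeof(float));
}

static void fake_put(float *main_mem, const float *local)
{
    memcpy(main_mem, local, CHUNK * sizeof(float));
}

/* Placeholder computation: double every value in the local buffer. */
static void process(float *buf)
{
    for (size_t i = 0; i < CHUNK; i++)
        buf[i] *= 2.0f;
}

/* Process n_chunks * CHUNK floats in place, rotating two local buffers. */
void process_double_buffered(float *data, size_t n_chunks)
{
    float buf[2][CHUNK];

    assert(data != NULL || n_chunks == 0);
    if (n_chunks > 0) fake_get(buf[0], data);            /* step 1 */
    if (n_chunks > 1) fake_get(buf[1], data + CHUNK);    /* step 2 */

    for (size_t i = 0; i < n_chunks; i++) {
        float *cur = buf[i & 1];
        /* steps 3 and 7: a real SPU would wait here for cur's DMA tag */
        process(cur);                                    /* steps 4 and 8 */
        fake_put(data + i * CHUNK, cur);                 /* step 5 */
        if (i + 2 < n_chunks)
            fake_get(cur, data + (i + 2) * CHUNK);       /* step 6 (GETB) */
    }
}
```

On real hardware the refill in step 6 uses the barrier form (GETB) so that it is ordered after the PUT of the same buffer; the synchronous copies here make that ordering automatic.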


Some resources

Cell BE Programming Tutorial (ibm.com, 190 pages)
IBM developerWorks short programming tutorials
    Search for articles by Jonathan Bartlett
Barcelona Supercomputing Center (software)
    http://www.bsc.es/projects/deepcomputing/linuxoncell/
PS3 programming workshops (videos)
    http://www.cc.gatech.edu/~bader/CellProgramming.html
#ps3dev on freenode

Thanks, credits, licensing

Most schemas from excellent GFDL'd tutorial by Geoff Levand (Sony Corp)
    http://www.kernel.org/pub/linux/kernel/people/geoff/cell
Pictures and trademarks belong to their respective owners (Sony, IBM, Universities, Folding@Home, PS3GRID, ...)
All remaining work is GFDL

7 differences
