
— PREVIEW/PRELIMINARY — PREVIEW/PRELIMINARY — PREVIEW/PRELIMINARY —

ACCFS – Accelerator File System
A case study towards a generalized accelerator interface

Andreas Heinig1, Wolfgang Rehm1, Heiko Schick2
1 Chemnitz University of Technology, Germany 2 IBM Deutschland Entwicklung GmbH, Germany

{heandr,rehm}@cs.tu-chemnitz.de, schickhj@de.ibm.com

Introduction

Current Situation
• Different accelerators are available on the market
• They are all integrated into the Linux environment in different ways, e.g.:

  Accelerator      Integration   API
  Cell/B.E.        VFS           SPUFS
  FPGA             Char. Dev     OpenFPGA, AAL
  (GP)GPU (Tesla)  Char. Dev     CUDA

⇒ No common attach point inside the kernel is available
• This is disadvantageous for application and library programmers
⇒ Every interface has different syntax and semantics

Basic Idea
• The idea is to extend the programming model chosen for integrating the Cell processor into the Linux environment
• On the Cell/B.E., multiple independent vector processors called Synergistic Processing Units (SPUs) are built around a 64-bit PowerPC core (PPE)
• The programming model is to create a virtual file system (VFS) that exports the functionality of the SPUs to user space via a file system interface

[Figure: Extending the SPUFS concept to ACCFS — starting point: the SPUFS concept on the Cell/B.E. CPU; intermediate generalization step: the RSPUFS concept as a case study on an Opteron; target: the ACCFS concept with device handlers for arbitrary accelerators, e.g. FPGAs]

Cell Broadband Engine Architecture

[Figure: Cell/B.E. block diagram — eight SPEs (each an SPU with SXU, local store (LS), and MFC) and the PPE (PPU with PXU, L1 and L2 cache) attached to the on-chip coherent bus (EIB), together with the memory controller (Dual Rambus XDR) and the bus interface controller (Rambus FlexIO)]

PPE - Power Processor Element
• Power Architecture with AltiVec unit
• 32 KiB Level-1 cache and 512 KiB Level-2 cache
⇒ Executes the operating system

SPE - Synergistic Processing Element
• RISC processor with 128-bit SIMD organization
• 256 KiB instruction and data memory (local store)
• The execution unit can only operate on the local store
• DMA has to be used to access other addresses
⇒ Accelerator units

SPUFS
• SPUFS = Synergistic Processing Unit File System
• Virtual File System (VFS) mounted on "/spu" by convention
• Integrates the SPUs in the Linux environment

[Figure: the three software stacks side by side. SPUFS: application and libspe in user space, the system call interface, spufs next to ext2 and proc inside libfs, the character/network/block device drivers, and the Cell/B.E. hardware. RSPUFS: an rspufsd daemon on the Cell/B.E. side serves, over a dedicated link (Ethernet, TCP/IP) and the network stacks of both kernels, an rspufs file system on the Opteron side. ACCFS: application and libacc in user space, accfs inside libfs, and vendor device handlers that drive the individual accelerators (ACC)]

SPUFS - Concepts
1. Virtualization of the SPE
• Accelerators are mostly only exclusively usable
⇒ The system can deadlock if several applications need a huge number of SPEs
⇒ The "physical SPE" gets abstracted as an "SPE context"
2. VFS context access
⇒ The VFS uses well-known system calls (open, close, read, ...)
• Only two new system calls:
– sys_spu_create → creates an SPE context
– sys_spu_run → starts the execution of the SPU code
• VFS context entries:

  File   Description
  mbox   SPU to CPU mailbox
  ibox   SPU to CPU mailbox
  wbox   CPU to SPU mailbox
  mem    local-store memory
  regs   register file
  ...

RSPUFS
• RSPUFS = Remote Synergistic Processing Unit File System
• Proof-of-concept integration of the SPEs into the Opteron
• Cell and Opteron are connected through Ethernet (TCP/IP)

Challenges
1. Different byte order
⇒ The Opteron kernel swaps the bytes before sending and after receiving
⇒ The application has to swap the data by itself
2. No RDMA-capable interconnection
• Accessing the memory of the Cell is not possible in hardware
• The functionality is necessary to support assisted callbacks, the direct mapping of the local store, and the XDR access
⇒ RSPUFS has to simulate the DMA
⇒ Extension of the VFS context with a new "xdr" interface

ACCFS
• ACCFS = ACCelerator File System
• Virtual File System (VFS) mounted on "/acc" by convention
• Proposal for integrating different kinds of accelerators into the Linux environment

ACCFS Concepts
1. Virtualization of the accelerator
2. Virtual File System context access
3. Separation of the functionalities:
• Top half: "accfs"
– VFS implementation
– Provides the user/VFS interface
– Provides the vendor interface
• Bottom half: "device handlers"
– Vendor-specific part
– Integrates the accelerator

[Figure: ACCFS interfaces — the "User Interface" (syscalls) and the "VFS interface" lead into accfs, which sits beside ext2 and proc inside libfs; the "Vendor Interface" connects accfs to the device handlers that drive the individual accelerators (ACC)]

ACCFS Concepts (continued)

Top Half - User Interface
Tasks:
• Handle the VFS
• Provide the interfaces
User Interface:
• Two new system calls:
– sys_acc_create → creates a new ACCFS context, reflected in a new folder
– sys_acc_run → starts the accelerator
• VFS context entries:

  File        Description
  regs        register file
  message     message interface
  memory/     exported memories
  semaphore/  semaphores
  ...

Bottom Half - Vendor Interface
Tasks:
• Managing the accelerator
– Establish the interconnection
– Initialize the hardware
– Configure memory mappings
– ...
• Provide the virtualization
Interface (between top half and bottom half):
• accfs_register (const struct accfs_vendor *)
⇒ Loading (registration) of a device handler
• accfs_unregister (const struct accfs_vendor *)
⇒ Unloading of the device handler

ACCFS Benefits
1. Device handlers only have to concentrate on the hardware integration
• No management of operating system structures
• No need to provide a whole user interface
2. Eases the development of library programming
• Well-known interface
⇒ No non-standard ioctls differing from one accelerator to another
• Better usability of the accelerator
⇒ Always the same usage "protocol":
1. Create the Context
2. Upload the Code
3. Execute the Context
4. Destroy the Context
3. The accelerator becomes easier to exchange

ACCFS: Further Work
• Finish the interface implementation of ACCFS
• Port SPUFS to an ACCFS device handler for SPEs
• Implement device handlers for the first accelerators other than Cell

This research is supported by the Center for Advanced Studies (CAS) of the IBM Böblingen Laboratory as part of the NICOLL Project.