The Cell Processor
An overview from conception to production
A Computer Science 625a Project Created For: Mark Daley Created By: Nathan Lemieux 250017145 December 2005
The Cell was developed by the STI team, which comprises Sony, Toshiba and IBM. They teamed up to produce a scalable and flexible processor that is extremely powerful yet energy conscious. It was another attempt to “redo the RISC revolution by simplifying processor micro-architecture and moving complexity into software”. Over the past twenty years, the processor industry has introduced techniques that increased clock speeds dramatically, but at the expense of die space, power consumption and simplicity. I will start by outlining a brief history of the Cell’s conception in Section 2 and then provide an overview of its architecture in Section 3. In Section 4, I will discuss how to use the resources available in the Cell. In Section 5, I will compare the Cell to other architectures, and in Section 6 I will look at why some of the design decisions were made. In Section 7, I will discuss what software will run on the Cell and what it will take to program for it, before concluding in Section 8.
The original Cell concept was developed by Sony Computer Entertainment Incorporated (SCEI) in 1999 after the release of its PlayStation 2 game console. The Emotion Engine inside the PS2 did not meet its publicized expectations, so SCEI went back to the drawing board. The STI group was formed in 2000, with the first design center opening in 2001. In the fall of 2002, the patent for the Cell was published. Sometime after that, a prototype was developed, and STI claims it was clocked at over 4.5 GHz. In February 2005, the final architectural design was released to the public at the International Solid-State Circuits Conference (ISSCC).
Each company in the STI group has its own requirements and expectations of what the Cell should be capable of. STI intends to scale the processor by varying the number of cores on a chip and the number of units in a single core, and by linking multiple chips to each other over a distributed network, to serve both low-end and high-end applications. IBM wants a powerful and scalable processor; Sony wants a cheap but powerful processor; Toshiba wants a cheap yet energy-efficient processor.
IBM
IBM brings to the table its vast knowledge and expertise in developing and manufacturing state-of-the-art microprocessors. IBM has put its POWER architecture inside the Cell to act as the brains of the chip. IBM has plans to put the Cell in its high-end server line.
Sony Corporation
Sony Corporation is a leading manufacturer of audio and video products for the consumer and professional markets. SCEI manufactures, distributes, and markets the PlayStation® game console family. SCEI has already discussed plans to launch the new PS3 in 2006 with the Cell processor. This will be the first commercially available product that contains the Cell processor.
Toshiba
Toshiba Corporation is a leader in the development and manufacture of electronic devices and components for digital consumer products. Toshiba has plans to put the Cell in HDTVs to decode multiple MPEG-2 streams simultaneously.
The STI group is a strategic alliance; each company brings a different skill set and a vast amount of capital to invest in the Cell’s development. The purpose of developing the Cell was to reduce the cost of components by building their own. It is likely that STI will produce the Cell in vast numbers, because five-plus years of development usually results in an expensive product whose unit cost can only be reduced by manufacturing in volume.
The Cell architecture is intended to be scalable through the use of vector processing elements (SPEs). You can scale up by adding SPEs or scale down by removing them. Depending on its usage requirements, the Cell processor can be built in a number of different configurations. The basic configuration consists of the following:
PowerPC Processing Element (PPE)
Multiple (eight) Synergistic Processing Elements (SPE)
Rambus Memory Interface Controller (MIC)
Rambus FlexIO interface
Element Interconnect Bus (EIB)
Die Photo of Cell Processor
Theoretical Performance Calculations
With the above configuration, the Cell has a theoretical single-precision (SP) computing power of 256 GFLOPS (billion floating-point operations per second) when clocked at 4 GHz, not including the computing power of the PPE: “8 (SPE) x 4 GHz x 4 (32 bit words in a vector) x 2 (multiply-adds are counted as 2 operations) = 256 GFLOPS”. So each SPE is capable of 32 SP GFLOPS. For double precision (DP), an “SPE can produce 2 DP FMADD operations every 7 cycles or 4 SP FMADD every cycle”. That is approximately 2.3 DP GFLOPS per SPE (2 FMADDs x 2 operations x 4 GHz / 7 cycles), or approximately 18.4 DP GFLOPS in total. Again, this does not include the processing power of the PPE, which will be capable of 8 DP GFLOPS because of the AltiVec unit. Supercomputer rankings are based on double-precision calculations. So, one Cell processor clocked at 4 GHz has a theoretical capability of “approximately 26 DP GFLOPS”. For comparison, the current leading supercomputer, BlueGene/L, developed by IBM, has a theoretical peak performance of 183,500 GFLOPS but has achieved only 136,800 GFLOPS. BlueGene/L has 65,536 processors, giving each processor a theoretical peak performance of approximately 2.8 DP GFLOPS. If the Cell achieves only half of its theoretical performance, say 13 DP GFLOPS, it will still far exceed the current supercomputers in performance per processor.
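The arithmetic in this subsection can be checked with a short Python sketch. It only restates the figures quoted in the text (4 GHz clock, eight SPEs, 8 DP GFLOPS for the PPE, the BlueGene/L totals); the function and variable names are my own and purely illustrative. Note that the unrounded double-precision total for the SPEs comes out nearer 18.3 than 18.4, because the text rounds the per-SPE figure to 2.3 before multiplying.

```python
# Peak-performance arithmetic, using only the figures quoted in the text.
CLOCK_GHZ = 4.0        # clock rate assumed in the text
NUM_SPES = 8           # SPEs in the basic configuration
PPE_DP_GFLOPS = 8.0    # PPE double-precision figure quoted in the text


def sp_gflops_per_spe(clock_ghz=CLOCK_GHZ):
    # 4 single-precision FMADDs per cycle, each counted as 2 operations
    return clock_ghz * 4 * 2


def dp_gflops_per_spe(clock_ghz=CLOCK_GHZ):
    # 2 double-precision FMADDs every 7 cycles, each counted as 2 operations
    return clock_ghz * 2 * 2 / 7


sp_total = NUM_SPES * sp_gflops_per_spe()
dp_per_spe = dp_gflops_per_spe()
dp_total_spes = NUM_SPES * dp_per_spe
cell_dp_total = dp_total_spes + PPE_DP_GFLOPS

# BlueGene/L theoretical peak (GFLOPS) divided by its processor count
bluegene_per_processor = 183500 / 65536

print(f"SP total over SPEs: {sp_total:.0f} GFLOPS")
print(f"DP per SPE: {dp_per_spe:.2f} GFLOPS")
print(f"DP total over SPEs: {dp_total_spes:.1f} GFLOPS")
print(f"DP total with PPE: {cell_dp_total:.1f} GFLOPS")
print(f"BlueGene/L per processor: {bluegene_per_processor:.1f} GFLOPS")
```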
3.1 Power Processor Element (PPE)
The Power Processing Element acts as the host processor and performs real-time resource scheduling for the SPEs. It is a 64-bit processor based on IBM’s POWER architecture (Performance Optimization With Enhanced RISC). It contains 32 KB level 1 instruction and data caches and a VMX (AltiVec) unit (a floating-point and integer SIMD instruction set, developed earlier by IBM with Apple and Motorola), and it is connected to a 512 KB system level 2 cache. It is a dual-issue, SMT (Simultaneous Multithreading), in-order execution processor. IBM’s Hypervisor technology has been incorporated into the Cell to allow it to run multiple operating systems simultaneously, such as Linux and a real-time OS for computer entertainment and consumer electronics applications.
3.2 Synergistic Processing Elements (SPE)
The Synergistic Processing Element is essentially a “system on a chip” (SoC) design. It is a self-contained SIMD (Single Instruction Multiple Data) vector processor which acts as an independent processor, so each SPE can perform multiple operations simultaneously with a single instruction. The SPEs handle most of the computational workload. From the above calculations, we see that an SPE is capable of double-precision floating-point instructions but is geared toward single precision, where it is capable of 32 SP GFLOPS. Like the PPE, the SPEs are in-order processors and have no Out-Of-Order (OOO) capabilities. However, the SPEs can issue up to two instructions per cycle: one slot supports fixed- and floating-point operations, and the other slot provides loads/stores, byte permutation, and branch operations. The SPE checks whether the two instructions can execute in parallel and, if not, executes them in program order. If necessary, after execution, the instructions are put back in sequence and the result is written back to local memory.
Each SPE contains a 256 KB local memory, which STI calls the “local store”. The local store is visible to the PPE and can be addressed directly by software. There are no cache coherency mechanisms within the local stores. Each SPE contains a 128-entry register file with 128 bits per entry. The register file has six read ports and two write ports. The SPEs cannot operate directly on main memory; they have to move data to and from the local stores. The DMA device in each SPE handles the movement of data between main memory and the local store in blocks of 1024 bits (128 bytes). The SPEs operate on registers, which are read from or written to the local stores. “Unlike Power processors, the SPEs operate only on their local memory (LS). Code and data must be transferred into the associated LS for an SPE to execute or operate on. LS addresses do have an alias in the PPE address map and transfers to and from LS to memory at large (including other LS) are coherent in the system. As a result, a pointer to a data structure that has been created on the PPE can be passed to an SPE and the SPE can use this pointer to issue a DMA command to bring the data structure into its LS in order to perform operations on it. If the SPE (or PPE) issues a DMA command to place it back into non-LS memory after some computations, the transfer is again coherent”.
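To make the sizes above concrete, here is a minimal Python sketch of the local store and DMA figures given in the text (256 KB local store, 128-byte DMA blocks, a 128 x 128-bit register file). It is a back-of-the-envelope model only: the helper names are hypothetical, and it assumes, as the text states, that transfers move in whole 128-byte blocks and that code and data share the same local store.

```python
LOCAL_STORE_BYTES = 256 * 1024   # 256 KB local store per SPE
DMA_BLOCK_BYTES = 128            # one DMA block is 1024 bits
REGISTER_COUNT = 128             # entries in the SPE register file
REGISTER_BITS = 128              # bits per register


def dma_blocks_needed(num_bytes):
    """Number of whole 128-byte DMA blocks needed to move num_bytes."""
    return -(-num_bytes // DMA_BLOCK_BYTES)   # ceiling division


def fits_in_local_store(code_bytes, data_bytes):
    """Code and data must both reside in the same local store."""
    return code_bytes + data_bytes <= LOCAL_STORE_BYTES


# Derived figures
register_file_bytes = REGISTER_COUNT * REGISTER_BITS // 8
sp_floats_per_register = REGISTER_BITS // 32
blocks_per_local_store = LOCAL_STORE_BYTES // DMA_BLOCK_BYTES

print("register file size (bytes):", register_file_bytes)
print("32-bit floats per register:", sp_floats_per_register)
print("DMA blocks to fill one local store:", blocks_per_local_store)
print("blocks for a 1000-byte structure:", dma_blocks_needed(1000))
```

The register file is therefore small (2 KB) compared with the local store, which is why the SPE has to stage its working set in the local store and keep only the values being operated on in registers.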
SPE Architecture
3.3 Element Interconnect Bus (EIB)
The Element Interconnect Bus consists of two channels in each of two opposite directions, for a total of four channels. Each channel can transfer 24 bytes per cycle (16 bytes of data + 8 bytes of tag). Therefore, the bus can transfer a total of 96 bytes per cycle. The EIB enables communication between the SPEs and the PPE, as much of the data movement is intended to be internal to the chip. The EIB also connects to the system level 2 cache, the memory controller and the FlexIO (for external communications). The EIB has a vast amount of bandwidth, enough to keep the hungry SPEs from starving. This is one of the concerns of having so many vector units on a single chip: can the EIB keep supplying data and taking away results as quickly as they are consumed and produced? The EIB design suits the Cell’s variable SPE configurations because data travels no more than the width of one SPE per hop. So, adding or removing SPEs changes the data transport latency only by the number of hops through the SPEs added or removed.
3.4 Memory controller and FlexIO
Again, powerful processing units and a bus capable of supplying them with data are not enough; the Cell still needs a high-speed memory and I/O system to bring in and temporarily store data. STI went with a dual-channel Rambus XDR memory controller. Using two channels with 32-bit wide data buses, the peak memory bandwidth is 25.6 GB per second (2 channels x 2 devices per channel x 2 bytes per device x 3.2 GHz). The system interface is also produced by Rambus. The FlexIO is capable of running from 400 MHz to 8 GHz and is organized into 12 lanes; each lane is a unidirectional, 8-bit wide, point-to-point path. Of the 12 lanes, 5 are inbound and 7 are outbound, for a theoretical peak I/O bandwidth of 76.8 GB per second (44.8 GB per second out, 32 GB per second in). Furthermore, the lanes are arranged into two groups of ports: one group for non-coherent off-chip traffic and another for coherent off-chip traffic. Coherent off-chip traffic occurs in configurations with multiple Cell processors.
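The bandwidth figures in Sections 3.3 and 3.4 follow from simple multiplication, sketched below in Python. The function names are illustrative. The 6.4 GHz FlexIO lane rate is not stated in the text; it is my inference from the quoted totals (76.8 GB per second over 12 one-byte lanes) and should be read as an assumption.

```python
# Element Interconnect Bus: bytes moved per cycle
EIB_CHANNELS = 4
EIB_BYTES_PER_CHANNEL = 16 + 8          # 16 bytes data + 8 bytes tag
eib_bytes_per_cycle = EIB_CHANNELS * EIB_BYTES_PER_CHANNEL


def xdr_bandwidth_gb_s(channels=2, devices_per_channel=2,
                       bytes_per_device=2, rate_ghz=3.2):
    """Peak XDR memory bandwidth in GB per second."""
    return channels * devices_per_channel * bytes_per_device * rate_ghz


def flexio_bandwidth_gb_s(lanes, bytes_per_lane=1, rate_ghz=6.4):
    """Peak FlexIO bandwidth in GB per second.

    rate_ghz=6.4 is an assumption inferred from the quoted totals,
    not a figure given in the text.
    """
    return lanes * bytes_per_lane * rate_ghz


memory_bw = xdr_bandwidth_gb_s()
flexio_out = flexio_bandwidth_gb_s(lanes=7)
flexio_in = flexio_bandwidth_gb_s(lanes=5)

print("EIB bytes per cycle:", eib_bytes_per_cycle)
print("XDR memory bandwidth (GB/s):", round(memory_bw, 1))
print("FlexIO outbound (GB/s):", round(flexio_out, 1))
print("FlexIO inbound (GB/s):", round(flexio_in, 1))
print("FlexIO total (GB/s):", round(flexio_out + flexio_in, 1))
```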
4. Possible Configurations and Resource Utilization
As previously discussed, the Cell can be configured for different uses. The Cell architecture allows for a variable number of PPEs and SPEs with different memory configurations. Likewise, you can also connect multiple Cells together. IBM has already produced a prototype with two Cell processors. From what I understand, the two Cells can communicate with each other without additional hardware (using the high-speed Rambus I/O interconnects rather than a control switch). I did not find much information about this, as STI may not have all the wrinkles worked out yet.
Basic configuration with one Cell
Two Cell configuration with “glue-less” communication
Four Cell configuration with additional switch hardware for communication
Since the Cell is extremely flexible in its configurations, there is a need for a number of potential resource utilization schemes. Tasks are divided into SPE and PPE “modules” or jobs. Each SPE module is a sub-task, which operates using one or more SPEs depending on the compute power needed; modules can also stream data to one another.
PPE scheduling: The PPE maintains a job queue, schedules jobs on the SPEs and monitors progress. Each SPE performs its own job and synchronizes with the PPE. When an SPE has finished its execution, the next job in line is assigned to that SPE. These jobs are self-contained mini-programs.
SPE self-scheduling: Scheduling is distributed across the SPEs. The SPEs run their own mini-kernel, which allows them to assign jobs to themselves without guidance from the PPE. The SPEs use shared memory for all jobs in this configuration, as the PPE still maintains the job queue.
Stream processing: Each SPE runs a distinct program, and the programs are chained together. Data comes from an input stream and is sent to one or more SPEs to be stored in the local store. When an SPE has finished processing, the output data is stored in its local store. The next SPE reads the output from the first SPE’s local store, processes it and stores the result in its own local store. This process of passing data from SPE to SPE continues until the stream processing operation has finished.
Above is an example of the stream processing scheme in the Cell processor. “SPEs load programs for reading DVD, decoding video, decoding audio and display. The data would be passed off from SPE to SPE until finally ending up on the TV”. With a good compiler or some clever programming, I believe it is possible to combine different schemes. For example, you could have four SPEs working in the parallel/queue scheme while the other four SPEs work in the serial/stream scheme. In the above schemes, all the SPEs are dynamically assigned, so the developer does not need to know how many SPEs there are or what the SPEs are currently computing. STI also intends to allow distributed processing among the different products that will contain the Cell (from a PDA to a camera to a microwave). The operating system will perform the necessary communication set-up and will use whatever local network technology is available.
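Two of the schemes described in this section, PPE scheduling and stream processing, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not STI’s runtime: the function names, the job durations and the stand-in stage functions are all hypothetical, and the stages run one after another here, whereas real SPEs would run concurrently.

```python
import heapq
from collections import deque


def ppe_schedule(jobs, num_spes=8):
    """PPE scheduling: one job queue, next job goes to the first idle SPE.

    jobs is a list of (name, duration) pairs with made-up durations.
    Returns (assignment, makespan), where assignment maps an SPE index
    to the job names it ran, in order.
    """
    queue = deque(jobs)
    idle_at = [(0, spe) for spe in range(num_spes)]   # (time idle, SPE id)
    heapq.heapify(idle_at)
    assignment = {spe: [] for spe in range(num_spes)}
    while queue:
        name, duration = queue.popleft()
        time_idle, spe = heapq.heappop(idle_at)       # earliest idle SPE
        assignment[spe].append(name)
        heapq.heappush(idle_at, (time_idle + duration, spe))
    makespan = max(t for t, _ in idle_at)
    return assignment, makespan


def stream_process(data, stages):
    """Stream processing: each stage stands for a program on one SPE.

    The output of one stage (left in its local store) is the input of
    the next stage in the chain.
    """
    for stage in stages:
        data = stage(data)
    return data


# Queue scheme on two SPEs
assignment, makespan = ppe_schedule(
    [("a", 3), ("b", 1), ("c", 2), ("d", 1)], num_spes=2)
print(assignment, makespan)

# Stream scheme: read -> decode -> display, with toy stand-in stages
stages = [
    lambda frames: [f.lower() for f in frames],          # "read"
    lambda frames: [f + ":decoded" for f in frames],     # "decode"
    lambda frames: ", ".join(frames),                    # "display"
]
print(stream_process(["F1", "F2"], stages))
```

In the queue scheme the program never names a specific SPE, which matches the point made above that the developer does not need to know how many SPEs there are.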
5. Comparisons to other Architectures
CISC (Complex Instruction Set Computer) adds more instructions and complexity to the processor at the hardware level and abstracts more away from the programmer and the compiler. All this complexity takes logic, in the form of space on the chip. Intel and AMD chips are based on the x86 architecture and have a CISC instruction set, even though they currently break these complex instructions down into simpler RISC-like instructions. The x86 architecture contains multiple levels of cache and OOO (Out-Of-Order) hardware, such as the branch prediction unit and rename registers, which boost performance at the cost of added complexity. Intel and AMD are both moving to the dual-core (multi-core) approach, which will most likely have a cache per core. If both cores access the same memory, the data in one cache may become out of date and needs to be updated in order to stay coherent. Supporting this takes more logic (space) and time (performance). Also, the more processors there are, the more complex the situation becomes.
The Cell is very similar to current GPUs (Graphics Processing Units). Current graphics cards have vertex/pixel units similar to the SPEs in the Cell and are attached to high-speed memory. However, SPEs are more general purpose and can be chained together, offering more flexibility and allowing the Cell to handle more than just graphics. That said, GPU manufacturers are starting to produce more general-purpose GPUs, which provide better performance than traditional CPUs and will compete directly with the Cell.
6. Decisions for Design
The trend in CPUs has been to increase performance not only by increasing clock speeds but also by increasing IPC (Instructions Per Cycle). OOO hardware allows for more IPC, but it increases power consumption, complexity, size, and cost. OOO CPUs are more complex and have a larger number of transistors. These all require power and need to be cooled; “If you cannot cool the transistors they begin to leak electrons making them consume power when inactive”. Heat is why a lot of CPU manufacturers have given up on boosting clock speeds and have taken a multi-core approach. As you can see, changing the design influences processor efficiency, and the difficulty lies in balancing the tradeoffs. So what STI has done is to produce a multi-core processor within a processor with reduced logic complexity, thus obtaining hefty gains in power consumption and size but sacrificing IPC. The power consumption problem has reached the point where, if you want a higher clock frequency, you need to simplify the design. By designing a high-speed EIB, local store memory and I/O interconnects, STI hopes to make up for and surpass the reduced IPC by adding more specialized execution units (SPEs) and allowing them to scale across multiple Cells. Likewise, the reduced complexity results in reduced power consumption and allows the Cell to be clocked at higher frequencies. To battle the heat issue, STI has installed heat sensors in the ten largest sections of the Cell: the PPE, the eight SPEs, and the system cache. So, when some SPEs are working harder than others, they could switch tasks, evenly distributing the heat generated throughout the Cell. This is one of many design decisions that were made to allow the Cell to scale to supercomputer performance, as heat has always been an issue, especially when multiple chips are clustered together.
Above, I discussed reducing the complexity of the Cell. This includes the removal of OOO hardware (thus reducing the amount of cache needed) and branch prediction hardware, and the substitution of locally addressable memory (LS) for cache. STI has removed some control logic from the SPEs to make room for more local storage and execution hardware. The SPE does not do register renaming, branch prediction or instruction reordering, as STI has eliminated the instruction window. Second, and most importantly, STI has replaced the level 1 cache with locally addressable memory, termed the “local store”, to reduce the memory latency gap. “The basic idea is that the Cell has moved memory closer to the execution units and let the processor store frequently used data in that local memory”. Since there is no cache in the SPE, the burden of managing the local store has been moved into software. With a large unified register set, a good compiler can schedule instructions so that dependencies have less impact. An advantage of a large register set is that loop unrolling and interleaving can be supported without the use of reorder hardware. The compiler, with the help of many registers, can do what the OOO hardware does. The result is a less complex processor that operates at a higher frequency with relatively low power consumption.
6.1 Conventional Cache Vs Local Store
Conventional CPUs perform all their operations in registers, which are read from or written to main memory. Operating directly on main memory is slow, and this latency is why cache was introduced: to hide the cost of going to and from main memory. Cache works by storing the part of memory that the processor is working on. If data is not present in the cache, the CPU waits for it to be fetched. The local store is different from cache in that it is not transparent to software and does not contain hardware structures that predict what data to load, leaving more real estate for execution hardware. The local store memory is aliased in the PPE address map and is closer to the execution units. However, to take advantage of the local store, the programmer or compiler must pre-schedule data transfers. Since the SPEs are dual-issue, pre-scheduled data transfers occur asynchronously, in parallel with computation.
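The benefit of pre-scheduling transfers so that they overlap with computation can be illustrated with a simple timing model. The Python sketch below compares fetching and processing blocks strictly one after another with a two-buffer (“double buffering”) arrangement, in which the next block is transferred while the current one is processed. Double buffering is my choice of illustration rather than something the sources prescribe, the time units are arbitrary, and the model ignores writing results back.

```python
def serial_time(n_blocks, transfer, compute):
    """Each block is fetched, then processed, with no overlap."""
    return n_blocks * (transfer + compute)


def double_buffered_time(n_blocks, transfer, compute):
    """Fetch of block i+1 overlaps with computation on block i.

    Only the first fetch and the last computation are exposed; every
    step in between costs the slower of the two activities.
    """
    if n_blocks == 0:
        return 0
    return transfer + (n_blocks - 1) * max(transfer, compute) + compute


# Hypothetical figures: 4 blocks, 2 time units to fetch, 3 to process
n, t, c = 4, 2, 3
print("serial:", serial_time(n, t, c))
print("double buffered:", double_buffered_time(n, t, c))
```

When computation takes at least as long as a transfer, almost all of the transfer time is hidden, which is the effect the local store design relies on in place of a hardware cache.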
Primarily, the Cell’s PPE is a 64-bit PowerPC. Its compatibility with the Power Architecture provides a base for porting existing software, including operating systems like Apple’s OS X (which would still need some extensive rewriting of code, but significantly less than rewriting Microsoft Windows for the Cell). Sony, Toshiba, and IBM have already ported Linux (different versions) to run on the Cell’s PPE core, but additional patches are being developed jointly to unlock the performance potential of the Cell’s SPEs. According to STI, both the PPE and the SPEs are programmable in C/C++ using a common API. Existing 32-bit POWER applications will run on the Cell processor without modification. Any algorithm that can be vectorized or made parallel will benefit from the performance of the SPEs. So, applications that use graphics, audio, video and encryption will perform especially well. Nevertheless, scalar operations will also run across different SPEs, just not at the same performance.
Since STI has removed some of the design complexity of the SPEs, they have increased the programming complexity for developers. Developing for the SPEs will almost certainly be done at the hardware level. Early on, the tasks of managing the local store memory and system-level coherency will be done by the programmer or library writer, but over time this should be handled effectively by the compiler. These future compilers should be able to auto-vectorize code and create parallel code automatically, but currently this is not available. Just this month, IBM released a Cell Software Development Kit (SDK). You can download the SDK for free and start programming for the Cell on any x86 machine (32-bit or 64-bit). The SDK comes with a collection of software such as Linux kernel patches, the GNU toolchain (GCC), the XL C compiler (with optimization options ranging from none to loop analysis to whole-program analysis), a system simulator, code samples, libraries and, of course, lots of documentation. It requires a fairly powerful system and needs the Red Hat Fedora Core 4 operating system to be installed. For more information see reference [15] (thanks, Mark). For information purposes only, STI has released a hardware development kit for the PS3 nicknamed “Cytology”. It is not as powerful (2.4 GHz) as the PS3 promised for release in 2006 (3.2 GHz). If you are interested in obtaining one, you will have to wait at least until the New Year, as they are being distributed in very small numbers to developers.
The Cell is a general-purpose processor, but one optimized for high-performance computing tasks. The intended basic configuration consists of 9 cores. The PPE core is a conventional POWER processor and acts as a controller. The other eight cores, the SPEs, are independent vector processors that can work alone or chained together and perform most of the computational workload. STI has given the Cell the ability to communicate with other Cells or bring in a huge amount of input via high-speed Rambus interconnects. STI’s design decisions were made to create a powerful processor that is easily configured for different uses, from gaming systems to HDTVs to supercomputers, while still being energy conscious. The Cell is very scalable, as SPEs can be added to scale up or removed to scale down. It has taken STI over five years to design the Cell, so one would think it would be expensive once it is released. However, since it is very scalable in its design and flexible in its uses, it should be produced in vast numbers by the STI group, providing exceptional power for a reasonable price. Programming for the Cell may be difficult at first, as the PPE and SPEs are different types of processors. In the meantime, compilers and libraries will be developed, giving programmers a break. The Cell will be successful in the niche markets that the STI members have designed it for and will change the way we perceive current processors. However, with Apple’s recent announcement that it is switching from the PowerPC to Intel, it may be a few years before we see the Cell in a desktop PC. Apple’s reasons for the switch include the price of PowerPC chips, the inability of the PPC970FX to reach 3 GHz, and the failure to provide Apple with a low-power PowerPC chip for its PowerBook line.
Glossary of Acronyms
STI – Sony, Toshiba, IBM
SCEI – Sony Computer Entertainment Incorporated
ISSCC – International Solid-State Circuits Conference
PPE – Power Processing Element
SPE – Synergistic Processing Element
LS – Local Store
EIB – Element Interconnect Bus
IPC – Instructions Per Cycle
MIC – Memory Interface Controller
OOO – Out Of Order (Execution)
CPU – Central Processing Unit
DMA – Direct Memory Access
SIMD – Single Instruction Multiple Data
RISC – Reduced Instruction Set Computer
CISC – Complex Instruction Set Computer
POWER – Performance Optimization With Enhanced RISC
DVD – Digital Versatile Disc
IBM – International Business Machines
DP – Double Precision
SP – Single Precision
GFLOPS – Billion Floating-Point Operations per Second
FMADD – Floating-Point Multiply-Add Instruction
GPU – Graphics Processing Unit
KB – Kilobyte
GHz – Gigahertz (unit of frequency equal to one billion hertz)
MHz – Megahertz (unit of frequency equal to one million hertz)
API – Application Programming Interface
SoC – System on a Chip
SDK – Software Development Kit
GNU – GNU’s Not Unix
PDA – Personal Digital Assistant
[1] Wikipedia, “Cell (microprocessor),” http://en.wikipedia.org/wiki/Cell_processor
[2] Stokes, Jon, “Introducing the IBM/Sony/Toshiba Cell Processor, Part 1,” http://arstechnica.com/articles/paedia/cpu/cell-1.ars/1, 02/07/2005
[3] Stokes, Jon, “Introducing the IBM/Sony/Toshiba Cell Processor, Part 2,” http://arstechnica.com/articles/paedia/cpu/cell-2.ars, 02/08/2005
[4] Suzuoki, Masakazu & Yamazaki, Takeshi, “US Patent Application,” http://appft1.uspto.gov/netacgi/nphParser?Sect1=PTO2&Sect2=HITOFF&p=1&u=/netahtml/PTO/searchbool.html&r=1&f=G&l=50&co1=AND&d=PG01&s1=20020138637&OS=20020138637&RS=20020138637, 10/26/2002
[5] Press Release, “IBM, Sony, Sony Computer Entertainment Inc. and Toshiba Disclose Key Details of the Cell Chip,” http://www.us.playstation.com/Pressreleases.aspx?id=252, 02/07/2005
[6] Hofstee, Peter, “Power Efficient Processor Architecture and The Cell Processor,” http://ieeexplore.ieee.org/iel5/9519/30167/01385948.pdf?arnumber=1385948, HPCA-11 2005
[7] Press Release, “Sony Computer Entertainment Inc. To launch its next generation computer entertainment system,” http://www.us.playstation.com/Pressreleases.aspx?id=279, 05/16/2005
[8] Wang, David, “Cell Microprocessor III,” http://realworldtech.com/page.cfm?ArticleID=RWT072405191325, 07/24/2005
[9] Blachford, Nicholas, “Cell Architecture Explained Version 2,” http://blachford.info/computer/Cell/Cell0_v2.html, 2005
[10] Wang, David, “ISSCC 2005: The Cell Microprocessor,” http://realworldtech.com/page.cfm?ArticleID=RWT021005084318, 02/10/2005
[11] Wang, David, “Cell Microprocessor Revisited,” http://realworldtech.com/page.cfm?ArticleID=RWT022805234129, 02/28/2005
[12] Blachford, Nicholas, “Cell Architecture Explained Version 1,” http://blachford.info/computer/Cell/archive/Cell0.html, 2005
[13] J. Kahle, M. Day, H. Hofstee, C. Johns, T. Maeurer, D. Shippy, “Introduction to the Cell multiprocessor,” http://researchweb.watson.ibm.com/journal/rd/494/kahle.html, IBM Journal of Research and Development, Vol. 49, Number 4/5, 2005
[14] “Top 500 List,” http://www.top500.org/lists/2005/06/, 06/2005
[15] “Get started with the Cell Broadband Engine Software Development Kit,” http://www-128.ibm.com/developerworks/power/library/pa-cellstartsim/#N10276, IBM, 11/09/2005
PS3 Specifications courtesy of Sony Computer Entertainment Inc.
Cell Processor: PowerPC-base Core @3.2GHz; 1 VMX vector unit per core; 512KB L2 cache; 7 x SPE @3.2GHz; 7 x 128b 128 SIMD GPRs; 7 x 256KB SRAM for SPE (* 1 of 8 SPEs reserved for redundancy); total floating point performance: 218 GFLOPS
RSX @550MHz: 1.8 TFLOPS floating point performance; Full HD (up to 1080p) x 2 channels; Multi-way programmable parallel floating point shader pipelines
Sound: Dolby 5.1ch, DTS, LPCM, etc. (Cell-base processing)
Memory: 256MB XDR Main RAM @3.2GHz; 256MB GDDR3 VRAM @700MHz
System Bandwidth: Main RAM 25.6GB/s; VRAM 22.4GB/s; RSX 20GB/s (write) + 15GB/s (read); SB 2.5GB/s (write) + 2.5GB/s (read)
System Floating Point Performance: 2 TFLOPS
Storage:
  HDD: Detachable 2.5" HDD slot x 1
I/O:
  USB: Front x 4, Rear x 2 (USB2.0)
  Memory Stick: standard/Duo, PRO x 1
  SD: standard/mini x 1
  CompactFlash: (Type I, II) x 1
Communication:
  Ethernet: (10BASE-T, 100BASE-TX, 1000BASE-T) x 3 (input x 1 + output x 2)
  Wi-Fi: IEEE 802.11 b/g
  Bluetooth: Bluetooth 2.0 (EDR)
Controller: Bluetooth (up to 7); USB2.0 (wired); Wi-Fi (PSP®); Network (over IP)
AV Output:
  Screen size: 480i, 480p, 720p, 1080i, 1080p
  HDMI: HDMI out x 2
  Analog: AV MULTI OUT x 1
  Digital audio: DIGITAL OUT (OPTICAL) x 1
Disc media (* read only):
  CD: PlayStation® CD-ROM; PlayStation®2 CD-ROM; CD-DA: CD-DA (ROM), CD-R, CD-RW; Super Audio CD: Hybrid disc (HD layer/CD layer), HD layer; DualDisc: DualDisc (audio side), DualDisc (DVD side)
  DVD: PlayStation®2 DVD-ROM; PLAYSTATION®3 DVD-ROM; DVD-Video: DVD-ROM, DVD-R, DVD-RW, DVD+R, DVD+RW
  Blu-ray Disc: PLAYSTATION®3 BD-ROM; BD-Video: BD-ROM, BD-R, BD-RE