
Stanford Hydra Architecture

Presented by Drew Schena and Josh Milas


Overview

● Architecture
○ Current Processors
○ Base Idea for Hydra
○ Changes
● Thread Level Speculation
○ Hardware
○ Control Mechanisms
○ Hazards
Current Processors

● What do we do now?
○ Multiple issue superscalar
○ More instruction level parallelism
● Chip designs are getting larger
○ Longer critical path
○ Longer wires
○ More wire delay
● Diminishing returns
○ Can only get so much out of ILP
○ Possible data dependency? Must assume a conflict
Base Idea for Hydra

● Follow the Chip Multiprocessor (CMP) approach
○ Don’t make the core larger, make it smaller
○ Independent CPUs
○ Shared communication bus
○ CPUs must communicate via this bus
● Make multiple processors transparent to the user
○ Make single-threaded programs faster
○ Compilers are great
Hydra Architecture

● 4 MIPS CPUs
● Each CPU has its own L1 cache
● Shared L2 cache
● Don’t just do ILP, do Thread Level Parallelism
○ Add special hardware for speculating
○ Assume no data dependencies
Hydra Architecture
CPU communication

● Why not have CPUs talk to each other directly?
○ Many edge cases to worry about
○ Adds complexity
○ Makes it slower
● Use the L2 cache instead
○ It’s fast enough
CPU Modifications

● Add hardware to allow speculation
○ Speculation Coprocessor
○ Speculation Buffer
○ Modified L1 cache
● CPU can
○ Speculate values
○ Maintain CPU state
○ Invalidate cache
Thread Level Speculation
● Sequential applications are broken apart into
“threads”.
● Threads are run in parallel in a potentially
unsafe manner.
● The thread holding the commit token commits its
generated data to memory on completion, then
passes the token on.
● Threads found to have data or control violations
are “squashed” and restarted.
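The commit-token discipline can be modeled in a few lines of Python. This is an illustrative sketch only, with invented names and structure, not Hydra’s actual hardware design:

```python
# Minimal model of in-order commit with a commit token.
# (Illustrative only -- names and structure are invented, not Hydra's design.)
memory = {}

class SpecThread:
    def __init__(self, tid):
        self.tid = tid
        self.buffer = {}         # speculative writes held locally

    def write(self, addr, val):
        self.buffer[addr] = val  # never reaches memory until commit

    def commit(self):
        memory.update(self.buffer)  # only the token holder does this

# Threads may finish out of order, but commit strictly in thread order.
threads = [SpecThread(i) for i in range(3)]
threads[2].write("x", 30)   # the latest thread finishes first...
threads[0].write("x", 10)
threads[1].write("y", 20)

for t in threads:           # the token passes 0 -> 1 -> 2
    t.commit()

print(memory)   # {'x': 30, 'y': 20}
```

Because commits happen strictly in thread order, the last thread’s write to `x` lands last, matching sequential semantics regardless of execution order.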
Hardware Requirements

Thread-level speculation (TLS) requires specialized hardware to run effectively, because the separate cores need to communicate quickly and must be capable of restarting a task.

1. Thread Speculation Buffer
2. Speculation Control System
Hydra Architecture Overview
http://arsenalfc.stanford.edu/papers/hydra_MICRO00.pdf
Speculation Thread Buffer

1. Holds written data locally until commit.
2. Tracks “exposed reads”.

Exposed reads are reads from memory locations that have not been written
previously during the same thread. These are potentially unsafe operations,
and any data read must be checked against previous tasks’ written data
prior to commit.
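The buffer’s bookkeeping can be sketched as follows. The interface here is invented for illustration, not the Hydra buffer’s real design: reads satisfied by the thread’s own buffered writes are safe, while reads that fall through to memory are recorded as exposed:

```python
# Toy speculation buffer tracking exposed reads.
# (Invented interface for illustration; not the Hydra buffer's real design.)
class SpecBuffer:
    def __init__(self):
        self.writes = {}
        self.exposed_reads = set()

    def write(self, addr, val):
        self.writes[addr] = val        # held locally until commit

    def read(self, addr, memory):
        if addr in self.writes:        # satisfied locally: not exposed
            return self.writes[addr]
        self.exposed_reads.add(addr)   # must be checked before commit
        return memory.get(addr, 0)

buf = SpecBuffer()
mem = {"a": 1, "b": 2}
buf.write("a", 5)
assert buf.read("a", mem) == 5     # local hit, safe
assert buf.read("b", mem) == 2     # falls through to memory: exposed
print(sorted(buf.exposed_reads))   # ['b']
```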
Speculation Control System

1. Handles commit token passing.
2. Handles data forwarding between threads.
3. Writes a task’s buffered data to main memory.
4. When violations are detected, drops the task’s buffered memory, then
squashes and restarts threads.
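Violation detection on commit amounts to intersecting the committing thread’s write set with each later thread’s exposed-read set; any overlap means that thread read stale data and must restart. A hedged sketch with an invented interface:

```python
# Sketch of violation detection on commit (invented interface).
# A committing thread's write set is intersected with each later
# thread's exposed-read set; any overlap means that thread must restart.
def commit_and_check(committed_writes, later_threads):
    return [t["tid"] for t in later_threads
            if t["exposed_reads"] & set(committed_writes)]

t1 = {"tid": 1, "exposed_reads": {"x"}}
t2 = {"tid": 2, "exposed_reads": {"y"}}
print(commit_and_check({"x": 42}, [t1, t2]))  # [1] -- thread 1 read stale 'x'
```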
Read After Write Hazards (Case 1)

In the case where a read occurs after a write
but on a subsequent thread, the data from the
write is forwarded to that thread, allowing it to
receive the correct data even though the write has
not yet been committed to main memory.
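The forwarding path can be sketched as a lookup that checks earlier threads’ buffers, nearest predecessor first, before falling back to committed memory (invented interface, for illustration only):

```python
# Sketch of RAW forwarding (invented interface): a speculative read
# checks earlier threads' buffers, nearest predecessor first, before
# falling back to committed main memory.
def spec_read(addr, earlier_buffers, memory):
    for buf in reversed(earlier_buffers):   # nearest predecessor wins
        if addr in buf:
            return buf[addr]                # forwarded, uncommitted data
    return memory.get(addr)

memory = {"x": 0}
earlier = [{"x": 1}, {"x": 2}]   # thread 0 wrote 1, thread 1 wrote 2
print(spec_read("x", earlier, memory))  # 2 -- forwarded from thread 1
print(spec_read("x", [], memory))       # 0 -- no forwarder, read memory
```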
Read After Write Hazards (Case 2)

In the case where the read occurs prior to the
write, it reads from main memory and continues
as usual. If a previous thread then writes to the
location of an exposed read, the reading thread
is squashed and restarted with the new data
forwarded to it.
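The squash condition is simply: an earlier thread’s write hits an address this thread already read speculatively. A minimal sketch, with invented structures:

```python
# Sketch of the squash condition (invented structures): when an earlier
# thread writes an address this thread already read speculatively,
# the exposed read is stale and the thread must restart.
def on_earlier_write(addr, thread):
    if addr in thread["exposed_reads"]:
        thread["squashed"] = True

t = {"tid": 2, "exposed_reads": {"x"}, "squashed": False}
on_earlier_write("y", t)
print(t["squashed"])  # False -- 'y' was never read
on_earlier_write("x", t)
print(t["squashed"])  # True -- stale read of 'x', restart with new data
```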
Write After Read Hazards

1. The write from the second thread is held in that task’s
memory buffer.
2. The read from the first thread accesses main
memory, so it never sees the data from the second
thread’s write.
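The two steps above reduce to the buffering rule: the later thread’s write never touches main memory before commit, so the earlier thread’s read is unaffected. A minimal sketch with invented names:

```python
# WAR sketch: the later thread's write stays in its buffer, so the
# earlier thread's read of main memory is unaffected. (Invented names.)
memory = {"x": 0}       # committed state
t2_buffer = {}          # the later thread's speculation buffer

t2_buffer["x"] = 99     # step 1: later thread writes, buffered only

val = memory["x"]       # step 2: earlier thread reads main memory
print(val)  # 0 -- it never sees the later thread's uncommitted write
```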
Write After Write Hazards

1. The write from the first thread is held in that task’s
memory buffer and does not overwrite the second
thread’s write.
2. When the commit token is passed, the control logic
recognizes that the write from the second thread is
meant to overwrite the write from the first thread,
so the first thread’s data gets ignored.
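One way to see why this works: if both writes sit in their threads’ buffers and commits happen strictly in thread order, the later thread’s value wins automatically. A sketch with invented names:

```python
# WAW sketch: both writes are buffered; committing strictly in thread
# order makes the later thread's value win automatically. (Invented names.)
memory = {}
buffers = {1: {"x": 10}, 2: {"x": 20}}   # both threads wrote 'x'

for tid in sorted(buffers):   # commit token: thread 1, then thread 2
    memory.update(buffers[tid])

print(memory["x"])  # 20 -- thread 2's write overrides thread 1's
```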
Control Hazards

● Most TLS implementations simply use standard
branch prediction methods to handle branches.
● Threads found to be down the incorrect
branch can simply be squashed.

TLS also offers the possibility of hedging bets when the
branch predictor’s certainty is low. Tasks can be created
that go both ways down a branch, reducing misprediction costs.
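The hedging idea can be sketched as follows (invented interface): tasks are spawned down both paths, and once the branch resolves, the losing path is squashed while the winner continues:

```python
# Sketch of hedged branching (invented interface): tasks were spawned
# down both paths; once the branch resolves, squash the losing path.
def resolve_branch(taken, paths):
    keep = "taken" if taken else "not_taken"
    return {path: ("keep" if path == keep else "squash") for path in paths}

paths = {"taken": "task_a", "not_taken": "task_b"}
print(resolve_branch(True, paths))
# {'taken': 'keep', 'not_taken': 'squash'}
```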
Concluding Remarks

● Move towards CMP design
○ Can make processors faster
● Thread level speculation
○ More parallelism
○ More transparent threads
References
● http://www-hydra.stanford.edu/publications/MICRO00.pdf
● http://meseec.ce.rit.edu/cmpe750-spring2015/750-4-16-2015.pdf
● http://www-hydra.stanford.edu/publications/ManoharPrabhuDissertation.pdf
● http://www-hydra.stanford.edu/publications/ICS99.pdf
