Ananth Nallamuthu
Holcombe Department of Electrical and Computer Engineering, Clemson University

Each protocol has its own benefits and drawbacks. This simulation uses the MESI protocol. The MESI protocol has four states, Modified (M), Exclusive (E), Shared (S) and Invalid (I), and is an updated version of the MSI protocol, which did not have the Exclusive (E) state. The MESI protocol, or modified versions of it, is used in modern multiprocessors such as the Silicon Graphics Challenge. In a cache that follows the MESI protocol, the states of a data block change as follows. On a cache miss during a processor read (PrRd), a BusRd transaction is generated, following which the memory block is loaded into the cache in Exclusive (E) state if no other cache has a copy of the particular memory block. On the other hand, if any other cache has a copy of the block, the block can be loaded only in Shared (S) state. A BusRdX is generated if a PrWr has caused a cache miss. The memory block in this case is loaded in Modified (M) state, and all other caches that might have a copy of the memory block must invalidate their copies on seeing the BusRdX. If a cache already has the memory block in Modified state, on seeing a BusRdX for the same block it needs to supply the modified version of the memory block to the requesting cache and update the main memory as well. In some architectures the cache does not supply the memory block directly to the requesting cache, but instead updates the main memory and invalidates its own copy; the main memory later responds to the request with the updated block. In either case the modified information in the memory block is not lost.
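The snoop-side state changes described above can be captured in a small transition function. The following is an illustrative sketch only; the class, enum and method names are my own and are not the simulator's actual code:

```java
// Illustrative sketch of MESI snoop-side transitions; names are
// hypothetical, not the simulator's actual classes.
public class MesiSnoop {
    public enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }
    public enum BusOp { BUS_RD, BUS_RDX }

    // Next state of a locally cached block when a bus transaction for the
    // same block is snooped. A Modified block must first flush (supply the
    // dirty data and/or update main memory) so the modified data is not lost.
    public static State onBusTransaction(State current, BusOp op) {
        if (current == State.INVALID) {
            return State.INVALID; // no valid copy, nothing to do
        }
        if (current == State.MODIFIED) {
            flush();
        }
        return (op == BusOp.BUS_RD) ? State.SHARED : State.INVALID;
    }

    static void flush() { /* placeholder: write the dirty block back */ }
}
```

A snooped BusRd demotes a valid copy to Shared, while a BusRdX invalidates it, matching the transitions listed in Table 1.0 below.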
Abstract: Performance improvement of multiprocessor systems has been attempted with several techniques. The split transaction bus architecture is one such technique: it better utilizes the shared bus through design changes that allow the cache controllers on the shared bus to use the bus time for newer transactions while the response to an older transaction is being prepared by the main memory (or by another cache controller). This project simulates the transactions on a split transaction bus using Java threads. The cache controllers, main memory, etc. are represented as threads, and they access the static variables in the AddressBus and DataBus objects.
Introduction
In a shared memory multiprocessor system there is a need to maintain cache coherency among the processors. Cache coherency in a shared bus multiprocessor system is achievable by a snooping mechanism. In a typical snoop-based multiprocessor architecture, all cache controllers sharing a bus snoop the bus for relevant transactions. On finding a transaction on the bus that is relevant to one of the memory blocks whose copy it owns, a cache controller takes the necessary actions according to the snooping protocol to maintain cache coherency. There are a number of such snooping protocols, for example the three-state MSI, the four-state MESI and the four-state Dragon protocols, among others.
Thus in a multiprocessor system implementing the MESI protocol to share a memory space, when a BusRdX or BusRd has been issued by a cache controller, all other cache controllers need to wait until the request is served by the main memory or by a cache that has the block in Modified state. When such an implementation is done on a single "atomic" bus there can be only one outstanding transaction at any point in time, and the bus is idle while the memory is fetching the particular memory block or a cache is preparing to send the block. This can be a major roadblock in a large multiprocessor system. A performance improvement is achievable by allowing multiple outstanding transactions.

Table 1.0: State transitions in the MESI protocol

| Current state of the block copy | Request on bus | Action for the block | Resulting state |
| Modified                        | BusRd          | Flush                | Shared          |
| Modified                        | BusRdX         | Flush                | Invalid         |
| Exclusive                       | BusRd          | -                    | Shared          |
| Exclusive                       | BusRdX         | -                    | Invalid         |
| Shared                          | BusRd          | -                    | Shared          |
| Shared                          | BusRdX         | -                    | Invalid         |
| Invalid                         | BusRd          | -                    | Invalid         |
| Invalid                         | BusRdX         | -                    | Invalid         |

Split Transaction Bus
Split transaction buses allow multiple outstanding transactions; it is not required to wait for the completion of a previously issued transaction before a new one can be issued (there are, however, some architecture-specific limitations on the number of outstanding transactions that can be allowed). Split transaction buses have separate buses for address and data. Cache controllers have buffer space that stores the outstanding requests and the responses to them, and also a request table of the outstanding transactions on the bus. Bus transactions are monitored by all cache controllers and actions are taken appropriately.

This simulation emulates the Silicon Graphics Challenge bus architecture. The main design aspects of the architecture are as follows:
- Eight outstanding transactions can be present on the bus. The corresponding request table entries are removed whenever a response for a particular request is seen on the bus.
- Limited buffering is provided between the bus and the cache controllers. Flow control is implemented using NACK (negative acknowledgement) lines on the bus.
- Multiple requests for a single block are disallowed (conflicting transactions are disallowed).
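The request table of outstanding transactions can be sketched as follows. This is a minimal illustration assuming a pool of eight tags; the class and method names are hypothetical, not the simulator's actual RequestTable:

```java
// Hypothetical sketch of a table of outstanding split-bus transactions.
import java.util.HashMap;
import java.util.Map;

public class RequestTableSketch {
    public static final int MAX_OUTSTANDING = 8; // SGI Challenge allows 8

    private final Map<Integer, Long> tagToAddress = new HashMap<>();

    // A new request is admitted only if a free tag exists and no other
    // outstanding request targets the same block (conflicts are disallowed).
    // Returns the assigned tag, or -1 if the request must be retried.
    public int add(long blockAddress) {
        if (tagToAddress.containsValue(blockAddress)) return -1; // conflict
        for (int tag = 0; tag < MAX_OUTSTANDING; tag++) {
            if (!tagToAddress.containsKey(tag)) {
                tagToAddress.put(tag, blockAddress);
                return tag;
            }
        }
        return -1; // all tags in use: flow control kicks in (NACK)
    }

    // Entries are removed when the matching response appears on the bus,
    // freeing the tag for future requests.
    public void responseSeen(int tag) { tagToAddress.remove(tag); }

    public boolean full() { return tagToAddress.size() == MAX_OUTSTANDING; }
}
```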
Phases in a Split Transaction Bus Cycle
The SG Challenge uses a five-phase bus cycle. The bus phases are "Arb", "Rslv", "Addr", "Dcd" and "Ack". The SG Challenge implements a uniform pipeline strategy: the data and address buses each have a different function to perform during each phase.

During the "Arb" phase, requests for the buses are made by the controllers, either separately for one of the buses or for both buses together, as in the case of a request from the main memory controller. The second phase is the request resolution phase ("Rslv"), in which all requests are considered and one of them is granted access to the bus. In the address phase ("Addr"), the cache controller that was granted the address bus places the address of the memory block that it requests along with the bus command; the bus command can be a BusRd, a BusRdX or a BusWB. The next phase is the decode phase ("Dcd"), in which all cache controllers snoop the address bus for the address and the command associated with it. A new request table entry is made when a cache controller sees a new request on the bus in this phase. Each request is assigned a tag number that ranges from 0 to 7 (only eight outstanding transactions are allowed), and separate tag lines are used for this purpose. Each cache controller also looks up its own memory blocks to determine whether it possesses a copy of the memory block being requested; if so, the state of that memory block is stored with the new request table entry. If the cache has a copy of the memory block in Modified state, the controller knows that it has to respond to the particular request. To make everyone aware that it will respond, it places an "Addr Ack" signal on the address bus in the following ("Ack") phase and adds the response data to its own request queue. The response will thus be sent out later.

On the data bus, the arbitration and grant processes happen during the "Arb" and "Rslv" phases. Since the address bus is not used while responding to requests with data, the data bus can be used to serve a request that was made at an earlier time while the address bus is being used to place a new request; the requests and responses are matched with tags. During the "Addr" phase, when a controller wins access to the data bus, it activates the tag lines and sets the tag-line bits to indicate the tag number of the request that is being responded to. All cache controllers look up their request tables for the particular tag entry to identify the address and other attributes of the data being served, and present their snoop results in the following phase. If a controller was the originator of the request (determined by checking the originator field), it prepares itself to receive the data block. In the "Ack" phase, the responding controller places the first part (part 1 of 4) of the data block on the bus if the receiver is ready. The cache blocks are 128 bytes in size and the bus is 256 bits wide; a data block transfer thus requires four bus cycles plus a one-cycle turnaround time, so the data transfer, like the request, occupies five phases.
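The five-phase cycle and the block-transfer cost can be summarized in a small sketch. The names are illustrative; the 4+1-cycle figure follows from the 128-byte block and the 256-bit bus described above:

```java
// Illustrative sketch of the five-phase bus cycle and the block-transfer
// cost on the data bus (names are assumptions, not the simulator's code).
public class BusCycle {
    public enum Phase { ARB, RSLV, ADDR, DCD, ACK }

    // Number of data-bus cycles to move one cache block: blockBytes * 8
    // bits over a busWidthBits-wide bus, plus one turnaround cycle.
    public static int transferCycles(int blockBytes, int busWidthBits) {
        int beats = (blockBytes * 8) / busWidthBits;
        return beats + 1; // +1 cycle of turnaround
    }

    // Phases advance in a fixed order and wrap around to ARB.
    public static Phase next(Phase p) {
        Phase[] all = Phase.values();
        return all[(p.ordinal() + 1) % all.length];
    }
}
```

With the SG Challenge parameters, `transferCycles(128, 256)` gives 4 beats plus 1 turnaround, i.e. five cycles, matching the five-phase request pipeline.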
All cache controllers look up their request tables with the tag number at this point to determine what their response to the particular request was. Each cache controller also checks the originator field to determine whether it was the originator of the request. If it was the originator, the cache controller knows that it has to be the recipient, and the recipient loads the data block from the data bus. If the original request was a "BusRd" command, the state in which the block needs to be loaded is determined from the snoop results, which are now available on the snoop lines: if the shared line is raised, the memory block is loaded into the cache in Shared state; otherwise it is loaded in Exclusive state. (Note: the snoop results are determined at the time the request appears on the bus and are stored with the request entry in the request table, but they are presented on the snoop lines only when the actual response to the request happens.) Whenever a particular request has been responded to, all the cache controllers remove that request from their request tables, and the tag number of the removed entry is made available for future requests by the arbitrator.

Main Memory Controller's Responsibility
Whenever a request for a data block appears on the address bus, the main memory controller assumes that it might have to respond to the request and initiates fetching of the relevant data block. This speculative fetching is done to improve the response time of the main memory. From the snoop results it is evident whether any of the cache controllers has the particular data block in Modified state, since such a controller would have placed an acknowledgment for the request in the fifth phase. If no acknowledgement is seen in the fifth phase, the main memory controller knows that it needs to service the particular request. Whenever a BusRdX or BusRd request is serviced by a cache controller instead, it implies that the block was last modified by that cache controller. As required by the MESI protocol, the main memory updates itself with the modified version of the particular address when it sees a cache controller responding to a request, i.e. it adds the particular data block to its write-back queue. In case the write-back queue is full, the main memory controller responds with a negative acknowledgement (NACK), indicating that the write request could not be serviced; in such a case the cache controller that had sent the particular data block needs to return later with a write request.

Implementation
The simulation program was created using Java (JDK version 1.6.0_04). The primary reason for choosing Java was its support for threads. The cache controllers and the main memory controller are invoked as threads, and they all access common AddressBus, DataBus and BusArbitrator classes. The variables in the AddressBus and DataBus classes are made static, hence all the threads see the same variable values. The program is scalable to any number of cache controllers, while there can be only one main memory controller. A separate GUI class was written using Java Swing to display the transactions on the buses, the main memory blocks, and the values in the memory blocks as they are updated.
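Because all controller threads read and write the same shared bus objects, access to them has to be coordinated. A minimal monitor-style sketch of such coordination using wait/notify is shown below; the class and field names are hypothetical, not the simulator's actual DataBus:

```java
// Hypothetical monitor-style data bus: receivers block until the
// responding thread has actually placed the data.
public class DataBusSync {
    private byte[] data;
    private boolean dataReady = false;

    // Responding thread: place the data block and wake waiting receivers.
    public synchronized void placeData(byte[] block) {
        data = block;
        dataReady = true;
        notifyAll();
    }

    // Receiving thread: wait until the responder has set the values.
    public synchronized byte[] takeData() {
        while (!dataReady) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep waiting politely
            }
        }
        return data;
    }
}
```

The guarded `while (!dataReady) wait()` loop is what prevents a receiver from reading the bus before the responder has arrived at the phase and set the values.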
Functional Description
When the program is invoked, it starts up threads with thread numbers -1 to n. The thread numbered -1 acts as the main memory controller, while the remaining threads are cache controllers. Each thread creates an instance of a controller object, a RequestTable object, an array for its data blocks, an array for its RequestQ and an array for its ResponseQ. The data structures for the items in the request tables and in the request and response queues are provided by the RequestTableItem, ResponseQItem and RequestQItem classes. Each thread iterates through the bus cycles, performing a set of actions at every phase.

Random processor requests are generated at the beginning of every cycle in every thread (except the main memory thread). A random number generator creates a random address to be requested and randomly decides whether the particular request is a PrRd or a PrWr. The cache controller checks whether the requested address is available in its own data block array. If available, the action to be taken is decided as follows. If the state of the data block is M (Modified), no bus request is generated, irrespective of whether the request was a PrRd or a PrWr, since the cache controller has the latest modified copy and has exclusive write access to the requested block. If the data block is in the E (Exclusive) state and the generated processor request is a write, then, following the MESI protocol, the data block is promoted to the M state and no bus request is generated. In case the requested block is in Shared state and the random processor request is a PrWr, a bus request BusRdX needs to be generated. In case the requested block is not available in the data block array or has its state attribute set to I, a bus request needs to be generated; the generated bus request will be a BusRdX or a BusRd depending on the processor's request. The generated request is placed in the thread's RequestQ, which contains data objects of type RequestQItem; each item in the queue consists of a requested address value and the bus request command.

Arbitration and Grant Phases
Bus arbitration plays a very critical role in the performance of multiprocessor systems and is an important research topic. This simulation implements a simple arbitration method based on FIFO queues. During the arbitration phase, the threads that have a non-empty RequestQ place a request for the address bus. The BusArbitr object takes the requests from all the threads and adds them to a "current requests queue" and a "pending requests queue" of the respective buses, if they are not already present in the queues; the arbiter does not allow duplicate requests from the same controller on a queue. In the granting phase, the arbiter assigns the bus to the controller that is earliest on the pending requests queue and is also present in the current requests queue. Upon granting the bus to a particular controller, the arbiter removes the controller's request from the pending requests queue and clears all entries in the current requests queue. As a thread iterates through the cycles, it tries to get the pending requests in its RequestQ served. Since the controllers keep arbitrating for the buses during every cycle as long as they have a non-empty RequestQ, there is a chance of the arbiter receiving multiple bus requests for the same item in a thread's RequestQ, resulting in the thread not using a bus when granted a second time, i.e. after it has already cleared the particular item off its RequestQ when it was first granted the address bus.
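The FIFO arbitration described above (grant the earliest pending controller that is also requesting in the current cycle, and ignore duplicate requests) can be sketched as follows; names are illustrative, not the simulator's BusArbitr:

```java
// Illustrative FIFO arbiter: a "pending" queue preserves arrival order,
// a "current" set records who is requesting in this cycle.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class FifoArbiter {
    private final Deque<Integer> pending = new ArrayDeque<>();
    private final Set<Integer> current = new HashSet<>();

    // Called in the Arb phase; duplicate requests from the same
    // controller are not added to the pending queue again.
    public void request(int controllerId) {
        if (!pending.contains(controllerId)) pending.add(controllerId);
        current.add(controllerId);
    }

    // Called in the Rslv phase: grant the earliest pending controller
    // that is also requesting in the current cycle; -1 if none.
    public int grant() {
        for (Iterator<Integer> it = pending.iterator(); it.hasNext(); ) {
            int id = it.next();
            if (current.contains(id)) {
                it.remove();      // drop from pending on grant
                current.clear();  // clear the current-cycle requests
                return id;
            }
        }
        current.clear();
        return -1;
    }
}
```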
The disadvantage of this arbitration method is that a controller cannot get on the arbitration queue for any newly generated processor requests until the bus is granted for its earlier request. This could be avoided if the controllers did not arbitrate for the same pending item in their RequestQ, in which case the arbiter could accommodate every new request received from a controller.

The Address Phase
In the third (address) phase, the controller that was granted the address bus pulls the first item off its RequestsQ and places its address and command in the common AddressBus object; it holds the address bus until the end of the cycle. It also adds an entry for the request in its own request table, using the tag that was assigned to the request, and sets the originator bit so that it can recognize that it was the original requestor when the data block arrives during the response phase. But prior to placing the request on the bus and in the request table, the controller looks up the RequestTable for the particular address; this is done to avoid conflicting outstanding requests for the same block. Thus, if the address it is requesting is already present in the request table's entries, the controller does not proceed with the request and avoids placing the address on the bus. Since the address bus has already been granted to the controller, in our implementation it merely places an IGNORE command on the address bus during the third phase. An improvement could be for the controller to use its time slot on the address bus to place some request other than the conflicting one from its RequestsQ, in which case no cycle would be wasted.

Look-up Phase
In the address look-up phase, all cache controllers read the AddressBus object to find the latest address and command placed on the bus. If the command was "IGNORE", no action is taken. Otherwise, based on the MESI protocol, a controller invalidates its copy of the memory block if the command is "BusRdX", or demotes it to Shared state if the command was "BusRd". In case a cache controller had the memory block in Modified state, it knows that it needs to respond to the request, and hence fetches the data and adds an entry to its ResponseQ. The data structure of a cache controller's ResponseQItem comprises a tag field and a data field. The main memory, however, uses a slightly different data structure for its ResponseQ, which includes the address and the data but not the tag. As in the SG Challenge architecture, this simulation does not allow cache controllers possessing the requested block in Shared state to respond to requests; in other words, a cache controller responds only if the requested block is in Modified state, otherwise it is always the main memory that responds to requests.

Acknowledgement Phase
In the acknowledgement phase, the cache controller that had the requested memory block in Modified state places an acknowledgement command "ACK" in the AddressBus object. The ACK command lets the main memory determine whether it needs to respond to the request or whether it has been taken care of by a cache controller.
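The two response-queue item layouts described above can be sketched as plain data classes. The field names are assumptions, not the simulator's actual ResponseQItem definitions:

```java
// Hypothetical layouts of the two response-queue item types: cache
// controllers tag their responses, while the main memory's item carries
// the address and data but no tag.
public class ResponseItems {
    public static class CacheResponseQItem {
        public int tag;      // matches the outstanding request's tag (0..7)
        public byte[] data;  // the 128-byte cache block being supplied
    }

    public static class MemResponseQItem {
        public long address; // main memory responses are matched by address
        public byte[] data;
    }
}
```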
Main Memory's Response
At the end of a cycle of five phases, the main memory is aware of whether it needs to respond to the previously generated request. If it has to respond, it adds the data block fetched for the particular address to its own ResponseQ. During the arbitration (first) phase of the next cycle, it checks whether it has a non-empty ResponseQ; if yes, it competes for the address bus. Since any data transfer from the main memory uses both the address bus and the data bus, the arbitration process needs to grant both the AddressBus and the DataBus together to the main memory. Upon gaining access to the buses, the main memory thread places a command to indicate that a data transfer is happening from the main memory; in this simulation the command used for this purpose is "DMA". On seeing the DMA command, the cache controllers look up their request tables for the request relevant to the address placed on the address bus, and prepare to present snoop results or to receive the data blocks. When the data arrives during the fifth phase of the cycle, the originator is prepared to receive the data from the main memory via the DataBus.

During development, errors appeared due to a lack of synchronization between the threads: the thread that was supposed to receive the data accessed the DataBus and tag lines even before the data was placed by the cache controller or the main memory. To avoid this, all threads were made to wait until the responding thread (the one that places data in the DataBus) arrives at the phase and sets the values in the DataBus. A lock was used to determine whether the responding thread has arrived at the critical section or not.

RESULT
The split transaction simulator invokes a "Main Memory" graphical user interface showing the address bus, the data bus and the main memory blocks, and cache controller GUIs showing the memory blocks currently present in each cache and the request table entries in each cache.

Figure 1.0: Main memory GUI
Figure 1.0 shows the Main Memory GUI in cycle 3's look-up phase. Address number 7 was the last value posted on the address bus; cache #2's request for the data block at address 1 can also be noted.

Figure 2.0: Cache #0 loaded with data from address 9 after cycle 1. Figure 2.0 shows cache #0 loaded with a data block in Shared state.

Figure 3.0: Cache #2's data block set to Modified (M) state by its processor. Figure 3.0 shows cache #2's data block loaded with data from address 2 and then moved to Modified state (made dirty) by a PrWr command in the following cycles.

The GUIs are updated at the end of every phase and whenever the values in the buses are updated.

CONCLUSION
A split transaction bus along the lines of the SG Challenge bus architecture was simulated, and various design aspects of the split transaction bus were analyzed. This simulation successfully implements the BusRd and BusRdX functionalities of a cache controller but does not include the WriteBack functionality. It is evident that there is a lot of scope for performance improvement or tuning of the split transaction bus, in areas such as bus arbitration and the number of outstanding transactions that
could be allowed on the bus, as well as improving the system to enable cache-to-cache transfer of data blocks in Shared state, etc. Such a simulation will help in analyzing the impact of various design changes to the components of a split transaction bus.

References
1. David E. Culler et al., "Parallel Computer Architecture: A Hardware/Software Approach".
2. Suresh Marisetty, Intel Corp., "Bus grant prediction technique for a split transaction bus in a multiprocessor computer system".