You are on page 1of 48

Designand Simulationof aPipelined4 bitComputer

CSE404NDigitalSystemDesign

A1:5

03 05 011 03 05 012 03 05 018 03 05 026 03 05 030

BangladeshUniversityofEngineeringandTechnology DepartmentofComputerScienceandEngineering Dhaka,Bangladesh

Submitted By : Section A1 Group 5 Ishtiak Zaman Md. Tanvir Al Amin Ashic Mahtab Sukarna Barua Zinat Wali

DesignandSimulationofaPipelined4 bitComputer

A1:5

TableofContents Introduction Block diagram of Architecture and Control Unit Description of Important Components Addressing Modes Data Types Supported Instruction Set System Function and Instruction Set Functioning Mechanism of the System Instruction Timing and Steps Opcode and Control Matrix Instruction template and opcodes Control Matrix Sheet Implementation of components ICs Used Problems encountered and discussion About Simulation Tools Running a program in simulator Conclusion

3 4 5 13 15 15 20 20 20 27 28 29 31 36 37 46 46 47

DesignandSimulationofaPipelined4 bitComputer

A1:5

1Introduction
Our 4 bit computer system can execute 28 instructions. Each instruction requires only 1.4clockcyclesonaverage.Theentiresystemconsistsofa2stagepipelineinstruction fetch & decode unit and 4 bit execution unit. The execution unit has a shared 4 bit data bus. The memory system is organized as Harvard architecture. That is instruction memory and data memories are separate units. Programs are stored in instruction memory and data in data memory. Data memory also contains the program stack. All instructions are either 1 or 2 bytes long. Some instructions require 1 clock cycle on average and others require 2 clock cycles on average. The computer can also communicatewiththeexternaldevicesthroughtheinputandoutputportregisters.The instruction set contains all types of instructions which generalize the computer to execute any kind of program. However the 8 bit address bus restricts the program length to be within 256 Bytes. The 4 bit arithmetic and logic unit can perform addition andsubtractionandseverallogicaloperations.

DesignandSimulationofaPipelined4 bitComputer

A1:5

2BlockDiagramofArchitectureandControlUnit
Theentireprocessorsystemcanbedividedintotwoindependentsections. Fetchanddecodeunit Executionunit These two sections organized as a two staged pipeline processes instructions independently. While execution unit executes an instruction; at the same time fetch unitfetchesaninstructionintotheinstructionregister. All data values are 4 bit. Although immediate values take out 1 byte of instruction memory (it is because instruction memory is byte addressable), the higher nibble containsallzeros,lowernibblecontainstheactual4bitdata. There are separate memory for instructions and data. Program codes are loaded in instruction memory. All data writes are written to the data memory. Stack memory isalsodatamemory. Theexecutionunithasashared4bitdatabus.Allcomponentsintheexecutionunit share this bus for transferring data. All outputs therefore from execution unit registers that are connected to the shared bus are 3 states so that one component doesnotloadthebuswhileothersareusingit. All the memory address length is 8 bit. However this computer has no explicit addressbus.Becausewedidnotneedtosharethisaddressbus.

DesignandSimulationofaPipelined4 bitComputer

A1:5

Blockdiagramof4bitcomputer

Fig:Blockdiagramof4bitcomputer

DesignandSimulationofaPipelined4 bitComputer

A1:5

DescriptionofImportantComponents 2.1 Fetchanddecodeunit


Thisunitsuppliesinstructionsinprogramordertotheexecutionunit.Itperformsthe followingfunctions: Fetchesinstructionsthatwillbeexecutednext Decodesinstructionintomicroinstructions Generatesmicrocodeforexecutingmicroinstructions

2.1.1 Programcounterregister
The program counter register contains the address of the next instruction to be executed. It is incremented automatically at each clock cycle to contain the next instruction address or operand address. Sometimes it is directly loaded with address values by JMP, CALL, RET, JE and JO instructions. The output of the program counter registerisdirectlyfedtotheinputoftheinstructionmemory.

2.1.2 Instructionmemory
Instructionmemoryisa256X8RAM.Ithas8addresspinsand8outputpins.Soitisbyte addressable. Each byte contains the desired instruction. The read signal of instruction memory is always activated. Since address inputs are directly connected with the outputs of program counter register, hence the memory output is always available without requiring any extra clock cycle. As long as program counter outputs are valid, memory outputs are also valid. There is no need to write the instruction memory. Becauseweusedseparatememoryfordataandinstruction.Datawriteandstackwrites occurs on the data memory. Since program code is read only, hence write signal of instruction memory is deactivated permanently. Only at the start of the simulation programcodeneedstobeloadedintheinstructionRAM.

2.1.3Instructionregister
The instruction register holds the opcode of the instruction that is being executed on theexecutionunit.This registeris8bitslong.Theinstructionregistercontentisdirectly fed to the instruction decoder unit to generate the microoperations for each instruction.

2.1.4 Instructionmemorybufferregister
This register is used to hold the content of the second byte for all 2 byte instructions. The first byte which is the opcode byte is stored in the instruction register. During the first clock cycle the opcode byte is fetched into the instruction register. During the

DesignandSimulationofaPipelined4 bitComputer

A1:5

second clock cycle the second byte is fetched and stored in the instruction memory bufferregister.Theinstructionregisterisnotloadedduringthesecondclockcycle.

2.1.5 MUX1,Datamemoryaddressselector
ThisMUXcontrolstheinputtothedatamemoryaddress.Thepossibleinputscanbe SPvalue,requiredinPUSHinstruction SPvalue+1,requiredinPOPinstruction Instructionmemorybufferregisteroutput,requiredinindirectdatamemory accesswheredataaddressvalueisgivenassecondbyteoftheinstruction

2.1.6 AdderforcomputingPC+1andoutputbuffer
Thisaddercomputesthevalueofcurrentprogramcounter+1.ItisrequiredinCALL instructionwhereweneedtosavetheaddressofnextinstructiontothestack.The3 statebufferinfrontoftheadderforPC+1outputisnecessarysothattheadderoutput doesnotloadthedatamemoryoutputbuswhiledatamemoryisaccessedforother purpose.

2.1.7Bidirectionaltransceiver
Insomeinstructionswerequirethatthedatamemoryoutputistobepassedtothe shareddatabus. Insomeotherinstructionswerequirethatsharedbusoutputistobeloadedtothe datamemoryoutputbus. In other cases the data memory must not load the shared bus while others are usingit

To meet all the above purposes we require a bidirectional 3 state buffer. The bus transceiver does the same. It has a chip select pin. If this pin is in active it does no load the bus on both sides. When chip select pin is active, then depending on another pin (datadirectionpin),ittransfersdatafromanyonesidetotheotherside.

2.1.6Instructiondecoderunit
Instructiondecoderunitdecodeseachinstructionaccordingtotheirtype.ThisisaROM that generates the appropriate microoperation depending on the value in the instruction register. This decoder actually determines the ROM address at which the instruction is to be handled. The ROM address thus found generates the control signals for handling the execution of the instruction. However at the next clock cycle, the control ROM address might be selected from other sources other than instruction decoderoutputasdeterminedbytheinstructiontypeandlengthofclockcyclerequired to execute it. The important thing that it does is that it maps many instructions to the same initial address (all these instructions have same output from the control ROM) of the ROM since all those instructions have the same control signals in their first clock

DesignandSimulationofaPipelined4 bitComputer

A1:5

cycle. It reduces number of rows needed for the control ROM thus reducing the ROM size.

2.1.7 Controlunit
Control unit consists of a 32X32 ROM and a 2 bit flip flop. The ROM generates the control word for each microinstruction and the 2 bit flip flop stores the next address selectorvaluewhichisfedintoaMUXselectorpinstoselectcontrolROMaddress. Thefullblockdiagramofcontrolunitisgivenbelow:

DesignandSimulationofaPipelined4 bitComputer

A1:5

Instruction register

Instruction Decoder 5 bit 5bit Value 31 4 to 1 MUX

Next State Selector

Control ROM 64X32

Control word 30 bit 2 bit

Fig:blockdiagramforcontrolROM Controlstatediagram

DesignandSimulationofaPipelined4 bitComputer

A1:5

2 cycle instruction State 0 State 1

1 cycle instruction Fig:statediagramforcontrolROM Theworkingprincipleofcontrolunitisasfollows: Forallinstructions,atthefirstclockcyclethecontrolROMaddressisselectedfrom instructiondecoderoutput. ThenextaddressselectorinputfromthecontrolROMoutputselectsthe appropriatenextaddressofthecontrolROMthroughtheMUX. Forsome2cycleinstructions,atthesecondclockcyclethecontrolROMaddress inputisselectedfromtheinstructionregisteroutput.Theseinstructionshavesame controlsignalsintheirfirstclockcycle,howeverdifferentcontrolsignalsintheir secondclockcycle. Forsomeother2cycleinstructions,atthesecondclockcyclecontrolROMaddress inputisselectedastheconstantvalue31.Thisisbecausealltheseinstructionhave samecontrolsignalsintheirsecondclockcyclealthoughtheyhavedifferentcontrol signalsintheirfirstclockcycle.

2.2Executionunit
The execution unit can perform arithmetic and logic operations, transfer values between general purpose registers and data memory and input and output port registers.

2.2.1Arithmeticandlogicunit
Thisisa4bitarithmeticandlogicunitthatsupportsthefollowingoperations:

10

DesignandSimulationofaPipelined4 bitComputer

A1:5

The operation of arithmetic logic unit is as determined by the function pins are given below: One operand of the arithmetic logic unit is the accumulatorregister. The other operand is selected by MUX2. The two selector pins S1, S0 of MUX2 selects second operand as follows: S1 S0 Operand 0 0 Bregister 0 1 1 1 0 Datamemoryoutput 1 1 Instructionmemorybufferregister output Immediateoperandsareselectedfrominstructionmemorybufferregisteroutput. Memoryoperandsareselectedfromdatamemoryoutput. DuringNEGinstructionvalue1isselected. DuringallotherarithmeticandlogicinstructionsBregisteristhesecondoperand.

Arithmetic operations addition, subtraction, increment, decrement and transfer operation Logicaloperationslogicalor,xor,and,complimentoperations

2.2.2 Shifterunit
Theshifterunitcanshiftoperands1bitpositioneithertotheleftorright.Theoperation oftheshifterisdeterminedbythetwoselectorpinsS1,S0: S1 S0 Operation 0 0 Transfer 0 1 Shiftright 1 0 Shiftleft 1 1 Transfer0

2.2.3 Registers
Theentireregistersetconsistsof: TwogeneralpurposeregistersAccumulatorregisterandBregister

11

DesignandSimulationofaPipelined4 bitComputer

A1:5

Oneprogramstatusregisterorflagsregister Oneinputportregister Oneoutputportregister

2.2.3.1Functionsofeachregister
Accumulator For any arithmetic or logic operation one operand is the default accumulatorregisterandtheresultalsogoestothisregister. B register This works as a general purpose temporary storage register. We can movevaluesinand out ofthisregistertotheaccumulator.Thisregistercanalsobe selectedasthesecondoperandforsomearithmeticandlogicoperations. Input port register The input port register is used for interfacing with devices outsidethe computer.Thisregisterisusedtoreceiveinputvaluesfrom deviceslike keyboardoranyotherdevice. Output port register the output port register is used to send data to devices externaltothecomputer.Thisdevicecanbeeitheradisplayoranything. FlagsregisterStoresstatusofrecentarithmeticorlogicoperation.

2.2.3.2

Thestatusflagsregister

Thestatusflagsofthe4bitflagsregisterindicatetheresultsofarithmeticandlogic operationssuchasADD,SUB,OR,XOR.Thestatusflagfunctionsare: CF(bit0) CarryflagSetifanarithmeticoperationgeneratesacarryoraborrow outofthemostsignificantbitoftheresult;clearedotherwise.Thisflag indicatesanoverflowfortheunsignedintegerarithmetic.Itisalsoused in shiftoperationsandmultipleprecisionarithmetic. SF(bit1) SignflagSetequaltothemostsignificantbitoftheresultwhichisthe signbitofthesignedinteger. ZF(bit2) ZeroflagSetittheresultiszero;clearedotherwise. VF(bit3) OverflowflagSetiftheintegerresultistoolargeapositivenumberor toosmallanegativenumber(excludingthesignbit)tofitinthe destinationoperand;clearedotherwise.Thisflagindicatesanoverflow forsignedintegerarithmetic. Arithmeticoperationsmodifiesallflags CFandVFarenotchangedduringlogicoperations. CFcanbechangedbyshiftoperationssuchasSHL,SHR

12

DesignandSimulationofaPipelined4 bitComputer

A1:5

Thestatusflagsallowasinglearithmeticorlogicoperationtoproduceresultsfortwo differentdatatypes:unsignedintegersandsignedintegers. Whenperformingmultipleprecisionarithmeticonintegers,theCFflagisusedin conjunctionwithaddwithcarry(ADC)andsubtractwithborrow(SBB)instructionto propagateacarryorborrowfromonecomputationtothenext. Theconditionalbranchinstructions(suchasJE,JO)useoneormoreofthestatusflags asconditioncodesandtestthemforbranching.

13

DesignandSimulationofaPipelined4 bitComputer

A1:5

3 AddressingMode

3.1Instructionlength
Lengthofeachinstructioniseither1byteor2byte. Eachinstructionconsistsofanopcodebyteandoptionallyfollowedbyanoperandbyte. Theopcodebytedeterminesthespecificinstruction.Theoperandbytemaybeanyof thefollowing: animmediatevalue anaddressvalue(usedinimp,callinstructions) anaddressvaluefordatamemory(usedinindirectdataaddressing)

3.2Operandaddressingmodes
Eachinstructioniseitherazerooroneortwooperandinstruction.Someoperandsare specifiedexplicitlyandothersareimplicit.Thedataforasourceoperandcanbelocated in: theinstructionitself(animmediateoperand) aregister adatamemorylocation inputport Whenaninstructionreturnsdatatoadestinationoperanditcanbereturnedto: aregister adatamemorylocation outputport

3.2.1Immediateoperands
Someinstructionusedataencodedintheinstructionitselfasasourceoperand.These operandsarecalledimmediateoperands.Forexamplethefollowinginstruction subtractsimmediatevalue04Hfromtheaccumulatorregister:

14

DesignandSimulationofaPipelined4 bitComputer

A1:5

SUB04H Therearethreeinstructionsavailablenowthatuseimmediateoperands: 1. SUBimm 2. MOVACC,imm 3. XORimm

3.2.2 Registeroperands
Formoveoperationssourceanddestinationoperandcanbeeitheraccumulatorregister or B register or any of the input or output port register. But for arithmetic and logic operations one sourceoperand is the defaultaccumulatorregisterand the resultis also stored in the accumulator register. So no destination operand need to be specified. It is implied in each instruction as the accumulator. The other source operand can be selectedasBregister.

3.2.3Memoryoperands
Some operands can be directly specified by their address value in data memory where the operand is located. The actual operand is fetched from the data memory addressed by the value given in the instruction. For example the following instruction adds the operandvalueatdatamemorylocation04Hwiththeaccumulator: ADD[04H]

15

DesignandSimulationofaPipelined4 bitComputer

A1:5

4 DataTypes
4.1Numericdatatypes
Eachnumericdataisanibblethatisfourbitslong.Onlyintegerdatatypesare supported.Dependingontheinterpretationthiscanbeeither: SignedAllsignedoperandsarerepresentedusingtwoscompliment representation.Theleftmostbit(bit3)isthesignbit.Forpositivenumberthisbitis zero(0).Fornegativenumberthisbitisone(1).Therangeofnumbersincludes8 to+7. UnsignedAll4bitsareusedtorepresentthevalue.Therangeofnumbers includes0to+15.

4.2Pointerdatatypes
Allpointersareaddressestodatamemorylocations.Theyare8bitslong.Hencethe totalnumberoflocationsthatcanbeaddressedis256KBytes.Thetotalsizeofthedata segmentisthereforealso256KBytes.

5SupportedInstructionset

Instructionscanbedividedintothefollowinggroups: Datatransferinstructions Arithmeticinstructions Logicalinstructions Shiftinstructions Controltransferinstructions I/Oinstructions Stackinstructions Miscellaneousinstructions

5.1Datatransferinstructions

16

DesignandSimulationofaPipelined4 bitComputer

A1:5

Thedatatransferinstructionsmovedatabetween: datamemoryandaccumulator immediatevalueandaccumulator betweenaccumulatorandBregisterorinputportregisteroroutputportregister 1. MOVACC,BmovesthecontentsofBregistertotheaccumulatorregister. 2. MOVB,ACCmovesthecontentsofaccumulatortotheBregister. 3. LDA[addr]loadstheaccumulatorwiththevalueatdatamemoryaddress specifiedbyaddr.Theaddrvalueis8bitslongsothat256KBytesofdatamemory locationscanbedirectlyspecifiedbythisvalue. 4. STA[addr]storesthecontentsofaccumulatorregistertothedatamemory locationspecifiedbytheaddrvalue. 5. MOVACC,immmovestheimmediatevaluetotheaccumulatorregister.The immediatevalueisgivenwiththeinstruction.Thenextbyteaftertheinstruction bytecontainsthisimmediatevalue.

5.2 Arithmeticinstructions 1. ADDBaddsthecontentsoftheaccumulatorwiththecontentsoftheBregister. Thedefaultdestinationoftheresultistheaccumulator. 2. ADCBaddsthecontentsoftheaccumulatorwiththeBregisterandcontentsof thecarryflag. 3. SUBBsubtractsthecontentsoftheBregisterfromtheaccumulator. 4. SBBBsubtractsthecontentsoftheBregisterandcontentsofthecarryflagfrom theaccumulator. 5. CMPBsubtractsthecontentsoftheBregisterfromthecontentsofthe
accumulator.Howevertheresultisnotstoredintheaccumulator.Onlytheflagsare affected.

6. NEGnegatesthevalueoftheaccumulator.Theaccumulatorvaluechangestothe twoscomplimentofthevaluepreviouslystored. 7. SUBimmsubtractstheimmediatevaluefromthecontentsoftheaccumulator. 17

DesignandSimulationofaPipelined4 bitComputer

A1:5

8. ADD[addr]addswiththeaccumulatorthevalueatdatamemorylocation addressedbyaddr.
Note:Allflagsareaffectedduringalltheabovearithmeticinstructions.

5.3
1.

Logicalinstructions
ORBperformsabitwiseinclusiveORoperationbetweentheaccumulatorandB registerandstorestheresultintheaccumulator.Eachbitoftheresultissetto0if bothcorrespondingbitsoftheaccumulatorandBregisterare0;otherwiseeachbit issetto1.

2. XORimmperformsabitwiseexclusiveORoperationbetweentheaccumulator andtheimmediatevaluespecifiedbyimm.Eachbitoftheresultissetto0ifboth Correspondingbitsaresame;otherwiseeachbitissetto1. Note:Carryflagandoverflowflagsarenotaffectedduringtheselogicaloperations.The restoftheflagsareaffected.

5.4
1.

Shiftinstructions
SHLshiftsthecontentsoftheaccumulator1bitpositiontotheleft.Themost significantbitissavedinthecarryflag(thusdestroyingthepreviouscontentofthe carryflag).Theleastsignificantbitis0. SHRshiftsthecontentsoftheaccumulator1bitpositiontotheright.The rightmostbit(LSB)isshiftedoutandsavedinthecarryflag.Themostsignificantbit is0.

2.

5.5
1.

Controltransferinstructions
JMPaddrtransfersprogramcontroltoadifferentlocationintheinstruction memoryspecifiedbyaddrwithoutrecordingreturnvalue.Theaddrisan8bitvalue sothatthisinstructioncanjumptoanylocationinthememorywithin256KBytesof instructionmemory.

18

DesignandSimulationofaPipelined4 bitComputer

A1:5

2.

JOaddrtransfersprogramcontroltothespecifiedlocationiftheconditionismet thatisifthecontentsoftheoverflow(V)flagis1.Otherwisenextinstructionafter thisinstructionwillbesequentiallyexecuted. JEaddrtransfersprogramcontroltothespecifiedlocationifthecontentsofthe zeroflagis1.Otherwisenextinstructionafterthisinstructionwillbeexecuted sequentially. CALLaddrtransfersprogramcontrolthelocationspecifiedbyaddr.Butbefore thattheaddressofthenextsequentialinstructionisstoredinthestack. RETloadstheprogramcounterwiththevaluefromthestack.Itfirstpopsthe valuefromthestackandthenstoresitintheprogramcounter.

3.

4. 5.

5.6I/Oinstruction
1. 2. INthisinstructiontransferscontentsoftheinputportregistertotheaccumulator. OUTthisinstructiontransferscontentsoftheaccumulatortotheoutputport register.

Miscellaneousinstructions
1. 2. NOPthisinstructionperformsnooperation. HLTthisinstructionhaltstheexecutionoftheentireprocessor.

19

DesignandSimulationofaPipelined4 bitComputer

A1:5

6 SystemFunctions&InstructionSteps
6.1 Functioningmechanismofthesystem
After power on the computer system starts execution of the program starting at memory location 0000 of the instruction memory. Hence the first instruction of the program must be loaded at this address. After then an unconditional jump instruction canbeusedtorunprogramsstartingatanyotheraddress. The instruction memory must be loaded with the program to be executed. The data memoryshouldbeloadedwithinitialdatavaluesifprogramrequires.Thestackpointer (SP register) is initially loaded with the value FFH at start up. Note that all data values and stack pops fetches values from data memory. Stack pointer grows from higher end (address FF) of the data memory to the lower addresses. After start up contents of any otherregisterareundefined. The entire program must end with the HLT instruction. Otherwise processor will not stopexecutinginstructionsandthusnooutputcanbeobserved.

6.2 Instructiontimingdiagram
The two stage pipeline architecture allows one instruction in executing while another instruction is fetched from the instruction memory. Some instructions are executed in two clock cycles (containing 1 fetch cycle and 1 execution cycle) and others require threeclockcycles(containing2fetchcycleand1executioncycle). Atwocycleinstructionexecutionrequiresthefollowingsteps: Before the first clock cycle occurs, program counter contains the address of the instruction to be executed. This address value is directly connected to the input address of instruction memory. Hence the Opcode value is available at the output oftheinstructionmemory. Clock cycle 1 At the first clock cycle (low to high transition) the Opcode value is loaded in the instruction register. The output of the instruction register is directly fedtotheinputoftheinstructiondecoder.Hencedecodedoutputofthedecoderis available before the next clock cycle occurs. The decoded output is fed to the input of the control ROM which generates the control world before the next clock cycle occurs.

20

DesignandSimulationofaPipelined4 bitComputer

A1:5

Clock cycle 2 At this clock cycle the instruction is executed. Since control word was generated before this clock cycle, hence result of arithmetic or logical operationarealsocomputedbeforethisclockcycleandappearsintheshareddata bus.Onlytheoutputoftheinstructionisloadedatthelowtohightransitionofthis clockcycle.Fetchofnextinstructionalsooccursinthisclockcycle.

Clock cycle 1: Fetches opcode

Clock cycle 2: Executes instruction Fetches next instruction

A three cycle instruction execution (except the control transfer instructions) requires thefollowingsteps: Before the first clock cycle occurs, program counter contains the address of the instruction to be executed. This address value is directly connected to the input address of instruction memory. Hence the Opcode value is available at the output oftheinstructionmemory. Clockcycle1Atthefirstclockcycle(lowtohightransition)theOpcodevalue(first byteoftheinstruction)isloadedintheinstructionregister.Theoutputofthe instructionregisterisdirectlyfedtotheinputoftheinstructiondecoder.Hence decodedoutputofthedecoderisavailablebeforethenextclockcycleoccurs.The decodedoutputisfedtotheinputofthecontrolROMwhichgeneratesthecontrol worldbeforethenextclockcycleoccurs. Clockcycle2Atthesecondclockcycle(lowtohightransition)secondbyteofthe instructionisloadedintheinstructionmemorybufferregister(theinstruction registerremainsunchangedandcontainsthefirstbyteofthisinstructionthatwas loadedatthepreviousclockcycle).Thiscanbeeitheranimmediateoperandor datamemoryaddressoperand. Clockcycle3Atthethirdclockcycletheinstructionisexecuted.Sincecontrol wordwasgeneratedbeforethisclockcycle,henceresultofarithmeticorlogical operationarealsocomputedbeforethisclockcycleandappearsintheshareddata

21

DesignandSimulationofaPipelined4 bitComputer

A1:5

bus.Onlytheoutputoftheinstructionisloadedatthelowtohightransitionofthis clockcycle.Inthisclockcyclefetchofnextinstructionalsooccurs.

Clock cycle 1: Fetches opcode

Clock cycle cycle 2: 2: Clock Fetches Fetches operand operand value

Clock cycle 3: Executes instruction Fetches next instruction

Forcontroltransferinstruction(suchasJMP,CALL)thefollowingstepsoccur: Clockcycle1sameasothers. Clock cycle 2 At this clock cycle, the program counter is loaded with address value selected by MUX3 from either instruction memory ( during JMP, JE, JO, CALL instructions) or data memory ( during RET instruction). In case of CALL instruction, theaddressofthenextsequentialinstruction(currentprogramcountervalue+1)is also stored in the stack (data memory) at this clock cycle. The selector pin S0 determinestheaddressvaluethatisloadedtotheprogramcounterasfollows: S0 Value 0 Instructionmemoryoutput 1 Datamemoryoutput Clockcycle3Atthethirdclockcyclenooperationisperformed.Onlythefetchof thenextinstructionoccurs.Thiscanbeconsideredasapipelinestall.

22

DesignandSimulationofaPipelined4 bitComputer

A1:5

Clock cycle 1: Fetches opcode

Clock cycle 2: Executes instruction

Clock cycle 3: Fetches the Opcode for next instruction

6.2.1Actualtimingdiagramforexampleprogramwithcontrolsignals explicitlyshown:
Considerthefollowingprogramsegment: Address Instruction 00 IN 01 ADDB 02 SUB02 04 PUSH 05 ADDB 06 POP 07 JMP09 The actual timing diagram with control signals are shown below. In each clock pulse (a low to high transition of the clock signal), some results are stored in some registers and newcontrolsignalsaregeneratedaccordingtothefetchedinstruction.Theresultofthe previous control signals that are actually loaded in this clock cycle are shown above the clock cycle line and the control signals that are being generated for new instruction are shown below the clock cycle line. All control signals are named with their numeric interpretation (not the actual binary value that is generated). Only the effective control signalsareshownforeachclockcycle.

23

DesignandSimulationofaPipelined4 bitComputer

A1:5

PC = 01 IR=IN opcode

ACC = InPort PC = 02 IR=ADD B opcode

ACC=ACC+B PC = 03 IR=SUB imm opcode

PC = 04 IR = 02

ACC=ACC-02 PC = 05 IR = PUSH opcode

pc=00

pcOp =incr , LIR

pcOp = incr , InPort.OE , ACC.LD , LIR

pcOp = incr , ALU.op= add , ALU.OE , MUX.S= 00 , ACC.LD , LIR

pcOp = incr , LIR# ,

pcOp = incr , ALU.op= sub , ALU.OE , MUX.S=11 , ACC.LD , LIR

pcOp = incr , MUX1.S = SP, DM.W , SP.Decr , Latch4.CE , Lathc4.R , ACC.OE , LIR

PC =06 IR=ADD B opcode [SP] = ACC SP = SP -1

PC = 07 IR=POP opcode

ACC = [SP+1] SP = SP + 1; PC = 08 IR=JMP opcode

PC = 09 IR=JMP

PC = 09

pcOp = incr , ALU.op= add , ALU.OE , MUX.S= 00 , ACC.LD , LIR

pcOp = incr , MUX1.S = SP+1, DM.R , SP.incr , Latch4.CE , Lathc4.S , ACC.LD , LIR

pcOp = Load, LIR# , MUX3.S=IMO

pcOP=incr LIR

6.2.2Timelinediagramforexampleprogram
Address 00 01 Instruction IN ADDB

24

DesignandSimulationofaPipelined4 bitComputer

A1:5

02 03 05 06 07

MOVB,ACC SUB02 PUSH ADDB POP

C1 C2 1. IN 2. ADD B 3. MOV B,ACC 4. SUB 02 5. PUSH 6. ADD B 7. POP F1 E F1

C3

C4

C5

C6

C7

C8

C9

E F1 E F1 F2 E F1 E F1 E F1 E

F1:Fetchopcode F2:Fetchoperand E:Execution C:Clockcycle

Following is another program that has several pipeline stalls due to JMP, CALL and RET instructions.

Address 00 01 02 08 04 10 Instruction IN ADDB CALL08 RET JMP0A ADDB

25

DesignandSimulationofaPipelined4 bitComputer

A1:5

1. IN 2. ADD B

C1 C2
F1 E F1

C3

C4

C5

C6

C7

C8

C9

E F1 E1 S F1 E1 S F1 E1 S F1 E

3. CALL 08 4. RET 5. JMP 0A 6. ADD B

F1:Fetchopcode F2:Fetchoperand E:Execution E1:Executionandloadpc S:Stallofexecutionpipeline C:Clockcycle

6.3 AverageCPI
For2cycleinstructions,duetopipelinearchitectureclockcycle2(ofcurrentinstruction executing)andclockcycle1(ofnextinstruction)areperformedparallelinonly1clock cycle.For3cycleinstructionsclockcycle3(ofcurrentinstructionexecuting)andclock cycle1(ofnextinstruction)areperformedparallelinonly1clockcycleThisreducesthe averageclockcyclerequiresforeachinstructionexecuted.Thefollowingtableshows therequirednumberofclockcyclesforeachinstruction(consideringpipelined architecture): Instruction Clockcycles ADDB 1 ADCB 1 SUBB 1

26

DesignandSimulationofaPipelined4 bitComputer

A1:5

SBBB CMPB MOVACC,B MOVB,ACC NEG ORB IN OUT SHL SHR LDA[addr] STA[addr] MOVACC,imm SUBimm XORimm ADD[addr] JMPaddr JOaddr JEaddr CALLaddr RET PUSH POP NOP HLT AverageCPI=(11*2+17)/28 =1.4

1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1

27

DesignandSimulationofaPipelined4 bitComputer

A1:5

7 OpcodeandControlMatrix
7.1Opcodetableforallinstructions
Instruction Length (inByte) 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 Opcode(inbinary) Byte1 0000 0001 0000 0010 0000 0011 0000 0100 0000 0101 0000 0110 0000 0111 0000 1000 0000 1001 0000 1010 0000 1011 0000 1100 0000 1101 0000 1110 0000 1111 0001 0000 0001 Byte2 addr addr imm imm Opcode(inhex) Byte1 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 Byte2 addr addr imm imm

ADDB ADCB SUBB SBBB CMPB MOVACC,B MOVB,ACC NEG ORB IN OUT SHL SHR LDA[addr] STA[addr] MOVACC,imm SUBimm

28

DesignandSimulationofaPipelined4 bitComputer

A1:5

XORimm ADD[addr] JMPaddr JOaddr JEaddr CALLaddr RET PUSH POP NOP HLT

2 2 2 2 2 2 1 1 1 1 1

0001 0001 0010 0001 0011 0001 0100 0001 0101 0001 0110 0001 0111 0001 1000 0001 1001 0001 1010 0001 1011 0001 1100

imm addr addr addr addr addr

12 13 14 15 16 17 18 19 1A 1B 1C

imm addr addr addr addr addr

7.3ControlMatrixSheet Address(in Value(in


decimal)

0000 0001 0002 0003 0004 0005 0006 0007 0008

hex) 00 01 02 03 04 05 06 07 48

29

DesignandSimulationofaPipelined4 bitComputer

A1:5

0009 0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 0020 0021 0022 0023 0024 0025 0026 0027 0028 0029 0030 0031

09 0A 0B 0C 0D 1E 1E 1E 1E 1E 1E 14 15 16 17 18 19 1A 1B 80 00 00 00

7.4.1 PindesignationsofthecontrolROM
ROM#
1 2 3 4

Bit7
PCOp. S1 Latch3. OE MUX2. S1 ALU. S2

Bit6
PCOp. S0 Latch3. S/R# MUX2. S0 ALU. S1

Bit5
MUX3. S ACC. OE B. LD# ALU. S0

Bit4
LIR ACC. LD# B. OE ALU. M

Bit3
MUX1. S1 DM. R/W# InPort. OE# CarrySel

Bit2
MUX1. S0 Latch1. OE# OutPort. LD# SHOp. S1

Bit1
SP. U/D# Latch2. OE NS. S1 SHOp. S0

Bit0
SP. CEN# Latch4. OE NS.S0 Flags. LD#

30

DesignandSimulationofaPipelined4 bitComputer

A1:5

PCOp.S1:S0thesetwobitsselectsthepcoperationatthenextclockcycle accordingtothefollowingtable: S1 0 0 1 1 S0 0 1 0 1 Operation IncrementPc LoadPc LoadPcifV=1 LoadPcifZ=1

MUX3.Sthispinselectsthevaluetobeloadedintotheprogramcounter.If1 loadspcwiththevaluefrominstructionmemoryoutput;otherwiseloadspcfrom datamemoryoutput. LIRLoadinstructionregister.If1loadstheinstructionregisterwiththevalueof instructionmemoryatthenextclockcycle. MUX1.S1:S0thesetwobitselectstheaddressthatwillbetheinputtothedata memoryaccordingtothefollowingtable: S1 S0 Addressinputtodatamemory 0 0 Instructionmemorybufferregister 0 1 SP 1 0 SP+1 1 1 Unused SP.U/D#if1incrementsthestackpointerregisteratthenextclockcycle; otherwisedecrements.NotethatthispinisactiveonlywhenSP.CEN=1. SP.CENcountenablesignalforSP.ThispinenablesSP.U/D#signal. Latch3.OEthispinenablesthebustransceiver. Latch3.S/R#if1thebustransceiversendsdata(fromdatamemorytothedata bus);otherwisereceivesdata(fromdatabustodatamemory). ACC.OEoutputenablesignalfortheaccumulatorregister.Ifactivethe accumulatorloadsthedatabuswithitscontents. ACC.LD#loadsignalfortheaccumulatorregister.Ifactiveaccumulatorwillbe loadedwiththevaluefromdatabusatthenextclockcycle. DM.R/W#datamemoryread/write#signal.If1datamemoryisselectedforread operation;otherwisedatamemoryisselectedforwriteoperation. Latch1.OE#thispinenablestheoutputbufferforPC+1outputlatch.Thisloads theoutputofthedatamemoryoutputbus. Latch2.OEthispinenablesoutputbufferforinstructionmemorybufferregister. Thisloadsthedatabuswiththevalueofthisregister. Latch4.OEoutputenablesignalforarithmeticlogicandshifterunit. MUX2.S1:S0thesetwobitsselectsthesecondoperandforthearithmeticand logicoperationaccordingtothefollowingtable:

31

DesignandSimulationofaPipelined4 bitComputer

A1:5

S1 0 0 1 1 S0 0 1 0 1 Operand Bregister 1 Datamemoryoutput Instructionmemorybufferregister output

B.LD#loadsignalforBregister. B.OEoutputenablesignalforBregister. InPort.OE#outputenablesignalforinputportregister OutPort.LD#loadsignalforoutputportregister. NS.S1:S0selectsnextROMinput ALU.S2:S0thesethreebitsselectstheappropriateALUoperationtobeperformed. ALU.MselectsthemodeofALUoperation.If1theoperationislogic;otherwise operationwillbearithmetic CarrySelthisbitdeterminesthecarryinsignalforALUalongwithsomeotherbits. SHOp.S1:S0selectstheshifteroperationaccordingtothefollowingtable: S1 S0 Operation 0 0 Transfer 0 1 Shiftright 1 0 Shiftleft 1 1 Transfer0 Flags.LD#loadsignalforflagsregister.

7.4.1 ContentsofthecontrolROM
address ROM1 (inhex) 0000 11 0001 11 0002 11 0003 11 0004 11 0005 11 ROM2 (inhex) 9C 8D 8D 8D 8D 9C ROM3 (inhex) 2C 2C 2C 2C 2C 2C ROM4 (inhex) 00 A8 A0 40 48 40

32

DesignandSimulationofaPipelined4 bitComputer

A1:5

0006 0007 0008 0009 0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 0020 0021 0022 0023 0024 0025 0026 0027 0028 0029 0030 0031

11 11 11 11 11 11 11 11 11 11 11 11 11 11 41 81 C1 44 6A 15 1A 00 00 00 01 11

8C BC BD BD 8C BC 8D 8D 4C 34 8E 8D 8D 8D 9C 9C 9C 90 9C 34 4C 00 00 00 9C 9C

3C 0C 6C 2C 24 28 2C 2C 2C 2C 2C EC EC AC 2E 2E 2E 2E 2E 2C 2C 00 00 00 2D 2C

01 01 A8 D0 01 01 0C 0A 01 01 01 40 50 A8 01 01 01 01 01 01 01 00 00 00 01 01

33

DesignandSimulationofaPipelined4 bitComputer

A1:5

8Implementationofcomponents
ProgramcounterandStackpointerregister
Programcounterandstackpointerarebothimplementedbycounters.Weusedhere4 bitbidirectionalcounterICno74LS169.Ithasthefollowingfeaturesthatmadeit specialtoserveourpurpose: Upanddowncapability. Parallelloadcapability. Twoindependentcountenablesignals Onefullcountsignalforcascading Sincewerequired8bitcounterforbothprogramcounterandstackpointerwe cascadedtwo4bitcounterstomake8bitcounter.TheTC(pin15)outputpinofthelow nibblecounterisconnectedtothecountenablesignalsofthehighnibblecounter.

8.1.2Accumulatorregister
Itisa4bitflipflopIC74LS173.OneoutputofthisregisterispassedtotheALUoperand throughaMUX.AnotherisfedintotheIC74LS126.Itisa3statebuffer.Theoutputof thisbufferisconnectedtotheshareddatabus.Theoutputenablesignalofthisbuffer connectsanddisconnectstheaccumulatorregisteroutputfromthesharedbuffer.Itis requiredsothatitdoesnotloadthebuswhenanotherregisterisloadingthebuswith itsoutput.

8.1.3Bregister
Itissameasaccumulatorregisterexceptthatoneoutputofthisregisterisfedintothe MUX2.

8.1.4 Arithmeticlogicunit

34

DesignandSimulationofaPipelined4 bitComputer

A1:5

ICno74LS181isa4bitarithmeticlogicunit.Theinputandoutputcanbeconsideredas eitheractiveloworactivehigh.Howeverdependingontheactiveloworactivehigh interpretationofinputsandoutputsthecarryinsignalisjusttheoppositeofwhatwe considerittobe.Forexampleifwewanttoperformaddwithcarryoperationandwe consideractivehighinputsandoutputs,thencarryinsignalmustbeactivelow.The4 functionselectorpinsenablesvariousarithmeticandlogicoperations.Onemodeselect pinisavailabletoselectbetweenarithmeticandlogicoperations.

8.1.5 ShifterUnit
Theshifterunitisimplementedbytwo74LS153dual4to1MUX. Thisisa combinationalshifterandnoclockcycleisrequiredtoperformshifting.

8.1.6InputPortRegisterandOutputPortRegister
Theseareboth4bitflipflopIC74LS173.Theinputenablesignalandoutputenable signalareusedforinputandoutputenable.

8.1.7 RAM
Forsimulationofthecircuit,weusedcircuitmakersoftware.Incircuitmakersoftwarea genericRAM1Kisavailable.Ithas10addresspinsand8outputpins.Weuseditinour simulationforbothinstructionmemoryanddatamemory.Sincewerequired8address bits,thehighest2bitswereleftatzerovoltage(ground).

8.1.8PROM
Incircuitmakeronly1ROMwasavailable.Thatis32X8bitPROM.Ithas5addresspins and8outputpins.Weneeded32X32PROM.Sowecascaded4PROMs.Addresspinsof these4PROMsareconnectedtogethertogenerate32bitoutputsimultaneously.

8.1.8 Bidirectionalbuffer
Octal3statebustransceiverIC74LS245wasusedforconnectingtheoutputofthedata memorytothesharedbus.Forsomeinstructionswehavetosenddatafromthedata memorytothesharedbus.Forsomeotherinstructionwehavetoreceivedatafromthe

35

DesignandSimulationofaPipelined4 bitComputer

A1:5

sharedbustothedatamemory(duringmemorywriteoperations).Hencebidirectional bufferwasnecessary.TheS/R#pin(pin1)ofthisICcontrolsthedirectionofdata transferthroughthetransceiver.Ifchipselectpinisinactivethenthedevicedisconnects thememoryoutputfromthedatabussothatitdoesnotloadthedatabuswhileothers areusingthesharedbus.

8.2NameandnumberofICsused
IC# 74LS173 74LS181 74LS126 74LS153 74LS169A 74LS273 74LS157 74LS245 74LS83A 74LS373 74LS10 74LS04 74LS25 74LS27 74LS138 74LS08 74LS32 RAM PROM Name Quad3stateDtype flipflop 4bitarithmeticlogic unit Quad3statebuffers 4to1multiplexer 4bitbidirectional counter OctalDflipflop Quad2to1 multiplexer Octal3state transceiver 4bitbinaryfulladder Octal3statelatch 3inputNAND Hexinverter Dual4inputNOR gate 3inputNOR 1of8decoder Quad2inputAND Quad2inputOR GenericRAM1KX8bit GenericPROM 32X8bit Pcs 9 1 5 14 4 1 5 1 4 1 1 3 1 1 1 1 1 2 5

36

DesignandSimulationofaPipelined4 bitComputer

A1:5

9Problemsencounteredanddiscussion

Themainobjectiveofour4bitcomputerdesignwastoreducenumberofaverage clockcyclesforeachinstruction.Toachievethispurposewehadtoaddadditional hardwareinourdesign.OurinitialdesignwasbasedonVonNeumannarchitecture. SincetheinstructionsetthatwassuppliedtoustoimplementwasaslikeasIntels instructionset,wefollowedtheirarchitecture.Soweusedsamememoryfor instructionanddatainourinitialdesign.However,thisrequiredsomegreater numberofclockcyclesforsomeinstructions.Itisbecausetheclockcycle,atwhich wewerefetchingdatafrommemoryforaninstruction,wewereunabletofetch Opcodefornextinstructionatthesameclockcycle.Thisstalledthefetchpipeline. Laterweseparatedourdatamemoryfrominstructionmemory.Thisreduced numberofaverageclockcyclesforthoseinstructionstobenearlyequaltoone. Writingtodatamemorycausedusaseriousprobleminourdesign.Weneededthat theaddressvalueatwhichnewdatawillbewrittentomemorymustbestable beforethewritesignalisgenerated.Itisbecauseifthewritesignalisgenerated beforetheaddressvalueisstable;itmighthappenthatdatawillbewrittento unknownaddressvalues.Soatfirstwekeptthreeclockcyclesforeachmemory writeoperation.Thefirstclockcycleallowedtheaddressvaluetobestable,the actualwriteoccurredinthesecondclockcycleandthethirdclockcyclewasforde activatingthewritesignalbeforechangingtheaddressvalue.Howeverthis introducedtwoadditionalclockcyclesforeveryinstructionthatinvolvedmemory writeoperationsuchasPUSH,STA.Howeverlaterwesolvedthisproblemusingthe clocksignalforwriting.Theaddressvaluewasgeneratedatthelowtohigh transitionoftheclockcycle.Sowedecodedthewritesignalinthefollowingway: o whenclockishighwritesignalisinhibited o whenclockislowwritesignalisactived Thisensuredhalftheclockcycletime(afterlowtohightransitionandbefore hightolowtransitionoftheclock)fortheaddressvaluestobestablebefore writesignalbeginsactive.Soourproblemwassolvedeasilybyusingonly somegates. Wehadtoconsidereachinstructionseparatelytodesignthecomputerratherthan thinkinggenerally.Becauseweweresupposedtoimplementonlythoseinstructions atlowestclockcycles.Sowehadtheoptiontothinkeachinstructionseparatelyto

37

DesignandSimulationofaPipelined4 bitComputer

A1:5

reduceclockcyclebyaddinghardwarecomponents.Sincewehadnottoimplement theactualdesignwewerenotafraidoftoomuchchipsandwires.

Someflipflopshavenomasterresetpin.Thiscausedabigproblemforusafter designingthewholesystem.Becauseatthestartofeachnewsimulationtheflip flopvalueswereunknown.Moreoverunnecessaryvalueswerebeingwrittenin datamemory.Sowewantedthatwewouldzeroourallflipflopsatthestartof eachsimulationbyonecontrolsignal.Buttheunavailabilityofmasterresetpinof someflipflopscompelleduslatertochangethose.AtlastweusedIC74LS273 becauseithadamasterresetpin. WekepttwoseparatememoriesforinstructionanddatalikeMIPSarchitecture. HoweverMIPSarchitectureconsidersinstructiontobeoffixedsized.Our instructionsizewasnotfixed.Someinstructionswereof1byteandothersareof2 bytes.Thisisdissimilaritybetweenthearchitectureandinstructionset. Atfirstwehadtoomanycontrolsignals(nearly40)and6inputlinesforthecontrol ROM.ThisrequiredthatthecontrolROMshouldbeofsize64X40.Circuitmaker softwaresupportsonlyoneROMthatisofsize32X8.Wehadtocascade10ROMs togeneratetheallcontrolsignals.Howeverlaterwereducedtheinputlinesfrom6 to5byconsideringthefollowingfacts:

o firstexecutioncycleofsomeinstructionswasdoingthesamething o secondcycleofsomeotherinstructionswasdoingthesamething
WeusedaninstructiondecoderROM.Foreachinstructionthisgeneratesthe addressatwhichtheinstructionwillbeexecuted.Inthiswaywemappedseveral instructionstothesameaddressofthecontrolROM,sincethoserequiredthe samecontrolsignals.Laterwewasabletoreducetototalnumberofcontrol signalsto32.Werequiredonly5ROMstoimplementthecontrolcircuit. NEGinstructionrequiredtocomputetwoscomplementoftheaccumulator.Since4 bitALUwhichweusedinourdesigndoesnotsupportthisfunction,wefellina troubletoperformitinoneclockcycle.Wecouldeasilycomputeitintwoclock cycles.Inthefirstcyclewewouldcomputetheonescompliment(ALUsupportsthis function).Inthesecondclockcyclewewouldincrementtheaccumulator.However sinceallotherinstructionsrequiredatmosttwoclockcyclestwoexecute, implementingNEGinthiswaywouldrequireNEGthreeclockcycles.Sotoreduceit laterweaddedaMUXandinvertergatesinfrontofALUinputAfromthe accumulatortogeneratetheonescomplimentoftheaccumulatorandpassingit directlytotheALUinputwithoutusingtheALU.ThisallowedtheNEGtobe computedinonlyoneclockcycle.

38

DesignandSimulationofaPipelined4 bitComputer

A1:5

Atfirstrunningaprograminthesimulator(circuitmaker)werequiredtoloadthe hexvalueoftheprograminstructionsininstructionmemory.Howeverconverting eachinstructionmnemonictotheirhexvaluewasdisgusting.WewroteaC programassembler.c.Thisprogramtakesasinputfileandgeneratesanoutputfile convertingeachmnemonicinstructionstotheirhexvalues.Theprogramisgiven below:


#include<stdio.h> #include<string.h> #include<stdlib.h> intmain(intargc,char*argv[]) { if(argc!=3) { printf("Toomanyarguments...\n"); exit(0); } FILE*in,*out; in=fopen(argv[1],"r"); out=fopen(argv[2],"w"); if(in==NULL|out==NULL) { printf("Oneormorefilescouldnotbeopened..\n"); exit(1); } charstr[10]; intstate; state=0; intaddress=0; fprintf(out,"addr:value\n"); fprintf(out,"\n"); while(1) { if(feof(in)) { fclose(in); fclose(out);

39

DesignandSimulationofaPipelined4 bitComputer

A1:5

break;

fscanf(in,"%s",str); if(strcmp(str,"ADD")==0) { fscanf(in,"%s",str); if(strcmp(str,"B")==0) { fprintf(out,"%04d:%02X\n",address,1); address++; } else { fprintf(out,"%04d:%02X\n",address,19); address++; fprintf(out,"%04d:%s\n",address,str); address++; } } elseif(strcmp(str,"ADC")==0) { fscanf(in,"%s",str); if(strcmp(str,"B")==0) { fprintf(out,"%04d:%02X\n",address,2); address++; } else { printf("InvalidoperandforADC..\n"); } } elseif(strcmp(str,"SUB")==0) { fscanf(in,"%s",str); if(strcmp(str,"B")==0) { fprintf(out,"%04d:%02X\n",address,3); address++; } else { fprintf(out,"%04d:%02X\n",address,17); address++; fprintf(out,"%04d:%s\n",address,str); address++;

40

DesignandSimulationofaPipelined4 bitComputer

A1:5

} }elseif(strcmp(str,"SBB")==0) { fscanf(in,"%s",str); if(strcmp(str,"B")==0) { fprintf(out,"%04d:%02X\n",address,4); address++; } else { printf("InvalidoperandforSUB..\n"); } } elseif(strcmp(str,"CMP")==0) { fscanf(in,"%s",str); if(strcmp(str,"B")==0) { fprintf(out,"%04d:%02X\n",address,5); address++; } else { printf("InvalidoperandforCMP..\n"); } } elseif(strcmp(str,"MOV")==0) { fscanf(in,"%[]",str);

fscanf(in,"%[AZ]",str);

if(strcmp(str,"ACC")==0) { fscanf(in,"%[,]",str);//discand,andspacesifany fscanf(in,"%s",str);

if(strcmp(str,"B")==0) { fprintf(out,"%04d:%02X\n",address,6);

41

DesignandSimulationofaPipelined4 bitComputer

A1:5

} else {

address++;

fprintf(out,"%04d:%02X\n",address,16); address++;

fprintf(out,"%04d:%s\n",address,str); address++; } } elseif(strcmp(str,"B")==0) { fscanf(in,"%[,]",str); fscanf(in,"%s",str); } if(strcmp(str,"ACC")==0) { fprintf(out,"%04d:%02X\n",address,7); address++; } else { printf("InvalidoperandforMOVB,ACC"); }

}elseif(strcmp(str,"NEG")==0) { fprintf(out,"%04d:%02X\n",address,8); address++; } elseif(strcmp(str,"OR")==0) { fscanf(in,"%s",str); if(strcmp(str,"B")==0) { fprintf(out,"%04d:%02X\n",address,9); address++; } else { printf("InvalidoperandforOR..\n"); } }elseif(strcmp(str,"IN")==0)

42

DesignandSimulationofaPipelined4 bitComputer

A1:5

{ fprintf(out,"%04d:%02X\n",address,10); address++; }elseif(strcmp(str,"OUT")==0) { fprintf(out,"%04d:%02X\n",address,11); address++; }elseif(strcmp(str,"SHL")==0) { fprintf(out,"%04d:%02X\n",address,12); address++; }elseif(strcmp(str,"SHR")==0) { fprintf(out,"%04d:%02X\n",address,13); address++; }elseif(strcmp(str,"LDA")==0) { fprintf(out,"%04d:%02X\n",address,14); address++; fscanf(in,"%s",str); fprintf(out,"%04d:%s\n",address,str); address++; } elseif(strcmp(str,"STA")==0) { fprintf(out,"%04d:%02X\n",address,15); address++; fscanf(in,"%s",str); fprintf(out,"%04d:%s\n",address,str); address++; }elseif(strcmp(str,"XOR")==0) { fprintf(out,"%04d:%02X\n",address,18); address++; fscanf(in,"%s",str); fprintf(out,"%04d:%s\n",address,str); address++;

43

DesignandSimulationofaPipelined4 bitComputer

A1:5

}elseif(strcmp(str,"JMP")==0) { fprintf(out,"%04d:%02X\n",address,20); address++; fscanf(in,"%s",str); fprintf(out,"%04d:%s\n",address,str); address++; }elseif(strcmp(str,"JO")==0) { fprintf(out,"%04d:%02X\n",address,21); address++; fscanf(in,"%s",str); fprintf(out,"%04d:%s\n",address,str); address++; }elseif(strcmp(str,"JE")==0) { fprintf(out,"%04d:%02X\n",address,22); address++; fscanf(in,"%s",str); fprintf(out,"%04d:%s\n",address,str); address++; }elseif(strcmp(str,"CALL")==0) { fprintf(out,"%04d:%02X\n",address,23); address++; fscanf(in,"%s",str); fprintf(out,"%04d:%s\n",address,str); address++; }elseif(strcmp(str,"RET")==0) { fprintf(out,"%04d:%02X\n",address,24); address++; }elseif(strcmp(str,"PUSH")==0) { fprintf(out,"%04d:%02X\n",address,25); address++;

44

DesignandSimulationofaPipelined4 bitComputer

A1:5

}elseif(strcmp(str,"POP")==0) { fprintf(out,"%04d:%02X\n",address,26); address++; }elseif(strcmp(str,"NOP")==0) { fprintf(out,"%04d:%02X\n",address,27); address++; }elseif(strcmp(str,"HLT")==0) { fprintf(out,"%04d:%02X\n",address,28); address++; } else { printf("Opcodemismatch..\n"); }

return0;

Forexampleassumethefollowingprograminstructionsarewrittenininput.txtfile:

Howtoruntheassemblerprogram: Firstcompiletheprogramandbuildtheprogramtogeneatethebinaryfile assembler.exe Fromthecommandlinewrite:assembler.exeinput.txtoutput.asm

45

DesignandSimulationofaPipelined4 bitComputer

A1:5

Theprogramgeneratesthefollowingoutput.asmfile:

ForeachaddressvalueintheRAM,itshowsthehexvaluethatistobewritten.

46

DesignandSimulationofaPipelined4 bitComputer

A1:5

10Aboutthesimulationtool
WeusedCircuitmaker2000forsimulatingourprocessor.

11RunningaprograminSimulator
Torunaprograminthesimulationfollowingstepsaretobetaken. >>Writetheprograminassembly >>Usingtheassemblertoolprovided,makethemachinecodeofit. ItwillgiveAddress:Codeinhexformat >>Beforestartingthesimulation,doubleclickonInstructionMemory, andwritethecodesineachaddressascreatedbytheassembler.

47

DesignandSimulationofaPipelined4 bitComputer

A1:5

12Conclusion
We enjoyed designing this 4 bit computer. The design was revised in severalphases.Atfirstwedecidednottouseseparatedataandinstruction memory (use Von Neumann architecture). Then we found that we could improve the performance greatly by using Harvard architecture. Thus we ended up with a design, which could execute most of the instructions in onecycleandrestintwo.Thesimulationpartwasmostexciting.Becauseit letuscheckourowndesigninaction.

48

You might also like