Unit 1: Introduction to Structured Computer Organization
• Languages, Levels and Virtual Machines; Contemporary Multilevel Machines; Evolution of Multilevel Machines; The Zeroth Generation; The First Generation; The Second Generation; The Third Generation; The Fourth Generation; The Fifth Generation.

Unit 2: Computer Systems Organizations
• Processors: CPU Organization, Instruction Execution, RISC versus CISC, Design Principles of Modern Computers, Instruction-Level and Processor-Level Parallelism. Types of Memory: Primary Memory, Secondary Memory. Input/Output: Buses, DMA.

Unit 3: Combinational Logic Design
• Design of Fast Adders: Carry-Lookahead Adder, Ripple-Carry Adder; Fast Multiplication: Booth's Array Multiplier, Bit-Pair Recoding of Multipliers, Carry-Save Addition of Summands; Integer Division.

Unit 4: Microprocessor Architecture & Programming
• 8085 Microprocessor Architecture, The Microprocessor-Based Personal Computer System, Internal Architecture, Instruction Execution, Classification of the Instruction Set, Memory Addressing, Microcontrollers, A Single-Chip Microcontroller, Microprocessor vs. Microcontroller, 8/16-Bit Microcontrollers. Case Study: Learning Model of the 8051 Microcontroller.

Unit 5: Pentium Microprocessors
• Pentium Microprocessors, The Memory System, Pentium Registers, Pentium Memory Management, The Pentium Pro Microprocessor, Internal Structure of the Pentium Pro, The Memory System, The Pentium 4 and Core 2, Pentium i3/i5/i7.

Unit 6: Parallel Computer Architectures
• On-Chip Parallelism: Instruction-Level Parallelism, On-Chip Multithreading, Single-Chip Multiprocessor; Coprocessors: Network Processor, Media Processor, Cryptoprocessors; Grid Computing.

Unit 1: Introduction to Structured Computer Organization

1.1 Languages, Levels, Virtual Machines

There is a large gap between what is convenient for people and what is convenient for computers. This problem can be attacked in two ways; both involve designing a new set of instructions that is more convenient for people to use than the set of built-in machine instructions. Taken together, these new instructions form a language, which we will call L1, just as the built-in machine instructions form a language, which we will call L0. The two approaches differ in the way programs written in L1 are executed by the computer, which, after all, can only execute programs written in its machine language, L0.

One method of executing a program written in L1 is first to replace each instruction in it by an equivalent sequence of instructions in L0. The resulting program consists entirely of L0 instructions. The computer then executes the new L0 program instead of the old L1 program. This technique is called translation.

The other technique is to write a program in L0 that takes programs in L1 as input data and carries them out by examining each instruction in turn and executing the equivalent sequence of L0 instructions directly. This technique does not require first generating a new program in L0. It is called interpretation, and the program that carries it out is called an interpreter.
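The difference between the two techniques can be sketched in a few lines of C. The toy L1 instruction used here (increment a memory word) and the invented L0 opcodes are assumptions made purely for illustration; they do not correspond to any real machine discussed in this text.

/* Translation vs. interpretation of a toy L1 instruction "INC addr".
 * The L0 opcode names and the encoding are invented for illustration. */
#include <stdio.h>

enum { L0_LOAD, L0_ADD1, L0_STORE };          /* a made-up machine language L0 */

int memory[16];

/* Translation: rewrite the L1 instruction as an equivalent L0 sequence,
 * producing a new program to be executed later by the L0 machine. */
int translate_inc(int addr, int l0_prog[][2]) {
    l0_prog[0][0] = L0_LOAD;  l0_prog[0][1] = addr;
    l0_prog[1][0] = L0_ADD1;  l0_prog[1][1] = 0;
    l0_prog[2][0] = L0_STORE; l0_prog[2][1] = addr;
    return 3;                                 /* number of L0 instructions produced */
}

/* Interpretation: examine the L1 instruction and carry it out immediately. */
void interpret_inc(int addr) {
    memory[addr] = memory[addr] + 1;
}

int main(void) {
    memory[5] = 41;
    interpret_inc(5);                         /* executed directly: memory[5] becomes 42 */

    int prog[3][2];
    int n = translate_inc(5, prog);           /* produces a 3-instruction L0 program */
    printf("memory[5] = %d, translated into %d L0 instructions\n", memory[5], n);
    return 0;
}

The translator produces a new program to be run later; the interpreter produces no program at all, it simply performs the work itself.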
• Multilevel Machines: Rather than thinking in terms of translation or interpretation, it is simpler to imagine a hypothetical computer or virtual machine whose machine language is L1. Let us call this virtual machine M1 (and the virtual machine corresponding to L0, M0). People could then simply write their programs in L1 and have the computer execute them directly. Even if the virtual machine whose language is L1 is too expensive or complicated to construct out of electronic circuits, people can still write programs for it. These programs can either be interpreted or translated by a program written in L0.

Figure 1.1: A multilevel machine. Programs in Ln are either interpreted by an interpreter running on a lower-level virtual machine or translated to the machine language of a lower level; programs in L0 can be executed directly by the electronic circuits of the actual computer M0.

• To perform translation or interpretation, the languages L0 and L1 must not be too different. This constraint means that L1, although better than L0, is still far from ideal for most applications. This result may be disappointing in light of the original purpose of creating L1.

• We therefore invent another set of instructions that is more people-oriented and less machine-oriented than L1. This third set also forms a language, which we will call L2, with virtual machine M2. People can write programs in L2; these programs can either be translated to L1 or executed by an interpreter written in L1.

• In the whole series of languages, each one is more convenient than its predecessors, and the invention of new languages continues until a suitable one is found. Each language uses its predecessor as a basis, so we may view a computer using this technique as a series of layers or levels, one on top of another, as shown in Figure 1.1. The bottommost language or level is the simplest, and the topmost language or level is the most sophisticated.

In this sense, a computer with n levels can be regarded as n different virtual machines, each with a different machine language. The electronic circuits can directly carry out only programs written in language L0, without translation or interpretation. Programs written in L1, L2, ..., Ln must either be interpreted by an interpreter running on a lower level or translated to another language corresponding to a lower level.

1.2 Contemporary Multilevel Machines

Six levels exist in contemporary multilevel machines, as shown in Figure 1.4. Level 0, at the bottom, is the machine's true hardware, and its circuits carry out the machine-language programs of level 1.

Figure 1.4: A six-level computer. Level 5: problem-oriented language level (translation by compiler); Level 4: assembly language level (translation by assembler); Level 3: operating system machine level (partial interpretation by the operating system); Level 2: instruction set architecture level (interpretation by microprogram or direct execution); Level 1: microarchitecture level (hardware); Level 0: digital logic level.

1. Digital logic level: At this level the interesting objects are gates, which are built from analog components such as transistors but compute digital functions. Each gate has one or more digital inputs and computes some simple function of these inputs, such as AND or OR. Each gate is built up from at most a handful of transistors. A small number of gates can be combined to form a 1-bit memory, which can store a 0 or a 1. The 1-bit memories can be combined in groups of 16, 32, or 64 to form registers. Each register can hold a single binary number up to some maximum. Gates can also be combined to form the main computing engine itself.
2. Microarchitecture level: At this level we see a collection of 8 to 32 registers that form a local memory and a circuit called an ALU (Arithmetic Logic Unit), which can perform simple arithmetic operations. The registers are connected to the ALU to form a data path, over which the data flow. The basic operation of the data path consists of selecting one or two registers, having the ALU operate on them, and storing the result back in some register. On some machines the operation of the data path is controlled by a program called a microprogram; on other machines the data path is controlled directly by hardware. On machines with software control of the data path, the microprogram is an interpreter for the instructions at level 2. It fetches, examines, and executes instructions one by one, using the data path to do so.

3. Instruction Set Architecture (ISA) level: This level is defined by the machine's instruction set, the set of instructions carried out interpretively by the microprogram or directly by the hardware execution circuits. If a computer manufacturer provides two interpreters for one of its machines, interpreting two different ISA levels, it will need to provide two machine-language reference manuals, one for each interpreter.

4. Operating system machine level: This level is usually a hybrid level; most of the instructions in its language are also at the ISA level. There are also new instructions, a different memory organization, the ability to run two or more programs concurrently, and various other features. The operating system interprets some of the level 3 instructions, and others are interpreted directly by the microprogram.

5. Assembly language level: This level allows people to write programs for levels 1, 2, and 3 in a form that is not as unpleasant as the virtual machine languages themselves. Programs in assembly language are first translated to level 1, 2, or 3 language and then interpreted by the appropriate virtual or actual machine. The program that performs the translation is called an assembler.

6. Problem-oriented language level: This level consists of languages designed to be used by applications programmers to solve problems. Such languages are often called high-level languages, and programs written in these languages are generally translated to level 3 or level 4 by translators known as compilers. In some cases, level 5 consists of an interpreter for a specific application domain, such as symbolic mathematics; it provides data and operations for solving problems in that domain in terms people can understand easily.

Evolution of Multilevel Machines
• Invention of microprogramming
• Invention of the operating system
• Migration of functionality to microcode
• Elimination of microprogramming

The Invention of Microprogramming: The first digital computers, back in the 1940s, had only two levels: the ISA level, in which all the programming was done, and the digital logic level, which executed these programs. The digital logic level's circuits were complicated, difficult to understand and build, and unreliable. In 1951, Maurice Wilkes, a researcher at the University of Cambridge, suggested designing a three-level computer in order to drastically simplify the hardware and thus reduce the number of (unreliable) vacuum tubes needed (Wilkes, 1951). This machine was to have a built-in, unchangeable interpreter (the microprogram), whose function was to execute ISA-level programs by interpretation.
Because the hardware would now only have to execute microprograms, which have a limited instruction set, instead of ISA-level programs, which have a much larger instruction set, fewer electronic circuits would be needed. Because electronic circuits were then made from vacuum tubes, such a simplification promised to reduce tube count and hence enhance reliability (i.e., reduce the number of crashes per day). A few of these three-level machines were constructed during the 1950s, and more were constructed during the 1960s. By 1970 the idea of having the ISA level be interpreted by a microprogram, instead of directly by the electronics, was dominant. All the major machines of the day used it.

The Invention of the Operating System: In these early years, most computers were "open shop," which meant that the programmer had to operate the machine personally. Next to each machine was a sign-up sheet. A programmer wanting to run a program signed up for a block of time, say Wednesday morning 3 to 5 A.M. (many programmers liked to work when it was quiet in the machine room). When the time arrived, the programmer headed for the machine room with a deck of 80-column punched cards (an early input medium) in one hand and a sharpened pencil in the other. Upon arriving in the computer room, he or she gently nudged the previous programmer toward the door and took over the computer.

The modern computer took its present shape over a long period. The evolution of the computer began around the 16th century, and the early machines were improved continuously in terms of speed, accuracy, size, and price to arrive at the modern-day computer. This long period is conveniently divided into the following phases, called computer generations:

• Zeroth Generation Computers (1642-1940)
• First Generation Computers (1940-1956)
• Second Generation Computers (1956-1963)
• Third Generation Computers (1964-1971)
• Fourth Generation Computers (1971-Present)
• Fifth Generation Computers (Present and Beyond)

Before there were graphing calculators, spreadsheets, and computer algebra systems, mathematicians and inventors searched for ways to ease the burden of calculation. Below are eight mechanical calculators that preceded the modern computer:

1. Abacus (ca. 2700 BC)
2. Pascal's Calculator (1652)
3. Stepped Reckoner (1694)
4. Arithmometer (1820)
5. Comptometer (1887) and Comptograph (1889)
6. The Difference Engine (1822)
7. Analytical Engine (1834)
8. The Millionaire (1893)

• The Zeroth Generation — Mechanical Computers (1642-1945): The first person to build a working calculating machine was the French scientist Blaise Pascal (1623-1662), in whose honor the programming language Pascal is named. This device, built in 1642, when Pascal was only 19, was designed to help his father, a tax collector for the French government. It was entirely mechanical, using gears, and powered by a hand-operated crank. Pascal's machine could do only addition and subtraction operations, but thirty years later the great German mathematician Baron Gottfried Wilhelm von Leibniz (1646-1716) built another mechanical machine that could multiply and divide as well.
Some milestones in the development of the modern digital computer:

Year | Name | Made by | Comments
1834 | Analytical Engine | Babbage | First attempt to build a digital computer
1936 | Z1 | Zuse | First working relay calculating machine
1943 | COLOSSUS | British govt. | First electronic computer
1944 | Mark I | Aiken | First American general-purpose computer
1946 | ENIAC | Eckert/Mauchley | Modern computer history starts here
1949 | EDSAC | Wilkes | First stored-program computer
1951 | Whirlwind I | M.I.T. | First real-time computer
1952 | IAS | von Neumann | Most current machines use this design
1960 | PDP-1 | DEC | First minicomputer (50 sold)
1961 | 1401 | IBM | Enormously popular small business machine
1962 | 7094 | IBM | Dominated scientific computing in the early 1960s
1963 | B5000 | Burroughs | First machine designed for a high-level language
1964 | 360 | IBM | First product line designed as a family
1964 | 6600 | CDC | First scientific supercomputer
1965 | PDP-8 | DEC | First mass-market minicomputer (50,000 sold)
1970 | PDP-11 | DEC | Dominated minicomputers in the 1970s
1974 | 8080 | Intel | First general-purpose 8-bit computer on a chip
1974 | CRAY-1 | Cray | First vector supercomputer
1978 | VAX | DEC | First 32-bit superminicomputer
1981 | IBM PC | IBM | Started the modern personal computer era
1981 | Osborne-1 | Osborne | First portable computer
1983 | Lisa | Apple | First personal computer with a GUI
1985 | 386 | Intel | First 32-bit ancestor of the Pentium line
1985 | MIPS | MIPS | First commercial RISC machine
1985 | XC2064 | Xilinx | First field-programmable gate array (FPGA)
1987 | SPARC | Sun | First SPARC-based RISC workstation
1989 | GridPad | Grid Systems | First commercial tablet computer
1990 | RS6000 | IBM | First superscalar machine
1992 | Alpha | DEC | First 64-bit personal computer
1992 | Simon | IBM | First smartphone
1993 | Newton | Apple | First palmtop computer (PDA)
2001 | POWER4 | IBM | First dual-core chip multiprocessor

• First Generation Computers: Vacuum Tubes (1940-1956): The stimulus for the electronic computer was World War II. During the early part of the war, German submarines were wreaking havoc on British ships. Commands were sent from the German admirals in Berlin to the submarines by radio, which the British could, and did, intercept. The problem was that these messages were encoded using a device called the ENIGMA, whose forerunner was designed by amateur inventor and former U.S. president Thomas Jefferson. Early in the war, British intelligence managed to acquire an ENIGMA machine from Polish intelligence, which had stolen it from the Germans. However, to break a coded message, a huge amount of computation was needed, and it was needed very soon after the message was intercepted to be of any use. To decode these messages, the British government set up a top-secret laboratory that built an electronic computer called the COLOSSUS. The famous British mathematician Alan Turing helped design this machine.

The main first-generation computers were:
• ENIAC (Electronic Numerical Integrator and Computer), built by J. Presper Eckert and John V. Mauchly, was a general-purpose computer. It was very heavy and large, and contained 18,000 vacuum tubes.
• EDVAC (Electronic Discrete Variable Automatic Computer) was designed by von Neumann. It could store data as well as instructions, and thus speed was enhanced.
• UNIVAC (Universal Automatic Computer) was developed in 1952 by Eckert and Mauchly.

Main characteristics of first-generation computers:
• Main electronic component: Vacuum tube.
• Programming language: Machine language.
• Main memory: Magnetic tapes and magnetic drums.
• Input/output devices: Paper tape and punched cards.
• Speed and size: Very slow and very large (often taking up an entire room).
• Examples: IBM 650, IBM 701, ENIAC, UNIVAC 1, etc.

• Second Generation Computers: Transistors (1956-1963): The transistor was invented at Bell Labs in 1948 by John Bardeen, Walter Brattain, and William Shockley, for which they were awarded the 1956 Nobel Prize in physics. Within 10 years the transistor revolutionized computers, and by the late 1950s vacuum tube computers were obsolete. The first transistorized computer was built at M.I.T.'s Lincoln Laboratory, a 16-bit machine along the lines of the Whirlwind I. It was called the TX-0 (Transistorized eXperimental computer 0) and was merely intended as a device to test the much fancier TX-2.

Second-generation computers used transistors rather than bulky vacuum tubes; another feature was core storage. A transistor is a device composed of semiconductor material that amplifies a signal or opens or closes a circuit. The use of transistors made computers more powerful and faster, and it reduced their size and price as well as the heat that vacuum tubes had generated. The Central Processing Unit (CPU), memory, programming languages, and input and output units also came into force in the second generation. Programming shifted from machine language to assembly language, which made programming a comparatively simple task for programmers. Languages used for programming during this era were FORTRAN (1956), ALGOL (1958), and COBOL (1959).

Main characteristics of second-generation computers:
• Main electronic component: Transistor.
• Programming language: Machine language and assembly language.
• Memory: Magnetic core and magnetic tape/disk.
• Input/output devices: Magnetic tape and punched cards.
• Power and size: Smaller in size, lower power consumption, and less heat generated (in comparison with first-generation computers).
• Examples: PDP-8, IBM 1400 series, IBM 7090 and 7094, UNIVAC 1107, CDC 3600, etc.

• Third Generation Computers: Integrated Circuits (1964-1971): The invention of the silicon integrated circuit by Jack Kilby and Robert Noyce (working independently) in 1958 allowed dozens of transistors to be put on a single chip. This packaging made it possible to build computers that were smaller, faster, and cheaper than their transistorized predecessors. Some of the more significant computers from this generation are described below. By 1964 IBM was the leading computer company and had a big problem with its two highly successful and profitable machines, the 7094 and the 1401: they were as incompatible as two machines could be. One was a high-speed number cruncher using parallel binary arithmetic on 36-bit registers, and the other was a glorified input/output processor using serial decimal arithmetic on variable-length words in memory. Many of its corporate customers had both and did not like the idea of having two separate programming departments with nothing in common.

During the third generation, technology saw a shift from individual transistors to integrated circuits, also referred to as ICs, in which a number of transistors are placed on a single chip of semiconductor material.
The main feature of this era's computers was their speed and reliability. An IC has many transistors, registers, and capacitors built on one thin slice of silicon. Price and size were reduced, while memory space and working efficiency increased during this generation. Programming was now done in higher-level languages such as BASIC (Beginners All-purpose Symbolic Instruction Code). Minicomputers took shape during this era.

Main characteristics of third-generation computers:
• Main electronic component: Integrated circuits (ICs).
• Programming language: High-level language.
• Memory: Large magnetic core, magnetic tape/disk.
• Input/output devices: Magnetic tape, monitor, keyboard, printer, etc.
• Examples: IBM 360, IBM 370, PDP-11, NCR 395, B6500, UNIVAC 1108, etc.

• Fourth Generation Computers: Microprocessors (1971-Present): By the 1980s, VLSI (Very Large Scale Integration) had made it possible to put first tens of thousands, then hundreds of thousands, and finally millions of transistors on a single chip. This development soon led to smaller and faster computers. Before the PDP-1, computers were so big and expensive that companies and universities had to have special departments called computer centers to run them. With the advent of the minicomputer, a department could buy its own computer. By 1980, prices had dropped so low that it was feasible for a single individual to have his or her own computer. The personal computer era had begun. Personal computers were used in a very different way than large computers. They were used for word processing, spreadsheets, and numerous highly interactive applications (such as games) that the larger computers could not handle well.

Main characteristics of fourth-generation computers:
• Main electronic component: Very Large Scale Integration (VLSI) and the microprocessor (VLSI puts thousands of transistors on a single microchip).
• Memory: Semiconductor memory (such as RAM, ROM, etc.).
• Input/output devices: Pointing devices, optical scanning, keyboard, monitor, printer, etc.
• Examples: IBM PC, STAR 1000, APPLE II, Apple Macintosh, Altair 8800, etc.

• Fifth Generation Computers: The technology behind the fifth generation of computers is AI (Artificial Intelligence). It allows computers to behave like humans and can be seen in programs for voice recognition, in medicine, and in entertainment. In the field of game playing, too, it has shown remarkable performance, with computers capable of beating human competitors. Speed is highest, size is smallest, and the range of applications has increased remarkably in fifth-generation computers. Though one hundred percent AI has not been achieved to date, in view of current developments it can be said that this dream will become a reality soon.

To summarize the features of the various generations of computers: speed and accuracy have improved greatly, sizes have shrunk over the years, cost keeps diminishing, and reliability keeps increasing.

The Apple Newton, released in 1993, showed that a computer could be built in a package no bigger than a portable audiocassette player.
Like the GridPad, the Newton used handwriting for user input, which in this case proved to be a big stumbling block to its success. However, later machines of this class, now called PDAs (Personal Digital Assistants), have improved user interfaces and are very popular. They have since evolved into smartphones.

Main characteristics of fifth-generation computers:
• Main electronic component: Based on artificial intelligence; uses Ultra Large Scale Integration (ULSI) technology and parallel processing (ULSI puts millions of transistors on a single microchip, and parallel processing uses two or more microprocessors to run tasks simultaneously).
• Language: Understands natural language (human language).
• Size: Portable and small.
• Input/output devices: Trackpad (or touchpad), touchscreen, pen, speech input (voice/speech recognition), light scanner, printer, keyboard, monitor, mouse, etc.
• Examples: Desktops, laptops, tablets, smartphones, etc.

Unit 2: Computer Systems Organizations

Processors: The CPU (Central Processing Unit) is the "brain" of the computer. Its function is to execute programs stored in the main memory by fetching their instructions, examining them, and then executing them one after another. The components are connected by a bus, which is a collection of parallel wires for transmitting address, data, and control signals. Buses can be external to the CPU, connecting it to memory and I/O devices, but also internal to the CPU, as we will see shortly. Modern computers have multiple buses.

The CPU is composed of several distinct parts. The control unit is responsible for fetching instructions from main memory and determining their type. The arithmetic logic unit performs operations such as addition and Boolean AND needed to carry out the instructions.

Figure 2.1: The organization of a simple computer with one CPU and two I/O devices, connected by a bus.

2.1 CPU Organization: The internal organization of part of a simple von Neumann CPU is called the data path. It consists of the registers (typically 1 to 32), the ALU (Arithmetic Logic Unit), and several buses connecting the pieces. The registers feed into two ALU input registers, labeled A and B. These registers hold the ALU input while the ALU is performing some computation. The data path is important in all machines.

Figure 2.2: The data path of a typical von Neumann machine (registers, ALU input registers A and B, the ALU, and the ALU output register).

The ALU itself performs addition, subtraction, and other simple operations on its inputs, yielding a result in the output register. This output register's contents can be stored back into a register and later written (i.e., stored) into memory, if desired. Not all designs have the A, B, and output registers. In the example, addition is illustrated, but ALUs can also perform other operations. Most instructions can be divided into one of two categories: register-memory or register-register. Register-memory instructions allow memory words to be fetched into registers, where, for example, they can be used as ALU inputs in subsequent instructions.
("Words" are the units of data moved between memory and registers.) Other register-memory instructions allow registers to be stored back into memory. The process of running two operands through the ALU and storing the result is called the data path cycle and is the heart of most CPUs. To a considerable extent, it defines what the machine can do. Modern computers have multiple ALUs operating in parallel and specialized for different functions. The faster the data path cycle is, the faster the machine runs.

2.2 Instruction Execution: The CPU executes each instruction in a series of small steps. Roughly speaking, the steps are as follows:

1. Fetch the next instruction from memory into the instruction register.
2. Change the program counter to point to the following instruction.
3. Determine the type of instruction just fetched.
4. If the instruction uses a word in memory, determine where it is.
5. Fetch the word, if needed, into a CPU register.
6. Execute the instruction.
7. Go to step 1 to begin executing the following instruction.

This sequence of steps is frequently referred to as the fetch-decode-execute cycle. It is central to the operation of all computers.

The machine being interpreted has two registers visible to user programs: the program counter (PC), for keeping track of the address of the next instruction to be fetched, and the accumulator (AC), for accumulating arithmetic results. It also has internal registers for holding the current instruction during its execution (instr), the type of the current instruction (instr type), the address of the instruction's operand (data loc), and the current operand itself (data). Instructions are assumed to contain a single memory address. The memory location addressed contains the operand, for example, the data item to add to the accumulator.

The very fact that it is possible to write a program that can imitate the function of a CPU shows that a program need not be executed by a "hardware" CPU consisting of a box full of electronics. Instead, a program can be carried out by having another program fetch, examine, and execute its instructions. A program that fetches, examines, and executes the instructions of another program is called an interpreter.
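The fetch-decode-execute cycle above maps directly onto a small program. The following is a minimal sketch, in C, of an interpreter for a hypothetical single-address accumulator machine; the opcode set, the 4096-word memory, and the instruction encoding (opcode in the high bits, operand address in the low 12 bits) are assumptions made for illustration, not any real ISA described in this text.

/* Toy fetch-decode-execute interpreter. All encodings are assumptions. */
#include <stdio.h>

#define MEM_SIZE 4096

enum { HALT = 0, LOAD = 1, STORE = 2, ADD = 3, JUMP = 4 };

int memory[MEM_SIZE];           /* each word: opcode * 4096 + address */
int pc = 0, ac = 0;             /* program counter and accumulator    */

void run(void) {
    for (;;) {
        int instr      = memory[pc];     /* step 1: fetch the instruction      */
        pc             = pc + 1;         /* step 2: advance the program counter */
        int instr_type = instr / 4096;   /* step 3: determine the type         */
        int data_loc   = instr % 4096;   /* step 4: locate the operand         */
        switch (instr_type) {            /* steps 5-6: fetch operand, execute  */
        case LOAD:  ac = memory[data_loc];       break;
        case STORE: memory[data_loc] = ac;       break;
        case ADD:   ac = ac + memory[data_loc];  break;
        case JUMP:  pc = data_loc;               break;
        case HALT:  return;
        }
    }                                    /* step 7: loop back for the next one */
}

int main(void) {
    /* Tiny program: AC = mem[100] + mem[101]; store in mem[102]; halt. */
    memory[100] = 7; memory[101] = 35;
    memory[0] = LOAD  * 4096 + 100;
    memory[1] = ADD   * 4096 + 101;
    memory[2] = STORE * 4096 + 102;
    memory[3] = HALT  * 4096;
    run();
    printf("result = %d\n", memory[102]);   /* prints 42 */
    return 0;
}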
2.3 RISC versus CISC: In 1980, a group at Berkeley led by David Patterson and Carlo Séquin began designing VLSI CPU chips that did not use interpretation (Patterson, 1985; Patterson and Séquin, 1982). They coined the term RISC for this concept and named their CPU chip the RISC I, followed shortly by the RISC II. Slightly later, in 1981, across the San Francisco Bay at Stanford, John Hennessy designed and fabricated a somewhat different chip he called the MIPS (Hennessy, 1984). These chips evolved into commercially important products, the SPARC and the MIPS, respectively. The acronym RISC stands for Reduced Instruction Set Computer, which was contrasted with CISC, Complex Instruction Set Computer (a thinly veiled reference to the VAX, which dominated university computer science departments at the time). Nowadays, few people think that the size of the instruction set is a major issue, but the name stuck. To make a long story short, a great religious war ensued, with the RISC supporters attacking the established order (VAX, Intel, large IBM mainframes).

They claimed that the best way to design a computer was to have a small number of simple instructions that execute in one cycle of the data path by fetching two registers, combining them somehow (e.g., adding or ANDing them), and storing the result back in a register. Their argument was that even if a RISC machine takes four or five instructions to do what a CISC machine does in one instruction, if the RISC instructions are 10 times as fast (because they are not interpreted), RISC wins. It is also worth pointing out that by this time the speed of main memories had caught up to the speed of read-only control stores, so the interpretation penalty had greatly increased, strongly favoring RISC machines.

Starting with the 486, the Intel CPUs have contained a RISC core that executes the simplest (and typically most common) instructions in a single data path cycle, while interpreting the more complicated instructions in the usual CISC way. The net result is that common instructions are fast and less common instructions are slow. While this hybrid approach is not as fast as a pure RISC design, it gives competitive overall performance while still allowing old software to run unmodified.

2.4 Design Principles of Modern Computers: There is a set of design principles, sometimes called the RISC design principles, that architects of new general-purpose CPUs do their best to follow. External constraints, such as the requirement of being backward compatible with some existing architecture, often require compromises, but these principles are goals that most designers strive to meet.

• All instructions are directly executed by hardware: All common instructions are directly executed by the hardware; they are not interpreted by microinstructions. Eliminating a level of interpretation provides high speed for most instructions. For computers that implement CISC instruction sets, the more complex instructions may be broken into separate parts, which can then be executed as a sequence of microinstructions. This extra step slows the machine down, but for less frequently occurring instructions it may be acceptable.

• Maximize the rate at which instructions are issued: (MIPS stands for Millions of Instructions Per Second; the MIPS processor's name officially stands for Microprocessor without Interlocked Pipeline Stages.) This principle suggests that parallelism can play a major role in improving performance, since issuing large numbers of slow instructions in a short time interval is possible only if multiple instructions can execute at once. Although instructions are always encountered in program order, they are not always issued in program order (because some needed resource might be busy), and they need not finish in program order. Of course, if instruction 1 sets a register and instruction 2 uses that register, great care must be taken to make sure that instruction 2 does not read the register until it contains the correct value. Getting this right requires a lot of bookkeeping but has the potential for performance gains by executing multiple instructions at once.

• Instructions should be easy to decode: A critical limit on the rate of issue of instructions is decoding individual instructions to determine what resources they need. Anything that can aid this process is useful. That includes making instructions regular, of fixed length, and with a small number of fields. The fewer different formats for instructions, the better.
• Only loads and stores should reference memory: One of the simplest ways to break operations into separate steps is to require that operands for most instructions come from, and return to, CPU registers. The operation of moving operands from memory into registers can be performed in separate instructions. Since access to memory can take a long time, and the delay is unpredictable, these instructions can best be overlapped with other instructions if they do nothing except move operands between registers and memory. This observation means that only LOAD and STORE instructions should reference memory; all other instructions should operate only on registers.

• Provide plenty of registers: Since accessing memory is relatively slow, many registers (at least 32) need to be provided, so that once a word is fetched, it can be kept in a register until it is no longer needed. Running out of registers and having to flush them back to memory only to later reload them is undesirable and should be avoided as much as possible. The best way to accomplish this is to have enough registers.

2.5 Instruction-Level Parallelism: Computer architects are constantly striving to improve performance of the machines they design. Making the chips run faster by increasing their clock speed is one way, but for every new design there is a limit to what is possible by brute force at that moment in history. Consequently, most computer architects look to parallelism (doing two or more things at once) as a way to get even more performance for a given clock speed.

• Pipelining: It has been known for years that the actual fetching of instructions from memory is a major bottleneck in instruction execution speed. To alleviate this problem, computers going back at least as far as the IBM Stretch (1959) have had the ability to fetch instructions from memory in advance, so they would be there when they were needed. These instructions were stored in a special set of registers called the prefetch buffer. This way, when an instruction was needed, it could usually be taken from the prefetch buffer rather than waiting for a memory read to complete. In effect, prefetching divides instruction execution into two parts: fetching and actual execution. The concept of a pipeline carries this strategy much further. Instead of being divided into only two parts, instruction execution is often divided into many (often a dozen or more) parts, each one handled by a dedicated piece of hardware, all of which can run in parallel.

Figure 2.3: (a) A five-stage pipeline (instruction fetch unit, instruction decode unit, operand fetch unit, instruction execution unit, write-back unit). (b) The state of each stage as a function of time; nine clock cycles are illustrated.

Suppose, for example, that each stage takes 2 nsec. Although any one instruction takes 10 nsec to pass through the whole five-stage pipeline, once the pipeline is full a finished instruction emerges every 2 nsec, so the actual rate of processing is 500 MIPS, not 100 MIPS. Pipelining allows a trade-off between latency (how long it takes to execute an instruction) and processor bandwidth (how many MIPS the CPU has).
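A quick back-of-the-envelope check of these numbers can be written as a short program. This is only an illustrative calculation; the 2-nsec stage time and five stages are the assumed values from the example above, not parameters of any particular CPU.

/* Pipeline latency vs. bandwidth for the example above (assumed values). */
#include <stdio.h>

int main(void) {
    double stage_time_ns = 2.0;   /* assumed time per pipeline stage   */
    int    stages        = 5;     /* assumed number of pipeline stages */

    /* Latency: one instruction must pass through every stage. */
    double latency_ns = stages * stage_time_ns;

    /* Bandwidth: once the pipeline is full, one instruction completes
     * per stage time.  MIPS = 1000 / (ns per completed instruction).  */
    double mips_pipelined   = 1000.0 / stage_time_ns;
    double mips_unpipelined = 1000.0 / latency_ns;

    printf("latency     = %.0f ns\n",   latency_ns);        /* 10 ns    */
    printf("pipelined   = %.0f MIPS\n", mips_pipelined);    /* 500 MIPS */
    printf("unpipelined = %.0f MIPS\n", mips_unpipelined);  /* 100 MIPS */
    return 0;
}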
• Superscalar architectures: If one pipeline is good, then surely two pipelines are better. One possible design for a dual-pipeline CPU, based on Fig. 2.3, is shown in Fig. 2.4. Here a single instruction fetch unit fetches pairs of instructions together and puts each one into its own pipeline, complete with its own ALU for parallel operation.

Figure 2.4: Dual five-stage pipelines with a common instruction fetch unit.

To be able to run in parallel, the two instructions must not conflict over resource usage (e.g., registers), and neither must depend on the result of the other. As with a single pipeline, either the compiler must guarantee this situation to hold (i.e., the hardware does not check and gives incorrect results if the instructions are not compatible), or conflicts must be detected and eliminated during execution using extra hardware. The term superscalar architecture was coined for this approach in 1987 (Agerwala and Cocke, 1987). Its roots, however, go back more than 40 years to the CDC 6600 computer.

Figure 2.5: A superscalar processor with five functional units.

2.6 Processor-Level Parallelism: Instruction-level parallelism helps a little, but pipelining and superscalar operation rarely win more than a factor of five or ten. To get gains of 50, 100, or more, the only way is to design computers with multiple CPUs.

• Data parallel computers: A substantial number of problems in computational domains such as the physical sciences, engineering, and computer graphics involve loops and arrays, or otherwise have a highly regular structure. Often the same calculations are performed repeatedly on many different sets of data. The regularity and structure of these programs make them especially easy targets for speed-up through parallel execution. Two primary methods have been used to execute these highly regular programs quickly and efficiently: SIMD processors and vector processors. While these two schemes are remarkably similar in most ways, ironically, the first is generally thought of as a parallel computer while the second is considered an extension to a single processor. Data parallel computers have found many successful applications as a consequence of their remarkable efficiency: they are able to produce significant computational power with fewer transistors than alternative approaches. This efficient use of silicon gives data parallel computers a big edge over other processors, as long as the software they are running is highly regular with lots of parallelism.
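The "same calculation on many data items" pattern that SIMD and vector machines exploit looks like the following sketch; the array length and the element-wise operation are arbitrary choices for illustration, not taken from the text.

/* A data-parallel loop: the same operation applied to many data items.
 * Each iteration is independent, so SIMD or vector hardware (or a
 * vectorizing compiler) can perform many of them at once. */
#include <stdio.h>

#define N 1024   /* arbitrary size chosen for the example */

int main(void) {
    static float a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {     /* set up some input data */
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }

    for (int i = 0; i < N; i++)       /* the data-parallel kernel */
        c[i] = a[i] + b[i];

    printf("c[10] = %.1f\n", c[10]);  /* prints 30.0 */
    return 0;
}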
2.7 Types of Memory — Primary Memory: The memory is that part of the computer where programs and data are stored. Some computer scientists (especially British ones) use the term store or storage rather than memory, although more and more the term "storage" is used to refer to disk storage. Without a memory from which the processors can read and write information, there would be no stored-program digital computers.

Bit: The basic unit of memory is the binary digit, called a bit. A bit may contain a 0 or a 1. It is the simplest possible unit. (A device capable of storing only zeros could hardly form the basis of a memory system; at least two values are needed.) People often say that computers use binary arithmetic because it is "efficient." What they mean (although they rarely realize it) is that digital information can be stored by distinguishing between different values of some continuous physical quantity, such as voltage or current. For example, the number 1944 can be held in 16 bits either as four binary-coded decimal digits (0001 1001 0100 0100) or as a pure binary number (0000 0111 1001 1000). Sixteen bits in the decimal format can store only the numbers from 0 to 9999, giving 10,000 combinations, whereas a 16-bit pure binary number can store 65,536 different combinations. For this reason, people say that binary is more efficient.

a. Memory Addresses: Memories consist of a number of cells (or locations), each of which can store a piece of information. Each cell has a number, called its address, by which programs can refer to it. If a memory has n cells, they will have addresses 0 to n - 1. All cells in a memory contain the same number of bits. If a cell consists of k bits, it can hold any one of 2^k different bit combinations. Note that adjacent cells have consecutive addresses (by definition).

Figure 2.6: Three ways of organizing a 96-bit memory (e.g., 12 cells of 8 bits, 8 cells of 12 bits, or 6 cells of 16 bits).

Nearly all computer manufacturers have standardized on an 8-bit cell, which is called a byte. The term octet is also used. Bytes are grouped into words. A computer with a 32-bit word has 4 bytes/word, whereas a computer with a 64-bit word has 8 bytes/word. The significance of a word is that most instructions operate on entire words, for example, adding two words together. Thus a 32-bit machine will have 32-bit registers and instructions for manipulating 32-bit words, whereas a 64-bit machine will have 64-bit registers and instructions for moving, adding, subtracting, and otherwise manipulating 64-bit words.

b. Byte Ordering: The bytes in a word can be numbered from left to right or right to left. At first it might seem that this choice is unimportant, but it has major implications.

c. Error-Correcting Codes: Computer memories occasionally make errors due to voltage spikes on the power line, cosmic rays, or other causes. To guard against such errors, some memories use error-detecting or error-correcting codes. When these codes are used, extra bits are added to each memory word in a special way. When a word is read out of memory, the extra bits are checked to see whether an error has occurred. To understand how errors can be handled, it is necessary to look closely at what an error really is. Suppose that a memory word consists of m data bits to which we add r redundant, or check, bits. Let the total length be n (i.e., n = m + r). An n-bit unit containing m data and r check bits is often referred to as an n-bit codeword. The number of bit positions in which two codewords differ is called the Hamming distance (Hamming, 1950). Its main significance is that if two codewords are a Hamming distance d apart, it will require d single-bit errors to convert one into the other. For example, the codewords 11110001 and 00110000 are a Hamming distance 3 apart because it takes 3 single-bit errors to convert one into the other.
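The Hamming distance between two codewords is easy to compute: XOR the codewords and count the 1 bits. A minimal sketch follows; the 8-bit codewords from the example above are hard-coded purely for illustration.

/* Hamming distance: number of bit positions in which two codewords differ. */
#include <stdio.h>

static int hamming_distance(unsigned a, unsigned b) {
    unsigned diff = a ^ b;   /* 1 wherever the codewords disagree */
    int count = 0;
    while (diff != 0) {
        count += diff & 1u;  /* count the differing bit positions */
        diff >>= 1;
    }
    return count;
}

int main(void) {
    unsigned x = 0xF1;       /* 11110001 */
    unsigned y = 0x30;       /* 00110000 */
    printf("distance = %d\n", hamming_distance(x, y));   /* prints 3 */
    return 0;
}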
d. Cache Memory: Techniques are known for combining a small amount of fast memory with a large amount of slow memory to get the speed of the fast memory (almost) and the capacity of the large memory, at a moderate price. The small, fast memory is called a cache (from the French cacher, meaning "to hide," and pronounced "cash"). The basic idea behind a cache is simple: the most heavily used memory words are kept in the cache. When the CPU needs a word, it first looks in the cache. Only if the word is not there does it go to main memory. If a substantial fraction of the words are in the cache, the average access time can be greatly reduced. Success or failure thus depends on what fraction of the words are in the cache. The observation that the memory references made in any short time interval tend to use only a small fraction of the total memory is called the locality principle, and it forms the basis for all caching systems. The general idea is that when a word is referenced, it and some of its neighbors are brought from the large slow memory into the cache, so that the next time it is used, it can be accessed quickly. If a word is read or written k times in a short interval, the computer will need 1 reference to slow memory and k - 1 references to fast memory; the larger k is, the better the overall performance.

Figure 2.7: The cache is logically between the CPU and main memory. Physically, there are several possible places it could be located.

When talking about blocks of words inside the cache, we will often refer to them as cache lines. Cache design is an increasingly important subject for high-performance CPUs. One issue is cache size: the bigger the cache, the better it performs, but also the slower it is to access and the more it costs. A second issue is the size of the cache line. A 16-KB cache can be divided up into 1024 lines of 16 bytes, 2048 lines of 8 bytes, and other combinations. A third design issue is whether instructions and data are kept in the same cache or in different ones. Having a unified cache (instructions and data use the same cache) is a simpler design and automatically balances instruction fetches against data fetches. Nevertheless, the trend these days is toward a split cache, with instructions in one cache and data in the other. This design is also called a Harvard architecture, the reference going all the way back to Howard Aiken's Mark III computer, which had different memories for instructions and data.
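The effect of the cache on average access time can be estimated with the standard formula: if c is the cache access time, m the main-memory access time, and h the hit ratio (the fraction of references satisfied by the cache), the mean access time is roughly c + (1 - h)m. The timing values and hit ratio in the sketch below are made-up numbers for illustration, not figures from the text.

/* Mean memory access time with a cache: c + (1 - h) * m.
 * The timings and hit ratio below are assumptions for illustration. */
#include <stdio.h>

int main(void) {
    double c = 2.0;    /* cache access time, ns (assumed)       */
    double m = 50.0;   /* main-memory access time, ns (assumed) */
    double h = 0.95;   /* hit ratio (assumed)                   */

    /* Every reference pays the cache lookup; misses also pay the memory. */
    double mean = c + (1.0 - h) * m;

    printf("mean access time = %.1f ns\n", mean);   /* prints 4.5 ns */
    return 0;
}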
e. Memory Packaging and Its Types: Since the early 1990s, a different arrangement has been used for packaging memory chips. A group of chips, typically 8 or 16, is mounted on a printed circuit board and sold as a unit. This unit is called a SIMM (Single Inline Memory Module) or a DIMM (Dual Inline Memory Module), depending on whether it has a row of connectors on one side or on both sides of the board. SIMMs have one edge connector with 72 contacts and transfer 32 bits per clock cycle; they are rarely used these days. DIMMs usually have edge connectors with 120 contacts on each side of the board, for a total of 240 contacts, and transfer 64 bits per clock cycle. The most common ones at present are DDR3 DIMMs, the third version of the double-data-rate memories. A typical DIMM configuration might have eight data chips with 256 MB each; the entire module would then hold 2 GB. Many computers have room for four modules, giving a total capacity of 8 GB when using 2-GB modules and more when using larger ones. A physically smaller DIMM, called an SO-DIMM (Small Outline DIMM), is used in notebook computers. DIMMs can have a parity bit or error correction added, but since the average error rate of a module is one error every 10 years, for most garden-variety computers error detection and correction are omitted.

2.8 Secondary Memory:

a. Memory Hierarchies: The traditional solution to storing a great deal of data is a memory hierarchy. At the top are the CPU registers, which can be accessed at full CPU speed. Next comes the cache memory, which is currently on the order of 32 KB to a few megabytes. Main memory is next, with sizes currently ranging from 1 GB for entry-level systems to hundreds of gigabytes at the high end. After that come solid-state and magnetic disks, the current workhorses for permanent storage. Finally, we have magnetic tape and optical disks for archival storage.

Figure 2.8: A five-level memory hierarchy (registers, cache, main memory (RAM), magnetic or solid-state disk, and tape or optical disk).

b. Magnetic Disks: Most disks consist of multiple platters stacked vertically, as depicted in Fig. 2.9. Each surface has its own arm and head. All the arms are ganged together so they move to different radial positions all at once. The set of tracks at a given radial position is called a cylinder. Current PC and server disks typically have 1 to 12 platters per drive, giving 2 to 24 recording surfaces. High-end disks can store 1 TB on a single platter, and that limit is sure to grow with time. To access a sector, first the arm must be moved to the right radial position; this action is called a seek. Average seek times (between random tracks) are in the 5- to 10-msec range, although seeks between consecutive tracks are now down below 1 msec. Once the head is positioned radially, there is a delay, called the rotational latency, until the desired sector rotates under the head.

Figure 2.9: A magnetic disk (platters, tracks, read/write heads, and the arm assembly).

c. IDE Disks: Modern personal computer disks evolved from the one in the IBM PC XT, which was a 10-MB Seagate disk controlled by a Xebec disk controller on a plug-in card. The Seagate disk had 4 heads, 306 cylinders, and 17 sectors/track. The controller was capable of handling two drives. The operating system read from and wrote to a disk by putting parameters in CPU registers and then calling the BIOS (Basic Input Output System), located in the PC's built-in read-only memory. The BIOS issued the machine instructions to load the disk controller registers that initiated transfers. The technology evolved rapidly from having the controller on a separate board to having it closely integrated with the drives, starting with IDE (Integrated Drive Electronics) drives in the mid-1980s. Eventually, IDE drives evolved into EIDE drives (Extended IDE), which also support a second addressing scheme called LBA (Logical Block Addressing), which simply numbers the sectors starting at 0 up to a maximum of 2^28 - 1.
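A rough estimate of the time to read one sector combines the three delays described for magnetic disks: the seek, the rotational latency (on average half a rotation), and the transfer time for the sector itself. The drive parameters below (9-ms average seek, 7200 RPM, 500 sectors per track) are assumed values chosen for illustration, not the specifications of any particular drive.

/* Average time to read one sector = seek + half a rotation + transfer.
 * All drive parameters below are assumptions chosen for illustration. */
#include <stdio.h>

int main(void) {
    double seek_ms           = 9.0;     /* average seek time (assumed) */
    double rpm               = 7200.0;  /* spindle speed (assumed)     */
    int    sectors_per_track = 500;     /* (assumed)                   */

    double rotation_ms = 60000.0 / rpm;                  /* one full turn    */
    double latency_ms  = rotation_ms / 2.0;              /* average latency  */
    double transfer_ms = rotation_ms / sectors_per_track;/* one sector       */

    double total_ms = seek_ms + latency_ms + transfer_ms;

    printf("rotation = %.3f ms\n", rotation_ms);   /* 8.333 ms        */
    printf("latency  = %.3f ms\n", latency_ms);    /* 4.167 ms        */
    printf("transfer = %.4f ms\n", transfer_ms);   /* 0.0167 ms       */
    printf("total    = %.2f ms\n", total_ms);      /* about 13.18 ms  */
    return 0;
}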
d. SCSI Disks: SCSI disks are not different from IDE disks in terms of how their cylinders, tracks, and sectors are organized, but they have a different interface and much higher transfer rates. SCSI traces its history back to Howard Shugart, the inventor of the floppy disk, which was used on the first personal computers in the 1980s. His company introduced the SASI (Shugart Associates System Interface) disk in 1979. After some modification and quite a bit of discussion, ANSI standardized it in 1986 and changed the name to SCSI (Small Computer System Interface). SCSI is more than just a hard-disk interface: it is a bus to which a SCSI controller and up to seven devices can be attached. These can include one or more SCSI hard disks, CD-ROMs, CD recorders, scanners, tape units, and other SCSI peripherals. Each SCSI device has a unique ID, from 0 to 7 (15 for wide SCSI). Each device has two connectors: one for input and one for output. Cables connect the output of one device to the input of the next one, in series, like a string of cheap Christmas tree lamps. The last device in the string must be terminated to prevent reflections from the ends of the SCSI bus from interfering with other data on the bus. Typically, the controller is on a plug-in card and is the start of the cable chain, although this configuration is not strictly required by the standard. The most common cable for 8-bit SCSI has 50 wires, 25 of which are grounds paired one-to-one with the other 25 wires to provide the excellent noise immunity needed for high-speed operation. Of the 25 wires, 8 are for data, 1 is for parity, 9 are for control, and the remainder are for power or are reserved for future use. The 16-bit (and 32-bit) devices need a second cable for the additional signals. The cables may be several meters long, allowing for external drives, scanners, etc. SCSI controllers and peripherals can operate either as initiators or as targets. Usually the controller, acting as initiator, issues commands to disks and other peripherals acting as targets. These commands are blocks of up to 16 bytes telling the target what to do.

RAID: Patterson et al. defined RAID as a Redundant Array of Inexpensive Disks, but industry redefined the I to be "Independent" rather than "Inexpensive" (maybe so they could use expensive disks?). Since a villain was also needed (as in RISC versus CISC, also due to Patterson), the bad guy here was the SLED (Single Large Expensive Disk). The idea behind a RAID is to install a box full of disks next to the computer, typically a large server, replace the disk controller card with a RAID controller, copy the data over to the RAID, and then continue normal operation. In other words, a RAID should look like a SLED to the operating system but have better performance and better reliability. Since SCSI disks have good performance, low price, and the ability to have up to 7 drives on a single controller (15 for wide SCSI), it is natural that many RAIDs consist of a RAID SCSI controller plus a box of SCSI disks that appear to the operating system as a single large disk. In this way, no software changes are required to use the RAID, a big selling point for many system administrators. In addition to appearing like a single disk to the software, all RAIDs have the property that the data are distributed over the drives, to allow parallel operation. Several different schemes for doing this were defined by Patterson et al., and they are now known as RAID level 0 through RAID level 5. The term "level" is something of a misnomer since no hierarchy is involved; there are simply six different organizations, each with a different mix of reliability and performance characteristics.

Figure 2.10: RAID levels 0 through 5. Backup and parity drives are shown shaded.
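The reliability mechanism behind the parity-based RAID organizations can be illustrated in a few lines of code: the parity drive (or the parity block, in RAID 5) stores the XOR of the corresponding data blocks, so any single lost block can be rebuilt by XORing the survivors. The four-drive layout and tiny block size below are assumptions chosen for illustration.

/* RAID parity idea: parity = XOR of the data blocks on the other drives.
 * If one drive is lost, its block is recovered by XORing the rest.
 * Drive count and block size are assumptions chosen for illustration. */
#include <stdio.h>

#define DRIVES 4          /* 3 data drives + 1 parity drive (assumed) */
#define BLOCK  8          /* bytes per block (assumed)                */

int main(void) {
    unsigned char drive[DRIVES][BLOCK] = {
        "DATA-A!", "DATA-B!", "DATA-C!"   /* drive 3 will hold parity */
    };

    /* Compute the parity block. */
    for (int i = 0; i < BLOCK; i++)
        drive[3][i] = drive[0][i] ^ drive[1][i] ^ drive[2][i];

    /* Simulate losing drive 1, then rebuild it from the survivors. */
    unsigned char rebuilt[BLOCK];
    for (int i = 0; i < BLOCK; i++)
        rebuilt[i] = drive[0][i] ^ drive[2][i] ^ drive[3][i];

    printf("rebuilt block: %s\n", (char *)rebuilt);   /* prints DATA-B! */
    return 0;
}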
• Solid-State Disks: Disks made from non-volatile flash memory, often called solid-state disks (SSDs), are growing in popularity as a high-speed alternative to traditional magnetic disk technologies. While modern electronics may seem totally reliable, the reality is that transistors slowly wear out as they are used. Every time they switch, they wear out a little bit more and get closer to no longer working. One likely way that a transistor will fail is due to "hot carrier injection," a failure mechanism in which an electron charge gets embedded inside a once-working transistor, leaving it in a state where it is permanently stuck on or off. While generally thought of as a death sentence for a (likely) innocent transistor, Fujio Masuoka, while working for Toshiba, discovered a way to harness this failure mechanism to create a new non-volatile memory: in the early 1980s, he invented the first flash memory.

Figure 2.11: A flash memory cell (a transistor with a floating gate between the control gate and the channel).

Flash disks are made of many solid-state flash memory cells. Each flash memory cell is made from a single special flash transistor, as shown in Fig. 2.11. Embedded inside the transistor is a floating gate that can be charged and discharged using high voltages. Before being programmed, the floating gate does not affect the operation of the transistor, essentially acting as an extra insulator between the control gate and the transistor channel. If the flash cell is tested in this state, it will act like a simple transistor.

• CD-ROMs: Optical disks were originally developed for recording television programs, but they can be put to more esthetic use as computer storage devices. Due to their large capacity and low price, optical disks are widely used for distributing software, books, movies, and data of all kinds, as well as for making backups of hard disks. A CD is prepared by using a high-power infrared laser to burn 0.8-micron-diameter holes in a coated glass master disk. From this master, a mold is made, with bumps where the laser holes were. Into this mold, molten polycarbonate is injected to form a CD with the same pattern of holes as the glass master. Then a thin layer of reflective aluminum is deposited on the polycarbonate, topped by a protective lacquer and finally a label. The depressions in the polycarbonate substrate are called pits; the unburned areas between the pits are called lands.

Figure 2.12: Recording structure of a Compact Disc (CD-ROM).

In 1984, Philips and Sony realized the potential for using CDs to store computer data, so they published the Yellow Book defining a precise standard for what are now called CD-ROMs (Compact Disc-Read Only Memory). To piggyback on the by-then already substantial audio CD market, CD-ROMs were to be the same physical size as audio CDs, mechanically and optically compatible with them, and produced using the same polycarbonate injection molding machines. The consequences of this decision were that slow variable-speed motors were required, but also that the manufacturing cost of a CD-ROM would be well under one dollar in moderate volume.

• CD-Recordables: At first, the equipment needed to produce a master CD-ROM (or audio CD, for that matter) was extremely expensive. But in the computer industry nothing stays expensive for long.
By the mid-1990s, CD recorders no bigger than a CD player were a common peripheral available in most computer stores. These devices were still different from magnetic disks because, once written, CD-ROMs could not be erased. Nevertheless, they quickly found a niche as a backup medium for large magnetic hard disks and also allowed individuals or startup companies to manufacture their own small-run CD-ROMs (hundreds, not thousands) or make masters for delivery to high-volume commercial CD duplication plants. These drives are known as CD-Rs (CD-Recordables).

CD-ReWritables: Although people are used to other write-once media such as paper and photographic film, there is a demand for a rewritable CD-ROM. One technology now available is CD-RW (CD-ReWritable), which uses the same size media as CD-R. However, instead of cyanine or phthalocyanine dye, CD-RW uses an alloy of silver, indium, antimony, and tellurium for the recording layer. This alloy has two stable states, crystalline and amorphous, with different reflectivities. CD-RW drives use lasers with three different powers. At high power, the laser melts the alloy, converting it from the high-reflectivity crystalline state to the low-reflectivity amorphous state to represent a pit. At medium power, the alloy melts and reforms in its natural crystalline state to become a land again. At low power, the state of the material is sensed (for reading), but no phase transition occurs. The reason CD-RW has not replaced CD-R is that the CD-RW blanks are more expensive than the CD-R blanks. Also, for applications consisting of backing up hard disks, the fact that a CD-R, once written, cannot be accidentally erased is a feature, not a bug.

DVD: The basic CD/CD-ROM format has been around since 1980. By the mid-1990s optical media technology had improved dramatically, so higher-capacity video disks were becoming economically feasible. At the same time, Hollywood was looking for a way to replace analog video tapes with an optical disk technology that had higher quality, was cheaper to manufacture, lasted longer, took up less shelf space in video stores, and did not have to be rewound. It was looking as if the wheel of progress for optical disks was about to turn once again. This combination of technology and demand by three immensely rich and powerful industries has led to DVD, originally an acronym for Digital Video Disk, but now officially Digital Versatile Disk. DVDs use the same general design as CDs, with 120-mm injection-molded polycarbonate disks containing pits and lands that are illuminated by a laser diode and read by a photodetector. What is new is the use of: 1. Smaller pits (0.4 microns versus 0.8 microns for CDs), 2. A tighter spiral (0.74 microns between tracks versus 1.6 microns for CDs), and 3. A red laser (at 0.65 microns versus 0.78 microns for CDs).

Blu-ray: The successor to DVD is Blu-ray, so called because it uses a blue laser instead of the red one used by DVDs. A blue laser has a shorter wavelength than a red one, which allows it to focus more accurately and thus support smaller pits and lands. Single-sided Blu-ray disks hold about 25 GB of data; double-sided ones hold about 50 GB. The data rate is about 4.5 MB/sec, which is good for an optical disk, but still insignificant compared to magnetic disks (cf. ATAPI-6 at 100 MB/sec and wide Ultra5 SCSI at 640 MB/sec).
It is expected that Blu-ray will eventually replace CD-ROMs and DVDs, but this transition will take some years.

2.8 Input/Output: As we mentioned earlier in this chapter, a computer system has three major components: the CPU, the memories (primary and secondary), and the I/O (Input/Output) equipment such as printers, scanners, and modems. So far we have looked at the CPU and the memories. Now it is time to examine the I/O equipment and how it is connected to the rest of the system.

* Buses: The usual arrangement is a metal box with a large printed circuit board at the bottom or side, called the motherboard (parentboard, for the politically correct). The motherboard contains the CPU chip, some slots into which DIMM modules can be clicked, and various support chips. It also contains a bus etched along its length, and sockets into which the edge connectors of I/O boards can be inserted. Figure 2-12: Physical structure of a personal computer. Each I/O device consists of two parts: one containing most of the electronics, called the controller, and one containing the I/O device itself, such as a disk drive. The controller is usually integrated directly onto the motherboard or sometimes contained on a board plugged into a free bus slot. Even though the display (monitor) is not an option, the video controller is sometimes located on a plug-in board to allow the user to choose between boards with or without graphics accelerators, extra memory, and so on. The controller connects to its device by a cable attached to a connector on the back of the box. The job of a controller is to control its I/O device and handle bus access for it. When a program wants data from the disk, for example, it gives a command to the disk controller, which then issues seeks and other commands to the drive. When the proper track and sector have been located, the drive begins outputting the data as a serial bit stream to the controller. It is the controller's job to break the bit stream up into units and write each unit into memory as it is assembled. A unit is typically one or more words.

2.9 DMA: A controller that reads or writes data to or from memory without CPU intervention is said to be performing Direct Memory Access, better known by its acronym DMA. When the transfer is completed, the controller normally causes an interrupt, forcing the CPU to immediately suspend running its current program and start running a special procedure, called an interrupt handler, to check for errors, take any special action needed, and inform the operating system that the I/O is now finished. When the interrupt handler is finished, the CPU continues with the program that was suspended when the interrupt occurred.
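A device driver's interaction with a typical DMA controller follows the pattern just described: program the source address, destination address, and byte count, start the transfer, and let the interrupt handler report completion. The C sketch below shows this shape only; the register addresses, bit masks, and names are hypothetical, invented for illustration, and do not correspond to any real controller.

#include <stdint.h>

/* Hypothetical memory-mapped DMA controller registers (illustrative only). */
#define DMA_BASE      0xFFFF0000u
#define DMA_SRC       (*(volatile uint32_t *)(DMA_BASE + 0x0))  /* source address      */
#define DMA_DST       (*(volatile uint32_t *)(DMA_BASE + 0x4))  /* destination address */
#define DMA_COUNT     (*(volatile uint32_t *)(DMA_BASE + 0x8))  /* bytes to move       */
#define DMA_CTRL      (*(volatile uint32_t *)(DMA_BASE + 0xC))  /* control/status      */
#define DMA_START     0x1u
#define DMA_DONE_IRQ  0x2u

volatile int transfer_done = 0;

/* The CPU only programs the controller; the controller moves the data itself. */
void start_dma(uint32_t src, uint32_t dst, uint32_t nbytes)
{
    DMA_SRC   = src;
    DMA_DST   = dst;
    DMA_COUNT = nbytes;
    DMA_CTRL  = DMA_START;          /* kick off the transfer, then return */
}

/* Interrupt handler: runs when the controller raises its "done" interrupt. */
void dma_interrupt_handler(void)
{
    if (DMA_CTRL & DMA_DONE_IRQ) {
        DMA_CTRL &= ~DMA_DONE_IRQ;  /* acknowledge the interrupt          */
        transfer_done = 1;          /* tell the OS the I/O is finished    */
    }
}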
Unit 3: Combinational Logic Design

3.1 Design of Fast Adders: An n-bit ripple-carry adder may have too much delay in developing its outputs S0 through Sn-1 and Cn. The delay through a network of logic gates depends on the integrated-circuit fabrication technology and on the number of gates in the path(s) from input(s) to output(s). The delay incurred by any combinational logic network built in a given gate technology can be determined by adding up the gate delays along the longest path through the network. We require that an arithmetic operation be completed in one clock cycle. Example: for a processor operating at 100 MHz, an addition must complete in 10 ns. Suppose that the delay from Ci to Ci+1 of any adder block is 1 ns. An n-bit addition is performed in the time it takes the carry signal to reach the Cn-1 position plus the time it takes to develop Sn-1. A 32-bit addition may therefore take approximately 32 ns. Two approaches can be used to reduce this delay: faster circuit technology, and designs that make Si and Ci+1 independent of the rippled carry Ci. Figure 3-1.

* Carry Look-Ahead Adder: A carry-lookahead adder is a fast parallel adder; it reduces the propagation delay at the cost of more complex hardware, hence it is costlier. In this design, the carry logic over fixed groups of bits of the adder is reduced to two-level logic, which is nothing but a transformation of the ripple-carry design. This method uses logic gates that look at the lower-order bits of the augend and addend to see whether a higher-order carry is to be generated or not. Let us discuss this in detail. Figure 3-2: Carry conditions for a single adder stage.

A  B  Ci  Ci+1  Condition
0  0  0   0     No carry generate
0  0  1   0
0  1  0   0     No carry propagate
0  1  1   1
1  0  0   0
1  0  1   1     Carry propagate
1  1  0   1
1  1  1   1     Carry generate

* Ripple Carry Adder: Multiple full-adder circuits can be cascaded in parallel to add an N-bit number. For an N-bit parallel adder, there must be N full-adder circuits. A ripple-carry adder is a logic circuit in which the carry-out of each full adder is the carry-in of the succeeding (next most significant) full adder. It is called a ripple-carry adder because each carry bit gets rippled into the next stage. In a ripple-carry adder, the sum and carry-out bits of any stage are not valid until the carry-in of that stage occurs. Propagation delays inside the logic circuitry are the reason behind this. Propagation delay is the time elapsed between the application of an input and the occurrence of the corresponding output. Consider a NOT gate: when the input is "0" the output will be "1", and vice versa. The time taken for the NOT gate's output to become "0" after the application of logic "1" to its input is the propagation delay here. Similarly, the carry propagation delay is the time elapsed between the application of the carry-in signal and the occurrence of the carry-out (Cout) signal. The circuit diagram of a 4-bit ripple-carry adder is shown below. Figure 3-3: Ripple carry adder (4-bit, built from Full Adder 1 through Full Adder 4). Sum out S0 and carry out Cout of Full Adder 1 are valid only after the propagation delay of Full Adder 1. In the same way, sum out S3 of Full Adder 4 is valid only after the joint propagation delays of Full Adder 1 to Full Adder 4. In simple words, the final result of the ripple-carry adder is valid only after the joint propagation delays of all the full-adder circuits inside it.
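The contrast between the two adder organizations can be made concrete in software. In the C sketch below (illustrative only, not from the text), the ripple-carry routine derives each carry from the previous stage, while the carry-lookahead routine forms the generate terms Gi = Ai·Bi and propagate terms Pi = Ai XOR Bi and then expresses every carry directly in terms of them and C0, so all carries are available after two gate levels of logic.

#include <stdint.h>

/* Ripple-carry: each stage waits for the carry of the previous one. */
unsigned ripple_carry_add4(unsigned a, unsigned b, unsigned c0, unsigned *cout)
{
    unsigned sum = 0, c = c0;
    for (int i = 0; i < 4; i++) {
        unsigned ai = (a >> i) & 1, bi = (b >> i) & 1;
        sum |= (ai ^ bi ^ c) << i;           /* Si   = Ai XOR Bi XOR Ci            */
        c    = (ai & bi) | (c & (ai ^ bi));  /* Ci+1 = generate + propagate * Ci   */
    }
    *cout = c;
    return sum;
}

/* Carry-lookahead: every carry is two-level logic in Gi, Pi and C0. */
unsigned carry_lookahead_add4(unsigned a, unsigned b, unsigned c0, unsigned *cout)
{
    unsigned g[4], p[4], c[5];
    c[0] = c0;
    for (int i = 0; i < 4; i++) {
        g[i] = ((a >> i) & 1) & ((b >> i) & 1);   /* Gi = Ai AND Bi */
        p[i] = ((a >> i) & 1) ^ ((b >> i) & 1);   /* Pi = Ai XOR Bi */
    }
    c[1] = g[0] | (p[0] & c[0]);
    c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
    c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c[0]);
    c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
                | (p[3] & p[2] & p[1] & p[0] & c[0]);
    unsigned sum = 0;
    for (int i = 0; i < 4; i++)
        sum |= (p[i] ^ c[i]) << i;                /* Si = Pi XOR Ci */
    *cout = c[4];
    return sum;
}

The lookahead speed is paid for with the progressively wider AND/OR terms visible in the expressions for c[2] through c[4]; this is exactly the extra-hardware cost mentioned above.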
3.2 Fast Multiplication: Two techniques are discussed for speeding up the multiplication operation:
i. Reducing the maximum number of summands: bit-pair recoding of the multiplier reduces the maximum number of summands (versions of the multiplicand) that must be added to n/2 for n-bit operands [1].
ii. Faster addition of summands: summand addition uses (a) carry-save addition, in which the carry generated by the full adders in the i-th row is propagated to the (i+1)-th row of full adders instead of rippling through the adders in the same row, and (b) summand reduction techniques such as 3-to-2 and 4-to-2 reduction of summands [1].

i. Reducing the maximum number of summands using bit-pair recoding of the multiplier: Bit-pair recoding of the multiplier is a modified Booth algorithm. It uses one summand for each pair of Booth-recoded bits of the multiplier. Step 1: Convert the given multiplier into its Booth-recoded form. Step 2: Group the recoded multiplier bits in pairs and observe the following.

For example, if the pair is (+1 -1), it is equivalent to the pair (0 +1). Reason: a pair (+1 -1) means adding (-1) times the multiplicand M at shifted position i together with (+1) times the multiplicand at shifted position (i+1), which is equivalent to a pair (0 +1) that adds +1 times the multiplicand at position i: (+1 -1) = (2^1 x M - 2^0 x M) = (2M - 1M) = +1M. For example, with multiplicand = (1 1 0 1) and bit pair (+1 -1), (a) Booth recoding and (b) bit pairing produce the same result (worked tables not reproduced here).

For example, if the pair is (+1 0), it is equivalent to the pair (0 +2): (+1 0) = (2^1 x M - 0 x 2^0 x M) = +2M - 0M = +2M, so (+1 0) is equivalent to adding +2 times the multiplicand at position i.

For example, if the pair is (-1 +1), it is equivalent to the pair (0 -1): (-1 +1) = (-2^1 + 2^0) = -2 + 1 = -1, so (-1 +1) is equivalent to adding -1 times the multiplicand at position i.

The following table indicates the bit-pair recoding of the multiplier for all combinations of a given (not Booth-recoded) multiplier.

Table 3.3: Table of multiplicand selection decisions
Multiplier bit-pair (i+1, i) | Multiplier bit on the right (i-1) | Booth-recoded digits at (i+1, i) | Multiplicand selected at position i
0 0 | 0 |  0  0 |  0 x M
0 0 | 1 |  0 +1 | +1 x M
0 1 | 0 | +1 -1 | +1 x M
0 1 | 1 | +1  0 | +2 x M
1 0 | 0 | -1  0 | -2 x M
1 0 | 1 | -1 +1 | -1 x M
1 1 | 0 |  0 -1 | -1 x M
1 1 | 1 |  0  0 |  0 x M

Now let us see how bit-pair recoding of the multiplier (modified Booth) reduces the number of summands in comparison with the Booth-recoded multiplier algorithm. Example 3: Multiplication of (+13) = (0 1 1 0 1) by (-6) = (1 1 0 1 0). By Booth recoding of the multiplier, (1 1 0 1 0) = (0 -1 +1 -1 0). Figure 3.24: Example of multiplication using the Booth-recoded multiplier (worked addition table not reproduced here). By bit-pair recoding of the Booth-recoded multiplier (0 -1 +1 -1 0), the pairs [0 -1][+1 -1][-1 0] give the recoded digits (0, -1, -2), and the multiplicand is multiplied by this bit-pair recoded multiplier; the final product is (1 1 1 0 1 1 0 0 1 0) = -78. Figure 3.24: Example of multiplication using the bit-pair recoded multiplier (worked addition table not reproduced here). Note: the summands are shifted by 2 positions towards the left, as the multiples generated are based on bit pairs.
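The selection rule of Table 3.3 can be stated compactly: for the pair at positions (2i+1, 2i), the selected multiple of the multiplicand is b(2i-1) + b(2i) - 2*b(2i+1), which is always one of 0, +1, -1, +2, -2. The short C sketch below (illustrative, not from the text) applies this rule to the multiplier -6 of Example 3 and reproduces the recoded digits (0, -1, -2).

#include <stdio.h>

/* Radix-4 (bit-pair) recoding of an n-bit two's-complement multiplier.
 * digits[] receives n/2 values, each in {-2, -1, 0, +1, +2}; n is assumed even. */
void bit_pair_recode(int multiplier, int n, int digits[])
{
    int prev = 0;                        /* implied 0 to the right of bit 0 */
    for (int i = 0; i < n; i += 2) {
        int b0 = (multiplier >> i) & 1;
        int b1 = (multiplier >> (i + 1)) & 1;
        digits[i / 2] = prev + b0 - 2 * b1;   /* b(2i-1) + b(2i) - 2*b(2i+1) */
        prev = b1;                       /* b(2i+1) becomes the bit on the right */
    }
}

int main(void)
{
    /* Example 3 from the text: multiplier -6, taken as the 6-bit pattern 111010. */
    int digits[3];
    bit_pair_recode(-6 & 0x3F, 6, digits);
    /* Prints, from least to most significant pair: -2 -1 0 */
    printf("%d %d %d\n", digits[0], digits[1], digits[2]);
    return 0;
}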
Example 4: The worst case of the Booth-recoded multiplier. In this example the given multiplier is (0 1 0 1 0 1) and the multiplicand is (0 0 1 1 1 0). The recoding of the normal multiplier using Booth recoding and then bit pairing is as shown below: (0 1 0 1 0 1) -> (+1 -1 +1 -1 +1 -1) -> [(+1, -1) (+1, -1) (+1, -1)] -> (+1 +1 +1). It is clear that the worst case of the Booth-recoded multiplier is also reduced to n/2 summands by the bit-pair Booth algorithm. (a) Normal multiplier, (b) Booth-recoded multiplier, (c) bit-pair recoded multiplier (worked tables not reproduced here). Advantages: this reduces the maximum number of summands (versions of the multiplicand) that must be added to n/2 for n-bit operands.

ii. Faster addition of summands. (a) Carry-save addition of summands: As we know, multiplication involves the addition of several summands. A technique called carry-save addition (CSA) can be used to speed up the process. Let us consider the 4 x 4 multiplication array (m3 m2 m1 m0 multiplied by q3 q2 q1 q0). Full adders are not needed in the first row, as no addition of summands is involved there; the first row (level) therefore consists of just the AND gates that produce the partial products m3q0, m2q0, m1q0 and m0q0. From the second row onwards, n-bit full adders are used, and it can be seen that in the second row each full adder takes a summand bit of the first row as one input and a summand bit of the second row as the second input, with the carry rippling along the row. This reduces the number of full adders by n. Figure 3.25: 4 x 4 multiplication using a ripple-carry array (product bits P7 ... P0). In this arrangement, even though the number of full adders is reduced by n, the carry still ripples from one adder to the next within each row (of the same level), which delays the summand addition. Instead of letting the carries ripple along the rows, the carries of the i-th row can be "saved" and introduced into the next, (i+1)-th, row at the correct weighted positions, as shown in Figure 3.26. As the carry of the i-th row propagates to the (i+1)-th row, it frees up the Cin inputs of three full adders in the second row (the full adder at the LSB takes a carry input of zero). These inputs are now used to introduce the third summand bits m2q2, m1q2, and m0q2. The summand bit m3q2 goes as an input to the full adder at the left end of the next row [1]. The carry-save addition of summands for n = m = 4 is as shown below. Figure 3.26: 4 x 4 multiplication using a carry-save array (levels 1 to 4 of full adders). Now, two inputs of each of the three full adders in the third row (level 3) are fed by the sum and carry outputs from the second row (level 2). The third input is used to introduce the bits m2q3, m1q3, and m0q3 of the fourth summand. The high-order bit m3q3 goes as input to the full adder at the left end of the next row. The saved carry bits and the sum bits from the third row are then added in the fourth row (level 4), which is a ripple-carry adder, to produce the final product bits [1]. The delay through the carry-save array is somewhat less than the delay through the ripple-carry array. This is because the S and C vector outputs from each row are produced in parallel in one full-adder delay [1].
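At the level of whole operand vectors, one carry-save (3-to-2) step reduces three summands to a sum vector S = A XOR B XOR C and a carry vector obtained from the majority function shifted left by one position; S plus the carry vector equals A + B + C, and only this final addition needs a carry-propagate adder. A minimal C sketch of the step (illustrative only, not from the text):

#include <stdio.h>
#include <stdint.h>

/* Carry-save (3:2) reduction: three operands become a sum vector and a carry
 * vector in one full-adder delay, independent of the word length. */
void carry_save(uint32_t a, uint32_t b, uint32_t c,
                uint32_t *sum_vec, uint32_t *carry_vec)
{
    *sum_vec   = a ^ b ^ c;                           /* per-bit sum, no carries     */
    *carry_vec = ((a & b) | (b & c) | (a & c)) << 1;  /* saved carries, re-weighted  */
}

int main(void)
{
    /* Add three summands, e.g. 13, 26 and 52. */
    uint32_t s, cv;
    carry_save(13, 26, 52, &s, &cv);
    printf("%u\n", s + cv);   /* the one carry-propagate add: prints 91 */
    return 0;
}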
Unit 4: Microprocessor Architecture & Programming

4.1 8085 Microprocessor Architecture: In 1977, Intel Corporation introduced an updated version of the 8080—the 8085. The 8085 was to be the last 8-bit, general-purpose microprocessor developed by Intel. Although only slightly more advanced than the 8080 microprocessor, the 8085 executed software at an even higher speed. The main advantages of the 8085 were its internal clock generator, internal system controller, and higher clock frequency. This higher level of component integration reduced the 8085's cost and increased its usefulness. Intel has managed to sell well over 100 million copies of the 8085 microprocessor, its most successful 8-bit, general-purpose microprocessor. Because the 8085 is also manufactured (second-sourced) by many other companies, there are over 200 million of these microprocessors in existence. Figure 4.1: 8085 Microprocessor Architecture.

4.2 The Microprocessor-Based Personal Computer System: Computer systems have undergone many changes recently. Machines that once filled large areas have been reduced to small desktop computer systems because of the microprocessor. Although these desktop computers are compact, they possess computing power that was only dreamed of a few years ago. Million-dollar mainframe computer systems, developed in the early 1980s, are not as powerful as the Pentium Core2-based computers of today. In fact, many smaller companies have replaced their mainframe computers with microprocessor-based systems. (A bus is the set of common connections that carry the same type of information. For example, the address bus, which contains 20 or more connections, conveys the memory address to the memory.) Computer systems have four parts: hardware, software, data, and users (Figure 4.2). The typical processor system consists of:
* CPU (central processing unit): ALU (arithmetic-logic unit), control logic, registers, etc.
* Memory
* Input/output interfaces
* Interconnections between these units: the address bus, the data bus, and the control bus.
A bus is a shared group of wires used for communicating signals among devices. The address bus identifies the device, and the location within the device, that is being accessed; the data bus carries the data value being communicated; the control bus describes the action taking place on the address and data buses. High-level language: a = b + c. Assembly language: add r1 r2 r3. Machine language: 0001001010111010101. The 8-bit 8085 CPU (or MPU – Micro Processing Unit) communicates with the other units using a 16-bit address bus, an 8-bit data bus, and a control bus. Figure 4.3: 8085 bus structure.

4.3 Internal Architecture: Before a program is written or any instruction investigated, the internal configuration of the microprocessor must be known. This section of the chapter details the program-visible internal architecture of the 8086–Core2 microprocessors. Also detailed are the function and purpose of each of these internal registers. Note that in a multiple-core microprocessor each core contains the same programming model; the only difference is that each core runs a separate task or thread simultaneously.
* The Programming Model: The programming model of the 8086 through the Core2 is considered to be program visible because its registers are used during application programming and are specified by the instructions.
Other registers, detailed later in this chapter, are considered to be program invisible because they are not addressable directly during applications programming, but may be used indirectly during system programming. Only the 80286 and above contain the program-invisible registers used to control and operate the protected memory system and other features of the microprocessor. Figure 4-4 illustrates the programming model of the 8086 through the Core2 microprocessor, including the 64-bit extensions. Figure 4-4: The programming model of the 8086 through the Core2 microprocessor including the 64-bit extensions (register names shown in 8-bit, 16-bit, 32-bit, and 64-bit forms). The earlier 8086, 8088, and 80286 contain 16-bit internal architectures, a subset of the registers shown in Figure 4-4. The 80386 through the Core2 microprocessors contain full 32-bit internal architectures. The architectures of the earlier 8086 through the 80286 are fully upward-compatible to the 80386 through the Core2. The shaded areas in this illustration represent registers that are found in early versions of the 8086, 8088, or 80286 microprocessors and are provided on the 80386–Core2 microprocessors for compatibility with the early versions.

4.4 Instruction Execution: The instructions to be executed by the microprocessor are first stored in memory and then executed. The processor does not execute an instruction all at once; it reads the instruction byte by byte and then executes it. An instruction is a binary pattern designed inside a microprocessor to perform a specific function. The entire group of instructions that a microprocessor supports is called its Instruction Set. The 8085 has 246 instructions. Each instruction is represented by an 8-bit binary value. These 8 bits of binary value are called the Op-Code or Instruction Byte.

* Instruction Fetch Operation: All instructions (program steps) are stored in memory. To run a program, the individual instructions must be read from the memory in sequence and executed. The program counter puts the 16-bit memory address of the instruction on the address bus. The control unit sends the Memory Read Enable signal to access the memory. The 8-bit instruction stored in memory is placed on the data bus and transferred to the instruction decoder. The instruction is then decoded and executed. Figure: 8085 internal block diagram — accumulator, instruction register and decoder, flag flip-flops, registers, arithmetic/logic unit, control unit, and the internal data, address, and control buses connecting the microprocessor to memory.
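The fetch–decode–execute cycle described above can be mimicked by a few lines of C. The sketch below is a toy model only: the opcode values are invented for illustration and are not real 8085 opcodes, but the structure — fetch a byte at the program counter, decode it, execute it, repeat — is the one the text describes.

#include <stdio.h>
#include <stdint.h>

enum { OP_HLT = 0x00, OP_MVI_A = 0x01, OP_ADI = 0x02 };  /* hypothetical opcodes */

uint8_t memory[256] = { OP_MVI_A, 0x25, OP_ADI, 0x12, OP_HLT };  /* tiny program */

int main(void)
{
    uint16_t pc = 0;       /* program counter: address of the next instruction */
    uint8_t  a  = 0;       /* accumulator                                      */
    int running = 1;

    while (running) {
        uint8_t opcode = memory[pc++];    /* FETCH: read the instruction byte  */
        switch (opcode) {                 /* DECODE and EXECUTE                */
        case OP_MVI_A: a = memory[pc++];                 break; /* load immediate */
        case OP_ADI:   a = (uint8_t)(a + memory[pc++]);  break; /* add immediate  */
        case OP_HLT:   running = 0;                      break; /* stop           */
        default:       running = 0;                      break; /* unknown opcode */
        }
    }
    printf("A = 0x%02X\n", a);   /* prints A = 0x37 */
    return 0;
}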
4.5 Instruction Set of 8085: An instruction is a binary pattern designed inside a microprocessor to perform a specific function. The entire group of instructions that a microprocessor supports is called its Instruction Set. The 8085 has 246 instructions. Each instruction is represented by an 8-bit binary value. These 8 bits of binary value are called the Op-Code or Instruction Byte.

4.6 Classification of Instruction Set:
* Data Transfer Instructions: These instructions move data between registers, or between memory and registers. They copy data from a source to a destination; while copying, the contents of the source are not modified.

Opcode: MOV; Operands: Rd, Rs / M, Rs / Rd, M; Description: Copy from source to destination. This instruction copies the contents of the source register into the destination register; the contents of the source register are not altered. If one of the operands is a memory location, its location is specified by the contents of the H-L registers. Example: MOV B, C or MOV B, M.

Opcode: MVI; Operands: Rd, Data / M, Data; Description: Move immediate 8-bit. The 8-bit data is stored in the destination register or memory. If the operand is a memory location, its location is specified by the contents of the H-L registers. Example: MVI B, 57H or MVI M, 57H.

Opcode: LDA; Operand: 16-bit address; Description: Load Accumulator. The contents of a memory location, specified by the 16-bit address in the operand, are copied to the accumulator. The contents of the source are not altered. Example: LDA 2034H.

Opcode: LDAX; Operand: B/D register pair; Description: Load accumulator indirect. The contents of the designated register pair point to a memory location. This instruction copies the contents of that memory location into the accumulator. The contents of the register pair and of the memory location are not altered. Example: LDAX B.

Opcode: LXI; Operands: Reg. pair, 16-bit data; Description: Load register pair immediate. This instruction loads 16-bit data into the register pair. Example: LXI H, 2034H.

* Arithmetic Instructions: These instructions perform operations such as:
a. Addition: Any 8-bit number, the contents of a register, or the contents of a memory location can be added to the contents of the accumulator. The result (sum) is stored in the accumulator. No two other 8-bit registers can be added directly. Example: the contents of register B cannot be added directly to the contents of register C.
b. Subtraction: Any 8-bit number, the contents of a register, or the contents of a memory location can be subtracted from the contents of the accumulator. The result is stored in the accumulator. Subtraction is performed in 2's complement form; if the result is negative, it is stored in 2's complement form. No two other 8-bit registers can be subtracted directly.
c. Increment and Decrement: The 8-bit contents of a register or a memory location can be incremented or decremented by 1. The 16-bit contents of a register pair can be incremented or decremented by 1. Increment or decrement can be performed on any register or a memory location.

* Logical Instructions: These instructions perform logical operations on data stored in registers, memory, and the status flags. The logical operations are:
a. AND, OR, XOR: Any 8-bit data, the contents of a register, or the contents of a memory location can be ANDed, ORed, or XORed with the contents of the accumulator.
b. Rotate: Each bit in the accumulator can be shifted either left or right to the next position.
c. Compare: Any 8-bit data, the contents of a register, or the contents of a memory location can be compared for equality, greater than, or less than with the contents of the accumulator.
d. Complement: The contents of the accumulator can be complemented. Each 0 is replaced by 1 and each 1 is replaced by 0.

* Branching Instructions:
Opcode: CALL; Operand: 16-bit address; Description: Call unconditionally. The program sequence is transferred to the memory location specified by the 16-bit address given in the operand. Before the transfer, the address of the next instruction after CALL (the contents of the program counter) is pushed onto the stack. Example: CALL 2034H.
Opcode: RET; Operand: none; Description: Return unconditionally. The program sequence is transferred from the subroutine to the calling program. The two bytes from the top of the stack are copied into the program counter, and program execution begins at the new address. Example: RET.

* Control Instructions: The control instructions control the operation of the microprocessor.
Opcode: NOP; Operand: none; Description: No operation. No operation is performed; the instruction is fetched and decoded, but no operation is executed. Example: NOP.

4.7 Memory Addressing: Addressing modes of the 8085:
* To perform any operation, we have to give the corresponding instructions to the microprocessor.
* In each instruction, the programmer has to specify three things: the operation to be performed, the address of the source of the data, and the address of the destination of the result.
* The method by which the address of the source of the data or the address of the destination of the result is given in the instruction is called the addressing mode.
* The term addressing mode refers to the way in which the operand of the instruction is specified.
* The Intel 8085 uses the following addressing modes:
* Direct Addressing Mode: In this mode, the address of the operand is given in the instruction itself. Example: LDA 2500H — LDA is the operation, 2500H is the address of the source, and the accumulator is the destination.
* Register Addressing Mode: In this mode, the operand is in a general-purpose register. Example: MOV A, B — MOV is the operation, B is the source of the data, and A is the destination.
* Register Indirect Addressing Mode: In this mode, the address of the operand is specified by a register pair. Example: MOV A, M — MOV is the operation, M is the memory location specified by the H-L register pair, and A is the destination.
* Immediate Addressing Mode: In this mode, the operand is specified within the instruction itself. Example: MVI A, 05H — MVI is the operation, 05H is the immediate data (source), and A is the destination.
* Implicit Addressing Mode: If the address of the source of the data as well as the address of the destination of the result is fixed, then there is no need to give any operand along with the instruction.

4.8 Microcontroller: A microcontroller is a single-chip microcomputer made through VLSI fabrication. A microcontroller is also called an embedded controller because the microcontroller and its support circuits are often built into, or embedded in, the devices they control. Like microprocessors, microcontrollers are available in different word lengths (4-bit, 8-bit, 16-bit, 32-bit, 64-bit and 128-bit microcontrollers are available today). Figure 4.5: Microcontroller chip. A microcontroller basically contains one or more of the following components: a central processing unit (CPU), random access memory (RAM), read-only memory (ROM), input/output ports, timers and counters, interrupt controls, analog-to-digital converters, digital-to-analog converters, serial interfacing ports, and oscillator circuits. A microcontroller internally contains all the features required for a computing system and functions as a computer without adding any external digital parts. Most of the pins in the microcontroller chip can be made programmable by the user. A microcontroller has many bit-handling instructions that can be easily understood by the programmer, and it is capable of handling Boolean functions. It offers higher speed and performance, its on-chip ROM structure provides better firmware security, and it is easy to design with, at low cost and small size.
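The bit-handling style of programming mentioned above usually reduces to setting, clearing, or toggling individual bits of a memory-mapped port register. The C sketch below shows the pattern; the port address and bit position are assumed values chosen only for illustration, not taken from any particular device.

#include <stdint.h>

#define PORT1   (*(volatile uint8_t *)0x90u)   /* assumed memory-mapped I/O port */
#define LED_BIT 3u                             /* assumed pin driving an LED     */

void led_on(void)     { PORT1 |=  (uint8_t)(1u << LED_BIT); }  /* set bit 3    */
void led_off(void)    { PORT1 &= (uint8_t)~(1u << LED_BIT); }  /* clear bit 3  */
void led_toggle(void) { PORT1 ^=  (uint8_t)(1u << LED_BIT); }  /* invert bit 3 */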
4.9 A Single-Chip Microcontroller

4.10 Microprocessor Vs Microcontroller: Figure 4.6: The main comparison between a microprocessor and a microcontroller — a microcontroller integrates the CPU, ROM, RAM, I/O ports and timer on one chip, whereas a microprocessor needs them as external components.

Difference between Microprocessor and Microcontroller:
1. Microprocessor: it is only a general-purpose CPU. Microcontroller: it is a microcomputer in itself.
2. Microprocessor: memory, I/O ports, timers and interrupts are not available inside the chip. Microcontroller: all are integrated inside the microcontroller chip.
3. Microprocessor: it must have many additional digital components to perform its operation. Microcontroller: it can function as a microcomputer without any additional components.
4. Microprocessor: the system becomes bulkier and expensive. Microcontroller: makes the system simple, economical and compact.
5. Microprocessor: not capable of handling Boolean functions. Microcontroller: handles Boolean functions.
6. Microprocessor: higher accessing time required. Microcontroller: low accessing time.
7. Microprocessor: very few pins are programmable. Microcontroller: most of the pins are programmable.
8. Microprocessor: very few bit-handling instructions. Microcontroller: many bit-handling instructions.
9. Microprocessor: widely used in modern PCs and laptops. Microcontroller: widely used in small control systems.
Examples: microprocessor — Intel 8086, Intel Pentium series; microcontroller — Intel 8051, 89960, PIC16F877.

4.11 8/16-Bit Microcontrollers: Microcontrollers are like small computers that can carry out small programs and are often used for automation and robotics. The most popular among those who are just starting out are 8-bit and 16-bit microcontrollers. The main difference between 8-bit and 16-bit microcontrollers is the width of the data pipe. As you may have already deduced, an 8-bit microcontroller has an 8-bit data pipe while a 16-bit microcontroller has a 16-bit data pipe. This fundamental difference is felt during mathematical operations. A 16-bit number gives you a lot more precision than an 8-bit number. Although relatively rare, an 8-bit microcontroller may not provide the accuracy required by the application. 16-bit microcontrollers are also more efficient in processing math operations on numbers that are longer than 8 bits. A 16-bit microcontroller can automatically operate on two 16-bit numbers, like the common definition of an integer. But when you are using an 8-bit microcontroller, the process is not as straightforward: the functions implemented to operate on such numbers take additional cycles. Depending on how processing-intensive your application is and on how many calculations you do, this may affect the performance of the circuit. Another key difference between 8-bit and 16-bit microcontrollers is in their timers. 8-bit microcontrollers can only use 8 bits, resulting in a final range of 0x00–0xFF (0–255) every cycle. In contrast, 16-bit microcontrollers, with their 16-bit data width, have a range of 0x0000–0xFFFF (0–65535) for every cycle. A longer timer maximum value can surely come in handy in certain applications and circuits. Initially, the price of 16-bit microcontrollers was way above that of 8-bit microcontrollers. But as time progressed and designs improved, the price of 8-bit and 16-bit microcontrollers has come down quite a lot. 8-bit microcontrollers can be purchased dirt cheap. While 16-bit microcontrollers cost more, prices tend to vary a lot depending on the features that are included in the microcontroller.
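The timer-range difference is easy to demonstrate in code: an 8-bit counter wraps around after 0xFF, while a 16-bit counter wraps around only after 0xFFFF. A minimal C sketch (illustrative only):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t  t8  = 0xFF;     /* 8-bit timer at its maximum value  */
    uint16_t t16 = 0xFFFF;   /* 16-bit timer at its maximum value */

    t8++;                    /* wraps around to 0x00   after 256 counts   */
    t16++;                   /* wraps around to 0x0000 after 65536 counts */

    printf("8-bit after overflow:  %u\n", t8);    /* prints 0 */
    printf("16-bit after overflow: %u\n", t16);   /* prints 0 */
    return 0;
}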
Case Study: Learning model of the 8051 microcontroller. The 8051 was introduced by Intel in the late 1970s and is now produced by many companies in many variations. It is the most popular microcontroller, with about 40% of the market share, and it is an 8-bit microcontroller. Figure: Block diagram of the 8051 — 4096 bytes of program memory, 128 bytes of data memory, two 16-bit timer/event counters, oscillator and timing circuitry, 64K-byte bus expansion control, programmable I/O (parallel ports, address/data bus, I/O pins), and a programmable full-duplex serial port (serial input/output), all connected by an internal data bus.

Unit 5: Pentium Microprocessors

5.1 Pentium Microprocessor: The Pentium microprocessor signals an improvement to the architecture found in the 80486 microprocessor. The changes include an improved cache structure, a wider data bus width, a faster numeric coprocessor, a dual integer processor, and branch prediction logic. The cache has been reorganized to form two caches that are each 8K bytes in size, one for caching data and the other for instructions. The data bus width has been increased from 32 bits to 64 bits. The numeric coprocessor operates approximately five times faster than the 80486 numeric coprocessor. The dual integer processor often allows two instructions per clock. Finally, the branch prediction logic allows programs that branch to execute more efficiently. Notice that these changes are internal to the Pentium, which keeps software upward-compatible with earlier Intel 80X86 microprocessors. A later improvement to the Pentium was the addition of the MMX instructions. Before the Pentium or any other microprocessor can be used in a system, the function of each pin must be understood. This section of the chapter details the operation of each pin, along with the external memory system and I/O structures of the Pentium microprocessor. Figure 5-1 illustrates the pin-out of the Pentium microprocessor, which is packaged in a huge 237-pin PGA (pin grid array). The Pentium was made available in two versions: the full-blown Pentium and the P24T version called the Pentium OverDrive. The P24T version contains a 32-bit data bus, compatible for insertion into 80486 machines that contain the P24T socket. The P24T version also comes with a fan built into the unit. The most notable difference in the pin-out of the Pentium, when compared to earlier 80486 microprocessors, is that there are 64 data bus connections instead of 32, which require a larger physical footprint. The architectural representation of the Pentium processor is considered to be an advancement of the 80386 and 80486 microprocessors. Basically, the Pentium includes modifications related to the cache structure, the width of the data bus, a faster numeric coprocessor, and a dual integer processor. In the case of a Pentium processor there are two caches, one for caching data and another for caching instructions, each of 8K size. By using the dual integer processor, two instructions can be executed in each clock cycle. The data bus width in the Pentium is 64 bits, which was 32 bits in the 80386, and the numeric coprocessor exhibits a considerably faster speed than that of the 80486. Figure 5-1: The pin-out of the Pentium microprocessor.

5.2 The Memory System:
