
Assembler

An assembly language is a low-level language for programming computers. It implements a symbolic representation of the numeric machine codes and other constants needed to program a particular CPU architecture. This representation is usually defined by the hardware manufacturer, and is based on abbreviations (called mnemonics) that help the programmer remember individual instructions, registers, etc. An assembly language is thus specific to a certain physical or virtual computer architecture (as opposed to most high-level languages, which are usually portable).

Assembly languages were first developed in the 1950s, when they were referred to as second-generation programming languages. They eliminated much of the error-prone and time-consuming first-generation programming needed with the earliest computers, freeing the programmer from tedium such as remembering numeric codes and calculating addresses. They were once widely used for all sorts of programming. However, by the 1980s (1990s on small computers), their use had largely been supplanted by high-level languages, in the search for improved programming productivity. Today, assembly language is used primarily for direct hardware manipulation, access to specialized processor instructions, or to address critical performance issues. Typical uses are device drivers, low-level embedded systems, and real-time systems.

A utility program called an assembler is used to translate assembly language statements into the target computer's machine code. The assembler performs a more or less isomorphic translation (a one-to-one mapping) from mnemonic statements into machine instructions and data. (This is in contrast with high-level languages, in which a single statement generally results in many machine instructions. That translation is done by one of two means: a compiler translates high-level language statements into machine code "executable" files, while an interpreter executes similar statements directly, in its own application environment.)

Many sophisticated assemblers offer additional mechanisms to facilitate program development, control the assembly process, and aid debugging. In particular, most modern assemblers include a macro facility (described below) and are called macro assemblers, although many such assemblers have been available for more than 40 years.

Assembler
Typically a modern assembler creates object code by translating assembly instruction mnemonics into opcodes, and by resolving symbolic names for memory locations and other entities.[1] The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications. Most assemblers also include macro facilities for performing textual substitution, e.g., to generate common short sequences of instructions to run inline, instead of in a subroutine.

More sophisticated high-level assemblers provide language abstractions such as:

- Advanced control structures
- High-level procedure/function declarations and invocations
- High-level abstract data types, including structures/records, unions, classes, and sets
- Sophisticated macro processing
- Object-oriented features such as encapsulation, polymorphism, inheritance, interfaces
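
The translation of mnemonics into opcodes and the resolution of symbolic names described in this section can be illustrated with a minimal two-pass sketch. The machine here is hypothetical (the mnemonics, opcode values, and the fixed two-byte instruction size are assumptions made for illustration, not any real instruction set), and real assemblers are considerably more involved.

```python
# Toy two-pass assembler for a hypothetical machine (not a real ISA).
# Every instruction occupies 2 bytes: one opcode byte, one operand byte.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JMP": 0x03, "HALT": 0xFF}

def assemble(source):
    """Translate source lines into a list of bytes, resolving labels."""
    # Pass 1: record the address of every label.
    symbols = {}
    statements = []
    address = 0
    for line in source:
        line = line.split(";")[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if line.endswith(":"):              # label definition
            symbols[line[:-1]] = address
            continue
        statements.append(line)
        address += 2

    # Pass 2: emit opcodes and replace symbolic operands with addresses.
    code = []
    for stmt in statements:
        parts = stmt.split()
        mnemonic = parts[0]
        operand = parts[1] if len(parts) > 1 else "0"
        value = symbols[operand] if operand in symbols else int(operand, 0)
        code += [OPCODES[mnemonic], value]
    return code

program = [
    "start:",
    "    LOAD 5",
    "    ADD 1",
    "    JMP end      ; forward reference, resolved in pass 2",
    "    ADD 1",
    "end:",
    "    HALT",
]
machine_code = assemble(program)
```

Because label addresses are collected before any code is emitted, inserting or removing an instruction only requires reassembling; no address has to be recalculated by hand.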

Assembly language
A program written in assembly language consists of a series of mnemonic statements that, when translated by an assembler, correspond to a stream of executable instructions that can be loaded into memory and executed. For example, an x86/IA-32 processor can execute the following binary instruction as expressed in machine language (see x86 assembly language):

Binary: 10110000 01100001 (Hexadecimal: B0 61)

The equivalent assembly language representation is easier to remember (example in Intel syntax, more mnemonic):
MOV AL, 61h

This instruction means:

Move the value 61h (or 97 decimal; the h-suffix means hexadecimal) into the processor register named "AL".
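
As a small illustration of how this statement maps onto the two bytes shown above, the following sketch encodes the x86 "MOV r8, imm8" form. The function name and the lookup table are hypothetical helpers written for this example; only the 8-bit register form is handled.

```python
# 8-bit register numbers used by the x86 "MOV r8, imm8" encoding (B0+r).
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}

def encode_mov_r8_imm8(register, value):
    """Return the two machine-code bytes for MOV <register>, <value>."""
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate must fit in one byte")
    # 1011, then a 0 bit for a byte-sized operand, then the 3-bit register number.
    opcode = 0xB0 | REG8[register.upper()]
    return bytes([opcode, value])

encoded = encode_mov_r8_imm8("AL", 0x61)
print(encoded.hex(" ").upper())                  # B0 61
print(" ".join(f"{b:08b}" for b in encoded))     # 10110000 01100001
```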

The mnemonic "mov" represents the opcode 1011, which moves the value in the second operand into the register indicated by the first operand. The mnemonic was chosen by the instruction set designer to abbreviate "move", making it easier for the programmer to remember. A comma-separated list of arguments or parameters follows the opcode; this is a typical assembly language statement.

In practice many programmers drop the word mnemonic and, technically incorrectly, call "mov" an opcode. When they do this they are referring to the underlying binary code which it represents. To put it another way, a mnemonic such as "mov" is not an opcode, but because it symbolizes an opcode, one might say "the opcode mov" when intending to refer to the binary opcode it symbolizes rather than to the symbol (the mnemonic) itself. As few modern programmers need to be mindful of exactly which binary patterns are the opcodes for specific instructions, the distinction has in practice become somewhat blurred among programmers, but not among processor designers.

Transforming assembly into machine language is accomplished by an assembler, and the reverse by a disassembler. Unlike in high-level languages, there is usually a one-to-one correspondence between simple assembly statements and machine language instructions. However, in some cases, an assembler may provide pseudoinstructions which expand into several machine language instructions to provide commonly needed functionality. For example, for a machine that lacks a "branch if greater or equal" instruction, an assembler may provide a pseudoinstruction that expands to the machine's "set if less than" and "branch if zero (on the result of the set instruction)". Most full-featured assemblers also provide a rich macro language (discussed below) which is used by vendors and programmers to generate more complex code and data sequences.

Each computer architecture and processor architecture has its own machine language. On this level, each instruction is simple enough to be executed using a relatively small number of electronic circuits. Computers differ by the number and type of operations they support. For example, a new 64-bit machine would have different circuitry from a 32-bit machine. They may also have different sizes and numbers of registers, and different representations of data types in storage. While most general-purpose computers are able to carry out essentially the same functionality, the ways they do so differ; the corresponding assembly languages reflect these differences.

Multiple sets of mnemonics or assembly-language syntax may exist for a single instruction set, typically instantiated in different assembler programs. In these cases, the most popular one is usually that supplied by the manufacturer and used in its documentation.
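
The pseudoinstruction expansion mentioned above ("branch if greater or equal" built from "set if less than" and "branch if zero") might look like the following sketch. It assumes MIPS-style syntax, with `$at` as the assembler's temporary register; the `expand` function is a hypothetical helper, not part of any particular assembler.

```python
def expand(statement):
    """Expand pseudoinstructions into real instructions; pass others through."""
    mnemonic, _, rest = statement.partition(" ")
    if mnemonic == "bge":
        rs, rt, label = [part.strip() for part in rest.split(",")]
        return [
            f"slt $at, {rs}, {rt}",       # $at = 1 if rs < rt, else 0
            f"beq $at, $zero, {label}",   # branch when rs is NOT less than rt
        ]
    return [statement]

expanded = expand("bge $t0, $t1, done")
```

One source statement becomes two machine instructions here, which is the exception to the usual one-to-one correspondence.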

Linker
In computer science, a linker or link editor is a program that takes one or more objects generated by a compiler and combines them into a single executable program. On Unix variants the term loader is often used as a synonym for linker. Because this usage blurs the distinction between the compile-time process and the run-time process, this article will use linking for the former and loading for the latter. However, in some operating systems the same program handles both the jobs of linking and loading a program; see dynamic linking.

Computer programs typically comprise several parts or modules. These modules need not all be contained within a single object file, in which case they refer to each other by means of symbols. Typically, an object file can contain three kinds of symbols:

- defined symbols, which allow it to be called by other modules,
- undefined symbols, which call the other modules where these symbols are defined, and
- local symbols, used internally within the object file to facilitate relocation.

When a program comprises multiple object files, the linker combines these files into a unified executable program, resolving the symbols as it goes along. Linkers can take objects from a collection called a library. Some linkers do not include the whole library in the output; they only include its symbols that are referenced from other object files or libraries. Libraries exist for diverse purposes, and one or more system libraries are usually linked in by default.

The linker also takes care of arranging the objects in a program's address space. This may involve relocating code that assumes a specific base address to another base. Since a compiler seldom knows where an object will reside, it often assumes a fixed base location (for example, zero). Relocating machine code may involve re-targeting of absolute jumps, loads and stores.

The executable output by the linker may need another relocation pass when it is finally loaded into memory (just before execution). This pass is usually omitted on hardware offering virtual memory: every program is put into its own address space, so there is no conflict even if all programs load at the same base address. This pass may also be omitted if the executable is a position-independent executable.
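
The two jobs described in this section, resolving symbols across object files and arranging the objects in an address space, can be sketched as follows. The object-file layout (a dictionary with `code`, `defined`, and `relocs`) and the base address are assumptions invented for this example; real object formats carry far more information.

```python
# Each hypothetical object file holds code words assembled at base 0, the
# symbols it defines (offsets into its own code), and relocation entries
# (index of the word to patch, name of the symbol whose address goes there).
main_obj = {
    "code": ["CALL", 0, "HALT"],
    "defined": {"main": 0},
    "relocs": [(1, "helper")],          # "helper" is undefined in this object
}
util_obj = {
    "code": ["NOP", "RET"],
    "defined": {"helper": 1},
    "relocs": [],
}

def link(objects, base=0x1000):
    # Step 1: lay the objects out one after another and build a global
    # symbol table holding each symbol's final address.
    symbols, address = {}, base
    for obj in objects:
        for name, offset in obj["defined"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = address + offset
        address += len(obj["code"])

    # Step 2: copy the code and patch every relocation with the final address.
    image = []
    for obj in objects:
        code = list(obj["code"])
        for index, name in obj["relocs"]:
            if name not in symbols:
                raise ValueError(f"undefined symbol: {name}")
            code[index] = symbols[name]
        image += code
    return image, symbols

image, symbols = link([main_obj, util_obj])
```

Linking fails with an error if a referenced symbol is defined nowhere, which mirrors the "undefined symbol" diagnostics that linkers commonly report.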

Dynamic linking
Many operating system environments allow dynamic linking, that is, postponing the resolution of some undefined symbols until a program is run. That means that the executable code still contains undefined symbols, plus a list of objects or libraries that will provide definitions for these. Loading the program will load these objects/libraries as well, and perform a final linking. This approach offers two advantages:

- Often-used libraries (for example the standard system libraries) need to be stored in only one location, not duplicated in every single binary.
- If an error in a library function is corrected by replacing the library, all programs using it dynamically will benefit from the correction after restarting them. Programs that included this function by static linking would have to be re-linked first.
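
The second advantage can be illustrated with a small simulation. Everything here is hypothetical (the `installed` table stands in for shared libraries on disk, and `load` for the final linking step at program start); it is a sketch of the idea, not of how any real dynamic linker is implemented.

```python
# Hypothetical installed shared libraries: name -> {symbol: function}.
installed = {"libmath": {"square": lambda x: x * x + 1}}   # buggy version

# The executable records only the names it needs, not the code itself.
executable = {"needs": ["libmath"], "undefined": ["square"]}

def load(exe, libraries):
    """Resolve the executable's undefined symbols at load time."""
    bound = {}
    for symbol in exe["undefined"]:
        for lib in exe["needs"]:
            if symbol in libraries[lib]:
                bound[symbol] = libraries[lib][symbol]
                break
        else:
            raise LookupError(f"unresolved symbol: {symbol}")
    return bound

before = load(executable, installed)["square"](3)    # uses the buggy library

# Replace the library with a corrected version; the executable is untouched.
installed["libmath"] = {"square": lambda x: x * x}
after = load(executable, installed)["square"](3)     # the next load picks up the fix
```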

Loader
In computing, a loader is the part of an operating system that is responsible for loading programs from executables (i.e., executable files) into memory, preparing them for execution and then executing them. The loader is usually a part of the operating system's kernel; it is usually loaded at system boot time and stays in memory until the system is rebooted, shut down, or powered off. Some operating systems that have a pageable kernel may keep the loader in the pageable part of memory, so the loader may sometimes be swapped out of memory. All operating systems that support program loading have loaders. Some embedded operating systems in highly specialized computers, such as those in cars or stereo equipment, run only one program and have no program loading capabilities, and thus no loaders.

Relocating loaders
Some computers need relocating loaders, which adjust addresses (pointers) in the executable to compensate for variations in the address at which loading starts. The computers which need relocating loaders are those in which pointers are absolute addresses rather than offsets from the program's base address. One well-known example is IBM's System/360 mainframes and their descendants, including the System z9 series.
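
A minimal sketch of the address adjustment a relocating loader performs is shown below. It assumes the executable was built as if loading started at address 0 and carries a list of the positions that hold absolute addresses; the `load_at` function and the sample image are illustrative only.

```python
def load_at(image, relocations, base):
    """Return a copy of image with absolute addresses adjusted for base.

    image holds words assembled as if loading started at address 0;
    relocations lists the indexes of the words that contain addresses.
    """
    loaded = list(image)
    for index in relocations:
        loaded[index] += base
    return loaded

# Word 1 is the target of a jump and word 3 the address of a data item.
image = [0x10, 4, 0x20, 5, 0x00, 42]
relocations = [1, 3]
loaded = load_at(image, relocations, base=0x4000)
```

Words that are not listed as relocations, such as the data value 42, are left unchanged; only the pointers move with the load address.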

Dynamic linkers
Dynamic linkers are another type of loader; they load shared libraries (such as .dll files) and link them into programs that are already loaded and running.

Programming tool
A programming tool or software development tool is a program or application that software developers use to create, debug, maintain, or otherwise support other programs and applications. The term usually refers to relatively simple programs that can be combined together to accomplish a task, much as one might use multiple hand tools to fix a physical object.

Tools were originally simple and lightweight. As some tools have been maintained, they have been integrated into more powerful integrated development environments (IDEs). These environments consolidate functionality into one place, sometimes increasing simplicity and productivity, other times sacrificing flexibility and extensibility. The workflow of IDEs is routinely contrasted with alternative approaches, such as the use of Unix shell tools with text editors like Vim and Emacs.

The distinction between tools and applications is murky. For example, developers use simple databases (such as a file containing a list of important values) all the time as tools. However, a full-blown database is usually thought of as an application in its own right.

For many years, computer-assisted software engineering (CASE) tools were sought after. Successful tools have proven elusive. In one sense, CASE tools emphasized design and architecture support, such as for UML. But the most successful of these tools are IDEs.

The ability to use a variety of tools productively is one hallmark of a skilled software engineer.

Categories
Software development tools can be roughly divided into the following categories:

- performance analysis tools
- debugging tools
- static analysis and formal verification tools
- correctness checking tools
- memory usage tools
- application build tools
- integrated development environment

List of tools
Software tools come in many forms:

- Bug Databases: gnats, Bugzilla, Trac, Atlassian Jira, LibreSource, SharpForge
- Build Tools: Make, automake, Apache Ant, SCons, Rake, Flowtracer, cmake, qmake
- Code coverage: C++test, GCT, Insure++, Jtest, CCover
- Code Sharing Sites: Freshmeat, Krugle, Sourceforge, ByteMyCode. See also Code search engines.
- Compilation and linking tools: GNU toolchain, gcc, Microsoft Visual Studio, CodeWarrior, Xcode, ICC
- Debuggers: gdb, GNU Binutils, valgrind. Debugging tools also are used in the process of debugging code, and can also be used to create code that is more compliant to standards and portable than if they were not used.
- Disassemblers: Generally reverse-engineering tools.
- Documentation generators: Doxygen, help2man, POD, Javadoc, Pydoc/Epydoc, asciidoc
- Formal methods: Mathematically-based techniques for specification, development and verification
- GUI interface generators: Qt Designer, Cocoa InterfaceBuilder, Windows Forms Visual Studio
- Library interface generators: Swig
- Integration Tools: OESIS
- Memory Use/Leaks/Corruptions Detection: Aard, dmalloc, Electric Fence, duma, Insure++.
- Memory leak detection: In the C programming language for instance, memory leaks are not as easily detected - software tools called memory debuggers are often used to find memory leaks enabling the programmer to find these problems much more efficiently than inspection alone.

Text editor
A text editor is a type of program used for editing plain text files. Text editors are often provided with operating systems or software development packages, and can be used to change configuration files and programming language source code.

Plain text files vs. word processor files


There are important differences between plain text files created by a text editor, and document files created by word processors such as Microsoft Word, WordPerfect, or OpenOffice.org. Briefly:

- A plain text file is represented and edited by showing all the characters as they are present in the file. The only characters usable for 'mark-up' are the control characters of the character set in use; in practice these are newline, tab and formfeed. The most commonly used character set is ASCII, especially recently, as plain text files are more used for programming and configuration and less frequently used for documentation than in the past.
- Documents created by a word processor generally contain file-format-specific "control characters" beyond what is defined in the character set. These enable functions like bold, italic, fonts, columns, tables, etc. These and other common page formatting symbols were once associated only with desktop publishing but are now commonplace in the simplest word processor.
- Word processors can usually edit a plain text file and save in the plain text file format. However, one must take care to tell the program that this is what is wanted. This is especially important in cases such as source code, HTML, and configuration and control files. Otherwise the file will contain those "special characters" unique to the word processor's file format and will not be handled correctly by the utility the files were intended for.

Interpreter
In computer science, an interpreter normally means a computer program that executes, i.e. performs, instructions written in a programming language. While interpretation and compilation are the two principal means by which programming languages are implemented, these are not fully distinct categories, one of the reasons being that most interpreting systems also perform some translation work, just like compilers. An interpreter may be a program that either

1. executes the source code directly
2. translates source code into some efficient intermediate representation (code) and immediately executes this
3. explicitly executes stored precompiled code[1] made by a compiler which is part of the interpreter system

The terms interpreted language or compiled language merely mean that the canonical implementation of that language is an interpreter or a compiler; a high-level language is basically an abstraction which is (ideally) independent of particular implementations.

The main disadvantage of interpreters is that when a program is interpreted, it typically runs slower than if it had been compiled. The difference in speeds could be tiny or great; often an order of magnitude and sometimes more. It generally takes longer to run a program under an interpreter than to run the compiled code, but it can take less time to interpret it than the total time required to compile and run it. This is especially important when prototyping and testing code, when an edit-interpret-debug cycle can often be much shorter than an edit-compile-run-debug cycle.

Interpreting code is slower than running the compiled code because the interpreter must analyze each statement in the program each time it is executed and then perform the desired action, whereas the compiled code just performs the action within a fixed context determined by the compilation. This run-time analysis is known as "interpretive overhead". Access to variables is also slower in an interpreter because the mapping of identifiers to storage locations must be done repeatedly at run-time rather than at compile time.

There are various compromises between the development speed when using an interpreter and the execution speed when using a compiler. Some systems (e.g., some LISPs) allow interpreted and compiled code to call each other and to share variables. This means that once a routine has been tested and debugged under the interpreter it can be compiled and thus benefit from faster execution while other routines are being developed.

Many interpreters do not execute the source code as it stands but convert it into some more compact internal form. For example, some BASIC interpreters replace keywords with single-byte tokens which can be used to find the instruction in a jump table. An interpreter might well use the same lexical analyzer and parser as the compiler and then interpret the resulting abstract syntax tree.
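
The keyword-token and jump-table technique mentioned above can be sketched for a tiny BASIC-like language. The token values, the two statements supported, and the addition-only expression evaluator are all assumptions made for this illustration; it does not reproduce any actual BASIC implementation.

```python
# Hypothetical one-byte tokens for a tiny BASIC-like language.
TOKENS = {"LET": 0x80, "PRINT": 0x81}

def tokenize(line):
    """Convert 'KEYWORD rest' into (token byte, argument text) once, up front."""
    keyword, _, argument = line.partition(" ")
    return TOKENS[keyword], argument.strip()

def evaluate(expression, variables):
    """Evaluate 'a + b + ...' where each term is an integer or a variable."""
    total = 0
    for term in expression.split("+"):
        term = term.strip()
        total += variables[term] if term in variables else int(term)
    return total

def do_let(argument, variables, output):
    name, _, expression = argument.partition("=")
    variables[name.strip()] = evaluate(expression, variables)

def do_print(argument, variables, output):
    output.append(evaluate(argument, variables))

# Jump table: the token byte selects the handler directly.
HANDLERS = {0x80: do_let, 0x81: do_print}

def run(source):
    program = [tokenize(line) for line in source]   # compact internal form
    variables, output = {}, []
    for token, argument in program:
        HANDLERS[token](argument, variables, output)
    return output

result = run(["LET x = 2", "LET y = x + 40", "PRINT y + x"])
```

Keywords are recognized only once, during tokenizing, but each expression is still re-parsed and each variable looked up by name every time a statement runs, which is a small instance of the interpretive overhead described above.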

Debug Monitor
Angel is the debug monitor program for ARM processors, as supplied with the ARM Development Board (PID) as well as many other boards from ARM's semiconductor and tools partners. It is provided in source and binary form with the ARM Software Development Toolkit. It provides the following services to the developer:

- Debug capability, including memory inspection, image download and execution, breakpointing and single step
- CPU and board startup and basic exception handling
- A full ANSI C library, using semihosting to provide services from the host which are not available on the target
- A full source distribution, allowing developers a kickstart in developing standalone applications

Angel interfaces with the ARM Software Development Toolkit (SDT) in two ways:

- The interface library "Remote_A" is used by debuggers to communicate with an Angel target when debugging or executing code.
- Application code uses software interrupt (SWI) calls to request services of Angel either directly or via the toolkit's C library.

Angel Debug Monitor 1.20
This version of Angel is supplied with SDT 2.51. This release introduces better operation in supervisor mode, and makes the use of application SWIs, IRQs and FIQs more robust. It is thus now possible to single step interrupt service routines, and to use applications which use supervisor mode in combination with interrupts. The serialiser has been overhauled, with:

- Task lifetime context blocks
- All CPU registers saved on task switch
- Improved interrupt handling and masking
- Reduced interrupt latency

The internal debugging code has been significantly improved. RDI reads and writes of 2 and 4 byte quantities now do short and word accesses, architecture permitting. For compatibility with other ARM debug agents, two new SWIs, SYS_ELAPSED and SYS_TICKFREQ, have been added.

The interrupt service routine does not zero angel_GhostCount, the count of interrupts for which no service routine could be found. Angel will reset once this count reaches 5.

The following points are relevant for those who have written ports of Angel or otherwise use the source code:

- References to Angel_MutexSharedTempRegBlocks should be changed to Angel_GlobalRegBlock.
- A number of new constants have been included in pid/makelo.c, and will need to be added to any port-specific version. Some include:
  - DEBUG_BASE and DEBUG_SIZE, an area of memory used by the event logging code when compiling Angel with DEBUG=1. It can be small (1 KB), but is more useful when large.
  - TS_Running et al, the task state enumeration
  - RB_Interrupted et al, the global regblock names
- Device drivers need to call Angel_SerialiseTask with an extra trailing parameter, which should be angel_GlobalRegBlock[RB_Interrupted].