
Manipulate and Translate Machine and Assembly language

How does an assembler convert assembly language in ASCII to binary machine code?
It sounds like your question is actually multiple questions.  I'll try to break them down at a high
level, since you can drill down fairly deeply on each of these questions.  I'll just give an
overview.

• Bootstrapping.  How can a compiler, assembler, etc. be written in its own language?
• Representation.  How is my input represented inside the computer such that a
program can operate on it?
• Translation.  Once the computer has my program in an electronic format, how does it
turn that into machine code?
Representation
When you type your assembly program into a computer, the computer immediately converts it
into patterns of 0s and 1s.  You mentioned ASCII code, which is fairly common today.  The
letter A in ASCII is stored as the binary pattern 01000001, for instance.  For convenience, we
usually write binary sequences in hexadecimal; the binary sequence for the letter A is 41 in
hexadecimal.  Note that hexadecimal is just a notation.  The computer still stores 1s and 0s.
Older computers used other mappings between letters and binary patterns, but the principle
extends from the earliest computers that manipulated text up through the computers today. 
ASCII only really supports English well.

Translation
The assembler's job is to read these binary patterns and translate them into a different
representation, namely the machine code of the target processor.  Translation comprises four
main steps:

1. Lexical analysis (tokenizing).  This step reads the bytes and tries to group them into
lexical tokens.  In the case of an assembler, it groups up bytes and determines if they
represent mnemonics, register names, numbers, labels, directives, or other syntactic
pieces such as commas and colons.  
You can compare this to trying to understand an English sentence by first breaking it
up into words, numbers and punctuation, for example, as opposed to just a long,
undifferentiated sequence of letters.
[Note: the code is read as bytes and checked for things like registers and mnemonics.]

2. Syntactic analysis.  This step groups up the tokens and tries to make sense of them. 
Not all sequences of tokens are valid in assembly language.  MOV EAX, EBX is a
valid x86 assembly statement.  EAX, MOV EBX is not, even though it has the same
tokens, just in a different order.  

Continuing the comparison to reading English:  This is similar to understanding the
structure of a sentence, such as whether it has a subject, a verb, and so on.  You can
find basic syntactic errors at this step.  For example, if you read "This sentence no
verb," you'd immediately notice there's a grammar issue with that sentence even
though all the words are spelled correctly.
[Note: in this step the tokens are combined and their logic is checked; if the logic is
wrong, an error is raised and it has to be corrected.]

3. Semantic analysis. At this step, the assembler determines the meaning of what you
wrote.  The assembler now determines that the statement MOV EAX, EBX means you
want the processor instruction that moves between two registers, and you want it to
move between registers EAX and EBX.  If you ask for an impossible instruction, it's at
this point the assembler notices.  For example, MOV EIP, EAX is not a legal
instruction on x86, even though it's syntactically correct.

When reading English, you do the same thing when trying to understand the meaning
of a sentence.  "Turn the radio on" asks you to perform an action on the radio.  "Boil
the radio purple," on the other hand, is syntactically correct, but I'd be at a loss as to
what it means.
[Note: this step works out what the assembled logic means, for example that it is performing a MOV.]

4. Code generation.  Once the assembler has determined what all of your statements
mean, it can generate the actual stream of machine code 1s and 0s for your program. 
Most assembly statements have a direct 1:1 translation to machine code, so the process
is fairly straightforward.  Some assembly statements, such as assembler directives,
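guide the overall process.  Other aspects of assembly code, such as resolving
labels, require additional steps:  a label can be used before it is defined, so the
assembler typically fills in its address in a second pass, once every label's location
is known.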

Comparing to reading English again, this is where you'd actually act on what you just
read.  Most English sentences are straightforward.  Some need additional context to be
fully understood.  For example, pronouns in English need antecedents, just like label
references in assembly need label definitions.
[Note: in this step the code is generated.]
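
The four steps above can be sketched as a toy assembler in Python.  This is only an illustration under stated assumptions, not how any real assembler is built:  it handles a single instruction form (MOV between two 32-bit registers), and the function names tokenize, parse, check, encode and assemble are hypothetical.  The opcode 0x89 and the register numbers are the standard x86 encodings for that form.

```python
import re

# 32-bit general-purpose registers and their x86 encoding numbers.
REGISTERS = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
             "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}
MNEMONICS = {"MOV"}  # toy subset; a real assembler knows far more

def tokenize(line):
    """Step 1: group characters into (kind, text) tokens.

    Characters that match neither pattern (such as spaces) are skipped.
    """
    tokens = []
    for text in re.findall(r"[A-Za-z_]\w*|,", line):
        word = text.upper()
        if text == ",":
            tokens.append(("COMMA", text))
        elif word in MNEMONICS:
            tokens.append(("MNEMONIC", word))
        else:
            tokens.append(("NAME", word))
    return tokens

def parse(tokens):
    """Step 2: require the token order MNEMONIC NAME COMMA NAME."""
    kinds = [kind for kind, _ in tokens]
    if kinds != ["MNEMONIC", "NAME", "COMMA", "NAME"]:
        raise SyntaxError(f"unexpected token order: {kinds}")
    return tokens[0][1], tokens[1][1], tokens[3][1]

def check(mnemonic, dst, src):
    """Step 3: make sure both operands are registers this MOV form accepts."""
    for name in (dst, src):
        if name not in REGISTERS:
            raise ValueError(f"{name} is not a valid operand for {mnemonic}")
    return REGISTERS[dst], REGISTERS[src]

def encode(dst, src):
    """Step 4: emit opcode 0x89 (MOV r/m32, r32) followed by a ModRM byte."""
    # ModRM layout: mod=11 (register direct), reg=source, rm=destination.
    modrm = 0b11000000 | (src << 3) | dst
    return bytes([0x89, modrm])

def assemble(line):
    """Run all four steps on one line of assembly text."""
    mnemonic, dst, src = parse(tokenize(line))
    return encode(*check(mnemonic, dst, src))

print(assemble("MOV EAX, EBX").hex(" "))  # prints "89 d8"
```

With this sketch, assemble("EAX, MOV EBX") fails in step 2 with a SyntaxError, while assemble("MOV EIP, EAX") gets past the syntax check and fails in step 3 with a ValueError, mirroring the two kinds of error described above.  Labels and directives are left out to keep the example short.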
