
Lexical rules

identifier

starts with a letter or underscore, is followed by zero or more letters, underscores, or digits, and
is not a keyword token
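
The identifier rule can be sketched as a small check. This is only an illustration: it assumes "letter" means an ASCII letter, and the names IDENTIFIER and is_identifier are made up for this example. The standard keyword module stands in for the keyword table.

```python
import keyword
import re

# Assumption: "letter" is read as an ASCII letter (a-z, A-Z).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")

def is_identifier(text):
    """Return True if text fits the identifier rule and is not a keyword."""
    fits_pattern = IDENTIFIER.fullmatch(text) is not None
    return fits_pattern and not keyword.iskeyword(text)

print(is_identifier("_count1"))  # letter/underscore start, then letters or digits
print(is_identifier("1abc"))     # starts with a digit
print(is_identifier("while"))    # matches the pattern but is a keyword
```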

literal string

starts with a quote, followed by zero or more non-quote characters, and ends with a quote

literal integer

one or more digits

literal float

contains exactly one dot and one or more digits, and no other characters
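
The three literal rules above can be written as regular expressions. This is a rough sketch under stated assumptions: a quote may be single or double, the closing quote matches the opening one, escape sequences are ignored, and the float rule is read as "exactly one dot plus at least one digit". The names STRING, INTEGER, FLOAT, and classify_literal are invented for the example.

```python
import re

# Assumption: the closing quote must be the same character as the opening quote.
STRING = re.compile(r'"[^"]*"|\'[^\']*\'')
INTEGER = re.compile(r"[0-9]+")
# Assumption: exactly one dot, with at least one digit before or after it.
FLOAT = re.compile(r"[0-9]+\.[0-9]*|\.[0-9]+")

def classify_literal(text):
    """Return 'string', 'float', 'integer', or None for the whole text."""
    if STRING.fullmatch(text):
        return "string"
    if FLOAT.fullmatch(text):
        return "float"
    if INTEGER.fullmatch(text):
        return "integer"
    return None

for sample in ['"hi there"', "42", "3.14", ".5", "1.2.3"]:
    print(sample, classify_literal(sample))
```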

longest token rule

when creating a token, create the longest token possible
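
As an illustration of the longest token rule, the sketch below tries the longer symbols first, so that **= is read as one token rather than as *, *, and =. The symbol lists are copied from the delimiter and operator tables in this handout; the names SYMBOLS and longest_symbol are hypothetical.

```python
DELIMITERS = ["(", ")", "[", "]", "{", "}", ",", ":", ".", ";",
              "@", "=", "->", "+=", "-=", "*=", "/=", "//=", "%=", "&=",
              "|=", "^=", ">>=", "<<=", "**="]
OPERATORS = ["+", "-", "*", "**", "/", "//", "%", "<<", ">>", "&",
             "|", "^", "~", "<", ">", "<=", ">=", "==", "!="]

# Longest symbols first, so the first match found is also the longest match.
SYMBOLS = sorted(DELIMITERS + OPERATORS, key=len, reverse=True)

def longest_symbol(text, pos=0):
    """Return the longest delimiter or operator starting at pos, or None."""
    for symbol in SYMBOLS:
        if text.startswith(symbol, pos):
            return symbol
    return None

print(longest_symbol("**=2"))     # the three-character token wins over * and **
print(longest_symbol("x<=y", 1))  # <= rather than < followed by =
```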

whitespace rule

for whitespace not at the start of a line:

● if whitespace is inside a literal string, it is part of the literal string

● otherwise, it ends the current token and no token is created for it
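
The two cases of the whitespace rule can be sketched as a scanner that only splits on whitespace. It is a simplification, not a full lexer: it does not separate symbols from adjacent names (so x=1 stays one piece), it assumes spaces and tabs are the only whitespace, and it drops whitespace at the start of the line because that is handled by the indent and dedent tokens. The function name split_on_whitespace is hypothetical.

```python
def split_on_whitespace(line):
    """Split a line into pieces, keeping whitespace inside literal strings."""
    tokens = []
    current = ""
    quote = None  # the quote character that opened the current string, if any
    for ch in line.lstrip(" \t"):
        if quote is not None:
            current += ch          # inside a string: whitespace is kept
            if ch == quote:
                quote = None       # closing quote ends the string
        elif ch in "\"'":
            current += ch
            quote = ch             # opening quote starts a string
        elif ch in " \t":
            if current:            # whitespace ends the current token
                tokens.append(current)
                current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens

print(split_on_whitespace('x = "a b"  + y'))
```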

Current lexical tables:

delimiter

(    )    [    ]    {
}    ,    :    .    ;
@    =    ->   +=   -=
*=   /=   //=  %=   &=
|=   ^=   >>=  <<=  **=

operator

+    -    *    **   /
//   %    <<   >>   &
|    ^    ~    <    >
<=   >=   ==   !=

keyword

False     class     finally   is        return
None      continue  for       lambda    try
True      def       from      nonlocal  while
and       del       global    not       with
as        elif      if        or        yield
assert    else      import    pass
break     except    in        raise
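
For illustration, the keyword table can be stored as a list and compared with the standard keyword module. The names HANDOUT_KEYWORDS and is_keyword_token are made up for this sketch; the exact contents of keyword.kwlist depend on the Python version in use.

```python
import keyword

# The keyword table from this handout, row by row.
HANDOUT_KEYWORDS = """
False class finally is return
None continue for lambda try
True def from nonlocal while
and del global not with
as elif if or yield
assert else import pass
break except in raise
""".split()

def is_keyword_token(text):
    """Return True if text appears in the handout's keyword table."""
    return text in HANDOUT_KEYWORDS

# Keywords known to the running interpreter that the table does not list.
# The result varies with the Python version.
print(sorted(set(keyword.kwlist) - set(HANDOUT_KEYWORDS)))
```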

Current individual tokens:


newline

a token that represents an end of line character, created by pressing the enter key

indent

a token that represents all the whitespace at the start of a line when that whitespace is longer than
the whitespace at the start of the previous line

dedent

a token that represents some of the whitespace at the start of a line when that whitespace is shorter
than the whitespace at the start of the previous line. See the Lexical Analysis section of the Python
Language Reference Manual for a complete explanation of how dedent tokens are created.
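
The indent and dedent definitions can be sketched with a stack of indentation widths, in the spirit of the description in the Reference Manual. This is a simplified illustration, not the interpreter's code: it assumes indentation uses spaces only, that blank lines and comment-only lines have already been removed, and it does not report inconsistent dedents. The name indentation_tokens is hypothetical.

```python
def indentation_tokens(lines):
    """Return the INDENT/DEDENT markers produced by a list of source lines."""
    stack = [0]   # indentation widths of the currently open levels
    tokens = []
    for line in lines:
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # More leading whitespace than the previous level: one indent.
            stack.append(width)
            tokens.append("INDENT")
        else:
            # Less leading whitespace: one dedent per level that is closed.
            while width < stack[-1]:
                stack.pop()
                tokens.append("DEDENT")
    # End of program: close every level that is still open.
    while stack[-1] > 0:
        stack.pop()
        tokens.append("DEDENT")
    return tokens

print(indentation_tokens(["if a:", "    if b:", "        c", "d"]))
```

A single line can produce several dedent tokens, which is why the last line "d" closes both open levels here.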

The Python interpreter creates a newline token for the last line of a program. In addition, if the
last line of the program is indented, the interpreter also creates dedent tokens so that the
total number of dedent tokens equals the total number of indent tokens in the program.
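
The balance between indent and dedent tokens can be observed with the standard tokenize module. This is only a sketch: tokenize is a library tokenizer that closely follows the interpreter's behavior but also reports token types this handout does not cover (for example ENDMARKER), and the helper name token_names is invented for the example.

```python
import io
import tokenize

def token_names(source):
    """Return the token type names that tokenize produces for source."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]

# The last line of this program is indented two levels deep.
source = "if x:\n    if y:\n        z = 1\n"
names = token_names(source)

print(names)
print(names.count("INDENT"), names.count("DEDENT"))  # the two counts match
```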
