
Compiler Design
ICT023
NCST-CSD - BSCS
Size vs Speed
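In lexical analysis, size versus speed refers to the trade-off between
the size of the generated lexical analyzer (its code and transition
tables) and the speed at which it scans the input. Making the analyzer
smaller usually costs some speed, and making it faster usually costs
some space.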
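One place where this trade-off commonly shows up is in how a lexer stores its state-transition table. The sketch below is only an illustration under stated assumptions: the class name TableTradeoff, the three-state machine for identifiers, and the ASCII-only alphabet are all hypothetical and do not come from any particular lexer generator. It stores the same transitions twice, once as a dense table (one array lookup per character, more memory) and once as a short list of character ranges (far fewer entries, but a search per character).

```java
import java.util.Arrays;

class TableTradeoff {
    static final int DEAD = 2;        // state 0 = start, 1 = inside identifier, 2 = dead
    static final int ALPHABET = 128;  // ASCII only, an assumption for this sketch

    // Compact form: per state, rows of {lowChar, highChar, targetState}.
    static final int[][][] RANGES = {
        { {'A', 'Z', 1}, {'a', 'z', 1} },                 // start: a letter begins an identifier
        { {'0', '9', 1}, {'A', 'Z', 1}, {'a', 'z', 1} },  // inside: letters or digits continue it
        { }                                               // dead: no way out
    };

    // Dense form: one entry for every (state, character) pair.
    static final int[][] DENSE = buildDense();

    static int[][] buildDense() {
        int[][] table = new int[RANGES.length][ALPHABET];
        for (int s = 0; s < RANGES.length; s++) {
            Arrays.fill(table[s], DEAD);
            for (int[] r : RANGES[s]) {
                for (int c = r[0]; c <= r[1]; c++) table[s][c] = r[2];
            }
        }
        return table;
    }

    // Fast: a single array lookup per character.
    static int stepDense(int state, char c) {
        return c < ALPHABET ? DENSE[state][c] : DEAD;
    }

    // Small: scans the ranges of the current state for each character.
    static int stepCompact(int state, char c) {
        for (int[] r : RANGES[state]) {
            if (c >= r[0] && c <= r[1]) return r[2];
        }
        return DEAD;
    }

    static boolean accepts(String input, boolean useDense) {
        int state = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            state = useDense ? stepDense(state, c) : stepCompact(state, c);
        }
        return state == 1;
    }

    public static void main(String[] args) {
        int compactInts = 0;
        for (int[][] rows : RANGES) compactInts += rows.length * 3;
        System.out.println("dense entries:   " + DENSE.length * ALPHABET); // 384
        System.out.println("compact entries: " + compactInts);             // 15
        System.out.println(accepts("count1", true) + " " + accepts("count1", false));
    }
}
```

Both versions accept exactly the same strings. They differ only in how much is stored and how much work each character costs, which is the size versus speed choice in miniature.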
Lexers
A lexer, also known as a scanner, is a program that breaks down
the input source code into a sequence of tokens, which are the
basic building blocks of a program. These tokens can then be
passed on to the next stage of the compilation process, which
typically involves parsing the tokens to build an abstract syntax
tree.
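To make the idea of breaking input into tokens concrete, here is a minimal hand-written sketch. The class name SimpleLexer and the three token kinds (NUMBER, IDENTIFIER, SYMBOL) are assumptions chosen for illustration; a real language would define many more kinds.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleLexer {

    // Hypothetical token kinds, chosen for illustration only.
    enum Kind { NUMBER, IDENTIFIER, SYMBOL }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    // Scans the input left to right and emits tokens in source order.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but is not a token itself
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(new Token(Kind.NUMBER, input.substring(start, i)));
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                tokens.add(new Token(Kind.IDENTIFIER, input.substring(start, i)));
            } else {
                tokens.add(new Token(Kind.SYMBOL, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [IDENTIFIER(count), SYMBOL(=), IDENTIFIER(count), SYMBOL(+), NUMBER(42), SYMBOL(;)]
        System.out.println(tokenize("count = count + 42;"));
    }
}
```

The resulting list is what a parser would consume as its input.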
Lexer Generator
A lexer generator is a tool that generates a lexer program
automatically based on a set of rules or regular expressions. The
input to the lexer generator is a description of the tokens in the
language being compiled, along with the regular expressions that
define each token. The output is a lexer program that can be
integrated into the compiler.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LexerGenerator {

    // One regular expression per token class.
    private static final Pattern[] PATTERNS = {
        Pattern.compile("\\d+"),        // One or more digits (0-9)
        Pattern.compile("[a-zA-Z]+"),   // One or more letters (a-z, A-Z)
        Pattern.compile("\\s+"),        // One or more whitespace characters
        Pattern.compile("[^\\w\\s]")    // A single non-word, non-space character
    };

    public static void main(String[] args) {
        String input = "This is a sample input string 123 with digits and special characters !@#";

        // Note: the input is scanned once per pattern, so the matches are
        // printed grouped by pattern, not in the order they appear in the input.
        for (Pattern pattern : PATTERNS) {
            Matcher matcher = pattern.matcher(input);
            while (matcher.find()) {
                System.out.printf("Matched pattern '%s': %s%n", pattern.pattern(), matcher.group());
            }
        }
    }
}
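As a complement to the example above, the following sketch builds a working lexer from a table of rules, which is closer in spirit to what a lexer generator does. It is a simplification under stated assumptions: the class name RuleLexer, the rule names (NUMBER, WORD, WS, SYMBOL), and the helper sampleRules are hypothetical, and the lexer is assembled at run time from one combined regular expression with named groups instead of being emitted as source code. Java tries alternatives in order, so the first rule that matches wins, which is not the longest-match policy that many generators use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleLexer {
    private final List<String> names = new ArrayList<>();
    private final Pattern combined;

    // Combines (token name -> regex) rules into one pattern of named groups.
    // Rule names must be valid Java group names (letters and digits only).
    RuleLexer(Map<String, String> rules) {
        StringBuilder regex = new StringBuilder();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (regex.length() > 0) regex.append('|');
            regex.append("(?<").append(rule.getKey()).append('>')
                 .append(rule.getValue()).append(')');
            names.add(rule.getKey());
        }
        combined = Pattern.compile(regex.toString());
    }

    // Emits "NAME:text" tokens in source order; the WS rule is skipped.
    List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = combined.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt() || m.end() == pos) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            for (String name : names) {
                String text = m.group(name);
                if (text != null) {
                    if (!name.equals("WS")) tokens.add(name + ":" + text);
                    break;
                }
            }
            pos = m.end();
        }
        return tokens;
    }

    // Hypothetical rule set mirroring the four patterns used above.
    static Map<String, String> sampleRules() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("NUMBER", "\\d+");
        rules.put("WORD", "[a-zA-Z]+");
        rules.put("WS", "\\s+");
        rules.put("SYMBOL", "[^\\w\\s]");
        return rules;
    }

    public static void main(String[] args) {
        // Prints: [WORD:sum, SYMBOL:=, NUMBER:123, SYMBOL:!]
        System.out.println(new RuleLexer(sampleRules()).tokenize("sum = 123 !"));
    }
}
```

Changing the token set only requires editing the rule table, not the scanning loop.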
Regular Languages
Regular languages are the class of formal languages that can be
described by regular expressions or, equivalently, recognized by a
deterministic finite automaton (DFA).
These languages are simple and have well-defined properties
that make them easy to analyze and manipulate during lexical
analysis. They are commonly used to specify the lexical
structure of programming languages (the form of each token)
and to generate lexical analyzers, also known as lexers or
scanners, that recognize the tokens in the input source code.
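The link between regular expressions and DFAs can be illustrated with a small sketch. The token chosen here (unsigned integers and decimals such as 42 or 3.14), the class name NumberDfa, and its state names are assumptions made for the example. The hand-built DFA and the regular expression [0-9]+(\.[0-9]+)? are intended to describe the same language, and the main method prints both verdicts side by side for a few inputs.

```java
import java.util.regex.Pattern;

class NumberDfa {
    // States of a hand-built DFA for the regular expression [0-9]+(\.[0-9]+)?
    enum State { START, INT, DOT, FRAC, DEAD }

    // Transition function: the next state depends only on the current state and character.
    static State step(State s, char c) {
        boolean digit = c >= '0' && c <= '9';
        switch (s) {
            case START: return digit ? State.INT : State.DEAD;
            case INT:   return digit ? State.INT : (c == '.' ? State.DOT : State.DEAD);
            case DOT:   return digit ? State.FRAC : State.DEAD;
            case FRAC:  return digit ? State.FRAC : State.DEAD;
            default:    return State.DEAD;
        }
    }

    // Accepts when the whole input ends in one of the accepting states INT or FRAC.
    static boolean accepts(String input) {
        State s = State.START;
        for (int i = 0; i < input.length(); i++) {
            s = step(s, input.charAt(i));
        }
        return s == State.INT || s == State.FRAC;
    }

    // The same language written as a regular expression.
    static final Pattern REGEX = Pattern.compile("[0-9]+(\\.[0-9]+)?");

    public static void main(String[] args) {
        for (String s : new String[] {"42", "3.14", "3.", ".5", "abc"}) {
            System.out.println(s + "  dfa=" + accepts(s)
                    + "  regex=" + REGEX.matcher(s).matches());
        }
    }
}
```

For each sample input the two columns agree, which is the practical meaning of saying that a regular expression and a DFA describe the same regular language.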
