
LEXICAL ANALYZER

ABSTRACT
This project aims to implement a lexical analyzer for a programming
language using the C programming language. The program will analyze the
source code of the programming language and identify the various tokens
and lexemes present in the code. The project team will design and
implement the program, optimize it for performance and efficiency, and
provide comprehensive documentation for its usage. The success of the
project will depend on careful planning, adequate testing, and overcoming
the challenges posed by limited experience with implementing lexical
analyzers and unclear specifications for the programming language. The
deliverables will include a working lexical analyzer, a user manual, a
technical report, and a test plan and test cases. Ultimately, the project
will contribute to the development of a programming language by providing
a valuable component in its implementation.
INTRODUCTION

AIM OF THE PROJECT

The aim of the project is to develop a lexical analyzer that generates tokens for
further processing by the compiler.
PURPOSE OF THE PROJECT

The lexical features of a language can be specified using a Type-3 (regular) grammar. The job of
the lexical analyzer is to read the source program one character at a time and produce as output a
stream of tokens. The tokens produced by the lexical analyzer serve as input to the next phase, the
parser. Thus, the lexical analyzer's job is to translate the source program into a form more
conducive to recognition by the parser.

GOALS

To create tokens from the given input stream.

SCOPE OF PROJECT

The lexical analyzer converts the input program into a stream of the valid words of the language,
known as tokens.

The parser examines the sequence of these tokens and identifies the language constructs occurring
in the input program. The parser and the lexical analyzer work hand in hand: whenever the parser
needs further tokens to proceed, it requests them from the lexical analyzer, which in turn scans the
remaining input stream and returns the next token occurring there. Apart from that, the lexical
analyzer also participates in the creation and maintenance of the symbol table. Because the lexical
analyzer is the first module to identify the occurrence of a symbol, a symbol that is being defined
for the first time is installed into the symbol table at this stage.
SYSTEM DESIGN

Process: The lexical analyzer is the first phase of a compiler. Its main task is to read the input
characters and produce as output a sequence of tokens that the parser uses for syntax analysis.

Upon receiving a "get next token" command from the parser, the lexical analyzer reads the input
characters until it can identify the next token.

Sometimes, lexical analyzers are divided into a cascade of two phases, the first called “scanning”,
and the second “lexical analysis”.

The scanner is responsible for doing simple tasks, while the lexical analyzer proper does the more
complex operations.

The lexical analyzer which we have designed takes its input from an input file. It reads one
character at a time from the input file, and continues to read until the end of the file is reached. It
recognizes valid identifiers and keywords, and specifies the token values of the keywords.

It also identifies header files, #define statements, numbers, special characters, and various
relational and logical operators, and it ignores white space and comments. It prints the output in a
separate file, specifying the line number of each token.

BLOCK DIAGRAM:
Different tokens or lexemes are:

▪ Keywords
▪ Identifiers
▪ Operators
▪ Constants
Take the example below:

c = a + b;

After lexical analysis, a symbol table is generated as given below:

Token   Type
c       identifier
=       operator
a       identifier
+       operator
b       identifier
SOFTWARE DEVELOPMENT LIFE CYCLE

Systems Development Life Cycle (SDLC), or Software Development Life Cycle, in systems
engineering and software engineering relates to the process of developing systems, and the models
and methodologies, that people use to develop these systems, generally computer or information
systems.

In software engineering this SDLC concept is developed into all kinds of software development
methodologies, the framework that is used to structure, plan, and control the process of developing
an information system, the software development process.

Overview

Systems Development Life Cycle (SDLC) is any logical process used by a systems analyst to
develop an information system, including requirements, validation, training, and user ownership.
An SDLC should result in a high quality system that meets or exceeds customer expectations,
within time and cost estimates, works effectively and efficiently in the current and planned
Information Technology infrastructure, and is cheap to maintain and cost-effective to enhance.[2]

In project management a project has both a life cycle and a "systems development life cycle"
during which a number of typical activities occur. The project life cycle (PLC) encompasses all
the activities of the project, while the systems development life cycle (SDLC) is focused on
accomplishing the product requirements.
Systems Development Phases

Systems Development Life Cycle (SDLC) adheres to important phases that are essential for
developers, such as planning, analysis, design, and implementation, and are explained in the
section below. There are several Systems Development Life Cycle Models in existence. The oldest
model, that was originally regarded as "the Systems Development Life Cycle" is the waterfall
model: a sequence of stages in which the output of each stage becomes the input for the next. These
stages generally follow the same basic steps but many different waterfall methodologies give the
steps different names and the number of steps seems to vary between 4 and 7. There is no
definitively correct Systems Development Life Cycle model, but the steps can be characterized
and divided into several phases.

Phases

Initiation Phase

The Initiation Phase begins when a business sponsor identifies a need or an opportunity. The
purpose of the Initiation Phase is to:

• Identify and validate an opportunity to improve business accomplishments of the
organization or a deficiency related to a business need.
• Identify significant assumptions and constraints on solutions to that need.
• Recommend the exploration of alternative concepts and methods to satisfy the need
including questioning the need for technology, i.e., will a change in the business process
offer a solution?
• Assure executive business and executive technical sponsorship.

System Concept Development Phase

The System Concept Development Phase begins after a business need or opportunity is validated
by the Agency/Organization Program Leadership and the Agency/Organization CIO. The purpose
of the System Concept Development Phase is to:

• Determine the feasibility and appropriateness of the alternatives.
• Identify system interfaces.
• Identify basic functional and data requirements to satisfy the business need.
• Establish system boundaries; identify goals, objectives, critical success factors, and
performance measures.
• Evaluate costs and benefits of alternative approaches to satisfy the basic functional
requirements.
• Assess project risks.
• Identify and initiate risk mitigation actions.
• Develop high-level technical architecture, process models, data models, and a concept of
operations.

Planning Phase

During this phase, a plan is developed that documents the approach to be used and includes a
discussion of methods, tools, tasks, resources, project schedules, and user input. Personnel
assignments, costs, project schedule, and target dates are established. A Project Management Plan
is created with components related to acquisition planning, configuration management planning,
quality assurance planning, concept of operations, system security, verification and validation, and
systems engineering management planning.

Requirements Analysis Phase

This phase formally defines the detailed functional user requirements using the high-level
requirements identified in the Initiation, System Concept Development, and Planning phases. The
requirements are defined in this phase to a level of detail sufficient for systems design to proceed.
They need to be measurable and testable, and to relate to the business need or opportunity
identified in the Initiation Phase. The requirements that will be used to determine acceptance of
the system are captured in the Test and Evaluation Master Plan.

The purposes of this phase are to:

• Further define and refine the functional and data requirements and document them in the
Requirements Document.
• Complete business process reengineering of the functions to be supported (i.e., verify what
information drives the business process, what information is generated, who generates it,
where the information goes, and who processes it).
• Develop detailed data and process models (system inputs, outputs, and processes).
• Develop the test and evaluation requirements that will be used to determine acceptable
system performance.
Design Phase

During this phase, the system is designed to satisfy the functional requirements identified in the
previous phase. Since problems in the design phase could be very expensive to solve in the later
stage of the software development, a variety of elements are considered in the design to mitigate
risk. These include:

• Identifying potential risks and defining mitigating design features.


• Performing a security risk assessment.
• Developing a conversion plan to migrate current data to the new system.
• Determining the operating environment.
• Defining major subsystems and their inputs and outputs.
• Allocating processes to resources.
• Preparing detailed logic specifications for each software module.

Development Phase

Effective completion of the previous stages is a key factor in the success of the Development phase.
The Development phase consists of:
• Translating the detailed requirements and design into system components.
• Testing individual elements (units) for usability.
• Preparing for integration and testing of the IT system.

Integration and Test Phase

Subsystem integration, system, security, and user acceptance testing is conducted during the
integration and test phase. The user, with those responsible for quality assurance, validates that
the functional requirements, as defined in the functional requirements document, are satisfied by
the developed or modified system. OIT Security staff assesses the system security and issues a
security certification and accreditation prior to installation/implementation. Multiple levels of
testing are performed, including:
• Testing at the development facility by the contractor and possibly supported by end users
• Testing as a deployed system with end users working together with contract personnel
• Operational testing by the end user alone performing all functions.

Implementation Phase

This phase is initiated after the system has been tested and accepted by the user. In this phase, the
system is installed to support the intended business functions. System performance is compared
to performance objectives established during the planning phase. Implementation includes user
notification, user training, installation of hardware, installation of software onto production
computers, and integration of the system into daily work processes.

This phase continues until the system is operating in production in accordance with the defined
user requirements.

Operations and Maintenance Phase

The system operation is ongoing. The system is monitored for continued performance in
accordance with user requirements and needed system modifications are incorporated. Operations
continue as long as the system can be effectively adapted to respond to the organization’s needs.
When modifications or changes are identified, the system may reenter the planning phase. The
purpose of this phase is to:

• Operate, maintain, and enhance the system.


• Certify that the system can process sensitive information.
• Conduct periodic assessments of the system to ensure the functional requirements continue
to be satisfied.
• Determine when the system needs to be modernized, replaced, or retired.

Features of the C Language

The relatively low-level nature of the language affords the programmer close control over what
the computer does, while allowing special tailoring and aggressive optimization for a particular
platform. This allows the code to run efficiently on very limited hardware, such as embedded
systems.

C does not have some features that are available in some other programming languages:

• No assignment of arrays or strings (copying can be done via standard functions; assignment
of objects having struct or union type is supported)
• No automatic garbage collection
• No requirement for bounds checking of arrays
• No operations on whole arrays
• No syntax for ranges, such as the A..B notation used in several languages
• No separate Boolean type: zero/nonzero is used instead[6]
• No formal closures or functions as parameters (only function and variable pointers)
• No generators or coroutines; intra-thread control flow consists of nested function calls,
except for the use of the longjmp or setcontext library functions
• No exception handling; standard library functions signify error conditions with the global
errno variable and/or special return values
• Only rudimentary support for modular programming
• No compile-time polymorphism in the form of function or operator overloading
• Only rudimentary support for generic programming
• Very limited support for object-oriented programming with regard to polymorphism and
inheritance
• Limited support for encapsulation
• No native support for multithreading and networking
• No standard libraries for computer graphics and several other application programming
needs

A number of these features are available as extensions in some compilers, or can be supplied by
third-party libraries, or can be simulated by adopting certain coding disciplines.

Operators

C supports a rich set of operators, which are symbols used within an expression to specify the
manipulations to be performed while evaluating that expression. C has operators for:

• arithmetic (+, -, *, /, %)
• equality testing (==, !=)
• order relations (<, <=, >, >=)
• boolean logic (!, &&, ||)
• bitwise logic (~, &, |, ^)
• bitwise shifts (<<, >>)
• assignment (=, +=, -=, *=, /=, %=, &=, |=, ^=, <<=, >>=)
• increment and decrement (++, --)
• reference and dereference (&, *, [ ])
• conditional evaluation (? :)
• member selection (., ->)
• type conversion (( ))
• object size (sizeof)
• function argument collection (( ))
• sequencing (,)
• subexpression grouping (( ))

C has a formal grammar, specified by the C standard.

Data structures

C has a static weak typing type system that shares some similarities with that of other ALGOL
descendants such as Pascal. There are built-in types for integers of various sizes, both signed and
unsigned, floating-point numbers, characters, and enumerated types (enum). C99 added a boolean
datatype. There are also derived types including arrays, pointers, records (struct), and untagged
unions (union).

Deficiencies

Although the C language is extremely concise, C is subtle, and expert competency in C is not
common—taking more than ten years to achieve.[11] C programs are also notorious for security
vulnerabilities due to the unconstrained direct access to memory of many of the standard C library
function calls.

C does not fix the size or endianness of its basic types; for example, each compiler is free to
choose the size of an int as anything 16 bits or larger, according to what is efficient on the current
platform. Many programmers work from assumptions about size and endianness, leading to code
that is not portable.

Therefore the kinds of programs that can be portably written are extremely restricted, unless
specialized programming practices are adopted.
SOFTWARE AND HARDWARE TOOLS

Windows XP

Windows XP is a line of operating systems produced by Microsoft for use on personal computers,
including home and business desktops, notebook computers, and media centers. The name "XP"
is short for "experience". Windows XP is the successor to both Windows 2000 Professional and
Windows Me, and is the first consumer-oriented operating system produced by Microsoft to be
built on the Windows NT kernel and architecture.

Windows XP introduced several new features to the Windows line, including:

• Faster start-up and hibernation sequences


• The ability to discard a newer device driver in favor of the previous one (known as driver
rollback), should a driver upgrade not produce desirable results
• A new, arguably more user-friendly interface, including the framework for developing
themes for the desktop environment
• Fast user switching, which allows a user to save the current state and open applications of
their desktop and allow another user to log on without losing that information
• The ClearType font rendering mechanism, which is designed to improve text readability
on Liquid Crystal Display (LCD) and similar monitors
• Remote Desktop functionality, which allows users to connect to a computer running
Windows XP Pro from across a network or the Internet and access their applications, files,
printers, and devices
• Support for most DSL modems and wireless network connections, as well as networking
over FireWire, and Bluetooth.

Turbo C++

Turbo C++ is a C++ compiler and integrated development environment (IDE) from Borland. The
original Turbo C++ product line was put on hold after 1994, and was revived in 2006 as an
introductory-level IDE, essentially a stripped-down version of their flagship C++ Builder. Turbo
C++ 2006 was released on September 5, 2006 and is available in 'Explorer' and 'Professional'
editions. The Explorer edition is free to download and distribute while the Professional edition is
a commercial product. The Professional edition is no longer available for purchase from Borland.

HARDWARE REQUIREMENT

Processor : Pentium (IV) or Above

RAM : 256 MB

Hard Disk : 40 GB or Above

FDD : 4 GB or Above

SOFTWARE REQUIREMENT

Platform Used : TurboC++ 3.0

Operating System : WINDOWS XP & other versions

Languages : C
FEASIBILITY STUDY

Feasibility study: The feasibility study is a general examination of the potential of an idea to be
converted into a business. This study focuses largely on the ability of the entrepreneur to convert
the idea into a business enterprise. The feasibility study differs from the viability study as the
viability study is an in-depth investigation of the profitability of the idea to be converted into a
business enterprise.

Types of Feasibility Studies

The following sections describe various types of feasibility studies.

• Technology and System Feasibility


This involves questions such as whether the technology needed for the system exists, how
difficult it will be to build, and whether the firm has enough experience using that
technology. The assessment is based on an outline design of system requirements in terms
of Input, Processes, Output, Fields, Programs, and Procedures. This can be quantified in
terms of volumes of data, trends, frequency of updating, etc., in order to estimate whether
the new system will perform adequately.

• Resource Feasibility
This involves questions such as how much time is available to build the new system, when
it can be built, whether it interferes with normal business operations, type and amount of
resources required, dependencies, etc. Contingency and mitigation plans should also be
stated here so that if the project overruns, the company is ready for this eventuality.

• Schedule Feasibility
A project will fail if it takes so long to complete that it is no longer useful. Typically this
means estimating how long the system will take to develop and whether it can be completed
in a given time period, using methods such as the payback period.

• Technical feasibility
This centers around the existing computer system and the extent to which it can support
the proposed addition.
SYSTEM DESIGN

A lexical analyzer generator creates a lexical analyzer using a set of specifications usually in the
format

p1 {action 1}

p2 {action 2}

............

pn {action n}

where each pi is a regular expression and each action i is a program fragment to be executed
whenever a lexeme matched by pi is found in the input. If more than one pattern matches, the
longest lexeme matched is chosen. If two or more patterns match the longest lexeme, the first
listed matching pattern is chosen.

This is usually implemented using a finite automaton. There is an input buffer with two
pointers to it, a lexeme-beginning and a forward pointer. The lexical analyzer generator constructs
a transition table for a finite automaton from the regular expression patterns in the lexical analyzer
generator specification. The lexical analyzer itself consists of a finite automaton simulator that
uses this transition table to look for the regular expression patterns in the input buffer.

This can be implemented using an NFA or a DFA. The transition table for an NFA is
considerably smaller than that for a DFA, but the DFA recognizes patterns faster than the NFA.

Using NFA

The transition table for the NFA N is constructed for the composite pattern p1|p2|...|pn.
The NFA recognizes the longest prefix of the input that is matched by a pattern. In the final NFA,
there is an accepting state for each pattern pi. The set of states the NFA can be in after seeing
each input character is constructed, and the NFA is simulated until it terminates or reaches a set
of states from which there is no transition defined for the current input symbol. The specification
for the lexical analyzer generator is such that a valid source program cannot entirely fill the input
buffer without the NFA reaching termination. The pattern making this match identifies the token
found, and the lexeme matched is the string between the lexeme-beginning and forward pointers.
If no pattern matches, the lexical analyzer should transfer control to some default recovery
routine.
Using DFA

Here a DFA is used for pattern matching. This method is a modified version of the
method using NFA. The NFA is converted to a DFA using a subset construction algorithm. Here
there may be several accepting states in a given subset of nondeterministic states. The accepting
state corresponding to the pattern listed first in the lexical analyzer generator specification has
priority. Here also state transitions are made until a state is reached which has no next state for the
current input symbol. The last input position at which the DFA entered an accepting state gives
the lexeme.
TESTING STRATEGY

A software testing strategy is a well-planned series of steps that results in the successful
construction of the software. It should be able to detect errors introduced in the specification,
design, and coding phases of software development. A software testing strategy always starts with
the code and moves in an upward direction. Thus a testing strategy can be divided into four phases:

• Unit Testing : Used for the coding phase
• Integration Testing : Used for the design phase
• System Testing : For system engineering
• Acceptance Testing : For user acceptance

Unit Testing

In computer programming, unit testing is a method of testing that verifies that the individual units
of source code are working properly. A unit is the smallest testable part of an application. In
procedural programming a unit may be an individual program, function, procedure, etc., while in
object-oriented programming, the smallest unit is a method, which may belong to a base/super
class, abstract class or derived/child class.

Benefits

The goal of unit testing is to isolate each part of the program and show that the individual parts are
correct. A unit test provides a strict, written contract that the piece of code must satisfy. As a result,
it affords several benefits. Unit tests find problems early in the development cycle.

Integration Testing

'Integration testing' (sometimes called Integration and Testing, abbreviated I&T) is the phase of
software testing in which individual software modules are combined and tested as a group. It
follows unit testing and precedes system testing.

Integration testing takes as its input modules that have been unit tested, groups them in larger
aggregates, applies tests defined in an integration test plan to those aggregates, and delivers as its
output the integrated system ready for system testing.
Purpose

The purpose of integration testing is to verify functional, performance and reliability requirements
placed on major design items. These "design items", i.e. assemblages (or groups of units), are
exercised through their interfaces using Black box testing, success and error cases being simulated
via appropriate parameter and data inputs. Simulated usage of shared data areas and inter-process
communication is tested and individual subsystems are exercised through their input interface.
Test cases are constructed to test that all components within assemblages interact correctly, for
example across procedure calls or process activations, and this is done after the individual
modules have been unit tested.

System Testing

System testing of software or hardware is testing conducted on a complete, integrated system to
evaluate the system's compliance with its specified requirements. System testing falls within the
scope of black box testing, and as such, should require no knowledge of the inner design of the
code or logic.

The purpose of integration testing, by contrast, is to detect inconsistencies between the software
units that are integrated together (called assemblages) or between any of the assemblages and the
hardware. System testing seeks to detect defects both within the "inter-assemblages" and within
the system as a whole.

Implementation & maintenance

Implementation

The final phase of the development process is the implementation of the new system. This phase
is the culmination of the previous phases and is performed only after each of the prior phases has
been successfully completed to the satisfaction of both the user and quality assurance. The tasks
comprising the implementation phase include the installation of hardware, proper scheduling of
the resources needed to put the system into production, and a complete set of instructions that
support both the users and the IS environment.

Coding

This means program construction: the procedural specification has been finished and the coding
of the program begins.

Once the design phase was over, coding commenced. Coding is the natural consequence of
design: the coding step translates a detailed design representation of the software into a
programming-language realization. The main emphasis while coding was on style, so that the end
result was optimized code.

The following points were kept in consideration while coding.

Coding style

The structured programming method was used in all the modules of the project.

It incorporated the following features:

The code has been written so that the definition and implementation of each function are
contained in one file.

Groups of related functions were clubbed together in one file so they could be included when
needed, saving the labor of writing them again and again.

Maintenance

Maintenance testing is testing performed to identify equipment problems, diagnose equipment
problems, or confirm that repair measures have been effective. It can be
performed at either the system level (e.g., the HVAC system), the equipment level (e.g., the
blower in a HVAC line), or the component level (e.g., a control chip in the control box for the
blower in the HVAC line).

Preventive maintenance

The care and servicing by personnel for the purpose of maintaining equipment and facilities in
satisfactory operating condition by providing for systematic inspection, detection, and correction
of incipient failures either before they occur or before they develop into major defects.

It includes maintenance, such as tests, measurements, adjustments, and parts replacement,
performed specifically to prevent faults from occurring.

Corrective maintenance

The idle time for production machines in a factory is mainly due to the following reasons:

• Lack of materials
• Machine fitting, cleaning, tool replacement, etc.
• Breakdowns
Taking into consideration only breakdown idle time, it can be split into several components:

Operator's inspection time: the time required by the machine operator to check the machine in
order to detect the reason for the breakdown, before calling the maintenance department.

Operator's repair time: the time required by the machine operator to fix the machine himself, in
cases where he is able to do so.

Maintenance dead time: the time lost waiting for the machine to be repaired by maintenance
personnel, from the time they start until the moment they finish their task.

OUTPUT
Bibliography

• C++ book by Steve Oualline
• C++ Cookbook by D. Ryan Stephens
• www.learncpp.com
• www.wikipedia.com
• www.progamiz.com
• www.tutorialpoint.com
• C++ book by E. Balagurusamy
