
ABSTRACT:

It has been observed that there is a gap between natural language (NL) and programming language (PL). To write a program, a programmer must know the syntax of the programming language. The purpose of the proposed model is to let the user communicate with the system by either speech or text, doing away with the syntactic structure of code. To make coding faster and more efficient, the application is designed to be interactive and based on speech recognition. First, the user's instructions are taken as speech or text; then the basic compiler phases of lexical, syntax, and semantic analysis are performed, after which code is generated. The design model targets the programming language C. A webpage serves as the code editor, and the instructions taken from the user are delivered for processing to a Flask server. Python libraries such as NLTK and ply are used to convert the speech or text into code, after which the code is returned to the client. For output generation, an LALR parser is used to specify the syntax. The results are analyzed to show the conversion of NL to PL, and efficiency is measured with the help of precision, correctness, and recall.

SYSTEM DESIGN:
Various researchers have worked on this topic and offered suggestions. A compiler performs some basic compilation phases, including lexical, syntax, and semantic analysis, after which intermediate code is generated. After optimization, this intermediate code is used to produce machine-level instructions. The user first interacts with the system by either typing or speaking the instructions. If the user chooses to interact by voice, the system first listens to the user's voice and converts the spoken instructions to text. The second step is performed by the lexical analyzer, which extracts the required keywords from the textual instruction. Syntax analysis converts the textual instructions into code. Semantic analysis is then performed on the code to check whether the declarations and statements used in it are semantically correct. After these three phases of lexical, syntax, and semantic analysis, the code is generated.
A. SEARCH ENGINE
1) SPEECH DETECTION:
VAD stands for Voice Activity Detection, a technique that identifies voice input. It determines whether the input voice of the user has been recorded; if so, the recorded frames are sent to the speech recognition module. To increase speech recognition accuracy, the input voice should have a low level of noise. The working of the VAD system is shown in the figure below. Energy and duration are extracted from segments to determine whether they contain speech.
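The energy-based frame classification described above can be sketched as follows. The frame length, energy threshold, and input format are illustrative assumptions for this sketch, not parameters taken from the proposed system.

```python
# Minimal sketch of energy-based voice activity detection (VAD).
# Frames whose energy exceeds a threshold are kept as speech and
# would be forwarded to the speech recognition module.

def frame_energy(samples):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in samples) / len(samples)

def detect_speech(signal, frame_len=160, threshold=0.01):
    """Split the signal into fixed-length frames and keep the
    frames whose energy exceeds the threshold."""
    voiced = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        if frame_energy(frame) > threshold:
            voiced.append(frame)
    return voiced

# A loud burst surrounded by near-silence: only the burst survives.
quiet = [0.001] * 160
loud = [0.5] * 160
print(len(detect_speech(quiet + loud + quiet)))  # 1
```

A real VAD would also use segment duration, as the text notes; this sketch keeps only the energy criterion for brevity.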

2) SPEECH ANALYSIS:
Features are extracted from the input voice and passed to the speech-to-text module. The frequency bands of the speech are separated and the volume of the speech signal is normalized; the signal is then broken down into segments in order to match them against language phonemes.
3) SPEECH TO TEXT:
The figure below shows the working of an automatic speech recognition system. The audio signals are filtered to remove detected noise. A sound wave includes features such as amplitude, frequency, and the pitch of the voice. These features are analyzed, and the sound each signal represents is determined by similarity. To ensure minimal loss, the words used for prediction should be as accurate as possible.

B. CODE EDITOR:
The code editor is used as an interface to view commands and the generated code. The user can correct the code directly and write any additional commands that are needed.

C. LEXICAL ANALYSIS:
1) KEYWORD IDENTIFICATION:
Keywords identify which coding construct is used in the proposed system; they include for, if, else, and so on. The system first identifies the keywords, and the parameters are then identified in order to generate the code.
The figure below shows the working of keyword identification. After collecting a list of candidate keywords, a score is calculated for each word. The word score is the ratio of degree to frequency. The degree of a word is the total number of its co-occurrences with the other words in the candidate keywords, and the frequency is the total number of occurrences of the word in the list. The candidate keywords are then filtered, and the keywords selected are those whose scores are above a threshold, such as the mean score of all the candidate keywords.
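The degree/frequency scoring described above can be sketched as follows. The candidate phrases are assumed to be already extracted, and the example phrases are invented for illustration.

```python
from collections import defaultdict

# Sketch of the word-score keyword selection described above:
# score(w) = degree(w) / frequency(w), keeping words above the mean.

def score_words(candidates):
    freq = defaultdict(int)    # total occurrences of each word
    degree = defaultdict(int)  # co-occurrence count of each word
    for phrase in candidates:
        words = phrase.split()
        for w in words:
            freq[w] += 1
            degree[w] += len(words)  # w together with every word in its phrase
    return {w: degree[w] / freq[w] for w in freq}

def select_keywords(candidates):
    scores = score_words(candidates)
    mean = sum(scores.values()) / len(scores)
    # keep only the words scoring above the mean of all candidates
    return {w for w, s in scores.items() if s > mean}

print(select_keywords(["print variable", "print", "declare integer variable"]))
```

This scoring is in the style of the RAKE keyword-extraction method; words that co-occur in longer candidate phrases earn a higher degree and therefore a higher score.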
2) REMOVAL OF STOP WORDS:
Stop words can be removed to increase processing speed, because they do not contribute to the semantic meaning of a sentence. When the user enters a command, it is broken down into words; each word is checked against the list of stop words and removed if present.
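The filtering step described above can be sketched as follows. The stop list here is a tiny illustrative subset; the implementation uses NLTK's full list.

```python
# Sketch of stop-word removal from a user command.
STOP_WORDS = {"the", "a", "an", "in", "of", "to", "is"}

def remove_stop_words(command):
    words = command.lower().split()
    # keep only the words that carry meaning for code generation
    return [w for w in words if w not in STOP_WORDS]

print(remove_stop_words("print the value of a variable"))
# ['print', 'value', 'variable']
```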
3) TEXT TO NUMBER CONVERSION:
If the user wants a number to be printed, the text is extracted from speech and then converted to the corresponding number.
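A minimal stand-in for this conversion is sketched below. It handles only simple tens-and-units numbers and is illustrative only; the implementation section later uses the word2number module for the full job.

```python
# Sketch of text-to-number conversion for simple two-word numbers.
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

def words_to_number(text):
    total = 0
    for word in text.lower().split():
        if word in TENS:
            total += TENS[word]
        elif word in UNITS:
            total += UNITS[word]
        else:
            raise ValueError("unrecognized number word: " + word)
    return total

print(words_to_number("twenty one"))  # 21
```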

D. SYNTAX ANALYSIS
1) TAGGING OF POS:
POS stands for part of speech; the figure below shows the POS tagging process. The lexical analyzer breaks the user command down into tokens, and every token is assigned a POS tag.

2) PARAMETERS DETERMINATION:
After POS tags are assigned to the words, the parameters needed to construct a specific piece of code are determined. Parameters include nouns, decimal numbers, and verbs such as check, iterate, etc.
3) RELEVANT CODE STRUCTURE MAPPING:
The lexical analyzer determines the keywords and the syntax analyzer determines the parameters; both are used to map the user input to a particular code construct.

E. SEMANTIC ANALYSIS
1) SYMBOL TABLE:
The symbol table keeps track of variables: their number, type, and value. This helps the system determine whether a variable has already been used and whether a new variable needs to be created.
2) CHECKING TYPE:
To determine whether the constructed code is semantically valid, the type of each variable used must be known. Type checking also lets us verify the reusability of variables.
3) IDENTIFIERS CREATION:
The symbol table holds the name, type, and value of each identifier the user creates. This makes it easy to look up and retrieve the variable's data later.
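The symbol table described in this section can be sketched as a small dictionary-backed class. The method names and the example variable are assumptions of this sketch, not the system's actual interface.

```python
# Sketch of the symbol table: it records each variable's name, type,
# and value, so the system can tell whether an identifier already
# exists before creating a new one, and can fetch its type for
# semantic checking.
class SymbolTable:
    def __init__(self):
        self._symbols = {}

    def declare(self, name, var_type, value=None):
        if name in self._symbols:
            raise ValueError("'" + name + "' is already declared")
        self._symbols[name] = {"type": var_type, "value": value}

    def exists(self, name):
        return name in self._symbols

    def type_of(self, name):
        # used when type-checking a constructed statement
        return self._symbols[name]["type"]

table = SymbolTable()
table.declare("count", "int", 0)
print(table.exists("count"), table.type_of("count"))  # True int
```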

F. CODE GENERATION:
1) CODE CONSTRUCTION FROM KEYWORDS AND ATTRIBUTES:
Lex is used to map keywords and attributes in order to construct the code. Lex identifies the patterns used in the given text and specifies the different possibilities for each code construct using regular expressions.
2) APPROPRIATE CODE GENERATION:
Yacc is used to generate the appropriate code when a pattern matches in Lex. The corresponding code is generated by the Yacc grammar.
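The idea of matching a command pattern and emitting a C construct can be illustrated with a plain regular-expression sketch. This stands in for the Lex/Yacc stage only conceptually: the patterns and code templates below are invented for illustration, while the real system uses ply's Lex and Yacc machinery.

```python
import re

# Illustrative stand-in for the Lex/Yacc stage: each keyword owns a
# regular expression over the user's words, and a match emits the
# corresponding C construct.
RULES = [
    (r"print (\w+)", 'printf("%d", {0});'),
    (r"declare integer (\w+)", "int {0};"),
]

def generate_code(command):
    for pattern, template in RULES:
        match = re.fullmatch(pattern, command)
        if match:
            # fill the captured parameters into the code template
            return template.format(*match.groups())
    raise ValueError("no code construct matches the command")

print(generate_code("declare integer total"))  # int total;
```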

SYSTEM IMPLEMENTATION:
The backend of the proposed system is a Flask server, while the front end is an HTML website.

A.THE FRONT END LAYER:


1. INPUT: Users can send voice or type in natural language commands. The voice recognition software applied here is the JavaScript Web Speech API.

ALGORITHM FOR VOICE RECOGNITION:


Here, a function named webSpeechRecognition() is defined. Recognition begins once the specific language is set. When recognition is set to continuous, it continues even if the user takes a break. After recognition, the transcript is sent to the onResult event handler. The Web Speech API is event-based, which aids in the asynchronous processing of voice to text. Intermediate recognition results give the user real-time feedback.
Users may also write natural language instructions in the input text area. The command is delivered to the Flask server after the user has entered it.
2. CODE EDITOR: The outcome of the processing is shown in a box that serves as the code editor. The editor also displays the line number, cursor location, output code, and the number of characters in each line.
In the code editor, the user may do the following:

• Move to a specific line number.
• Move the cursor left.
• Move the cursor right.

These routines help ensure that the output code is placed in the proper location.

B.LEXICAL ANALYSIS:
The input command is analyzed by the lexical analyzer. The following measures are taken:

• It is determined whether the request is for "space," "open braces," "closed braces," "enter," or "semicolon." All of these are essential punctuation marks required for structure in C.
• The next step is to handle numbers. The input command may contain the text representation of a number instead of the numeral itself. The Python module word2number takes care of this: it converts number text (e.g., "twenty one") to a numeric value (21).
• A pre-processing step removes useless words (known as stop words) from the command. Examples of stop words are the, a, in, an, etc. In Python, the NLTK (Natural Language Toolkit) contains lists of stop words in 16 different languages.
• The input is checked to see whether any keyword is present. These keywords are the ones stored for the programming language. The appropriate code structure cannot be built without a keyword, and the system will throw an internal server error.

C.SYNTAX ANALYSIS:
The input command is tokenized first. NLTK's Averaged Perceptron Tagger is then used to tag parts of speech. The tags are used to generate a feature list: POS tags specify attributes such as numbers, operations, variable names, and so on. Once produced, the feature list is delivered to the Lex file for the identified keyword.
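The construction of a feature list from POS tags can be sketched as follows. The tiny hand-rolled tag map below is a stand-in for NLTK's averaged perceptron tagger, included only so the example runs without NLTK; the tags and feature names are illustrative assumptions.

```python
# Sketch of turning POS-tagged tokens into a feature list for the
# Lex file: cardinal numbers become "number" features, verbs become
# "operation" features, and the rest are treated as variable names.
TAG_MAP = {"print": "VB", "x": "NN", "5": "CD", "plus": "CC"}

def features_from_tags(tokens):
    features = []
    for token in tokens:
        tag = TAG_MAP.get(token, "NN")  # default to noun
        if tag == "CD":
            features.append(("number", token))
        elif tag == "VB":
            features.append(("operation", token))
        else:
            features.append(("variable", token))
    return features

print(features_from_tags(["print", "x"]))
# [('operation', 'print'), ('variable', 'x')]
```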

D.SEMANTIC ANALYSIS:
The symbol table consists of variables and their types. When the user refers to a variable, the symbol table is used to validate its type.

E.CODE GENERATION:
Each Lex file contains a number of different ways to generate a sentence for a given keyword.
The feature list corresponds to one of the Lex possibilities. For each appropriate Lex match,
Yacc creates a code structure.

RESULT AND ANALYSIS:


The outcomes of the implementation are covered in detail below, along with their analysis. A sample for calculating a number's factorial is provided in the figure below with the appropriate code. The processing of NL commands is also examined.
The figure shows a simple program to find the factorial of a number.
A.TIME CALCULATION:

This figure depicts the time it takes to process each natural language command and create the matching C code. The maximum processing duration is 0.329 seconds. This demonstrates that Lex and Yacc generate code in a short amount of time; as a result, the suggested system may be used in real time.
B. NL COMMAND PRECISION IN COMPILER:
NL commands constructed with higher accuracy demonstrate that the Lex files can handle a wider range of phrase forms, whereas lower accuracies indicate that the user must follow a certain structure when issuing instructions.
The diagram shows the precision of sample operations with the NL commands implemented in the system. Because looping statements may be expressed in semantically varied ways, they have an accuracy of roughly 82 to 96 percent. The multiple operations performed on the system demonstrate its high precision; the system has been designed to incorporate and identify a variety of statements.
C. CORRECTNESS MEASURED IN NL COMMANDS:
Precision is the fraction of correctly produced code out of all code generated:

Precision = TP / (TP + FP)

Here, FP denotes the number of mistranslated code sequences.

This figure shows that for NL instructions there is a high percentage of accuracy and a low probability of mistranslated code. Mistranslated code results in semantic problems, which are difficult to diagnose while performing operations. The system prevents this by being provided with a large number of keywords to recognize.

D.CALCULATION OF RECALL OF NL COMMANDS:


Recall is the percentage of correctly produced code relative to the number of input instructions delivered:

Recall = TP / (TP + FN)

Here, FN is the number of input commands that are not translated to code.
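The two definitions above can be checked with a short worked example. The counts used here are illustrative, not the paper's measured values.

```python
# Worked example of the precision and recall definitions:
# TP = correctly generated code constructs,
# FP = mistranslated constructs,
# FN = commands not translated at all.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

tp, fp, fn = 90, 5, 10
print(round(precision(tp, fp), 3))  # 0.947
print(round(recall(tp, fn), 2))     # 0.9
```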

The recall values in the figure above were obtained from the ratio of untranslated to translated words. The majority of constructions are fully translated. For the constructions that are not generated, more rules and criteria for code creation should be provided, which lowers the number of attempts needed to successfully generate the code.
E.PERCENTAGE OF ERROR IN CHARACTERS:
The percentage of error is determined by counting the missing or unneeded characters or words when a construct is expressed in NL.
These figures are summarized in the following table:

The compiler also handles these problems, displaying a warning to amend the words of the NL instruction for optimal code conversion.
CONCLUSIONS AND FUTURE WORK:
The system is designed to convert a human's natural language into the C programming language. As future work, Java and C++ are proposed for extending the natural language code generator to object-oriented PLs. To handle real-world problems with good accuracy and efficiency using OOP, multiple classes and nesting of those classes would be implemented. Furthermore, natural mapping and dynamic learning algorithms could be used to implement parsing.
