
International Journal of Information Processing, 9(3), 38-50, 2015

ISSN : 0973-8215
IK International Publishing House Pvt. Ltd., New Delhi, India

Object Oriented Analysis using Natural Language Processing concepts: A Review

Abinash Tripathy, Santanu Kumar Rath
Department of Computer Science and Engineering, National Institute of Technology, Rourkela,
Odisha, India, Contact: abi.tripathy@gmail.com, skrath@nitrkl.ac.in

The Software Development Life Cycle (SDLC) starts with eliciting the requirements of the customer in
the form of a Software Requirement Specification (SRS). The SRS document needed for software
development is mostly written in a Natural Language (NL) convenient for the client. From the SRS
document alone, the class names, their attributes and the functions incorporated in the body of each
class are traced, based on the prior knowledge of the analyst. This paper presents a review of Object
Oriented (OO) analysis using Natural Language Processing (NLP) techniques. The analysis can be manual,
where a domain expert helps to generate the required diagram, or automated, where a system generates
the required diagram from the input in the form of an SRS.

Keywords : Natural Language, Natural Language Processing, Object Oriented, Parts Of Speech, Software
Development Life Cycle, Software Requirement Specification.

1. INTRODUCTION

The Software Requirement Specification (SRS) document forms the basis of problem analysis between
client and developer. The SRS needs to be very specific, since it serves as the basis for proceeding
towards implementation of the desired software. Very often the SRS is expressed in a natural language
comprehensible to the client. As a result, it may be ambiguous, possibly inconsistent, and probably
unmanageably large from the software analyst's point of view.

Identifying the major functionalities from the OO analysis point of view plays an important role in
project success. Formal languages like the Unified Modeling Language (UML) have been applied to avoid
the inherent problems of natural language, such as incompleteness and ambiguity [1]. Earlier, analysis
was used to support an exploratory model known as the build-and-fix programming style. But this style
was observed to be very informal, and there is no set of rules as to which approach is superior. Every
programmer formulates his own software development technique, guided solely by his expertise and in
his own language and style [2].

In recent years, developers have preferred the object-oriented software development style over the
conventional style, as present-day software development languages are object oriented in nature.
Hence, OO analysis of software helps to find the candidates for classes, functions, and the attributes
associated with those classes.

Natural Language Processing (NLP) combines computer science and linguistics, and is concerned with the
interaction between computers and human languages [3]. Natural language generation systems mostly
extract the right information from statements that are in human-readable form.

The aim of this work is to present a review of the existing literature on the application of NLP in
Object Oriented Analysis (OOA), based on available literature such as: Abbott [5], Saeki and Enomoto
[6], Nanduri and Rugaber [7], Juristo and Moreno [8], Popescu et al. [9], Ibrahim and Ahmad [10],
Harmain and Gaizauskas [11], Overmyer and Rambow [12], Mich [13]. Among these, a few authors suggest
automated tools, others use a manual approach, and a few combine both tool and manual approaches to
obtain the different elements of OO analysis.

In order to examine each proposal, the following dimensions may be considered:

• Steps to modify the SRS into required form: As discussed above, the SRS is written in a form that is
  convenient to the user. During this step the SRS is modified into a format in which the process of
  finding the keywords becomes easier for analysts.

• Finding out the candidate for class and object from modified SRS: After transforming the SRS into
  the format required for analysis, the candidates for the class name and its details are traced out.
  The process of identifying the class name and its details can be manual or automated. In the manual
  process, the domain expert analyzes the text to bring out an "intermediate output"; the automated
  process then considers the intermediate output to generate the desired output.

The objective of this work is two-fold:

1. This analysis provides information on the available techniques for the use of NLP, to be considered
   further under OO analysis.

2. It provides an overview of the current state of the use of NLP in OO analysis, focusing on the
   strengths and weaknesses of existing proposals. Thus researchers can gain broad knowledge of the
   work that has already been done and that can still be carried out in this field.

The paper is organized as follows: Section 2 presents the existing proposals for the use of NLP in OO
analysis. Section 3 presents an analysis of all proposals. Finally, Section 4 offers some concluding
remarks and highlights future trends.

2. PROPOSAL OF THE USE OF NLP IN OO ANALYSIS

2.1. Historical review on NLP

Jones presented a review of NLP based on a historical study [4]. The paper reviews the study of NLP
from the late 1940s to the present by distinguishing four phases in the history of NLP. The phases are
distinguished, respectively, by the dominance of machine translation, the influence of artificial
intelligence, the adoption of a logico-grammatical style, and the attack on massive language data.

Ph 1: The early phase of study on NLP ran from the late 1940s to the late 1960s, and in this period
the focus was mainly on Machine Translation (MT). A noticeable amount of work was done in the USSR,
the USA, Europe and Japan during this period. Thus the languages considered for research in this
period were mostly Russian and English [14]. Language syntax was the main area of research, as
syntactic processing was manifestly necessary, and was partly carried out through implicit or explicit
endorsement of the idea of syntax-driven processing. Although the use of computers for literary and
linguistic study had begun during this period, it was never linked with NLP.

Ph 2: The next phase of study ran from the late 1960s to the late 1970s, and the work mainly focused
on the use of artificial intelligence (AI) in NLP, with much more priority on word knowledge and on its
implementation in the construction and manipulation of meaning representations. AI was mainly
considered during this period for the construction and addressing of knowledge bases or data. In the
late 1960s, the prevalent linguistic theory was transformational grammar, which provides the semantic
information about NLP.

Ph 3: This phase mainly covers the period from the late 1970s to the late 1980s
and is characterized as the grammatico-logical phase. The need to develop grammatical theory and the
movement towards incorporating logic in knowledge representation and reasoning were triggered during
the 70s. During this phase, deliberate attempts were made to transform commercially available
dictionaries into machine-readable form, which further helps in text corpora validation and
customizing lexical data.

Ph 4: The fourth and final phase covers the study carried out from the late 1980s onwards. During this
phase, the main area of research is statistical language data processing. The identification of
linguistic occurrences and patterns in the corpus, for both syntactic and semantic analysis, drew
interest in this period. Present attention is on the lexicon, on retrieving statistical information,
and on restored interest in MT.

Table 1 compares the NLP research based on time period.

Table 1
Comparison of NLP Research Based on Time
Phase   | Time period              | NLP research
Phase 1 | Late 1940s to late 1960s | Machine Translation (MT), language syntax analysis.
Phase 2 | Late 1960s to late 1970s | Use of AI, implementation of word knowledge in the construction of a knowledge base or data.
Phase 3 | Late 1970s to late 1980s | Grammatico-logical analysis, transformation of dictionaries to machine-readable form and text corpora validation.
Phase 4 | Late 1980s onwards       | Statistical data processing, both syntactic and semantic analysis of corpus, and restoration of MT.

2.2. OO Analysis using NLP approaches

In this section, the studies made by various authors are analyzed on the basis of how they transform
the SRS and how the candidates for the class name and its details are found from the transformed SRS.

2.2.1. Abbott, 1983 [5]

The study by Abbott proposes a method to derive the elements of object oriented analysis, i.e., data
types, variables, operators, attributes and candidates for class names, from English statements. The
paper shows the process of analyzing the English statement of the SRS and helps to generate the
elements of OO analysis.

The approach used in this paper is divided into three different sections, as follows:

i. Development of informal strategy for the problem: The informal strategy should suggest the problem
   solution on the conceptual level. This step expresses the solution of the problem in terms of the
   problem domain.

ii. Formalize the Strategy: The second step is formalizing the solution by finding its data types,
   objects, operators, and control constructs. The steps of formalization are:

   (a) Identify the data types: The data types are suggested by the common nouns. A common noun is the
       name of a class of beings or things.
   (b) Identify the objects of those types: Objects are suggested by proper nouns or direct
       references. A proper noun is the name of a specific being or thing. A direct reference refers
       to a specific, previously identified being or thing without necessarily referring to it by name.
   (c) Identify the operators for the objects: Operators are suggested by a verb, attribute, predicate
       or descriptive expression. An attribute is a property, association, characteristic or component
       of something. A predicate designates a property or relation that can be considered true or
       false, i.e., to hold or not to hold. A descriptive expression is a characterization for which
       there may be some particular object.
   (d) The control structures are directly provided by the English language.

iii. Segregate the solution into two parts, a package and subprogram: The package will contain the
   formalization of the problem domain, that is, the data types and their operators. The
   subprogram(s) will then contain the specific steps (expressed in terms of the data types and
   operators defined in the package) for solving the particular problem.

During the course of the paper, different types of nouns are analyzed, i.e., the differences between
common nouns, proper nouns, direct references and mass nouns are explained. Classes of objects are
referred to by common nouns, whereas specific, individual objects are referred to by proper nouns.
Mass nouns are names of qualities, substances, and activities that do not have an a priori
organization into individual units or instances.

In this paper Abbott takes the example of "Calculating the days between two given dates" to explain
his technique. As per the analysis in the paper, the process is divided into three steps:

• In step 1, an informal strategy for the problem analysis is provided. In this step the process of
  reaching the detailed solution is analyzed.
• In step 2, the data types, objects, operators, and control structures are found for the specific
  problem from the informal specification.
• In step 3, the final solution is prepared. The package for the problem is assigned and the
  subprogram details are provided in this step.

Although this approach makes it comparatively easy to find the candidate for the class name and its
details, the process is manual. A software engineer with good knowledge of the domain is required to
provide the step-wise solution of the problem.

2.2.2. Saeki et al., 1989 [6]

The paper by Saeki et al. discusses the process of deriving a formal specification from an informal
specification written in natural language. The informal specification contains important information
leading to its formal specification or the prototype program. The similarity between the structure of
the words and the structure of the software components is then analyzed. In this paper, the "Lift
Control System" example is used as an informal specification to explain the process. The process
consists of three major steps, as follows:

i. Design Activity: The purpose of this step is to construct a module design document from the
   informal specification. The module design document presents the modular structure of a formal
   specification, i.e., the external design of class modules, which contain class names, method names
   and message protocols. This design activity consists of several sub-activities. Each of them
   produces an intermediate product using the informal specification. The intermediate products
   obtained are as follows:

   • Noun table: This table contains information about extracted nouns. According to the authors,
     nouns can be classified into different groups: a Class noun identifies an object or set of
     objects, a Value noun identifies a value or set of values, an Attribute noun identifies an
     attribute of the objects and an Action noun identifies the actions to be carried out.

   • Verb table: This table contains information about extracted verbs, i.e., verb names, their
     categories, their subjects and objects. According to the authors, verbs can also be classified
     into different groups. A Relation verb specifies the relation between objects or between objects
     and their attributes. A State verb specifies the internal state of the object or the attribute
     values of the object. Action
     verb specifies the action to the objects, and an Action relational verb specifies the relation
     between the actions.

   • Action table: This table presents the extracted actions, their agents, target objects and the
     input/output parameters associated with them. For each action verb, there is always an agent and
     its target object. The action verb changes the state of the target object. In order to extract a
     target object from a sentence, verb patterns that appear in various kinds of natural language
     specification are examined.

   • Action Relation table: The informal specification, its verb table and its action verbs are
     needed to identify the relationships between the actions. For every action to be performed, the
     sender, the receiver and the message transmission between them need to be identified. In this
     paper, the authors use a rule called the "action relation rule" to generate a few candidates for
     a sender-receiver pair.

   • Module Design Document: A module design document is constructed using the above-mentioned
     tables. The noun table helps to identify objects and their attributes. The verb table is used to
     extract the relationships among objects and the kind of attributes each object possesses. The
     module design document can have both graphical and textual representations. Both are based on
     the syntax of the formal specification language TELL and the object oriented language
     Smalltalk80.

ii. Elaborate - Design Activity Cycle: The task of this activity is to refine and rewrite the informal
   specification as per the module design document. The output of this activity is a natural language
   description which is accurate, detailed and structured, yet informal. Whenever the elaborate-design
   cycle is carried out, a pair consisting of an informal specification and its module design
   document is obtained. The elaborate activity consists of sub-activities such as paste and refine,
   and an intermediate product, a paste document, is generated.

   • Paste: During this phase, the sentences of the informal specification may be paraphrased into
     accurate sentences, which then replace the original ones. In the updated expression, the nouns
     and verbs are extracted as classes, and as attributes or methods, respectively.

   • Refine: The informal specification of each pasted module is constructed during the refine
     activity. During this phase, the internal behavior and properties of each class module are
     rewritten, and these are used for the construction of each class.

   • Design activity for elaborate Informal Specification: Before this activity, the module design
     document for each class is constructed from the elaborated informal specification. During this
     activity, each class module and method module is broken down into small sub-modules to realize
     the internal behavior and properties.

   The cycle continues until a formal specification is obtained from the informal specification. The
   requirements need to be refined and made simpler and smaller during this cycle.

iii. Software process based on Natural Language: During this process the design and elaborate
   processes are embedded. Before this phase, the informal specification has already been converted
   to a formal specification; the remaining steps are as follows:

   • Analyze activity: This step acquires a problem description by means of interaction between
     customer and developer. An informal specification is obtained as the output of this step.

   • Evaluate activity: During this activity, the obtained formal specification is executed and
     verified. The diagnostic document keeps a record of the execution and verification.

   • Evolve activity: During this activity, the developers check the diagnostic document and find
     whether something is to be modified or not. If so, they create a new version of the informal
     specification; this process is called the "evolve activity". From this new document, a new module
     design document, a new formal specification and a new diagnostic document are produced.

   • Instantiate activity: When the developers judge that the formal specification meets the
     customer's needs, they finally produce the concrete program code. This activity is called the
     instantiate activity. A formal specification is considered to be a generalization or abstraction
     of program code.

In this paper, the authors present a software design process based on natural language and obtain a
formal specification from an informal one through the design and elaborate activities. The technique
used in this paper is verb-oriented, which has an impact on the dynamic nature of the informal
specification. Along with this, nouns are also needed to extract the class hierarchy.

2.2.3. D Popescu et al., 2008 [9]

This paper by D Popescu et al. proposes an approach to help the writer or reviewer identify the
ambiguities in an NL SRS. They propose a tool named "Dowser", which creates an OO diagram from an NL
SRS. Their approach consists of three steps:

Step 1: The NL SRS is parsed according to a constraining grammar.

Step 2: From the relationships obtained after parsing, the tool creates the elements of the
        object-oriented analysis model of the specified system.

Step 3: The diagram of the model is generated and reviewed by a human reviewer to detect any
        inconsistency or ambiguity.

The authors prefer the use of a model over analysis of the whole NL SRS for the following reasons:

• All software companies, irrespective of their domain, use NL SRS for the description of software
  systems.
• An Object Oriented Analysis Model (OOAM) shows the concepts and the relationships among them.
• The OOAM for each sentence is identified, while OO design selects parts of the sentences, such as
  classes from subjects, attributes from adjectives and methods from verbs.

During the course of the paper, the authors use a few concepts, such as:

• Constraining Grammar: It is different from the internal grammar that the parser uses. The
  constraining grammar attempts to reduce the number of ways a statement can be represented and also
  tries to make it unambiguous. On the other hand, the NL parser grammar only checks the legitimacy
  of the sentence, without checking whether it is ambiguous or not.

  This paper uses the constraining grammar proposed by Juristo et al. in their paper [8]. It attempts
  to generate an unambiguous mapping from this grammar to the OOAM. The constraining grammar
  influences the structure of the NL SRS, as it uses simple sentences consisting of subject, object
  and verb.

• Natural Language Parsing: As the OOAM is generated automatically using syntactic information,
  parsing of the NL SRS is needed to obtain the required information. The parser uses a link grammar
  that consists of a set of words. These words act as terminal symbols and have different linking
  requirements.

• Transformation Rules: These rules help to transform the obtained syntactic information into the
  targeted OOAM. Dowser has thirteen different transformation rules; the most generally used rules
  are:

  – If the sentence contains both a subject and an object link after parsing, then two different
    classes are created with an association named after the verb.
  – To find aggregation: if the parsed sentence contains both a subject and an object link and the
    verb is one of "have, possess, contain or include", then the object is aggregated to the subject.
  – "If" or "when" always represents the start of an event. If an event is detected by Dowser and the
    main clause consists only of a subject link, then a class is created from the noun present in the
    subject link and the verb acts as a method of it.
  – If a genitive is detected by Dowser, two different classes are created with one linking
    aggregation. In order to decide whether both become aggregated classes or one becomes an
    attribute of the other, both syntactic and semantic information about the sentence is needed.
  – Although active clauses are preferred in an NL SRS, passive clauses still help to describe
    relations and states. The passive verb and its connecting word describe the association.

  Dowser applies two post-processing rules after transforming the NL SRS. These are:

  i. It converts all classes that are aggregated to another class into attributes of that other class.
  ii. It removes the class "system" from the OOAM.

• Domain-Specific Terms (DST): A special domain data dictionary is needed by the parser to interpret
  the DST. But such a dictionary does not exist for every domain. Hence, to improve DST recall, the
  link grammar has a guessing mode that uses the syntactic role of unknown terms.

• Diagramming OOAM: The textual OOAM is created using the previous steps. The tool UMLGraph transforms
  the textual information into a dot file, which is then transformed into a graphic format using the
  Graphviz tool.

• Interpretation of OOAM: In this step, a human analyst checks for ambiguity in the generated diagram.
  The defects that can be found in the OOAM are as follows:

  – An association can be ambiguous; so, the analyst checks whether different classes transmit
    messages to the same class or not.
  – The classes should represent only one concept. As Dowser does not support generalization,
    concepts such as cash payment and on-line payment are not combined to form payment as a whole,
    but are identified as two different classes.
  – If the classes have attributes that are not of primitive type, a proper definition is added so
    that each can be represented in its own class.
  – A class must be associated with another class, otherwise it becomes unspecified.
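
The most frequently used transformation rules and the two post-processing rules described above can be
illustrated with a small Python sketch. It is not Dowser's implementation: it assumes that link-grammar
parsing has already reduced each sentence to a hypothetical (subject, verb, object) triple, with the
object set to None when only a subject link is present. The names Model, transform and post_process
are invented for this illustration, and genitive and passive handling are left out.

```python
from dataclasses import dataclass, field

# Verbs that signal aggregation, as listed in the aggregation rule above.
AGGREGATION_VERBS = {"have", "possess", "contain", "include"}


@dataclass
class Model:
    """A tiny textual stand-in for an OOAM (hypothetical structure)."""
    classes: dict = field(default_factory=dict)     # name -> attributes/methods
    associations: set = field(default_factory=set)  # (subject, verb, object)
    aggregations: set = field(default_factory=set)  # (whole, part)

    def add_class(self, name):
        return self.classes.setdefault(name, {"attributes": set(), "methods": set()})


def transform(clauses):
    """Apply the subject/object, aggregation and subject-only rules."""
    model = Model()
    for subject, verb, obj in clauses:
        model.add_class(subject)
        if obj is None:
            # Only a subject link: the verb becomes a method of the class.
            model.classes[subject]["methods"].add(verb)
            continue
        model.add_class(obj)
        if verb in AGGREGATION_VERBS:
            model.aggregations.add((subject, obj))
        else:
            model.associations.add((subject, verb, obj))
    return model


def post_process(model):
    """Apply the two post-processing rules to the model in place."""
    # Rule i: a class aggregated to another class becomes its attribute.
    for whole, part in sorted(model.aggregations):
        if whole in model.classes:
            model.classes[whole]["attributes"].add(part)
        model.classes.pop(part, None)
    model.aggregations.clear()
    # Rule ii: the class "system" is removed.
    model.classes.pop("system", None)
    # Simplification of this sketch: drop associations whose ends no longer exist.
    model.associations = {
        (s, v, o) for s, v, o in model.associations
        if s in model.classes and o in model.classes
    }
    return model


clauses = [
    ("customer", "place", "order"),
    ("order", "contain", "item"),
    ("system", "validate", "order"),
    ("alarm", "ring", None),
]
model = post_process(transform(clauses))
print(sorted(model.classes))   # ['alarm', 'customer', 'order']
print(model.associations)      # {('customer', 'place', 'order')}
```

In this example "item" ends up as an attribute of "order", and the "system" class disappears together
with its association, which mirrors the intent of the two post-processing rules.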

This paper only supports the static behavior/relationship of the OOAM present in the NL SRS. It does
not handle behavioral modeling.

2.2.4. Ibrahim and Ahmad, 2010 [10]

This paper by Ibrahim and Ahmad proposes a method to facilitate the requirement analysis process and
the extraction of class diagrams from requirements using NLP and domain ontology. The authors propose
a tool named "Requirements Analysis and Class Diagram Extraction (RACE)" that analyzes the textual
requirements, finds the relationships and finally extracts the class diagram.

The RACE system consists of different internal and external components or sub-systems. These can be
described as follows:

i. OpenNLP Parser: The OpenNLP parser is used in this paper for lexical and syntactic parsing. The
   parser takes English text as input and provides the corresponding POS tag for each word as output.

ii. RACE Stemming Algorithm: Stemming is the process of removing affixes (such as suffixes) from a
   word and generating the base word. The generated base word reduces redundancy and increases
   efficiency.

iii. WordNet: It is used to validate the semantics of the sentences generated after syntactic
   analysis. It also helps to display hyponyms for a selected noun, which helps to identify the "a
   kind of" relationship.

iv. Concept Extraction Engine: This module is used to extract concepts from the requirement document.
   The algorithm for this module is as follows:

   Step 1. The requirement document is taken as input.
   Step 2. Stop words are identified and stored in the Stop-words Found list.
   Step 3. Calculate the frequency of each word in the document, except the stop words.
   Step 4. Use the RACE stemming algorithm to stem each word and store the stems in a list.
   Step 5. Use OpenNLP to parse the whole document.
   Step 6. From the parsed output, extract the words with the POS tags Proper Noun (NN), Noun Phrase
           (NP) and verb (VB) and store them in the Concept-list.
   Step 7. For each concept in the Concept-list, if any other concept is a synonym of the present
           one, then both are taken to be semantically related.
   Step 8. For each concept in the Concept-list, if any other concept is a hyponym of the present
           one, i.e., lexically the same, then the former is taken to be a kind of the latter and is
           saved in the Generalization-list.

v. Domain Ontology: It is used to improve the performance of concept identification. In the RACE
   system, a library system ontology is used. XML is used to build the ontology.

vi. Class Extraction Engine: The input to this module is the output of the "Concept Extraction
   Engine". During this step, the authors use some heuristic rules to extract the class diagram. The
   rules are as follows:

   • Class Identification Rules: The rules used for extraction of classes are:

     – If the concept occurs only once or its frequency is 2%, then the concept is ignored as a class.
     – If the concept is related to design elements, a location name or a person name, then it is
       ignored as a class.
     – If the concept is found at a high level of the hypernym tree or is an attribute, then it is
       ignored as a class.
     – If the concept is a noun phrase and the second part is an attribute, then the first part is
       considered for the class name.

   • Attribute Identification Rules: The rules for attribute identification are as follows:

     – If the concept is a noun phrase including an underscore between two nouns, then the first noun
       is a candidate for the class name and the second part is an attribute of that class.
     – If the concept has only one value, then it is an attribute.

   • Relationship Identification Rules: The rules for relationship identification are as follows:

     – Using step 8 of the concept extraction engine, the elements of the Generalization-list are
       transferred as Generalization (is-a) relationships.
     – If there exists a sentence of the form (CT1-VB-CT2), where CT1 and CT2 are classes, then VB is
       an association.
     – If the sentence is of the form CT1+R1+CT2+"AND"+CT3, where CT1, CT2 and CT3 are classes and R1
       is the relationship, then the relationship exists between (CT1, CT2) and (CT1, CT3).

vii. RACE Concept Management: User interaction is important in the RACE system. The UI helps to
   perform tasks such as creating and printing requirements, and acts as an interface to add, modify,
   view and organize relationships.

The RACE system is implemented in C#, MS Access is used for database operations, and textual
requirements can be opened from Word documents, text files, rich text files and HTML files.

2.2.5. Overmyer and Rambow, 2001 [12]

This paper by Overmyer and Rambow proposes a tool called Linguistic assistant for Domain Analysis
(LIDA) that provides linguistic assistance for the model development process. This tool helps to
obtain the OO model for a domain using UML. In order to perform this task, a large volume of text from
a "legacy system" is collected. The LIDA tool offers the following features:

• Domain-independent linguistic processing is used to group different forms of base words using POS
  and to find multi-word terms.
• The final output is in the form of full text, word list and UML model in parallel, so the user can
  compare all of them.
• The Key Word In Context (KWIC) view displays the words or groups of words in sentences.
• A hypertext description of the model is used to help in documentation of the model.
• A completed model can be exported to any CASE tool, and any model can be imported from any CASE tool
  into LIDA.

LIDA consists of the following components:

i. Text analysis environment: This is the main component of LIDA, as it provides the central
   functionality. The main functions this component performs are:

   • It takes the text input in RTF and ASCII format.
   • It then assigns a POS tag to each word. For POS tagging, it uses MXPOST, a software tool
     developed at the University of Pennsylvania, USA.
   • The base word is obtained from each word and its frequency is calculated.
   • Multi-word phrases are checked for a given base word.
   • Users are allowed to mark words or phrases as candidate model elements, and these words are
     highlighted in the text.
   • The textual context of marked words is retrieved.

Model editing environment: This component offers the functionality required to generate a model from
the candidate model elements marked in LIDA's text analysis environment. The functional features of
this component are:

   • Display the list of marked candidate model elements and add them to the model editing
     environment. The transfer of information between the text analysis environment and the model
     editing environment helps the developer to analyze the problem in detail.
   • Suggest operations for combining elements into a class model.
   • Add textual context helpful to the model builder.
   • Generate a textual description of the model for documentation and for validation of the model
     with a domain expert.

ii. LIDA text description: LIDA uses ModelExplainer, an integrated tool, which generates the hypertext
   description document for the object model. This document is generated from customized text which
   includes class information such as superclass, subclass, attributes, operations and associations
   with other classes. These descriptions help to obtain additional information about the final
   result.

Table 2 provides a comparative analysis of the approaches for obtaining the elements of OO analysis
from an SRS.

3. ANALYSIS OF APPROACHES

At present, object oriented systems are widely applied in software development [15-19]. The customer
states all requirements in a document called the Software Requirement Specification. This SRS document
is written in NL, which is understandable on the customer side, but it is sometimes incomplete and
ambiguous. The development team needs to go through this document, generate UML diagrams and analyze
them on the basis of OO analysis. UML, which is very often used for OOA, tries to fix the class
diagram, where the class is the basic element of an OO system.

Over the course of this review, it can be seen that different approaches are adopted to generate the
class diagram and its corresponding details. These approaches can be described as follows:

• The software requirement document is considered as an input for the analysis.
• As it is written in NL, it contains some ambiguities or unwanted informa-
48 Abinash Tripathy and Santanu Kumar Rath

Table 2
Comparison of generation of OO elements from SRS using manual approach

Authors Proposed Approach Advantage Limitation

Abbott [5] This paper analyzes the English statement of SRS Comparatively easy to find out the candidate Domain knowledge is required for the analysis.
and generate elements of OO analysis. Identifying for class and it’s details
data type, objects, operators and Control structure

Saeki et al. [6] This paper derive formal specification from informal The informal SRS document is refined and The large size of informal specification may be
specification in English and from that obtain ele- rewritten to a formal document understand- a concern and also further analysis on nouns
ments of OO. Generate Noun table, Verb table, Ac- able for everyone. needed as the proposed approach is mainly
tion table, Action Relation table and Module Design verb oriented.
document from formal specification

Nanduri and Rugaber [7] This paper extract the candidate objects, methods A graphical model is generated from the re- Parser inadequacy, ambiguous and incomplete
and its association from requirement document then quirement document specification and lack of domain knowledge
composing them to generate object model. Use link makes the final result unsatisfactory.
grammar based parser to parse sentence and generate
the object diagram from knowledge gained.

Juristo et. al.[8] This paper uses the linguistic information from infor- The proposed approach prevent incorrect As the process totally depend on requirement
mal specification. Analyzes the information semanti- modeling construct and model can be repeat- specification, an assumption taken that the
cally and syntactically and finally apply semi-formal able textual document is correct.
procedure to obtain OO system component.

D. Popescu et al. [9] This paper identify the ambiguities in NL SRS. The The OOAM diagram is generated using tool, Only the static behavior of SRS is considered,
proposed “Dowser” tool use constraining grammar, again verified by human analyst for better ac- it does not manage modeling behavior.
NL parsing and Transformation rule to generate the curacy.
Object model.

Ibrahim and Ahmad [10] This paper uses the requirement analysis process and RACE find the concepts based on nouns, noun It could not find out one to one, one to many
extract class diagram using NLP and Domain On- phase and verb analysis. It can able to find or many to one relationship and RACE is not
tology. The proposed RACE tool analyzes textual generalization, association , composition, ag- platform independent, it works only in win-
requirements, finds relationship among them and fi- gregation and dependency relationships. dows platform.
nally generate class diagram.

Harmain and Gaizauskas [11] This paper uses CM-Builder tool for OO analysis. The proposed model used different linguistic The final output is obtained in CDIF form
After analysis of software requirement document a technique to analyze and define rule to gen- which is not understandable by everyone and
discourse model is designed from which the object erate candidate for class model. So, the am- a CASE tool supporting CDIF needs to gen-
class and relationships is generated. biguities present in the software requirement erate class diagram graphically
document do not hamper the result.

Overmyer and Rambow [12] The paper uses LIDA tool to provide linguistic in- It provides a graphical approach to analyze The text analysis carried out is mostly manual
formation assistance in model development process. the text and have features that can simplify so it is time taking and the analyst should be
This assistance facilitates the analysis and extent the the process of class generation. a domain expert which is quite difficult to find
creation of class model. out.

Mich [13] This paper uses an NL-OOPS tool based on It provides an graphical interface which make In order to make the analysis fully automated,
LOLITA. The OO modeling module, use algorithms the process of generation quite easy. Again a senior analyst have to control the output and
that filter the entity and event nodes generated by this tool can be very easily integrated to other the final output class model is not at par with
LOLITA and identify classes and associations. CASE toolto support lower level development. the UML class diagram.

tion. So, in order to remove that different • After obtaining the root noun words, the
steps are carried out in all papers. higher frequency nouns are considered
and they are the most eligible onces for
• Each words in the text is tagged with a fixing class name.
POS. Then the words are combined to-
gether depending upon their POS. • For operations in class, the verbs present
in the sentence are the best candidate.
• The noun and verb tagged words are • The Adjectives present in text act as an
mainly used for class name and their op- attribute for the class for that noun it
erations respectively. So these words are tries to modify.
then stemmed to obtain the root word
and their suffixes. • In order to find the relationship between
classes, the relation between the subject and object of a sentence is found out.
• For other rules such as multiplicity, determiners are used that specify whether the relationship is one-one, one-many, many-one or many-many.

4. CONCLUSIONS AND FUTURE SCOPE

Different tools have been developed to analyze the text, but there is no exhaustive dictionary that provides the POS for each word. Although a few tools generate the class diagram, different authors suggest that manual intervention is needed to improve the final result. Unless there are specific rules for writing the SRS document, ambiguities will continue to be present in it and cause issues in analyzing the SRS. Though many approaches have been proposed and used to obtain the elements of OO analysis, there is still scope for research in this area. Automated understanding of an SRS written in informal NL also remains an open research issue.

REFERENCES

1. J Rumbaugh, I Jacobson and G Booch. Unified Modeling Language Reference Manual, Pearson Higher Education, 2004.
2. R S Pressman. Software Engineering: A Practitioner’s Approach, McGraw-Hill, New York, 7, 2010.
3. E Kumar. Natural Language Processing, IK International Pvt Ltd, 2011.
4. K S Jones. Natural Language Processing: A Historical Review, in Current Issues in Computational Linguistics: in Honour of Don Walker, Springer, pages 3–16, 1994.
5. R J Abbott. Program Design by Informal English Descriptions, Commun. ACM, 26(11):882–894, Nov. 1983.
6. M Saeki, H Horai and H Enomoto. Software Development Process from Natural Language Specification, in Proceedings of the 11th International Conference on Software Engineering, ser. ICSE ’89, New York, NY, USA: ACM, pages 64–73, 1989.
7. S Nanduri and S Rugaber. Requirements Validation via Automated Natural Language Parsing, in Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences, 3, IEEE, pages 362–368, 1995.
8. N Juristo, A M Moreno and M López. How to Use Linguistic Instruments for Object-Oriented Analysis, IEEE Software, 17(3):80–89, 2000.
9. D Popescu, S Rugaber, N Medvidovic and D M Berry. Reducing Ambiguities in Requirements Specifications via Automatically Created Object-Oriented Models, in Innovations for Requirement Analysis. From Stakeholders’ Needs to Formal Designs, Springer, pages 103–124, 2008.
10. M Ibrahim and R Ahmad. Class Diagram Extraction from Textual Requirements using Natural Language Processing Techniques, in Proceedings of IEEE 2010 Second International Conference on Computer Research and Development, pages 200–204, 2010.
11. H M Harmain and R Gaizauskas. CM-Builder: An Automated NL-based CASE Tool, in Proceedings of 15th IEEE International Conference on Automated Software Engineering, pages 45–53, 2000.
12. S P Overmyer, B Lavoie and O Rambow. Conceptual Modeling through Linguistic Analysis using LIDA, in Proceedings of the 23rd International Conference on Software Engineering, IEEE Computer Society, pages 401–410, 2001.
13. L Mich and R Garigliano. NL-OOPS: A Requirements Analysis Tool Based on Natural Language Processing, in Proceedings of Third International Conference on Data Mining Methods and Databases for Engineering, Bologna, Italy, 2002.
14. A D Booth. Machine Translation, North-Holland Publishing Company, 1967.
15. J Rumbaugh, M Blaha, W Premerlani, F Eddy, W E Lorensen et al. Object-Oriented Modeling and Design, Prentice-Hall, Englewood Cliffs, NJ, 199, 1991.
16. F N Paulisch and W F Tichy. Edge: An Extendible Graph Editor, Software: Practice and Experience, 20(1):S63–S88, 1990.
17. M Jackson. Developing Ada programs using the Vienna Development Method, Software: Practice and Experience, 15(3):305–318, 1985.
18. R Gaizauskas, K Humphreys, H Cunningham and Y Wilks. University of Sheffield: Description of the LaSIE System as used for MUC-6, in Proceedings of the 6th Conference on Message
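The steps summarized in Section 3 can be illustrated with a minimal Python sketch. It is not the method of any reviewed tool: the hand-tagged sentences in `TAGGED`, the `MULTIPLICITY` word list, the crude `base` suffix stripping and the function name `analyse` are all assumptions made for illustration, and a real system would obtain tags from a POS tagger and roots from a proper stemmer. The sketch counts noun frequencies as class candidates, collects verbs as operations of the subject noun, attaches adjectives to the noun they precede, and reads multiplicity from determiner-like words.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged input (Penn Treebank style tags). A real system
# would obtain these tags from a POS tagger instead.
TAGGED = [
    [("Each", "DT"), ("customer", "NN"), ("places", "VBZ"),
     ("many", "JJ"), ("orders", "NNS")],
    [("A", "DT"), ("registered", "JJ"), ("customer", "NN"),
     ("cancels", "VBZ"), ("an", "DT"), ("order", "NN")],
]

# Assumed mapping from determiner-like words to a multiplicity marker.
MULTIPLICITY = {"a": "1", "an": "1", "each": "1", "every": "1",
                "many": "*", "several": "*", "all": "*"}


def base(word):
    """Crude stand-in for stemming: strip one trailing 's'."""
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def analyse(tagged_sentences):
    """Collect candidate classes, operations, attributes and relations."""
    nouns = Counter()               # noun frequency: class candidates
    operations = defaultdict(set)   # subject noun -> verbs
    attributes = defaultdict(set)   # noun -> adjectives that modify it
    relations = []                  # (subject, mult, verb, mult, object)
    for sentence in tagged_sentences:
        subject = subject_mult = verb = None
        pending_adjs, pending_mult = [], None
        for word, tag in sentence:
            lower = word.lower()
            if lower in MULTIPLICITY:
                pending_mult = MULTIPLICITY[lower]
            elif tag.startswith("JJ"):
                pending_adjs.append(lower)
            elif tag.startswith("NN"):
                noun = base(lower)
                nouns[noun] += 1
                if pending_adjs:
                    attributes[noun].update(pending_adjs)
                if verb is None:
                    # Noun before the verb: treat it as the subject.
                    subject, subject_mult = noun, pending_mult
                elif subject is not None:
                    # Noun after the verb: treat it as the object.
                    relations.append(
                        (subject, subject_mult, verb, pending_mult, noun))
                pending_adjs, pending_mult = [], None
            elif tag.startswith("VB"):
                verb = base(lower)
                if subject is not None:
                    operations[subject].add(verb)
    return nouns, operations, attributes, relations


if __name__ == "__main__":
    nouns, operations, attributes, relations = analyse(TAGGED)
    print(nouns.most_common())
    print(dict(operations), dict(attributes))
    print(relations)
```

On the two toy sentences, "customer" and "order" each occur twice and so both surface as class candidates, which shows why frequency alone may not settle the class names and why the reviewed papers still rely on an analyst to confirm the result.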
Understanding, Association for Computational Linguistics, pages 207–220, 1995.
19. R E Callan. Building Object-Oriented Systems: An Introduction from Concepts to Implementation in C++, Computational Mechanics, 1994.

Abinash Tripathy is currently pursuing his Ph.D. at National Institute of Technology, Rourkela. He obtained his Master's degrees, M.Sc. in Computer Science from Utkal University, Bhubaneswar and M.Tech. in Computer Science and Engg. from KIIT University, Bhubaneswar. His research interests are Software Testing, UML, Natural Language Processing and Sentiment Analysis.

Santanu Kumar Rath has been a Professor in the Department of Computer Science and Engineering, NIT Rourkela since 1988. His research interests are in Software Engineering, System Engineering, Bioinformatics, Natural Language Processing and Management. He is a Senior Member of the IEEE, USA and ACM, USA and Petri Net Society, Germany.
