Automatic Parsing for Arabic Sentences
Zainab Ali Khalaf*
School of Computer Science, Universiti Sains Malaysia (USM), Penang, Malaysia
E-mail: zak10_com026@student.usm.my
*(Ass. Prof. in Computer Science Dept., Basra University, Iraq)
Dr. Tan Tien Ping
School of Computer Science, Universiti Sains Malaysia (USM), Penang, Malaysia
E-mail: tienping@cs.usm.my
Abstract
The designed system is a parser for Arabic sentences that uses the syntactic and semantic relations between deep and surface structures. The system is based on an implementation of Fillmore's Case theory. The parsing algorithm begins by analyzing the input sentence to check its syntax, semantics and spelling, using the Arabic transformation rules proposed in Al_Khouly to gain semantic strength. The proposed system depends on the effective element of the sentence, namely its verb, which is used to control the parsing operation. The system accepts different surface structures of Arabic sentences as input and produces automatic parsing forms for them.
Keywords: Artificial intelligence; natural language processing; transformation rules; deep structure and surface structure; parsing Arabic sentences.
I. INTRODUCTION
Arabic is a parsing language, where parsing means the relations among the words in a sentence. The most important component is the verb, which acts as the basic unit controlling the rules for choosing the other elements. Although Arabic sentences have different structures, Arabic is recognized as a (verb, subject, object) language. The subject or the object may precede the verb in an Arabic sentence according to pragmatic necessity [1,3,4].

Arabic syntax allows the deep structure and the surface structure of a sentence to be strongly connected. This property makes Arabic amenable to automatic processing [4,5].

The proposed system aims to use these properties to parse Arabic sentences depending on the position of the words in the sentence and their functional meaning.

II. SYSTEM COMPONENTS
The syntactical properties of any natural language are formally described by the use of what Chomsky calls production systems. A formal system generally depends on three types of data [2,3,6]:
A. Data of vocabulary lexicon
The lexicon plays an important role in any NLP system. It is a large database of variable entries describing the meaning of words in terms of synonymy (and antonymy) in a contextual fashion [3,6]. The implemented lexicon consists of entries saved as a rule of the form
( Entrance[ Word , Features ] ).
• The Entrance is one of the following indicators: Verb, Noun, Preposition, Determinate, Assistant and Negation. The Word is a string index for the lexicon entry.
• The Features is a list of structured integers coded to hold the syntactic and semantic information of the word. Each coded integer, written as [Fp], consists of two parts, F and p. The [p] part is either 1 or 0, depending on whether the feature [F] exists or not. The [F] part is the feature code.
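The entry format described above can be illustrated with a short Python sketch. It is not the implementation used in the system; it only mirrors the ( Entrance[ Word , Features ] ) shape. The paper does not state how F and p are packed into one integer, so the sketch assumes a decimal encoding in which p is the last digit and F is the remaining leading digits. The names `LexiconEntry`, `decode_feature` and `has_feature`, the transliterated word "kataba", and the feature codes 1, 2 and 3 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Entrance indicators listed in the text.
ENTRANCES = {"Verb", "Noun", "Preposition", "Determinate", "Assistant", "Negation"}


def decode_feature(coded: int) -> Tuple[int, bool]:
    """Split a coded integer [Fp] into (F, feature_present).

    Assumption (not from the paper): p is the last decimal digit,
    F is everything before it, so 11 means F=1, p=1.
    """
    f, p = divmod(coded, 10)
    if p not in (0, 1):
        raise ValueError(f"p must be 0 or 1, got {p} in {coded}")
    return f, p == 1


@dataclass(frozen=True)
class LexiconEntry:
    entrance: str              # one of ENTRANCES
    word: str                  # string index for the lexicon entry
    features: Tuple[int, ...]  # coded integers [Fp]

    def __post_init__(self) -> None:
        if self.entrance not in ENTRANCES:
            raise ValueError(f"unknown entrance: {self.entrance}")

    def has_feature(self, code: int) -> bool:
        """True if feature code F is listed with p = 1."""
        return any(decode_feature(c) == (code, True) for c in self.features)


# Hypothetical entry: a verb with features 1 and 3 present, feature 2 absent.
entry = LexiconEntry("Verb", "kataba", (11, 20, 31))
print(entry.has_feature(1), entry.has_feature(2), entry.has_feature(3))
```

Storing the presence flag inside the integer keeps each entry a flat list, as the text describes; a mapping from feature code to boolean would hold the same information in a more explicit form.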
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 3, March 2011. Page 58. http://sites.google.com/site/ijcsis/ ISSN 1947-5500