
Proceedings of the International Conference on Computer and Communication Engineering 2008 May 13-15, 2008 Kuala Lumpur, Malaysia

Unicode Searching Algorithm Using Multilevel Binary Tree Applied On Bangla Unicode

Md. Akhtaruzzaman
ARKSIT Limited, Bangladesh. (www.arksit.com)
akhter900@yahoo.com, akhter900@gmail.com

Abstract

A Unicode searching algorithm using a multilevel binary tree is proposed to search Unicode values efficiently. The algorithm is applied to Bangla Unicode searching in order to convert a Bijoy string into a Unicode string. First, the algorithm builds a multilevel binary tree from multilevel binary sorted data containing each ASCII code and its corresponding Unicode. The data must be sorted on the ASCII code. The algorithm takes a Bangla Bijoy string as input and outputs the same string in Unicode format. The input Bijoy string must be in Unicode readable format.

I. INTRODUCTION

There are several Bangla writing packages, such as Bijoy, Avro and Akkhor. Bijoy is the oldest and most popular of them, and it writes plain text only. Avro uses Unicode to write Bangla sentences. Nowadays Unicode is used to write any language, and it is sometimes necessary to convert plain text into Unicode format, so searching for Unicode values is necessary. The 16-bit Basic Multilingual Plane of Unicode alone provides 65,536 code points, which cover the modern languages of the world. Here a Multilevel Tree based Unicode searching algorithm is proposed, which is intended to be efficient and reliable for searching the Unicode of different languages.

II. PRELIMINARY STUDIES

A. Bijoy String to Unicode Readable Format String

Bangla has 11 independent characters (vowels) and 39 dependent characters (consonants) [6]. There are also independent and dependent character symbols, called ‘Kar’ and ‘Fala’ respectively. These symbols must be used with a character. A large number of Complex Characters (combinations of two or more characters) exist in the Bangla language. A single Complex Character may contain two or more independent or dependent characters, which must be joined with a symbol named ‘Hasanta’ (্).

In plain text, characters and symbols may be placed independently anywhere in a sentence, and Bijoy follows this rule. Unicode, however, maintains a fixed format for using symbols with a character. In Unicode a symbol must be placed after its character with no gap between them, i.e. “character + symbol”. In some cases, such as ‘Chandrabindu’ or ‘Ref’, the placement is different. ‘Chandrabindu’ must be placed directly after a character if no symbol is attached to that character, i.e. “character + Chandrabindu”. If a symbol is attached to the character, the ‘Chandrabindu’ is placed after the symbol, i.e. “character + symbol + Chandrabindu”. ‘Ref’ must be placed before the character. Figure 1 shows an example of Bijoy plain text and its Unicode readable format for Bangla sentences.

Fig. 1 Bijoy plain text and its Unicode readable formatted text showing ASCII and Unicode of corresponding character.

B. Multilevel Binary Sort

Binary search is a well-known and efficient search algorithm. To apply it to a list, the list must be sorted in increasing or decreasing order. Here the term ‘Binary Sort’ is used because the method follows the technique of the Binary Search algorithm and rearranges the data into in-order tree format.
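As an illustration of the ‘Binary Sort’ idea, the following minimal Python sketch rearranges an already sorted list by repeatedly taking the middle value of each division. It is a sketch under stated assumptions, not the author's implementation: the function name binary_sort is hypothetical, the output is assumed to list each middle value before its left and right divisions, and for an even-length division the upper of the two middle values is assumed to be taken.

```python
def binary_sort(sorted_values):
    """Reorder a sorted list so that each middle value comes before
    the values of its left division and then its right division.

    Assumption: this "middle value first" order is only one reading of
    the paper's 'Binary Sort'; the name binary_sort is hypothetical.
    """
    if not sorted_values:
        return []
    mid = len(sorted_values) // 2  # upper middle for even lengths (assumed)
    return (
        [sorted_values[mid]]
        + binary_sort(sorted_values[:mid])
        + binary_sort(sorted_values[mid + 1:])
    )


# For the sorted list 1 to 15 the first value is the overall middle (8)
# and the second is the middle of the first division 1 to 7 (4).
result = binary_sort(list(range(1, 16)))
print(result)
```

Inserting the values into a binary search tree in this order gives a tree in which every subtree is rooted at the middle value of its division, so no rebalancing is needed.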

978-1-4244-1692-9/08/$25.00 ©2008 IEEE


Figure 2 shows the Binary Sorted data generated from a non-decreasing sorted list. In the Binary Sorted data the first value ‘8’ is the middle value of the simple sorted data. The second value ‘4’ is the middle value of the first division (1 to 7), and so on.

Fig. 2 Binary Sorted data generated from a simple non-decreasing sorted list.

The term ‘Multilevel’ is added to ‘Binary Sort’ because the method is applied to multilevel sorted data. Figure 3 shows how the Multilevel Binary Sorted data is formed. In the first level, the non-decreasing list has 5 twos and 3 fives, and each group is treated as a single integer. So applying Binary Sort on the first level leaves the sequence of 2 and 5 unchanged. Now consider the second level. There are 2 threes under the common value ‘2’ of the first level, and they are treated as a single three. Applying Binary Sort to the second-level values under the common two and the common five of the first level rearranges those values, while the sequence of threes in the second level remains unchanged. In the third level the sort is applied to the values under the common three of the second level. At the end of the process the three-level non-decreasing data takes the final form shown in Figure 3.

Fig. 3 Three level Binary Sorted data generation from a three level non-decreasing sorted list.

If the Multilevel Binary Sort is applied, based on the ASCII value, to the data of the first table in Figure 4, the result is the data of the second table in Figure 4.

C. Binary Tree and Binary Search Tree

A binary tree is a basic structure in which each node has at most two children. A binary search tree is a binary tree whose values are kept in order. The tree may be represented as a linked data structure in which each node is an object. Figure 5 shows a graphical representation of a complete binary tree based on the Binary Sorted (in-order) data shown in Figure 2.

Fig. 5 A complete binary tree structure

Let ‘x’ be a node of a binary search tree and ‘y’ its left child; then y < x. If ‘z’ is the right child of ‘x’, then z > x or z = x. The time complexity of a search in a binary search tree is O(h), where h is the height of the tree.
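The ordering rule and the O(h) search cost of a binary search tree can be illustrated with a minimal Python sketch. The names BSTNode, bst_insert and bst_search are hypothetical, and the insertion order used below (the middle value of each division first, for the values 1 to 15) is an assumption about how binary sorted data would be fed to the tree.

```python
class BSTNode:
    """One node of a plain binary search tree (hypothetical helper class)."""

    def __init__(self, key):
        self.key = key
        self.left = None   # keys smaller than self.key
        self.right = None  # keys greater than or equal to self.key


def bst_insert(root, key):
    """Insert key below root and return the (possibly new) root."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    else:
        root.right = bst_insert(root.right, key)
    return root


def bst_search(root, key):
    """Return (found, visited), where visited counts the nodes inspected."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return True, visited
        node = node.left if key < node.key else node.right
    return False, visited


# Assumed insertion order: the middle value of every division comes first.
root = None
for key in [8, 4, 2, 1, 3, 6, 5, 7, 12, 10, 9, 11, 14, 13, 15]:
    root = bst_insert(root, key)

print(bst_search(root, 13))  # follows the path 8, 12, 14, 13
```

With 15 values inserted in this order the tree has four levels, so no search inspects more than four nodes.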

Fig. 4 Multilevel Binary Sort is applied on the ASCII based Unicode data list.

III. MULTILEVEL BINARY TREE

A binary search tree can be defined in two ways, based on two conditions: z > x and z >= x (here z and x denote the right child and the root node respectively). In the Multilevel Binary Tree structure only z > x is considered, because all equal values within a single level of the Multilevel Binary Tree are treated as a single node. If a desired value is found in a node of a level, control is transferred to that node's branch node, which holds the root of the next-level tree of that node. The same holds for every node of every level of the tree.

Let ‘T’ be a tree. T is called a Multilevel Binary Tree if each node N of T has the following properties:

• N must have two leaf (child) nodes and one branch node. The leaf nodes may hold subtrees of N's own level, and the branch node must hold the next-level tree of N.
• The value of N must be greater than every value in its left subtree (L) and less than every value in its right subtree (R), i.e. N > L and N < R.
• If any value V is equal to N at that level, then the values corresponding to V form the next-level tree, with the same properties.

Figure 6 shows the three-level binary tree formed from the values of the three-level Binary Sorted data. The first box indicates the first-level tree, the second box the second-level tree, and the third box the third-level tree. In the tree, 97 and 170 are not repeated, which maintains the properties of the Multilevel Binary Search Tree (MLBST).

Fig. 6 Three level Binary Sorted Tree

IV. ALGORITHM

First, the MLBST_Node class is declared; it holds the data and the links of a node. Its first two variables (AsciiCode and UniCode) hold the data used in the search, and its last three variables (LeftNode, RightNode and BranchNode) are MLBST_Node objects for the two leaves and the branch node.

The Buield_MLBST method builds the MLBST (Multilevel Binary Sorted Tree, i.e. in-order tree) from the Multilevel Binary Sorted data.
------------------------------------------------------------
*Declaring MLBST_Node Class
Start: AsciiCode, UniCode, LeftNode, RightNode, BranchNode End:

*Declaring Buield_MLBST method (
    Param1: Integer array containing the ASCII codes of a single character (simple/complex).
    Param2: Integer value as the length of the array.
    Param3: Integer value indicating the position in that array from which the ASCII codes will be used.
    Param4: String value containing the corresponding Unicode.
    Param5: MLBST_Node type value indicating the Parent Node.)

Start:
    If ‘Param3’ is greater than or equal to ‘Param2’ then return TRUE.
    If ‘Param5’ is empty (its AsciiCode is not yet set) then
    Start:
        AsciiCode := value of ‘Param1’ at the position of ‘Param3’.
        If ‘Param3’ is the last position of the array (‘Param2’ minus One) then
        Start: UniCode := value of ‘Param4’ End:
        Create a BranchNode of ‘Param5’.
        Increment ‘Param3’ by One.
        Call Buield_MLBST method with the BranchNode of ‘Param5’ and return.
    End:
    If the value of ‘Param1’ at the position of ‘Param3’ is greater than AsciiCode of ‘Param5’ then
    Start:
        If RightNode of ‘Param5’ is NULL then
        Start: Create a RightNode of ‘Param5’ End:
        Call Buield_MLBST method with the RightNode of ‘Param5’ and return.
    End:
    Else If the value of ‘Param1’ at the position of ‘Param3’ is less than AsciiCode of ‘Param5’ then
    Start:
        If LeftNode of ‘Param5’ is NULL then
        Start: Create a LeftNode of ‘Param5’ End:
        Call Buield_MLBST method with the LeftNode of ‘Param5’ and return.
    End:
    Else If the value of ‘Param1’ at the position of ‘Param3’ is equal to AsciiCode of ‘Param5’ then
    Start:
        If BranchNode of ‘Param5’ is NULL then
        Start: Create a BranchNode of ‘Param5’ End:
        Increment ‘Param3’ by One.
        Call Buield_MLBST method with the BranchNode of ‘Param5’ and return.
    End:
End:

*Declare UniCodeTemp as a string type variable.
*Declare StartIndex as an integer type variable.

*Declaring Search_MLBST method (
    Param1: Integer array containing the ASCII codes of a single character (simple/complex).
    Param2: Integer value as the length of the array.
    Param3: Integer value indicating the position in that array from which the ASCII codes will be used.
    Param4: MLBST_Node type value indicating the Parent Node.)

Start:
    If ‘Param3’ is greater than or equal to ‘Param2’ then return the UniCode of the matched node (held in UniCodeTemp).
    If the value of ‘Param1’ at the position of ‘Param3’ is greater than AsciiCode of ‘Param4’ then
    Start:
        If RightNode of ‘Param4’ is NULL then return FALSE.
        Else Call Search_MLBST method with the RightNode of ‘Param4’ and return.
    End:
    Else If the value of ‘Param1’ at the position of ‘Param3’ is less than AsciiCode of ‘Param4’ then
    Start:
        If LeftNode of ‘Param4’ is NULL then return FALSE.
        Else Call Search_MLBST method with the LeftNode of ‘Param4’ and return.
    End:
    Else If the value of ‘Param1’ at the position of ‘Param3’ is equal to AsciiCode of ‘Param4’ then
    Start:
        UniCodeTemp := UniCode of ‘Param4’.
        Increment StartIndex by One.
        Increment ‘Param3’ by One.
        Call Search_MLBST method with the BranchNode of ‘Param4’ and return.
    End:
End:

V. COMPLEXITY ANALYSIS

A. Build Tree
The MLBST is like a binary tree, so each node of the tree has two child nodes (the branch node holds the next-level tree, so it is not counted as a child). Let the height of each level tree of the MLBST be ‘h’. Then each level tree holds $(2^h - 1)$ nodes. Each node has a branch node that holds the root of its next-level tree. If the MLBST has two level trees, the total number of nodes becomes $2^h \cdot 2^h = 2^{2h}$; here the ‘-1’ is omitted for large values of ‘h’. If the MLBST has ‘n’ level trees, the total number of nodes will be $2^{nh}$ and the complexity will be $O(2^{nh})$.

B. Search Tree
Let the height of each level tree be ‘h’ and let the MLBST have ‘n’ level trees. In the worst case, the complexity of searching for a desired value becomes $O(nh)$. The best case occurs when the searched value is in the root of the first-level tree.

C. Complexity of MLBST applied on Bangla Unicode
In Bangla there are approximately 300 characters in total, including characters (vowels, consonants, complex characters) and symbols. For a single character the number of ASCII code levels is not larger than 4. Figure 7 shows a Bangla complex character having three-level ASCII values and its corresponding Unicode.

Fig. 7 Bangla complex character having three levels ASCII.

Considering the ASCII levels, the MLBST has 4 level trees, i.e. $n = 4$. Let the first-level tree hold the 300 nodes with the first-level ASCII values. Each second-level tree contains no more than 6 nodes of

second-level ASCII values (10 is assumed here rather than 6). The third- and fourth-level trees have only a small number of nodes, holding the 3rd- and 4th-level ASCII values; let that number be 4. So, for the first-level tree, $300 = 2^h - 1$. Taking the logarithm of both sides, the equation becomes $\log 301 = h \log 2$, i.e. $h = 8.233$. So for the 1st-level tree $h_1 = 9$ (ceiling), i.e. the height of the first-level tree is 9. Similarly, for the 2nd-level tree $h_2 = 4$ (ceiling), and for the 3rd- and 4th-level trees $h_3 = h_4 = 3$ (ceiling). So the total number of nodes of the MLBST becomes

$(2^9 - 1)(2^4 - 1)(2^3 - 1)(2^3 - 1)$.

Omitting the ‘-1’, the tree-building complexity becomes $O(2^{19})$ and the searching complexity becomes $O(19)$.

Representation of total number of nodes of a MLBST
Let the MLBST have ‘n’ level trees. If each level has a different height, then the total number of nodes of the MLBST can be represented as

$N = (2^{h_1} - 1)(2^{h_2} - 1)(2^{h_3} - 1) \cdots (2^{h_n} - 1)$

Here N is the total number of nodes, ‘h1’ is the height of the first-level tree, ‘h2’ is the height of each second-level tree, and so on. The equation can also be represented as

$$N = 2^{\sum_{b=1}^{n} h_b} - \sum_{a=1}^{n} 2^{\left(\sum_{b=1}^{n} h_b\right) - h_a} + \sum_{a=1}^{n-1} \sum_{m=a+1}^{n} 2^{\left(\sum_{b=1}^{n} h_b\right) - (h_a + h_m)} - \sum_{a=1}^{n-2} \sum_{m=a+1}^{n-1} \sum_{p=m+1}^{n} 2^{\left(\sum_{b=1}^{n} h_b\right) - (h_a + h_m + h_p)} + \cdots \pm \sum_{a=1}^{n} 2^{h_a} \mp 1$$

The sign of each of the last two terms depends on ‘n’: a term in an even position has the sign ‘-’ (negative) and a term in an odd position has the sign ‘+’ (positive), so the last two terms always have opposite signs.

VI. CONCLUSION
The algorithm is applied here to Bangla Unicode, but it may also be applied to related problems and to Unicode searching for other languages. The main drawback of the algorithm is that its tree-building complexity is $O(2^{nh})$; however, the MLBST builds its tree only once, the first time it is used.

REFERENCES
[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, Introduction to Algorithms (Second Edition).
[2] http://www.acm.uiuc.edu/conference/index.php
[3] http://www.jorendorff.com/articles/index.html
[4] http://www.unicode.org/unicode/reports/
[5] tr10/tr10-8.html
[6] http://www.connect-bangladesh.org/bangla
[7] webbangla.html
[8] http://www.betelco.com/bd/bangla/bangla.html
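As a supplementary illustration of the build and search procedures of Section IV, the following minimal Python sketch stores multi-level ASCII keys in a tree whose nodes carry a left link, a right link and a branch link to the next-level tree. It is a sketch under stated assumptions, not the author's implementation: the names MLBSTNode, mlbst_insert and mlbst_search are hypothetical, the search is an exact-key lookup rather than the longest-match scan suggested by UniCodeTemp and StartIndex, and the integer codes and the "u1" to "u5" strings are placeholders, not real Bijoy or Unicode values.

```python
class MLBSTNode:
    """Node with two same-level links and one next-level (branch) link."""

    def __init__(self, ascii_code):
        self.ascii_code = ascii_code
        self.unicode = None  # set only where a complete key ends
        self.left = None     # smaller codes of the same level
        self.right = None    # larger codes of the same level
        self.branch = None   # root of the next-level tree


def mlbst_insert(root, codes, unicode_text, pos=0):
    """Insert codes[pos:] into the level tree at root; return its root.

    Assumes codes is a non-empty sequence of integers.
    """
    if root is None:
        root = MLBSTNode(codes[pos])
    if codes[pos] < root.ascii_code:
        root.left = mlbst_insert(root.left, codes, unicode_text, pos)
    elif codes[pos] > root.ascii_code:
        root.right = mlbst_insert(root.right, codes, unicode_text, pos)
    elif pos == len(codes) - 1:
        root.unicode = unicode_text  # last code of the key: store the value
    else:
        # Equal code with more codes to come: descend to the next level.
        root.branch = mlbst_insert(root.branch, codes, unicode_text, pos + 1)
    return root


def mlbst_search(root, codes):
    """Return (unicode_text_or_None, number_of_nodes_visited)."""
    node, pos, visited = root, 0, 0
    while node is not None:
        visited += 1
        if codes[pos] < node.ascii_code:
            node = node.left
        elif codes[pos] > node.ascii_code:
            node = node.right
        elif pos == len(codes) - 1:
            return node.unicode, visited
        else:
            node, pos = node.branch, pos + 1
    return None, visited


# Placeholder entries (hypothetical codes and values, for illustration only).
entries = [
    ((97,), "u1"),
    ((97, 170), "u2"),
    ((97, 170, 100), "u3"),
    ((50,), "u4"),
    ((120, 60), "u5"),
]
tree = None
for codes, text in entries:
    tree = mlbst_insert(tree, codes, text)

print(mlbst_search(tree, (97, 170, 100)))
```

The number of nodes visited by a lookup is bounded by the sum of the heights of the level trees it passes through, which corresponds to the $O(nh)$ search bound of Section V.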
