ABSTRACT Steganography is one of the oldest methods for securely sending and transferring secret information between two people without raising suspicion. Recently, Artificial Intelligence (AI) has become simpler to use and more widespread. Since the emergence of natural language processing (NLP), building language models with deep learning has become more common. Because concealing secret information in delivered messages is important, AI theories and NLP algorithms were employed to conceal secret information within a text cover. The Arabic language was chosen because of its large number of words, its vocabulary, and its linguistic meanings, and its most significant feature is Arabic poetry. This study presents a new way to hide secret data inside newly formulated Arabic poetry. The poetry is generated from previous Arabic poetic texts and a database of Arab poets from the ancient and modern eras using AI and Long Short-Term Memory (LSTM) networks, which increases storage capacity by 45 percent. A Baudot Code algorithm increases both the linguistic accuracy and the volume of secret data hidden within the formulated poetry. The secret data is hidden at the level of letters rather than words, which eliminates the drawbacks found in previous studies.
INDEX TERMS Text steganography, hiding information, letter frequency, LSTM, Baudot code.
The associate editor coordinating the review of this manuscript and approving it for publication was Jerry Chun-Wei Lin.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 10, 2022 94403
O. F. A. Adeeb, S. J. Kabudian: Arabic Text Steganography Based on Deep Learning Methods
I. INTRODUCTION
As electronic communication becomes more prevalent in the modern era, nearly all human activities, such as the transmission of confidential and financial data and information, depend entirely on electronic communication methods [1]. Data transmission in cyberspace raises numerous concerns. Intruders and information trolls heighten the fears of computer network users, and data cannot be transmitted freely because a third party might expose and profit from it. Therefore, the security of data transmission has become one of the most important sciences and a topic of interest to all Internet users [2].
Steganography is one of the oldest methods for concealing information and sending it without the danger of its secret content being discovered, because the secret is contained inside a cover that serves as its carrier. Steganography is classified into four categories according to the type of cover that carries the message: Image cover, Audio cover, Video cover, and Text cover [3]. Video makes a good cover because every second typically contains at least 24 frames. It can therefore hide large amounts of highly confidential data with excellent efficiency [4], and some researchers use the video cover as a watermark to protect the copyrights of digital videos [5]. Text cover is one of the oldest methods used to conceal confidential texts and information. The first recorded use of the term steganography was in 1499 by Johannes Trithemius in his book Steganographia, a treatise on cryptography and steganography disguised as a book on magic [6]. Text cover is one of the most difficult ways to hide information compared with the others (image, audio, and video), because text lacks the large redundant space and storage capacity that those media provide.
The writing method and alphabetic letters utilized in this study were based on the most extensively used types of
over 31 living languages that use the same Arabic alphabet graphics, such as Persian, Urdu, Ottoman Turkish, and so
on. We will construct words using artificial intelligence technologies and then hide secret messages within the generated text cover. In this way, we contribute to transmitting and receiving secret information securely, with high capacity and speed. Because the cover is newly generated, it cannot be discovered through similarities with previously written texts. In the end, the main objective of steganography remains the same: a message carrying confidential information must pass between two parties without raising any suspicion, regardless of the cover that carries the data (text, audio, or image). This paper is organized as follows. The first section is a general section on how to hide information, including text written in Arabic or in other languages with similar letter forms. The second section provides an overview of some of the previous works. The third section explains in detail the algorithms and ideas required for the proposed method, together with the results. Finally, the fourth section presents the conclusions.
Persian and Arabic are considered among the most ancient languages. They share the same characteristics, but the Persian alphabet has four more letters than the Arabic one, namely ( ). The Arabic and Persian languages are similar in writing [7]; i.e., Farsi and Arabic are both written from right to left, but the pronunciation of some letters and words differs [3].
Both languages have some features not found in other languages, such as the number of dotted letters. Persian has 18 dotted letters out of 32, while Arabic has 15 out of 28, as in table (1).
TABLE 1. Dotted letters.
Likewise, neither language writes its short vowels as letters. Instead, both use decorative marks that help in pronouncing the letters, and these marks take 8 forms ( ). In addition, each letter has four written forms. For example, one of the letters is written as ( ) when it stands alone, ( ) at the beginning of a word, ( ) in the middle, and ( ) at the end, and each of these forms is encoded in 16 bits [6].
Persian and Arabic text steganography is divided into seven known methods:
1- Dotted Letters
1. A person who does not know either language will consider the Persian and Arabic scripts to be the same. This is true to a large extent. Modern Persian is used in many countries, such as Afghanistan, and it is written in the Persian-Arabic script, which is the Arabic script with slight modifications in pronunciation [8]. These two languages are used because their characters have several characteristics that help in hiding important information within the text cover. Among these characteristics are the dotted letters that distinguish both languages, whereas English has only 2 dotted letters, i and j [9].
1.1. Some studies use the text as a cover to conceal a secret text based on the dots of the letters, where the dots of the letters are displaced or moved by producing a new font with letters that are removed
displacement of all dots occurs simultaneously. The secret text is converted to binary, and the dots are then moved bit by bit to hide it. If the hidden bit has the value 1, the dots of the chosen letter are moved; if it has the value 0, the letter remains unchanged.
1.2. This study builds on the previous research that uses the dots of letters to hide the secret text inside the cover text. It hides 2 bits in each step by moving the letter dots horizontally and vertically. For the bit pair 00, no change is made to the letter. For 01, a slight horizontal space is added between the dots of the same letter. For 10, the dots move vertically by a slight space. For 11, the dots are moved in both directions (horizontal and vertical). The movement is about 1/300, so small that it is nearly invisible to the naked eye. Still, it can be detected by a specific program designed to extract hidden text from the
( ). The extension appears in the Persian and Arabic languages as a kind of embellishment and adjustment that equalizes the length of all the lines of the text. The extension can be made between any two letters except after the seven letters mentioned previously. Thus, the extension cannot be made at the beginning or end of the word, but only within a single word and
between its letters. The extension is used to arrange the text and give it an aesthetic shape, and it is considered a type of decoration and art in calligraphy. It hides textual data inside the text cover by adding an extension to the letters that can carry elongation. If the bit to be concealed has the value 1, the letters of the cover words are extended; if the bit has the value 0, they remain unchanged [12].
TABLE 2. Harakat repeating frequency in Holy Quran.
2.1. Another study used the Kashida to hide the secret bits in the text cover. It added two Kashida to hide a bit with the value 1 and one Kashida to hide a bit with the value 0 [13], [14], [15].
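The following Python sketch illustrates the two-kashida variant in 2.1. It is a minimal example under stated assumptions and is not the code of the cited studies. The cover is assumed to contain no kashida (U+0640, ARABIC TATWEEL) beforehand. A kashida is allowed only between two letters of the same word, and only when the first letter is not in a hypothetical, simplified NON_JOINING set. Ligature rules such as lam-alef are ignored.

```python
KASHIDA = "\u0640"  # ARABIC TATWEEL
# Assumption: a simplified set of letters that never join the following letter,
# so no kashida may be placed after them.
NON_JOINING = set("اأإآدذرزوؤءةى")


def is_letter(ch: str) -> bool:
    """Basic Arabic letters (hamza through yeh), excluding the tatweel itself."""
    return "\u0621" <= ch <= "\u064A" and ch != KASHIDA


def can_extend(text: str, i: int) -> bool:
    """True if a kashida can sit between text[i] and text[i + 1] inside one word."""
    return (i + 1 < len(text) and is_letter(text[i])
            and text[i] not in NON_JOINING and is_letter(text[i + 1]))


def embed_kashida(cover: str, bits: str) -> str:
    """Hide each bit at the next extendable position: one kashida = 0, two = 1."""
    out, k = [], 0
    for i, ch in enumerate(cover):
        out.append(ch)
        if k < len(bits) and can_extend(cover, i):
            out.append(KASHIDA * (2 if bits[k] == "1" else 1))
            k += 1
    if k < len(bits):
        raise ValueError("cover text has too few extendable positions")
    return "".join(out)


def extract_kashida(stego: str) -> str:
    """Turn each run of kashida back into a bit: length 2 -> 1, length 1 -> 0."""
    bits, run = [], 0
    for ch in stego + " ":  # the trailing sentinel flushes a final run
        if ch == KASHIDA:
            run += 1
        elif run:
            bits.append("1" if run == 2 else "0")
            run = 0
    return "".join(bits)
```

For example, `embed_kashida("بسم الله الرحمن الرحيم", "101")` inserts five kashida in total. Removing every U+0640 from the result gives back the original cover.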
2.2. When the secret text is shorter than the text cover, the problem was solved by appending the 6 bits 111111 to mark the end of the secret text. Kashida are then added randomly to the rest of the text to improve security [16]. The same algorithm as in the previous case is applied: a single kashida is added if the bit has the value 0, and two kashida are added if the bit has the value 1.
2.3. This study used the same technique as the previous one but also tied the kashida to the dotted letters. To increase security and ensure that the data hidden within the cover text is not revealed, the hiding rule depends on the line of the cover text. On odd lines, such as the first or third, a secret bit with the value 1 is hidden by adding a kashida after the dotted letter. On even lines, such as the second and fourth, a secret bit with the value 1 is hidden by inserting the kashida before the dotted letter. On both kinds of line, no change is made to the dotted letter if the secret bit has the value 0 [17].
2.4. To increase hiding capacity, the secret text is converted into binary codes and then divided into blocks. There are four different scenarios for hiding blocks within the text cover, and a scenario is chosen randomly for each block. In the first scenario, a bit with the value 1 is hidden by adding one kashida after a dotted letter, and a dotted letter carrying the value 0 is left unchanged. In the second scenario, a single kashida is added after a non-dotted letter to hide a bit with the value 1, and a non-dotted letter that carries the value 0 is left unchanged. In the third scenario, one kashida is added after the letter (whether dotted or non-dotted) when the bit to be concealed carries the value 1, and no change is made when it carries the value 0. The fourth scenario is the opposite of the third: a kashida is added when the bit to be hidden has the value 0, and no modification is made when it has the value 1 [18], [19].
3. Decoration (Harakat): The Harakat (singular Haraka) were developed to denote the phonemic meaning of the spoken word. They were needed because Arabic and Persian have many words and letter sounds, and a single word may have different meanings depending on its pronunciation. The Harakat are placed above or below the letters to clarify their pronunciation. Arabic and Persian historical books use seven primary Harakat, including ( ). According to its occurrences in the Holy Qur'an, the Haraka ( ) is used and repeated more than the others, as shown in table (2). The Harakat are therefore used to hide secret information and texts inside the text cover.
3.1. To conceal confidential data with the Harakat, the secret text message is first converted to binary, and the Harakat are then placed in the cover text. Each bit is tested. If it has the value 1, the Haraka Fatha ( ) is placed; if it has the value 0, any of the remaining six Harakat ( ) is placed [20].
3.2. The use of decoration is considered one of the distinctive methods for hiding data. In this study, the Fatha ( ) decoration was used in a different way: it was reversed, as shown in the figure below. The secret bits are hidden one at a time. If the bit carries the value 1, the reversed Fatha is used on the letter; if it carries the value 0, the Fatha remains unchanged [21]. The reversed Fatha is shown like this ( ).
4. Unicode: ASCII coding was created to represent the letters of the English language, numbers, symbols, and signs on the computer. It uses 7 bits (128 characters), and 8-bit extended variants provide 256 letters and symbols. Technology advanced, and the rest of the world's languages also had to be represented in binary. The Unicode Consortium was therefore established as a non-profit organization that coordinates the development of the Unicode standard. The standard seeks to replace these encodings and organize all symbols in a unified, global manner. It consists of 16 bits and can
TABLE 3. Arabic and Persian letters with isolated forms.
TABLE 4. The number of sharp edges.
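The following minimal Python sketch shows the Fatha scheme described in 3.1. It assumes an undiacritized cover and places one mark after each Arabic letter until the bits run out. The Unicode code points are the standard ones. Which six marks stand for "the remaining six Harakat" (OTHER_HARAKAT) and the one-mark-per-letter placement are assumptions made for illustration.

```python
import random

FATHA = "\u064E"  # ARABIC FATHA
# Hypothetical choice of the "remaining six" marks: damma, kasra, sukun,
# fathatan, dammatan, kasratan.
OTHER_HARAKAT = ["\u064F", "\u0650", "\u0652", "\u064B", "\u064C", "\u064D"]
ALL_HARAKAT = {FATHA, *OTHER_HARAKAT}


def is_letter(ch: str) -> bool:
    """Basic Arabic letters (hamza through yeh), excluding the tatweel."""
    return "\u0621" <= ch <= "\u064A" and ch != "\u0640"


def embed_harakat(cover: str, bits: str, seed: int = 0) -> str:
    """Place a Fatha after a letter for bit 1, and a random non-Fatha mark for bit 0."""
    rng = random.Random(seed)
    out, k = [], 0
    for ch in cover:
        out.append(ch)
        if k < len(bits) and is_letter(ch):
            out.append(FATHA if bits[k] == "1" else rng.choice(OTHER_HARAKAT))
            k += 1
    if k < len(bits):
        raise ValueError("cover text has too few letters for the secret bits")
    return "".join(out)


def extract_harakat(stego: str) -> str:
    """Read one bit per mark: Fatha -> 1, any other mark -> 0."""
    return "".join("1" if ch == FATHA else "0" for ch in stego if ch in ALL_HARAKAT)
```

Because the receiver treats every non-Fatha mark as a 0, the random choice among the six marks makes the pattern less regular without adding any decoding cost.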
4.1. Confidential data can be hidden as follows. To hide a bit with the value 1, the letter is switched to its counterpart in the range FE70-FEFF (Arabic Presentation Forms-B). To hide a bit with the value 0, the letter is kept within the range 0600-06FF [23], [24].
4.2. In Persian and Arabic writing, the Unicode specification encodes the letter " " and also the letter " " as two separate symbols each. The two symbols have the same shape but distinct codes [25]. Each of the letters ( ) and ( ) therefore has two separate codes while keeping the same shape at the beginning and middle of a word, and this characteristic can be used to hide information inside the cover text. To hide a bit with the value 0, the letter ( ) or ( ) is changed to its Persian code, 06CC or 06A9. To hide a bit with the value 1, it is changed to its Arabic code, 064A or 0643.
5. Sharp Edges: Persian and Arabic share many features because of the graphics of their letters and the multiplicity of their forms, and sharp edges are one of their distinguishing characteristics. The letters of the two languages are separated into five groups according to the number of sharp edges each letter carries, as seen in table (4). This feature can be used to hide data and create a cover text that contains invisible confidential information, which helps in the safe transfer of data. The sharp edges of letters are used to hide the secret bits to be transferred by entering a secret numeric code by
result of adding the digits of the secret code is odd, then the number of edges of the non-dotted letters is calculated [26].
6. Pseudo connection character: Each Persian and Arabic letter has four different writing styles, depending on its place in the word. A letter that stands on its own has a different shape from the one it takes at the beginning, middle, or end of a word. Likewise, a letter at the beginning of a word has a different shape from the one it takes in the middle, at the end, or on its own, as seen in table (5):
6.1. The Persian and Arabic languages need the Zero-Width Joiner (ZWJ, U+200D), which connects letters to each other in complex texts. They also need the Zero-Width Non-Joiner (ZWNJ, U+200C), which separates letters. Neither character has a visible effect on the letters around it, and each is considered a non-printing character. This feature can be used to hide confidential data inside the text cover. The text to be hidden is converted into binary, and each bit is tested. If the bit has the value 1, a ZWJ is inserted between the current letter and the one that follows; if it has the value 0, nothing is changed in the text cover. This process continues until all the bits are hidden [27], [28].
6.2. The researchers used the zero-width character (ZWC) and the zero-width joiner (ZWJ) for independent and cursive letters. The independent letters, which comprised seven letters ( ), are special
TABLE 6. Two hidden bits in each letter.
TABLE 7. Blood groups division of letters.
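The following Python sketch illustrates the ZWJ scheme of 6.1 under stated assumptions. It is not the method of [27], [28]. A "slot" is a hypothetical notion used only here: a letter that joins the next letter, meaning the first letter is not in a simplified NON_JOINING set. A ZWJ placed at a slot should therefore not change any glyph. The receiver must know how many bits were hidden, because a trailing 0 leaves no trace in the text.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER
# Assumption: only a joining letter followed by another letter is a slot, so the
# ZWJ sits inside an existing join and no glyph changes shape.
NON_JOINING = set("اأإآدذرزوؤءةى")


def is_letter(ch: str) -> bool:
    """Basic Arabic letters (hamza through yeh), excluding the tatweel."""
    return "\u0621" <= ch <= "\u064A" and ch != "\u0640"


def slots(text: str) -> list:
    """Indices i where text[i] joins text[i + 1]."""
    return [i for i in range(len(text) - 1)
            if is_letter(text[i]) and text[i] not in NON_JOINING
            and is_letter(text[i + 1])]


def embed_zwj(cover: str, bits: str) -> str:
    """Insert a ZWJ after the slot letter for bit 1; leave the slot untouched for bit 0."""
    free = slots(cover)
    if len(bits) > len(free):
        raise ValueError("cover text has too few slots for the secret bits")
    marked = {s for s, b in zip(free, bits) if b == "1"}
    return "".join(ch + (ZWJ if i in marked else "") for i, ch in enumerate(cover))


def extract_zwj(stego: str, n_bits: int) -> str:
    """Recover n_bits by checking which slots of the cleaned text are followed by a ZWJ."""
    cover = stego.replace(ZWJ, "")
    followed, j = set(), -1
    for ch in stego:
        if ch == ZWJ:
            followed.add(j)  # j is the cover index of the preceding character
        else:
            j += 1
    return "".join("1" if s in followed else "0" for s in slots(cover)[:n_bits])
```

A fixed end marker, such as the 111111 terminator described in 2.2, is one way to avoid sending n_bits separately.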
7.1. In this study, the secret text is converted into binary and divided into an odd matrix and an even matrix. The decoration is paired with the kashida to produce a concealment path that is difficult to detect. The odd-matrix elements are hidden with the decoration, and the even-matrix elements are hidden with the kashida. For the odd matrix, the decoration Fatha ( ) is added if the bit to be hidden is 1 and removed from the letter if the bit is 0. For the even matrix, one kashida is added when the bit to be hidden carries the value 1, and the letter remains unchanged when the bit carries the value 0. As shown in table (12), the data in the odd matrix is concealed first, followed by the data in the even matrix [31], [32].
7.2. Some researchers have paired concealment methods to find the best way to protect hidden cover texts. They combined the pseudo-connection letter and letter dots to generate a well-camouflaged text cover. After the secret text is converted to binary, each bit to be hidden is tested. If it carries the value 1, a pseudo-space is added after the dotted letter; if it carries the value 0, no pseudo-space is added after the dotted letter. If the letter is not dotted and the bit to be hidden carries the value 1, then no pseudo-space is added after the non-dotted letter,
researchers took advantage of this feature to hide the confidential data inside the Arabic text cover. This study shows a method that combines the decoration with changing the Unicode value of the letters. One researcher uses two types of hiding. In the first type, he takes a word that begins with the two letters ( ) followed by a Solar letter and substitutes the independent letter ( ) with the same letter from another Unicode code point to hide a secret bit 1. To hide a secret bit 0, he searches for a word that begins with the two letters ( ) followed by a Lunar letter and replaces the independent letter ( ) with the same letter but with another Unicode code point.
In the second type, the researcher hides two bits at a time. To hide 00, he searches for a word that begins with the two letters ( ) followed by a Lunar letter, changes the letter ( ) to another Unicode code point, and adds the decoration (Fatha ) to the Lunar letter. To hide 01, he searches for a word that begins with the two letters ( ) followed by a Lunar letter, replaces the letter ( ) with another Unicode code point, and adds a decoration (any decoration except the Fatha ( )). To hide 10, he searches for the word that begins
with the two letters ( ) followed by a Solar letter, replaces the independent ( ) with another Unicode code point, and adds any decoration (except Fatha ) to the Solar letter. Finally, to hide 11, he searches for the word that begins with the two letters ( ) followed by a Solar letter, replaces the independent letter ( ) with another Unicode code point, and adds the decoration (Fatha ) to the Solar letter [35].
TABLE 8. Solar and lunar letters.
TABLE 10. Three types of space.
7.5. The combination of extension and pseudo-space is used to hide the data, because these two methods are the most concealable in terms of size. In this method, the letter is checked to see if it
if the letter accepts extension. When a bit with the value 1 is hidden, two kashida are added if the letter accepts extension. If the letter does not accept extension, the pseudo-space is added once for a bit with the value 0 and twice for a bit with the value 1. The spaces between words are also exploited by adding a pseudo-space once when hiding a bit with the value 0 and twice in the case of
7.6. There are three types of small spaces, Thin Space (TS), Hair Space (HS), and Six-Per-Em Space (SS), as shown in table (9). This study hides the secret bits by combining two methods. The first is the extension method, applied to the letters that permit extension. The second hides three bits in the space between words by adding one of the three types of spaces. When the letter accepts extension, one kashida is added to hide a bit with the value 1, and nothing is done to the letter to hide a bit with the value 0. In the space between words, three bits are inserted and hidden together
A. BAUDOT CODE
The Baudot Code, or International Teleprinter Code, was invented by Émile Baudot in 1870. It is a binary code, and Baudot used it instead of the dots and dashes of Morse code. Each character is encoded in five bits, and each bit has two possible values, so 2^5 = 32 codes are available. Table (12) shows how the Baudot notation represents 60 characters by dividing them into two groups of 30 symbols each; the remaining two codes switch between the groups. Each character of the text to be hidden is thus compressed from eight bits to five bits, so the stored secret text is approximately 40% smaller than the original secret text [38], [39].
Letter frequency is simply the number of times a letter of the alphabet appears on average in a written language. Letter frequency analysis goes back to the Arab mathematician Al-Kindi, who formally developed the method to break ciphers. Letter frequency analysis gained importance in Europe with the development of movable type, because printers had to estimate how much type was required for each letter. It is also a basic method of language identification used by linguists, and it is particularly useful in determining whether an unknown writing system is alphabetic, syllabic,
Letter frequencies and frequency analysis play a fundamental role in cryptography and in many puzzle games, including Hangman, Scrabble, and the TV game show Wheel of Fortune. One of the earliest descriptions in classical literature of using English letter frequency to solve a cipher appears in Edgar Allan Poe's famous story The Gold-Bug. In the story, the method successfully deciphers a message giving the whereabouts of a treasure hidden by Captain Kidd [9].
The repetition of characters in text has been studied for use in cryptanalysis, and in frequency analysis in particular, ever since the method was formally developed. Ciphers that can be broken with this technique date back at least to Julius Caesar's Caesar cipher, which suggests that the method may have been explored in classical times.
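The following short Python sketch checks the 8-bit to 5-bit reduction. The 30-symbol ALPHABET and its codes (assigned in index order) are hypothetical and are not the actual Baudot/ITA2 assignments in Table 12. Only the arithmetic, 5 bits per character instead of 8, carries over to the paper's method.

```python
# Hypothetical 30-symbol "letters" group; real Baudot/ITA2 assigns different codes.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ .,'"
assert len(ALPHABET) == 30  # codes 30 and 31 stay free, e.g. for group-shift symbols

ENCODE = {ch: format(i, "05b") for i, ch in enumerate(ALPHABET)}
DECODE = {code: ch for ch, code in ENCODE.items()}


def to_5bit(text: str) -> str:
    """Map each character to a 5-bit code instead of an 8-bit byte (KeyError if unsupported)."""
    return "".join(ENCODE[ch] for ch in text.upper())


def from_5bit(bits: str) -> str:
    """Split the bit string into 5-bit chunks and map them back to characters."""
    return "".join(DECODE[bits[i:i + 5]] for i in range(0, len(bits), 5))


def compression_percentage(n_chars: int) -> float:
    """Size reduction when n_chars characters move from 8 bits to 5 bits each."""
    before, after = 8 * n_chars, 5 * n_chars
    return (before - after) / before * 100
```

For example, `compression_percentage(18)` returns 37.5, because 18 characters shrink from 144 to 90 bits. The reduction is 3/8 for any message length, and the paper describes it above as approximately 40%.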
TABLE 11. An example of hiding bits in an Arabic sentence, where red marks a bit with value 1 and blue marks a bit with value 0.
Also, keep in mind that the frequency of a letter varies with the dialect. For example, a writer in the United States might write a text in which the letter "z" is more common than in a text on the same topic by a writer in the United Kingdom. Some words in American English are spelled differently from British English, such as "analyse" in the United Kingdom and "analyze" in the United States. This will greatly affect the frequency of the letter "z" because
total usage. The "first eight" characters make up about 65% of the total usage. Many rank functions can fit letter frequency
put gate, and a forget gate [41]. The cell remembers values over arbitrary time intervals, and the three gates regulate the
$f_t$ = Forget gate activation vector
$i_t$ = Input/update gate activation vector
$o_t$ = Output gate activation vector
$+$ = Summation
$\times$ = Hadamard product
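The gate vectors listed above combine in a single LSTM time step. The following NumPy sketch uses the standard LSTM formulation, with tanh and the sigmoid as in Equations (1) and (2) and the forget gate as in Equation (3). For brevity, it stacks all four gate weight matrices into one matrix W in the order (f, i, C̃, o). This stacking is an assumption of the sketch and is not specified by the paper.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W has shape (4n, n + m), b has shape (4n,), gates stacked f, i, C~, o."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b  # W . [h_{t-1}, x_t] + b
    f_t = sigmoid(z[0:n])                      # forget gate, Eq. (3)
    i_t = sigmoid(z[n:2 * n])                  # input/update gate
    c_hat = np.tanh(z[2 * n:3 * n])            # candidate cell state C~_t
    o_t = sigmoid(z[3 * n:4 * n])              # output gate
    c_t = f_t * c_prev + i_t * c_hat           # "*" is the Hadamard product
    h_t = o_t * np.tanh(c_t)                   # new hidden state
    return h_t, c_t
```

With n = 128 hidden units, as the paper uses, W would have shape (512, 128 + m) for an m-dimensional input such as a word embedding.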
TABLE 12. Baudot codes.
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (1)$$
$\sigma$ = Sigmoid function, shown in Equation (2)
$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (2)$$
$b$ = bias vector parameters, which need to be learned during training

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \qquad (3)$$
2. The second step is to update the old cell state $C_{t-1}$ into the new cell state $C_t$. The last step is to decide whether
to take or forget $C_{t-1}$. The old state is multiplied by $f_t$, and the product of $i_t$ and $\tilde{C}_t$ is added, i.e., $C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t$. For this, $i_t$ and $\tilde{C}_t$ must be computed first.
layer decides which parts of the cell to ignore and which parts to use and output, and then uses it to extract the
hidden layers within the LSTM cell. We will use 128 hidden layer units, as shown in Fig (3).
The SoftMax function is a function that converts any real-
convert the values of the vector v to probabilistic values, regardless of whether the elements are positive, negative,
or zero. If an input is small or negative, the function turns it into a small probability [42]; if an input is large, the function turns it into a high probability. However large or small the inputs are, every element of the output vector lies between zero and one, and the elements sum to one, as shown in Equation (9).
$$\tilde{Z}_i = \frac{e^{Z_i}}{\sum_{j=1}^{k} e^{Z_j}} \qquad (9)$$
where $\tilde{Z}$ is the output vector of the softmax, and the $Z_i$ are the $k$ elements of the input vector to the softmax function. They can take any real value: positive, zero, or negative.
the text file is prepared. The line breaks are removed, and the text is converted into a list from which all the special characters (!"#$%&'()*+,-./:<=>?@[\]^_`{|}~) are deleted. The text is divided into tokens, which are then grouped into sequences of 32 words, as in figure (4).
In the second phase, a vocabulary dictionary is created that contains each word of the text once. The neural network deals only with numbers, so unique words are converted into numbers before they are passed to it. A dictionary from integers to unique words and a dictionary from words to integers are created. The sequences extracted in the first phase are converted to a numeric array and split into subsequences of a fixed length of 100 words. Each training pattern of the RNN consists of 100 time steps of one word each (X), followed by one output word (y). To create these sequences, a window slides along the whole sequence extracted in the first phase one word at a time, so that each word can be learned from the 100 words that precede it. To make predictions with the Keras LSTM model, we start with a seed sequence of words as input and generate the next word. We then update the seed sequence by appending the generated word to the end and trimming off the first word. This process is repeated for as long as new words are needed, for example to produce a sequence of 1000 words. The LSTM network has 2 layers with 128 nodes per layer, followed by a 128-node dense layer. The outputs are then passed through the SoftMax function, and training uses a batch size of 16 and 20 epochs. The seed words are entered to generate the new words, as shown in Figure (5).
the system, in which a cover text carrying hidden data is generated. Two groups of letters carry the hidden bits: the first set (e, r, o, n, l, u) represents a bit with the value 1, and the second set (a, i, t, s, c, d) represents a bit with the value 0. The secret text to be hidden is entered, then compressed and encoded with the 5-bit Baudot code to reduce its size by 45%. Each generated word is then tested letter by letter. A letter is accepted if it belongs to the set representing 1 and the bit to be hidden has the value 1, or if it belongs to the set representing 0 and the bit to be hidden has the value 0. A letter that belongs to neither group is considered neutral and is used without comparison with the bit to be hidden. If the bit to be hidden and the tested letter do not match, the word is deleted and a new word is generated. This process continues until all the secret text is hidden inside the cover text [45], as shown in figure (6).
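The following Python sketch illustrates the letter-group test of the third phase. G1 and G0 are the two sets given in the text. The `candidates` iterable is a hypothetical stand-in for the LSTM generator, so each rejected word simply moves on to the next candidate. Treating the letters after the last hidden bit as unconstrained is an assumption of this sketch.

```python
from typing import Iterable, Optional

G1 = set("eronlu")  # letters that carry bit 1
G0 = set("aitscd")  # letters that carry bit 0


def try_word(word: str, bits: str, pos: int) -> Optional[int]:
    """Return the next bit position if `word` fits bits[pos:], or None on a mismatch."""
    for ch in word.lower():
        if pos >= len(bits):
            break  # all bits hidden: remaining letters are left unconstrained (assumption)
        if ch in G1 or ch in G0:
            if ("1" if ch in G1 else "0") != bits[pos]:
                return None  # mismatch: the word is deleted
            pos += 1
        # any other letter is neutral and carries no bit
    return pos


def hide(bits: str, candidates: Iterable[str]) -> str:
    """Keep the candidate words that carry the next bits, in order."""
    cover, pos = [], 0
    for word in candidates:
        if pos >= len(bits):
            break
        new_pos = try_word(word, bits, pos)
        if new_pos is not None:
            cover.append(word)
            pos = new_pos
    if pos < len(bits):
        raise ValueError("ran out of candidate words before all bits were hidden")
    return " ".join(cover)


def reveal(cover: str, n_bits: int) -> str:
    """Read the group letters back in order and keep the first n_bits."""
    bits = ["1" if ch in G1 else "0" for ch in cover.lower() if ch in G1 or ch in G0]
    return "".join(bits[:n_bits])
```

For the bits 1001 and the candidates "cat", "rat", "hum", the word "cat" is rejected because c carries 0 while the first bit is 1. "rat" then carries 100 and "hum" carries the final 1, so the cover is "rat hum".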
FIGURE 5. The second phase.

Arabic Encoding Algorithm (AEA): What distinguishes Arabic poetry is that it adheres to meter and rhyme in all its styles and across its different generations. Poetry was a means of communication among Arabs in the pre-Islamic era, and a tribe would celebrate when one of its sons proved to be a talented poet. In the past, poetry was used among Arabs to raise the status of one tribe and degrade another. In the early days of Islam, poetry was one of the means of [...] Quraysh. During the Umayyad and Abbasid eras, poetry was also a means for the conflicting political and [...] factors.

All these features made Arabic poetry widely circulated among people. For this reason, Arabic poetry is used to hide the secret text, with the Arabic letters divided into two [...] group has 9 letters with equal frequencies. The first group carries the bit value 0, and the second group represents the bit value 1 [...]

FIGURE 6. The third phase.

When words are generated, they are tested one by one. If a word has fewer than four letters, it is added to the cover text without being compared with the bits of the secret text. When a generated word contains more than three letters, the [...] text, and if they do not match, the word is excluded and a new word is generated. This process continues until all the secret bits have been hidden, as shown in algorithm (1).
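The letter-level test applied to each generated word can be sketched as follows. This is a hedged sketch under several assumptions: the two 9-letter groups below are placeholders, because their actual membership is not given here; the LSTM generator is replaced by a hypothetical list of candidate words, where skipping a rejected candidate stands in for regenerating it; and the receiver-side `extract` function is also hypothetical, included only to show that the bits can be read back when the receiver knows how many bits were hidden.

```python
# Hypothetical letter groups: the paper uses two frequency-balanced groups of
# 9 letters each, but their exact membership is not reproduced here.
G0 = set("ابتثجحخدذ")  # letters that carry bit 0 (placeholder)
G1 = set("رزسشصضطظع")  # letters that carry bit 1 (placeholder)
# Every other character (including hamza forms, ة and ى) is treated as neutral.


def try_embed(word, bits, pos):
    """Check one generated word against the secret bits starting at pos.

    Returns the new bit position if the word is accepted, or None if a
    letter contradicts the next secret bit (the word must be regenerated,
    and the bit position stays where it was).
    """
    if len(word) <= 3:
        return pos  # short words are accepted without carrying bits
    for ch in word:
        if pos >= len(bits):
            break  # every secret bit is already hidden
        if ch in G1:
            if bits[pos] != 1:
                return None
            pos += 1
        elif ch in G0:
            if bits[pos] != 0:
                return None
            pos += 1
        # neutral letters are used without comparison
    return pos


def embed(bits, candidates):
    """Accept or reject candidate words until all bits are hidden.

    `candidates` is a hypothetical stand-in for the LSTM word generator.
    """
    cover, pos = [], 0
    for word in candidates:
        if pos >= len(bits):
            break
        new_pos = try_embed(word, bits, pos)
        if new_pos is not None:
            cover.append(word)
            pos = new_pos
    return cover, pos


def extract(cover, n_bits):
    """Receiver side: read bits from G0/G1 letters of words longer than 3."""
    bits = []
    for word in cover:
        if len(word) <= 3:
            continue
        for ch in word:
            if ch in G1:
                bits.append(1)
            elif ch in G0:
                bits.append(0)
    return bits[:n_bits]  # letters after the last hidden bit are ignored
```

For example, with the secret bits [1, 0, 1] and the candidates قلب, سلام, بحور, شمال, the short word قلب passes untested, سلام hides 1 (س) and 0 (ا), بحور is rejected because ب (group 0) contradicts the next bit 1, and شمال hides the final 1.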
Algorithm 1 Arabic Encoding Algorithm
Input text file as (TX)
Input secret message as (SM)
Extract rhyme letters from text as (RH)
Compress (SM) using Baudot code as (BL)
Split (TX) into words (tokens) as (TO)
Create word vocabulary
Create word-to-integer dictionary
Create integer-to-word dictionary
Convert (TO) to integers depending on repetition
Split (TX) into batches
Create matrix [words length : words length] depending on neighboring words
Update (TX) by shifting one word position at a time
Prepare data for training
Define LSTM nodes = 128
Extract the weights using the Embedding method
Extract probabilities in range (0–1) using SoftMax
Find the loss based on the Cross-Entropy method
Define an optimizer to start training
Input seed words
Input group-1 as G1 and group-0 as G0
Generate words W in range BL
If the new_word_line > 3 characters
For the character in word
if (bit in BL = 1 and character[word] in G1) OR (bit in BL = 0 and character[word] in G0)
BL = BL − 1
New_word = new_word + character[word]
else
if (bit in BL = 1 and character[word] in G0) OR (bit in BL = 0 and character[word] in G1)
return the old BL
delete the current word
Generate new word W
else
delete the current word W
Generate new word W
End

TABLE 14. Hiding the secret bits in the cover text.

Secret message = Al Razi University
Word = 3

$\text{Compression Percentage} = \frac{144 - 90}{144} \times 100 = 37.5\,\%$

• Generating Arabic poetry Steganography
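The compression percentage given with Table 14 (37.5% for the secret message "Al Razi University") is consistent with an 18-character message stored at 8 bits per character (18 × 8 = 144 bits) versus 5 bits per character in Baudot code (18 × 5 = 90 bits). The sketch below reproduces that arithmetic under stated assumptions: it uses one common listing of the ITA2 (Baudot-Murray) letters table, MSB-first bit order, and uppercase letters only, with no FIGS/LTRS shift characters. The paper's exact Baudot variant and bit order are not specified here, and the function names are hypothetical.

```python
# One common listing of the ITA2 (Baudot-Murray) letters-shift codes.
# Assumptions: letters-only messages (no FIGS shift), MSB-first bit order.
ITA2_LETTERS = {
    "E": 1, "\n": 2, "A": 3, " ": 4, "S": 5, "I": 6, "U": 7, "\r": 8,
    "D": 9, "R": 10, "J": 11, "N": 12, "F": 13, "C": 14, "K": 15,
    "T": 16, "Z": 17, "L": 18, "W": 19, "H": 20, "Y": 21, "P": 22,
    "Q": 23, "O": 24, "B": 25, "G": 26, "M": 28, "X": 29, "V": 30,
}
ITA2_DECODE = {code: ch for ch, code in ITA2_LETTERS.items()}


def baudot_bits(message):
    """Encode a message as 5 bits per character (Baudot has no lowercase)."""
    bits = []
    for ch in message.upper():
        code = ITA2_LETTERS[ch]  # KeyError for digits/punctuation (FIGS not handled)
        bits.extend((code >> shift) & 1 for shift in range(4, -1, -1))
    return bits


def baudot_text(bits):
    """Decode 5-bit groups back into (uppercase) text."""
    chars = []
    for i in range(0, len(bits), 5):
        code = 0
        for bit in bits[i:i + 5]:
            code = (code << 1) | bit
        chars.append(ITA2_DECODE[code])
    return "".join(chars)


def compression_percentage(message, bits_per_char=8):
    """Percentage saved by 5-bit Baudot versus a fixed 8-bit encoding."""
    original = bits_per_char * len(message)
    compressed = len(baudot_bits(message))
    return (original - compressed) / original * 100


message = "Al Razi University"
print(8 * len(message), len(baudot_bits(message)))  # 144 90
print(compression_percentage(message))  # 37.5
```

Because Baudot has no lowercase, the decoded message comes back as "AL RAZI UNIVERSITY"; a message containing digits or punctuation would additionally need FIGS/LTRS shift codes, which would lower the saving.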
[...] deep learning theories are used to generate sentences and texts that carry confidential information, using LSTM networks. Words are generated so that each word can carry more than one secret bit. Unlike previous methods, the hiding process is based on letters rather than words, which increases the capacity of the cover text to carry hidden bits and makes it possible to generate numerous texts at once and choose the finest ones appropriate for the occasion of the transmission. As a result, the desired goal of text steganography is achieved. Future research will look for new ways to increase the embedding rate while improving security, as well as smart algorithms that simulate human-generated words more efficiently than the algorithms currently in place.

REFERENCES

[1] P. Dobriyal, J. Yadav, and J. Jain, ‘‘A review on text based steganography,’’ Int. J. Res. Publication’s, vol. 4, no. 3, pp. 44–50, Jan. 2015.
[2] S. Mersal, S. Alhazmi, R. Alamoudi, and N. Almuzaini, ‘‘Arabic text steganography in smartphone,’’ Int. J. Comput. Inf. Technol., vol. 3, no. 2, pp. 764–2279, Mar. 2014.
[3] M. Hanaa Ahmed and M. A. A. Khodher, ‘‘Arabic language document steganography based on Huffman code using DRLR as RNG,’’ Al-Mansour J., vol. 2016, p. 57, 2016. [Online]. Available: https://www.iasj.net/iasj/download/2eeb7e70324e480c, doi: 10.36541/0231-000-026-007.
[4] M. Y. Valandar, P. Ayubi, M. J. Barani, and B. Y. Irani, ‘‘A chaotic video steganography technique for carrying different types of secret messages,’’ J. Inf. Secur. Appl., vol. 66, May 2022, Art. no. 103160, doi: 10.1016/j.jisa.2022.103160.
[5] E. Farri and P. Ayubi, ‘‘A robust digital video watermarking based on CT-SVD domain and chaotic DNA sequences for copyright protection,’’ J. Ambient Intell. Hum. Comput., vol. 2022, pp. 1–25, Feb. 2022, doi: 10.1007/s12652-022-03771-7.
[6] M. S. Kadhem and D. Wameedh, ‘‘Proposed Arabic text steganography method based on new coding technique,’’ J. Eng. Res. Appl., vol. 6, no. 9, pp. 38–46, Sep. 2016.
[7] R. A. Khekan, H. M. W. Majeed, and F. O. A. Adeeb, ‘‘New text steganography method using the Arabic letters dots,’’ Indonesian J. Electr. Eng. Comput. Sci., vol. 21, no. 3, pp. 1784–1793, 2021, doi: 10.11591/ijeecs.v21.i3.pp1784-1793.
[8] [...] approach for steganography in Arabic text based on DNA coding and Arabic diacritics,’’ Int. J. Adv. Res., vol. 2, no. 12, pp. 954–965, 2014.
[9] S. F. Lu, O. Farooq, and H. Ali, ‘‘New steganography method using litter manipulations frequency,’’ in Proc. 2nd Int. Conf. Inf. Technol. Ind. Automat., 2017, pp. 1–6.
[10] A. Odeh, A. Alzubi, Q. B. Hani, and K. Elleithy, ‘‘Steganography by multipoint Arabic letters,’’ in Proc. IEEE Long Island Syst., Appl. Technol. Conf. (LISAT), May 2012, pp. 1–7, doi: 10.1109/LISAT.2012.6223209.
[11] M. H. Shirali-Shahreza and M. Shirali-Shahreza, ‘‘A new approach to Persian/Arabic text steganography,’’ in Proc. 5th IEEE/ACIS Int. Conf. Comput. Inf. Sci. 1st IEEE/ACIS Int. Workshop Component-Based Softw. Eng., Softw. Archit. Reuse (ICIS-COMSAR), Jul. 2006, [...]
[12] R. Thabit, N. I. Udzir, S. Yasin, A. Asmawi, and N. A. Roslan, ‘‘A comparative analysis of Arabic text steganography,’’ Appl. Sci., vol. 11, p. 6851, Jul. 2021, doi: 10.3390/app11156851.
[13] A. A. Gutub and M. M. Fattani, ‘‘A novel Arabic text steganography method using letter points and extensions,’’ Int. J. Comput. Inf. Eng., vol. 1, no. 3, pp. 28–31, 2007.
[14] A. F. Al-Azawi and M. A. Fadhil, ‘‘Arabic text steganography using Kashida extensions with Huffman code,’’ J. Appl. Sci., vol. 10, no. 5, pp. 436–439, Feb. 2010, doi: 10.3923/jas.2010.436.439.
[15] A. A. A. Gutub and W. Al-Alwani, ‘‘Improved method of Arabic text steganography using the extension ‘Kashida’ character,’’ Bahria Univ. J. Inf. Commun. Technol., vol. 3, no. 1, pp. 68–72, 2010.
[16] F. Al-Haidari, A. Gutub, K. Al-Kahsah, and J. Hamodi, ‘‘Improving security and capacity for Arabic text steganography using ‘Kashida’ extensions,’’ in Proc. IEEE/ACS Int. Conf. Comput. Syst. Appl. (AICCSA), May 2009, pp. 396–399, doi: 10.1109/AICCSA.2009.5069355.
[17] A. A.-A. Gutub and A. A. Al-Nazer, ‘‘High-capacity steganography tool for Arabic text using Kashida,’’ Int. J. Inf. Secur., vol. 2, no. 2, pp. 107–118, Jul. 2010.
[18] A. Odeh, K. Elleithy, and M. Faezipour, ‘‘Steganography in Arabic text using Kashida variation algorithm (KVA),’’ in Proc. IEEE Long Island Syst., Appl. Technol. Conf. (LISAT), May 2013, pp. 1–6, doi: 10.1109/LISAT.2013.6578239.
[19] H. M. Ahmed, ‘‘Arabic language script steganography based on dynamic random linear regression,’’ Mustansiriyah J. Sci. Education, vol. 17, no. 1, pp. 397–414, 2016.
[20] A. A. Gutub, L. M. Ghouti, Y. S. Elarian, S. M. Awaideh, and A. K. Alvi, ‘‘Utilizing diacritic marks for Arabic text steganography,’’ Kuwait J. Sci. Eng., vol. 37, no. 1, pp. 89–109, Jun. 2010.
[21] M. S. Memon and D. A. Shah, ‘‘A novel text steganography technique to Arabic language using reverse Fat5Th5Ta,’’ Pakistan J. Eng., Technol. Sci., vol. 1, no. 2, pp. 106–113, Sep. 2015, doi: 10.22555/pjets.v1i2.167.
[22] N. Alanazi, E. Khan, and A. Gutub, ‘‘Inclusion of unicode standard seamless characters to expand Arabic text steganography for secure individual uses,’’ J. King Saud Univ. Comput. Inf. Sci., vol. 34, no. 4, pp. 1343–1356, Apr. 2022, doi: 10.1016/j.jksuci.2020.04.011.
[23] A. S. Sabir, ‘‘A new Arabic text diacritics, non diacritics steganography,’’ Basrah J. Sci., vol. 31, no. 3, pp. 85–96, 2013.
[24] N. Alanazi, E. Khan, and A. Gutub, ‘‘Efficient security and capacity techniques for Arabic text steganography via engaging unicode standard encoding,’’ Multimedia Tools Appl., vol. 80, no. 1, pp. 1403–1431, Jan. 2021, doi: 10.1007/s11042-020-09667-y.
[25] M. H. Shirali-Shahreza and M. Shirali-Shahreza, ‘‘Arabic/Persian text steganography utilizing similar letters with different codes,’’ Arabic J. Sci. Eng., vol. 35, no. 1, pp. 213–222, Apr. 2010.
[26] N. A. Roslan, R. Mahmod, and N. I. Udzir, ‘‘Sharp-edges method in Arabic text steganography,’’ J. Theor. Appl. Inf. Technol., vol. 33, no. 1, pp. 32–41, 15 Nov. 2011.
[27] M. Shirali-Sh and S. Shirali-Sh, ‘‘High capacity Persian/Arabic text steganography,’’ J. Appl. Sci., vol. 8, no. 22, pp. 4173–4179, Nov. 2008, doi: 10.3923/jas.2008.4173.4179.
[28] A. F. Al Azzawi, ‘‘A multi-layer Arabic text steganographic method based on letter shaping,’’ Int. J. Netw. Secur. Appl., vol. 11, no. 1, pp. 27–40, Jan. 2019, doi: 10.5121/ijnsa.2019.11103.
[29] A. Ditta, C. Yongquan, M. Azeem, K. G. Rana, H. Yu, and M. Q. Memon, ‘‘Information hiding: Arabic text steganography by using Unicode characters to hide secret data,’’ Int. J. Electron. Secur. Digit. Forensics, vol. 10, no. 1, pp. 61–78, 2018, doi: 10.1504/IJESDF.2018.
[30] R. Din, R. A. Thabit, N. I. Udzir, and S. Utama, ‘‘Traid-bit embedding process on Arabic text steganography method,’’ Bull. Electr. Eng. Informat., vol. 10, no. 1, pp. 493–500, Feb. 2021, doi: 10.11591/eei.v10i1.2518.
[31] E. M. Ahmadoh and A. A.-A. Gutub, ‘‘Utilization of two diacritics for Arabic text steganography to enhance performance,’’ Lect. Notes Inf. Theory, vol. 3, no. 1, pp. 1–6, 2015, doi: 10.18178/lnit.3.1.42-47.
[32] H. M. S. Alshahrani and G. Weir, ‘‘Hybrid Arabic text steganography,’’ Int. J. Comput. Inf. Technol., vol. 6, no. 6, pp. 329–338, 2017.
[33] R. A. Alotaibi and L. A. Elrefaei, ‘‘Utilizing word space with pointed and un-pointed letters for Arabic text watermarking,’’ in Proc. UKSim-AMSS 18th Int. Conf. Comput. Model. Simul. (UKSim), Apr. 2016, pp. 111–116, [...]
[34] S. Malalla and F. R. Shareef, ‘‘A novel approach for Arabic text steganography based on the ‘BloodGroup’ text hiding method,’’ Eng., Technol. Appl. Sci. Res., vol. 7, no. 2, pp. 1482–1485, Apr. 2017, doi: 10.48084/etasr.1090.
[35] H. K. Tayyeh, M. S. Mahdi, and A. S. A. AL-Jumaili, ‘‘Novel steganography scheme using Arabic text features in Holy Quran,’’ Int. J. Electr. Comput. Eng., vol. 9, no. 3, pp. 1910–1918, 2019, doi: 10.11591/ijece.v9i3.pp1910-1918.
[36] S. M. A. Al-Nofaie and A. A.-A. Gutub, ‘‘Utilizing pseudo-spaces to improve Arabic text steganography for multimedia data communications,’’ Multimedia Tools Appl., vol. 79, nos. 1–2, pp. 19–67, Jan. 2020, doi: 10.1007/s11042-019-08025-x.
[37] A. Taha, A. S. Hammad, and M. M. Selim, ‘‘A high capacity algorithm for information hiding in Arabic text,’’ J. King Saud Univ. Comput. Inf. Sci., vol. 32, no. 6, pp. 658–665, Jul. 2020, doi: 10.1016/j.jksuci.2018.07.007.
[38] A. Mahmood, T. Latif, and K. M. A. Hasan, ‘‘An efficient 6 bit encoding scheme for printable characters by table look up,’’ in Proc. Int. Conf. Electr., Comput. Commun. Eng. (ECCE), Feb. 2017, pp. 468–472.
[39] M. Malhotra, N. G. Gupta, and R. S. Prasad, ‘‘Software-based solution for analysis and decoding of FSK-2 modulated, baudot-coded signals,’’ Defence Sci. J., vol. 56, no. 2, pp. 259–268, Apr. 2006.
[40] T. Fang, M. Jaggi, and K. Argyraki, ‘‘Generating steganographic text with LSTMs,’’ in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics Student Res. Workshop, Vancouver, BC, Canada, Jul. 2017, pp. 100–106, [...]
[...] pp. 381–399, 1995.
[44] Z. I. Botev, D. P. Kroese, R. Y. Rubinstein, and P. L’Ecuyer, ‘‘The cross-entropy method for optimization,’’ in Handbook of Statistics, vol. 31. Amsterdam, The Netherlands: Elsevier, 2013.
[45] E. A. Khan, ‘‘Using Arabic poetry system for steganography,’’ Asian J. Comput. Sci. Inf. Technol., vol. 4, no. 6, pp. 55–61, 2014, doi: 10.15520/ajcsit.v.

OMER FAROOQ AHMED ADEEB was born in Baghdad, Iraq, in 1973. He received the B.S. degree in computer science from Al-Mustansiriyah University, Baghdad, in 1999, and the M.Sc. degree in computer engineering from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2017. He is currently pursuing the Ph.D. degree with Razi University, Kermanshah, Iran. His research interest includes computer security.

[...] digital signal processing, sound/audio/music signal processing, speech processing, pattern recognition, machine learning, data mining, neural networks, deep learning, global optimization, meta-heuristic algorithms, evolutionary computation, swarm intelligence, text/natural language processing, biometrics, biomedical data/signal processing, and social networks.