Arabic Text Steganography Based On Deep Learning Methods

Received 30 May 2022, accepted 2 July 2022, date of publication 23 August 2022, date of current version 14 September 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3201019
OMER FAROOQ AHMED ADEEB AND SEYED JAHANSHAH KABUDIAN
Department of Electrical and Computer Engineering, Razi University, Kermanshah 6714414971, Iran
Corresponding author: Seyed Jahanshah Kabudian (Kabudian@razi.ac.ir)
ABSTRACT Steganography is one of the oldest methods for securely sending and transferring secret information between two people without raising suspicion. Recently, Artificial Intelligence (AI) has become simpler to use and more widespread. Since the emergence of natural language processing (NLP), building language models using deep learning has become more common. Furthermore, because of the importance of concealing secret information in delivered messages, AI techniques along with NLP algorithms were employed to conceal secret information within the text cover. The Arabic language was used because of its large vocabulary and wealth of linguistic meanings, and because its most significant feature is Arabic poetry. This study presents a new way to hide secret data inside newly generated Arabic poetry, based on previous Arabic poetic texts and a database of Arab poets from the ancient and modern eras, using Artificial Intelligence and Long Short-Term Memory (LSTM) networks to increase storage capacity by 45 percent. The secret data is hidden at the level of letters rather than words using a Baudot Code algorithm, which increases the linguistic accuracy and the volume of secret data hidden within the generated poetry and eliminates the drawbacks found in previous studies.
INDEX TERMS Text steganography, hiding information, letter frequency, LSTM, Baudot code.
The associate editor coordinating the review of this manuscript and approving it for publication was Jerry Chun-Wei Lin.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 10, 2022 94403
O. F. A. Adeeb, S. J. Kabudian: Arabic Text Steganography Based on Deep Learning Methods

I. INTRODUCTION
As electronic communication methods become more prevalent in the modern era, most human activities, such as the transmission of confidential and financial data and information, depend completely on electronic communication methods [1]. Because data transmission in cyberspace has numerous drawbacks and concerns, and intruders and information trolls add significantly to the fears of computer network users, data and information cannot be transmitted smoothly for fear that a third party will expose them and profit from them. Therefore, the security of data and information transmission has become one of the most important sciences and topics of interest to all Internet users [2].
Steganography is one of the oldest methods for concealing information and sending it without danger of its secret content being discovered, since it is contained inside a cover that serves as the carrier of the secret text. It is classified on the basis of the type of cover that carries the message into four categories, namely the Image cover, Audio cover, Video cover, and Text cover [3]. Because every second of video comprises 24 frames as a moderate limit, a video cover is a good choice. It is used to hide highly confidential data of large size with excellent efficiency [4], and some use the video cover for watermarks that protect the copyrights of digital videos [5]. Text cover is considered one of the oldest methods used to conceal confidential texts and information. The first recorded use of the term steganography was in 1499 by Johannes Trithemius in his book Steganographia, which is a treatise on encryption and concealment disguised as a book on magic [6]. Text cover is one of the most difficult ways to hide information compared to the others (image, audio, and video), because it lacks the large storage capacity of the other covers.
The writing method and alphabetic letters utilized in this study are based on one of the most extensively used scripts in the world, the Arabic alphabet. There are over 31 living languages that use the same Arabic alphabet graphics, such as Persian, Urdu, Ottoman Turkish, and so
on. We construct words using artificial intelligence technologies and then hide secret messages within the generated text cover. Our contribution is to secure the transmission and reception of secret information with high capacity, high security, and high speed, and to eliminate the possibility of discovery through similarities with previously written texts. In the end, the main objective of steganography remains to avoid raising any suspicion while a message carrying confidential information is transmitted between two parties, regardless of the cover that carries the data (whether that is text, audio, or image). This paper is organized as follows. The first section is a general section on how to hide information and how to hide a text written in Arabic or in other languages with similar letter forms. The second section provides an overview of some of the previous works. In the third section, the algorithms and ideas required for the proposed method are explained in detail with the results. Finally, the fourth section concludes the paper with the conclusions reached.

II. LITERATURE REVIEW
Persian and Arabic are considered among the most ancient languages. They have the same characteristics, but the Persian alphabet has four more letters than the Arabic one, namely ( ). The Arabic and Persian languages are similar in writing [7], i.e., both are written from right to left, but the pronunciation of some letters and words is different [3].
Both languages are characterized by some features not found in other languages, such as the number of dotted letters. The number of dotted letters in Persian is 18 out of 32, while in Arabic it is 15 out of 28, as in table (1). Likewise, neither language writes vowels as letters; instead, both use decorative marks that help in pronouncing the letters, of which there are 8 forms ( ). In addition, each letter has four written forms. For example, one of the letters is written as ( ) when it stands alone, as ( ) at the beginning of the word, as ( ) in the middle, and as ( ) at the end of the word, where each of them is encoded in 16 bits [6].
TABLE 1. Dotted letters.
Persian and Arabic text steganography is divided into seven known methods:
1- Dotted Letters
2- Extended Letters
3- Decoration
4- Unicode
5- Sharp Edges
6- Pseudo Connection Character
7- Hybrid Algorithms
1. Dotted letters: To a person who does not know either of the two languages, the Persian and Arabic scripts look the same. This is true to a large extent, since the modern Persian language is used in many countries, such as Afghanistan, and it is written in the Persian-Arabic script, which is the Arabic script with slight modifications in pronunciation [8]. These two languages are used because their characters have multiple characteristics that help in hiding important information within the text cover, and among these characteristics are the dotted letters that distinguish the two languages. English, by contrast, has only 2 dotted letters, i and j [9].
1.1. Some studies use the text as a cover to conceal a secret text based on the dots of the letters. The dots are displaced by producing a new font whose dots are shifted up or down by a small amount, 1/300 of an inch. For letters with more than one dot, all dots are displaced simultaneously. The secret text is converted to binary and then hidden bit by bit by moving the dots. If the hidden bit has the value 1, the dots of the chosen letter are moved, but if the hidden bit has the value 0, the letter remains unchanged.
1.2. This study follows the previous research on using letter dots to hide the secret text inside the cover text, but hides 2 bits in each hiding step by moving the letter dots horizontally and vertically. No change is made to the letter if the bits are 00. For the bit sequence 01, a slight horizontal space is added between the dots of the same letter. For the sequence 10, the letter dots are moved vertically by a slight space. For the sequence 11, the dots are moved in both directions (horizontal and vertical). The amount of this movement is about 1/300 of an inch, so small that it is nearly invisible to the naked eye. Still, it can be detected using a program designed to extract the hidden text from the text cover [10], [11].
TABLE 2. Harakat repeating frequency in Holy Quran.
2. Extended letters (Kashida): Persian and Arabic differ from English in that their letters are connected rather than separated in printing, with one letter joining the next, with the exception of seven letters that cannot be joined to the next, which are ( ). The extension exists in Persian and Arabic as a kind of embellishment and as an adjustment that equalizes the length of all the lines of a text. It can be made between any two letters, except after one of the seven letters above. Thus, the extension cannot be made at the beginning or end of a word but only within a single word, between its letters. The extension is used to arrange the text and give it an aesthetic shape, and it is considered a type of decoration and art in calligraphy. The method hides the textual data inside the text cover by adding an extension to the letters that can carry elongation. If the bit to be concealed has the value 1, the letter of the cover text is extended, and if the bit has the value 0, the letter remains unchanged [12].
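The basic Kashida rule of [12] described above (bit 1 adds an extension, bit 0 leaves the letter unchanged) can be illustrated with a minimal sketch. It is not the implementation used in the cited study. The function names, the `NON_JOINING` set of letters that do not connect to the following letter, and the rule that a slot is any joining letter followed by another Arabic letter are all simplifying assumptions made for illustration, and the cover text is assumed to contain no Kashida beforehand.

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL, the Kashida character

# Assumed set of letters that do not join to the following letter
# (alef forms, dal, thal, reh, zain, waw forms, teh marbuta, alef maksura, hamza).
NON_JOINING = set(
    "\u0627\u0623\u0625\u0622\u062F\u0630\u0631\u0632"
    "\u0648\u0624\u0629\u0649\u0621"
)


def is_letter(ch):
    """True for a character in the basic Arabic letter range, excluding tatweel."""
    return "\u0621" <= ch <= "\u064A" and ch != TATWEEL


def can_extend(cur, nxt):
    """A slot exists when a joining letter is directly followed by a letter."""
    return is_letter(cur) and cur not in NON_JOINING and is_letter(nxt)


def embed(cover, bits):
    """Hide a string of '0'/'1' characters: bit 1 adds a Kashida, bit 0 adds nothing."""
    out, k = [], 0
    for i, ch in enumerate(cover):
        out.append(ch)
        nxt = cover[i + 1] if i + 1 < len(cover) else ""
        if k < len(bits) and can_extend(ch, nxt):
            if bits[k] == "1":
                out.append(TATWEEL)
            k += 1
    if k < len(bits):
        raise ValueError("cover text too short for the secret bits")
    return "".join(out)


def extract(stego, n_bits):
    """Read back the first n_bits hidden bits from the stego text."""
    bits, i = [], 0
    while i < len(stego) - 1 and len(bits) < n_bits:
        ch, nxt = stego[i], stego[i + 1]
        if nxt == TATWEEL and is_letter(ch) and ch not in NON_JOINING:
            bits.append("1")
            i += 2  # skip the Kashida itself
        elif can_extend(ch, nxt):
            bits.append("0")
            i += 1
        else:
            i += 1
    return "".join(bits)


cover = "\u0633\u0644\u0627\u0645 \u0639\u0644\u064A\u0643\u0645"
stego = embed(cover, "101")
print(extract(stego, 3))
```

In this sketch the receiver must know how many bits were hidden, because unused slots read back as 0. The end-of-message marker of method 2.2 is one way the surveyed studies handle that.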
2.1. Another study used the Kashida to hide the secret bits in the text cover by adding two Kashida for a bit with a value of 1 and one Kashida for a bit with a value of 0 [13], [14], [15].
2.2. The problem of the secret text being shorter than the text cover was solved by appending the 6 bits 111111 to mark the end of the secret text, and then adding Kashida randomly to the rest of the text to improve security [16]. The same algorithm is then applied as in the previous case, with a single Kashida added if the bit has a value of 0 and two Kashida added if the bit has a value of 1.
2.3. In this study, the same technique as the previous one was used, with the addition of tying the Kashida to the dotted letters. To increase security and ensure that the data hidden within the cover text is not revealed, the hiding method depends on the line of the cover text. On odd lines, such as the first or third, a secret bit with value 1 is hidden by adding the Kashida after the dotted letter, and if the bit has a value of 0, no change is made to the dotted letter. On even lines, such as the second and fourth, a secret bit with value 1 is hidden by inserting the Kashida before the dotted letter, and no change is made to the dotted letter if the secret bit has a value of 0 [17].
2.4. To increase hiding capacity, the secret text is converted into binary codes and then divided into blocks, with four different scenarios for hiding blocks within the text cover. The scenario used to hide each block is chosen randomly. In the first scenario, a bit with value 1 is hidden by adding one Kashida after a dotted letter, and the dotted letter is left unchanged for a bit with value 0. In the second scenario, a single Kashida is added after a non-dotted letter to hide a bit with value 1, and no change is made to the non-dotted letter for a bit with value 0. In the third scenario, one Kashida is added after the letter (whether dotted or non-dotted) when the bit to be concealed has the value 1, and no change is made when the bit has the value 0. The fourth scenario is the opposite of the third: when the bit to be hidden has a value of 0, a Kashida is added, and when it has a value of 1, no modification is made [18], [19].
3. Decoration (Harakat): The Harakat (singular Haraka) were developed to denote the phonemic meaning of the spoken word, because of the large number of words and sounds in Arabic and Persian and the fact that a single written word may have different meanings depending on its pronunciation. They are placed above or below the letters to clarify their pronunciation. There are seven primary Harakat that have been used in Arabic and Persian historical books, namely ( ), of which the Haraka ( ) is repeated more than the others in the Holy Qur'an, as shown in table (2). They are therefore used to hide secret information and texts inside the text cover.
3.1. Confidential data can be concealed by placing Harakat in the cover text after converting the secret message to binary code. Each hidden bit is tested: if it has the value 1, the Haraka Fatha ( ) is placed, and if it has the value 0, any of the remaining six Harakat ( ) is placed [20].
3.2. The use of decoration is considered one of the distinct methods for hiding data. In this study, the Fatha ( ) decoration was used in a different way, namely in reversed form, as shown in the figure below. The secret bits are hidden one bit at a time. If the bit has a value of 1, the reversed Fatha is used on the letter, but if it has a value of 0, the Fatha remains unchanged [21]. The reversed Fatha looks like this ( ).
4. Unicode: To represent the English language on the computer, a coding system, (extended) ASCII coding, consisting of 8 bits, was formed to represent each letter of the English language, numbers, symbols, and signs, totaling 256 letters and symbols. With the advancement of technology and the computerization of the rest of the world's languages, the Unicode Consortium was established as a non-profit organization that coordinates and organizes the development work of the Unicode system, which seeks to replace and organize all symbols within computer systems in a unified and global manner. It consists of 16 bits and can

represent more than 100,000 letters and symbols from all the world's languages. An Arabic Unicode table in the range 0600-06FF represents the standard forms of all letters used in the Arabic language, and another Unicode table in the range FE70-FEFF has all Arabic letters with isolated forms [22], as presented in table (3).
TABLE 3. Arabic and Persian letters with isolated forms.
4.1. It is possible to hide confidential data by replacing a letter with its counterpart from the range FE70-FEFF to hide a bit with value 1, and keeping the letter within the range 0600-06FF to hide a bit with value 0 [23], [24].
4.2. To hide information in Persian and Arabic writings, it was observed that the Unicode specification assigns the letter " " and also the letter " " two separate symbols in the Persian and Arabic languages. The symbols have the same shape but distinct codes [25]. Because they have two separate codes, the letters ( ) and ( ) at the beginning and middle of the word have different shapes. One can use this characteristic to hide information inside the cover text. To hide a bit with value 0, the letter ( ) or ( ) is changed to the Persian form with code 06CC or 06A9. To hide a bit with value 1, the letter is changed to the Arabic form with code 064A or 0643.
5. Sharp Edges: Persian and Arabic are two languages that share many features due to the graphics of the letters and the multiplicity of forms. Sharp edges are one of their distinguishing characteristics. The letters of the two languages are separated into five groups based on the number of sharp edges carried by each letter, as seen in table (4).
TABLE 4. The number of sharp edges.
One can use this feature to hide the data and create a cover text that contains invisible confidential information, which helps in the safe transfer of data. The sharp edges of letters are used to hide the secret bits to be transferred. The sender enters a secret numeric code and sums the digits of the number. If the result is even, the number of edges is calculated for the dotted letters. If the sum of the digits of the secret code is odd, the number of edges is calculated for the non-dotted letters [26].
6. Pseudo connection character: The Persian and Arabic letters differ from those of other scripts in that each letter has four different written forms depending on its place in the word. A letter that appears on its own has a different shape than when it appears at the beginning, middle, or end of a word, and its shape at the beginning differs from its shape in the middle or at the end, as seen in table (5):
TABLE 5. The letter position shape.
6.1. The Persian and Arabic languages need the Zero-Width-Joiner (ZWJ), which is used to connect letters with each other in complex texts and has the code point U+200D. They also need the Zero-Width-Non-Joiner (ZWNJ), which is used to separate letters and has the code point U+200C. Neither has a visible effect of its own on the letters it is placed between, and each is considered a non-printing character. This feature can be used to hide confidential data inside the text cover. The text to be hidden is converted to binary, and each bit to be hidden is tested. If it has a value of 1, a Zero-Width-Joiner (ZWJ) is inserted between the specific letters (the current letter and the one that follows). If the bit has a value of 0, nothing is changed in the text cover. This process continues until all the bits are hidden [27], [28].
6.2. The researchers used the zero-width-character (ZWC) and the zero-width-joiner (ZWJ) for independent and cursive letters. The independent letters, which comprised seven letters ( ), are special

letters that, when placed at the beginning or end of a word, divide it into several sections, such as ( ), which means (Orbits). We can see that this word is divided into the parts ( ), where two bits are hidden in each letter for independent and non-independent letters, as shown in table (6). The text to be hidden is converted to binary. The secret bits are divided into pairs, and the letter is tested to see if it is one of the seven independent letters or one of the regular joinable letters. Then, as shown in table (6), every pair of bits is hidden together [29].
TABLE 6. Two hidden bits in each letter.
7. Hybrid Algorithms: In this category, two or more of the previous six methods are combined to devise a new method of hiding the data inside the cover text. Any new method is simply the consequence of combining two or more of the six procedures listed above, which are regarded as the foundation of concealment in the Persian and Arabic languages [30].
7.1. In this study, the secret text is converted into binary code and divided into an odd matrix and an even matrix, and the decoration is paired with the Kashida to produce a concealment path that is difficult to detect. The odd matrix elements are hidden by using the decoration, and the elements of the even matrix are hidden through the Kashida. When the odd matrix elements are hidden, the decoration Fatha ( ) is added if the bit to be hidden is 1, and it is removed from the letter if the bit to be hidden is 0. As for the even matrix, one Kashida is added when the bit to be hidden has the value 1, and the letter remains unchanged when the bit has the value 0. As shown in table (12), the data in the odd matrix is concealed first, followed by the data in the even matrix [31], [32].
7.2. Some researchers have paired concealment methods to find the best way to protect hidden cover texts. They combined the pseudo connection character and the letter dots to generate a text cover that is well camouflaged. After converting the secret text to binary code, the bit to be hidden is tested. For a dotted letter, a pseudo-space is added after the letter if the bit has the value 1, and no pseudo-space is added if it has the value 0. For a non-dotted letter, the rule is reversed: no pseudo-space is added after the letter if the bit has the value 1, but a pseudo-space is added if it has the value 0 [33].
7.3. In the blood group technique, letters are divided into three groups based on the blood group classification (A, B, and AB). Group A contains the dotted letters, group B contains the non-dotted letters, and group AB contains the independent letters, as shown in table (7). The secret text is first converted to binary. If the bit to be hidden has a value of 1 and the current letter is from group A and is preceded by a letter from group A, one Kashida is inserted between them. The same process is used when the current letter is from group B and is preceded by a letter from group B. But when the current letter is from group AB and is preceded by a letter from any group (A, B, or AB), it is replaced with the same letter from the (ISO-8859-6) set. If the secret bit equals 0, no change is made [34].
TABLE 7. Blood groups division of letters.
7.4. In the Arabic language, letters are divided according to their pronunciation into two types, Solar and Lunar letters, as shown in table (8). The researchers took advantage of this feature to hide confidential data inside the Arabic text cover. This study combines the decoration with changing the code value of the letters. The researcher uses two types of hiding. In the first type, to hide a secret bit with value 1, he takes a word that begins with the two letters ( ) followed by a Solar letter and substitutes the independent letter ( ) with the same letter from the other Unicode range. To hide a secret bit with value 0, he searches for a word that begins with the two letters ( ) followed by a Lunar letter and replaces the independent letter ( ) with the same letter under another Unicode code.
In the second type, the researcher hides two bits at a time. He hides the two bits 00 by searching for a word that begins with the two letters ( ) followed by a Lunar letter, replacing the letter ( ) with another Unicode code, and adding the decoration (Fatha ) to the Lunar letter. To hide the two bits 01, he searches for a word that begins with the two letters ( ) followed by a Lunar letter, replaces the letter ( ) with another Unicode code, and adds a decoration (any decoration except the Fatha ( )). To hide the two bits 10, he searches for a word that begins

with the two letters ( ) followed by a Solar letter, replaces the independent letter ( ) with another Unicode code, and adds any decoration (except Fatha ) to the Solar letter. Finally, to hide the two bits 11, he searches for a word that begins with the two letters ( ) followed by a Solar letter, replaces the independent letter ( ) with another Unicode code, and adds the decoration (Fatha ) to the Solar letter [35].
TABLE 8. Solar and lunar letters.
7.5. The combination of extension and pseudo-space is used to hide the data, as these two methods offer the largest hiding capacity. In this method, each letter is checked to see whether it accepts an extension. If the letter accepts extension, one Kashida is added to hide a bit with value 0 and two Kashida are added to hide a bit with value 1. If the letter does not accept extension, the pseudo-space is added once for a bit with value 0 and twice for a bit with value 1. The spaces between words are also exploited by adding the pseudo-space once to hide a bit with value 0 and twice to hide a bit with value 1 [36].
7.6. There are three types of small spaces (Thin Space (TS), Hair Space (HS), and Six-Per-Em Space (SS)), as shown in table (9):
TABLE 9. The space types and their Unicodes.
The secret bits are hidden in this study by combining the extension method, applied to the letters that permit extension, with a second method that hides three bits inside the space between words by adding one of the three types of spaces. When the letter accepts extension, one Kashida is added to hide a bit with a value of 1, and nothing is done to the letter to hide a bit with a value of 0. In the space between words, three bits are inserted and hidden together using the three types of spaces [37], as shown in table (10):
TABLE 10. Three types of space.

III. MATERIALS
A. BAUDOT CODE
The Baudot Code, or International Teleprinter Code, was invented by Emile Baudot in 1870. It is a binary code, which he used in place of the dots and dashes of Morse code. Each character is encoded in five bits, and since each bit has two possible values, 2^5 = 32 characters can be represented. Table (12) shows how Baudot notation can represent 60 characters by dividing them into two groups of 30 symbols each. The text to be hidden is thus compressed from eight to five bits per character, so the stored secret text is smaller than the original secret text by 1 - 5/8 = 37.5%, or approximately 40% [38], [39].

B. LETTER FREQUENCY
Letter frequency is simply the number of times a letter of the alphabet appears on average in a written language. Letter frequency analysis goes back to the Arab mathematician Al-Kindi, who formally developed the method to break ciphers. It gained importance in Europe with the development of movable type, where one must estimate how much type is required for each letter. Letter frequency analysis is a basic method of language identification used by linguists. It is particularly useful in determining whether an unknown writing system is alphabetical, syllabic, or ideographic.
The use of letter frequencies and frequency analysis plays a fundamental role in coding and in many puzzle games, including Hangman, Scrabble, and the TV game show Wheel of Fortune. One of the earliest descriptions in classical literature of applying knowledge of English letter frequency to solving a cipher is found in Edgar Allan Poe's famous story The Gold-Bug, in which the method was successfully applied to decipher a message giving the whereabouts of a treasure hidden by Captain Kidd [9].
The repetition of characters in text has been studied for use in cryptanalysis, and in frequency analysis in particular, since the method was formally developed. (Ciphers breakable by this technique date back at least to Julius Caesar's Caesar cipher, suggesting the method may have been explored in classical times.)
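The letter-frequency idea described above can be sketched in a few lines. This is a minimal illustration and not the procedure used to produce the figures in this paper. The function names `letter_frequencies` and `top_k_share` are hypothetical, and treating every character for which `str.isalpha()` is true as a letter is an assumption: it skips Harakat, digits, and spaces, but would count a Kashida if the text contained one.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, relative frequency) pairs, most frequent first."""
    letters = [ch for ch in text if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    # Sort by descending count, then by letter so ties are ordered deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(ch, n / total) for ch, n in ranked]


def top_k_share(freqs, k):
    """Share of all letter occurrences covered by the k most frequent letters."""
    return sum(f for _, f in freqs[:k])


sample = "\u0628\u0633\u0645 \u0627\u0644\u0644\u0647 \u0627\u0644\u0631\u062D\u0645\u0646 \u0627\u0644\u0631\u062D\u064A\u0645"
freqs = letter_frequencies(sample)
for letter, share in freqs[:3]:
    print(letter, round(share, 3))
print(round(top_k_share(freqs, 3), 3))
```

On a large corpus, `top_k_share(freqs, 8)` and `top_k_share(freqs, 12)` give the kind of cumulative shares that are quoted for the most frequent characters of a language.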

TABLE 11. An example of hiding bits in an Arabic sentence, where red represents a bit with value 1 and blue represents a bit with value 0.
Because all writers write slightly differently, no exact letter frequency distribution underlies a particular language. Most languages, however, have a characteristic distribution that becomes apparent in longer texts. Even a change as dramatic as the shift from Old English to Modern English (which are mutually incomprehensible) shows strong trends in the frequencies of related letters, from most common to least frequent, across a small sample of written text.
Also, keep in mind that the frequency of a letter varies depending on the dialect. For example, a writer in the United States might write something in which the letter "z" is more common than it would be for a writer in the United Kingdom writing on the same topic. Some words in American English are spelled differently than in British English, such as "analyse" in the United Kingdom and "analyze" in the United States. This greatly affects the frequency of the letter "z", because British writers of English rarely use it.
The "first twelve" characters make up about 80% of the total usage, and the "first eight" characters make up about 65% of the total usage. Letter frequency as a function of rank can be fitted by many rank functions, with the Cocho/Beta rank function being the best. Another rank function without an adjustable free parameter also fits the letter frequency distribution reasonably well, as shown in Fig (1).
FIGURE 1. Arabic letter frequency.

C. LONG SHORT-TERM MEMORY (LSTM)
LSTM is an artificial recurrent neural network (RNN) architecture that is used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections [40]. A common LSTM unit comprises a cell, an input gate, an output gate, and a forget gate [41]. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell, as shown in Fig (2).
FIGURE 2. LSTM structure.
Where:
f_t = Forget gate activation vector
i_t = Input/update gate activation vector
o_t = Output gate activation vector
+ = Summation
× = Hadamard product

$h_t$ = Hidden state vector, also known as the output vector of the LSTM unit
$\tilde{C}_t$ = Cell input activation vector
$h_{t-1}$ = Previous output
$C_t$ = Cell state
$C_{t-1}$ = Previous cell state
$x_t$ = Input vector to the LSTM unit
$\tanh$ = Hyperbolic tangent function, given by Equation (1)

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{1}$$

$\sigma$ = Sigmoid function, given by Equation (2)

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{2}$$

$b$ = Bias vector parameters, which are learned during training

TABLE 12. Baudot codes.

D. LSTM GATES STEP BY STEP
1. The first step is the forget gate layer, a sigmoid layer that decides whether to discard or keep information from the cell state, using the previous output ($h_{t-1}$) and the new input ($x_t$). Its result lies in the interval (0, 1) for each number in the cell state ($C_{t-1}$). A result of 1 keeps that part of the cell state; a result of 0 makes the forget gate exclude it, as shown in Equation (3).

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{3}$$

2. The second step updates the old cell state $C_{t-1}$ to the new cell state $C_t$. The forget gate has already decided how much of $C_{t-1}$ to keep. The old state is multiplied by $f_t$, and the product of the input gate ($i_t$) and the candidate values ($\tilde{C}_t$) is added to it. The values of $i_t$ and $\tilde{C}_t$ must be computed first, as shown in Equations (4) and (5).

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{4}$$

$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \tag{5}$$

The new cell state can now be calculated as shown in Equation (6).

$$C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t \tag{6}$$

3. The final step determines the output, which is a filtered version of the cell state. A sigmoid layer decides which parts of the cell state to ignore and which parts to output. The cell state is passed through tanh, which maps its values into the interval (−1, +1), and is then multiplied by the output of the sigmoid gate, so that only the required parts are output, as shown in Equations (7) and (8).

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{7}$$

$$h_t = o_t \times \tanh(C_t) \tag{8}$$

To build the LSTM model, we use the Sequential model, an Embedding layer, LSTM layers, and a Dense layer.

E. EMBEDDING LAYER
The embedding layer is the first hidden layer of the network. It takes three arguments: the input dimension (the vocabulary size), the output dimension (the size of the embedding vectors), and the input length (the length of the input sequences).

LSTM Layer: First, we provide the number of nodes in the hidden layers within the LSTM cell. We use 128 hidden layer units, as shown in Fig. (3).

FIGURE 3. Multi-layer LSTM.

F. SoftMax FUNCTION
The SoftMax function converts any real-valued vector to a probability distribution. It converts the values of the vector v to probabilities, regardless of whether the elements are positive, negative,
or zero. If an input element is small or negative, the function turns it into a small probability [42], and if an element is large, the function turns it into a high probability. No matter how large or small the inputs are, every element of the resulting probability vector lies between zero and one, as shown in Equation (9).

$$\tilde{Z}_i = \frac{e^{Z_i}}{\sum_{j=1}^{k} e^{Z_j}} \tag{9}$$

$\tilde{Z}$ = The output vector of the softmax
$Z_i$ = The elements of the input vector to the softmax function; they can take any real value, positive, zero, or negative.

The softmax function is used in neural networks because many of them end in a penultimate layer that produces results with large values, which are difficult to scale or work with. The softmax function converts these real values into a probability distribution [43].

G. CROSS-ENTROPY LOSS FUNCTION
Rubinstein was the first to apply the cross-entropy (CE) approach, as an adaptive importance sampling procedure for estimating the probabilities of rare events [44]. Many optimization problems can be translated into rare-event estimation problems, so adaptive sampling methods such as the CE method can also be used as randomized optimization methods. The cross-entropy loss is given in Equation (10).

$$\text{loss} = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right] \tag{10}$$

$m$ = Number of data points
$y_i$ = The ground-truth value, which is 0 or 1
$\hat{y}_i$ = The SoftMax probability for the $i$th data point

FIGURE 4. The first phase.

H. PROPOSED ALGORITHM
Our work is divided into three phases. The phases are separate, but each one depends completely on the output of the one before it. In the first phase, the text file is prepared: the line breaks are removed, and the text is converted into a list so that all special characters (!"#$%&'()*+,-./:<=>?@[\]^_`{|}~) can be deleted. The text is divided into tokens, which are then arranged into sequences of 32 words, as in Figure (4).

In the second phase, a vocabulary is created that contains each distinct word of the text once. Because the neural network deals only with numbers, the unique words are converted into integers: one dictionary maps integers to unique words and another maps words to integers. The sequences extracted in the first phase are converted to an integer array and split into subsequences with a fixed length of 100 words. Each training pattern of the RNN consists of 100 time steps of one word each (X), followed by one output word (y). This window is slid along the sequences from the first phase one word at a time, so that each word can be learned from the 100 words that precede it. To make predictions, the Keras LSTM model starts from a seed sequence of words as input, generates the next word, appends the generated word to the end of the seed sequence, and trims off the first word. This process is repeated for as long as new words are wanted, for example for a sequence of 1000 words. The LSTM network has 2 layers with 128 nodes per layer, followed by a 128-node dense layer. The outputs are then normalized by the SoftMax function. Training uses a batch size of 16 and 20 epochs. The seed words are entered to generate the new words, as shown in Figure (5).

The third phase, one of the most important phases of the system, generates the cover text that carries the hidden data. Two groups of letters carrying the hidden bits are defined: the first set (e, r, o, n, l, u) represents a bit with value 1, and the second set (a, i, t, s, c, d) represents a bit with value 0. After the secret text is entered, it is compressed and encoded with a 5-bit Baudot code to reduce its size by 45%. Each generated word is then tested letter by letter. A letter is used if it belongs to the set representing 1 and the bit to be hidden has value 1, or if it belongs to the set representing 0 and the bit to be hidden has value 0. A letter that belongs to neither group is considered neutral and is used without comparison with the bit to be hidden. If a tested letter and the bit to be hidden do not match, the word is deleted and a new word is generated. This process continues until the whole secret text is hidden inside the cover text [45], as shown in Figure (6).
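The letter test of the third phase can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two letter sets are the ones quoted above, while the function name `try_embed`, the representation of the secret as a string of "0"/"1" characters, and the choice to stop checking once every bit has been consumed are assumptions made for the example.

```python
# Letter sets quoted from the text; the rest of this sketch is an assumption.
ONE_LETTERS = set("eronlu")   # letters that carry a bit with value 1
ZERO_LETTERS = set("aitscd")  # letters that carry a bit with value 0


def try_embed(word, bits, pos):
    """Test one generated word against the secret bits, starting at index pos.

    Returns the new bit position when every carrier letter matches,
    or None when the word has to be deleted and regenerated.
    """
    for letter in word.lower():
        if pos >= len(bits):
            break  # assumption: nothing left to hide, remaining letters are free
        if letter in ONE_LETTERS:
            carried = "1"
        elif letter in ZERO_LETTERS:
            carried = "0"
        else:
            continue  # neutral letter: used without comparison
        if carried != bits[pos]:
            return None  # mismatch: the whole word is rejected
        pos += 1
    return pos


print(try_embed("ear", "101", 0))  # 3: e carries 1, a carries 0, r carries 1
print(try_embed("end", "101", 0))  # None: n carries 1 but the second bit is 0
print(try_embed("why", "101", 0))  # 0: only neutral letters, no bit consumed
```

Returning the position instead of mutating shared state makes the "return the old bit position and regenerate" step trivial: the caller simply keeps its previous position when the result is None.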
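Going back to the second phase, the vocabulary dictionaries, the sliding training window, and the seed update can be illustrated in plain Python. This is a hedged sketch only: the helper names (`build_vocab`, `make_training_pairs`, `next_seed`) are hypothetical, the window is shortened from 100 words to 3 so the result is easy to inspect, and no Keras model is involved.

```python
def build_vocab(tokens):
    """Create the word-to-integer and integer-to-word dictionaries."""
    unique_words = sorted(set(tokens))
    word_to_int = {word: index for index, word in enumerate(unique_words)}
    int_to_word = {index: word for word, index in word_to_int.items()}
    return word_to_int, int_to_word


def make_training_pairs(token_ids, window=100):
    """Slide a fixed-length window over the ids one word at a time."""
    inputs, targets = [], []
    for start in range(len(token_ids) - window):
        inputs.append(token_ids[start:start + window])  # X: `window` consecutive words
        targets.append(token_ids[start + window])       # y: the word that follows them
    return inputs, targets


def next_seed(seed_ids, generated_id):
    """Append the generated word and trim the first one, keeping the length fixed."""
    return seed_ids[1:] + [generated_id]


tokens = "a b c a b d".split()
word_to_int, int_to_word = build_vocab(tokens)
ids = [word_to_int[word] for word in tokens]       # [0, 1, 2, 0, 1, 3]
X, y = make_training_pairs(ids, window=3)
print(X)                                           # [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
print(y)                                           # [0, 1, 3]
print(next_seed([0, 1, 2], 0))                     # [1, 2, 0]
```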

FIGURE 5. The second phase.

FIGURE 6. The third phase.

TABLE 13. Compressing the secret message to Baudot code.

Arabic Encoding Algorithm (AEA): What distinguishes Arabic poetry is that it adheres to meter and rhyme in all its styles and across its generations. Poetry was a means of communication among Arabs in the pre-Islamic era, and a tribe used to celebrate when one of its sons was a talented poet. Poetry was used among Arabs to raise the status of one tribe and degrade another. In the early days of Islam, poetry was one of the means of defending the message of Islam against the polytheists of Quraysh. During the Umayyad and Abbasid eras, poetry was also a means for conflicting political and intellectual groups to communicate their opinions and defend their principles against their opponents.

Thus, Arabic poetry had a prominent role in literary, intellectual, and political life. Poetry developed along with the Arab and Islamic peoples and with their relations with other peoples. New forms emerged, such as descriptive poetry, political poetry, mystical poetry, social and national poetry, and modern contemporary poetry, which differ in substance, style, and language, as well as in meter, rhyme, and other factors.

All these features made Arabic poetry widely circulated among people, so it is used here to hide the secret text. The Arabic letters were divided into two groups based on their frequency in the text of the Qur’an. Each group has 9 letters, and the groups have equal frequencies. The first group carries a bit with value 0, and the second group carries a bit with value 1. The remaining 10 letters are considered neutral, which completes the 28 letters of the Arabic language.

Generated words are tested one by one. A word with fewer than four letters is added to the cover text without being compared with the bits of the secret text. When the generated word contains more than three letters, its letters are compared sequentially with the sequence of bits to be hidden. If they match, the word is added to the cover text; if they do not match, it is excluded and a new word is generated. This process continues until all the secret bits have been hidden, as shown in Algorithm (1).
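The word-acceptance loop described above can be sketched as follows. Several things here are assumptions made for illustration: the text does not list the two 9-letter groups, so the sketch splits the alphabet in plain alphabetical order as a placeholder; a fixed list of candidate words stands in for the LSTM generator; and the names `match_word` and `build_cover` are hypothetical.

```python
from typing import Iterable, List, Optional

# PLACEHOLDER groups in alphabetical order. ASSUMPTION: the real groups are
# frequency-based and are not listed in the text.
GROUP_ONE = set("ابتثجحخدذ")   # 9 letters standing for a bit with value 1
GROUP_ZERO = set("رزسشصضطظع")  # 9 letters standing for a bit with value 0
# The other 10 letters (غ ف ق ك ل م ن ه و ي) are treated as neutral.


def match_word(word: str, bits: str, pos: int) -> Optional[int]:
    """Return the bit position after this word, or None if the word is rejected."""
    if len(word) < 4:
        return pos  # short words are added without comparison
    for letter in word:
        if pos >= len(bits):
            break  # every bit is already hidden
        if letter in GROUP_ONE:
            carried = "1"
        elif letter in GROUP_ZERO:
            carried = "0"
        else:
            continue  # neutral letter
        if carried != bits[pos]:
            return None
        pos += 1
    return pos


def build_cover(candidates: Iterable[str], bits: str) -> List[str]:
    """Take candidate words in order until all secret bits are hidden."""
    cover: List[str] = []
    pos = 0
    for word in candidates:
        if pos >= len(bits):
            break
        new_pos = match_word(word, bits, pos)
        if new_pos is None:
            continue  # word excluded; the next candidate acts as the regenerated word
        cover.append(word)
        pos = new_pos
    if pos < len(bits):
        raise ValueError("ran out of candidate words before all bits were hidden")
    return cover


# "من" is short and passes; "كتاب" mismatches on its third letter; "مدرسة" carries 1 then 0.
print(build_cover(["من", "كتاب", "مدرسة"], "10"))
```

With the placeholder groups, the call keeps the first and third words and discards the second. With the frequency-based groups used by the authors the accepted words would differ.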

Algorithm 1 Arabic Encoding Algorithm
  Input text file as (TX)
  Input secret message as (SM)
  Extract rhyme letters from text as (RH)
  Compress (SM) using Baudot code as (BL)
  Split (TX) into words (tokens) as (TO)
  Create word vocabulary
  Create word-to-integer dictionary
  Create integer-to-word dictionary
  Convert (TO) to integers based on word repetition
  Split (TX) into batches
  Create matrix [words length : words length] based on neighboring words
  Update (TX) by shifting word after word by one position
  Prepare data for training
  Define LSTM nodes = 128
  Extract the weights using the Embedding method
  Extract probabilities in range (0-1) using SoftMax
  Find the loss based on the Cross-Entropy method
  Define an optimizer to start training
  Input seed words
  Input group-1 as G1 and group-0 as G0
  Generate words W in range BL
  If the new word has more than 3 characters
    For each character in the word
      If (bit in BL = 1 and character in G1) OR (bit in BL = 0 and character in G0)
        BL = BL − 1
        new_word = new_word + character
      Else if (bit in BL = 1 and character in G0) OR (bit in BL = 0 and character in G1)
        Return the old BL
        Delete the current word
        Generate new word W
    new_word_line = new_word_line + new_word
  Else
    If the last letters in W = RH
      new_word_line = new_word_line + W
    Else
      Delete the current word W
      Generate new word W
  End

TABLE 14. Hiding the secret bits in the cover text.

I. EXPERIMENTAL RESULTS
1- The programming language used to implement the algorithm: Python 3.9 (64-bit)
2- The applied environment: PyCharm 2018.1.6 Community Edition
3- The library packages: collections-Counter, numpy, tensorflow, tkinter
4- Using LSTM with 128-node layers (SoftMax, Cross-Entropy loss, text with 800,000 vocabulary words)

Secret message = Al Razi University
Words = 3
Letters without spaces = 16 letters
Letters with spaces = 18 letters
In 8-bit binary code = 144 bits
Using Baudot code = 90 bits, as in Table (13)

$$\text{Compression percentage} = \frac{144 - 90}{144} \times 100 = 37.5\,\%$$

Data set
• Al Mutanabbi’s poetry: Words (21,498), Characters (no spaces) (91,584), Characters (with spaces) (108,659)
• Nizar Qabbani’s poetry: Words (83,1750), Characters (no spaces) (364,688), Characters (with spaces) (454,832)
• Al-Akhtal’s poetry: Words (20,929), Characters (no spaces) (91,844), Characters (with spaces) (108,491)
• Generating Arabic poetry steganography:
  Number of lines = 12
  Number of words = 59
  Characters with spaces = 301
  Characters without spaces = 254
  Total hidden bits = 90 bits, as in Table (14)

$$\text{Total capacity used} = \frac{254 - 90}{254} \times 100 = 64.5\,\%$$

TABLE 15. Arabic text steganography algorithms: advantages and disadvantages.

TABLE 16. Comparison between Arabic text steganography algorithms.
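The two percentages above can be reproduced with a few lines of arithmetic. The sketch assumes one fixed-width code per character, spaces included, and no extra Baudot shift characters, which is what the reported figures of 144 and 90 bits imply. The helper names are hypothetical.

```python
def fixed_width_bits(text: str, bits_per_char: int) -> int:
    """Size of a message when every character, spaces included, uses the same width."""
    return len(text) * bits_per_char


def percent_saved(original: int, reduced: int) -> float:
    """Relative difference between two sizes, in percent of the original."""
    return (original - reduced) / original * 100


message = "Al Razi University"                 # 18 characters with spaces
bits_8 = fixed_width_bits(message, 8)          # 144 bits in 8-bit binary code
bits_5 = fixed_width_bits(message, 5)          # 90 bits in 5-bit Baudot code

print(bits_8, bits_5)                          # 144 90
print(percent_saved(bits_8, bits_5))           # 37.5
print(round(percent_saved(254, 90), 2))        # 64.57
```

Because both codes are fixed-width, the first ratio is (8 − 5)/8 = 37.5% for a message of any length under these assumptions. The second line applies the same formula to the 254 cover characters and the 90 hidden bits.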
IV. CONCLUSION
In this study, a new approach is proposed for hiding confidential information inside a cover text. Artificial intelligence and deep learning, specifically LSTM, are used to generate sentences and texts that carry the confidential information. Words are generated such that each word can carry more than one secret bit. Unlike previous methods, hiding is based on letters rather than words, which increases the capacity of the cover text to carry hidden bits and makes it possible to generate numerous texts at once and choose the one best suited to the occasion for transmission. As a result, the desired goal in text steganography is achieved. Future research will look for new ways to increase the embedding rate while also increasing security, and will investigate smart algorithms that generate words more efficiently than the current algorithms in order to imitate human-written text.

REFERENCES
[1] P. Dobriyal, J. Yadav, and J. Jain, “A review on text based steganography,” Int. J. Res. Publication’s, vol. 4, no. 3, pp. 44–50, Jan. 2015.
[2] S. Mersal, S. Alhazmi, R. Alamoudi, and N. Almuzaini, “Arabic text steganography in smartphone,” Int. J. Comput. Inf. Technol., vol. 3, no. 2, pp. 764–2279, Mar. 2014.
[3] M. Hanaa Ahmed and M. A. A. khodher, “Arabic language document steganography based on Huffman code using DRLR as RNG,” Al-Mansour J., vol. 2016, p. 57, 2016. [Online]. Available: https://www.iasj.net/iasj/download/2eeb7e70324e480c, doi: 10.36541/0231-000-026-007.
[4] M. Y. Valandar, P. Ayubi, M. J. Barani, and B. Y. Irani, “A chaotic video steganography technique for carrying different types of secret messages,” J. Inf. Secur. Appl., vol. 66, May 2022, Art. no. 103160, doi: 10.1016/j.jisa.2022.103160.
[5] E. Farri and P. Ayubi, “A robust digital video watermarking based on CT-SVD domain and chaotic DNA sequences for copyright protection,” J. Ambient Intell. Hum. Comput., vol. 2022, pp. 1–25, Feb. 2022, doi: 10.1007/s12652-022-03771-7.
[6] M. S. Kadhem and D. Wameedh, “Proposed Arabic text steganography method based on new coding technique,” J. Eng. Res. Appl., vol. 6, no. 9, pp. 38–46, Sep. 2016.
[7] R. A. Khekan, H. M. W. Majeed, and F. O. A. Adeeb, “New text steganography method using the Arabic letters dots,” Indonesian J. Electr. Eng. Comput. Sci., vol. 4752, vol. 21, no. 3, pp. 1784–1793, 2021, doi: 10.11591/ijeecs.v21.i3.pp1784-1793.
[8] E. A. Kadhim, H. B. AbdulWahab, and S. M. Kadhem, “Proposed approach for steganography in Arabic text based on DNA coding and Arabic diacritics,” Int. J. Adv. Res., vol. 2, no. 12, pp. 954–965, 2014.
[9] S. F. Lu, O. Farooq, and H. Ali, “New steganography method using litter manipulations frequency,” in Proc. 2nd Int. Conf. Inf. Technol. Ind. Automat., 2017, pp. 1–6.
[10] A. Odeh, A. Alzubi, Q. B. Hani, and K. Elleithy, “Steganography by multipoint Arabic letters,” in Proc. IEEE Long Island Syst., Appl. Technol. Conf. (LISAT), May 2012, pp. 1–7, doi: 10.1109/LISAT.2012.6223209.
[11] M. H. Shirali-Shahreza and M. Shirali-Shahreza, “A new approach to Persian/Arabic text steganography,” in Proc. 5th IEEE/ACIS Int. Conf. Comput. Inf. Sci. 1st IEEE/ACIS Int. Workshop Component-Based Softw. Engineering, Software Archit. Reuse (ICIS-COMSAR), Jul. 2006, pp. 310–315, doi: 10.1109/ICIS-COMSAR.2006.10.
[12] R. Thabit, N. I. Udzir, S. Yasin, A. Asmawi, and N. A. Roslan, “A comparative analysis of Arabic text steganography,” Appl. Sci., vol. 11, p. 6851, Jul. 2021, doi: 10.3390/app11156851.
[13] A. A. Gutub and M. M. Fattani, “A novel Arabic text steganography method using letter points and extensions,” Int. J. Comput. Inf. Eng., vol. 1, no. 3, pp. 28–31, 2007.
[14] A. F. Al-Azawi and M. A. Fadhil, “Arabic text steganography using Kashida extensions with Huffman code,” J. Appl. Sci., vol. 10, no. 5, pp. 436–439, Feb. 2010, doi: 10.3923/jas.2010.436.439.
[15] A. A. A. Gutub and W. Al-Alwani, “Improved method of Arabic text steganography using the extension ‘Kashida’ character,” Bahria Univ. J. Inf. Commun. Technol., vol. 3, no. 1, pp. 68–72, 2010.
[16] F. Al-Haidari, A. Gutub, K. Al-Kahsah, and J. Hamodi, “Improving security and capacity for Arabic text steganography using ‘Kashida’ extensions,” in Proc. IEEE/ACS Int. Conf. Comput. Syst. Appl. (AICCSA), May 2009, pp. 396–399, doi: 10.1109/AICCSA.2009.5069355.
[17] A. A.-A. Gutub and A. A. Al-Nazer, “High-capacity steganography tool for Arabic text using Kashida,” Int. J. Inf. Secur., vol. 2, no. 2, pp. 107–118, Jul. 2010.
[18] A. Odeh, K. Elleithy, and M. Faezipour, “Steganography in Arabic text using Kashida variation algorithm (KVA),” in Proc. IEEE Long Island Syst., Appl. Technol. Conf. (LISAT), May 2013, pp. 1–6, doi: 10.1109/LISAT.2013.6578239.
[19] H. M. Ahmed, “Arabic language script steganography based on dynamic random linear regression,” Mustansiriyah J. Sci. Education, vol. 17, no. 1, pp. 397–414, 2016.
[20] A. A. Gutub, L. M. Ghouti, Y. S. Elarian, S. M. Awaideh, and A. K. Alvi, “Utilizing diacritic marks for Arabic text steganography,” Kuwait J. Sci. Eng., vol. 37, no. 1, pp. 89–109, Jun. 2010.
[21] M. S. Memon and D. A. Shah, “A novel text steganography technique to Arabic language using reverse Fat5Th5Ta,” Pakistan J. Eng., Technol. Sci., vol. 1, no. 2, pp. 106–113, Sep. 2015, doi: 10.22555/pjets.v1i2.167.
[22] N. Alanazi, E. Khan, and A. Gutub, “Inclusion of unicode standard seamless characters to expand Arabic text steganography for secure individual uses,” J. King Saud Univ. Comput. Inf. Sci., vol. 34, no. 4, pp. 1343–1356, Apr. 2022, doi: 10.1016/j.jksuci.2020.04.011.
[23] A. S. Sabir, “A new Arabic text diacritics, non diacritics steganography,” Basrah J. Sci., vol. 31, no. 3, pp. 85–96, 2013.
[24] N. Alanazi, E. Khan, and A. Gutub, “Efficient security and capacity techniques for Arabic text steganography via engaging unicode standard encoding,” Multimedia Tools Appl., vol. 80, no. 1, pp. 1403–1431, Jan. 2021, doi: 10.1007/s11042-020-09667-y.
[25] M. H. Shirali-Shahreza and M. Shirali-Shahreza, “Arabic/Persian text steganography utilizing similar letters with different codes,” Arabic J. Sci. Eng., vol. 35, no. 1, pp. 213–222, Apr. 2010.
[26] N. A. Roslan, R. Mahmod, and N. I. Udzir, “Sharp-edges method in Arabic text steganography,” J. Theor. Appl. Inf. Technol., vol. 33, no. 1, pp. 32–41, 15, Nov. 2011.
[27] M. Shirali-Sh and S. Shirali-Sh, “High capacity Persian/Arabic text steganography,” J. Appl. Sci., vol. 8, no. 22, pp. 4173–4179, Nov. 2008, doi: 10.3923/jas.2008.4173.4179.
[28] A. F. Al Azzawi, “A multi-layer Arabic text steganographic method based on letter shaping,” Int. J. Netw. Secur. Appl., vol. 11, no. 1, pp. 27–40, Jan. 2019, doi: 10.5121/ijnsa.2019.11103.
[29] A. Ditta, C. Yongquan, M. Azeem, K. G. Rana, H. Yu, and M. Q. Memon, “Information hiding: Arabic text steganography by using Unicode characters to hide secret data,” Int. J. Electron. Secur. Digit. Forensics, vol. 10, no. 1, pp. 61–78, 2018, doi: 10.1504/IJESDF.2018.089214.
[30] R. Din, R. A. Thabit, N. I. Udzir, and S. Utama, “Traid-bit embedding process on Arabic text steganography method,” Bull. Electr. Eng. Informat., vol. 10, no. 1, pp. 493–500, Feb. 2021, doi: 10.11591/eei.v10i1.2518.
[31] E. M. Ahmadoh and A. A.-A. Gutub, “Utilization of two diacritics for Arabic text steganography to enhance performance,” Lect. Notes Inf. Theory, vol. 3, no. 1, pp. 1–6, 2015, doi: 10.18178/lnit.3.1.42-47.
[32] H. M. S. Alshahrani and G. Weir, “Hybrid Arabic text steganography,” Int. J. Comput. Inf. Technol., vol. 6, no. 6, pp. 329–338, 2017.
[33] R. A. Alotaibi and L. A. Elrefaei, “Utilizing word space with pointed and un-pointed letters for Arabic text watermarking,” in Proc. UKSim-AMSS 18th Int. Conf. Comput. Model. Simul. (UKSim), Apr. 2016, pp. 111–116, doi: 10.1109/UKSim.2016.34.
[34] S. Malalla and F. R. Shareef, “A novel approach for Arabic text steganography based on the ‘BloodGroup’ text hiding method,” Eng., Technol. Appl. Sci. Res., vol. 7, no. 2, pp. 1482–1485, Apr. 2017, doi: 10.48084/etasr.1090.
[35] H. K. Tayyeh, M. S. Mahdi, and A. S. A. AL-Jumaili, “Novel steganography scheme using Arabic text features in Holy Quran,” Int. J. Electr. Comput. Eng., vol. 9, no. 3, pp. 1910–1918, 2019, doi: 10.11591/ijece.v9i3.pp1910-1918.
[36] S. M. A. Al-Nofaie and A. A.-A. Gutub, “Utilizing pseudo-spaces to improve Arabic text steganography for multimedia data communications,” Multimedia Tools Appl., vol. 79, nos. 1–2, pp. 19–67, Jan. 2020, doi: 10.1007/s11042-019-08025-x.
[37] A. Taha, A. S. Hammad, and M. M. Selim, “A high capacity algorithm for information hiding in Arabic text,” J. King Saud Univ. Comput. Inf. Sci., vol. 32, no. 6, pp. 658–665, Jul. 2020, doi: 10.1016/j.jksuci.2018.07.007.

[38] A. Mahmood, T. Latif, and K. M. A. Hasan, “An efficient 6 bit encoding scheme for printable characters by table look up,” in Proc. Int. Conf. Electr., Comput. Commun. Eng. (ECCE), Feb. 2017, pp. 468–472.
[39] M. Malhotra, D. Scientf Aalysi Group, N. G. Gupta, and R. S. Prasad, “Software-based solution for analysis and decoding of FSK-2 modulated, baudot-coded signals,” Defence Sci. J., vol. 56, no. 2, pp. 259–268, Apr. 2006.
[40] T. Fang, M. Jaggi, and K. Argyraki, “Generating steganographic text with LSTMs,” in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics Student Res. Workshop, Vancouver, BC, Canada, Jul. 2017, pp. 100–106, doi: 10.18653/v1/P17-3017.
[41] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997, doi: 10.1162/neco.1997.9.8.1735.
[42] F. Wang, J. Cheng, W. Liu, and H. Liu, “SVD SoftMax: Fast softmax approximation on large vocabulary neural networks,” in Proc. 31st Conf. Neural Inf. Process. Syst. (NIPS), Long Beach, CA, USA, Dec. 2018, pp. 5469–5479, doi: 10.1109/LSP.2018.2822810.
[43] S. Gold and A. Rangarajan, “Softmax to softassign: Neural network algorithms for combinatorial optimization,” J. Artif. Neural, vol. 2, no. 4, pp. 381–399, 1995.
[44] I. Z. Botev, P. D. Kroese, Y. R. Rubinstein, and P. L’Ecuyer, “The cross-entropy method for optimization,” in Handbook of Statistics, vol. 31. Amsterdam, The Netherlands: Elsevier, 2013.
[45] E. A. Khan, “Using Arabic poetry system for steganography,” Asian J. Comput. Sci. Inf. Technol., vol. 4, no. 6, pp. 55–61, 2014, doi: 10.15520/ajcsit.v.

OMER FAROOQ AHMED ADEEB was born in Baghdad, Iraq, in 1973. He received the B.S. degree in computer science from Al-Mustansiriyah University, Baghdad, in 1999, and the M.Sc. degree in computer engineering from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2017. He is currently pursuing the Ph.D. degree with Razi University, Kermanshah, Iran. His research interests include computer security.

SEYED JAHANSHAH KABUDIAN received the B.S. and M.S. degrees in computer engineering and the Ph.D. degree in computer engineering–artificial intelligence and robotics from the Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor with Razi University, Kermanshah, Iran. His research interests include artificial intelligence, digital signal processing, sound/audio/music signal processing, speech processing, pattern recognition, machine learning, data mining, neural networks, deep learning, global optimization, meta-heuristic algorithms, evolutionary computation, swarm intelligence, text/natural language processing, biometrics, biomedical data/signal processing, and social networks.

