Evaluative Approach towards Text Steganographic Techniques

: Steganography is a technique for hiding data in sensitive


Introduction
Good data hiding conceals data undetectably inside other electronic media, or "covers", such as text, image, audio, or video. Hiding data inside a cover is called Steganography, and Steganography that uses text as the cover is called Text Steganography 1 . Text is preferred over other media because it occupies less space, communicates more information, and costs less to print, among other advantages 2 . Every steganographic communication system consists of an Embedding algorithm and an Extraction algorithm. The Embedding algorithm embeds the secret message in the cover text 3 . Hiding information may require a steganographic key (stego-key), which is additional secret information, such as a password, needed to embed the information 4 . The Embedding algorithm produces a stego text that can be stored or transferred through communication channels. The Extraction algorithm receives the stego text and the (optional) stego-key and extracts the secret message, as shown in Figure 1 5 . Text Steganography embeds secret data in text files through the techniques described below:

Format based Method
The format based method modifies the existing text in order to hide the secret message, for example by inserting spaces, resizing the text, or changing the style of the text 6 .

Random and Statistical Method
The Random Method hides the secret characters so that they appear in a random sequence. Statistical methods compute statistics such as the mean, the variance, and the chi-square test, which measure the amount of redundant information in the text that is available for hiding data 6 .

Linguistic Method
The Linguistic Method is a combination of Syntactic and Semantic methods. Linguistic Steganography considers the linguistic properties of generated and modified text and uses linguistic structure as the space in which messages are hidden. Syntactic steganalysis checks that structures are syntactically correct: because the text is generated from a grammar, it is guaranteed to be syntactically correct unless the grammar itself is flawed. In the Semantic Method, values are assigned to synonyms so that data can be encoded in the actual words of the text 6 .

Criteria for Measuring Goodness of Data Hiding Algorithm
A technique chosen for a particular purpose is measured against the following basic criteria. No algorithm satisfies all of them, so a balance must be struck before choosing one.

Embedding Capacity
Embedding Capacity (also known as payload) is the amount of data that can be hidden in a cover, compared to the size of the cover. It can be measured numerically in bits per bit (bpb), that is, hidden bits per bit of cover. A Steganographic algorithm with a small embedding capacity may have other good features such as Robustness, so it may be the ideal choice when only a small amount of data, such as a short message, has to be hidden 7 .
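As a rough illustration of the bits-per-bit measure, the following minimal sketch divides the number of hidden bits by the number of cover bits. The function name `embedding_capacity_bpb` and the example figures are assumptions made for illustration only; they are not taken from the cited work.

```python
def embedding_capacity_bpb(hidden_bits: int, cover_bits: int) -> float:
    """Return the embedding capacity in hidden bits per cover bit (bpb)."""
    if cover_bits <= 0:
        raise ValueError("the cover must contain at least one bit")
    return hidden_bits / cover_bits


# Hypothetical example: a scheme that hides 1 bit in every 8-bit cover character.
cover_characters = 128
capacity = embedding_capacity_bpb(hidden_bits=cover_characters,
                                  cover_bits=8 * cover_characters)
print(capacity)  # 0.125
```

Under this assumption a scheme that carries one bit per one-byte character cannot exceed 0.125 bpb, however long the cover is.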

Invisibility
Any data hidden in a cover causes it to be modified. Invisibility (also termed perceptual transparency or algorithm quality) is a measure of the amount of distortion (alteration) to the cover. A large embedding capacity is useless if it causes large distortions to the cover 7 .

Undetectability
An attacker may be able to detect the presence of hidden data in a given file by computing certain statistical properties of the file and comparing them to what is expected in that type of file 7 .

Robustness
This is a measure of the ability of the algorithm to retain the data embedded in the cover even after the cover has been subjected to various changes as a result of lossy compression and decompression or of certain types of processing such as conversion to analog and back to digital 7 .

Text Steganographic Techniques
Hiding a secret text inside a cover medium that is itself text is difficult, because a text file has few redundant bits available for hiding.

Text Rotation Techniques in Ms Excel Document
Convert the secret message into binary bits using ASCII-to-binary conversion. Select the Excel document to be used as the cover text. Find each non-empty cell and compute its length. If the length is less than or equal to the specified limit (the limit is the number of letters in a cell) and the secret bit is 1, check whether the cell contains text or a number: rotate a text cell by 1° and a numeric cell by -1°. If the secret bit is 0, leave the cell with no rotation. The formatted Excel document is the stego text 8 .

Embedding Algorithm:
Input: MS Excel document, limit p, secret bits
Output: stego text
Body:
• For each non-empty cell G do:
• Get the selected cell's length n.
• If n < p and the secret bit is 1 then
    • If the type of G is text then rotate the angle of G to 1°
    • Else if the type of G is numeric then rotate the angle of G to -1°
• Output the embedded document.
The Excel document of a student mark sheet is used as the cover text to hide the secret message, as shown in Table 1.
Minor changes in the angle of text in an MS Excel document are hard to detect. When the text in a cell is shorter than 4 characters the rotation is hard to notice, but the stego text becomes more conspicuous as the text in a cell gets longer, which is what happens when the limit is set above 4.
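The embedding steps above can be sketched in Python without touching a real Excel file. In this sketch the worksheet is modelled as a plain list of cell values and the output is a list of rotation angles, one per cell. The function names, the list model, and the extraction routine are assumptions made for illustration (the text describes embedding only). The sketch follows the prose and treats a cell as a carrier when its length is less than or equal to the limit, and it assumes every carrier cell consumes one secret bit.

```python
def message_to_bits(message):
    """Convert an ASCII message to a list of bits, 8 bits per character."""
    return [int(b) for byte in message.encode("ascii") for b in format(byte, "08b")]


def bits_to_message(bits):
    """Convert a list of bits (a multiple of 8 long) back to an ASCII string."""
    groups = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return bytes(int("".join(map(str, g)), 2) for g in groups).decode("ascii")


def is_carrier(value, limit):
    """A cell carries one bit if it is non-empty and no longer than the limit."""
    return value is not None and 0 < len(str(value)) <= limit


def embed_rotation(cells, bits, limit):
    """Return one rotation angle (in degrees) per cell."""
    angles = [0] * len(cells)
    used = 0
    for index, value in enumerate(cells):
        if used == len(bits):
            break
        if not is_carrier(value, limit):
            continue
        if bits[used] == 1:
            numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
            angles[index] = -1 if numeric else 1  # numeric: -1 degree, text: 1 degree
        used += 1  # a 0 bit leaves the carrier cell unrotated
    if used < len(bits):
        raise ValueError("not enough short cells to hide all secret bits")
    return angles


def extract_rotation(cells, angles, limit, bit_count):
    """Assumed inverse: read one bit from each carrier cell."""
    bits = []
    for value, angle in zip(cells, angles):
        if len(bits) == bit_count:
            break
        if is_carrier(value, limit):
            bits.append(1 if angle != 0 else 0)
    return bits


# Hypothetical mark-sheet column; the long name and the empty cell are skipped.
cells = ["Ann", 91, "Bob", 78, "Long student name", "Cy", 66, "", "Dee", 85]
angles = embed_rotation(cells, message_to_bits("A"), limit=4)
print(angles)
print(bits_to_message(extract_rotation(cells, angles, 4, 8)))
```

In a real workbook the angles would be written to and read from each cell's text rotation property; that step is left out here so the sketch stays independent of any spreadsheet library.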

Mixed Case Font Technique in Ms Word Document
First, the secret message is converted into bits stored in an array S. A text file is chosen as the cover text, and each of its letters is placed separately in an array T. If the ith element of S is bit 1, the ith element of T is changed to a capital letter; if the ith element of S is bit 0, the ith element of T is changed to a small letter. This is repeated until the last element of S has been processed 9 .
Embedding Algorithm:
Input file: Text file T, Secret Message M.
Output file: Stego Text S.
• Choose a text file T.
• Get the secret message M.
• Convert the secret message M into a stream of bits b.
• Select Ti from T and bi from b.
• If bi is 'one' then change the case of Ti to capital, else change the case of Ti to small.
• Repeat the previous two steps until the whole of b is hidden.
• The resultant file is the stego text S.
Extraction Algorithm:
• If Ti is a capital letter then put bit 1 at the ith index of S, else if Ti is a small letter then put bit 0 at the ith index of S.
• Convert the secret bits S into ASCII values to get the secret message.
• The resultant message is the secret message.
The same secret text, "Arrive On Friday", is converted to ASCII secret bits. After the secret message is hidden, the cover text shown in Figure 3 is converted to stego text. Figure 4 shows the stego text document.
Hidden information is not destroyed when the stego text is enlarged or reduced. This method can hide a large volume of information compared with other methods, because it uses the letters themselves rather than the spaces between words or paragraphs. However, an attacker who learns the algorithm or technique can easily extract the secret information.
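The mixed case embedding and extraction steps can be sketched as follows. This is a minimal illustration, not the cited implementation: the function names are hypothetical, the cover is assumed to be ASCII text, non-letter characters are assumed to be skipped rather than counted as carriers, and the receiver is assumed to know the message length in characters.

```python
def message_to_bits(message):
    """Convert an ASCII message to a list of bits, 8 bits per character."""
    return [int(b) for byte in message.encode("ascii") for b in format(byte, "08b")]


def embed_mixed_case(cover, message):
    """Set the case of successive cover letters: capital for 1, small for 0."""
    bits = message_to_bits(message)
    stego = []
    used = 0
    for ch in cover:
        if used < len(bits) and ch.isascii() and ch.isalpha():
            stego.append(ch.upper() if bits[used] == 1 else ch.lower())
            used += 1
        else:
            stego.append(ch)  # non-letters and leftover text are left unchanged
    if used < len(bits):
        raise ValueError("cover text has too few letters for the message")
    return "".join(stego)


def extract_mixed_case(stego, char_count):
    """Read one bit per letter (capital = 1, small = 0) and decode as ASCII."""
    bits = []
    for ch in stego:
        if len(bits) == 8 * char_count:
            break
        if ch.isascii() and ch.isalpha():
            bits.append(1 if ch.isupper() else 0)
    groups = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return bytes(int("".join(map(str, g)), 2) for g in groups).decode("ascii")


cover = "the quick brown fox jumps over the lazy dog"
stego = embed_mixed_case(cover, "Hi")
print(stego)                         # tHe qUick bROwN foX jumps over the lazy dog
print(extract_mixed_case(stego, 2))  # Hi
```

Each secret character consumes 8 cover letters, so the 16-character message "Arrive On Friday" would need a cover with at least 128 letters under these assumptions.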

Font Type in MS-Word Document
Before embedding, create a resemblance font array: a table of cover document fonts and their resembling fonts, assuming 15 types of cover document font. For the font used in the cover document, take the resemblance font name for that font. Cover font names and their resemblance font names are shown in Table 3.
Create a code table that contains the coding of each symbol in the secret message. Each symbol is represented by three types of font; thus, 27 characters (the English alphabet with space) can each be hidden in 3 letters of the cover using 3 different fonts. Table 4 shows the code table.

Table 4. Code table.
No.  Symbol  Font code
1    A       1 1 1
2    B       1 1 2
3    C       1 1 3
4    D       1 2 1
5    E       1 2 2
6    F       1 2 3
7    G       1 3 1
8    H       1 3 2
9    I       1 3 3
10   J       2 1 1
11   K       2 1 2
12   L       2 1 3
13   M       2 2 1
14   N       2 2 2
15   O       2 2 3
16   P       2 3 1
17   Q       2 3 2
18   R       2 3 3
19   S       3 1 1
20   T       3 1 2
21   U       3 1 3
22   V       3 2 1
23   W       3 2 2
24   X       3 2 3
25   Y       3 3 1
26   Z       3 3 2
27   space   3 3 3

Secret message = "Arrive On Friday". Cover document's font style = Verdana, size = 12. Any Word document with many capital letters can be chosen as the cover document. Figure 5 shows the cover text.
Figure 5. Cover Text.
After the secret message "Arrive On Friday" is hidden, the cover text is converted to stego text. Figure 6 shows the stego text document.
Figure 6. Stego Text.

Table 5 (excerpt).
• HIGH: The stego text will attract no attention because it will look like the "cool fonts" used in chat rooms and presentations.
• HIGH: The hiding effect is high since resemblance fonts are used to hide the secret message.
• HIGH: Hidden information cannot be destroyed when the stego text is enlarged or reduced.
• HIGH: Because the stego document does not change during compression or copy and paste between computer programs, the data hidden in the text remains intact during these operations.

Six capital letters are needed to hide 2 secret characters. The average increase in the size of the stego document is 0.766%. However, a secret message containing numbers or special symbols cannot be hidden with this method.
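The entries of the code table follow a base-3 counting pattern (for example C = 1 1 3, D = 1 2 1, space = 3 3 3), so the code for a symbol can be computed rather than looked up. The sketch below assumes that pattern holds for every row; the names `SYMBOLS`, `font_code`, `symbol_from_code`, and `encode_message` are hypothetical. It produces only the font indices (1 to 3) for successive capital letters of the cover and does not apply real fonts to a Word document.

```python
SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # 26 letters plus space = 27 = 3 ** 3


def font_code(symbol):
    """Return the three font indices (each 1 to 3) that encode one symbol."""
    index = SYMBOLS.index(symbol.upper())  # raises ValueError for digits etc.
    return (index // 9 + 1, (index // 3) % 3 + 1, index % 3 + 1)


def symbol_from_code(code):
    """Inverse of font_code: map three font indices back to the symbol."""
    first, second, third = code
    return SYMBOLS[(first - 1) * 9 + (second - 1) * 3 + (third - 1)]


def encode_message(message):
    """Font index for each successive capital letter of the cover."""
    return [index for symbol in message for index in font_code(symbol)]


print(font_code("Q"))                         # (2, 3, 2)
print(font_code(" "))                         # (3, 3, 3)
print(len(encode_message("Arrive On Friday")))  # 48
```

The 16-character example message needs 48 capital letters in the cover, which matches the rate of six capital letters per two secret characters. A digit or special symbol raises `ValueError`, mirroring the restriction to letters and spaces.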

Based on the Goodness of the Algorithm
The three techniques (text rotation in an MS Excel document, mixed case font in an MS Word document, and font type in an MS Word document) differ in the criteria as shown in Table 5.
In the Font Type method, 6 capital letters are needed in the cover document to hide two characters of the secret message. As the secret message grows, the cover document must also grow, since only capital letters can be used for hiding. This method is restricted to letters and spaces and cannot hide numbers or other symbols, but if the secret message contains only letters and spaces it is the right choice. In the mixed case font method, one secret character can be hidden in only 8 letters, rather than the 8 words needed by methods that use the spaces between words. Its payload capacity is therefore very high compared with the other methods: it hides a large amount of data while keeping the exact meaning of the text and making it look like the fonts used in chat rooms. Secret messages containing alphanumeric characters can also be hidden with this method. Text rotation in an Excel document has the benefit of being hard to detect. Since many cells in an Excel document are very short, the embedding rate can be very high. Although many text-based steganography methods can be used in an Excel document, text rotation is the best suited to it.

Based on Capacity and Similarity Measure
Capacity: Capacity is defined as the ability of a cover text to hide a secret message. The capacity ratio is computed by dividing the amount of hidden bytes by the size of the cover text in bytes. Capacity ratio = (amount of hidden bytes) / (size of the cover text in bytes). Assuming one character occupies one byte in memory, the percentage capacity is the capacity ratio multiplied by 100 11 .
Jaro-Winkler Distance for Similarity Measure: The Jaro-Winkler score (or distance) takes into account the number of matching characters and the transpositions of characters in two strings. A Jaro score of 0 means the two strings are dissimilar, and 1 means they are exactly the same. A Jaro score close to 1 indicates that the cover text and the stego text are closely similar. The number of transpositions is the number of matching characters that appear in a different sequence order, divided by 2 12 .
The Jaro score is

\[ \text{Jaro\_score} = \frac{1}{3}\left(\frac{m}{\operatorname{length}(s_1)} + \frac{m}{\operatorname{length}(s_2)} + \frac{m - t}{m}\right) \]

where m is the number of matching characters, s1 is the first string, s2 is the second string, and t is the number of transpositions.
The Jaro-Winkler distance is

\[ \text{Jaro\_score} + L \cdot p \cdot (1 - \text{Jaro\_score}) \]

where L is the length of the common prefix at the start of the strings, up to a maximum of 4, and p is the constant scaling factor (usually 0.1 and not more than 0.25) 13 .
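The two formulas can be sketched in Python as shown below. The text does not say when two characters count as matching; this sketch assumes the standard Jaro definition, in which characters match if they are equal and no further apart than half the length of the longer string minus one. The function names are hypothetical.

```python
def jaro(s1, s2):
    """Jaro score: 1/3 * (m/len(s1) + m/len(s2) + (m - t)/m)."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Assumed matching window (standard Jaro definition, not given in the text).
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    m = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Matching characters in a different sequence order, divided by 2.
    seq1 = [c for c, hit in zip(s1, matched1) if hit]
    seq2 = [c for c, hit in zip(s2, matched2) if hit]
    t = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro score boosted by the common prefix length L (at most 4)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * p * (1 - score)


print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

The comparison here is case-sensitive, so a capital letter and its small form do not match. Under that assumption a stego text that differs from its cover only in letter case scores below 1.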
In the text rotation technique the secret message is hidden without altering the cover text, so the cover text and stego text are the same and the Jaro score is 1. In the mixed case font technique the Jaro score is 0.72 and the Jaro-Winkler distance is 0.77. In the font type technique the secret message is also hidden without altering the cover text, so the cover text and stego text are the same and the Jaro score is 1. Table 6 shows the percentage capacity and the Jaro-Winkler distance.

Conclusion
In Table 6, the embedding capacity of the first technique (text rotation) is higher than that of the third technique (font type) and lower than that of the second technique (mixed case font). If every cell in the Excel document has a length less than or equal to the limit (four), the capacity ratio of the first technique will be very high compared with the other two. Its Jaro score is 1, so there is no dissimilarity between cover and stego text. The second technique has a very high embedding capacity compared with the other techniques, but its Jaro score is lower than that of the other two, which shows that the stego text is slightly dissimilar to the cover text. The third technique has the lowest embedding capacity because only capital letters are used for hiding. Its Jaro score is 1, which shows exact similarity between cover and stego text.