
Two approaches to research on Arabic text compression can be found in the literature. The first approach considers general-purpose techniques and does not take into account the features of the Arabic language. A study of a variety of data compression techniques was presented by Khafagy. This study was applied to English or Arabic texts. The best compression ratio was obtained with Neural Compression. Next came the Prediction by Partial Matching (PPM), Lempel–Ziv–Storer–Szymanski (LZSS), LZ77, Lempel–Ziv–Welch (LZW), Lempel–Ziv Ross Williams (LZRW), and Huffman techniques. On the other hand, the PPM and Neural techniques achieved the best compression time compared to the other techniques.

Developing new compression techniques based on the morphological and grammatical features of Arabic and other Semitic languages may present a new paradigm that can improve the compression ratio and performance.

This paper aims to study two different data compression techniques and to compare their performance on Arabic and English text files. It also exploits one of the morphological features of the Arabic language, following the method of Omer and Khatatneh (2010), to improve the performance of the LZW technique.

Examples of techniques that take the neighboring characters into account when encoding a character are the Burrows Wheeler Transform (BWT) and PPM.
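To make the Burrows Wheeler Transform mentioned above more concrete, the following is a minimal Python sketch and not the implementation evaluated in this paper. It assumes a naive rotation-sorting approach, and the function names and the `end_marker` parameter are hypothetical choices made for illustration. The transform does not shrink the text by itself; it reorders the characters so that similar contexts end up next to each other, which a later coding stage can exploit.

```python
def bwt(text: str, end_marker: str = "\0") -> str:
    """Return the last column of the sorted rotations of text + end_marker."""
    if end_marker in text:
        raise ValueError("end marker must not occur in the input")
    s = text + end_marker
    # Build every rotation of the string and sort them by code point.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last character of each sorted rotation.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, end_marker: str = "\0") -> str:
    """Rebuild the original text by repeatedly prepending and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        # Prepend the transformed column to every row, then re-sort.
        table = sorted(ch + row for ch, row in zip(transformed, table))
    # The original string is the row that ends with the end marker.
    original = next(row for row in table if row.endswith(end_marker))
    return original[:-1]
```

For example, `bwt("banana", end_marker="$")` yields `"annb$aa"`, and `inverse_bwt` recovers `"banana"` from it. Because Python strings are sequences of Unicode code points, the same sketch works unchanged on Arabic input. The quadratic rotation table is only suitable for short strings; practical implementations use suffix arrays.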
This paper studies and evaluates the efficiency of the LZW and BWT techniques for different categories of Arabic text files of different sizes. A comparison of these techniques on Arabic and English text files is introduced, in addition to exploiting morphological features of the Arabic language to improve the performance of the LZW technique. We found that the enhanced LZW was the best for all categories of Arabic texts, followed by standard LZW and then BWT.

Keywords: Text compression, Arabic texts, BWT, LZW.

INTRODUCTION

Data compression has important applications in the areas of data transmission and data storage. It aims at reducing the size of data in order to improve the speed of transmission and reduce the space needed for storage. Text compression is a subfield of data compression; it focuses on compressing natural language texts as they occur in the real world. There are many techniques used to compress data in general and texts in particular.

Some of these techniques proceed at the level of characters. They use the frequency of characters in order to replace the most frequent characters with short codes, and are therefore called statistical compression methods. Examples of this category are Run Length Coding and Huffman Coding. Other techniques look at strings in the text and put pointers to strings or substrings that have already appeared. These are called dictionary-based techniques; an example is the Lempel-Ziv (LZ) family of codes. There are also techniques that look at both the frequency of a character and the characters that occur near it when deciding how to encode it.
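As an illustration of the dictionary-based idea behind LZW, the following is a minimal Python sketch of the textbook algorithm. It is not the enhanced LZW proposed in this paper, and several details are assumptions made for illustration: the text is encoded as UTF-8, the dictionary starts with the 256 single-byte values and grows without limit, and the output is left as a list of integer codes rather than packed into bits. The function names are hypothetical.

```python
def lzw_compress(text: str) -> list[int]:
    """Encode text as a list of dictionary codes (textbook LZW)."""
    data = text.encode("utf-8")  # assumption: UTF-8 input bytes
    # Start with one dictionary entry per possible byte value.
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep extending the match while it is still in the dictionary.
            current = candidate
        else:
            # Emit the longest known match and learn the new string.
            codes.append(table[current])
            table[candidate] = len(table)
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the text by reconstructing the same dictionary."""
    if not codes:
        return ""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out).decode("utf-8")
```

For example, `lzw_compress("abababab")` produces the five codes `[97, 98, 256, 258, 98]` for eight input bytes, because the repeated substrings are replaced by references to dictionary entries. Note that under the UTF-8 assumption each Arabic letter occupies two bytes, so a byte-level dictionary sees Arabic text differently from English text.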
