Study on multi-lingual LZ77 and LZ78 text compression


    • IEEE
    • Publication Date: 30 Mar-1 Apr...
    • Author: Chi-Hung Chi, Dept. of Inf. Syst. & Comput. Sci., Nat. Univ. of Singapore

    • Abstract: Summary form only given. We studied the effectiveness of multi-lingual character sampling on Lempel-Ziv (LZ) compression algorithms. LZSS and LZW were chosen to represent LZ77 and LZ78 compression, respectively. Both were modified to adapt to the characteristics of non-English text such as Chinese. The Chinese LZW compressor (CLZW) improves on the original LZW by a larger margin than the Chinese LZSS compressor (CLZSS) improves on the original LZSS (14.5% vs. 3.7% on average), and CLZW also outperforms CLZSS. Two factors explain this: the overall dictionary size and the constraints inherent in each algorithm. The dictionary size in LZ78 algorithms, like the sliding-window size in LZ77 algorithms, determines how much previous content the compressor can draw on to find repeated phrases. Our results show that the Chinese LZ78 compressor exploits a larger dictionary much more effectively than the LZ77 family exploits its sliding window, and it does so without adverse side effects. This also shows that previous content is particularly helpful in compressing Chinese text. Because of the linguistic structure of Chinese, repeated phrases occur less often in Chinese text than in English. In other words, within a small, fixed amount of text, repeated phrases are easier to find in English than in Chinese. Since LZW preserves a large volume of previous content, the Chinese implementation can make good use of it. The different constraints of the two algorithms also contribute to the performance difference. From this analysis, we conclude that the LZ78 family (and thus LZW) is more suitable than the LZ77 family for extension to multi-lingual text compression.
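The abstract contrasts how much earlier text each family can reuse: an LZ77-style sliding window versus an LZ78-style dictionary. The sketch below illustrates that difference. It is not the paper's implementation, because the abstract does not describe the modifications. It assumes that "adapting to Chinese" means treating each Unicode character (for example, one Chinese ideograph) as a single symbol instead of splitting it into bytes. The function names (`lzw_compress`, `lzw_decompress`, `lzss_compress`, `lzss_decompress`) and the parameter defaults (`max_dict_size`, `window_size`, `min_match`, `max_match`) are hypothetical and chosen only for illustration.

```python
# Minimal character-level sketches of LZW (LZ78 family) and LZSS (LZ77 family).
# Assumption (not from the paper): each Unicode character is one input symbol.


def lzw_compress(text, max_dict_size=4096):
    """Return (alphabet, codes). The dictionary keeps learning phrases from
    the whole input until it holds max_dict_size entries."""
    alphabet = sorted(set(text))  # seed dictionary: every distinct character
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    codes, phrase = [], ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate  # keep extending the current match
        else:
            codes.append(dictionary[phrase])
            if len(dictionary) < max_dict_size:
                dictionary[candidate] = len(dictionary)  # learn a new phrase
            phrase = ch
    if phrase:
        codes.append(dictionary[phrase])
    return alphabet, codes


def lzw_decompress(alphabet, codes, max_dict_size=4096):
    dictionary = {i: ch for i, ch in enumerate(alphabet)}
    if not codes:
        return ""
    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # A code not yet in the dictionary is the phrase the encoder has just
        # added: prev followed by its own first character.
        entry = dictionary[code] if code in dictionary else prev + prev[0]
        out.append(entry)
        if len(dictionary) < max_dict_size:  # mirror the encoder's growth
            dictionary[len(dictionary)] = prev + entry[0]
        prev = entry
    return "".join(out)


def lzss_compress(text, window_size=4096, min_match=2, max_match=18):
    """Return a list of literal characters and (offset, length) pairs.
    A match may only start within the last window_size characters."""
    tokens, i = [], 0
    while i < len(text):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window_size), i):
            length = 0
            while (length < max_match and i + length < len(text)
                   and text[j + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(text[i])  # literal: no long-enough match in window
            i += 1
    return tokens


def lzss_decompress(tokens):
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            offset, length = tok
            # Copy one character at a time so that overlapping matches
            # (offset < length) reproduce correctly.
            for _ in range(length):
                out.append(out[-offset])
        else:
            out.append(tok)
    return "".join(out)
```

Consider `text = "压缩算法" + "abcdefghij" + "压缩算法"`. The earlier copy of 压缩算法 starts 14 characters before the repeat. With `window_size=8` it lies outside the window, so `lzss_compress(text, window_size=8)` emits only literals. With `window_size=32`, the output ends with the match `(14, 4)`. The LZW dictionary, by contrast, retains phrases learned anywhere earlier in the input until it reaches `max_dict_size`. That is the kind of long-range reuse the abstract credits for CLZW's advantage.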
