Handbook of Excellent Compression
Looking back upon my career, I have been honed all my life towards becoming a lean, mean, compressing machine. Every post, every task, every area of interest has contributed to my skills at finding and exploiting compression.
For the current run at the Hutter Prize, I started from the codebase of paq8hp12any written by Alexander Ratushnyak, with a substantial contribution in the form of textfilter.hpp 3.0 for PAQ (based on WRT 4.6) 02.03.2006 by P.Skibinski.
Pigmaei gigantum humeris impositi plusquam ipsi gigantes vident
- Isaac Newton
The best I've achieved so far in pursuit of the Hutter Prize is enwik8 compressed to 16,361,203 bytes plus a 91,244 byte daqoder2.exe executable decompressor, which improves upon the reigning record by 0.16%. Even that small gain takes substantial wit. I'd like to share an explanation of the techniques employed to move beyond good compression, i.e. GZIP at around 36% of the original size, past great compression such as WinRAR at around 22%, to Excellent Compression, i.e. sub 18%. I have posted the 16,361,203 byte file for you to download as enwik8.daq. I also post here the 262,211 byte enqoder2.exe that produces it when given the command switch -7. The full command line for invocation is:
enqoder2 -7 enwik8.daq enwik8
Before running enqoder2, you will need to obtain enwik8 itself, and a decompressor for the format you select, for instance from here (35,012,219 bytes) or here (21,388,296 bytes) or here (16,361,203 bytes). Do you already start to see the advantages of Excellent Compression?
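To put the byte counts above on the same footing as the percentages, here is a minimal sketch of the arithmetic. It assumes only that enwik8 is exactly 100,000,000 bytes (the first 10^8 bytes of an English Wikipedia dump); the dictionary labels and the helper name ratio_percent are mine, chosen for illustration, and the text does not say which tool produced which download.

```python
# Size of the uncompressed enwik8 file: the first 10^8 bytes of a Wikipedia dump.
ENWIK8_SIZE = 100_000_000

# Compressed sizes quoted in the text. The labels are illustrative only;
# the page does not state which compressor produced which download.
compressed_sizes = {
    "largest download": 35_012_219,
    "middle download": 21_388_296,
    "enwik8.daq": 16_361_203,
}

# Size of the daqoder2.exe decompressor quoted in the text.
DECOMPRESSOR_SIZE = 91_244


def ratio_percent(compressed: int, original: int = ENWIK8_SIZE) -> float:
    """Return the compressed size as a percentage of the original size."""
    return 100.0 * compressed / original


for label, size in compressed_sizes.items():
    print(f"{label}: {size:,} bytes = {ratio_percent(size):.2f}% of enwik8")

# Archive plus decompressor, the two sizes the text quotes together.
total = compressed_sizes["enwik8.daq"] + DECOMPRESSOR_SIZE
print(f"archive + decompressor: {total:,} bytes = {ratio_percent(total):.2f}%")
```

Counting the decompressor along with the archive still leaves the total comfortably under the 18% mark used above for Excellent Compression.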
Last modified: 30-Nov-2009