Data Compression

Data compression is an operation that reduces the size of a file by minimizing redundant data. In a file that contains text, redundant data can be frequently occurring characters, such as the space character, or common vowels, such as the letters e and a; it can also be frequently occurring character strings. Data compression creates a compressed version of a file by minimizing this redundant data.

Each type of data-compression algorithm minimizes redundant data in a unique manner. For example, the Huffman encoding algorithm assigns a code to characters in a file based on how frequently those characters occur. Another compression algorithm, called run-length encoding, generates a two-part value for repeated characters: the first part specifies the number of times the character is repeated, and the second part identifies the character. Another compression algorithm, known as the Lempel-Ziv algorithm, converts variable-length strings into fixed-length codes that consume less space than the original strings.

To compress large applications or data files, you can run COMPRESS.EXE from the command line. COMPRESS.EXE uses the Lempel-Ziv compression algorithm.