LZW (Lempel–Ziv–Welch) Compression Technique

Overview

LZW (Lempel–Ziv–Welch) is a universal lossless data compression technique. This compression algorithm was developed by Abraham Lempel, Jakob Ziv, and Terry Welch. In hardware implementations, the algorithm is simple and has the potential for very high throughput. It is the algorithm used in the GIF image format and is part of the widely used Unix file compression utility compress.

Why Do We Need a Compression Algorithm?

Although modern computers can store an increasing number of files, file size is still an issue. We can store more files if our files are smaller. We use various compression algorithms to reduce the amount of space required to represent a file. Lossless and lossy compression algorithms are the two types of compression algorithms. Lossless compression algorithms reduce file size without hampering any information in the file. In contrast, lossy compression algorithms reduce the file size by removing unnecessary or less important information in the file. There are various reasons why we need data compression algorithms, some of which are mentioned below.

Data compression reduces the amount of space that files take up on a hard drive and the amount of time taken to transfer or download them.
To reduce storage costs, we need compression algorithms that allow us to store large amounts of data at a lower cost.
For example, in a 2:1 compression ratio, a 20 MB file takes up 10 MB of space, allowing us to store large files in less storage space.

Some lossy compression algorithms are:

Vector Quantisation
DCT (Discrete Cosine Transform)
Transform Coding

Some lossless compression algorithms are:

LZW (Lempel Ziff Welch)
RLE (Run Length Encoding)
String-table compression

What is Lempel–Ziv–Welch (LZW) Algorithm?

Lempel–Ziv–Welch (LZW) Algorithm is a common lossless data compression algorithm. As it is a lossless compression algorithm, there is no data loss during compression. LZW (Lempel–Ziv–Welch) is named after the scientists who developed it, Abraham Lempel, Jakob Ziv, and Terry Welch. LZW is a 'dictionary-based' lossless compression algorithm that scans a file for data patterns that appear more than once. These patterns are then saved in a dictionary, and references are placed within the compressed file wherever repetitive data occurs. In hardware implementations, the algorithm is simple and has the potential for very high throughput. It is the algorithm used in the GIF image format and is part of the widely used Unix file compression utility compress.

How Does It Work?

The LZW Algorithm is divided into two parts: the encoding algorithm, which converts strings to integer codes, and the decoding algorithm, which does the opposite. Both encoder and decoder algorithms have a default table or dataset that serves as the initial model for both encoder and decoder. As the algorithm runs, new integer codes for various string patterns are added to this table. The number of table entries in the code table is usually set to 4096. When encoding begins, the code table only contains the first 256 entries, with the remaining entries being blanks. LZW detects repeated sequences in the data and adds them to the code table as the encoding progresses. Each code from the compressed file is decoded through the code table to determine which character or characters it represents.

Implementation

LZW Compression When the input data is processed, the compression algorithm keeps a dictionary corresponding to the longest words encountered with a list of code values. Because the words are swapped out for their matching codes, the input file is compressed. As a result, the algorithm becomes more effective as the volume of long, repeated words in the input stream increases.

Pseudo-code for LZW encoding

Example 1: Compress the string ABABBABBB using the LZW technique. The steps are depicted in the diagram below in a proper sequence.

compress-string-using-lzw-technique-steps

compress-string-using-lzw-technique-steps2

compress-string-using-lzw-technique-steps3

LZW Decompression During decompression, the LZW (Lempel–Ziv–Welch) decompressor generates the same string table. Single characters are initialized for the first 256 string table entries. Except for the first character in the input string, the string table is updated. Decoding is accomplished by reading the codes and converting them using the code table that is being created. The following C++ code is provided for LZW compression encoding and decoding:

Pseudo-code for LZW decoding

Example 2: Decompress the output sequence of: 65>66>256>257>66>260> using LZW. The steps involved are systematically shown in the diagram below.

steps-to-decompress-the-output-sequence

The following is a C++ code for LZW compression, both encoding, and decoding.

Output

Advantages of LZW over Huffman

The LZW algorithm has the following advantages:

The algorithm is straightforward, simple, and effective.
LZW does not require any prior knowledge of the input data stream.
The LZW algorithm has a 60 - 70% compression ratio for some text files.
The LZW algorithm performs better for files with a lot of repetitive data.
The LZW algorithm compresses the data in a single pass.

Conclusion

Lempel–Ziv–Welch (LZW) Algorithm is a common lossless data compression algorithm.
Abraham Lempel, Jakob Ziv, and Terry Welch developed the LZW compression algorithm.
Lossless compression algorithms reduce file size without hampering any information in the file.
Lempel–Ziv–Welch (LZW) algorithm is used in the GIF image format and is part of the widely used Unix file compression utility compress.
LZW is a 'dictionary-based' lossless compression algorithm that scans a file for data patterns that appear more than once.
The LZW algorithm performs better for files with a lot of repetitive data.
The LZW Algorithm is divided into two parts: the encoding algorithm, which converts strings to integer codes, and the decoding algorithm, which does the opposite.
We should not prefer the LZW algorithm for compression when there is no repetitive data.