Data Compression

Submitted by: Sampali Chakraborty (Department of BCA (Session: 2017 – 2020))

What is Compression?

Data compression is the process of modifying, encoding or converting the bit structure of data so that it consumes less space on disk. One category is lossless compression, in which the decompression process restores the compressed file exactly to its original state, with no loss of data. The other category is “lossy” compression, which discards some data and is often used for multimedia files such as music and images.

Lossless compression algorithms use statistical modeling techniques to reduce repetitive information in a file. Methods include removing spacing characters, representing a string of repeated characters with a single character and a repeat count, or replacing recurring characters with shorter bit sequences.
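The idea of representing a string of repeated characters more compactly can be sketched with run-length encoding. This is a minimal illustration only; the function names rle_encode and rle_decode are hypothetical, chosen for this example, and real formats pack the counts far more tightly.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (character, count) pairs back into the original string."""
    return "".join(char * count for char, count in pairs)


encoded = rle_encode("AAAABBBCCD")
print(encoded)              # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(encoded))  # AAAABBBCCD
```

Run-length encoding only helps when the data contains long runs; on text with few repeats the list of pairs can be larger than the input.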

Advantages/Disadvantages of Compression

Compression of files offers many advantages. Compressed files are smaller, so they take less time to transfer over the Internet and take up less storage space. Popular streaming services such as YouTube and Netflix, among many others, use compression for faster service.

Because compression is a mathematically intensive process, it can be time-consuming, especially when a large number of files is involved. Some compression algorithms offer varying levels of compression, with higher levels achieving a smaller file size at the cost of longer compression time. Compression also takes up valuable system resources, which can sometimes result in “Out of Memory” errors. Finally, with so many compression algorithm variants, a user who downloads a compressed file may not have the program needed to decompress it.
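The trade-off between compression level and time can be observed with Python's standard zlib module. The sketch below is only an illustration: the helper compare_levels and the sample data are assumptions made for this example, and the timings will differ from machine to machine.

```python
import time
import zlib


def compare_levels(data: bytes, levels=(1, 6, 9)) -> dict[int, int]:
    """Compress the same data at several zlib levels and report size and time."""
    sizes = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Lossless: decompressing must give back exactly the original bytes.
        assert zlib.decompress(compressed) == data
        sizes[level] = len(compressed)
        print(f"level {level}: {len(compressed)} bytes in {elapsed:.6f} s")
    return sizes


# Hypothetical sample input: highly repetitive text compresses very well.
sample = b"data compression reduces repeated information. " * 2000
sizes = compare_levels(sample)
```

On larger and less repetitive inputs, the higher levels usually take noticeably longer while producing somewhat smaller output.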

Lossless Compression:

Decompression gives an exact copy of the original data.

Examples: entropy encoding schemes (Shannon-Fano, Huffman coding, arithmetic coding) and the LZW algorithm (used in the GIF image file format).
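As an illustration of entropy encoding, the following is a small sketch of Huffman coding, in which frequent characters receive shorter bit sequences. The function names huffman_codes, encode and decode are hypothetical, and the bits are kept as a string of "0" and "1" characters for readability rather than packed into bytes.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix-free code table; frequent symbols get shorter codes."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(text: str, codes: dict[str, str]) -> str:
    return "".join(codes[ch] for ch in text)


def decode(bits: str, codes: dict[str, str]) -> str:
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # prefix-free, so the first match is the symbol
            out.append(reverse[current])
            current = ""
    return "".join(out)


text = "abracadabra"
codes = huffman_codes(text)
bits = encode(text, codes)
print(codes)
print(len(bits), "bits instead of", 8 * len(text))
print(decode(bits, codes) == text)  # True
```

A real compressor would also have to store the code table (or the character frequencies) alongside the bits so that the decoder can rebuild it.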

 

Lossy Compression:

Decompression gives a “close” approximation of the original data, ideally one that is perceptually lossless.

Examples: transform coding (FFT/DCT-based quantization, used in JPEG/MPEG), differential encoding, vector quantization.
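The way lossy methods discard data can be sketched with uniform quantization, the step that transform coders apply to their coefficients. This is not an implementation of JPEG or MPEG; the function names, the sample values and the step size are all assumptions chosen for illustration.

```python
def quantize(samples: list[float], step: float) -> list[int]:
    """Map each sample to the index of the nearest multiple of step."""
    return [round(x / step) for x in samples]


def dequantize(indices: list[int], step: float) -> list[float]:
    """Rebuild approximate samples; the detail rounded away is lost for good."""
    return [i * step for i in indices]


samples = [0.12, 0.47, 0.51, 0.98, -0.33]  # hypothetical input values
step = 0.25                                # larger step: smaller output, more error

indices = quantize(samples, step)
restored = dequantize(indices, step)

print(indices)   # [0, 2, 2, 4, -1]
print(restored)  # [0.0, 0.5, 0.5, 1.0, -0.25]
```

The small integers are cheaper to store than the original values, and each restored value is within half a step of its original, which is why the result is a close approximation rather than an exact copy.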

 

In conclusion, data compression is very important in the computing world, and it is commonly used by many applications.