Data compression
Data compression is the modification of (digital) data such that it can be represented by fewer characters than in its original form. In computer science, data compression is mainly used to reduce the memory needed to store information or to shorten the time needed to send it between two machines. There are two types of compression: lossless and lossy. Data compressed with a lossless technique can be restored completely, whereas data compressed with a lossy technique loses some information that cannot be recovered.
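The difference between the two types can be illustrated with a short Python sketch. It is only an illustration under assumptions: the zlib module from the Python standard library stands in for "a lossless technique", rounding numbers stands in for "a lossy technique", and the sample data is made up for the example.

```python
import zlib

# Lossless: decompressing returns exactly the bytes that were compressed.
original = b"yyyybyyyy" * 100            # hypothetical, highly repetitive sample data
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), "bytes ->", len(compressed), "bytes")
print("identical after round trip:", restored == original)

# Lossy (toy illustration): rounding discards detail that cannot be recovered.
samples = [0.12, 0.49, 0.51, 0.88]        # hypothetical measurements
quantized = [round(s, 1) for s in samples]
print(quantized, "equal to the original samples:", quantized == samples)
```

In the lossless case the round trip reproduces the input exactly. In the lossy case nearby values collapse into the same rounded value, so the original samples cannot be reconstructed.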
Example
Consider the following simple coding scheme for images:
- y = yellow, b = black, r = red
With this coding scheme the above image can be encoded by the following string:
- yyyybyyyyyyyybyyyybbbbbbbbbrrrrbrrrrrrrrbrrrr
Each pixel in the image is represented by the character corresponding to the color of the pixel. The order of the pixels is assumed to be from upper left to lower right.
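The per-pixel scheme can be sketched in Python as follows. The image dimensions are not stated in the text, so the 9 by 5 grid below is an assumption chosen so that 45 pixels form a rectangle. The names COLOR_CODES and encode_image are hypothetical, and reading the pixels row by row is one interpretation of "from upper left to lower right".

```python
# Mapping from the coding scheme: one character per color.
COLOR_CODES = {"yellow": "y", "black": "b", "red": "r"}


def encode_image(rows):
    """Emit one character per pixel, row by row, starting at the upper left."""
    return "".join(COLOR_CODES[color] for row in rows for color in row)


# Assumed 9x5 layout (not given in the text): two yellow rows and two red rows,
# each crossed by a black column, with a full black row in the middle.
Y, B, R = "yellow", "black", "red"
rows = (
    [[Y] * 4 + [B] + [Y] * 4] * 2
    + [[B] * 9]
    + [[R] * 4 + [B] + [R] * 4] * 2
)

encoded = encode_image(rows)
print(encoded)
print(len(encoded), "characters")
```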
This coding scheme can be modified as follows to compress the data: each character representing a color is written only once and is followed by a digit. The digit gives the number of consecutive pixels that have this color. With this new coding scheme, the image can be represented by the following string:
- y4b1y8b1y4b9r4b1r8b1r4
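The modified scheme can be sketched in Python as follows. The function names rle_encode and rle_decode are hypothetical. The text does not say what happens when a color repeats more than nine times, so splitting such a run into several pairs is an assumption made here to keep every count a single digit.

```python
def rle_encode(pixels: str) -> str:
    """Encode a string of color characters as character/digit pairs."""
    encoded = []
    i = 0
    while i < len(pixels):
        color = pixels[i]
        run = 1
        # Extend the run while the color repeats; stop at 9 so the count
        # stays a single digit (assumption, not stated in the text).
        while i + run < len(pixels) and pixels[i + run] == color and run < 9:
            run += 1
        encoded.append(f"{color}{run}")
        i += run
    return "".join(encoded)


def rle_decode(encoded: str) -> str:
    """Expand character/digit pairs back into one character per pixel."""
    return "".join(
        encoded[i] * int(encoded[i + 1]) for i in range(0, len(encoded), 2)
    )


raw = "yyyybyyyyyyyybyyyybbbbbbbbbrrrrbrrrrrrrrbrrrr"
packed = rle_encode(raw)
print(packed)
print("round trip restores the input:", rle_decode(packed) == raw)
```

Because decoding restores the original string exactly, this scheme is an example of lossless compression.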
Using the first scheme, the image is represented by 45 characters. The second scheme uses only 22 characters to encode the same image. Thus the compressed form needs about 49% of the original characters (22/45), a reduction of roughly one half.
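The arithmetic can be checked with a few lines of Python. The variable names are hypothetical, and "ratio" here means compressed length divided by original length, which is one of several common conventions.

```python
raw = "yyyybyyyyyyyybyyyybbbbbbbbbrrrrbrrrrrrrrbrrrr"   # first scheme
packed = "y4b1y8b1y4b9r4b1r8b1r4"                        # second scheme

ratio = len(packed) / len(raw)   # compressed size relative to the original
saving = 1 - ratio               # fraction of characters saved

print(f"{len(raw)} -> {len(packed)} characters")
print(f"ratio {ratio:.1%}, saving {saving:.1%}")
```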
Obviously, this simple compression scheme is only effective for images that have large connected areas of the same color. If the color changes from pixel to pixel, the scheme does not compress the data but instead doubles the number of characters needed.
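Both extremes can be compared in a short Python sketch. The helper rle_encode is a hypothetical, compact variant built on itertools.groupby from the standard library. It assumes that no run is longer than nine pixels, so each count fits in one digit, and the two sample inputs are made up for the comparison.

```python
from itertools import groupby


def rle_encode(pixels: str) -> str:
    """Character/count pairs; assumes no run is longer than 9 pixels."""
    return "".join(f"{color}{len(list(run))}" for color, run in groupby(pixels))


uniform = "y" * 8        # one connected area of a single color
alternating = "yb" * 4   # the color changes at every pixel

for pixels in (uniform, alternating):
    packed = rle_encode(pixels)
    print(f"{pixels} -> {packed} ({len(pixels)} -> {len(packed)} characters)")
```

The uniform input shrinks to a single pair, while the alternating input needs one pair per pixel and therefore twice as many characters as before.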