Here’s a neat explanation of why there can never be a single algorithm that can compress all files, even if only by one bit each.

  1. Take the set of all files 10 bits long. There are 1024 such files. Call this Set 1024.
  2. Now take the set of all files 9 bits long. It is half the size of the previous set, with 512 files. Call this Set 512.
  3. Now take any compression algorithm and compress every file in Set 1024. Each output file is, by definition, compressed, so it is 9 bits or fewer and must be found in Set 512.
  4. But note that this is a mapping from a set of 1024 files to a set of 512 files. By the pigeonhole principle, at least one output file in Set 512 must be mapped from two or more input files in Set 1024. Call this File A. So when we decompress File A from Set 512, which of the input files in Set 1024 should it return?
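The counting argument above can be sketched in a few lines of Python. The `compress` function here is just a placeholder that drops the last bit; it stands in for any deterministic compressor whose outputs are at most 9 bits, since the pigeonhole argument applies to every such function.

```python
from itertools import product

# All 10-bit strings: the inputs (Set 1024).
inputs = [''.join(bits) for bits in product('01', repeat=10)]
assert len(inputs) == 1024

# A toy "compressor": any deterministic function whose outputs are
# at most 9 bits would do. Here we simply truncate the last bit.
def compress(s):
    return s[:9]

outputs = [compress(s) for s in inputs]
assert all(len(o) <= 9 for o in outputs)

# Pigeonhole: 1024 inputs but fewer than 1024 possible outputs,
# so some output must repeat -- decompression is then ambiguous.
assert len(set(outputs)) < len(inputs)

# Find a concrete collision: one output mapped from two inputs.
seen = {}
for s in inputs:
    o = compress(s)
    if o in seen:
        print(seen[o], "and", s, "both compress to", o)
        break
    seen[o] = s
```

Swapping in a cleverer `compress` changes which files collide, but never whether they collide.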


There’s a wrinkle here that I didn’t appreciate when I first wrote this: I should have used the word string rather than file. The problem with using files is that the same compressed output can be stored in files of different lengths. E.g. the output value 100101 could be stored as a 6-bit file, or, padded with leading zeros, as a 7-bit file (0100101), an 8-bit file (00100101), or a 9-bit file (000100101). So the output set, when using files, is the set of all files from 1 to 9 bits long, with a total space of 2 + 4 + 8 + 16 + 32 + 64 + 128 + 256 + 512 = 1022 files, rather than 512. However, the basic explanation still holds, because 1022 < 1024, so some output files must still map back to more than one input file.
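The corrected count is easy to verify: the number of files of length 1 through 9 bits is a geometric sum, and it still falls short of the 1024 possible inputs.

```python
# Count all possible output files of 1 to 9 bits.
output_space = sum(2**k for k in range(1, 10))
print(output_space)  # 2 + 4 + 8 + ... + 512

assert output_space == 1022

# Still fewer possible outputs than the 1024 possible 10-bit
# inputs, so at least one output file must map back to two
# or more input files -- the pigeonhole argument survives.
assert output_space < 2**10
```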