Adding examples

2021-02-28 22:25:37 -06:00 · 2021-02-28 22:25:37 -06:00 · e0df6bcf00
parent a233bc5e88
commit e0df6bcf00
7 changed files with 22 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -1,8 +1,11 @@
-# compress
+# compress - WIP TEST SOFTWARE
+
 Maximum Entropy Compressor - Using Probability Hashes w/Hash-Chaining to compress "impossible" data sets.

 Copyright © 2021 by Brett Kuntz. All rights reserved.

+# Instructions
+
 See the /full/ directory for the latest test iteration that compresses & decompresses. Both files are stand-alone and you do not need to compile both for either to work.

 Compressing takes about 35 minutes on an EC2 c5.24xlarge machine, decompression takes place in real-time.
@@ -22,4 +25,21 @@ Compression should output 3 files (for now):
 2. tweaks.bin - The 600kb worth of tweaks used. I'm about to work on compressing these next.
 3. inverts.bin - A 2kb file that was needed since I made a mistake designing the decompressor and it turns out it is not able to decompress without this information.

-For now please use FREE files from [Random.org/binary](https://archive.random.org/binary) so we can download and use the same files as you for testing purposes!
+For now please use FREE files from [Random.org/binary](https://archive.random.org/binary) so we can download and use the same files as you for testing purposes!
+
+# Examples
+
+Included in the /examples/ directory are compressed files and their respective outputs.
+
+# To Do
+
+1. Tweaks must be compressed using a similar technique as the 1MB main blocks.
+2. A secondary compressor must be found (or created) to compress the 1MB output.bin files. The compressor needs to be tailored for files where there are many bits at the front, and very few in the middle and end.
+3. A second method for scoring shuffling needs to be tested. It may be better than the current method.
+4. More cuts are needed to further reduce the entropy in output.bin
+
+If you would like to help out with testing, or to just provide free computations, it would be greatly appreciated.
+
+# History
+
+I came up with the idea for this compressor (symbol substitution via brute-force) in 2007, and began preliminary work on it in Nov 2018. I put that work on hold once again until Dec 2020, when I could finally focus on it full-time. The general idea of symbol substitution is that a large hash is really just a giant 2^1159 byte database of random noise that is compressed into an extremely tiny decompressor. Brute-force is used to figure out which keys into the database best match the white noise of the input file, and then those symbols are subtracted (XOR'd) out from the input, leaving behind very few bits.
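
The History paragraph describes the core loop only in words, so here is a toy sketch of that idea: derive a pseudo-random block from a key, brute-force the key whose block best matches the input, and XOR it out to leave a residue with few set bits. Everything in it is an assumption made for illustration and is not taken from the repository: the names `noise_block`, `best_key`, `substitute` and `restore`, the use of SHAKE-256 as the hash, the 8-byte block size, and the tiny key space. It shows the mechanics only and makes no claim about achieving net compression, since the key itself also has to be stored.

```python
import hashlib

BLOCK_BYTES = 8  # toy block size (hypothetical; the README works on 1MB blocks)


def noise_block(key: int, size: int = BLOCK_BYTES) -> bytes:
    """Derive a deterministic pseudo-random block from an integer key.

    SHAKE-256 stands in for the 'large hash'; this choice is an assumption.
    """
    return hashlib.shake_256(key.to_bytes(8, "big")).digest(size)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def popcount(data: bytes) -> int:
    """Count the set bits in a byte string."""
    return sum(bin(byte).count("1") for byte in data)


def best_key(block: bytes, key_space: int) -> int:
    """Brute-force the key whose noise block leaves the fewest set bits."""
    return min(
        range(key_space),
        key=lambda k: popcount(xor_bytes(block, noise_block(k, len(block)))),
    )


def substitute(block: bytes, key_space: int = 1 << 12):
    """Return (key, residue): the chosen key and the input with its noise XOR'd out."""
    key = best_key(block, key_space)
    return key, xor_bytes(block, noise_block(key, len(block)))


def restore(key: int, residue: bytes) -> bytes:
    """Invert substitute(): regenerate the noise block and XOR it back in."""
    return xor_bytes(residue, noise_block(key, len(residue)))


if __name__ == "__main__":
    data = bytes(range(BLOCK_BYTES))
    key, residue = substitute(data)
    print("key:", key, "set bits before:", popcount(data), "after:", popcount(residue))
    assert restore(key, residue) == data
```

Restoring is cheap because it only regenerates one block from the stored key, while the search has to try every key. That asymmetry is in line with the README's note that compression is slow and decompression runs in real time.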
--- a/examples/2020-04-20/2020-04-20.bin
+++ b/examples/2020-04-20/2020-04-20.bin
--- a/examples/2020-04-20/2020-04-20.iv
+++ b/examples/2020-04-20/2020-04-20.iv
--- a/examples/2020-04-20/final.bin
+++ b/examples/2020-04-20/final.bin
--- a/examples/2020-04-20/inverts.bin
+++ b/examples/2020-04-20/inverts.bin
--- a/examples/2020-04-20/output.bin
+++ b/examples/2020-04-20/output.bin
--- a/examples/2020-04-20/tweaks.bin
+++ b/examples/2020-04-20/tweaks.bin