lib/uzlib: Add memory-efficient, streaming LZ77 compression support.

The compression algorithm implemented in this commit uses much less memory than the standard implementation, which uses a hash table and a large look-back window. In particular, the algorithm here doesn't allocate a hash table to store indices into the history of previously seen text. Instead it does a brute-force search of the history text to find a match for the compressor. This is slower (linear search vs hash table lookup), but with a small enough history (eg 512 bytes) it's not that slow, and a small history does not impact the compression too much.

To give some more concrete numbers comparing memory use between the approaches:

- Standard approach: inplace compression, all text to compress must be in RAM (or at least memory addressable), plus an additional 16k bytes of RAM for hash table pointers pointing into the text.

- The approach in this commit: streaming compression, only a limited amount of previous text must be in RAM (user selectable, defaults to 512 bytes).

To compress, say, 1k of data, the standard approach requires all that data to be in RAM, plus an additional 16k of RAM for the hash table pointers. With this commit, you only need the 1k of data in RAM. Or if it's streaming from a file (or elsewhere), you could get away with only 256 bytes of RAM for the sliding history and still get very decent compression.

In summary: because compression in the standard algorithm takes such a large amount of RAM, it's not really suitable for microcontrollers. The approach taken in this commit is to minimise RAM usage as much as possible while still having acceptable performance (speed and compression ratio).

Signed-off-by: Damien George <damien@micropython.org>
author: Damien George <damien@micropython.org> 2023-01-18 15:46:23 +1100
committer: Damien George <damien@micropython.org> 2023-07-21 18:54:22 +1000
commit: c4feb806e0df452273b3c19751d8dad39ef8295b (patch)
tree: d6b2fe3c27b9448d17fc6366ecd3b32797263ac8 /lib/uzlib/uzlib.h
parent: 198311c780e9e05a58c710d73a22abaf8347d4ee (diff)
1 files changed, 8 insertions, 9 deletions
diff --git a/lib/uzlib/uzlib.h b/lib/uzlib/uzlib.h
index 3a4a1ad16..83dddcd47 100644
--- a/lib/uzlib/uzlib.h
+++ b/lib/uzlib/uzlib.h
@@ -143,17 +143,16 @@ int TINFCC uzlib_gzip_parse_header(TINF_DATA *d);
 
 /* Compression API */
 
-typedef const uint8_t *uzlib_hash_entry_t;
-
-struct uzlib_comp {
-    struct Outbuf out;
-
-    uzlib_hash_entry_t *hash_table;
-    unsigned int hash_bits;
-    unsigned int dict_size;
+struct uzlib_lz77_state {
+    struct Outbuf outbuf;
+    uint8_t *hist_buf;
+    size_t hist_max;
+    size_t hist_start;
+    size_t hist_len;
 };
 
-void TINFCC uzlib_compress(struct uzlib_comp *c, const uint8_t *src, unsigned slen);
+void TINFCC uzlib_lz77_init(struct uzlib_lz77_state *state, uint8_t *hist, size_t hist_max);
+void TINFCC uzlib_lz77_compress(struct uzlib_lz77_state *state, const uint8_t *src, unsigned len);
 
 /* Checksum API */
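
The brute-force history search described in the commit message can be illustrated with a minimal sketch. This is not the code from this commit: `struct hist`, `hist_push` and `hist_find_match` are hypothetical names invented for the example, and the field names only loosely mirror `hist_buf`, `hist_max`, `hist_start` and `hist_len` from the header above. It assumes a circular history buffer and, as a simplification, does not let a match run past the end of the history into the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical circular history buffer (illustration only, not uzlib's type). */
struct hist {
    uint8_t *buf;  /* caller-provided storage */
    size_t max;    /* capacity of buf in bytes */
    size_t start;  /* index of the oldest byte in buf */
    size_t len;    /* number of valid bytes currently stored */
};

/* Append one byte, overwriting the oldest byte once the buffer is full. */
static void hist_push(struct hist *h, uint8_t b) {
    assert(h->max > 0);
    h->buf[(h->start + h->len) % h->max] = b;
    if (h->len < h->max) {
        h->len++;
    } else {
        h->start = (h->start + 1) % h->max;
    }
}

/* Linear search of the whole history for the longest prefix of src.
 * Returns the match length (0 if none) and stores the backwards distance
 * from the end of the history in *dist_out. On ties the earliest
 * (farthest back) candidate is kept. Cost is O(len * src_len), but it
 * needs no memory beyond the history buffer itself. */
static size_t hist_find_match(const struct hist *h, const uint8_t *src,
                              size_t src_len, size_t *dist_out) {
    size_t best_len = 0;
    *dist_out = 0;
    for (size_t i = 0; i < h->len; i++) {
        size_t n = 0;
        while (n < src_len && i + n < h->len &&
               h->buf[(h->start + i + n) % h->max] == src[n]) {
            n++;
        }
        if (n > best_len) {
            best_len = n;
            *dist_out = h->len - i;
        }
    }
    return best_len;
}
```

With a history of a few hundred bytes the inner loops stay cheap, which is the trade-off the commit message describes: extra CPU time per input byte in exchange for not allocating a hash table.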