Data corruption with highCompression encoding (testcase attached) #69
Comments
With the stream checksum disabled, the decoded file differs from the original:
$ diff test.txt test.dec.txt
811c811
< aaaaaaaa000000aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
---
> aaaaaaaa0aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa |
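The diff above pinpoints the damaged line. When the data is already in memory, the same location can be found without shelling out to `diff`. The following is a minimal sketch, not part of node-lz4: `firstMismatch` and `lineOf` are hypothetical helper names, and the two buffers are synthetic stand-ins rather than real LZ4 output.

```javascript
// Return the offset of the first byte where two buffers differ, or -1 if identical.
function firstMismatch(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return i;
  }
  // Same common prefix: they differ only if one buffer is longer.
  return a.length === b.length ? -1 : n;
}

// Translate a byte offset into a 1-based line number by counting preceding '\n' bytes.
function lineOf(buf, offset) {
  let line = 1;
  for (let i = 0; i < offset; i++) {
    if (buf[i] === 0x0a) line++;
  }
  return line;
}

// Synthetic demonstration data (assumption: stands in for input vs. decoded output).
const original = Buffer.from('aaaa\nbbbb\ncccc\n');
const damaged = Buffer.from('aaaa\nbbbb\ncXcc\n');

const mismatchAt = firstMismatch(original, damaged);
console.log('first mismatch at byte', mismatchAt, 'on line', lineOf(original, mismatchAt));
```

Applied to the real input and decoded buffers, this would be expected to report a position on line 811, matching the diff.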
Testcase without streams:
var LZ4 = require('lz4')
var assert = require('assert')
var lines = [
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n',
'000000aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n',
'aaaaaaaa000000aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n'
];
var data = lines[0] + lines[1] + lines[0].repeat(808) + lines[2];
var input = Buffer.from(data) // new Buffer() is deprecated
var output = Buffer.alloc(LZ4.encodeBound(input.length))
var compressedSize = LZ4.encodeBlockHC(input, output) // encodeBlock works!
output = output.slice(0, compressedSize)
var uncompressed = Buffer.alloc(input.length)
var uncompressedSize = LZ4.decodeBlock(output, uncompressed)
uncompressed = uncompressed.slice(0, uncompressedSize)
console.log('Before: ', data.slice(-81).toString().trim());
console.log('After: ', uncompressed.slice(-81).toString().trim());
assert.deepStrictEqual(input, uncompressed, 'data is equal');
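Since the encoder returns without any error, one defensive option until the bundled lz4 is updated is to decode immediately after encoding and compare. The sketch below shows that idea only. `encodeVerified` is a hypothetical helper, and `encode`, `decode`, and `fallbackEncode` are caller-supplied callbacks, not node-lz4 APIs; the demo uses fake codecs so it runs without the library.

```javascript
// Encode, then decode and compare; fall back to a second encoder if the
// round trip does not reproduce the input. All three callbacks are
// assumptions supplied by the caller:
//   encode(input) -> Buffer, decode(compressed, originalLength) -> Buffer
function encodeVerified(input, encode, decode, fallbackEncode) {
  const compressed = encode(input);
  if (decode(compressed, input.length).equals(input)) return compressed;

  if (fallbackEncode) {
    const retry = fallbackEncode(input);
    if (decode(retry, input.length).equals(input)) return retry;
  }
  throw new Error('round-trip verification failed');
}

// Fake codecs for demonstration only (no real compression happens here).
const fakeLossyEncode = (buf) => buf.slice(0, buf.length - 1); // silently drops a byte
const fakeSafeEncode = (buf) => Buffer.from(buf);              // identity copy
const fakeDecode = (buf) => Buffer.from(buf);                  // identity copy

const sample = Buffer.from('aaaaaaaa000000aaaaaaaa');
const verified = encodeVerified(sample, fakeLossyEncode, fakeDecode, fakeSafeEncode);
console.log('round trip ok:', fakeDecode(verified).equals(sample));
```

With node-lz4, `encode` could plausibly wrap the `LZ4.encodeBlockHC` call from the testcase above and `fallbackEncode` the `LZ4.encodeBlock` call that is reported to work. The extra decode costs time on every call, so this is a stopgap rather than a fix.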
This is reproducible using C/C++ and the bundled lz4 directly. Testcase:
#include <cstring>
#include <cstdio>
#include <cstdlib>
#include "lz4.h"
#include "lz4hc.h"
int main() {
const int len = 65691;
char* src = (char *) malloc(len + 1); // non-const: the buffer is written to below
src[len] = 0;
memset(src, 'a', len);
memset(src + 81, 'X', 6);
memset(src + 810 * 81 + 8, 'X', 6);
// Compress
const int src_size = (int)(strlen(src) + 1);
const int max_dst_size = LZ4_compressBound(src_size);
char* compressed_data = (char *) malloc(max_dst_size);
const int compressed_data_size = LZ4_compressHC_limitedOutput(src, compressed_data, src_size, max_dst_size);
compressed_data = (char *)realloc(compressed_data, compressed_data_size);
// Decompress
char* const regen_buffer = (char *) malloc(src_size);
const int decompressed_size = LZ4_decompress_safe(compressed_data, regen_buffer, compressed_data_size, src_size);
free(compressed_data);
// Validate
printf("Sizes: %d %d %d\n", src_size, compressed_data_size, decompressed_size);
if (memcmp(src, regen_buffer, src_size) != 0) {
printf("Well, ow. We failed. :-(\n");
} else {
printf("We succeeded!\n");
}
return 0;
}
lz4 v1.8.3 seems to work.
Bisect shows that this was fixed in lz4/lz4@2e4847c, lz4/lz4#562.
Hi @ChALkeR, I cannot reproduce the problem in my local environment (Node.js 7.7.1, node-lz4 0.5.2), even after running the test demo 10000 times.
Something is strange with the versioning here. Tags on GitHub don't seem to correspond to the versions actually released on npm. Try building from git before and after this commit to verify the fix.
This is quite inconvenient, as encoding silently produces corrupted lz4 data, which later can't be read correctly by either node-lz4 or command-line lz4.
Code:
test.txt is attached: test.txt
Shorter version: