Skip to content
This repository was archived by the owner on Sep 9, 2024. It is now read-only.

Data corruption with highCompression encoding (testcase attached) #69

Closed
ChALkeR opened this issue Nov 12, 2018 · 8 comments
Closed

Data corruption with highCompression encoding (testcase attached) #69

ChALkeR opened this issue Nov 12, 2018 · 8 comments

Comments

@ChALkeR
Copy link
Contributor

ChALkeR commented Nov 12, 2018

This is quite inconvenient, as encoding silently produces corrupted lz4 data, which later can't be read correctly by either node-lz4 or command-line lz4.

Code:

var fs = require('fs')
var lz4 = require('lz4')
var encoder = lz4.createEncoderStream({
  highCompression: true
})
var input = fs.createReadStream('test.txt')
var output = fs.createWriteStream('test.txt.lz4')
input.pipe(encoder).pipe(output)

// this is not required, just `lz4cat test.txt.lz4` also fails
output.on('close', function () {
  var decoder = lz4.createDecoderStream()
  var inputd = fs.createReadStream('test.txt.lz4')
  var outputd = fs.createWriteStream('test.dec.txt')
  inputd.pipe(decoder).pipe(outputd)
});

test.txt is attached: test.txt

Shorter version:

var fs = require('fs')
var lz4 = require('lz4')
var encoder = lz4.createEncoderStream({
  highCompression: true
})
var decoder = lz4.createDecoderStream()
var input = fs.createReadStream('test.txt')
var output = fs.createWriteStream('/dev/null')
input.pipe(encoder).pipe(decoder).pipe(output);
@ChALkeR ChALkeR changed the title Invalid stream checksum with { highCompression: true } encoding Invalid stream checksum with { highCompression: true } encoding (testcase attached) Nov 12, 2018
@ChALkeR
Copy link
Contributor Author

ChALkeR commented Nov 13, 2018

Disabling stream checksum with streamChecksum: false shows that data is corrupted:

$ diff test.txt test.dec.txt
811c811
< aaaaaaaa000000aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
---
> aaaaaaaa0aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

@ChALkeR ChALkeR changed the title Invalid stream checksum with { highCompression: true } encoding (testcase attached) Invalid stream checksum and data corruption with { highCompression: true } encoding (testcase attached) Nov 13, 2018
@ChALkeR
Copy link
Contributor Author

ChALkeR commented Nov 13, 2018

Testcase without streams:

var LZ4 = require('lz4')
var assert = require('assert')

var lines = [
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n',
'000000aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n',
'aaaaaaaa000000aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n'
];

var data = lines[0] + lines[1] + lines[0].repeat(808) + lines[2];

var input = new Buffer(data)
var output = new Buffer(LZ4.encodeBound(input.length) )
var compressedSize = LZ4.encodeBlockHC(input, output) // encodeBlock works!
output = output.slice(0, compressedSize)

var uncompressed = new Buffer(input.length)
var uncompressedSize = LZ4.decodeBlock(output, uncompressed)
uncompressed = uncompressed.slice(0, uncompressedSize)

console.log('Before: ', data.slice(-81).toString().trim());
console.log('After:  ', uncompressed.slice(-81).toString().trim());
assert.deepStrictEqual(input, uncompressed, 'data is equal');

@ChALkeR ChALkeR changed the title Invalid stream checksum and data corruption with { highCompression: true } encoding (testcase attached) Data corruption with { highCompression: true } encoding (testcase attached) Nov 13, 2018
@ChALkeR ChALkeR changed the title Data corruption with { highCompression: true } encoding (testcase attached) Data corruption with HC encoding (testcase attached) Nov 13, 2018
@ChALkeR ChALkeR changed the title Data corruption with HC encoding (testcase attached) Data corruption with highCompression encoding (testcase attached) Nov 13, 2018
@ChALkeR
Copy link
Contributor Author

ChALkeR commented Nov 13, 2018

This is reproducable using C/C++ and bundled lz4 directly.
The exact same code seems to work with a current version of lz4.

Testcase:

#include <cstring>
#include <cstdio>
#include <cstdlib>
#include "lz4.h"
#include "lz4hc.h"

int main() {
  const int len = 65691;
  const char* src = (char *) malloc(len + 1);
  src[len] = 0;
  memset(src, 'a', len);
  memset(src + 81, 'X', 6);
  memset(src + 810 * 81 + 8, 'X', 6);

  // Compress
  const int src_size = (int)(strlen(src) + 1);
  const int max_dst_size = LZ4_compressBound(src_size);
  char* compressed_data = (char *) malloc(max_dst_size);
  const int compressed_data_size = LZ4_compressHC_limitedOutput(src, compressed_data, src_size, max_dst_size);
  compressed_data = (char *)realloc(compressed_data, compressed_data_size);

  // Decompress
  char* const regen_buffer = (char *) malloc(src_size);
  const int decompressed_size = LZ4_decompress_safe(compressed_data, regen_buffer, compressed_data_size, src_size);
  free(compressed_data);

  // Validate
  printf("Sizes: %d %d %d\n", src_size, compressed_data_size, decompressed_size);
  if (memcmp(src, regen_buffer, src_size) != 0) {
    printf("Well, ow. We failed. :-(\n");
  } else {
    printf("We succeeded!\n");
  }
  return 0;
}

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Nov 13, 2018

lz4 v1.8.3 seems to work.
Git diff: lz4/lz4@c4dbc37...v1.8.3

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Nov 13, 2018

Bisect shows that this was fixed in lz4/lz4@2e4847c, lz4/lz4#562.
Upstream issue: lz4/lz4#560.

ChALkeR added a commit to ChALkeR/node-lz4 that referenced this issue Nov 13, 2018
ChALkeR added a commit to ChALkeR/node-lz4 that referenced this issue Nov 13, 2018
ChALkeR added a commit to ChALkeR/node-lz4 that referenced this issue Nov 13, 2018
@vivostar
Copy link

vivostar commented Jan 8, 2024

hi, @ChALkeR , I cannot reproduce the problem in my local env(nodejs 7.7.1, node-lz4 0.5.2), and I run the test demo for 10000 times.

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Jan 23, 2024

@vivostar The issue was introduced in 0.6.0, I think? (Or needed another testcase? not sure)
In 42fc1c5, with the previous lz4 subdep update.

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Jan 23, 2024

Smth is strange with versioning here.
On npm: '0.6.0': '2018-08-17T19:13:08.884Z', (and doesn't include this fix).
On GH, 0.6.0 is tagged from d200620 on May 5, 2019, but that PR was even filed after 0.6.0 has been released on npm.

Tags on GH don't seem to correspond to actually released versions on npm.
Also, I'm not sure if submodules were correctly updated before releases.

Try to build from git before and after this commit to verify the fix.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants