Had trouble in using isal_deflate_set_dict #206

Open
ZhaiMo15 opened this issue Mar 16, 2022 · 9 comments

Comments

@ZhaiMo15

ZhaiMo15 commented Mar 16, 2022

I ran a small test of isal_deflate_set_dict. The test, a lightly modified igzip/igzip_example.c, is below:

/**********************************************************************
  Copyright(c) 2011-2016 Intel Corporation All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions
  are met:
    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in
      the documentation and/or other materials provided with the
      distribution.
    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
  OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include "igzip_lib.h"

#define BUF_SIZE 512

struct isal_zstream stream;

int main(int argc, char *argv[])
{
	uint8_t inbuf[BUF_SIZE], outbuf[BUF_SIZE];
	FILE *in, *out;

	if (argc != 3) {
		fprintf(stderr, "Usage: igzip_example infile outfile\n");
		exit(0);
	}
	in = fopen(argv[1], "rb");
	if (!in) {
		fprintf(stderr, "Can't open %s for reading\n", argv[1]);
		exit(0);
	}
	out = fopen(argv[2], "wb");
	if (!out) {
		fprintf(stderr, "Can't open %s for writing\n", argv[2]);
		exit(0);
	}

	printf("igzip_example\nWindow Size: %d K\n", IGZIP_HIST_SIZE / 1024);
	fflush(0);

	isal_deflate_init(&stream);
	stream.end_of_stream = 0;
	stream.flush = NO_FLUSH;
	stream.gzip_flag = ISAL_ZLIB;

	stream.level = 3;
	stream.level_buf = malloc(ISAL_DEF_LVL3_DEFAULT);
	stream.level_buf_size = ISAL_DEF_LVL3_DEFAULT;
	if (stream.level_buf == NULL) {
		printf("Failed to allocate level compression buffer\n");
		exit(0);
	}

	// Delete the 3 lines below when not using a dictionary
	char dict[5] = "hello";	// 5 bytes, no NUL terminator
	int len = 5;
	isal_deflate_set_dict(&stream, dict, len);

	do {
		stream.avail_in = (uint32_t) fread(inbuf, 1, BUF_SIZE, in);
		stream.end_of_stream = feof(in) ? 1 : 0;
		stream.next_in = inbuf;
		do {
			stream.avail_out = BUF_SIZE;
			stream.next_out = outbuf;

			isal_deflate(&stream);

			fwrite(outbuf, 1, BUF_SIZE - stream.avail_out, out);
		} while (stream.avail_out == 0);

		assert(stream.avail_in == 0);
	} while (stream.internal_state.state != ZSTATE_END);

	fclose(out);
	fclose(in);

	printf("End of igzip_example\n\n");
	return 0;
}

I used this test to compress the same data twice; the only difference between the two runs is the call to isal_deflate_set_dict.
With the dictionary, the result looks like this:
image

Without the dictionary, the result looks like this:
image

The zlib header is the same in both cases. I don't think that's correct, since the zlib header should contain the dictionary information:
image

Does the ISA-L deflater not write the zlib header correctly when a dictionary is used?

@gbtucker
Contributor

To use a preset dictionary with a zlib header, you need to call isal_write_zlib_header() explicitly. Add the following after isal_deflate_set_dict() and change gzip_flag from ISAL_ZLIB to ISAL_ZLIB_NO_HDR (since the header is now written explicitly).

        isal_deflate_set_dict(&stream, dict, len);

        struct isal_zlib_header zlib_hdr;
        zlib_hdr.info = ISAL_DEF_MAX_HIST_BITS - 8;
        zlib_hdr.level = stream.level ? 1 : 0;
        zlib_hdr.dict_flag = 1;
        zlib_hdr.dict_id = isal_adler32(1, dict, len);

        stream.avail_out = BUF_SIZE;
        stream.next_out = outbuf;
        isal_write_zlib_header(&stream, &zlib_hdr);
        fwrite(outbuf, 1, BUF_SIZE - stream.avail_out, out);
        stream.gzip_flag = ISAL_ZLIB_NO_HDR;

        do {
        ...

This sets the dictionary flag FLG.FDICT (bit 5 of FLG) and writes the dictionary ID after the two header bytes.

00000000  78 7d 15 02 2c 06 f3 54
             5 |<- d id  ->|<- deflate ...

With stream.gzip_flag = ISAL_ZLIB instead, the FDICT flag is never set and no DICTID follows.

00000000  78 01 f3 54
            !5 |<- deflate ...
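To make the two dumps above easier to read, here is a minimal sketch that splits the two header bytes into the fields defined by RFC 1950. The struct and function names are hypothetical and are not part of ISA-L.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the 2-byte zlib header (CMF, FLG) as laid out in RFC 1950. */
struct zlib_hdr_fields {
	unsigned cm;		/* CMF bits 0-3: compression method (8 = deflate) */
	unsigned cinfo;		/* CMF bits 4-7: log2(window size) - 8 */
	unsigned fcheck;	/* FLG bits 0-4: check bits */
	unsigned fdict;		/* FLG bit 5: preset dictionary flag */
	unsigned flevel;	/* FLG bits 6-7: compression level hint */
	int valid;		/* 1 if CMF * 256 + FLG is a multiple of 31 */
};

/* Hypothetical helper for illustration; hdr must point to at least 2 bytes. */
static struct zlib_hdr_fields parse_zlib_hdr(const uint8_t *hdr)
{
	struct zlib_hdr_fields f;

	assert(hdr != NULL);

	f.cm = hdr[0] & 0x0fu;
	f.cinfo = hdr[0] >> 4;
	f.fcheck = hdr[1] & 0x1fu;
	f.fdict = (hdr[1] >> 5) & 1u;
	f.flevel = hdr[1] >> 6;
	f.valid = ((hdr[0] * 256u + hdr[1]) % 31u) == 0;
	return f;
}
```

With this sketch, the bytes 78 7d decode to FDICT = 1 and FLEVEL = 1, while 78 01 decodes to FDICT = 0 and FLEVEL = 0. Both pass the multiple-of-31 check.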

@ZhaiMo15
Author

Thanks Greg, that works.

@ZhaiMo15
Author

ZhaiMo15 commented Mar 18, 2022

zlib_hdr.info = ISAL_DEF_MAX_HIST_BITS - 8;
zlib_hdr.level = stream.level ? 1 : 0;

Is zlib_hdr.level always set to 1 regardless of the ISA-L compression level, or only for level 3? I'd like to know the mapping between the ISA-L compression level and zlib_hdr.level. I have the same question about zlib_hdr.info.

@ZhaiMo15
Author

ZhaiMo15 commented Mar 18, 2022

I'd also like to ask a question about isal_write_zlib_header().
Suppose dict_id is 0x062c0215. The zlib header written by ISA-L looks like this:

00000000  78 7d 15 02 2c 06 f3 54  
             5 |<- d id  ->|<- deflate ...

However, the header written by default zlib looks like this:

00000000  78 7d 06 2c 02 15 f3 54  
             5 |<- d id  ->|<- deflate ...

Should dict_id be written in the zlib header in big-endian order?
That is, replace

if (dict_flag)
	store_le_u32(out_buf + 2, z_hdr->dict_id);

by

if (dict_flag)
	store_be_u32(out_buf + 2, z_hdr->dict_id);
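The difference between the two stores can be sketched in plain C. The helpers below are illustrative stand-ins written for this example; they are not ISA-L's own store_le_u32() and store_be_u32(), whose implementations are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in: least-significant byte first (little-endian). */
static void put_le_u32(uint8_t *buf, uint32_t v)
{
	assert(buf != NULL);
	buf[0] = (uint8_t)(v);
	buf[1] = (uint8_t)(v >> 8);
	buf[2] = (uint8_t)(v >> 16);
	buf[3] = (uint8_t)(v >> 24);
}

/* Illustrative stand-in: most-significant byte first (big-endian). */
static void put_be_u32(uint8_t *buf, uint32_t v)
{
	assert(buf != NULL);
	buf[0] = (uint8_t)(v >> 24);
	buf[1] = (uint8_t)(v >> 16);
	buf[2] = (uint8_t)(v >> 8);
	buf[3] = (uint8_t)(v);
}
```

For the value 0x062c0215, the little-endian store produces the bytes 15 02 2c 06 and the big-endian store produces 06 2c 02 15, matching the two dumps above.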

@rhpvorderman
Contributor

For answers to questions like this one:

Should dict_id be written in the zlib header in big-endian order?

The specification of the zlib format provides clarity: https://datatracker.ietf.org/doc/rfc1950/

All multi-byte numbers in the zlib format are stored in big-endian order, also known as network byte order, probably because the zlib format was designed with network use in mind.
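As a rough illustration of what RFC 1950 asks for, the following sketch assembles a zlib header with an optional big-endian DICTID. The function name and signature are hypothetical; this is not ISA-L's isal_write_zlib_header(), only a plain C rendering of the header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of ISA-L or zlib.
 * Writes CMF and FLG, plus a 4-byte DICTID when has_dict is nonzero.
 * out must have room for 6 bytes. Returns the number of bytes written.
 */
static size_t build_zlib_hdr(uint8_t *out, unsigned cinfo, unsigned flevel,
			     int has_dict, uint32_t dict_id)
{
	unsigned cmf, flg;
	size_t n = 0;

	assert(out != NULL && cinfo <= 7 && flevel <= 3);

	cmf = (cinfo << 4) | 8u;			/* CM = 8 means deflate */
	flg = (flevel << 6) | (has_dict ? 0x20u : 0u);	/* FDICT is bit 5 */
	/* FCHECK: make CMF * 256 + FLG a multiple of 31 */
	flg += 31u - ((cmf * 256u + flg) % 31u);

	out[n++] = (uint8_t)cmf;
	out[n++] = (uint8_t)flg;
	if (has_dict) {
		/* DICTID is stored most-significant byte first */
		out[n++] = (uint8_t)(dict_id >> 24);
		out[n++] = (uint8_t)(dict_id >> 16);
		out[n++] = (uint8_t)(dict_id >> 8);
		out[n++] = (uint8_t)(dict_id);
	}
	return n;
}
```

Called with cinfo = 7, flevel = 1, has_dict = 1 and dict_id = 0x062c0215, the sketch writes 78 7d 06 2c 02 15, which is the byte order the default zlib output in this thread uses.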

@ZhaiMo15
Author

ZhaiMo15 commented Mar 18, 2022

Thanks for the reply. ISA-L writes the zlib trailer (Adler-32) in big-endian; see igzip/igzip.c:

case IGZIP_ZLIB_NO_HDR:
		if (stream->avail_out - bytes >= zlib_trl_bytes) {
			store_be_u32(stream->next_out,
				     (crc & 0xFFFF0000) | ((crc & 0xFFFF) + 1) % ADLER_MOD);
			stream->next_out += zlib_trl_bytes;
			bytes += zlib_trl_bytes;
			state->state = ZSTATE_END;
		}
		break;

I think ISA-L is compatible with default zlib here, so is there any reason for ISA-L to write dict_id (which is also an Adler-32) in little-endian?
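For reference, the dictionary ID discussed here can be reproduced with a straightforward, unoptimized Adler-32. This is a sketch of the checksum as defined in RFC 1950, not ISA-L's isal_adler32(); the function name and the ADLER_BASE macro are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u	/* largest prime below 2^16 */

/* Hypothetical reference implementation, starting from the initial value 1. */
static uint32_t adler32_simple(const uint8_t *buf, size_t len)
{
	uint32_t a = 1, b = 0;
	size_t i;

	assert(buf != NULL || len == 0);

	for (i = 0; i < len; i++) {
		a = (a + buf[i]) % ADLER_BASE;	/* running sum of bytes */
		b = (b + a) % ADLER_BASE;	/* running sum of the sums */
	}
	return (b << 16) | a;
}
```

For the 5-byte dictionary "hello" this yields 0x062c0215, the dict_id value used in the examples above.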

@rhpvorderman
Contributor

That looks like a bug, probably because everything in the gzip format is little-endian, which makes mistakes easy if you support both formats.

@ZhaiMo15
Author

I created a PR, #207, in case it is indeed a bug.

@gbtucker
Contributor

zlib_hdr.info = ISAL_DEF_MAX_HIST_BITS - 8;
zlib_hdr.level = stream.level ? 1 : 0;

Is zlib_hdr.level always set to 1 regardless of the ISA-L compression level, or only for level 3? I'd like to know the mapping between the ISA-L compression level and zlib_hdr.level. I have the same question about zlib_hdr.info.

zlib_hdr.level is not the same as the user compression level; we map it to 0 or 1 based on the description in RFC 1950. Setting zlib_hdr.info to ISAL_DEF_MAX_HIST_BITS - 8 will always work, but you could instead derive it from the actual window bits (wbits) in use.

      FLEVEL (Compression level)
         These flags are available for use by specific compression
         methods.  The "deflate" method (CM = 8) sets these flags as
         follows:

            0 - compressor used fastest algorithm
            1 - compressor used fast algorithm
            2 - compressor used default algorithm
            3 - compressor used maximum compression, slowest algorithm

         The information in FLEVEL is not needed for decompression; it
         is there to indicate if recompression might be worthwhile.
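The mapping described in this reply can be written down as two small helpers. This is only a sketch: the function names are hypothetical, and the accepted range of 8 to 15 window bits is an assumption taken from RFC 1950's limit of CINFO <= 7 rather than from ISA-L itself.

```c
#include <assert.h>

/* Hypothetical helper: ISA-L level 0 maps to FLEVEL 0, any other level to 1. */
static unsigned flevel_from_isal_level(unsigned level)
{
	return level ? 1u : 0u;
}

/* Hypothetical helper: CINFO is log2(window size) - 8, so 15 bits gives 7. */
static unsigned cinfo_from_window_bits(unsigned wbits)
{
	assert(wbits >= 8 && wbits <= 15);	/* assumed valid range */
	return wbits - 8u;
}
```

A 32 KiB window (15 window bits) gives CINFO = 7, which is consistent with the 0x78 CMF byte in the dumps earlier in this thread.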
