Skip to content

Commit

Permalink
Fix incorrectly rejected blocks (#569)
Browse files Browse the repository at this point in the history
* Fix incorrectly rejected blocks

Assembly would reject blocks if an exact buffer was provided and no extra literals.

```
        --- FAIL: TestEncoderRegression/best-c2-w16384k-1seg/054917b2373a9ca83257108e02e9c356e49d126f (0.02s)
            encoder_test.go:257: error: output (40533) bigger than max block size (131072)
                want: 829232
                got:  786432
```

* Use safe more strictly.
* Generate directly to output
* Update benchmarks. Remove cgo.
  • Loading branch information
klauspost authored Apr 25, 2022
1 parent 92686be commit abf5559
Show file tree
Hide file tree
Showing 8 changed files with 50 additions and 75 deletions.
58 changes: 21 additions & 37 deletions zstd/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -386,47 +386,31 @@ In practice this means that concurrency is often limited to utilizing about 3 co

### Benchmarks

These are some examples of performance compared to [datadog cgo library](https://github.com/DataDog/zstd).

The first two are streaming decodes and the last are smaller inputs.


Running on AMD Ryzen 9 3950X 16-Core Processor. AMD64 assembly used.

```
BenchmarkDecoderSilesia-8 3 385000067 ns/op 550.51 MB/s 5498 B/op 8 allocs/op
BenchmarkDecoderSilesiaCgo-8 6 197666567 ns/op 1072.25 MB/s 270672 B/op 8 allocs/op
BenchmarkDecoderEnwik9-8 1 2027001600 ns/op 493.34 MB/s 10496 B/op 18 allocs/op
BenchmarkDecoderEnwik9Cgo-8 2 979499200 ns/op 1020.93 MB/s 270672 B/op 8 allocs/op
Concurrent performance:
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16 28915 42469 ns/op 4340.07 MB/s 114 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-16 116505 9965 ns/op 11900.16 MB/s 16 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-16 8952 134272 ns/op 3588.70 MB/s 915 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16 11820 102538 ns/op 4161.90 MB/s 594 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16 34782 34184 ns/op 3661.88 MB/s 60 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16 27712 43447 ns/op 3500.58 MB/s 99 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-16 62826 18750 ns/op 21845.10 MB/s 104 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-16 631545 1794 ns/op 57078.74 MB/s 2 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16 1690140 712 ns/op 172938.13 MB/s 1 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16 10432 113593 ns/op 6180.73 MB/s 1143 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/html.zst-16 113206 10671 ns/op 9596.27 MB/s 15 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16 1530615 779 ns/op 5229.49 MB/s 0 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/kppkn.gtb.zst-16 65217 16192 ns/op 11383.34 MB/s 46 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/geo.protodata.zst-16 292671 4039 ns/op 29363.19 MB/s 6 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/plrabn12.txt.zst-16 26314 46021 ns/op 10470.43 MB/s 293 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/lcet10.txt.zst-16 33897 34900 ns/op 12227.96 MB/s 205 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/asyoulik.txt.zst-16 104348 11433 ns/op 10949.01 MB/s 20 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/alice29.txt.zst-16 75949 15510 ns/op 9805.60 MB/s 32 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/html_x_4.zst-16 173910 6756 ns/op 60624.29 MB/s 37 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/paper-100k.pdf.zst-16 923076 1339 ns/op 76474.87 MB/s 1 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/fireworks.jpeg.zst-16 922920 1351 ns/op 91102.57 MB/s 2 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/urls.10K.zst-16 27649 43618 ns/op 16096.19 MB/s 407 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/html.zst-16 279073 4160 ns/op 24614.18 MB/s 6 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/comp-data.bin.zst-16 749938 1579 ns/op 2581.71 MB/s 0 B/op 0 allocs/op
BenchmarkDecoderSilesia-32 5 206878840 ns/op 1024.50 MB/s 49808 B/op 43 allocs/op
BenchmarkDecoderEnwik9-32 1 1271809000 ns/op 786.28 MB/s 72048 B/op 52 allocs/op
Concurrent blocks, performance:
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-32 67356 17857 ns/op 10321.96 MB/s 22.48 pct 102 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-32 266656 4421 ns/op 26823.21 MB/s 11.89 pct 19 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-32 20992 56842 ns/op 8477.17 MB/s 39.90 pct 754 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-32 27456 43932 ns/op 9714.01 MB/s 33.27 pct 524 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-32 78432 15047 ns/op 8319.15 MB/s 40.34 pct 66 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-32 65800 18436 ns/op 8249.63 MB/s 37.75 pct 88 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-32 102993 11523 ns/op 35546.09 MB/s 3.637 pct 143 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-32 1000000 1070 ns/op 95720.98 MB/s 80.53 pct 3 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-32 749802 1752 ns/op 70272.35 MB/s 100.0 pct 5 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-32 22640 52934 ns/op 13263.37 MB/s 26.25 pct 1014 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/html.zst-32 226412 5232 ns/op 19572.27 MB/s 14.49 pct 20 B/op 0 allocs/op
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-32 923041 1276 ns/op 3194.71 MB/s 31.26 pct 0 B/op 0 allocs/op
```

This reflects the performance around May 2020, but this may be out of date.
This reflects the performance around May 2022, but this may be out of date.

## Zstd inside ZIP files

Expand Down
28 changes: 4 additions & 24 deletions zstd/_generate/gen.go
Original file line number Diff line number Diff line change
@@ -1,13 +1,10 @@
package main

//go:generate go run gen.go -out seqdec_amd64.s -stubs delme.go -pkg=zstd
//go:generate go run gen.go -out ../seqdec_amd64.s -pkg=zstd

import (
"flag"
"fmt"
"io/ioutil"
"os"
"path/filepath"
"runtime"

_ "github.com/klauspost/compress"
Expand All @@ -31,7 +28,7 @@ const errorMatchLenTooBig = 2
// error reported when mo > t or mo > s.windowSize
const errorMatchOffTooBig = 3

// error reported when the sum of literal lengths exeeceds the literal buffer size
// error reported when the sum of literal lengths exceeds the literal buffer size
const errorNotEnoughLiterals = 4

// error reported when capacity of `out` is too small
Expand All @@ -44,13 +41,6 @@ const seqValsSize = 24

func main() {
flag.Parse()
out := flag.Lookup("out")
os.Remove(filepath.Join("..", out.Value.String()))
stub := flag.Lookup("stubs")
if stub.Value.String() != "" {
os.Remove(stub.Value.String())
defer os.Remove(stub.Value.String())
}

Constraint(buildtags.Not("appengine").ToConstraint())
Constraint(buildtags.Not("noasm").ToConstraint())
Expand Down Expand Up @@ -89,16 +79,6 @@ func main() {
decodeSync.setBMI2(true)
decodeSync.generateProcedure("sequenceDecs_decodeSync_safe_bmi2")
Generate()
b, err := ioutil.ReadFile(out.Value.String())
if err != nil {
panic(err)
}
const readOnly = 0444
err = ioutil.WriteFile(filepath.Join("..", out.Value.String()), b, readOnly)
if err != nil {
panic(err)
}
os.Remove(out.Value.String())
}

func debugval(v Op) {
Expand Down Expand Up @@ -1044,12 +1024,12 @@ func (e executeSimple) executeSingleTriple(c *executeSingleTripleContext, handle
if !e.useSeqs {
Comment("Check if we have enough space in s.out")
{
// baseAfterCopy = ll + ml + c.outBese
// baseAfterCopy = ll + ml + c.outBase
baseAfterCopy := GP64()
LEAQ(Mem{Base: ll, Index: ml, Scale: 1}, baseAfterCopy)
ADDQ(c.outBase, baseAfterCopy)
CMPQ(baseAfterCopy, c.outEndPtr)
JAE(LabelRef("error_not_enough_space"))
JA(LabelRef("error_not_enough_space"))
}
}

Expand Down
6 changes: 5 additions & 1 deletion zstd/blockdec.go
Original file line number Diff line number Diff line change
Expand Up @@ -494,11 +494,15 @@ func (b *blockDec) decodeCompressed(hist *history) error {
b.dst = append(b.dst, hist.decoders.literals...)
return nil
}

before := len(hist.decoders.out)
err = hist.decoders.decodeSync(hist.b[hist.ignoreBuffer:])
if err != nil {
return err
}
if hist.decoders.maxSyncLen > 0 {
hist.decoders.maxSyncLen += uint64(before)
hist.decoders.maxSyncLen -= uint64(len(hist.decoders.out))
}
b.dst = hist.decoders.out
hist.recentOffsets = hist.decoders.prevOffset
return nil
Expand Down
7 changes: 4 additions & 3 deletions zstd/decoder_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -703,7 +703,7 @@ func TestDecoderRegression(t *testing.T) {
if err != nil {
t.Error(err)
}
got, gotErr := dec.DecodeAll(in, nil)
got, gotErr := dec.DecodeAll(in, make([]byte, 0, len(in)))
t.Log("Received:", len(got), gotErr)

// Check if we got the same:
Expand All @@ -713,7 +713,7 @@ func TestDecoderRegression(t *testing.T) {
return
}
defer decL.Close()
got2, gotErr2 := decL.DecodeAll(in, nil)
got2, gotErr2 := decL.DecodeAll(in, make([]byte, 0, len(in)/2))
t.Log("Fresh Reader received:", len(got2), gotErr2)
if gotErr != gotErr2 {
if gotErr != nil && gotErr2 != nil && gotErr.Error() != gotErr2.Error() {
Expand Down Expand Up @@ -741,7 +741,7 @@ func TestDecoderRegression(t *testing.T) {
if err != nil {
t.Error(err)
}
got, gotErr := dec.DecodeAll(in, nil)
got, gotErr := dec.DecodeAll(in, make([]byte, 0, len(in)))
t.Log("Received:", len(got), gotErr)

// Check a fresh instance
Expand Down Expand Up @@ -1381,6 +1381,7 @@ func BenchmarkDecoder_DecodeAllParallel(b *testing.B) {
}
}
})
b.ReportMetric(100*float64(len(in))/float64(len(got)), "pct")
})
}
}
Expand Down
6 changes: 3 additions & 3 deletions zstd/encoder_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -251,12 +251,12 @@ func TestEncoderRegression(t *testing.T) {
t.Error(err)
}
encoded := enc.EncodeAll(in, nil)
got, err := dec.DecodeAll(encoded, nil)
// Usually too small...
got, err := dec.DecodeAll(encoded, make([]byte, 0, len(in)))
if err != nil {
t.Logf("error: %v\nwant: %v\ngot: %v", err, len(in), len(got))
t.Fatal(err)
}

// Use the Writer
var dst bytes.Buffer
enc.ResetContentSize(&dst, int64(len(in)))
Expand All @@ -269,7 +269,7 @@ func TestEncoderRegression(t *testing.T) {
t.Error(err)
}
encoded = dst.Bytes()
got, err = dec.DecodeAll(encoded, nil)
got, err = dec.DecodeAll(encoded, make([]byte, 0, len(in)/2))
if err != nil {
t.Logf("error: %v\nwant: %v\ngot: %v", err, in, got)
t.Error(err)
Expand Down
6 changes: 4 additions & 2 deletions zstd/seqdec_amd64.go
Original file line number Diff line number Diff line change
Expand Up @@ -59,10 +59,9 @@ func (s *sequenceDecs) decodeSyncSimple(hist []byte) (bool, error) {
if s.maxSyncLen == 0 && cap(s.out)-len(s.out) < maxCompressedBlockSizeAlloc {
useSafe = true
}
if s.maxSyncLen > 0 && uint64(cap(s.out))-compressedBlockOverAlloc < s.maxSyncLen {
if s.maxSyncLen > 0 && cap(s.out)-len(s.out)-compressedBlockOverAlloc < int(s.maxSyncLen) {
useSafe = true
}

br := s.br

maxBlockSize := maxCompressedBlockSize
Expand Down Expand Up @@ -123,6 +122,9 @@ func (s *sequenceDecs) decodeSyncSimple(hist []byte) (bool, error) {

case errorNotEnoughSpace:
size := ctx.outPosition + ctx.ll + ctx.ml
if debugDecoder {
println("msl:", s.maxSyncLen, "cap", cap(s.out), "bef:", startSize, "sz:", size-startSize, "mbs:", maxBlockSize, "outsz:", cap(s.out)-startSize)
}
return true, fmt.Errorf("output (%d) bigger than max block size (%d)", size-startSize, maxBlockSize)

default:
Expand Down
14 changes: 9 additions & 5 deletions zstd/seqdec_amd64.s
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
// Code generated by command: go run gen.go -out seqdec_amd64.s -stubs delme.go -pkg=zstd. DO NOT EDIT.
// Code generated by command: go run gen.go -out ../seqdec_amd64.s -pkg=zstd. DO NOT EDIT.

//go:build !appengine && !noasm && gc && !noasm
// +build !appengine,!noasm,gc,!noasm
Expand Down Expand Up @@ -1667,7 +1667,7 @@ sequenceDecs_decodeSync_amd64_match_len_ofs_ok:
LEAQ (AX)(R13*1), R14
ADDQ R10, R14
CMPQ R14, 32(SP)
JAE error_not_enough_space
JA error_not_enough_space

// Copy literals
TESTQ AX, AX
Expand Down Expand Up @@ -1913,6 +1913,7 @@ error_not_enough_space:
MOVQ CX, 208(AX)
MOVQ 16(SP), CX
MOVQ CX, 216(AX)
MOVQ R12, 136(AX)
MOVQ $0x00000005, ret+24(FP)
RET

Expand Down Expand Up @@ -2173,7 +2174,7 @@ sequenceDecs_decodeSync_bmi2_match_len_ofs_ok:
LEAQ (CX)(R13*1), R14
ADDQ R9, R14
CMPQ R14, 32(SP)
JAE error_not_enough_space
JA error_not_enough_space

// Copy literals
TESTQ CX, CX
Expand Down Expand Up @@ -2419,6 +2420,7 @@ error_not_enough_space:
MOVQ CX, 208(AX)
MOVQ 16(SP), CX
MOVQ CX, 216(AX)
MOVQ R11, 136(AX)
MOVQ $0x00000005, ret+24(FP)
RET

Expand Down Expand Up @@ -2701,7 +2703,7 @@ sequenceDecs_decodeSync_safe_amd64_match_len_ofs_ok:
LEAQ (AX)(R13*1), R14
ADDQ R10, R14
CMPQ R14, 32(SP)
JAE error_not_enough_space
JA error_not_enough_space

// Copy literals
TESTQ AX, AX
Expand Down Expand Up @@ -2976,6 +2978,7 @@ error_not_enough_space:
MOVQ CX, 208(AX)
MOVQ 16(SP), CX
MOVQ CX, 216(AX)
MOVQ R12, 136(AX)
MOVQ $0x00000005, ret+24(FP)
RET

Expand Down Expand Up @@ -3236,7 +3239,7 @@ sequenceDecs_decodeSync_safe_bmi2_match_len_ofs_ok:
LEAQ (CX)(R13*1), R14
ADDQ R9, R14
CMPQ R14, 32(SP)
JAE error_not_enough_space
JA error_not_enough_space

// Copy literals
TESTQ CX, CX
Expand Down Expand Up @@ -3511,5 +3514,6 @@ error_not_enough_space:
MOVQ CX, 208(AX)
MOVQ 16(SP), CX
MOVQ CX, 216(AX)
MOVQ R11, 136(AX)
MOVQ $0x00000005, ret+24(FP)
RET
Binary file modified zstd/testdata/comp-crashers.zip
Binary file not shown.

0 comments on commit abf5559

Please sign in to comment.