AvroWriteSupport: optimize String to Binary Conversion #2994

Closed
sschepens opened this issue Aug 19, 2024 · 0 comments · Fixed by #2995

sschepens commented Aug 19, 2024

Describe the enhancement requested

Currently, AvroWriteSupport.fromAvroString calls Binary.fromCharSequence when converting a String to a Binary.

When the input is a String, Binary.fromCharSequence is an order of magnitude slower than Binary.fromString (roughly 12x in the benchmark below), because CharsetEncoder.encode() is much slower than String.getBytes(charset).
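
One way to act on this is to pick the encoding path by runtime type. The sketch below uses only the JDK so it runs without Parquet on the classpath; `Utf8Dispatch` and `toUtf8` are hypothetical names for illustration, not Parquet APIs, and this is not necessarily how the fix in #2995 is written.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Parquet: illustrates the two encoding paths.
final class Utf8Dispatch {
    static byte[] toUtf8(CharSequence value) {
        if (value instanceof String) {
            // Fast path: String.getBytes(charset), the call the issue reports as much faster.
            return ((String) value).getBytes(StandardCharsets.UTF_8);
        }
        try {
            // General path: CharsetEncoder.encode() works for any CharSequence.
            ByteBuffer encoded = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(value));
            byte[] bytes = new byte[encoded.remaining()];
            encoded.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // A fresh encoder reports malformed input (e.g. an unpaired surrogate).
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] fromString = toUtf8("h\u00e9llo");
        byte[] fromBuilder = toUtf8(new StringBuilder("h\u00e9llo"));
        // Both paths yield the same UTF-8 bytes for well-formed input.
        System.out.println(Arrays.equals(fromString, fromBuilder)); // prints true
    }
}
```

The two paths agree on well-formed input but differ on malformed input: String.getBytes(charset) substitutes a replacement byte for an unpaired surrogate, while a freshly created CharsetEncoder reports an error. Whether that difference matters for AvroWriteSupport is not addressed in this issue.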

Benchmark results:

Benchmark                     Mode  Cnt         Score         Error  Units
Benchmarks.fromCharSequence  thrpt   25   5885347.328 ±  186669.738  ops/s
Benchmarks.fromString        thrpt   25  71335979.492 ± 8800704.044  ops/s

Benchmark code:

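// Assumed imports: JMH, Apache Commons Lang 3, and parquet-column.
import org.apache.commons.lang3.RandomStringUtils;
import org.apache.parquet.io.api.Binary;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.infra.Blackhole;

public class Benchmarks {
    // A 100-character random alphanumeric input shared by both benchmarks.
    private static final String string = RandomStringUtils.randomAlphanumeric(100);

    // Encodes through the generic CharSequence path.
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void fromCharSequence(Blackhole blackhole) {
        blackhole.consume(Binary.fromCharSequence(string));
    }

    // Encodes through the String-specific path.
    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void fromString(Blackhole blackhole) {
        blackhole.consume(Binary.fromString(string));
    }
}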

Component(s)

Avro

wgtmac pushed a commit that referenced this issue Aug 28, 2024

`Binary.fromCharSequence` is an order of magnitude slower than `Binary.fromString` when input is a `String`:

```
Benchmarks.fromCharSequence  thrpt   25   5885347.328 ±  186669.738  ops/s
Benchmarks.fromString        thrpt   25  71335979.492 ± 8800704.044  ops/s
```

Here is the code for the benchmarks:
```java
public class Benchmarks {
    private static final String string = RandomStringUtils.randomAlphanumeric(100);

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void fromCharSequence(Blackhole blackhole) {
        blackhole.consume(Binary.fromCharSequence(string));
    }

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void fromString(Blackhole blackhole) {
        blackhole.consume(Binary.fromString(string));
    }
}
```