Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(connector): generate native Row when datagen to avoid json deser #6969

Closed
wants to merge 14 commits into from

Conversation

lmatz
Copy link
Contributor

@lmatz lmatz commented Dec 20, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

#4555 #4961

Use Box::into_raw to leak the memory, and then Box::from_raw to reclaim the ownership of that piece of memory.

introduce keywords NATIVE

create source ... with ( connector = 'datagen' ) ROW FORMAT NATIVE

and create a special native parser for that.

I guess creating individual rows and then sending them row by row is still suboptimal, compared to creating columns (column by column, so create, e.g. 1024, the same types of data each time) to compose a StreamChunk.

This is unsafe but limited to a small scope.
No reason to free the memory twice as it requires using the Bytes twice, which itself would suggest it's a suboptimal implementation.

Comparison on c5.4xLarge EC2.
Query:

create source t1 (
    v1 BIGINT, 
    v2 BIGINT, 
    t1 timestamp, 
    t2 timestamp,
    c1 varchar,
    c2 varchar
) with ( 
    connector = 'datagen', 
    datagen.split.num = '8',
    datagen.rows.per.second = '1000000000'
) ROW FORMAT JSON;

create sink s1 as select * from t1 with ( connector = 'blackhole' );

risedev.yml:

parallelism: 8

This pr:
native 8-8

commit: 94d508a:
json 8-8

So more than 2X improvement.

Guess the improvement can be artificially increased by specifying more and longer column names, and more and larger column values.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

@lmatz lmatz force-pushed the lz/native branch 3 times, most recently from 0d57198 to 283a087 Compare December 21, 2022 17:53
@lmatz lmatz force-pushed the lz/native branch 2 times, most recently from 04e40e0 to 4487e1e Compare December 22, 2022 06:04
@lmatz lmatz marked this pull request as ready for review December 22, 2022 09:54
@codecov
Copy link

codecov bot commented Dec 22, 2022

Codecov Report

Merging #6969 (5643393) into main (70e3132) will increase coverage by 72.88%.
The diff coverage is 86.84%.

@@            Coverage Diff            @@
##           main    #6969       +/-   ##
=========================================
+ Coverage      0   72.88%   +72.88%     
=========================================
  Files         0     1041     +1041     
  Lines         0   167213   +167213     
=========================================
+ Hits          0   121872   +121872     
- Misses        0    45341    +45341     
Flag Coverage Δ
rust 72.88% <86.84%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/frontend/src/handler/create_source.rs 61.56% <0.00%> (ø)
src/source/src/lib.rs 81.81% <ø> (ø)
src/source/src/parser/csv_parser.rs 53.54% <0.00%> (ø)
src/source/src/parser/protobuf/parser.rs 54.16% <0.00%> (ø)
src/sqlparser/src/ast/statement.rs 69.79% <0.00%> (ø)
src/sqlparser/src/keywords.rs 100.00% <ø> (ø)
src/source/src/parser/canal/simd_json_parser.rs 62.80% <66.66%> (ø)
src/source/src/parser/maxwell/simd_json_parser.rs 64.28% <66.66%> (ø)
src/source/src/parser/native_parser.rs 96.00% <96.00%> (ø)
...c/connector/src/source/datagen/source/generator.rs 100.00% <100.00%> (ø)
... and 1049 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@lmatz
Copy link
Contributor Author

lmatz commented Dec 23, 2022

#7032 will make this unnecessary, close now.

@lmatz lmatz closed this Dec 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant