feat: compression info in schema subcommand #42

Merged 1 commit on May 10, 2023

SteveLauC
Contributor

@SteveLauC SteveLauC commented May 10, 2023

What does this PR do

  1. Add compression information to pqrs schema -D
    $ pqrs cat 1.parquet
    
    ###############
    File: 1.parquet
    ###############
    
    {age: 18, name: "steve", timestamp: 0}
    
    
    $ ./target/debug/pqrs schema 1.parquet -D
    ...
    column 0:
    --------------------------------------------------------------------------------
    column type: INT64
    column path: "age"
    encodings: PLAIN RLE
    file path: N/A
    file offset: 57
    num of values: 1
    compression: UNCOMPRESSED
    total compressed size (in bytes): 53
    total uncompressed size (in bytes): 53
    data page offset: 4
    index page offset: N/A
    dictionary page offset: N/A
    statistics: {min: 18, max: 18, distinct_count: N/A, null_count: 0, min_max_deprecated: false}
    bloom filter offset: N/A
    offset index offset: 423
    offset index length: 10
    column index offset: 336
    column index length: 31
    
    
    column 1:
    --------------------------------------------------------------------------------
    column type: BYTE_ARRAY
    column path: "name"
    encodings: PLAIN RLE
    file path: N/A
    file offset: 170
    num of values: 1
    compression: UNCOMPRESSED
    total compressed size (in bytes): 48
    total uncompressed size (in bytes): 48
    data page offset: 122
    index page offset: N/A
    dictionary page offset: N/A
    statistics: {min: [115, 116, 101, 118, 101], max: [115, 116, 101, 118, 101], distinct_count: N/A, null_count: 0, min_max_deprecated: false}
    bloom filter offset: N/A
    offset index offset: 433
    offset index length: 11
    column index offset: 367
    column index length: 25
    
    
    column 2:
    --------------------------------------------------------------------------------
    column type: INT64
    column path: "timestamp"
    encodings: PLAIN RLE
    file path: N/A
    file offset: 264
    num of values: 1
    compression: UNCOMPRESSED
    total compressed size (in bytes): 53
    total uncompressed size (in bytes): 53
    data page offset: 211
    index page offset: N/A
    dictionary page offset: N/A
    statistics: {min: 0, max: 0, distinct_count: N/A, null_count: 0, min_max_deprecated: false}
    bloom filter offset: N/A
    offset index offset: 444
    offset index length: 11
    column index offset: 392
    column index length: 31
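
Two details of the output above are easy to check by hand. The `name` column's min/max statistics are printed as raw bytes, and for each `UNCOMPRESSED` column the compressed and uncompressed sizes match. The sketch below illustrates both using only the Rust standard library. The helpers `decode_stat_bytes` and `compression_ratio` are hypothetical names made up for this example; they are not part of pqrs or the parquet crate.

```rust
/// Hypothetical helper: interpret the raw bytes of a BYTE_ARRAY statistic
/// as UTF-8 text. Returns None when the bytes are not valid UTF-8.
fn decode_stat_bytes(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Hypothetical helper: uncompressed size divided by compressed size.
/// Returns None when the compressed size is zero, to avoid dividing by zero.
fn compression_ratio(compressed: u64, uncompressed: u64) -> Option<f64> {
    if compressed == 0 {
        None
    } else {
        Some(uncompressed as f64 / compressed as f64)
    }
}

fn main() {
    // Bytes copied from the `name` column statistics above.
    let min = [115u8, 116, 101, 118, 101];
    match decode_stat_bytes(&min) {
        Some(text) => println!("min: {text}"), // prints "min: steve"
        None => println!("min: {:?}", min),
    }

    // Sizes copied from column 0 ("age"): 53 bytes both ways.
    if let Some(ratio) = compression_ratio(53, 53) {
        println!("ratio: {ratio}");
    }
}
```

A ratio of exactly 1 is what one would expect for a column chunk written without compression, which agrees with the `compression: UNCOMPRESSED` line.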

Closes #40

@manojkarthick manojkarthick merged commit 0eaf5fe into manojkarthick:master May 10, 2023
@manojkarthick
Owner

Thanks a lot @SteveLauC - will try and get a release out tonight!

@SteveLauC SteveLauC deleted the compression branch May 10, 2023 02:20
@SteveLauC
Contributor Author

> Thanks a lot @SteveLauC - will try and get a release out tonight!

That would be great, but as far as I remember, crates with git dependencies cannot be published to crates.io, so I guess we still need to wait for the release of 40.0.0 :(

@manojkarthick
Copy link
Owner

That’s true, yeah. I’ve released it via Homebrew and also created prebuilt binaries for various platforms/architectures, so that’s something, I suppose :)

Linked issue: Feature request: Compression algorithm information