update readme with json schema command
manojkarthick committed May 14, 2022
1 parent 810cfb5 commit 6376d3c
Showing 1 changed file with 23 additions and 25 deletions.
48 changes: 23 additions & 25 deletions README.md
@@ -14,19 +14,11 @@ You can download release binaries [here](https://github.com/manojkarthick/pqrs/r

### Alternative methods

#### Using macports

You can use [macports](https://www.macports.org/) to install `pqrs` if you are a macOS user.

```
sudo port install pqrs
```

#### Using Homebrew

For macOS users, `pqrs` is available as a Homebrew tap.

```
```shell
brew tap manojkarthick/pqrs
brew install pqrs
```
@@ -35,15 +27,15 @@ brew install pqrs

`pqrs` is also available for installation from [crates.io](https://crates.io/crates/pqrs) using `cargo`, the Rust package manager.

```shell script
```shell
cargo install pqrs
```

#### Building and running from source

Make sure you have `rustc` and `cargo` installed on your machine.

```
```shell
git clone https://github.com/manojkarthick/pqrs.git
cd pqrs
cargo build --release
./target/release/pqrs
@@ -53,9 +45,9 @@ cargo build --release

The snippet below shows the available subcommands:

```
```shell
❯ pqrs --help
pqrs 0.2.0
pqrs 0.2.1
Manoj Karthick
Apache Parquet command-line utility

@@ -83,21 +75,21 @@ SUBCOMMANDS:
Prints the contents of the given files and folders. Recursively traverses and prints all the files if the input is a directory.
Supports JSON-like, JSON, or CSV output. Use `--json` for JSON output and `--csv` for CSV output.

```
```shell
❯ pqrs cat data/cities.parquet
{continent: "Europe", country: {name: "France", city: ["Paris", "Nice", "Marseilles", "Cannes"]}}
{continent: "Europe", country: {name: "Greece", city: ["Athens", "Piraeus", "Hania", "Heraklion", "Rethymnon", "Fira"]}}
{continent: "North America", country: {name: "Canada", city: ["Toronto", "Vancouver", "St. John's", "Saint John", "Montreal", "Halifax", "Winnipeg", "Calgary", "Saskatoon", "Ottawa", "Yellowknife"]}}
```

```
```shell
❯ pqrs cat data/cities.parquet --json
{"continent":"Europe","country":{"name":"France","city":["Paris","Nice","Marseilles","Cannes"]}}
{"continent":"Europe","country":{"name":"Greece","city":["Athens","Piraeus","Hania","Heraklion","Rethymnon","Fira"]}}
{"continent":"North America","country":{"name":"Canada","city":["Toronto","Vancouver","St. John's","Saint John","Montreal","Halifax","Winnipeg","Calgary","Saskatoon","Ottawa","Yellowknife"]}}
```
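
Because `--json` prints one JSON object per line, the output can be read line by line with any JSON parser. The following is a minimal Python sketch, not part of `pqrs`: the `parse_records` helper is hypothetical, and `SAMPLE` holds the first two records from the output above, assumed to have been captured as text (for example by redirecting the command to a file).

```python
import json

# First two records copied from the `pqrs cat data/cities.parquet --json` output above.
SAMPLE = '''{"continent":"Europe","country":{"name":"France","city":["Paris","Nice","Marseilles","Cannes"]}}
{"continent":"Europe","country":{"name":"Greece","city":["Athens","Piraeus","Hania","Heraklion","Rethymnon","Fira"]}}
'''


def parse_records(text):
    """Parse newline-delimited JSON into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_records(SAMPLE)
for record in records:
    country = record["country"]
    # Print each country name with the number of cities listed for it.
    print(country["name"], len(country["city"]))
```

The default JSON-like output (unquoted keys) is not valid JSON, so this approach only applies when `--json` is passed.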

```
```shell
❯ pqrs cat data/simple.parquet --csv
foo,bar
1,2
@@ -110,7 +102,7 @@ NOTE: CSV format is not supported for files that contain Struct or Byte fields.

Prints the first N records of the parquet file. Use the `--records` flag to set the number of records.

```
```shell
❯ pqrs head data/cities.parquet --json --records 2
{"continent":"Europe","country":{"name":"France","city":["Paris","Nice","Marseilles","Cannes"]}}
{"continent":"Europe","country":{"name":"Greece","city":["Athens","Piraeus","Hania","Heraklion","Rethymnon","Fira"]}}
@@ -122,7 +114,7 @@ Merge two Parquet files by placing row groups (or blocks) from the two files one

Disclaimer: This does not combine the files into optimized row groups. Do not use it in production!

```
```shell
❯ pqrs merge --input data/pems-1.snappy.parquet data/pems-2.snappy.parquet --output data/pems-merged.snappy.parquet

❯ ls -al data
@@ -139,8 +131,8 @@ drwxr-xr-x 20 manojkarthick staff 640 Feb 14 08:52 ..

Print the number of rows present in the parquet file.

```
❯ pqrs rowcount data/pems-1.snappy.parquet data/pems-2.snappy.parquet
```shell
❯ pqrs row-count data/pems-1.snappy.parquet data/pems-2.snappy.parquet
File Name: data/pems-1.snappy.parquet: 2693 rows
File Name: data/pems-2.snappy.parquet: 2880 rows
```
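
When several files are passed, the per-file counts can be totalled by a small script. The sketch below is a hypothetical Python helper, not part of `pqrs`. It assumes the `File Name: <path>: <N> rows` line format shown in the transcript above, which may differ between versions.

```python
import re

# Output copied from the `pqrs row-count` transcript above.
SAMPLE = """File Name: data/pems-1.snappy.parquet: 2693 rows
File Name: data/pems-2.snappy.parquet: 2880 rows
"""

# Assumed line format: "File Name: <path>: <count> rows".
LINE = re.compile(r"^File Name: (?P<path>.+): (?P<rows>\d+) rows$")


def parse_row_counts(text):
    """Map each file path to its row count, ignoring lines that do not match."""
    counts = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            counts[match.group("path")] = int(match.group("rows"))
    return counts


counts = parse_row_counts(SAMPLE)
print(sum(counts.values()))  # total rows across both files
```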
@@ -149,7 +141,7 @@ File Name: data/pems-2.snappy.parquet: 2880 rows

Prints a random sample of records from the given parquet file.

```
```shell
❯ pqrs sample data/pems-1.snappy.parquet --records 3
{timeperiod: "01/17/2016 07:01:27", flow1: 0, occupancy1: 0E0, speed1: 0E0, flow2: 0, occupancy2: 0E0, speed2: 0E0, flow3: 0, occupancy3: 0E0, speed3: 0E0, flow4: null, occupancy4: null, speed4: null, flow5: null, occupancy5: null, speed5: null, flow6: null, occupancy6: null, speed6: null, flow7: null, occupancy7: null, speed7: null, flow8: null, occupancy8: null, speed8: null}
{timeperiod: "01/17/2016 07:47:27", flow1: 0, occupancy1: 0E0, speed1: 0E0, flow2: 0, occupancy2: 0E0, speed2: 0E0, flow3: 0, occupancy3: 0E0, speed3: 0E0, flow4: null, occupancy4: null, speed4: null, flow5: null, occupancy5: null, speed5: null, flow6: null, occupancy6: null, speed6: null, flow7: null, occupancy7: null, speed7: null, flow8: null, occupancy8: null, speed8: null}
@@ -160,7 +152,7 @@ Prints a random sample of records from the given parquet file.

Print the schema from the given parquet file. Use the `--detailed` flag to get more detailed stats.

```
```shell
❯ pqrs schema data/cities.parquet
Metadata for file: data/cities.parquet

@@ -180,7 +172,7 @@ message hive_schema {
}
```

```
```shell
❯ pqrs schema data/cities.parquet --detailed

num of row groups: 1
@@ -213,19 +205,25 @@ statistics: {min: [69, 117, 114, 111, 112, 101], max: [78, 111, 114, 116, 104, 3

```

```shell
❯ pqrs schema --json data/cities.parquet
{"version":1,"num_rows":3,"created_by":"parquet-mr version 1.5.0-cdh5.7.0 (build ${buildNumber})","metadata":null,"columns":[{"optional":"true","physical_type":"BYTE_ARRAY","name":"continent","path":"continent","converted_type":"UTF8"},{"name":"name","converted_type":"UTF8","path":"country.name","physical_type":"BYTE_ARRAY","optional":"true"},{"optional":"true","name":"array_element","physical_type":"BYTE_ARRAY","path":"country.city.bag.array_element","converted_type":"UTF8"}],"message":"message hive_schema {\n OPTIONAL BYTE_ARRAY continent (UTF8);\n OPTIONAL group country {\n OPTIONAL BYTE_ARRAY name (UTF8);\n OPTIONAL group city (LIST) {\n REPEATED group bag {\n OPTIONAL BYTE_ARRAY array_element (UTF8);\n }\n }\n }\n}\n"}

```
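
The JSON form of the schema is convenient for scripts that need column names and types. The following Python sketch is illustrative only: `column_types` is a hypothetical helper, and `SAMPLE` is trimmed from the output above (the `created_by`, `metadata`, and `message` fields are omitted for brevity).

```python
import json

# Trimmed from the `pqrs schema --json data/cities.parquet` output above;
# only the "version", "num_rows", and "columns" fields are kept.
SAMPLE = (
    '{"version":1,"num_rows":3,"columns":['
    '{"optional":"true","physical_type":"BYTE_ARRAY","name":"continent",'
    '"path":"continent","converted_type":"UTF8"},'
    '{"name":"name","converted_type":"UTF8","path":"country.name",'
    '"physical_type":"BYTE_ARRAY","optional":"true"},'
    '{"optional":"true","name":"array_element","physical_type":"BYTE_ARRAY",'
    '"path":"country.city.bag.array_element","converted_type":"UTF8"}'
    ']}'
)


def column_types(schema_json):
    """Return a mapping of column path to physical type."""
    schema = json.loads(schema_json)
    return {col["path"]: col["physical_type"] for col in schema["columns"]}


types = column_types(SAMPLE)
for path, physical_type in types.items():
    print(f"{path}: {physical_type}")
```

Note that in the output shown, `optional` is the string `"true"` rather than a JSON boolean, so a script should compare it as text.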

### Subcommand: size

Print the compressed/uncompressed size of the parquet file. Shows the uncompressed size by default.

```
```shell
❯ pqrs size data/pems-1.snappy.parquet --pretty
Size in Bytes:

File Name: data/pems-1.snappy.parquet
Uncompressed Size: 61 KiB
```

```
```shell
❯ pqrs size data/pems-1.snappy.parquet --pretty --compressed
Size in Bytes:
