diff --git a/README.md b/README.md
index c0d552c..29beb28 100644
--- a/README.md
+++ b/README.md
@@ -14,19 +14,11 @@ You can download release binaries [here](https://github.com/manojkarthick/pqrs/r
 
 ### Alternative methods
 
-#### Using macports
-
-You can use [macports](https://www.macports.org/) to install `pqrs` if you are a macOS user.
-
-```
-sudo port install pqrs
-```
-
 #### Using Homebrew
 
 For macOS users, `pqrs` is available as a homebrew tap.
 
-```
+```shell
 brew tap manojkarthick/pqrs
 brew install pqrs
 ```
@@ -35,7 +27,7 @@ brew install pqrs
 
 `pqrs` is also available for installation from [crates.io](https://crates.io/crates/pqrs) using `cargo`, the rust package manager.
 
-```shell script
+```shell
 cargo install pqrs
 ```
 
@@ -43,7 +35,7 @@ cargo install pqrs
 
 Make sure you have `rustc` and `cargo` installed on your machine.
 
-```
+```shell
 git clone https://github.com/manojkarthick/pqrs.git
 cargo build --release
 ./target/release/pqrs
@@ -53,9 +45,9 @@ cargo build --release
 
 The below snippet shows the available subcommands:
 
-```
+```shell
 ❯ pqrs --help
-pqrs 0.2.0
+pqrs 0.2.1
 Manoj Karthick
 Apache Parquet command-line utility
 
@@ -83,21 +75,21 @@ SUBCOMMANDS:
 
 Prints the contents of the given files and folders. Recursively traverses and prints all the files if the input is a directory. Supports json-like, json or CSV format. Use `--json` for JSON output and `--csv` for CSV output.
 
-```
+```shell
 ❯ pqrs cat data/cities.parquet
 {continent: "Europe", country: {name: "France", city: ["Paris", "Nice", "Marseilles", "Cannes"]}}
 {continent: "Europe", country: {name: "Greece", city: ["Athens", "Piraeus", "Hania", "Heraklion", "Rethymnon", "Fira"]}}
 {continent: "North America", country: {name: "Canada", city: ["Toronto", "Vancouver", "St. John's", "Saint John", "Montreal", "Halifax", "Winnipeg", "Calgary", "Saskatoon", "Ottawa", "Yellowknife"]}}
 ```
 
-```
+```shell
 ❯ pqrs cat data/cities.parquet --json
 {"continent":"Europe","country":{"name":"France","city":["Paris","Nice","Marseilles","Cannes"]}}
 {"continent":"Europe","country":{"name":"Greece","city":["Athens","Piraeus","Hania","Heraklion","Rethymnon","Fira"]}}
 {"continent":"North America","country":{"name":"Canada","city":["Toronto","Vancouver","St. John's","Saint John","Montreal","Halifax","Winnipeg","Calgary","Saskatoon","Ottawa","Yellowknife"]}}
 ```
 
-```
+```shell
 ❯ pqrs cat data/simple.parquet --csv
 foo,bar
 1,2
@@ -110,7 +102,7 @@ NOTE: CSV format is not supported for files that contain Struct or Byte fields.
 
 Prints the first N records of the parquet file. Use `--records` flag to set the number of records.
 
-```
+```shell
 ❯ pqrs head data/cities.parquet --json --records 2
 {"continent":"Europe","country":{"name":"France","city":["Paris","Nice","Marseilles","Cannes"]}}
 {"continent":"Europe","country":{"name":"Greece","city":["Athens","Piraeus","Hania","Heraklion","Rethymnon","Fira"]}}
@@ -122,7 +114,7 @@ Merge two Parquet files by placing row groups (or blocks) from the two files one
 
 Disclaimer: This does not combine the files to have optimized row groups, do not use it in production!
 
-```
+```shell
 ❯ pqrs merge --input data/pems-1.snappy.parquet data/pems-2.snappy.parquet --output data/pems-merged.snappy.parquet
 
 ❯ ls -al data
@@ -139,8 +131,8 @@ drwxr-xr-x 20 manojkarthick staff 640 Feb 14 08:52 ..
 
 Print the number of rows present in the parquet file.
 
-```
-❯ pqrs rowcount data/pems-1.snappy.parquet data/pems-2.snappy.parquet
+```shell
+❯ pqrs row-count data/pems-1.snappy.parquet data/pems-2.snappy.parquet
 File Name: data/pems-1.snappy.parquet: 2693 rows
 File Name: data/pems-2.snappy.parquet: 2880 rows
 ```
@@ -149,7 +141,7 @@ File Name: data/pems-2.snappy.parquet: 2880 rows
 
 Prints a random sample of records from the given parquet file.
 
-```
+```shell
 ❯ pqrs sample data/pems-1.snappy.parquet --records 3
 {timeperiod: "01/17/2016 07:01:27", flow1: 0, occupancy1: 0E0, speed1: 0E0, flow2: 0, occupancy2: 0E0, speed2: 0E0, flow3: 0, occupancy3: 0E0, speed3: 0E0, flow4: null, occupancy4: null, speed4: null, flow5: null, occupancy5: null, speed5: null, flow6: null, occupancy6: null, speed6: null, flow7: null, occupancy7: null, speed7: null, flow8: null, occupancy8: null, speed8: null}
 {timeperiod: "01/17/2016 07:47:27", flow1: 0, occupancy1: 0E0, speed1: 0E0, flow2: 0, occupancy2: 0E0, speed2: 0E0, flow3: 0, occupancy3: 0E0, speed3: 0E0, flow4: null, occupancy4: null, speed4: null, flow5: null, occupancy5: null, speed5: null, flow6: null, occupancy6: null, speed6: null, flow7: null, occupancy7: null, speed7: null, flow8: null, occupancy8: null, speed8: null}
@@ -160,7 +152,7 @@ Prints a random sample of records from the given parquet file.
 
 Print the schema from the given parquet file. Use the `--detailed` flag to get more detailed stats.
 
-```
+```shell
 ❯ pqrs schema data/cities.parquet
 
 Metadata for file: data/cities.parquet
@@ -180,7 +172,7 @@ message hive_schema {
 }
 ```
 
-```
+```shell
 ❯ pqrs schema data/cities.parquet --detailed
 
 num of row groups: 1
@@ -213,11 +205,17 @@ statistics: {min: [69, 117, 114, 111, 112, 101], max: [78, 111, 114, 116, 104, 3
 
 ```
 
+```shell
+❯ pqrs schema --json data/cities.parquet
+{"version":1,"num_rows":3,"created_by":"parquet-mr version 1.5.0-cdh5.7.0 (build ${buildNumber})","metadata":null,"columns":[{"optional":"true","physical_type":"BYTE_ARRAY","name":"continent","path":"continent","converted_type":"UTF8"},{"name":"name","converted_type":"UTF8","path":"country.name","physical_type":"BYTE_ARRAY","optional":"true"},{"optional":"true","name":"array_element","physical_type":"BYTE_ARRAY","path":"country.city.bag.array_element","converted_type":"UTF8"}],"message":"message hive_schema {\n OPTIONAL BYTE_ARRAY continent (UTF8);\n OPTIONAL group country {\n OPTIONAL BYTE_ARRAY name (UTF8);\n OPTIONAL group city (LIST) {\n REPEATED group bag {\n OPTIONAL BYTE_ARRAY array_element (UTF8);\n }\n }\n }\n}\n"}
+
+```
+
 ### Subcommand: size
 
 Print the compressed/uncompressed size of the parquet file. Shows uncompressed size by default
 
-```
+```shell
 ❯ pqrs size data/pems-1.snappy.parquet --pretty
 
 Size in Bytes:
@@ -225,7 +223,7 @@ File Name: data/pems-1.snappy.parquet
 Uncompressed Size: 61 KiB
 ```
 
-```
+```shell
 ❯ pqrs size data/pems-1.snappy.parquet --pretty --compressed
 
 Size in Bytes: