Parquet Visualizer is a tool that lets you quickly inspect very large parquet files and query them with SQL.
In the Data tab, you can inspect the binary data of parquet files as a human-readable, paginated table, and you can change the page size.
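Paging of this kind can be pictured with a short TypeScript sketch. It is not the extension's code: `PageSize`, `parsePageSize`, `getPage`, and `pageCount` are hypothetical names, and treating `"all"` as a single page holding every row is an assumption based on the `defaultPageSizes` setting.

```typescript
// Hypothetical page-size type: a positive row count, or "all" for a single
// page holding every row (assumed meaning of the "all" option).
type PageSize = number | "all";

// Parse one entry of a page-size list such as ["20", "50", "100", "500", "all"].
function parsePageSize(raw: string): PageSize {
  const s = raw.trim().toLowerCase();
  if (s === "all") return "all";
  const n = Number(s);
  if (!Number.isInteger(n) || n <= 0) {
    throw new RangeError(`invalid page size: "${raw}"`);
  }
  return n;
}

// Rows shown on a 1-based page number.
function getPage<T>(rows: readonly T[], page: number, pageSize: PageSize): T[] {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError(`invalid page number: ${page}`);
  }
  // With "all" there is exactly one page, so later page numbers are empty.
  if (pageSize === "all") return page === 1 ? rows.slice() : [];
  const start = (page - 1) * pageSize;
  return rows.slice(start, start + pageSize);
}

// Pages needed to show every row; an empty result still has one (empty) page.
function pageCount(totalRows: number, pageSize: PageSize): number {
  if (pageSize === "all") return 1;
  return Math.max(1, Math.ceil(totalRows / pageSize));
}
```

Parsing the setting's string values in one place means an invalid entry such as `"0"` fails loudly instead of silently producing empty pages.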
You can easily inspect complex struct values by clicking on the cell, which shows a popup containing the value of the struct.
In the Query tab, you can query a parquet file with DuckDB SQL, then search within the result, paginate it, or change the page size.
As you type in the editor, an autocomplete box suggests column names. Selecting a suggestion inserts the column, which makes queries much easier to write.
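A column completer along these lines could back the suggestion box. This is a hedged sketch: the `Completion` fields (`caption`, `value`, `meta`, `score`) follow the entry shape used by Ace's language tools completers, but `columnCompletions` and `quoteIdentifier` are hypothetical helpers, and wiring them into Ace's `getCompletions` callback is left out.

```typescript
// Entry shape accepted by Ace's language tools completers.
interface Completion {
  caption: string; // label shown in the popup
  value: string;   // text inserted when the entry is chosen
  meta: string;    // short tag shown beside the label
  score: number;   // higher scores sort first
}

// Double-quote names DuckDB would not accept as bare identifiers (spaces,
// punctuation, leading digits), doubling any embedded quotes. Reserved words
// used as column names would also need quoting; that case is not handled here.
function quoteIdentifier(name: string): string {
  return /^[A-Za-z_][A-Za-z0-9_]*$/.test(name)
    ? name
    : `"${name.replace(/"/g, '""')}"`;
}

// Suggest columns whose names start with the typed prefix, case-insensitively,
// keeping the schema's column order.
function columnCompletions(columns: readonly string[], prefix: string): Completion[] {
  const p = prefix.toLowerCase();
  return columns
    .filter((c) => c.toLowerCase().startsWith(p))
    .map((c) => ({ caption: c, value: quoteIdentifier(c), meta: "column", score: 1000 }));
}
```

Inserting the quoted form while showing the raw name keeps the popup readable and the generated SQL valid for columns like `nick name`.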
You can type free text in the search box to find specific values in the current page of the query result.
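One way to implement such a search is a case-insensitive substring match over every cell of the current page. The sketch below assumes rows arrive as plain objects; `searchPage` and `cellText` are hypothetical names, and matching struct values on their JSON text is an assumption, not documented behavior.

```typescript
type Row = Record<string, unknown>;

// Text used for matching. Structs are compared on their JSON form; BigInt
// values (common for 64-bit integer columns) are converted to strings because
// JSON.stringify cannot serialize them directly.
function cellText(value: unknown): string {
  if (value === null || value === undefined) return "";
  if (typeof value === "object") {
    return JSON.stringify(value, (_key, v) => (typeof v === "bigint" ? v.toString() : v));
  }
  return String(value);
}

// Keep the rows in which any cell contains the query, ignoring case.
// A blank query keeps every row.
function searchPage(rows: readonly Row[], query: string): Row[] {
  const q = query.trim().toLowerCase();
  if (q === "") return rows.slice();
  return rows.filter((row) =>
    Object.values(row).some((v) => cellText(v).toLowerCase().includes(q)),
  );
}
```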
Clicking the export button in the Query tab saves the query result to a location of your choice on disk, in CSV, Excel, JSON, ndJSON, or Parquet format.
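If an export runs through the DuckDB backend, DuckDB's `COPY (query) TO 'path' (options)` statement covers four of these five formats. The sketch below only builds such statements; `copyStatement` and `ExportFormat` are hypothetical, Excel output is deliberately left out, and whether the extension exports this way is an assumption.

```typescript
type ExportFormat = "csv" | "json" | "ndjson" | "parquet";

// DuckDB COPY options per format. DuckDB writes newline-delimited JSON by
// default, so ARRAY true is what produces a single JSON array instead.
const COPY_OPTIONS: Record<ExportFormat, string> = {
  csv: "FORMAT CSV, HEADER",
  json: "FORMAT JSON, ARRAY true",
  ndjson: "FORMAT JSON",
  parquet: "FORMAT PARQUET",
};

function copyStatement(query: string, path: string, format: ExportFormat): string {
  // COPY wraps the query in parentheses, so a trailing semicolon must go.
  const body = query.trim().replace(/;+\s*$/, "");
  // Single quotes in the path are doubled to form a valid SQL string literal.
  const target = path.replace(/'/g, "''");
  return `COPY (${body}) TO '${target}' (${COPY_OPTIONS[format]})`;
}
```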
Clicking the copy button in the Query tab copies the current page of the query result to the clipboard.
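The copied text could, for example, be tab-separated, which spreadsheets split into cells on paste. That format is an assumption, as are the `pageToTsv` and `cellText` names; in a webview, the resulting string would then be passed to the standard `navigator.clipboard.writeText`.

```typescript
type Row = Record<string, unknown>;

// Plain-text form of a cell; objects such as structs become JSON, with BigInt
// values converted to strings because JSON.stringify rejects them.
function cellText(value: unknown): string {
  if (value === null || value === undefined) return "";
  if (typeof value === "object") {
    return JSON.stringify(value, (_key, v) => (typeof v === "bigint" ? v.toString() : v));
  }
  return String(value);
}

// Serialize a page as tab-separated text with a header row. Tabs and line
// breaks inside values become single spaces so every row stays on one line.
function pageToTsv(columns: readonly string[], rows: readonly Row[]): string {
  const clean = (s: string): string => s.replace(/[\t\r\n]+/g, " ");
  const lines = [columns.map(clean).join("\t")];
  for (const row of rows) {
    lines.push(columns.map((c) => clean(cellText(row[c]))).join("\t"));
  }
  return lines.join("\n");
}
```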
The Schema tab shows the schema of the parquet file and can be paginated when the file has many columns.
Clicking a cell that holds a struct type opens a popup showing the full struct data type.
You can inspect the metadata in tabular format by clicking on the Metadata tab.
The extension's theme (dark or light) follows your VS Code color theme: with a light color theme, the extension loads its light theme. When you change your color theme, all open documents switch theme automatically.
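A sketch of how the theme could be chosen and pushed to open views follows. `ColorThemeKind` below copies the numeric values of VS Code's `vscode.ColorThemeKind` enum so the snippet runs without the `vscode` module; `uiThemeFor`, `broadcastTheme`, and the `{ type: "theme" }` message are hypothetical, and mapping high-contrast themes to the nearer of light or dark is an assumption. In an extension, `broadcastTheme` would be called from a `vscode.window.onDidChangeActiveColorTheme` listener.

```typescript
// Same values as vscode.ColorThemeKind (Light = 1, Dark = 2, HighContrast = 3,
// HighContrastLight = 4), declared locally so this sketch is self-contained.
const ColorThemeKind = {
  Light: 1,
  Dark: 2,
  HighContrast: 3,
  HighContrastLight: 4,
} as const;
type ColorThemeKind = (typeof ColorThemeKind)[keyof typeof ColorThemeKind];

type UiTheme = "light" | "dark";

// Light and high-contrast-light map to the light UI; everything else to dark.
function uiThemeFor(kind: ColorThemeKind): UiTheme {
  return kind === ColorThemeKind.Light || kind === ColorThemeKind.HighContrastLight
    ? "light"
    : "dark";
}

// Anything that can receive a message, such as a webview.
interface ThemeTarget {
  postMessage(message: { type: "theme"; theme: UiTheme }): void;
}

// Send the new theme to every open document view.
function broadcastTheme(kind: ColorThemeKind, targets: readonly ThemeTarget[]): void {
  const theme = uiThemeFor(kind);
  for (const target of targets) {
    target.postMessage({ type: "theme", theme });
  }
}

// In an extension (not runnable here):
// vscode.window.onDidChangeActiveColorTheme((t) => broadcastTheme(t.kind, openViews));
```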
The following configuration options are available:

Name | Default | Description |
---|---|---|
`parquet-visualizer.backend` | `duckdb` | Backend for reading the parquet file. Options: `duckdb`, `parquet-wasm`. |
`parquet-visualizer.defaultPageSizes` | `["20", "50", "100", "500", "all"]` | Default page-size options for the Data and Query tabs. |
`parquet-visualizer.defaultQuery` | `SELECT *\r\nFROM data\r\nLIMIT 1000;` | Default SQL query for a parquet file. The table name `data` should remain the same. |
`parquet-visualizer.RunQueryKeyBinding` | `Ctrl-Enter` | Default key binding for running queries. `Ctrl` is translated to `Command` on macOS and vice versa, so `Ctrl-E` is equivalent to `Command-E`. |
`parquet-visualizer.dateTimeFormat` | `ISO8601` | Datetime format for columns of timestamp type. Besides the default `ISO8601`, you can set a custom format such as `YYYY-MM-DD HH:mm:ss.SSS Z`. Find rules for formatting here. |
`parquet-visualizer.outputDateTimeFormatInUTC` | `true` | Whether timestamp columns are output in UTC (`true`) or in local time (`false`). |
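The `RunQueryKeyBinding` translation can be expressed as a small function that produces both platform variants. Ace commands registered with `editor.commands.addCommand` take a `bindKey` object with `win` and `mac` strings, which is the shape returned here; `toBindKey` itself is a hypothetical helper, and because it splits on `-`, a binding that uses the minus key itself is not handled.

```typescript
// Per-platform key strings, matching the `bindKey` option of Ace commands.
interface BindKey {
  win: string;
  mac: string;
}

// "Ctrl" becomes "Command" on macOS and "Command" becomes "Ctrl" elsewhere,
// so "Ctrl-E" and "Command-E" produce the same pair of bindings.
function toBindKey(binding: string): BindKey {
  const keys = binding.split("-").map((k) => k.trim());
  const swap = (from: string, to: string) => (k: string): string =>
    k.toLowerCase() === from ? to : k;
  return {
    win: keys.map(swap("command", "Ctrl")).join("-"),
    mac: keys.map(swap("ctrl", "Command")).join("-"),
  };
}
```

A command could then be registered as `editor.commands.addCommand({ name: "runQuery", bindKey: toBindKey(setting), exec: runQuery })`, where `runQuery` and `setting` are placeholders.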
This extension supports two different types of backends for visualizing and querying parquet files.
DuckDB is the primary backend, used for both uncompressed and compressed parquet files (except those using the BROTLI compression codec).
parquet-wasm is a backend that uses a Rust implementation of Arrow and Parquet. It supports all compression codecs except LZ4.
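Given those codec limits, picking a backend for a file reduces to a lookup. The sketch below is hypothetical: this README does not say the extension falls back between backends automatically, and `chooseBackend` only encodes the two exclusions listed above. The codec names are those of the Parquet format's `CompressionCodec` enum; a file is described by a list of codecs because different column chunks may be compressed differently.

```typescript
type Backend = "duckdb" | "parquet-wasm";

// Compression codec names from the Parquet format specification.
type Codec =
  | "UNCOMPRESSED" | "SNAPPY" | "GZIP" | "LZO"
  | "BROTLI" | "LZ4" | "ZSTD" | "LZ4_RAW";

// Codecs each backend cannot read, as stated in this README.
const UNSUPPORTED: Record<Backend, readonly Codec[]> = {
  duckdb: ["BROTLI"],
  "parquet-wasm": ["LZ4"],
};

function canRead(backend: Backend, codecs: readonly Codec[]): boolean {
  return codecs.every((c) => !UNSUPPORTED[backend].includes(c));
}

// Use the configured backend when it can read every codec in the file,
// otherwise try the other one; undefined means neither backend applies.
function chooseBackend(preferred: Backend, codecs: readonly Codec[]): Backend | undefined {
  if (canRead(preferred, codecs)) return preferred;
  const other: Backend = preferred === "duckdb" ? "parquet-wasm" : "duckdb";
  return canRead(other, codecs) ? other : undefined;
}
```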
The tables in the frontend are powered by Tabulator.
The query editor in the frontend is powered by Ace.
See CHANGELOG.MD for the release history.
The following people have contributed time and effort to improve Parquet Visualizer:
- Darryl Thompson: Testing, Design
To improve the quality of Parquet Visualizer, the extension collects analytics such as:
- Extension load times
- File parsing success or failure
- Usage frequency of features such as the Data tab and the Query tab
Our telemetry implementation respects the VS Code `isTelemetryEnabled` and `onDidChangeTelemetryEnabled` APIs, so you can disable telemetry at any time, even while VS Code is running, and no telemetry will be sent.
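The gating pattern these two APIs enable can be sketched as follows. `TelemetryEnv` is a hypothetical stand-in for the two members of `vscode.env` named above (passing `vscode.env` itself should fit it structurally), and `TelemetryGate` and its `send` callback are illustrative, not the extension's actual reporter.

```typescript
// Stand-in for the two vscode.env members used here.
interface TelemetryEnv {
  readonly isTelemetryEnabled: boolean;
  onDidChangeTelemetryEnabled(listener: (enabled: boolean) => void): { dispose(): void };
}

type SendFn = (event: string, properties?: Record<string, string>) => void;

// Forwards events only while telemetry is enabled, tracking changes live.
class TelemetryGate {
  private enabled: boolean;
  private readonly send: SendFn;
  private readonly subscription: { dispose(): void };

  constructor(env: TelemetryEnv, send: SendFn) {
    this.enabled = env.isTelemetryEnabled;
    this.send = send;
    // Update on every change so disabling takes effect without a restart.
    this.subscription = env.onDidChangeTelemetryEnabled((enabled) => {
      this.enabled = enabled;
    });
  }

  // Returns true if the event was sent, false if telemetry is off.
  track(event: string, properties?: Record<string, string>): boolean {
    if (!this.enabled) return false;
    this.send(event, properties);
    return true;
  }

  // Stop listening for setting changes, e.g. when the extension deactivates.
  dispose(): void {
    this.subscription.dispose();
  }
}
```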
You can disable it via the settings by following the instructions here.
You can view all the possible telemetry events that are sent by following instructions here.