[Feat] support GeoArrow format #2385
@kylebarron Hi Kyle, could you also help to take a look? Thanks!
Signed-off-by: Xun Li <[email protected]>
66f337c to 6150d79
src/components/src/side-panel/layer-panel/layer-configurator.tsx
@ibgreen @heshan0131 see a test of progressive rendering using GeoArrow, loaders.gl v4 in kepler.gl
  return newBounds;
}

export default class GeoArrowLayer extends GeoJsonLayer {
Is there any reason we can't use the existing GeoJSON layer instead of creating a new layer type to support GeoArrow? All the configuration options are the same.
Thanks, Shan! After moving the arrow utils to loaders.gl, I think we could handle the arrow format as a special case in Kepler's GeoJsonLayer (or make GeoJsonLayer arrow compatible). Let me think about it.
Just re-read the code and it is looking great. Added some thoughts, but they can be addressed later or not at all.
// parse fields
arrowTable.schema.fields.forEach((field: arrow.Field, index: number) => {
  const isGeometryColumn = field.metadata.get('ARROW:extension:name')?.startsWith('geoarrow');
Nit: This means we have knowledge about the GeoArrow extensions both here and in loaders.gl/utils? Maybe it is hard to avoid duplication, but it is always good to try to centralize knowledge about some aspect in one part of the code, i.e. if we could use one of the utils from loaders...
Yes, agree. Will expose this in loaders.gl/utils and call it from here.
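A minimal sketch of what such a centralized check could look like, assuming only the metadata key shown in the snippet above. The names `FieldLike`, `isGeoArrowField` and `findGeometryColumns` are hypothetical and are not the loaders.gl API; `FieldLike` stands in for the part of an Arrow field that the check needs.

```typescript
// Minimal stand-in for an Arrow field: only the parts the check reads.
// (Assumption: the real field exposes `metadata` as a Map<string, string>.)
interface FieldLike {
  name: string;
  metadata: Map<string, string>;
}

const EXTENSION_NAME_KEY = 'ARROW:extension:name';

// Hypothetical helper: true when the field is tagged with a GeoArrow extension name.
function isGeoArrowField(field: FieldLike): boolean {
  const extensionName = field.metadata.get(EXTENSION_NAME_KEY);
  return extensionName !== undefined && extensionName.startsWith('geoarrow');
}

// Hypothetical helper: indices of all geometry columns in a list of fields.
function findGeometryColumns(fields: FieldLike[]): number[] {
  const indices: number[] = [];
  fields.forEach((field, index) => {
    if (isGeoArrowField(field)) {
      indices.push(index);
    }
  });
  return indices;
}

const exampleFields: FieldLike[] = [
  {name: 'id', metadata: new Map<string, string>()},
  {name: 'geometry', metadata: new Map([[EXTENSION_NAME_KEY, 'geoarrow.polygon']])}
];
const geometryColumns = findGeometryColumns(exampleFields);
```

Keeping the key and the prefix test in one function means the field-parsing code here only needs to call the helper rather than repeat the string comparison.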
Description
GeoArrow support in Kepler.gl will enable efficient loading of big data. For example, loading 1 million polygons takes ~2 seconds with the Arrow format vs. ~20 seconds with the GeoJSON format:
GeoParquet is a file format while GeoArrow is a memory format. Both can be saved as a file, e.g. .parquet and .arrow. Arrow memory is zero-copy and has constant-time access, so it can be a very efficient memory format that allows different programs (JavaScript, C++, WebAssembly, Rust, Python) to exchange data.
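To illustrate what zero-copy and constant-time access mean in practice, here is a small sketch that uses plain typed arrays as a stand-in for Arrow buffers. It does not use the Arrow library; the layout (interleaved x/y coordinates) and the helper `pointAt` are assumptions made for the example.

```typescript
// One contiguous buffer holding interleaved x/y coordinates for 4 points.
const coords = new Float64Array([0, 0, 1, 0, 1, 1, 0, 1]);

// subarray() returns a view on the same underlying ArrayBuffer: no bytes are copied.
const lastTwoPoints = coords.subarray(4, 8);

// Constant-time access: point i always lives at a fixed offset in the buffer.
function pointAt(values: Float64Array, i: number): [number, number] {
  return [values[2 * i], values[2 * i + 1]];
}

// A write through one view is visible through the other, because they share memory.
coords[4] = 5;
const sharesBuffer = lastTwoPoints.buffer === coords.buffer;
const thirdPoint = pointAt(coords, 2);
```

Because both views point at the same memory, a consumer can read a slice of a large column without duplicating it.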
Details
The GeoJsonLayer in deck.gl already has the capability of loading binary geometries directly, so this pull request derives an ArrowLayer from kepler.gl's GeoJsonLayer.
The geometry types supported (see https://github.com/geoarrow/geoarrow/blob/main/extension-types.md#extension-names):
Picking and table view are supported.
This PR adds a new column-wise data container, ArrowDataContainer, which implements the DataContainerInterface. This container is designed to efficiently use the data structure of the Arrow format.
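As a rough illustration of the column-wise idea, the sketch below reads values directly from per-column storage instead of materializing row arrays. `SimpleDataContainer`, `ColumnLike`, `ColumnarContainer` and `columnFromArray` are hypothetical names for this example; the interface is a simplified stand-in and not kepler.gl's actual DataContainerInterface.

```typescript
// Hypothetical, simplified container interface used only for this sketch.
interface SimpleDataContainer {
  numRows(): number;
  numColumns(): number;
  valueAt(rowIndex: number, columnIndex: number): unknown;
  column(columnIndex: number): ColumnLike;
}

// Minimal stand-in for a column: a length plus indexed access.
interface ColumnLike {
  length: number;
  get(index: number): unknown;
}

class ColumnarContainer implements SimpleDataContainer {
  private readonly columns: ColumnLike[];

  constructor(columns: ColumnLike[]) {
    this.columns = columns;
  }

  numRows(): number {
    return this.columns.length > 0 ? this.columns[0].length : 0;
  }

  numColumns(): number {
    return this.columns.length;
  }

  // Looks up a single cell without building a row object.
  valueAt(rowIndex: number, columnIndex: number): unknown {
    return this.columns[columnIndex].get(rowIndex);
  }

  // Returns the stored column itself, so no data is copied.
  column(columnIndex: number): ColumnLike {
    return this.columns[columnIndex];
  }
}

// Helper for the example: wrap a plain array as a column.
function columnFromArray(values: unknown[]): ColumnLike {
  return {length: values.length, get: (index: number) => values[index]};
}

const container = new ColumnarContainer([
  columnFromArray([1, 2, 3]),
  columnFromArray(['a', 'b', 'c'])
]);
const cell = container.valueAt(1, 1);
```

The point of the design is that a cell lookup goes straight to the column, so a table view or picking handler never needs a row-major copy of the table.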
The GPU filtering in GeoJsonLayer is compatible with ArrowLayer. For CPU filtering, to avoid filtering on the raw Arrow table and making a partial copy of it, this PR adds a simple deck.gl layer extension that filters the GeoArrow layer based on the CPU filteredIndex result. This could impact other functions that rely on a filtered dataset. Please help to check. Thanks!
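One way to think about the CPU-filtering approach is as a per-row visibility mask derived from the filter result, leaving the Arrow table itself untouched. The sketch below is an assumption about the general idea, not the extension added in this PR; `buildFilterMask` is a hypothetical helper, and `filteredIndex` is assumed to be a list of row indices that passed the filter.

```typescript
// Hypothetical helper: turn the row indices that passed the CPU filter into a
// per-row 0/1 mask that a layer could consume as an attribute.
function buildFilterMask(numRows: number, filteredIndex: number[]): Uint8Array {
  const mask = new Uint8Array(numRows); // every row starts hidden (0)
  for (const rowIndex of filteredIndex) {
    // Ignore out-of-range indices instead of writing past the mask.
    if (rowIndex >= 0 && rowIndex < numRows) {
      mask[rowIndex] = 1;
    }
  }
  return mask;
}

// Rows 0 and 3 passed the filter; the table still holds all 5 rows.
const visibilityMask = buildFilterMask(5, [0, 3]);
```

With a mask like this the raw table is never sliced or copied, which is also why code that expects a physically filtered dataset may need checking.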
Other: support drag-n-drop of a GeoParquet/GeoArrow file in Kepler.gl
The current version of kepler.gl uses loaders.gl/arrow v3. However, in loaders.gl/arrow versions < 4.0.0, the arrow loader in batch mode didn't return the correct data. Instead, the raw arrow data of each arrow column was returned and stored directly (without the metadata) in kepler.gl. This has been fixed in the latest loaders.gl/arrow v4 (see: https://github.com/visgl/loaders.gl/blob/2577ca735878b521f07a556f26ce8ee457a7ad9f/modules/arrow/src/lib/parse-arrow-in-batches.ts#L29).
One can call processArrowTable() directly to add arrow data to Kepler, e.g.:
To support drag-n-drop of a GeoParquet/GeoArrow file in Kepler.gl, two more tasks need to be done:
Test arrow files:
flights.arrow.zip
polygons.arrow.zip
One can use ogr2ogr to convert, e.g., a GeoJSON file to an Arrow file: