Python API to get dataframes out of an RRD #6811

Closed
jleibs opened this issue Jul 8, 2024 · 3 comments
Labels
🐍 Python API Python logging API

Comments

@jleibs
Member

jleibs commented Jul 8, 2024

The user should be able to open an existing RRD.

Using the open RRD object, we should expose 4 new APIs:

  • Get “Schema”
    • Probably takes an EntityExpression as an input
    • Returns columns with information about entities, components, types, and whether each column is static or dynamic
  • List timelines
    • Timeline + TimeRange
  • Latest-at (list of columns (multi entity), timestamp)
    • Returns ArrowRecordBatch (single row)
  • RangeQuery (list of columns (multi entity), single point of view column, time range, timeline)
    • Returns iterator of ArrowRecordBatch (multiple rows)
    • This should be a "streaming" API so we don't need to allocate memory for the entire frame

The inputs to the APIs that take a list of columns likely either use ColumnDescriptors returned from the Schema API, or an EntityPath expression that can be indirectly evaluated to a list of columns.
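To make the shape of the first three operations concrete, here is a minimal sketch over a toy in-memory store. It is not the Rerun API: `ToyRecording`, `ColumnDescriptor`, and every method name and signature are hypothetical, plain Python values stand in for Arrow data, and component types and the static/dynamic flag are left out for brevity. Latest-at is assumed to mean "the most recent value at or before the query time".

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDescriptor:
    # Hypothetical stand-in for what a schema call might return.
    entity_path: str
    component: str


class ToyRecording:
    """Hypothetical in-memory stand-in for an opened RRD."""

    def __init__(self):
        # (entity_path, component) -> timeline -> time-sorted [(time, value)]
        self._data = {}

    def log(self, entity_path, component, timeline, time, value):
        per_timeline = self._data.setdefault((entity_path, component), {})
        rows = per_timeline.setdefault(timeline, [])
        rows.append((time, value))
        rows.sort(key=lambda r: r[0])

    def schema(self, entity_prefix=""):
        """List columns whose entity path starts with `entity_prefix`."""
        cols = [
            ColumnDescriptor(e, c)
            for (e, c) in self._data
            if e.startswith(entity_prefix)
        ]
        return sorted(cols, key=lambda c: (c.entity_path, c.component))

    def timelines(self):
        """Map each timeline name to its (min_time, max_time) range."""
        ranges = {}
        for per_timeline in self._data.values():
            for name, rows in per_timeline.items():
                lo, hi = rows[0][0], rows[-1][0]
                if name in ranges:
                    lo, hi = min(lo, ranges[name][0]), max(hi, ranges[name][1])
                ranges[name] = (lo, hi)
        return ranges

    def latest_at(self, columns, timeline, time):
        """Return a single row: the latest value at or before `time` per column."""
        row = {}
        for col in columns:
            key = (col.entity_path, col.component)
            rows = self._data.get(key, {}).get(timeline, [])
            i = bisect_right([t for t, _ in rows], time)
            row[col] = rows[i - 1][1] if i > 0 else None
        return row


rec = ToyRecording()
rec.log("world/points", "Position3D", "frame", 0, "p0")
rec.log("world/points", "Position3D", "frame", 5, "p5")
rec.log("world/cameras/image/rgb", "Blob", "frame", 3, "img3")
row = rec.latest_at(rec.schema("world/"), "frame", 4)
```

Passing the descriptors from `schema()` straight into `latest_at()` mirrors the idea above that column lists come either from the Schema API or from an entity path expression.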

When this is done we should have an easy piece of example code that opens an RRD and constructs a dataframe that is fed into pandas or polars.

Observed API issues:

Querying for an image is weird:

pov = schema.column_for("world/cameras/image/rgb", rr.components.Blob)

@jleibs jleibs added the 🐍 Python API Python logging API label Jul 8, 2024
@teh-cmc
Member

teh-cmc commented Jul 9, 2024

Sounds like part of this should be an alternative crate to re_query that implements uncached multi-component queries, performs the join/clamping and finally packs everything into a single final Chunk.

This Chunk can then be passed to the SDKs for further native integration with the host language.
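One possible reading of the join/clamping step, sketched in plain Python: emit one row per sample of the point-of-view column inside the time range, and fill every other column with its latest value at or before that time. The function name `range_query`, the dict-of-lists input, and the inclusive time bounds are all assumptions made for illustration, not the semantics that were eventually implemented. It is written as a generator so rows are produced one at a time rather than materializing the whole frame.

```python
from bisect import bisect_right


def range_query(columns, pov, start, end):
    """Yield one row per PoV sample with start <= time <= end.

    `columns` maps a column name to a time-sorted list of (time, value)
    pairs on a single timeline. Non-PoV columns are filled with their
    latest value at or before the PoV time, or None if there is none.
    """
    others = {name: rows for name, rows in columns.items() if name != pov}
    other_times = {name: [t for t, _ in rows] for name, rows in others.items()}
    for time, pov_value in columns[pov]:
        if time < start or time > end:
            continue
        row = {"time": time, pov: pov_value}
        for name, rows in others.items():
            # Index of the last sample whose time is <= the PoV time.
            i = bisect_right(other_times[name], time)
            row[name] = rows[i - 1][1] if i > 0 else None
        yield row


columns = {
    "image": [(1, "img1"), (4, "img4"), (9, "img9")],
    "pose": [(0, "pose0"), (3, "pose3"), (5, "pose5")],
}
rows = list(range_query(columns, pov="image", start=1, end=5))
```

In a real implementation each yielded row (or group of rows) would be an Arrow record batch rather than a dict.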

@jleibs jleibs modified the milestone: 0.18 - Chunks Jul 9, 2024
@jleibs jleibs added this to the 0.19 milestone Aug 26, 2024
@nikolausWest
Member

@teh-cmc: does this issue have enough specificity to implement these queries or do you need further details on the exact semantics of the queries?

@jleibs
Member Author

jleibs commented Aug 27, 2024

teh-cmc pushed a commit that referenced this issue Aug 31, 2024
…upport (#7322)

In preparation for:
- #6811

We want to be able to use the `arrow-rs` pyarrow support to more easily
move data bidirectionally between Rust and Python.

This required updating our `pyo3` version, which required migrating some
functions to use the new `Bound<>` APIs.

This allowed us to get rid of the old unsafe calls, though it adds an extra
step of going pyarrow -> arrow-rs -> arrow.
@jleibs jleibs changed the title MVP: python API to get dataframes out of an RRD Python API to get dataframes out of an RRD Sep 9, 2024
@jleibs jleibs closed this as completed Oct 4, 2024

3 participants