Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generalize toListImpl to support conversions into lists and sequences #1046

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

martinitus
Copy link

The previous implementation already was based on iterating over rows, with this generalization, the user can now decide whether to eagerly convert a Dataframe into a list, or use lazy transformations via a sequence.

Resolves: #1045

@martinitus martinitus force-pushed the 1045-dataframe-to-sequence branch 3 times, most recently from f38649f to 9781085 Compare January 31, 2025 10:12
@martinitus
Copy link
Author

I am note entirely sure, if I correctly followed the pattern of separation between interface and implementation - Not that familiar with Kotlin. Let me know if you want some stuff changed / reorganized!

@koperagen
Copy link
Collaborator

koperagen commented Jan 31, 2025

Hi! It's a good improvement and contributions are welcome, so thank you for the interesting proposal :) I looked at the implementation and noticed that data is still converted eagerly for two reasons:
convertedColumns does part of the work, specifically conversion of nested lists or objects. Making it eager would be more difficult task
AnyFrame.toSequenceImpl(type: KType): Iterable<Any> is still not lazy, i think you'd need return rows().asSequence().map { to avoid converting all rows here

The previous implementation already was based on iterating over rows,
with this generalization, the user can now decide whether to eagerly
convert a dataframe into a list, or use lazy transformations via a
sequence.
@martinitus martinitus force-pushed the 1045-dataframe-to-sequence branch from 9781085 to 4afa823 Compare February 3, 2025 08:55
@martinitus
Copy link
Author

Nice to hear that :-)

Oh yes - I just looked up the difference between Kotlin Iterable and Sequence - I thought they were both lazy 🤷🏼 - Now I understand ! Added to asSequence where you suggested.

W.r.t. to making the inner / nested objects lazy: I don't think that would give the user a large benefit - at least not when converting into data classes which probably usually hold lists anyway: When the lazy outer conversion reaches a nested List object it has to convert that one eagerly anyway?

That said: I could have a look into implementing that - but I am not sure if I would succeed 😁

@koperagen
Copy link
Collaborator

koperagen commented Feb 3, 2025

W.r.t. to making the inner / nested objects lazy: I don't think that would give the user a large benefit - at least not when converting into data classes which probably usually hold lists anyway: When the lazy outer conversion reaches a nested List object it has to convert that one eagerly anyway?

That said: I could have a look into implementing that - but I am not sure if I would succeed 😁

In fact i also suggest to maybe not improve this further for now. If your dataframes are flat, without ColumnGroup or FrameColumn, it will work as you expect with Sequence. Here's what will happen if hierarchical data is involved:

data class Nested(val a: String, val b: Int)

data class MySchema(val i: Int, val d: Double, val nested: Nested)

df.toSequenceOf<MySchema>().take(10).toList()

Entire nested column is recursively converted from ColumnGroup to DataColumn, and then lazy sequence of MySchema is cut off at 10 items.

Example with FrameColumn:

data class Nested(val a: String, val b: Int)

data class MySchema(val i: Int, val d: Double, val nested: List<Nested>)

df.toSequenceOf<MySchema>().take(10).toList()

Entire nested column is recursively converted from FrameColumn to DataColumn<List<Nested>>, and then lazy sequence of MySchema is cut off at 10 items.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide toSequenceOf<SomeDataClass> similar to toListOf
3 participants