Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dependency of common for the storage crate #2076

Merged
merged 1 commit into from
Mar 25, 2022
Merged

Remove dependency of common for the storage crate #2076

merged 1 commit into from
Mar 25, 2022

Conversation

yahoNanJing
Copy link
Contributor

@yahoNanJing yahoNanJing commented Mar 24, 2022

Which issue does this PR close?

Closes #2075.

Rationale for this change

It's better to make the storage crate not dependent on any datafusion crates, which will make it easy for the object store extensions to depend on the storage crate. And it will also make it easy for the datafusion core to depend on both the storage crate and the object store extensions.

What's more, the storage crate should be relatively stable, since it defines the basic traits for accessing object stores. Later, maybe we can extract it to another repository. Then the repositories of the object store extensions and the datafusion repository won't depend on each other.

What changes are included in this PR?

  • Remove the dependency of datafusion-common in the datafusion-storage
  • Move out the PartitionedFile from the datafusion-storage to the datafusion::datasource::listing of datafusion-core.

Are there any user-facing changes?

@yahoNanJing
Copy link
Contributor Author

Hi @matthewmturner, @yjshen, @alamb, what do you think?

use std::{io, result};

/// Result type for operations that could result in an io error
pub type Result<T> = result::Result<T, io::Error>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im wondering if any impact from using this instead of DataFusionError::IoError but i think its fine and of course it would require keeping datafusion-common dependency which would defeat the point of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of this PR is to remove the datafusion-common dependency.

When you need to use DataFusionError::IoError, it's simply to add .map_err(DataFusionError::IoError) like some places in this PR does.

@matthewmturner
Copy link
Contributor

I think this is quite nice.

Can you just clarify how this will make it easier for datafusion core to depend on the object store extensions?

Piggy backing on @alamb's idea from yesterday, and in light of this change, im wondering if new crate for all listing related code could be datafusion-storage-listing to show how they are linked.

@yahoNanJing
Copy link
Contributor Author

yahoNanJing commented Mar 24, 2022

Can you just clarify how this will make it easier for datafusion core to depend on the object store extensions?

Suppose we hope to introduce optional features of object store extensions in the datafusion core. Both of the datafusion core and the object store extensions should depend on the same storage crate, either by version or rev rather than by path. Then if the storage crate depends on the common crate, it will be not good for the common crate to have the same version or rev as the one for the storage crate. Otherwise, if the storage crate does not depend on any datafusion crate, it will be possible to make it stable with some version. And then both of the datafusion core and object store extensions depend on that version.

Piggy backing on @alamb's idea from yesterday, and in light of this change, im wondering if new crate for all listing related code could be datafusion-storage-listing to show how they are linked.

I'm good with creating a new crate for the listing things.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good change to me. However, I wonder if this is going to be some sort of general purpose object store interface, perhaps it doesn't belong in DataFusion at all 🤔

cc @tustvold and @rdettai and @yjshen in case they have thoughts

@yahoNanJing
Copy link
Contributor Author

yahoNanJing commented Mar 24, 2022

Seems like a good change to me. However, I wonder if this is going to be some sort of general purpose object store interface, perhaps it doesn't belong in DataFusion at all 🤔

Actually, I prefer to regard the storage crate as a general object store interface and move it out from datafusion😂.

And maybe we can manage the object store like Ballista and reorganize the whole project like follows:
Screen Shot 2022-03-25 at 10 17 28 AM

And have another PR for the reorganization.

@yahoNanJing
Copy link
Contributor Author

Hi @alamb, @matthewmturner, @yjshen, another PR is created for reorganizing the whole project folders #2081. What do you think?

Copy link
Member

@yjshen yjshen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alamb alamb merged commit d644fae into apache:master Mar 25, 2022
@alamb
Copy link
Contributor

alamb commented Mar 25, 2022

Thanks @yahoNanJing @yjshen and @matthewmturner

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove dependency of common crate for the storage crate
5 participants