PandasConnector #3492
Conversation
Some small nitpicks
contrib/connectors/pandas/models.py (Outdated)

```python
d['granularity_sqla'] = utils.choicify(self.dttm_cols)
d['time_grain_sqla'] = [(g, g) for g in self.GRAINS.keys()]
logging.info(d)
print(d)
```
debug leftover
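For context, `utils.choicify` in the snippet turns a list of column names into the (value, label) pairs that form choice fields expect. A minimal stand-in sketch; the helper body and the sample column names here are assumptions, not code copied from Superset:

```python
def choicify(values):
    # Assumed behavior of superset.utils.choicify: pair each value
    # with itself so it serves as both the choice key and its label.
    return [(v, v) for v in values]

dttm_cols = ['created_on', 'changed_on']
print(choicify(dttm_cols))  # [('created_on', 'created_on'), ('changed_on', 'changed_on')]
```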
contrib/connectors/pandas/models.py (Outdated)

```python
class PandasDatabase(object):
    """Non-ORM object for a Pandas Source"""
    database_name = ''
```
I don't think you need to set defaults if they are not optional in `__init__`.
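A minimal sketch of the suggested pattern, with required arguments taken by `__init__` and no class-level defaults. The parameter list beyond `database_name` is invented for illustration:

```python
class PandasDatabase(object):
    """Non-ORM object for a Pandas Source"""

    def __init__(self, database_name, extra=None):
        # Required argument: no class-level default needed.
        self.database_name = database_name
        # Optional argument: the fallback lives in the signature instead.
        self.extra = extra if extra is not None else {}

db = PandasDatabase('my_pandas_db')
print(db.database_name)
```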
contrib/connectors/pandas/models.py (Outdated)

```python
    Each Pandas Datasource can have multiple columns"""

    __tablename__ = 'pandascolumns'
```
We may want to be consistent with the sqla backend and call it `'pandas_columns'`.
contrib/connectors/pandas/models.py (Outdated)

```python
# Build a dict of the metrics to include, including those that
# are required for post-aggregation filtering
filtered_metrics = [flt['col']
                    for flt in extras.get('having_druid', [])
```
`having_druid`? If we are reusing the field, a comment would be nice.
I'm adding a comment for the use of `having_druid`, and similar ones for `granularity_sqla` and `time_grain_sqla`. However, if there is a long-term plan to encourage the development of more connectors, it would be better to rename these parameters in the main code, for example to `granularity_col`, `granularity_freq` and `post_aggregation_filter`.
contrib/connectors/pandas/models.py (Outdated)

```python
    This is used to be displayed to the user so that she/he can
    understand what is taking place behind the scene"""
    import json
```
you can make these imports global
I'll take them out; they are part of the debugging code.
contrib/connectors/pandas/models.py
Outdated
return val.isoformat() + "Z" | ||
|
||
logging.info(json.dumps(query_obj, indent=4, default=to_serializable)) | ||
print(json.dumps(query_obj, indent=4, default=to_serializable)) |
debug
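The fallback serializer in the debug lines above can be sketched as follows. Only the `isoformat() + "Z"` line appears in the diff; the `isinstance` dispatch, the `str` fallback, and the sample `query_obj` are assumptions for illustration:

```python
import json
from datetime import datetime

def to_serializable(val):
    # Called by json.dumps for any object it cannot serialize natively;
    # datetimes become ISO 8601 strings with a trailing "Z".
    if isinstance(val, datetime):
        return val.isoformat() + "Z"
    return str(val)

# Hypothetical query object of the kind being logged
query_obj = {'from_dttm': datetime(2017, 1, 1), 'metrics': ['count']}
print(json.dumps(query_obj, indent=4, default=to_serializable))
```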
contrib/connectors/pandas/views.py
Outdated
'format': _( | ||
"The format of the raw data, e.g. csv"), | ||
'additional_parameters': _( | ||
"A JSON-formatted dictionary of additional parameters " |
Could we add something like: "These are the actual parameters for the `read_*` functions, see https://pandas.pydata.org/pandas-docs/stable/api.html"?
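To illustrate the suggestion: the JSON the user enters in this field would be deserialized and splatted as keyword arguments into the matching pandas `read_*` call. The field contents and sample data below are invented for the example:

```python
import io
import json

import pandas as pd

# Hypothetical value of the additional_parameters field
additional_parameters = json.loads('{"sep": ";", "skiprows": 1}')

# Raw data with a junk first line and semicolon separators
raw = io.StringIO("generated 2017-09-20\na;b\n1;2\n3;4\n")
df = pd.read_csv(raw, **additional_parameters)
print(df.shape)  # (2, 2)
```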
Done
Thanks a lot for your great work, I'd really like this to be a first-class citizen in Superset.
```python
# for post-aggregation filters, and we are reusing that
# interface component.
filtered_metrics = [flt['col']
                    for flt in extras.get('having_druid', [])
```
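The diff cuts the comprehension off after the `extras.get` line; it can be sketched in full as below. The `not in metrics` condition and the sample filter clause are assumptions, not code from the PR:

```python
# Assumed shape of the query inputs, for illustration only
metrics = ['count']
extras = {'having_druid': [{'col': 'avg_price', 'op': '>', 'val': 10}]}

# Collect metrics needed only for post-aggregation filtering, i.e.
# filter columns the user did not already request as metrics.
filtered_metrics = [flt['col']
                    for flt in extras.get('having_druid', [])
                    if flt['col'] not in metrics]
print(filtered_metrics)  # ['avg_price']
```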
I have added a comment for the use of `having_druid`, and similar ones for `granularity_sqla` and `time_grain_sqla`. However, if there is a long-term plan to encourage the development of more connectors, it would be better to rename these parameters in the main code, for example to `granularity_col`, `granularity_freq` and `post_aggregation_filter`.
@mistercrunch I have moved the migration into [...]. I have also added [...]. Please let me know what you would like me to do next?
Having the migrations and requirements mixed in with the main Superset ones, without this MR ever being merged, causes frequent merge conflicts. Therefore, I am going to close this MR. We will continue to maintain the connector so that we can use Superset with APIs and remote files, but we will do so in our fork rather than in an MR.
@rhunwicks thank you for creating this code. It seems pretty perfect for us, to be honest. I wish it had made it into Superset!
Here is an initial draft of PandasConnector for consideration, as discussed in #3302. It is not complete, but it shows the general approach.

I have created `contrib.tests.connectors`, which implements a `BaseConnectorTestCase` that I am using to make sure that the results returned by `PandasConnector` are consistent with those from `SqlaTable`.

If we get to the point where this PR can be merged, then we will probably want to refactor to put `BaseConnectorTestCase` and `SqlaConnectorTestCase` into the main tests directory and just leave the `PandasConnectorTestCase` in the contrib directory.