PandasConnector #3492

rhunwicks · 2017-09-19T00:59:33Z

Here is an initial draft of PandasConnector for consideration, as discussed in #3302.

It is not complete, but it shows the general approach.

I have created contrib.tests.connectors, which implements a BaseConnectorTestCase that I am using to check that the results returned by PandasConnector are consistent with those from SqlaTable.

If we get to the point where this PR can be merged, then we will probably want to refactor to put BaseConnectorTestCase and SqlaConnectorTestCase into the main tests directory and just leave the PandasConnectorTestCase in the contrib directory.
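The cross-backend testing idea above can be sketched with a shared mixin: the assertions live in one class and each backend only supplies the query. This is an illustrative sketch, not the PR's contrib.tests.connectors code. The names `BaseConnectorTests`, `SqliteBackendTests`, `PandasBackendTests` and `sum_by_region` are hypothetical, and an in-memory SQLite database stands in for SqlaTable.

```python
import sqlite3
import unittest

import pandas as pd

# Hypothetical fixture shared by every backend under test.
DATA = pd.DataFrame({
    "region": ["a", "a", "b", "b", "b"],
    "value": [1, 2, 3, 4, 5],
})


class BaseConnectorTests(object):
    """Backend-agnostic assertions (hypothetical).

    Not a TestCase itself, so unittest does not collect it directly;
    concrete classes mix it into unittest.TestCase.
    """

    def sum_by_region(self):
        raise NotImplementedError

    def test_sum_by_region(self):
        # Every backend must produce the same aggregated result.
        self.assertEqual(self.sum_by_region(), {"a": 3, "b": 12})


class SqliteBackendTests(BaseConnectorTests, unittest.TestCase):
    """SQL reference backend, standing in for SqlaTable."""

    def sum_by_region(self):
        conn = sqlite3.connect(":memory:")
        try:
            # pandas accepts a plain sqlite3 connection for to_sql.
            DATA.to_sql("data", conn, index=False)
            rows = conn.execute(
                "SELECT region, SUM(value) FROM data GROUP BY region"
            ).fetchall()
        finally:
            conn.close()
        return dict(rows)


class PandasBackendTests(BaseConnectorTests, unittest.TestCase):
    """In-memory backend, standing in for PandasConnector."""

    def sum_by_region(self):
        return DATA.groupby("region")["value"].sum().to_dict()
```

Adding another connector then means writing one more subclass; the assertions stay in one place, which is what keeps the backends' results consistent with each other.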

xrmx

Some small nitpicks

xrmx · 2017-09-20T08:59:26Z

contrib/connectors/pandas/models.py

+        d['granularity_sqla'] = utils.choicify(self.dttm_cols)
+        d['time_grain_sqla'] = [(g, g) for g in self.GRAINS.keys()]
+        logging.info(d)
+        print(d)


Debug leftover.
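If the output is still useful during development, a common alternative to deleting it is to route it through a module-level logger at DEBUG level instead of `print`. A minimal sketch: `build_form_data` is a hypothetical stand-in for the quoted method, and the list comprehension stands in for `utils.choicify`, which is assumed here to return `(value, value)` pairs.

```python
import logging

# Named after the module so it can be enabled or silenced on its own.
logger = logging.getLogger(__name__)


def build_form_data(dttm_cols, grains):
    """Hypothetical helper mirroring the quoted snippet."""
    d = {}
    # (value, label) pairs for the UI select controls.
    d['granularity_sqla'] = [(c, c) for c in dttm_cols]
    d['time_grain_sqla'] = [(g, g) for g in grains]
    # Lazy %-formatting: the dict is only rendered if DEBUG is enabled.
    logger.debug("form data: %s", d)
    return d
```

Calling `logging.basicConfig(level=logging.DEBUG)` while developing makes the message visible; otherwise the call is skipped after a cheap level check.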

xrmx · 2017-09-20T09:03:14Z

contrib/connectors/pandas/models.py

+
+class PandasDatabase(object):
+    """Non-ORM object for a Pandas Source"""
+    database_name = ''


I don't think you need to set class-level defaults if the attributes are not optional in `__init__`.
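A minimal sketch of the suggestion, assuming `database_name` is a required constructor argument. `cache_timeout` is a hypothetical optional attribute, included only to show where a default does belong.

```python
class PandasDatabase(object):
    """Non-ORM object for a Pandas Source (sketch)."""

    def __init__(self, database_name, cache_timeout=None):
        # Required argument: always set on the instance, so a class-level
        # ``database_name = ''`` default would never be seen through an instance.
        self.database_name = database_name
        # Optional argument: this is where a default belongs
        # (cache_timeout is a hypothetical example attribute).
        self.cache_timeout = cache_timeout
```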

xrmx · 2017-09-20T09:08:14Z

contrib/connectors/pandas/models.py

+
+    Each Pandas Datasource can have multiple columns"""
+
+    __tablename__ = 'pandascolumns'


We may want to be consistent with the sqla backend and call it 'pandas_columns'.

xrmx · 2017-09-20T09:12:42Z

contrib/connectors/pandas/models.py

+        # Build a dict of the metrics to include, including those that
+        # are required for post-aggregation filtering
+        filtered_metrics = [flt['col']
+                            for flt in extras.get('having_druid', [])


having_druid? If we are reusing the field, a comment would be nice.

I'm adding a comment for the use of having_druid, and similar ones for granularity_sqla and time_grain_sqla. However, if there is a long-term plan to encourage the development of more connectors, it would be better to rename these parameters in the main code, for example to granularity_col, granularity_freq and post_aggregation_filter.
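The post-aggregation flow discussed here (aggregate first, then filter on metric values, as a SQL HAVING clause would) can be sketched in pandas as follows. This is an illustration under assumptions, not the PR's implementation: each filter in `having_druid` is assumed to be a dict with `col`, `op` and `val` keys, and `aggregate_then_filter`, `metric_defs` and `OPERATORS` are hypothetical names.

```python
import operator

import pandas as pd

# Hypothetical mapping from filter operator strings to comparison functions.
OPERATORS = {
    '>': operator.gt,
    '>=': operator.ge,
    '<': operator.lt,
    '<=': operator.le,
    '==': operator.eq,
    '!=': operator.ne,
}


def aggregate_then_filter(df, groupby, requested, metric_defs, extras):
    """Aggregate ``df``, then apply HAVING-style filters on metric values.

    ``metric_defs`` maps a metric name to a (column, aggregation) pair,
    ``requested`` lists the metrics to return, and the filters are read
    from ``extras['having_druid']`` (assumed shape: col/op/val dicts).
    """
    filters = extras.get('having_druid', [])
    # Metrics used only by a filter must still be aggregated.
    filtered_metrics = [flt['col'] for flt in filters]
    # Order-preserving de-duplication of requested + filter-only metrics.
    needed = list(dict.fromkeys(list(requested) + filtered_metrics))
    agg = df.groupby(groupby).agg(**{m: metric_defs[m] for m in needed})
    # Start with every group kept, then AND in each filter.
    mask = pd.Series(True, index=agg.index)
    for flt in filters:
        mask &= OPERATORS[flt['op']](agg[flt['col']], flt['val'])
    # Return only the requested metrics, with the group keys as columns.
    return agg.loc[mask, list(requested)].reset_index()


df = pd.DataFrame({'region': ['a', 'a', 'b', 'b', 'b'],
                   'value': [1, 2, 3, 4, 5]})
metric_defs = {'sum__value': ('value', 'sum'), 'count': ('value', 'count')}
# Keep only regions with more than two rows; 'count' is filter-only.
out = aggregate_then_filter(
    df, ['region'], ['sum__value'], metric_defs,
    {'having_druid': [{'col': 'count', 'op': '>', 'val': 2}]})
```

Here `count` is computed only so the filter can use it and is dropped from the result, which is the reason the quoted code builds `filtered_metrics` separately.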

xrmx · 2017-09-20T09:16:50Z

contrib/connectors/pandas/models.py

+
+        This is used to be displayed to the user so that she/he can
+        understand what is taking place behind the scene"""
+        import json


You can make these imports global.

I'll take them out; they are part of the debugging code.

xrmx · 2017-09-20T09:17:25Z

contrib/connectors/pandas/models.py

+            return val.isoformat() + "Z"
+
+        logging.info(json.dumps(query_obj, indent=4, default=to_serializable))
+        print(json.dumps(query_obj, indent=4, default=to_serializable))
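A sketch of the serializer with its imports at module level, as suggested earlier in the review. The quoted diff only shows the `return val.isoformat() + "Z"` line, so the type checks around it are a guess; `query_obj_to_json` is a hypothetical wrapper that logs instead of printing.

```python
import json
import logging
from datetime import date, datetime

logger = logging.getLogger(__name__)


def to_serializable(val):
    """``default`` hook for json.dumps: render dates as ISO 8601 strings."""
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(val, datetime):
        if val.tzinfo is None:
            # Naive datetimes are assumed to be UTC, hence the "Z" suffix.
            return val.isoformat() + "Z"
        # Aware datetimes already carry their offset in isoformat().
        return val.isoformat()
    if isinstance(val, date):
        return val.isoformat()
    # json.dumps expects the hook to raise TypeError for unsupported types.
    raise TypeError(
        "Object of type %s is not JSON serializable" % type(val).__name__)


def query_obj_to_json(query_obj):
    """Hypothetical wrapper: serialize and log at DEBUG level."""
    text = json.dumps(query_obj, indent=4, default=to_serializable)
    logger.debug("query_obj: %s", text)
    return text
```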


xrmx · 2017-09-20T09:45:59Z

contrib/connectors/pandas/views.py

+        'format': _(
+            "The format of the raw data, e.g. csv"),
+        'additional_parameters': _(
+            "A JSON-formatted dictionary of additional parameters "


Could we add something like: "These are the actual parameters for the read_* functions, see https://pandas.pydata.org/pandas-docs/stable/api.html"?
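The idea behind `additional_parameters` (a JSON object passed straight through as keyword arguments to the pandas reader for the chosen format, such as `pd.read_csv` or `pd.read_json`) can be sketched like this. `read_source` is a hypothetical helper, not the connector's code.

```python
import io
import json

import pandas as pd


def read_source(source, fmt, additional_parameters='{}'):
    """Load ``source`` with the pandas reader named after ``fmt``.

    ``additional_parameters`` is a JSON object whose keys become keyword
    arguments of the reader, e.g. '{"sep": ";"}' for pd.read_csv.
    """
    params = json.loads(additional_parameters or '{}')
    if not isinstance(params, dict):
        raise ValueError("additional_parameters must be a JSON object")
    # The "read_" prefix limits lookups to pandas' reader functions.
    reader = getattr(pd, 'read_' + fmt, None)
    if reader is None:
        raise ValueError("unsupported format: %r" % fmt)
    return reader(source, **params)


# Example: a semicolon-separated CSV read through pd.read_csv.
df = read_source(io.StringIO("a;b\n1;2\n"), 'csv', '{"sep": ";"}')
```

Looking the reader up by name keeps the form field and the pandas documentation in step; an explicit whitelist of formats would be the stricter choice if arbitrary readers should not be reachable.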

xrmx · 2017-09-20T09:48:25Z

Thanks a lot for your great work, I'd really like this to be a first-class citizen in Superset.

rhunwicks · 2017-09-20T11:38:06Z

contrib/connectors/pandas/models.py

+        # for post-aggregation filters, and we are reusing that
+        # interface component.
+        filtered_metrics = [flt['col']
+                            for flt in extras.get('having_druid', [])


I have added a comment for the use of having_druid, and similar ones for granularity_sqla and time_grain_sqla. However, if there is a long-term plan to encourage the development of more connectors, it would be better to rename these parameters in the main code, for example to granularity_col, granularity_freq and post_aggregation_filter.
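Under the proposed names, granularity_col picks the datetime column and granularity_freq the bucket size, which in pandas maps naturally onto `pd.Grouper(key=..., freq=...)`. A sketch under assumptions: the `GRAINS` mapping from grain labels to pandas offset aliases and the `timeseries` helper are hypothetical.

```python
import pandas as pd

# Hypothetical mapping from grain labels shown in the UI to pandas offset aliases.
GRAINS = {
    'day': 'D',
    'week': 'W',
    'month': 'MS',
}


def timeseries(df, granularity_col, time_grain, metric_col):
    """Sum ``metric_col`` per time bucket.

    ``granularity_col`` plays the role of granularity_sqla (which datetime
    column to bucket on) and ``time_grain`` that of time_grain_sqla.
    """
    grouper = pd.Grouper(key=granularity_col, freq=GRAINS[time_grain])
    return df.groupby(grouper)[metric_col].sum().reset_index()


df = pd.DataFrame({
    'ds': pd.to_datetime(['2017-09-01', '2017-09-01', '2017-09-02']),
    'value': [1, 2, 3],
})
out = timeseries(df, 'ds', 'day', 'value')
```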

coveralls · 2017-09-28T10:44:07Z

Coverage remained the same at 70.161% when pulling 9df1786 on kimetrica:3302-pandas-connctor into ef59b6b on apache:master.

coveralls · 2017-09-28T12:18:59Z

Coverage increased (+0.5%) to 70.612% when pulling 4b47b75 on kimetrica:3302-pandas-connctor into ef59b6b on apache:master.

coveralls · 2017-09-28T13:51:22Z

Coverage increased (+0.5%) to 70.612% when pulling d156430 on kimetrica:3302-pandas-connctor into ef59b6b on apache:master.

coveralls · 2017-09-28T14:53:05Z

Coverage increased (+0.7%) to 70.83% when pulling fa38256 on kimetrica:3302-pandas-connctor into ef59b6b on apache:master.

rhunwicks · 2017-09-28T15:00:59Z

@mistercrunch I have moved the migration into superset/migrations/versions in order to get Travis to run. I can move it back out if you decide you want to keep it in a contrib directory.

I have also added lxml and beautifulsoup4 to dev-reqs.txt because contrib.tests.connector_tests uses them to prepare test data. The connector_tests module is probably useful for other people writing connectors, and it also increases test coverage for superset/connectors/sqla/models.py, so you might want to adopt some of it into core.

Please let me know what you would like me to do next.

coveralls · 2017-10-03T13:56:31Z

Coverage increased (+0.7%) to 70.814% when pulling 1df6237 on kimetrica:3302-pandas-connctor into ef59b6b on apache:master.

coveralls · 2017-10-05T18:31:09Z

Coverage increased (+0.4%) to 70.535% when pulling 878c7c4 on kimetrica:3302-pandas-connctor into ef59b6b on apache:master.

coveralls · 2017-10-05T19:05:23Z

Coverage increased (+0.4%) to 70.535% when pulling 2826876 on kimetrica:3302-pandas-connctor into ef59b6b on apache:master.

coveralls · 2017-10-05T19:58:50Z

Coverage increased (+0.4%) to 70.535% when pulling b24a700 on kimetrica:3302-pandas-connctor into ef59b6b on apache:master.

coveralls · 2017-10-05T21:23:04Z

Coverage increased (+0.6%) to 70.753% when pulling 3610ac0 on kimetrica:3302-pandas-connctor into ef59b6b on apache:master.

coveralls · 2017-10-17T14:39:47Z

Coverage increased (+1.3%) to 71.412% when pulling bb78be8 on kimetrica:3302-pandas-connctor into ef59b6b on apache:master.

apache#3302

rhunwicks · 2018-04-16T09:41:20Z

Having the migrations and requirements mixed in with the main Superset ones without ever merging this MR causes frequent merge conflicts. Therefore, I am going to close this MR. We will continue to maintain the connector in order to allow us to use Superset with APIs and remote files, but we will do so in our fork rather than a MR.

stu-co · 2018-07-11T13:21:11Z

@rhunwicks thank you for creating this code. It seems pretty perfect for us to be honest - I wish it had made it into superset!

rhunwicks added 2 commits September 19, 2017 02:54

Initial commit of PandasConnector - see apache#3302

b6c6cf1

remove temporary files - see apache#3302

44649de

rhunwicks changed the title ~~Initial commit of PandasConnector - see #3302~~ PandasConnector Sep 19, 2017

Fix PandasDatasource.baselink - see apache#3302

48156e0

xrmx reviewed Sep 20, 2017

rhunwicks added 2 commits September 20, 2017 12:55

Rename PandasConnectors sqla tables for consistency

3bf0857

Remove redundant debug code

1520c9f

rhunwicks commented Sep 20, 2017

Update PandasConnector for order_desc - see apache#3302

5960f49

xrmx mentioned this pull request Sep 27, 2017

Latest import csv #3533

Closed

rhunwicks added 4 commits September 28, 2017 09:38

Add PandasConnector migration to main versions folder - see apache#3302

e2296ac

Add PandasConnector migration to main versions folder - see apache#3302

848a223

Merge remote-tracking branch 'upstream/master' into 3302-pandas-connctor

5375c59

Add PandasConnector migration to main versions folder - see apache#3302

9df1786

rhunwicks added 2 commits September 28, 2017 12:59

Renamed connector_tests.py - see apache#3302

c015197

Add lxml and bs4 to dev-reqs - see apache#3302

4b47b75

rhunwicks added 2 commits September 28, 2017 15:01

Update PandasMetric with warning_text - see apache#3302

f8483b5

More efficient PandasDatasource.get_metadata - see apache#3302

d156430

Python27 compatibility for PandasConnector - see apache#3302

fa38256

Better tests for summary metrics - see apache#3302

1df6237

Cache source dataframes in PandasDatasource - see apache#3302

878c7c4

Py2 fix for DataFrameCache - see apache#3302

2826876

Py2 fix for DataFrameCache - see apache#3302

b24a700

Py2 fix for DataFrameCache - see apache#3302

3610ac0

rhunwicks added 2 commits October 10, 2017 14:24

Configure dataframe cache from settings - see apache#3302

a0657a9

Fix import in contrib.tests.cache_tests - see apache#3302

bb78be8

rhunwicks added 11 commits November 29, 2017 15:39

Merge branch 'master' into 3302-pandas-connctor

97070aa

Merge branch 'master' into 3302-pandas-connctor

c892e50

Support authentication against remote APIs for PandasConnector - see apache#3302

0a24cb6

Flake8 fixes - see apache#3302

e5fcc1d

Flake8 fixes - see apache#3302

15d91bb

Improved dataframe cache - see apache#3302

0f98a55

Restore datasource count metric for PandasDatasource - see apache#3302

7c4a2d8

Include dttm_cols in datasource.data - see apache#3302

4f15e07

Put deps at end of setup.py to reduce merge conflicts - see apache#3302

ebeda44

Merge remote-tracking branch 'upstream/master' into 3302-pandas-connctor

ce39029

Flake8 fix for setup.py - see apache#3302

62d9438

rhunwicks closed this Apr 16, 2018

rhunwicks mentioned this pull request Apr 16, 2018

Create a PandasDatasource #3302

Closed

maartenbreddels mentioned this pull request Oct 5, 2018

POC: Vaex connector #6041

Closed

bipinsoniguavus mentioned this pull request Nov 2, 2018

Feature/rest api ThalesGroup/incubator-superset#12

Merged
