Allow fetching all rows from results endpoint #8389

Merged: 8 commits merged into apache:master on Oct 25, 2019

Conversation

@betodealmeida betodealmeida commented Oct 14, 2019

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

Currently, when results are fetched from /superset/results/, we apply DISPLAY_MAX_ROW to limit the amount of data displayed in the UI. At Lyft, we have other clients accessing Superset programmatically, and we would like to bypass the limit when fetching data from these clients.

I changed the endpoint behavior so that by default DISPLAY_MAX_ROW is not applied, but it can be passed optionally. The frontend was changed to query /superset/results/${DISPLAY_MAX_ROW}, returning only a subset of the rows.

TEST PLAN

Tested with curl, confirmed it works. Added unit tests.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

codecov-io commented Oct 14, 2019

Codecov Report

Merging #8389 into master will increase coverage by 0.1%.
The diff coverage is 50%.

@@            Coverage Diff            @@
##           master    #8389     +/-   ##
=========================================
+ Coverage   67.57%   67.67%   +0.1%     
=========================================
  Files         448      448             
  Lines       22527    22492     -35     
  Branches     2364     2364             
=========================================
  Hits        15222    15222             
+ Misses       7167     7132     -35     
  Partials      138      138

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...erset/assets/src/SqlLab/components/QuerySearch.jsx | 58.65% <ø> (ø) | ⬆️ |
| superset/views/base.py | 70.64% <ø> (-0.29%) | ⬇️ |
| .../assets/src/SqlLab/components/TabbedSqlEditors.jsx | 83.33% <ø> (ø) | ⬆️ |
| ...uperset/assets/src/SqlLab/components/SouthPane.jsx | 91.42% <ø> (ø) | ⬆️ |
| ...uperset/assets/src/SqlLab/components/SqlEditor.jsx | 52.81% <ø> (ø) | ⬆️ |
| ...rset/assets/src/SqlLab/components/QueryHistory.jsx | 83.33% <ø> (ø) | ⬆️ |
| ...perset/assets/src/SqlLab/components/QueryTable.jsx | 59.25% <0%> (ø) | ⬆️ |
| superset/views/utils.py | 89.77% <100%> (+2.27%) | ⬆️ |
| ...uperset/assets/src/SqlLab/components/ResultSet.jsx | 79.77% <100%> (ø) | ⬆️ |
| superset/assets/src/SqlLab/components/App.jsx | 77.77% <50%> (ø) | ⬆️ |

... and 66 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2117d1e...efe9349. Read the comment docs.

@etr2460 etr2460 left a comment

could you add documentation for this flag somewhere?

@betodealmeida
Member Author

@etr2460 will do! I'll also add some unit tests.

@betodealmeida added the enhancement:request, sqllab, lyft, and minor-review labels on Oct 15, 2019
docs/sqllab.rst Outdated
applications. When retrieving results from asynchronous queries ran in SQL Lab
from the results backend, the config `DISPLAY_MAX_ROW` will still be applied,
even though the results might not necessarily be rendered in a display. In order
to bypass the limit you can pass the query parameter `bypass_display_limit=true`
Member

nit: would ignore_display_limit be more precise?

Member Author

I don't have any strong preferences. @etr2460, any thoughts on this?

Member

oh yeah, i like ignore_display_limit better personally

etr2460 commented Oct 16, 2019

Sorry, I probably should have thought of this before, but maybe we should have the client explicitly ask for DISPLAY_MAX_ROWS rows from the results instead of the backend automatically applying it. So instead of adding an ignore_limit setting, we make the default pass back the entire results set, and the client adds a rows query param to /results that requests DISPLAY_MAX_ROWS rows. I think this might be a bit cleaner, and would make the superset backend behave more like a service with multiple consumers than just assuming the client is asking by default. thoughts @villebro @betodealmeida ?
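
For illustration, a client under this proposal might build the request URL roughly as follows. This is a minimal sketch, not code from the PR: `build_results_url`, the base URL, and the example key are hypothetical, and the `rows` query parameter simply follows the suggestion above.

```python
from typing import Optional
from urllib.parse import urlencode


def build_results_url(base_url: str, key: str, rows: Optional[int] = None) -> str:
    """Build a results URL; omit `rows` to ask for the full result set.

    `base_url` and `key` are placeholders chosen for this sketch.
    """
    url = f"{base_url.rstrip('/')}/superset/results/{key}/"
    if rows is not None:
        # A UI client would pass its display limit here (e.g. DISPLAY_MAX_ROW).
        url += "?" + urlencode({"rows": rows})
    return url


# A programmatic consumer asks for everything; the UI asks for a capped subset.
full_url = build_results_url("http://localhost:8088", "abc")
ui_url = build_results_url("http://localhost:8088", "abc", rows=1000)
```

The point of the design is that the caller, not the backend, decides whether a limit applies.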

@betodealmeida
Member Author

No worries, I agree that's a better approach.

@betodealmeida changed the title from "Allow bypassing DISPLAY_MAX_ROW" to "Allow fetching all rows from results endpoint" on Oct 17, 2019
@betodealmeida
Member Author

cc: @etr2460

@etr2460 etr2460 left a comment

looks much better! one other question

@@ -176,7 +176,9 @@ def get_datasource_info(
     return datasource_id, datasource_type


-def apply_display_max_row_limit(sql_results: Dict[str, Any]) -> Dict[str, Any]:
+def apply_display_max_row_limit(
+    sql_results: Dict[str, Any], rows: Optional[int] = None
Member

is this ever called with rows not defined?

@betodealmeida betodealmeida Oct 21, 2019

Yes, synchronous queries will still call this to limit the response:

            payload = json.dumps(
                apply_display_max_row_limit(data),
                default=utils.pessimistic_json_iso_dttm_ser,
                ignore_nan=True,
                encoding=None,
            )

I think it makes sense to limit the sync response, disallowing users from bypassing it. If the query returns more than DISPLAY_MAX_ROWS rows and the user needs that data, they should run it asynchronously, IMHO.
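
To make the fallback concrete, here is a minimal sketch of how an optional `rows` argument could behave. It is not the merged implementation: the module-level `DISPLAY_MAX_ROW` constant stands in for the Superset config value, and the `data` key in the payload is an assumption made for this example.

```python
from typing import Any, Dict, Optional

# Stand-in for the Superset config value (assumed for this sketch).
DISPLAY_MAX_ROW = 3


def apply_display_max_row_limit(
    sql_results: Dict[str, Any], rows: Optional[int] = None
) -> Dict[str, Any]:
    """Truncate `data` to `rows`, falling back to DISPLAY_MAX_ROW when unset."""
    limit = rows if rows is not None else DISPLAY_MAX_ROW
    data = sql_results.get("data", [])
    if limit and len(data) > limit:
        # Return a shallow copy so the caller's payload is left untouched.
        return {**sql_results, "data": data[:limit]}
    return sql_results


# Sync path: no `rows` argument, so the configured display limit applies.
sync_payload = apply_display_max_row_limit({"data": [1, 2, 3, 4, 5]})
# A caller that supplies its own limit gets that limit instead.
custom_payload = apply_display_max_row_limit({"data": [1, 2, 3, 4, 5]}, rows=4)
```

In this sketch, calling the function without `rows` always caps the response, which matches the sync behavior described above.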

Member

Async vs sync queries are dependent on the config of the datasource, right? Which is something the user doesn't have any control over unless they're admin. I guess if you're calling this with an API, then you can decide if you want to make a sync or async query yourself though.

I think it's fine, but it's just another little wart with superset that we need to remember

Member Author

Right.

@etr2460 etr2460 left a comment

one other comment about the url/api design (sorry i didn't catch all these at once), but after that i think it lgtm

@@ -2459,8 +2459,9 @@ def cache_key_exist(self, key):

     @has_access_api
     @expose("/results/<key>/")
+    @expose("/results/<key>/<int:rows>")
Member

it seems a little weird to encode the limit in the url like this. I think it would be preferable to construct the url like: /results/key?rows=1000.
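
As a rough illustration of the query-parameter form, the limit can be read from the query string instead of the path. This sketch uses only the standard library, and `rows_from_url` is a hypothetical helper, not part of Superset. In a Flask view the equivalent would likely be `request.args.get("rows", type=int)`.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def rows_from_url(url: str) -> Optional[int]:
    """Return the `rows` query parameter as an int, or None when it is absent.

    Raises ValueError if `rows` is present but not an integer.
    """
    values = parse_qs(urlparse(url).query).get("rows")
    return int(values[0]) if values else None


limited = rows_from_url("/results/key?rows=1000")
unlimited = rows_from_url("/results/key")
```

With this shape, a missing `rows` parameter naturally means "no limit requested", and the path stays the same for every consumer.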

Member Author

+1

@etr2460 etr2460 left a comment

awesome, thanks for all the iteration! lgtm

@villebro villebro left a comment

Sorry @etr2460 I missed your comment re: replacing ignore with explicit DISPLAY_MAX_ROWS. Agree, it's a much much better approach. LGTM!

@betodealmeida betodealmeida merged commit e704e29 into apache:master Oct 25, 2019
graceguo-supercat pushed a commit that referenced this pull request Nov 13, 2019
* Allow bypassing DISPLAY_MAX_ROW

* Add unit tests and docs

* Fix tests

* Fix mock

* Fix unit test

* Revert config change after test

* Change behavior

* Address comments
@mistercrunch added the 🏷️ bot and 🚢 0.36.0 labels on Feb 28, 2024
Labels: 🏷️ bot, enhancement:request, lyft, size/L, sqllab, 🚢 0.36.0