Allow fetching all rows from results endpoint #8389
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master    #8389     +/-   ##
=========================================
+ Coverage   67.57%   67.67%    +0.1%
=========================================
  Files         448      448
  Lines       22527    22492      -35
  Branches     2364     2364
=========================================
  Hits        15222    15222
+ Misses       7167     7132      -35
  Partials      138      138
Continue to review full report at Codecov.
could you add documentation for this flag somewhere?
@etr2460 will do! I'll also add some unit tests.
docs/sqllab.rst
Outdated
applications. When retrieving results from asynchronous queries run in SQL Lab
from the results backend, the config `DISPLAY_MAX_ROW` will still be applied,
even though the results might not necessarily be rendered in a display. In order
to bypass the limit you can pass the query parameter `bypass_display_limit=true`
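For illustration, a minimal sketch of the usage this (later superseded) doc snippet describes, using the Python `requests` library; the host, results key, session cookie, and the `"data"` response key are placeholders/assumptions, and `bypass_display_limit` was replaced by an explicit row count later in this review.

```python
import requests

# Placeholders for illustration only; not real values.
SUPERSET_HOST = "https://superset.example.com"
results_key = "some-results-backend-key"

resp = requests.get(
    f"{SUPERSET_HOST}/superset/results/{results_key}/",
    params={"bypass_display_limit": "true"},  # skip the DISPLAY_MAX_ROW cap
    cookies={"session": "..."},               # an authenticated session is required
)
resp.raise_for_status()
payload = resp.json()
print(len(payload["data"]))  # assumes results are returned under a "data" key
```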
nit: would `ignore_display_limit` be more precise?
I don't have any strong preferences. @etr2460, any thoughts on this?
oh yeah, i like `ignore_display_limit` better personally
Sorry, I probably should have thought of this before, but maybe we should have the client explicitly ask for `DISPLAY_MAX_ROWS` rows from the results instead of the backend automatically applying it. So instead of adding an `ignore_display_limit` flag, the endpoint would take the number of rows the client wants.
No worries, I agree that's a better approach.
cc: @etr2460
looks much better! one other question
@@ -176,7 +176,9 @@ def get_datasource_info(
     return datasource_id, datasource_type


-def apply_display_max_row_limit(sql_results: Dict[str, Any]) -> Dict[str, Any]:
+def apply_display_max_row_limit(
+    sql_results: Dict[str, Any], rows: Optional[int] = None
is this ever called with `rows` not defined?
Yes, synchronous queries will still call this to limit the response:
payload = json.dumps(
    apply_display_max_row_limit(data),
    default=utils.pessimistic_json_iso_dttm_ser,
    ignore_nan=True,
    encoding=None,
)
I think it makes sense to limit the sync response, disallowing users from bypassing it. If the query returned more than `DISPLAY_MAX_ROWS` rows and the user needs that data, they should run it asynchronously, IMHO.
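To make the sync/async split above concrete, here is a minimal sketch of how the optional `rows` argument could fall back to the config limit; `DISPLAY_MAX_ROW` here is a stand-in for the app config value and `displayLimitReached` is a hypothetical flag name, so the real Superset implementation may differ in its details.

```python
from typing import Any, Dict, Optional

DISPLAY_MAX_ROW = 10000  # stand-in for app.config["DISPLAY_MAX_ROW"]


def apply_display_max_row_limit(
    sql_results: Dict[str, Any], rows: Optional[int] = None
) -> Dict[str, Any]:
    # Synchronous callers pass no `rows`, so the config limit still applies;
    # async callers can request exactly the number of rows they want.
    limit = rows if rows is not None else DISPLAY_MAX_ROW
    data = sql_results.get("data", [])
    if limit and len(data) > limit:
        sql_results["data"] = data[:limit]
        sql_results["displayLimitReached"] = True  # hypothetical flag name
    return sql_results
```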
Async vs sync queries are dependent on the config of the datasource, right? Which is something the user doesn't have any control over unless they're an admin. I guess if you're calling this with an API, then you can decide whether you want to make an async or sync query yourself, though.
I think it's fine, but it's just another little wart with Superset that we need to remember.
Right.
one other comment about the url/api design (sorry i didn't catch all these at once), but after that i think it lgtm
superset/views/core.py
Outdated
@@ -2459,8 +2459,9 @@ def cache_key_exist(self, key):


     @has_access_api
-    @expose("/results/<key>/")
+    @expose("/results/<key>/<int:rows>")
it seems a little weird to encode the limit in the url like this. I think it would be preferable to construct the url like: `/results/key?rows=1000`.
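A standalone Flask sketch of the suggested shape (`/results/<key>?rows=1000`), just to show reading `rows` from the query string rather than the path; this is not the actual Superset view, and `fetch_from_results_backend` is a placeholder helper invented for the example.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def fetch_from_results_backend(key: str):
    # Placeholder standing in for the real results-backend lookup.
    return [{"id": i} for i in range(100)]


@app.route("/superset/results/<key>/")
def results(key: str):
    # `rows` is an optional query parameter; omitting it returns everything.
    rows = request.args.get("rows", default=None, type=int)
    data = fetch_from_results_backend(key)
    if rows is not None:
        data = data[:rows]
    return jsonify(data)
```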
+1
awesome, thanks for all the iteration! lgtm
Sorry @etr2460, I missed your comment re: replacing ignore with explicit `DISPLAY_MAX_ROWS`. Agree, it's a much better approach. LGTM!
* Allow bypassing DISPLAY_MAX_ROW
* Add unit tests and docs
* Fix tests
* Fix mock
* Fix unit test
* Revert config change after test
* Change behavior
* Address comments
SUMMARY
Currently when results are fetched from `/superset/results/` we apply `DISPLAY_MAX_ROW` to limit the amount of data displayed in the UI. At Lyft, we have other clients accessing Superset programmatically, and we would like to bypass the limit when fetching data from these clients.
I changed the endpoint behavior so that by default `DISPLAY_MAX_ROW` is not applied, but it can be passed optionally. The frontend was changed to query `/superset/results/${DISPLAY_MAX_ROW}`, returning only a subset of the rows.
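As a rough illustration of the new behavior from a client's point of view (host, results key, and session cookie are placeholders, and the exact final URL shape followed the review discussion above):

```python
import requests

BASE = "https://superset.example.com/superset/results"  # placeholder host
key = "results-backend-key"                              # placeholder key
cookies = {"session": "..."}                             # authenticated session

# Programmatic client: no row count supplied, so the full result set comes back.
all_rows = requests.get(f"{BASE}/{key}/", cookies=cookies).json()

# SQL Lab frontend: explicitly asks for at most DISPLAY_MAX_ROW rows.
DISPLAY_MAX_ROW = 10000  # stand-in for the server-side config value
display_rows = requests.get(f"{BASE}/{key}/{DISPLAY_MAX_ROW}", cookies=cookies).json()
```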
TEST PLAN
Tested with `curl`, confirmed it works. Added unit tests.
ADDITIONAL INFORMATION
REVIEWERS