Suggested facets should only consider first 1000 rows #2406
Here's a trace illustrating the problem (thanks to #2405):
I tried a patch to just do this for column string facets and it worked as expected:

diff --git a/datasette/facets.py b/datasette/facets.py
index ccd85461..1e091afd 100644
--- a/datasette/facets.py
+++ b/datasette/facets.py
@@ -170,9 +170,8 @@ class ColumnFacet(Facet):
             if column in already_enabled:
                 continue
             suggested_facet_sql = """
-                select {column} as value, count(*) as n from (
-                    {sql}
-                ) where value is not null
+                with limited as (select * from ({sql}) limit 1000)
+                select {column} as value, count(*) as n from limited where value is not null
                 group by value
                 limit {limit}
             """.format(

But that doesn't cover other facet types, so I'll try turning the SQL that gets fed to that method into the limited SQL first.
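The idea of limiting the SQL before it reaches any facet type can be sketched outside Datasette with the standard library `sqlite3` module. This is only an illustration under stated assumptions: `limit_rows`, `suggest_column_values`, and the `items` table are hypothetical names made up for this example and are not part of Datasette's API.

```python
import sqlite3


def limit_rows(sql: str, max_rows: int = 1000) -> str:
    """Wrap an arbitrary SELECT so later queries only see max_rows rows.

    Hypothetical helper for illustration; not a Datasette function.
    """
    return f"select * from ({sql}) limit {int(max_rows)}"


def suggest_column_values(conn, sql, column, max_rows=1000, limit=5):
    """Count distinct non-null values of a column over the limited rows only."""
    limited = limit_rows(sql, max_rows)
    query = f"""
        select "{column}" as value, count(*) as n
        from ({limited})
        where value is not null
        group by value
        limit {int(limit)}
    """
    return conn.execute(query).fetchall()


# Build a 5,000 row table with two distinct category values.
conn = sqlite3.connect(":memory:")
conn.execute("create table items (id integer primary key, category text)")
conn.executemany(
    "insert into items (category) values (?)",
    [("a" if i % 2 == 0 else "b",) for i in range(5000)],
)

rows = suggest_column_values(conn, "select * from items", "category")
# Only 1,000 of the 5,000 rows contribute to the counts.
total = sum(n for _, n in rows)
```

Because the limit is applied to the incoming SQL rather than inside one facet's query, the same wrapped SQL could in principle be reused by any facet type that accepts a query string.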
Given the structure of the current code (lines 160 to 180 and lines 460 to 467 in 8a63cdc), it's going to be easier to modify each facet definition rather than the calling code to pass in that limit.
Lines 470 to 477 in 8a63cdc
We get a lot of performance issues from suggested facets. On large tables we end up running multiple SQL queries for every column (one for column facets, one for date facets, one for JSON facets), each with a 50ms facet_suggest_time_limit_ms time limit. Even with that limit in place these can add up: 20 columns could take 20 * 3 * 50 = 3000ms, not including the overhead of the Python code that manages the queries.

Since these are really just suggestions, an optimization could be to only consider the first 1,000 rows in the table. This would be enough to spot likely date / JSON / column facets and should be much faster.
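The worst-case estimate can be written as a small calculation. This is a rough sketch only: `worst_case_suggest_ms` and its parameter names are hypothetical, and it assumes every suggestion query runs for its full time limit, which is the pessimistic case described in this issue.

```python
def worst_case_suggest_ms(columns, queries_per_column=3, time_limit_ms=50):
    """Upper bound on time spent in facet suggestion queries.

    Assumes each query hits its time limit; ignores Python overhead.
    """
    return columns * queries_per_column * time_limit_ms


# 20 columns, 3 facet types (column, date, JSON), 50ms limit each.
budget_ms = worst_case_suggest_ms(20)
```

With the figures from this issue, `budget_ms` is 3000, matching the 20 * 3 * 50 estimate.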