Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggested facets should only consider first 1000 rows #2406

Closed
simonw opened this issue Aug 21, 2024 · 5 comments
Closed

Suggested facets should only consider first 1000 rows #2406

simonw opened this issue Aug 21, 2024 · 5 comments

Comments

@simonw
Copy link
Owner

simonw commented Aug 21, 2024

We get a lot of performance issues from suggested facets - on large tables we end up running multiple SQL queries for every column (one for column facets, one for date facets, one for JSON facets), each with a 50ms facet_suggest_time_limit_ms time limit but even with that in place these can add up - 20 columns could be 20 * 3 * 50 = 3000ms, not including overhead of the Python code that manages the queries.

Since these are really just suggestions, an optimization could be to only consider the first 1,000 rows in the table. This would be enough to spot likely date / JSON / column facets and should be much faster.

@simonw
Copy link
Owner Author

simonw commented Aug 21, 2024

Here's a trace illustrating the problem (thanks to #2405):

CleanShot 2024-08-21 at 12 38 22@2x

@simonw
Copy link
Owner Author

simonw commented Aug 21, 2024

I tried a patch to just do this for column string facets and it worked as expected:

diff --git a/datasette/facets.py b/datasette/facets.py
index ccd85461..1e091afd 100644
--- a/datasette/facets.py
+++ b/datasette/facets.py
@@ -170,9 +170,8 @@ class ColumnFacet(Facet):
             if column in already_enabled:
                 continue
             suggested_facet_sql = """
-                select {column} as value, count(*) as n from (
-                    {sql}
-                ) where value is not null
+                with limited as (select * from ({sql}) limit 1000)
+                select {column} as value, count(*) as n from limited where value is not null
                 group by value
                 limit {limit}
             """.format(

But that doesn't cover other facet types, so I'll try turning the SQL that gets fed to that method into the limited SQL first.

@simonw
Copy link
Owner Author

simonw commented Aug 21, 2024

Given the structure of the current .facet() method:

class ColumnFacet(Facet):
type = "column"
async def suggest(self):
row_count = await self.get_row_count()
columns = await self.get_columns(self.sql, self.params)
facet_size = self.get_facet_size()
suggested_facets = []
already_enabled = [c["config"]["simple"] for c in self.get_configs()]
for column in columns:
if column in already_enabled:
continue
suggested_facet_sql = """
select {column} as value, count(*) as n from (
{sql}
) where value is not null
group by value
limit {limit}
""".format(
column=escape_sqlite(column), sql=self.sql, limit=facet_size + 1
)

class DateFacet(Facet):
type = "date"
async def suggest(self):
columns = await self.get_columns(self.sql, self.params)
already_enabled = [c["config"]["simple"] for c in self.get_configs()]
suggested_facets = []
for column in columns:

It's going to be easier to modify each facet definition rather than the calling code to pass in that limit.

@simonw
Copy link
Owner Author

simonw commented Aug 21, 2024

HUGE improvement:

CleanShot 2024-08-21 at 13 15 02@2x

@simonw
Copy link
Owner Author

simonw commented Aug 21, 2024

DateFacet doesn't need this, it already only checks the first 100:

# Does this column contain any dates in the first 100 rows?
suggested_facet_sql = """
select date({column}) from (
{sql}
) where {column} glob "????-??-*" limit 100;
""".format(
column=escape_sqlite(column), sql=self.sql
)

@simonw simonw closed this as completed in f28ff8e Aug 21, 2024
simonw added a commit that referenced this issue Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant