Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregation is not compatible with order_by on the Postgres backend #10321

Closed
2 tasks done
radeusgd opened this issue Jun 19, 2024 · 7 comments · Fixed by #10677
Closed
2 tasks done

Aggregation is not compatible with order_by on the Postgres backend #10321

radeusgd opened this issue Jun 19, 2024 · 7 comments · Fixed by #10677
Assignees
Labels
--bug Type: bug -libs Libraries: New libraries to be implemented l-postgresql

Comments

@radeusgd
Copy link
Member

radeusgd commented Jun 19, 2024

  • Try fix by hand, outside Enso
  • □ Graceful error
  • Fix the problem

While working on #10319 I tried introducing deterministic ordering into my input tables, by adding order_by with an 'index' column.

I have noticed that concat aggregate breaks if the table was ordered before.

Repro (e.g. in REPL):

c = Database.connect (Postgres ...) # put your Postgres connection details here
t = (Table.new [["X", [1,2,3]], ["Y", ['a', 'b', 'c']]]).select_into_database_table c "foobar" temporary=True
t.order_by "X" . print
r_unsorted = t.aggregate [Aggregate_Column.Concatenate "Y"]
r_unsorted.print
r_sorted = t.order_by "X" . aggregate [Aggregate_Column.Concatenate "Y"]
IO.println r_sorted
IO.println r_sorted.print

Expected behaviour

Both sorted and unsorted version should work and return a new table. The unsorted version may have an unspecified ordering in the concatenated string.

Actual behaviour

The line IO.println r_sorted.print currently prints:

(Error: There was an SQL error: ERROR: column "foobar.X" must appear in the GROUP BY clause or be used in an aggregate function
  Pozycja: 214. [Query was: SELECT "foobar"."Concatenate Y" AS "Concatenate Y" FROM (SELECT (?) || string_agg(CASE WHEN "foobar"."Y" IS NULL THEN '' ELSE ("foobar"."Y") END, ?) || (?) AS "Concatenate Y" FROM "foobar" AS "foobar" ORDER BY "foobar"."X" ASC NULLS FIRST) AS "foobar" LIMIT 11])
@radeusgd radeusgd added --bug Type: bug -libs Libraries: New libraries to be implemented l-postgresql labels Jun 19, 2024
@radeusgd
Copy link
Member Author

radeusgd commented Jun 19, 2024

We probably need to find the Context.orders and add an OVER clause to the aggregate to make Postgres happy.

Not sure how this will work for more complex nested queries.

radeusgd added a commit that referenced this issue Jun 19, 2024
@jdunkerley jdunkerley self-assigned this Jun 25, 2024
@GregoryTravis GregoryTravis assigned radeusgd and unassigned jdunkerley Jul 23, 2024
@GregoryTravis GregoryTravis moved this from ❓New to 📤 Backlog in Issues Board Jul 23, 2024
@radeusgd
Copy link
Member Author

Trying to debug - the SQL query is following:

> r_sorted.to_sql
>>> SQL_Statement SELECT "foobar"."Concatenate Y" AS "Concatenate Y" FROM (SELECT (?) || string_agg(CASE WHEN "foobar"."Y" IS NULL THEN '' ELSE ("foobar"."Y") END, ?) || (?) AS "Concatenate Y" FROM "foobar" AS "foobar" ORDER BY "foobar"."X" ASC NULLS FIRST) AS "foobar" with values [, , ]

@radeusgd
Copy link
Member Author

So the problem lies in the CONCATENATE and ORDER BY being in a single SELECT query.

I have rephrased the query:

SELECT "59dbd5a4-04c2-411c-a0dc-db37bd0f6435"."Concatenate Y" AS "Concatenate Y" FROM (SELECT "foobar"."Concatenate Y" AS "Concatenate Y" FROM (SELECT string_agg(CASE WHEN "foobar"."Y" IS NULL THEN '' ELSE ("foobar"."Y") END, '') AS "Concatenate Y" FROM (SELECT * FROM "foobar" AS "foobar" ORDER BY "foobar"."X" ASC NULLS FIRST) AS foobar) AS "foobar") AS "59dbd5a4-04c2-411c-a0dc-db37bd0f6435"

(also removed the (?) interpolations for simplicity)
now such query works:

> c.read 'SELECT "foobar"."Concatenate Y" AS "Concatenate Y" FROM (SELECT string_agg(CASE WHEN "foobar"."Y" IS NULL THEN \'\' ELSE ("foobar"."Y") END, \'\') AS "Concatenate Y" FROM (SELECT * FROM "foobar" AS "foobar" ORDER BY "foobar"."X" ASC NULLS FIRST) AS foobar) AS "foobar"' . print
   | Concatenate Y
---+---------------
 0 | abc

>>> Nothing

@radeusgd
Copy link
Member Author

radeusgd commented Jul 25, 2024

The investigation yields the following observations:

  1. We can detect the problematic situation by checking context.orders.not_empty.
  • Currently check_aggregate_support does not have access to the current Context, but there's no reason it couldn't.
  1. We can provide a quick-fix to this issue by adding another layer of subquery in ordering - if the ORDER BY is 'pushed down' to a separate subquery, it will work again as demonstrated by the example query above.
    • But this increases the complexity of the queries. We should try to keep that limited. Ideally we'd only do this when needed and not everywhere - for a simple SELECT a yet another layer of nesting when ORDER BY is added is a bit wasteful and makes the queries huge and hard to read/debug.
    • To avoid doing this every time, instead of adding the sub-query in sort, we could add it in aggregate - we may add a flag that the dialect specifies to tell if it needs such subquery or not.

@radeusgd
Copy link
Member Author

I also was hoping that adding OVER(ORDER BY ...) may be the remedy here, but it seems it actually does not work as expected as it is keeping all entries instead of aggregating:

> c.query 'SELECT "foobar"."Concatenate Y" AS "Concatenate Y" FROM (SELECT string_agg(CASE WHEN "foobar"."Y" IS NULL THEN \'\' ELSE ("foobar"."Y") END, \'\') OVER(ORDER BY "foobar"."X") AS "Concatenate Y" FROM "foobar" ORDER BY "foobar"."X" ASC NULLS FIRST) AS "foobar"' . print
 Concatenate Y
---------------
 a
 ab
 abc

@radeusgd radeusgd changed the title Concatenate aggregation is not compatible with order_by on the Postgres backend Aggregation is not compatible with order_by on the Postgres backend Jul 25, 2024
@radeusgd
Copy link
Member Author

I just checked and this is not specific to Concatenate - any aggregation preceded by sort fails on Postgres - even simple Count:

> t.sort "X" . aggregate [Aggregate_Column.Sum "X"] . print
(Error: There was an SQL error: ERROR: column "foobar.X" must appear in the GROUP BY clause or be used in an aggregate function
  Pozycja: 138. [Query was: SELECT "foobar"."Sum X" AS "Sum X" FROM (SELECT CAST(SUM("foobar"."X") AS numeric(1000,0)) AS "Sum X" FROM "foobar" AS "foobar" ORDER BY "foobar"."X" ASC NULLS FIRST) AS "foobar" LIMIT 11])
> t.sort "X" . aggregate [Aggregate_Column.Count] . print
(Error: There was an SQL error: ERROR: column "foobar.X" must appear in the GROUP BY clause or be used in an aggregate function
  Pozycja: 104. [Query was: SELECT "foobar"."Count" AS "Count" FROM (SELECT COUNT(*) AS "Count" FROM "foobar" AS "foobar" ORDER BY "foobar"."X" ASC NULLS FIRST) AS "foobar" LIMIT 11])

Indeed it does not seem right to ORDER BY a query that was aggregated and will yield just one row. We need to push the ORDER BY into a subquery.

@radeusgd radeusgd moved this from 📤 Backlog to 🔧 Implementation in Issues Board Jul 25, 2024
radeusgd added a commit that referenced this issue Jul 25, 2024
@radeusgd radeusgd moved this from 🔧 Implementation to 👁️ Code review in Issues Board Jul 25, 2024
jdunkerley pushed a commit that referenced this issue Jul 26, 2024
* add test for #10321

* wrap in subquery if ordering was present to fix the bug

* adding one more test just to be sure

* fix import
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Jul 26, 2024
jdunkerley pushed a commit that referenced this issue Jul 26, 2024
* add test for #10321

* wrap in subquery if ordering was present to fix the bug

* adding one more test just to be sure

* fix import

(cherry picked from commit 46e8bab)
@enso-bot
Copy link

enso-bot bot commented Jul 26, 2024

Radosław Waśko reports a new STANDUP for yesterday (2024-07-25):

Progress: Final stuff with relative paths PR. Investigated the Postgres sort+aggregate issue - found a relatively simple fix and implemented it. It should be finished by 2024-07-25.

Next Day: Next day I will be working on the #9812 task. Now do work on types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
--bug Type: bug -libs Libraries: New libraries to be implemented l-postgresql
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants