Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Arel instead of String for AR Enumerator conditionals #456

Merged
merged 5 commits into from
Feb 14, 2024

Conversation

pedropb
Copy link
Contributor

@pedropb pedropb commented Feb 2, 2024

Problem statement

On our application we noticed some inconsistencies with the usage of Rails 7.1 and job-iteration.

For example:

enumerator_builder.active_record_on_batches(Model.all, cursor: cursor)

when iterating would result in queries like:

 WHERE (reconciled_transaction_fees.id > '238427110') ORDER BY reconciled_transaction_fees.id LIMIT 3
 WHERE (reconciled_transaction_fees.id > '304256793') ORDER BY reconciled_transaction_fees.id LIMIT 3
 WHERE (reconciled_transaction_fees.id > '421883103') ORDER BY reconciled_transaction_fees.id LIMIT 3

While reconciled_transaction_fees.id > '421883103' is not wrong for adapters like MySQL and Postgres, it relies on behaviour from those DBMS to convert from strings to integer and still use the indices.

While working on a BigQuery adapter for ActiveRecord, we noticed that BigQuery will raise when comparing numeric columns with strings.

Proposal

So instead of manipulating strings to build the paging conditional:
https://github.com/Shopify/job-iteration/blob/main/lib/job-iteration/active_record_batch_enumerator.rb#L97-L101

We propose using Rails Arel:

conditions = if @columns.size == @position.size
column.gt(@position[i])
else
column.gteq(@position[i])
end

which will emit queries without the quotes:

WHERE `reconciled_transaction_fees`.`id` > 238427110 ORDER BY `reconciled_transaction_fees`.`id` LIMIT 3
WHERE `reconciled_transaction_fees`.`id` > 304256793 ORDER BY `reconciled_transaction_fees`.`id` LIMIT 3
WHERE `reconciled_transaction_fees`.`id` > 421883103 ORDER BY `reconciled_transaction_fees`.`id` LIMIT 3

What's in this PR?

These changes affect:

  • enumerator_builder.active_record_on_records
  • enumerator_builder.active_record_on_batches

The change affects only the columns: optional argument.

  • columns: can still be passed as strings or symbols. Internally, the job-iteration code will convert the symbols to Arel attributes and manipulate them as Arel instead of strings.
    - (new!) columns: can also be passed as Arel attributes.

The Ugly

There are 2 main problems with the approach:

  1. Arel is a private API subject to change. I tested this with Rails 7 (the job-iteration unit tests) and 7.1 (my app's unit tests). I'm relying on CI to test against other builds.

  2. The new code does not support multi-column cursor when more than one table is used (cursor: ["products.id", "comments.id"]). While trying to add support to that, I created a test file and noticed that job-iteration code had a bug for multi-column cursors on different tables. This PR does not resolve the bug (job-iteration can cause infinite loop when using multi-column cursor in different tables with the same attribute #457 )

Copy link
Contributor

@nvasilevski nvasilevski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arel is a private API subject to change.

Fair but it feels like building raw SQL is even less stable especially when it comes to supporting multiple db adapters

(new!) columns: can also be passed as Arel attributes.

But I wonder if that's something we should still disallow. One thing is to rely on Arel internally but the other thing is to allow applications to pass arel attributes and promise that the gem will always (within a major release) maintain it. Perhaps it should actually be disallowed

The new code does not support multi-column cursor when more than one table is used

But it wasn't supported before, right? Or at least it wasn't working due to a bug, right? So I'm not too concerned about it as long as it doesn't break something that used to be working

Overall I think this is a reasonable change. @Shopify/rails would love to hear more opinions!

lib/job-iteration/active_record_cursor.rb Outdated Show resolved Hide resolved
Copy link
Contributor

@Mangara Mangara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall change makes sense to me. On the input, I agree with Nikita - using Arel internally in the cursor is fine (assuming we have good test coverage for it), but the interface for users of the gem should just accept strings and symbols.

@pedropb pedropb force-pushed the pb-use-arel-for-conditions branch from 5d18040 to 51d8f8e Compare February 10, 2024 00:24
@pedropb
Copy link
Contributor Author

pedropb commented Feb 10, 2024

Thanks for the feedback @nvasilevski @Mangara
I think it's a fair compromise to keep the Arel dependency internal to job-iteration and not expose/support it publicly.

Pushed commit 51d8f8e to drop the conditional that handled Arel inputs as suggested.

Copy link
Contributor

@Mangara Mangara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the new implementation. Could you add a test case that fails before this change but passes with it?

@pedropb
Copy link
Contributor Author

pedropb commented Feb 13, 2024

I'm happy with the new implementation. Could you add a test case that fails before this change but passes with it?

I can add a new unit test file and some test cases for #conditions, if that makes sense.

I want to call out that ActiveRecordCursor is marked as private class and the conditions method is protected. I'm mentioning this in case there are any strong opinions on testing patterns, because from a library perspective, testing if the query is emitted with quotes or not might not fit the grand-scheme of things.

@Mangara
Copy link
Contributor

Mangara commented Feb 13, 2024

If it's important to prevent regressions, I'm fine with testing the precise SQL queries we generate. Ideally, we'd test the public interface of ActiveRecordEnumerator instead of ActiveRecordCursor or the conditions method directly.

@pedropb pedropb force-pushed the pb-use-arel-for-conditions branch from b43d8af to f71d0ac Compare February 14, 2024 00:09
@pedropb
Copy link
Contributor Author

pedropb commented Feb 14, 2024

@Mangara, as suggested:

  • assert_sql helper introduced in ff5935d
  • and test case for the integer conditional verification that fails on main: f71d0ac

@Mangara
Copy link
Contributor

Mangara commented Feb 14, 2024

Oh, before we merge - could you add a note to the change log?

@pedropb pedropb force-pushed the pb-use-arel-for-conditions branch from 868a50e to f2392df Compare February 14, 2024 22:05
@pedropb
Copy link
Contributor Author

pedropb commented Feb 14, 2024

Change log added!
Merging.

@pedropb pedropb merged commit a01977d into main Feb 14, 2024
45 checks passed
@pedropb pedropb deleted the pb-use-arel-for-conditions branch February 14, 2024 23:59
Array(relation.primary_key).map { |pk| "#{relation.table_name}.#{pk}" }
end
def initialize(relation, columns, position = nil)
@columns = columns
self.position = Array.wrap(position)
raise ArgumentError, "Must specify at least one column" if columns.empty?
if relation.joins_values.present? && [email protected]? { |column| column.to_s.include?(".") }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still necessary, now that it's an arel attribute being passed in instead of a string?

Copy link
Contributor Author

@pedropb pedropb May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic was left untouched because on the presence of join conditions, the ORDER by columns can become ambiguous and fail the queries.
The .all? clause was a constraint to prevent this failure mode.

Also FWIW, the changes in this PR were reverted in #483 (v1.5.1). There's more context about the bug it introduced in #483.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants