Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enforce_version_limit! assumes database records are returned in correct order #940

Closed
danbernier opened this issue Mar 27, 2017 · 3 comments

Comments

@danbernier
Copy link
Contributor

When setting PaperTrail.config.version_limit to N, the expectation is that, when an update creates the (N+1)th version of a record, the oldest versions for that record will be deleted.

This behavior is implemented in PaperTrail::VersionConcern#enforce_version_limit!, which doesn't order the versions by created_at, so it deletes the last versions based on the order the database returns them. This is very often insertion order, but it's often some other order, based on whatever is efficient for the database. You can verify this by calling to_sql on the relation that's used, eg, sibling_versions.to_sql. You get this (reformatted with whitespace):

SELECT "versions".*  FROM "versions" 
WHERE "versions"."item_type" = 'Widget' 
  AND "versions"."item_id" = 1

This behavior - the database returning records in a different order - is hard to reproduce reliably, but is a generally-understood property of RDBMs, so I felt justified in hacking to simulate it. While investigating this issue and working on a fix, I wrote a spec to verify that #enforce_version_limit! deletes the wrong records when they're returned out-of-insertion-order. When I noticed your bug-report template requires a reproduction, I reused that spec (only a minor port from rspec to minitest).

I'll submit a PR with the spec, and the change that fixes it (by adding an ORDER clause), shortly. The new SQL will be:

SELECT "versions".* FROM "versions" 
WHERE "versions"."item_type" = 'Widget' 
  AND "versions"."item_id" = 1  
ORDER BY "versions"."created_at" ASC
# Use this template to report PaperTrail bugs.
# Please include only the minimum code necessary to reproduce your issue.
require "bundler/inline"

# STEP ONE: What versions are you using?
gemfile(true) do
  ruby "2.3.1"
  source "https://rubygems.org"
  gem "activerecord", "4.2.7.1"
  gem "minitest", "5.9.0"
  gem "paper_trail", "5.2.0", require: false
  gem "sqlite3"
  gem "timecop"
end

require "active_record"
require "minitest/autorun"
require "logger"
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = nil
ActiveRecord::Schema.define do
  # STEP TWO: Define your tables here.
  create_table :widgets, force: true do |t|
    t.text :name, null: false
    t.timestamps null: false
  end

  create_table :versions do |t|
    t.string :item_type, null: false
    t.integer :item_id, null: false
    t.string :event, null: false
    t.string :whodunnit
    t.text :object, limit: 1_073_741_823
    t.text :object_changes, limit: 1_073_741_823
    t.integer :transaction_id
    t.datetime :created_at
  end
  add_index :versions, [:item_type, :item_id]
  add_index :versions, [:transaction_id]

  create_table :version_associations do |t|
    t.integer  :version_id
    t.string   :foreign_key_name, null: false
    t.integer  :foreign_key_id
  end
  add_index :version_associations, [:version_id]
  add_index :version_associations, [:foreign_key_name, :foreign_key_id],
    name: "index_version_associations_on_foreign_key"
end
ActiveRecord::Base.logger = Logger.new(STDOUT)
require "paper_trail/config"

# STEP THREE: Configure PaperTrail as you would in your initializer
PaperTrail::Config.instance.track_associations = true

require "paper_trail"

# STEP FOUR: Define your AR models here.
class Widget < ActiveRecord::Base
  has_paper_trail
end

# STEP FIVE: Please write a test that demonstrates your issue.
class BugTest < ActiveSupport::TestCase
  def test_1
    epoch = DateTime.new(2017, 1, 1)

    # Turn off the version_limit, and make a bunch of versions.
    PaperTrail.config.version_limit = nil

    widget = Timecop.freeze(epoch) do
      Widget.create(name: 'Starting Name')
    end

    Timecop.freeze(epoch + 1.hours) { widget.update_attributes(name: 'Name 1') }
    Timecop.freeze(epoch + 2.hours) { widget.update_attributes(name: 'Name 2') }
    Timecop.freeze(epoch + 3.hours) { widget.update_attributes(name: 'Name 3') }
    Timecop.freeze(epoch + 4.hours) { widget.update_attributes(name: 'Name 4') }
    Timecop.freeze(epoch + 5.hours) { widget.update_attributes(name: 'Name 5') }

    # Here comes the crazy part.
    #
    # We want the database to return PaperTrail::Versions out of order. So
    # we'll delete what's there, shuffle them, and re-create them, but with
    # the same data, including the created_at timestamps. This will simulate
    # what happens when the database optimizes a query, and returns the
    # records in a different order than they were inserted in.
    # Load the records:
    versions = PaperTrail::Version.where(item: widget).not_creates.to_a

    # Delete the records, but don't let the in-memory objects know.
    PaperTrail::Version.where(id: versions.map(&:id)).delete_all

    # Are the versions deleted?
    assert_equal(1, Widget.find(widget.id).versions.count)
    # Yes: only the create is left.

    # Shuffle them, and re-create them.
    versions.shuffle.each do |version|
      Timecop.freeze(version.created_at) do
        PaperTrail::Version.create!(
          event: version.event,
          item_type: version.item_type,
          item_id: version.item_id,
          object: version.object,
        )
      end
    end
    assert_equal(6, Widget.find(widget.id).versions.count)

    # Sorting by ID should match insertion order. We can be confident that
    # the records are out of order, if sorting by ID doesn't match sorting by
    # created_at.
    ids_and_dates = PaperTrail::Version.
      where(item: widget, event: 'update').
      pluck(:id, :created_at)
    assert_not_equal(
      ids_and_dates.sort_by(&:first), 
      ids_and_dates.sort_by(&:last))

    # Now, we've recreated the scenario where we can accidentally clean up
    # the wrong versions. Re-enable the version_limit, and make one more
    # wafer-thin update, to trigger the clean-up:
    PaperTrail.config.version_limit = 3
    Timecop.freeze(epoch + 6.hours) do
      widget.update_attributes(name: 'Mr. Creosote')
    end

    # Verify that we have fewer versions:
    assert_equal(4, widget.reload.versions.count)

    # Exclude the create, because the create will return nil for `#reify`.
    last_names = widget.versions.not_creates.map(&:reify).map(&:name)
    assert_not_equal(['Name 3', 'Name 4', 'Name 5'], last_names)
    # No matter what order the version records are returned it, we should
    # always keep the most-recent changes.
  end
end
@jaredbeck
Copy link
Member

Thanks, Dan. This all sounds good.

  • The query in enforce_version_limit! should change as you have described (adding ORDER BY "versions"."created_at" ASC), but
  • the SQL generated by sibling_versions should not change

I'll look for your PR, thanks.

Heroic effort on the bug report, btw. ⭐️ Not an easy thing to simulate.

@danbernier
Copy link
Contributor Author

Thanks! I'll follow up with you on the PR.

@jaredbeck
Copy link
Member

jaredbeck commented Mar 28, 2017

Closed by #941, squashed into 300a16c

aried3r pushed a commit to aried3r/paper_trail that referenced this issue Dec 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants