Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow storing diffed changes in object_changes column #1093

Closed
wants to merge 11 commits into from

Conversation

hashwin
Copy link
Contributor

@hashwin hashwin commented May 21, 2018

Based on the discussion in #1082

This allows you to store changes in the format specified by https://github.com/liufengyun/hashdiff
This is especially useful when dealing with serialized json columns in postgres.

For example, say you have a jsonb column as follows:

{
  custom_values: {
    name: 'abc'
    age: 5,
    tags: ['a', 'b']
  }
}

and it is changed to

{
  custom_values: {
    name: 'def'
    age: 5,
    tags: ['a', 'c']
  }
}

Previously the object changes column would have been stored as follows:

{
  object_changes: {
    custom_values: [
      {
        name: 'abc',
        age: 5,
        tags: ['a', 'b']
      },
      {
        name: 'def',
        age: 5,
        tags: ['a', 'c']
      }
    ]
  }
}

With this change, it will be stored as follows:

{
  object_changes: {
    custom_values: [
      ["~", [:name], "abc", "def"], ["-", [:tags, 1], "b"], ["+", [:tags, 1], "c"]
    ]
  }
}

As you can see, the keys that were not changed (i.e. in this case, age) will not appear in the object changes at all, thus increasing storage efficiency.

If you are on board with this, I can add tests and a migration path to this format.

@hashwin
Copy link
Contributor Author

hashwin commented May 21, 2018

Also, FWIW HashDiff supports "Patch/Unpatch" (https://github.com/liufengyun/hashdiff#patch) which could possibly be used for reification, removing the need for an object column, which could be used to solve the problem described here: #1046.

Though this would probably entail a much more holistic change in the way paper_trail works. Just a thought, definitely don't have the time to work on that one 😄

@jaredbeck
Copy link
Member

Hi Ashwin,

If you'd like to continue to work on this, I think the first thing we should discuss is your proposed patch format.

Thanks for the clear PR description, I now understand your recommended patch format.

What do you think about using a plugin design? Something like this:

PaperTrail.config.object_diff_adapter = PaperTrail::ObjectDiffAdapters::HashDiff.new

# lib/paper_trail/record_trail.rb
def recordable_object_changes(changes)
  if PaperTrail.config.object_diff_adapter.present?
    changes = PaperTrail.config.object_diff_adapter.diff(changes)
  end
  # ...
end

We would also use the adapter in VersionConcern#load_changeset.

config.object_changes_adapter would be another possible name.

So, the adapter interface would have at least three methods:

  1. Given the a hash of changes (as is currently passed to recordable_object_changes) return something that can be assigned to the object_changes attribute (as we currently do in recordable_object_changes).
  2. Whatever VersionConcern#load_changeset needs
  3. Implement where_object_changes, just as Queries::Versions::WhereObjectChanges currently does.

That way, the new adapter (and others to come) can live in separate gems. The gem should probably be named paper_trail-hashdiff, per the rubygems.org Name your gem guide. (See the "Use dashes for extensions" section).

@jaredbeck
Copy link
Member

Hi Ashwin, Are you still working on this? Want me to keep it open?

@hashwin
Copy link
Contributor Author

hashwin commented May 31, 2018

@jaredbeck I have updated based on your suggestions. Right now I'm not 100% sure about details within custom attribute serialization/deserialization as I have not gone too deep into load_changeset. The method already works fine if there is no custom serialization.

I think this could be moved forward and people who are doing the following could use it:

  1. Not using any Custom Attribute serialization
  2. Using a jsonb column for object_changes

What do you think?

@jaredbeck
Copy link
Member

Hi Ashwin, thanks for the update.

I see you have added a file hash_diff_adapter.rb. I was hoping you would put that in a separate gem, named paper_trail-hashdiff. What do you think?

.. I'm not 100% sure about details within custom attribute serialization/deserialization as I have not gone too deep into load_changeset. The method already works fine if there is no custom serialization.

I think load_changeset must support the adapter, otherwise the semi-popular changeset method will not work, right?

When you say "custom attribute", is that ActiveRecord::Attributes::ClassMethods#attribute?
I think it's OK for the paper_trail-hashdiff gem to lack support for that. If someone needs it, they can build it later.

I think this could be moved forward and people who are doing the following could use it: .. Using a jsonb column for object_changes

I am OK with the paper_trail-hashdiff gem only supporting jsonb, but the code in paper_trail itself needs to support all three data types:

  1. WhereObjectChanges#text - JSON in text column
  2. WhereObjectChanges#json - actual json column
  3. WhereObjectChanges#jsonb - jsonb column

So far, it looks like you have only implemented jsonb. Can you add adapter support to all three please?

@hashwin
Copy link
Contributor Author

hashwin commented Jun 1, 2018

Hi @jaredbeck
I've made the requested changes and extracted the adapter into a gem here: https://rubygems.org/gems/paper_trail_hashdiff

@jaredbeck
Copy link
Member

Hi Ashwin, This is getting close.

Have we chosen the best name for this config option? Do you like object_diff_adapter or object_changes_adapter or something else?

You can name your gem whatever you want, of course, but the correct spelling according to rubygems.org is paper_trail-hashdiff (with a hyphen), as I said above.

The gem should probably be named paper_trail-hashdiff, per the rubygems.org Name your gem guide. (See the "Use dashes for extensions" section).

Please add to the readme:

  • Section 6.c.1., explaining how to install your adapter
  • Section 6.c.2., explaining how to write a different adapter (specify the interface)

Please add an entry to the changelog under Unreleased -> Added.

@hashwin hashwin changed the title [WIP] Allow storing diffed changes in object_changes column Allow storing diffed changes in object_changes column Jun 4, 2018
@hashwin
Copy link
Contributor Author

hashwin commented Jun 4, 2018

Hi Jared, I have updated with the changes you requested.

@jaredbeck
Copy link
Member

Hi Jared, I have updated with the changes you requested.

Hi Ashwin. Everything looks great. However, I just realized we don't have any tests. I'm sorry I didn't mention tests earlier. It should be OK to have some very simple tests. If you want to have a simple little adapter in spec/support, you can do that, or you could use a mock adapter. I'm happy with either.

@jaredbeck
Copy link
Member

I fixed the stack overflow in version_spec; closing via #1102, preserving your authorship. Thanks for your work on this!

@jaredbeck jaredbeck closed this Jun 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants