Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix deadlock #1371

Merged
merged 6 commits into from
Aug 5, 2016
Merged

Conversation

ThomasWaldmann
Copy link
Member

@ThomasWaldmann ThomasWaldmann commented Jul 24, 2016

see #1220 + commit comments.

The deadlock is fixed already.

But: needed a RPC API change to add the exclusive param. :-(

Also: Thinking about whether we maybe rather want a safe default for exclusive.
Currently it is usually False and you need to give exclusive=True - guess the opposite way would be safer, so one does not forget requesting an exclusive lock.

Nothing really bad would happen if one forgets though, the code has an automatic lock upgrade to an exclusive lock.

The lock upgrade/downgrade code was kept to support old clients.
Current code shall not use it, it is deprecated.

@codecov-io
Copy link

codecov-io commented Jul 24, 2016

Current coverage is 82.20% (diff: 89.18%)

Merging #1371 into 1.0-maint will increase coverage by 0.14%

@@           1.0-maint      #1371   diff @@
===========================================
  Files             14         14          
  Lines           4433       4440     +7   
  Methods            0          0          
  Messages           0          0          
  Branches         776        777     +1   
===========================================
+ Hits            3638       3650    +12   
+ Misses           569        566     -3   
+ Partials         226        224     -2   

Sunburst

Powered by Codecov. Last update 976925c...33e3348

@textshell textshell mentioned this pull request Jul 25, 2016
9 tasks
@ThomasWaldmann ThomasWaldmann force-pushed the fix-deadlock branch 2 times, most recently from 2c0333e to 75a9378 Compare July 26, 2016 23:02
@ThomasWaldmann
Copy link
Member Author

From IRC discussion:

  • need to keep lock upgrade on txn start, for compat with older clients (== if there is no excl. lock already, do an upgrade)
  • thus: need to keep lock upgrade/downgrade code in borg 1.0, but deprecate it (emit warning about old client?)
  • keep all other changes as is, so new clients based on this code will not use lock upgrade

@ThomasWaldmann ThomasWaldmann changed the title WIP: fix deadlock fix deadlock Jul 28, 2016
@ThomasWaldmann
Copy link
Member Author

any comments about the exclusive default value? change it? in 1.0? in 1.1?

@ThomasWaldmann
Copy link
Member Author

please test / review, this is the last essential issue to do for 1.0.7 milestone.

@enkore
Copy link
Contributor

enkore commented Aug 2, 2016

I'll review this offline

self.id = self.call('open', self.location.path, create, lock_wait, lock, append_only)
except self.RPCError as err:
if err.remote_type == 'TypeError':
raise self.NoAppendOnlyOnServer() from err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this? Silently ignoring Repository settings on create doesn‘t sound nice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's now either both exclusive and append_only params (new api) or neither of them (old api).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what i meant was that the old api code path should do:

if append_only:
    raise self.NoAppendOnlyOnServer()

Thus keeping the safe guard from the original PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, see new commit.

@textshell
Copy link
Member

textshell commented Aug 2, 2016

For #1374 i either need this in enforcing mode (i.e. situation with a client and server that both have the iv parts needs to never upgrade on put) or i need to add independent locking (or lock upgrading) to #1374

with self.repository:
self.assert_raises(LockFailed, lambda: len(self.repository))
upgrade.assert_called_once_with()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this? I think this case is still relevant? i.e. we still need to upgrade, and we still need not to hit any asserts or other problems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't make much sense of this test.
I first completely removed the lock upgrading code.
Later I re-added it for older clients.
But the current code does not do lock upgrades.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it still does.

  • Do a borg list with a remote repo that had it’s last writer crash and no other access since then
  • client eventually calls Repository.get
  • which calls Repository.get_transaction_id()
  • which checks for crash by index_transaction_id != segments_transaction_id
  • and calls Repository.replay_segments
  • which calls self.prepare_txn
  • which needs to upgrade

This is the one legitimate case i can come up where an always read only operation (that correctly uses an non exclusive lock) can get into prepare_txn.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now accounted for in my understanding.

@textshell
Copy link
Member

Apart from the above this looks good.

Just noticed though that change-passphrase and migrate-to-repokey use Repository.save_key which is not covered by any lock. Not sure we care enough to make a separate ticket for that though.

@ThomasWaldmann
Copy link
Member Author

@textshell please verify if the last changeset is what you meant.

@textshell
Copy link
Member

Yes, that‘s what i meant. Can you please resurrect the test (or similar) for the lock upgrade on replay?
It seems the travis unhappiness is for unrelated reasons.

@ThomasWaldmann
Copy link
Member Author

Test resurrected and differentiated into 2 tests for old and new clients.

@enkore
Copy link
Contributor

enkore commented Aug 5, 2016

thomas/fix-deadlock
with_repository arg changes: ok
prepare_txn: ok
locking code: not much of an idea...
test_replay_of_readonly_repository: hmm, look into test spec, why must be removed

I see the last bit was already addressed :)

@ThomasWaldmann
Copy link
Member Author

@enkore guess the most delicate part is the compatibility.

@ThomasWaldmann
Copy link
Member Author

about lock upgrade/downgrade: I wanted to deprecate that, due to the obvious deadlock issues, but it seems that even with the new code we need the lock upgrade for somehow unexpectedly needing a write lock, e.g. when borg list finds a crashed repo and needs to do segment replay.

So I think I am removing the deprecation notes and replacing them with a deadlock warning.

if err.remote_type != 'TypeError':
raise
if append_only:
raise self.NoAppendOnlyOnServer()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k

@ThomasWaldmann
Copy link
Member Author

collapse / merge?

if we don't, we try to upgrade the lock.
this is to support old clients talking to a new server and also
to avoid bad consequences from coding mistakes for new clients.
@enkore
Copy link
Contributor

enkore commented Aug 5, 2016

yes

@textshell
Copy link
Member

wrt collapse: I would at least keep the first two commits uncollapsed. Renames are alsways better in a separate commit. looks good now

lock upgrading is troublesome / may deadlock, do not advertise it.
…eplay

old clients use self.exclusive = None and do a read->write lock upgrade when needed.
new clients use self.exclusive = True/False and never upgrade.

replay fakes an old client by setting self.exclusive = None to get a lock upgrade if needed.
@ThomasWaldmann ThomasWaldmann merged commit 18654ea into borgbackup:1.0-maint Aug 5, 2016
@ThomasWaldmann ThomasWaldmann deleted the fix-deadlock branch August 5, 2016 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants