Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-document transactions #2220

Closed
wants to merge 3 commits into from

Conversation

DavideTriso
Copy link

Q A
Type feature
BC Break no
Fixed issues none

Summary

This is a draft implementation to support the multi-document ACID transaction feature of MongoDB v4.

@alcaeus alcaeus marked this pull request as draft September 21, 2020 05:51
@alcaeus alcaeus changed the title Multi-document transactions (draft) Multi-document transactions Sep 21, 2020
Copy link
Member

@alcaeus alcaeus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving an initial review to steer the direction of the PR.

First of all, thanks for tackling this topic and getting the feature started! Looking at the changes, I think we can simplify the implementation a bit. I noticed that you chose to defer all postPersist, postUpdate, and postRemove operations until the transactions is completed. However, given that a rollback will leave the UnitOfWork in a potentially unusable state (as any error does at the moment), I'm not sure if we really need to go this far.

Looking at ORM, it always fires events right after an entity is persisted, even when the transaction might be rolled back later. This has the advantage of not producing different behaviour when transactions are enabled. Right now, we don't know what users may be doing in their lifecycle event subscribers, which may very well be things they shouldn't be doing (e.g. persisting other entities). Changing the timing of these events will do more harm than good, so I'd refrain from changing this.

Instead, I'd leave the entire commit workflow as is, except for wrapping the workflow in a transaction if the feature is enabled. Please note the inline comment I left below, as the current state may break things for users that manually pass a session (with a potentially running transaction) through the options.

Since you asked on Slack, a few notes on testing:

  • The current UnitOfWorkTest tests a few commit scenarios. Due to the test handling, you shouldn't add tests for transactional commits in this test. Instead, create a new test file.
  • Any test extending the Doctrine\ODM\MongoDB\Tests\BaseTest class can override the getConfiguration method to change the configuration for the implicitly created DocumentManager (including its UnitOfWork). This can be used to enable transactional commits.
  • To test that commit starts a transaction, there are two ways that should be combined.
    • For one, you can use a postPersist lifecycle event (which should be fired before the transaction is committed) and verify that the changes to be committed are not visible outside of the transaction.
    • Some tests use a command subscriber to log commands and perform additional checks. One of these tests is the CommitImprovementTest which uses the logger to assert against the number of commands. You can use the same class to assert against the actual commands that were run. In this case, you'll want to assert that all commands that are part of the transactional commit include an lsid field, denoting that they are part of a session.
  • When testing, you should also test the use-case of a user providing their own session as an option to $dm->flush(), including the case of passing a session with a running transaction. The behaviour in this case should be clearly defined and tested (see above).

private function executeCommitTransaction(array $options = []) : void
{
$session = $this->dm->getClient()->startSession();
$options['session'] = $session;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overrides any user-provided session. In my opinion, we should be more defensive here as we don't limit the kind of options users may pass:

  1. If a session is given in the options, use it. Otherwise, start a new session.
  2. If the session already has a running transaction, don't start a transaction. Apparently, the user wants to commit the transaction themselves, so we shouldn't interfere with it.
  3. If ODM started a transaction, commit it. Otherwise, ignore.
  4. If ODM created a session, close it. Otherwise, ignore.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alcaeus many thanks for your review and feedback!
If I understand you correctly, this is how sessions and transactions should be handled (untested code):

public function commit(array $options = []) : void
    {
        // Raise preFlush
        if ($this->evm->hasListeners(Events::preFlush)) {
            $this->evm->dispatchEvent(Events::preFlush, new Event\PreFlushEventArgs($this->dm));
        }

        // Compute changes done since last commit.
        $this->computeChangeSets();

        if (! ($this->documentInsertions ||
            $this->documentUpserts ||
            $this->documentDeletions ||
            $this->documentUpdates ||
            $this->collectionUpdates ||
            $this->collectionDeletions ||
            $this->orphanRemovals)
        ) {
            return; // Nothing to do.
        }

        $this->commitsInProgress++;
        if ($this->commitsInProgress > 1) {
            throw MongoDBException::commitInProgress();
        }
        try {
            if ($this->orphanRemovals) {
                foreach ($this->orphanRemovals as $removal) {
                    $this->remove($removal);
                }
            }

            // Raise onFlush
            if ($this->evm->hasListeners(Events::onFlush)) {
                $this->evm->dispatchEvent(Events::onFlush, new Event\OnFlushEventArgs($this->dm));
            }

            if($this->dm->getConfiguration()->getEnableCommitTransaction()) {
                $sessionStartedByUser     = isset($options['session']);
                $transactionStartedByUser = $sessionStartedByUser && $options['session']->isInTransaction();

                // Start a session if it was not started by the user
                if (! $sessionStartedByUser) {
                    $options['session'] = $this->dm->getClient()->startSession();
                }
               
                // Start a transaction if it was not started by the user
                if (! $transactionStartedByUser) {
                    $options['session']->startTransaction($options);
                }
                
                try {
                    $this->executeDbCommands($options);
                   
                    // Commit the transaction only if it was started by the ODM
                    if (! $transactionStartedByUser) {
                        $options['session']->commitTransaction();
                    }
                } catch (Exception $e) {
                    // Roll-back the transaction only if it was started by the ODM
                    if (! $transactionStartedByUser) {
                        $options['session']->abortTransaction();
                    } 
                    // Users that are handling the transaction themselves
                    // can catch this exception and execute their own roll-back logic
                    throw $e;

                } finally {
                    // End the session if it was started by the ODM
                    if (! $sessionStartedByUser) {
                        $options['session']->endSession();
                    }
                } 
               
            } else {
                $this->executeDbCommands($options);
            }
           
            // Raise postFlush
            if ($this->evm->hasListeners(Events::postFlush)) {
                $this->evm->dispatchEvent(Events::postFlush, new Event\PostFlushEventArgs($this->dm));
            }

            // Clear up
            $this->documentInsertions          =
            $this->documentUpserts             =
            $this->documentUpdates             =
            $this->documentDeletions           =
            $this->documentChangeSets          =
            $this->collectionUpdates           =
            $this->collectionDeletions         =
            $this->visitedCollections          =
            $this->scheduledForSynchronization =
            $this->orphanRemovals              =
            $this->hasScheduledCollections     = [];
        } finally {
            $this->commitsInProgress--;
        }
    }

private function executeDbCommands(array $options = []) : void
    {
        foreach ($this->getClassesForCommitAction($this->documentUpserts) as $classAndDocuments) {
            [$class, $documents] = $classAndDocuments;
            $this->executeUpserts($class, $documents, $options);
        }

        foreach ($this->getClassesForCommitAction($this->documentInsertions) as $classAndDocuments) {
            [$class, $documents] = $classAndDocuments;
            $this->executeInserts($class, $documents, $options);
        }

        foreach ($this->getClassesForCommitAction($this->documentUpdates) as $classAndDocuments) {
            [$class, $documents] = $classAndDocuments;
            $this->executeUpdates($class, $documents, $options);
        }

        foreach ($this->getClassesForCommitAction($this->documentDeletions, true) as $classAndDocuments) {
            [$class, $documents] = $classAndDocuments;
            $this->executeDeletions($class, $documents, $options);
        }
    }

Please let me know you thoughts, so I can go on with the implementation and update the PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Davide, in theory, this is exactly what ORM does and what we thought we could do here.

However, while analysing this in a little more detail, there are a few things that make this approach unfeasible until we refactor large portions of how UnitOfWork handles its writes in general. In order to explain I have to dive a bit into how transactions in MongoDB work, so please bear with me. As for resources, you may want to check out the Driver Transactions Specification, which defines how drivers expose transactions to the users. Based on feedback we received, there is also a Convenient API for Transactions, which removes the need for users to handle all errors that occur during the transaction. More on this later on.

The basics of course are: start a session, start a transaction, then commit or roll back the transaction. This is pretty much the same as with most relational databases, but the details hold a few surprises for us. For one, committing the transaction may yield errors. The specification defines two error categories (exposed to clients using error labels): A TransientTransactionError can be encountered running any command besides commitTransaction in a transaction. For example, a network issue while sending commands during a transaction would be such a transient error. The other category is UnknownTransactionCommitResult, which is an error that occurs while running commitTransaction.

The second error category means that the client can call commitTransaction again to re-attempt the commit operation. However, the TransientTransactionError means that the entire transaction should be aborted (rolled back) and re-attempted. Without looking at all the potential causes of such a TransientTransactionError, we can already see that ODM would have a small problem: ODM cleans up pending operations as they happen, which is fine without transactions. However, with transactions requiring us to be able to replay the entire transaction from start to finish, this kind of cleanup must not happen until after the transaction has been committed.

To add to our woes, this also has implications on events. Your original design deferred some events and cleanup until after the transaction was finished. The reason I originally suggested not deferring such events was to reduce the impact of transaction on the entire event system, but it turns out we need to make an even bigger change. All events, including pre events may be executed twice if we have to retry the entire transaction. This means that we need to exclude events from the transactional logic, triggering all pre events before starting the transaction, and deferring all post events until we've successfully completed the transaction. I'm not quite sure of the impact this would have on users and the code in general.

There is another issue with using transactions that I want to highlight: locking and conflicts. Consider the following logic:

$session1 = $manager->startSession();
$session2 = $manager->startSession();

$session1->startTransaction();
$session2->startTransaction();

$update1 = new MongoDB\Driver\BulkWrite();
$update1->update([], ['$inc' => ['count' => 1]]);
$manager->executeBulkWrite('test.items', $update1, ['session' => $session1]);

$update2 = new MongoDB\Driver\BulkWrite();
$update2->update([], ['$inc' => ['count' => 1]]);
$manager->executeBulkWrite('test.items', $update2, ['session' => $session2]);

Note that while I have put this in the same script, this could easily be part of separate processes on separate machines. The code above will produce an exception in the second line, as the command tries to modify a record that is already being modified by a different transaction:

PHP Fatal error:  Uncaught MongoDB\Driver\Exception\BulkWriteException: WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction. [...]

This shows that adding transactions to the current UnitOfWork is a dangerous endeavour: it will ensure an all-or-nothing approach to writes, but will also cause a significantly higher error rate in concurrent applications. Since you'll always have to assume that two requests will try to modify the same record in a transaction, I'm convinced that we have to rewrite the UnitOfWork before making transactions a feature of ODM. However, such a rewrite falls far outside of the scope of what I expect a contributor to be able to shoulder, as the changes go far beyond just the logic of a single method in UnitOfWork. Rather, this problem will require a large-scale refactoring of how UnitOfWork stores and handles document data, the associated change sets, and how it passes them on to persisters.

You are of course welcome to try and do this refactoring. Please let me know, as I'd like to keep a closer feedback loop than what we normally have through pull requests.

…tance"

This reverts commit 6bf710b, reversing
changes made to 9b8c4d5.
@stale
Copy link

stale bot commented Oct 31, 2020

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Stale Issues that have not seen any activity in 30 days label Oct 31, 2020
@alcaeus
Copy link
Member

alcaeus commented Nov 4, 2020

@DavideTriso I'm closing this PR as we'll have to refactor UnitOfWork on a larger scale to accommodate transactions. I appreciate your work on this, but with the issues I highlighted above we can't add support for transactions until we can ensure that this won't lead to data loss, which it does.

If you are up for taking a stab at UnitOfWork, please reach out to us on the Doctrine Slack to discuss changes and get some help.

@alcaeus alcaeus closed this Nov 4, 2020
@DavideTriso DavideTriso mentioned this pull request Nov 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale Issues that have not seen any activity in 30 days
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants