[Error] select_for_update happening when using replica(read-only) and default(write-only) DB. #558

abxsantos · 2021-05-19T02:09:17Z

I've been getting a select_for_update cannot be used outside of a transaction error when using a replica set in my applications.

Here are my settings

DATABASES = {
    "default": {
        "ENGINE": os.getenv("DB_ENGINE"),
        "NAME": os.getenv("DB_NAME"),
        "USER": os.environ.get("DB_USER"),
        "HOST": os.environ.get("DB_HOST"),
        "PORT": os.environ.get("DB_PORT"),
        "PASSWORD": os.environ.get("DB_PASSWORD"),
    },
    "replica": {
        "ENGINE": os.getenv("DB_ENGINE"),
        "NAME": os.getenv("DB_NAME_REPLICA"),
        "USER": os.environ.get("DB_USER_REPLICA"),
        "HOST": os.environ.get("DB_HOST_REPLICA"),
        "PORT": os.environ.get("DB_PORT_REPLICA"),
        "PASSWORD": os.environ.get("DB_PASSWORD_REPLICA"),
    }
}

Q_CLUSTER = {
    "name": "myscheduler",
    "orm": "default",  # Use Django's ORM + database for broker
    ....
}

My database router currently uses the replica only to read and the default just to write.

class DatabaseRouter:

    def db_for_read(self, model, **hints):
        """Always read from REPLICA database"""
        return "replica"

    def db_for_write(self, model, **hints):
        """Always write to DEFAULT database"""
        return "default"
        
    def allow_relation(self, obj1, obj2, **hints):
        """Objects from REPLICA and DEFAULT are de same, then True always"""
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        """Only DEFAULT database"""
        return db == "default"

I've been digging through the code (awesome documenation btw!) and found that during the task creation in the scheduler function it forces the database used in the transaction block.

# Here it seems to force the usage, in this case it will be the replica database.
with db.transaction.atomic(using=Schedule.objects.db):  
            for s in (
                Schedule.objects.select_for_update()
                .exclude(repeats=0)
                .filter(next_run__lt=timezone.now())
                .filter(db.models.Q(cluster__isnull=True) | db.models.Q(cluster=Conf.PREFIX))
            ):

Is there a reason for this behaviour? I couldn't really understand why, since when removing the using from the transaction block made it work like a charm, reading only from replica and writing only on default.

Dependencies

python = 3.9.5
Django = 3.1.7
psycopg2-binary = "2.8.6
django-q = 1.3.6

The text was updated successfully, but these errors were encountered:

Koed00 · 2021-05-19T08:19:21Z

I think this was added by someone who was using a database router to store the Schedule objects in a different database.
I guess it brakes your setup with replicated databases, since we have no tests for that.

Probably the easiest fix would be to introduce a config setting for this. I'm open to ideas and PR's.

abxsantos · 2021-05-19T10:56:27Z

Nice @Koed00! I'll take a better look and propose something.

Right now I've added this repo to understand what's happening with multiple databases.
Do you think it's worth it to add a docker service to test these scenarios as a first step?

Lizards · 2021-05-19T18:32:29Z

We are also experiencing this issue. As a temporary workaround, we modified our database router to always use the write db for Django-Q's Schedule model:

class PrimaryReplicaRouter(object):
    def db_for_read(self, model, **hints):
        if model._meta.app_label == "django_q" and model.__name__ == "Schedule":
            return "default"
        return "replica"

abxsantos · 2021-05-20T11:00:49Z

Although modifying the router works, I would suggest adding a has_replica boolean attribute in the Django Q settings.

In the conf.py, add a HAS_REPLICA attribute.

# conf.py

class Conf:
    """
    Configuration class
    """
    # other settings...
    # Support for read/write replicas
    HAS_REPLICA = conf.get("has_replica", False)

And the scheduler function in the cluster.py will now look like this:

# cluster.py
def scheduler(broker: Broker = None):
    """
    Creates a task from a schedule at the scheduled time and schedules next run
    """
    if not broker:
        broker = get_broker()
    close_old_django_connections()
    try:
        # addition of database_to_use
        database_to_use = {"using": Schedule.objects.db} if not Conf.HAS_REPLICA else {}
        with db.transaction.atomic(**database_to_use):
            for s in (
                Schedule.objects.select_for_update()
                .exclude(repeats=0)
                .filter(next_run__lt=timezone.now())
            ):

When we use read/write replicas we don't specify a database to use in the transaction. The transaction will be made without any problems, since the router will correctly use the write database when a write operation is made (in this case the select_for_update will always be made using the write database, which must be configured in the orm setting). All the other read operations will be made using the read database, just as intended.

The scenarios where some apps needs to write into one database and other apps into another, like it was suggested in a previous modification, would also works correctly when applying the with db.transaction.atomic(using=Schedule.objects.db)

That way the current usage would not break and it would allow the usage of read only replicas via a setting in the configuration.

What do you all think? Does it make sense to add the has_replica in the Django Q settings?

If it does can I open a PR suggesting these additions and tests covering some scenarios around the multiple databases?

edthrn · 2021-06-26T19:12:59Z

I think this was added by someone who was using a database router to store the Schedule objects in a different database.
I guess it brakes your setup with replicated databases, since we have no tests for that.

Probably the easiest fix would be to introduce a config setting for this. I'm open to ideas and PR's.

And I think @Koed00 was talking about me 😉

The fix proposed in #561 by @abxsantos put us back to square one regarding the problem described in #434: using conf.ORM is not what we need to route django-q to the specific database where Schedule is stored.

In my case, I do not use the ORM broker at all. However, I do use another db to store any data related to django-q administration, which means that I still need to be explicit about which database I'm using. This was done by #440.

Ironically, the code snippet proposed above would probably have avoided this regression:

# ok
database_to_use = {"using": Schedule.objects.db} if not Conf.HAS_REPLICA else {}

But what was implemented is:

# not ok
database_to_use = {"using": Conf.ORM} if not Conf.HAS_REPLICA else {}

which is very different.

EDIT

@abxsantos it looks like you tried to prevent this regression with the test cases you added in the PR... Do you think there is something missing? I did not review the new test cases in detail, but maybe you only considered the case where the ORM broker was configured?

abxsantos · 2021-06-28T22:52:57Z

Hello @edthrn !

From what I've understand you need to write in a specific database without the ORM being set as broker right?

The scenarios I've proposed looked only at the ORM being set. Thus the regression.
From what I see, the database_to_use = {"using": Schedule.objects.db} if not Conf.HAS_REPLICA else {} as you've proposed should work!

I've seen that you've opened a PR with the Schedule.objects.db, but I think adding test cases to prevent this happening again can be valid.
I can take a better look and improve the test scenarios, after your merge, what do you think?

edthrn · 2021-06-28T23:39:06Z

If you can improve the test scenarios so my fix passes the current tests + (why not) adding a test case to prevent this regression happening again, it would be great. Thank you.

I gave you the authorization to push to my forked repo so we can work on this PR #587 together.

Thanks

abxsantos · 2021-06-29T11:33:39Z

Great! I'll take a look.

msabatier · 2021-07-02T14:09:08Z

Hello @edthrn, @abxsantos ,

I believe that the issue is in fact that Schedule.objects.db returns the database for reading Schedule objects, not writing them. This works in @edthrn use case (same database for read and write) but breaks @abxsantos use case (which is also mine).

The best solution would be to use database_to_use = {"using": db.router.db_for_write(Schedule)} and in fact there wouldn't be a need for the Conf.HAS_REPLICA flag that feels a little hacky.

I have verified this with Django 3.1.12 and django-q 1.3.9.

I think the same should be applied for Success objects in

def save_task(task, broker: Broker):

abxsantos changed the title ~~Problems when using replica DB.~~ Problems when using replica(read-only) and default(write-only) DB. May 19, 2021

abxsantos changed the title ~~Problems when using replica(read-only) and default(write-only) DB.~~ [Error] select_for_update happening when using replica(read-only) and default(write-only) DB. May 19, 2021

abxsantos mentioned this issue May 23, 2021

Feature/improves multiple databases support #561

Merged

4 tasks

abxsantos closed this as completed May 23, 2021

edthrn mentioned this issue Jun 26, 2021

Regression: select_for_update cannot be used outside of a transaction #586

Open

edthrn mentioned this issue Jul 10, 2021

Fix/ensures correct database support edthrn/django-q#1

Open

3 tasks

msabatier mentioned this issue Jan 8, 2023

Clarify which database connection should be used with the orm broker and fix related issues django-q2/django-q2#60

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Error] select_for_update happening when using replica(read-only) and default(write-only) DB. #558

[Error] select_for_update happening when using replica(read-only) and default(write-only) DB. #558

abxsantos commented May 19, 2021

Koed00 commented May 19, 2021

abxsantos commented May 19, 2021

Lizards commented May 19, 2021

abxsantos commented May 20, 2021

edthrn commented Jun 26, 2021 •

edited

Loading

abxsantos commented Jun 28, 2021 •

edited

Loading

edthrn commented Jun 28, 2021

abxsantos commented Jun 29, 2021

msabatier commented Jul 2, 2021

[Error] select_for_update happening when using replica(read-only) and default(write-only) DB. #558

[Error] select_for_update happening when using replica(read-only) and default(write-only) DB. #558

Comments

abxsantos commented May 19, 2021

Koed00 commented May 19, 2021

abxsantos commented May 19, 2021

Lizards commented May 19, 2021

abxsantos commented May 20, 2021

edthrn commented Jun 26, 2021 • edited Loading

abxsantos commented Jun 28, 2021 • edited Loading

edthrn commented Jun 28, 2021

abxsantos commented Jun 29, 2021

msabatier commented Jul 2, 2021

edthrn commented Jun 26, 2021 •

edited

Loading

abxsantos commented Jun 28, 2021 •

edited

Loading