Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Error] select_for_update happening when using replica(read-only) and default(write-only) DB. #558

Closed
abxsantos opened this issue May 19, 2021 · 9 comments

Comments

@abxsantos
Copy link
Contributor

I've been getting a select_for_update cannot be used outside of a transaction error when using a replica set in my applications.

Here are my settings

DATABASES = {
    "default": {
        "ENGINE": os.getenv("DB_ENGINE"),
        "NAME": os.getenv("DB_NAME"),
        "USER": os.environ.get("DB_USER"),
        "HOST": os.environ.get("DB_HOST"),
        "PORT": os.environ.get("DB_PORT"),
        "PASSWORD": os.environ.get("DB_PASSWORD"),
    },
    "replica": {
        "ENGINE": os.getenv("DB_ENGINE"),
        "NAME": os.getenv("DB_NAME_REPLICA"),
        "USER": os.environ.get("DB_USER_REPLICA"),
        "HOST": os.environ.get("DB_HOST_REPLICA"),
        "PORT": os.environ.get("DB_PORT_REPLICA"),
        "PASSWORD": os.environ.get("DB_PASSWORD_REPLICA"),
    }
}

Q_CLUSTER = {
    "name": "myscheduler",
    "orm": "default",  # Use Django's ORM + database for broker
    ....
}

My database router currently uses the replica only to read and the default just to write.

class DatabaseRouter:

    def db_for_read(self, model, **hints):
        """Always read from REPLICA database"""
        return "replica"

    def db_for_write(self, model, **hints):
        """Always write to DEFAULT database"""
        return "default"
        
    def allow_relation(self, obj1, obj2, **hints):
        """Objects from REPLICA and DEFAULT are de same, then True always"""
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        """Only DEFAULT database"""
        return db == "default"

I've been digging through the code (awesome documenation btw!) and found that during the task creation in the scheduler function it forces the database used in the transaction block.

# Here it seems to force the usage, in this case it will be the replica database.
with db.transaction.atomic(using=Schedule.objects.db):  
            for s in (
                Schedule.objects.select_for_update()
                .exclude(repeats=0)
                .filter(next_run__lt=timezone.now())
                .filter(db.models.Q(cluster__isnull=True) | db.models.Q(cluster=Conf.PREFIX))
            ):

Is there a reason for this behaviour? I couldn't really understand why, since when removing the using from the transaction block made it work like a charm, reading only from replica and writing only on default.

Dependencies

  • python = 3.9.5
  • Django = 3.1.7
  • psycopg2-binary = "2.8.6
  • django-q = 1.3.6
@abxsantos abxsantos changed the title Problems when using replica DB. Problems when using replica(read-only) and default(write-only) DB. May 19, 2021
@abxsantos abxsantos changed the title Problems when using replica(read-only) and default(write-only) DB. [Error] select_for_update happening when using replica(read-only) and default(write-only) DB. May 19, 2021
@Koed00
Copy link
Owner

Koed00 commented May 19, 2021

I think this was added by someone who was using a database router to store the Schedule objects in a different database.
I guess it brakes your setup with replicated databases, since we have no tests for that.

Probably the easiest fix would be to introduce a config setting for this. I'm open to ideas and PR's.

@abxsantos
Copy link
Contributor Author

Nice @Koed00! I'll take a better look and propose something.

Right now I've added this repo to understand what's happening with multiple databases.
Do you think it's worth it to add a docker service to test these scenarios as a first step?

@Lizards
Copy link

Lizards commented May 19, 2021

We are also experiencing this issue. As a temporary workaround, we modified our database router to always use the write db for Django-Q's Schedule model:

class PrimaryReplicaRouter(object):
    def db_for_read(self, model, **hints):
        if model._meta.app_label == "django_q" and model.__name__ == "Schedule":
            return "default"
        return "replica"

@abxsantos
Copy link
Contributor Author

Although modifying the router works, I would suggest adding a has_replica boolean attribute in the Django Q settings.

In the conf.py, add a HAS_REPLICA attribute.

# conf.py

class Conf:
    """
    Configuration class
    """
    # other settings...
    # Support for read/write replicas
    HAS_REPLICA = conf.get("has_replica", False)

And the scheduler function in the cluster.py will now look like this:

# cluster.py
def scheduler(broker: Broker = None):
    """
    Creates a task from a schedule at the scheduled time and schedules next run
    """
    if not broker:
        broker = get_broker()
    close_old_django_connections()
    try:
        # addition of database_to_use
        database_to_use = {"using": Schedule.objects.db} if not Conf.HAS_REPLICA else {}
        with db.transaction.atomic(**database_to_use):
            for s in (
                Schedule.objects.select_for_update()
                .exclude(repeats=0)
                .filter(next_run__lt=timezone.now())
            ):

When we use read/write replicas we don't specify a database to use in the transaction. The transaction will be made without any problems, since the router will correctly use the write database when a write operation is made (in this case the select_for_update will always be made using the write database, which must be configured in the orm setting). All the other read operations will be made using the read database, just as intended.

The scenarios where some apps needs to write into one database and other apps into another, like it was suggested in a previous modification, would also works correctly when applying the with db.transaction.atomic(using=Schedule.objects.db)

That way the current usage would not break and it would allow the usage of read only replicas via a setting in the configuration.

What do you all think? Does it make sense to add the has_replica in the Django Q settings?

If it does can I open a PR suggesting these additions and tests covering some scenarios around the multiple databases?

@edthrn
Copy link
Contributor

edthrn commented Jun 26, 2021

I think this was added by someone who was using a database router to store the Schedule objects in a different database.
I guess it brakes your setup with replicated databases, since we have no tests for that.

Probably the easiest fix would be to introduce a config setting for this. I'm open to ideas and PR's.

And I think @Koed00 was talking about me 😉

The fix proposed in #561 by @abxsantos put us back to square one regarding the problem described in #434: using conf.ORM is not what we need to route django-q to the specific database where Schedule is stored.

In my case, I do not use the ORM broker at all. However, I do use another db to store any data related to django-q administration, which means that I still need to be explicit about which database I'm using. This was done by #440.

Ironically, the code snippet proposed above would probably have avoided this regression:

# ok
database_to_use = {"using": Schedule.objects.db} if not Conf.HAS_REPLICA else {}

But what was implemented is:

# not ok
database_to_use = {"using": Conf.ORM} if not Conf.HAS_REPLICA else {}

which is very different.


EDIT

@abxsantos it looks like you tried to prevent this regression with the test cases you added in the PR... Do you think there is something missing? I did not review the new test cases in detail, but maybe you only considered the case where the ORM broker was configured?

@abxsantos
Copy link
Contributor Author

abxsantos commented Jun 28, 2021

Hello @edthrn !

From what I've understand you need to write in a specific database without the ORM being set as broker right?

The scenarios I've proposed looked only at the ORM being set. Thus the regression.
From what I see, the database_to_use = {"using": Schedule.objects.db} if not Conf.HAS_REPLICA else {} as you've proposed should work!

I've seen that you've opened a PR with the Schedule.objects.db, but I think adding test cases to prevent this happening again can be valid.
I can take a better look and improve the test scenarios, after your merge, what do you think?

@edthrn
Copy link
Contributor

edthrn commented Jun 28, 2021

If you can improve the test scenarios so my fix passes the current tests + (why not) adding a test case to prevent this regression happening again, it would be great. Thank you.

I gave you the authorization to push to my forked repo so we can work on this PR #587 together.

Thanks

@abxsantos
Copy link
Contributor Author

Great! I'll take a look.

@msabatier
Copy link

Hello @edthrn, @abxsantos ,

I believe that the issue is in fact that Schedule.objects.db returns the database for reading Schedule objects, not writing them. This works in @edthrn use case (same database for read and write) but breaks @abxsantos use case (which is also mine).

The best solution would be to use database_to_use = {"using": db.router.db_for_write(Schedule)} and in fact there wouldn't be a need for the Conf.HAS_REPLICA flag that feels a little hacky.

I have verified this with Django 3.1.12 and django-q 1.3.9.

I think the same should be applied for Success objects in

def save_task(task, broker: Broker):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants