query_lock() in iterate() prohibits any other database operations within async for loop
#176
Comments
Workaround I've found for the time being is to use a custom batching select utility instead:
|
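A minimal sketch of what such a batching utility could look like (the helper name and the LIMIT/OFFSET strategy are assumptions for illustration, not the actual utility mentioned above):

```python
# Hypothetical batching helper: each page is a separate fetch_all() call,
# so the per-connection query lock is released between pages and other
# queries can run inside the consuming loop.
async def iterate_in_batches(fetch_all, query, batch_size=100):
    """Yield rows by fetching fixed-size pages with LIMIT/OFFSET."""
    offset = 0
    while True:
        page = await fetch_all(f"{query} LIMIT {batch_size} OFFSET {offset}")
        if not page:
            return
        for row in page:
            yield row
        offset += batch_size
```

With the databases API this would be driven as `async for row in iterate_in_batches(database.fetch_all, "SELECT * FROM table")`. Note that OFFSET pagination re-scans skipped rows on every page, so keyset pagination is preferable for large tables.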
Interesting @rafalp! What DB back-end is it? What do you use under the hood? Is it still the transactional cursor? |
It's PostgreSQL, and I am using two setups for the DB connection:
|
Just to capture the thoughts from the Gitter discussion at https://gitter.im/encode/community.
Possible ways to overcome the issue:
though it removes the guarantees, and the developer would have to be responsible and accurate in their actions
|
Hey! |
The locks exist per connection, so unless you force everything onto a single shared connection, a producer/consumer pair on two connections works:

```python
queue = asyncio.Queue()

async def producer():
    async with database.connection() as conn:
        print("Producer's connection:", conn)
        async for row in conn.iterate("SELECT * FROM table"):
            await queue.put(row)
        await queue.put(None)  # sentinel: no more rows

async def consumer():
    async with database.connection() as conn:
        print("Consumer's connection:", conn)
        while True:
            row = await queue.get()
            if row is None:
                break
            await conn.execute("UPDATE table SET ... WHERE ...")

for r in await asyncio.gather(producer(), consumer(), return_exceptions=True):
    if isinstance(r, Exception):
        raise r from None
```

Watch out! If the connections are printed the same, you need to ensure that you have not worked with that database in the parent coroutine. Additional docs: https://asyncio.readthedocs.io/en/latest/producer_consumer.html |
There is another way, with subclassing:

```python
class ParallelDatabase(databases.Database):
    """Override connection() to ignore the task context and spawn a new Connection every time."""

    def connection(self) -> "databases.core.Connection":
        """Bypass self._connection_context."""
        return databases.core.Connection(self._backend)
```

However, you will have to forget about executing on the database directly, as it will open a new connection every time. |
Couldn't I just launch a task then? |
Depending on the surrounding code, yes or no. For example, this will not work:

```python
async for row in database.iterate("SELECT * FROM table"):
    await asyncio.create_task(database.execute("UPDATE table SET ... WHERE ..."))
```

because the spawned coroutine will inherit a copy of the parent's context. |
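The context-copy behaviour can be demonstrated with the standard library alone. In this sketch, `conn_var` is a made-up stand-in for the per-task Connection that the library stores in a context variable:

```python
# A task spawned with asyncio.create_task() runs in a *copy* of the
# parent's context, so it sees the same context-variable value.
import asyncio
import contextvars

conn_var = contextvars.ContextVar("connection")

async def child():
    # The copied context still holds the parent's "connection".
    return conn_var.get()

async def main():
    conn_var.set("parent-connection")
    inherited = await asyncio.create_task(child())
    # A brand-new Context, by contrast, has no value for conn_var.
    fresh = contextvars.Context().run(lambda: conn_var.get("no connection"))
    return inherited, fresh

print(asyncio.run(main()))  # ('parent-connection', 'no connection')
```

This is exactly why `create_task()` cannot be used to sneak a second connection past the lock: the child task reuses the parent's cached connection.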
Oh I see, that makes sense. I have never worked with context variables before. Another thing I don't quite get is why the connection is cached in the context in the first place. |
That teased my brain, too. This is how it's currently implemented: https://github.com/encode/databases/blob/master/databases/core.py#L176

```python
def connection(self) -> "Connection":
    if self._global_connection is not None:
        return self._global_connection

    try:
        return self._connection_context.get()
    except LookupError:
        connection = Connection(self._backend)
        self._connection_context.set(connection)
        return connection
```

Let's ignore _global_connection here. The method tries to load a Connection from the _connection_context context variable, and if there is none, it creates a new one and stores it there. I guess they wrote the code this way to avoid the overhead of creating a new Connection on every call.

I hit this issue in my prod many times. I had to fight with the library to make my queries run in parallel. I gave up and applied #176 (comment) |
@vmarkovtsev AFAIR |
As with any other query method of Database, execute() initializes the context var:

```python
db = Database(...)
await db.connect()
await db.execute("SELECT 1")  # any preceding query initializes the context var

async def serialized_query():
    # these guys will run serialized because the context is inherited
    await db.execute(...)

await asyncio.gather(serialized_query(), serialized_query(), return_exceptions=True)
```

Hence #176 (comment) |
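Why the preceding query matters can be shown with stdlib contextvars alone. This toy model (get_connection, conn_var and the string "connections" are invented names) mimics the connection() caching quoted earlier in the thread:

```python
# Reuse the context's connection if present, otherwise create and store one,
# mirroring databases' Database.connection() logic.
import asyncio
import contextvars
import itertools

conn_var = contextvars.ContextVar("connection")
counter = itertools.count(1)

def get_connection():
    try:
        return conn_var.get()
    except LookupError:
        conn = f"conn-{next(counter)}"
        conn_var.set(conn)
        return conn

async def worker():
    return get_connection()

async def fresh_parent():
    # The parent never touched conn_var: each gathered task falls into
    # the LookupError branch and gets its own connection.
    return await asyncio.gather(worker(), worker())

async def primed_parent():
    get_connection()  # like the preceding db.execute("SELECT 1")
    # Both tasks inherit the parent's context and share one connection.
    return await asyncio.gather(worker(), worker())

print(asyncio.run(fresh_parent()))   # two distinct connections
print(asyncio.run(primed_parent()))  # the same connection, twice
```

So the serialization only appears once the parent coroutine has run at least one query before gathering.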
I guess some kind of alternative solution might be using a cursor from raw_connection (at least in asyncpg)?

```python
async with database.connection() as connection:
    async with connection.transaction():
        async for row in connection.raw_connection.cursor(str_query):
            await database.fetch_all("SELECT based on row")  # or anything else with database, really
```

Are there any flaws with this that I'm not seeing? |
I'm running into this problem. An additional complication for me is that I need everything to be done within a single "repeatable_read" transaction. So what I desire is something like:

```python
from sqlalchemy import select

async def generate_results(db):
    async with db.transaction(isolation="repeatable_read"):
        async for some_row in db.iterate(select(...)):
            some_other_rows = await db.fetch_all(select(...))
            yield "some result based on both some_row and some_other_rows"
```

My use case is entirely read-only. Is there any hope for me?

I looked at the suggestion from @vmarkovtsev in #176 (comment), with two coroutines communicating via a queue, each opening a separate connection. But I think with that approach I would not be able to put it all in a single "repeatable_read" transaction? |
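One possible direction, building on the batching workaround mentioned earlier in the thread: replace iterate() with keyset-paginated fetch_all() calls on the same connection, inside the one repeatable_read transaction. Every page then sees the same snapshot, yet no cursor lock is held between pages. This is a hedged, library-agnostic sketch; it assumes rows have a unique, orderable integer "id" column, and the helper name is invented:

```python
# Each page is an ordinary fetch_all() call, so the per-connection query
# lock is free between pages, while the surrounding transaction keeps
# the snapshot consistent.
async def paginate_by_id(fetch_all, base_query, batch_size=100):
    """Yield rows in id order, one keyset-paginated page at a time."""
    last_id = None
    while True:
        where = "" if last_id is None else f" WHERE id > {last_id}"
        page = await fetch_all(f"{base_query}{where} ORDER BY id LIMIT {batch_size}")
        if not page:
            return
        for row in page:
            yield row
        last_id = page[-1]["id"]
```

Usage would look like `async with db.transaction(isolation="repeatable_read"): async for some_row in paginate_by_id(db.fetch_all, "SELECT * FROM some_table"): ...`, with nested `db.fetch_all(...)` calls now legal inside the loop, assuming the lock is indeed released after each fetch_all as described in this thread.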
same here :( |
#108 introduced query locking to prohibit situations where multiple queries are executed at the same time. However, the logic within iterate() is also wrapped with this locking, which makes any code that runs another query on the same connection inside the async for loop deadlock.
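The original reproduction snippet is not preserved here, but the deadlock mechanism can be modelled without a database at all. In this stdlib-only sketch, an asyncio.Lock stands in for the per-connection query lock (an assumed simplification, not the library's actual code):

```python
# iterate() holds the query lock for the whole loop, so execute() on the
# same connection can never acquire it: a classic self-deadlock.
import asyncio

async def main():
    query_lock = asyncio.Lock()

    async def iterate_rows():
        async with query_lock:  # held until the generator is exhausted
            for row in range(3):
                yield row

    async def execute():
        async with query_lock:  # never acquirable inside the loop above
            pass

    async for row in iterate_rows():
        try:
            # Without the timeout, this await would hang forever.
            await asyncio.wait_for(execute(), timeout=0.1)
        except asyncio.TimeoutError:
            return "deadlock"
    return "ok"

print(asyncio.run(main()))  # deadlock
```

The timeout is only there to make the hang observable; real code would simply block on the first execute() inside the loop.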