Improve docs in queries
tarsil committed Feb 22, 2025
1 parent 1a778cc commit db3a9cd
Showing 5 changed files with 150 additions and 159 deletions.
111 changes: 40 additions & 71 deletions docs/connection.md
@@ -1,154 +1,123 @@
# Connection
# Connection Management

Using edgy is extremely simple and easy to do but there are some steps you might want to take
into consideration like connections and what it can happen if this is not done properly.
Edgy is designed to be straightforward to use, but understanding connection management is crucial for optimal performance and stability.

Edgy is on SQLAlechemy core but is an `async` version of it and therefore what happens if you
want to use it within your favourite frameworks like [Esmerald](https://esmerald.dymmond.com),
Starlette or even FastAPI?
Edgy is built on SQLAlchemy Core, but it's an asynchronous implementation. This raises questions about its integration with popular frameworks like [Esmerald](https://esmerald.dymmond.com), Starlette, or FastAPI.

Well, Edgy is framework agnostic so it will fit in any framework you want, even in those that
are not listed above that support **lifecycle events**.
Edgy is framework-agnostic, meaning it can be seamlessly integrated into any framework that supports lifecycle events.

## Lifecycle events
## Lifecycle Events

These are very common amongst those frameworks that are based on Starlette, like
[Esmerald](https://esmerald.dymmond.com) or FastAPI but other might have a similar approach but
using different approaches.
Lifecycle events are common in frameworks built on Starlette, such as [Esmerald](https://esmerald.dymmond.com) and FastAPI. Other frameworks may offer similar functionality through different mechanisms.

The common lifecycle events are the following:
The most common lifecycle events include:

* **on_startup**
* **on_shutdown**
* **lifespan**

This document will focus on the one more commonly used, `lifespan`.
This document focuses on `lifespan`, which is widely used.

## Hooking your database connection into your application
## Hooking Database Connections into Your Application

Hooking a connection is as easy as putting them inside those events in your framework.
Integrating database connections is as simple as incorporating them into your framework's lifecycle events.

For this example, since the author is the same as the one of [Esmerald](https://esmerald.dymmond.com),
we will be using it for explanatory purposes, feel free to apply the same principle in your favourite
framework.
For illustrative purposes, we'll use [Esmerald](https://esmerald.dymmond.com). However, the principles apply to any framework.

with the ASGI integration:
Using ASGI integration:

```python hl_lines="8-12"
{!> ../docs_src/connections/asgi.py !}
```

Or doing it manually (that applies to every framework):

Manual integration (applicable to all frameworks):

```python hl_lines="11-12"
{!> ../docs_src/connections/simple.py !}
```

Or just as an async contexmanager
Using an asynchronous context manager:

```python
{!> ../docs_src/connections/asynccontextmanager.py !}
```

And that is pretty much this. Once the connection is hooked into your application lifecycle.
Otherwise you will get warnings about decreased performance because the databasez backend is not connected and will be
reininitialized for each operation.
Once the connection is integrated into your application's lifecycle, you can use the ORM throughout your application. Failing to do so will result in performance warnings, as the databasez backend will be reinitialized for each operation.
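
As a rough illustration of why hooking into the lifecycle matters, here is a minimal sketch using only the standard library. `StandInRegistry` is a hypothetical stand-in (not Edgy's real registry) whose `__aenter__`/`__aexit__` merely record connect and disconnect events, and `lifespan` assumes a Starlette-style callable that receives the app.

```python
import asyncio
from contextlib import asynccontextmanager


class StandInRegistry:
    """Hypothetical stand-in for a registry; NOT Edgy's real class."""

    def __init__(self) -> None:
        self.connected = False
        self.events: list[str] = []

    async def __aenter__(self) -> "StandInRegistry":
        self.connected = True
        self.events.append("connect")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.connected = False
        self.events.append("disconnect")


registry = StandInRegistry()


@asynccontextmanager
async def lifespan(app):
    # Connect once at startup and disconnect once at shutdown.
    async with registry:
        yield


async def serve_requests() -> list[bool]:
    seen = []
    async with lifespan(app=None):
        # Every simulated request sees an already connected registry.
        for _ in range(3):
            seen.append(registry.connected)
    return seen


states = asyncio.run(serve_requests())
```

The point of the sketch is that the connection is set up once for the whole application run instead of once per operation.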

You are now free to use the ORM anywhere in your application. As well as extra defined database connections in registry.
You can also define additional database connections in the registry and switch between them.

## Django integration
## Django Integration

Django currently doesn't support the lifespan protocol. So we have a keyword parameter to handle it ourselves.
Django doesn't natively support the lifespan protocol. Therefore, we provide a keyword parameter for manual handling.

```python
{!> ../docs_src/connections/django.py !}
```

## Manual integration
## Manual Integration

The `__aenter__` and `__aexit__` methods support also being called like `connect` and `disconnect`.
It is however not recommended as contextmanagers have advantages in simpler error handling.
The `__aenter__` and `__aexit__` methods can be called as `connect` and `disconnect`. However, using context managers is recommended for simpler error handling.

```python
{!> ../docs_src/connections/manual.py !}
```

You can use this however for an integration via `on_startup` & `on_shutdown`.
This approach is suitable for integration via `on_startup` and `on_shutdown`.

```python
{!> ../docs_src/connections/manual_esmerald.py !}
```

## `DatabaseNotConnectedWarning` warning
## `DatabaseNotConnectedWarning`

This warning appears, when an unconnected Database object is used for an operation.
This warning appears when an unconnected `Database` object is used.

Despite bailing out the warning `DatabaseNotConnectedWarning` is raised.
You should connect correctly like shown above.
In sync environments it is a bit trickier.
Despite the warning being non-fatal, you should establish proper connections as demonstrated above. Synchronous environments require additional care.

!!! Note
    When passing Database objects via using, make sure they are connected. They are not necessarily connected
    when not in extra.
    Ensure that `Database` objects passed via `using` are connected. They are not guaranteed to be connected outside of `extra`.

## Integration in sync environments
## Integration in Synchronous Environments

When the framework is sync by default and no async loop is active we can fallback to `run_sync`.
It is required to build an async evnironment via the `with_async_env` method of registry. Otherwise
we run in bad performance problems and have `DatabaseNotConnectedWarning` warnings.
`run_sync` calls **must** happen within the scope of `with_async_env`. `with_async_env` is reentrant and has an optional loop parameter.
When the framework is synchronous and no asynchronous loop is active, we can use `run_sync`. It's necessary to create an asynchronous environment using the `with_async_env` method of the registry. Otherwise, you'll encounter performance issues and `DatabaseNotConnectedWarning` warnings. `run_sync` calls must occur within the scope of `with_async_env`. `with_async_env` is re-entrant and accepts an optional loop parameter.

```python
{!> ../docs_src/connections/contextmanager.py !}
```
To keep the loop alive for performance reasons we can either wrap the server worker loop or in case of
a single-threaded server the server loop which runs the application. As an alternative you can also keep the asyncio eventloop alive.
This is easier for sync first frameworks like flask.
Here an example which is even multithreading save.

To keep the loop alive for performance reasons, you can wrap the server worker loop or, for single-threaded servers, the server loop that runs the application. Alternatively, you can keep the asyncio event loop alive, which is easier for synchronous-first frameworks like Flask. Here's an example that is also thread-safe.

```python
{!> ../docs_src/connections/contextmanager_with_loop.py !}
```

That was complicated, huh? Let's unroll it in a simpler example with explicit loop cleanup.

That was complicated, right? Let's unroll it in a simpler example with explicit loop cleanup.

```python
{!> ../docs_src/connections/contextmanager_with_loop_and_cleanup.py !}
```

Note: `with_async_env` also calls `__aenter__` and `__aexit__` internally. So the database is connected during the
with scope spanned by `with_async_env`.
This means you can use `run_sync` as well as running commands in another loop via e.g. asyncio.run.
Everything **just** works without raising the `DatabaseNotConnectedWarning`.
This for example used for `edgy shell`.
Note: `with_async_env` internally calls `__aenter__` and `__aexit__`. Therefore, the database is connected during the `with` scope of `with_async_env`. This means you can use `run_sync` and run commands in another loop (e.g., via `asyncio.run`). Everything works without raising `DatabaseNotConnectedWarning`. This is used, for example, in `edgy shell`.

## `run_sync` function
## `run_sync` Function

`run_sync` needs a bit more explaination. On the one hand it hooks into the async environment
spawned by `with_async_env`. On the other hand it prefers checking for an active running loop (except if an explicit loop was provided).
If an active loop was found, a subloop is spawned which is only torn down when the found loop (or explicit provided loop) was collected.
When an idling loop was found, it will be reused, instead of creating a subloop.
`run_sync` requires further explanation. It integrates with the asynchronous environment created by `with_async_env` and prefers checking for an active running loop (unless an explicit loop is provided). If an active loop is found, a subloop is created, which is only terminated when the found loop (or explicit loop) is garbage collected. If an idling loop is found, it's reused instead of creating a subloop.

What is a subloop?

A subloop is an eventloop running in an extra thread. This enables us to run multiple eventloops simultanously.
They are removed when the parent eventloop is garbage collected.
A subloop is an event loop running in a separate thread. This allows multiple event loops to run concurrently. They are removed when the parent event loop is garbage collected.

However given that the eventloops are quite sticky despite they should have been garbage collected
we additionally poll if the old loop had stopped.
However, because event loops can linger even when they should have been garbage collected, we additionally poll whether the old loop has stopped.
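
To make the idea of a subloop more concrete, the following is a minimal sketch of an event loop running in a separate thread, built only on the standard library. It is not Edgy's implementation: `start_subloop` and `add` are hypothetical names, and the explicit shutdown stands in for the garbage-collection-based cleanup described above.

```python
import asyncio
import threading


def start_subloop() -> tuple[asyncio.AbstractEventLoop, threading.Thread]:
    """Start a fresh event loop in a daemon thread (hypothetical helper)."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


async def add(a: int, b: int) -> int:
    await asyncio.sleep(0)
    return a + b


loop, thread = start_subloop()

# Submit a coroutine from the calling thread and block until it finishes.
future = asyncio.run_coroutine_threadsafe(add(2, 3), loop)
result = future.result(timeout=5)

# Explicit teardown: stop the loop from its own thread, then close it.
loop.call_soon_threadsafe(loop.stop)
thread.join(timeout=5)
loop.close()
```

Because the subloop lives in its own thread, the calling thread can wait synchronously for the result without needing a running loop of its own.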

## Querying other schemas
## Querying Other Schemas

Edgy supports that as well. Have a look at the [tenancy](./tenancy/edgy.md) section for more details.
Edgy supports querying other schemas. Refer to the [tenancy](./tenancy/edgy.md) section for details.

## Having multiple connections
## Multiple Connections

Edgy Registry has an extra parameter where named additional Database objects or strings can be defined. Having them there
is useful because they will be connected/disconnected too.
The Edgy Registry accepts an `extra` parameter for defining named additional `Database` objects or strings. Including them here ensures they're connected and disconnected appropriately.

You can switch to them on the fly via [using](./queries/queries.md#selecting-the-database-and-schema).
You can switch between them on the fly via [using](./queries/queries.md#selecting-the-database-and-schema).

## Migrate from flask-migrate

68 changes: 28 additions & 40 deletions docs/debugging.md
@@ -1,79 +1,67 @@
# Debugging & Performance

Edgy has several debug features, also through databasez. It tries also to keep it's eventloop and use the most smartest way
to execute a query performant.
For example asyncio pools are thread protected in databasez so it is possible to keep the connections to the database open.
Edgy provides several debugging features, some of them through databasez. It aims to keep its event loop alive and to execute queries in the most performant manner. For example, asyncio pools are thread-protected in databasez, allowing connections to the database to remain open.

But this requires that databases and registries are not just thrown away but kept open during the operation. For getting a
sane lifespan a reference counter are used.
However, this requires that databases and registries are not simply discarded but kept open during operation. To ensure a proper lifespan, a reference counter is used.

When dropped to 0 the database is uninitialized and drops the connections.
When the reference count drops to 0, the database is uninitialized, and connections are closed.
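
The reference-counting behaviour can be sketched as follows. `RefCountedConnection` is a hypothetical stand-in that only counts opens and closes; it is not the databasez or Edgy implementation, just an illustration of why an outer scope avoids repeated reinitialization.

```python
import asyncio


class RefCountedConnection:
    """Hypothetical stand-in: opens on the first enter, closes on the last exit."""

    def __init__(self) -> None:
        self.ref_count = 0
        self.opens = 0
        self.closes = 0

    async def __aenter__(self) -> "RefCountedConnection":
        if self.ref_count == 0:
            self.opens += 1  # simulated expensive initialization
        self.ref_count += 1
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.ref_count -= 1
        if self.ref_count == 0:
            self.closes += 1  # simulated teardown of connections


async def main():
    db = RefCountedConnection()

    # With an outer scope, nested scopes only bump the counter.
    async with db:
        for _ in range(3):
            async with db:
                pass
    held_open = (db.opens, db.closes)

    # Without an outer scope, every operation opens and closes again.
    for _ in range(3):
        async with db:
            pass
    return held_open, (db.opens, db.closes)


held_open, unscoped = asyncio.run(main())
```

In this sketch, three nested operations inside one outer scope cost a single open and close, while three unscoped operations cost three more of each.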

There is no problem re-opening the database but it is imperformant and can have side-effects especcially with the `DatabaseTestClient`.
For this the `DatabaseNotConnectedWarning` warning exist.
Reopening the database is possible but inefficient and can lead to side effects, especially with the `DatabaseTestClient`. The `DatabaseNotConnectedWarning` exists to address this.

### Getting the SQL Query

### Getting the SQL query
The `QuerySet` contains a cached debug property named `sql`, which displays the `QuerySet` as a query with inserted blanks.

QuerySet contains a cached debug property named `sql` which contains the QuerySet as query with inserted blanks.
### Performance Warnings (`DatabaseNotConnectedWarning`)

### Performance warnings (`DatabaseNotConnectedWarning`)
The `DatabaseNotConnectedWarning` is likely the most common warning in Edgy.

The most common warning in edgy is probably the `DatabaseNotConnectedWarning` warning.
It is intentional and serves to guide users in improving their code so that engines are not disposed of unnecessarily. Discarding engines can also lead to difficult-to-debug errors in test environments due to a missing database (e.g., through the `drop_database` parameter).

It is deliberate and shall guide the user to improve his code so he doesn't throws away engines unneccessarily.
Also it could lead in test environments to hard to debug errors because of a missing database (drop_database parameter).
Edgy issues a `DatabaseNotConnectedWarning` when used without a connected database. To suppress it, wrap the affected code in a database scope:

Edgy issues a `DatabaseNotConnectedWarning` when using edgy without a connected database. To silence it, wrap the affected
code in a database scope

``` python
```python
await model.save()
# becomes
async with model.database:
    await model.save()
```

If the warning is completely unwanted despite the performance impact, you can filter:
If the warning is completely unwanted despite the performance impact, you can filter it:

``` python
```python
import warnings
from edgy.exceptions import DatabaseNotConnectedWarning

with warnings.catch_warnings(action="ignore", category=DatabaseNotConnectedWarning):
    await model.save()
```

It inherits from `UserWarning` so it is possible to filter UserWarnings.
It inherits from `UserWarning`, so filtering `UserWarning` is also possible.

However the silencing way is not recommended.
However, silencing the warning is generally not recommended.
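
To show what filtering by the parent class means in practice, here is a small standard-library sketch. `StandInNotConnectedWarning` is a hypothetical stand-in for `DatabaseNotConnectedWarning`; the only assumption carried over from the text is that the real warning inherits from `UserWarning`.

```python
import warnings


class StandInNotConnectedWarning(UserWarning):
    """Hypothetical stand-in for DatabaseNotConnectedWarning."""


def operation() -> str:
    warnings.warn("database not connected", StandInNotConnectedWarning)
    return "done"


# Without a filter, the warning is recorded.
with warnings.catch_warnings(record=True) as caught_all:
    warnings.simplefilter("always")
    operation()

# Ignoring the parent class UserWarning also hides the subclass.
with warnings.catch_warnings(record=True) as caught_filtered:
    warnings.simplefilter("always")
    warnings.simplefilter("ignore", category=UserWarning)
    operation()
```

Note that a `UserWarning` filter is broad: it hides every other `UserWarning` raised in that scope as well, which is one more reason to prefer connecting properly.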

## Many connections
## Many Connections

If the database is slow due to many connections by edgy and no `DatabaseNotConnectedWarning` warning was raised
it indicates that deferred fields are accessed.
This includes ForeignKey, which models are not prefetched via `select_related`.
If the database is slow due to numerous Edgy connections, and no `DatabaseNotConnectedWarning` was raised, it indicates that deferred fields are being accessed. This includes `ForeignKey` relationships where models are not prefetched via `select_related`.

### Debugging deferred loads
### Debugging Deferred Loads

For debugging purposes (but sacrificing deferred loads with it) you can set the ContextVariable
`edgy.core.context_vars.MODEL_GETATTR_BEHAVIOR` to `"passdown"` instead of `"load"`.
For debugging purposes (at the cost of deferred loads), you can set the context variable `edgy.core.context_vars.MODEL_GETATTR_BEHAVIOR` to `"passdown"` instead of `"load"`.

This will lead to crashes in case an implicit loaded variable is accessed.
This will cause crashes if an implicitly loaded variable is accessed.
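
The general pattern of temporarily overriding a context variable can be sketched with the standard `contextvars` module. `GETATTR_BEHAVIOR` below is a hypothetical stand-in, not Edgy's `MODEL_GETATTR_BEHAVIOR`; the sketch only assumes that the real variable behaves like a regular `ContextVar`.

```python
from contextvars import ContextVar

# Hypothetical stand-in for edgy.core.context_vars.MODEL_GETATTR_BEHAVIOR.
GETATTR_BEHAVIOR: ContextVar[str] = ContextVar("GETATTR_BEHAVIOR", default="load")


def current_behavior() -> str:
    return GETATTR_BEHAVIOR.get()


before = current_behavior()

# Override only for the debugging section, then restore the previous value.
token = GETATTR_BEHAVIOR.set("passdown")
try:
    during = current_behavior()
finally:
    GETATTR_BEHAVIOR.reset(token)

after = current_behavior()
```

Resetting via the token keeps the override scoped to the code under investigation, so the rest of the application keeps its normal deferred-loading behaviour.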

### Optimizing ReflectedModel
### Optimizing `ReflectedModel`

ReflectedModel have the problem that not all database fields are known. Therefor testing if an optional attribute
is available via `getattr`/`hasattr` will lead to a load first.
`ReflectedModel` has the issue that not all database fields are known. Therefore, testing if an optional attribute is available via `getattr`/`hasattr` will trigger a load first.

There are two ways to work around:
There are two ways to work around this:

1. Use the model instance dict instead (e.g. `model.__dict__.get("foo")` or `"foo" in model.__dict__`).
2. Add the optional available attributes to `__no_load_trigger_attrs__`. They won't trigger an load anymore.
1. Use the model instance dictionary instead (e.g., `model.__dict__.get("foo")` or `"foo" in model.__dict__`).
2. Add the optional available attributes to `__no_load_trigger_attrs__`. They will no longer trigger a load.
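
The first workaround can be illustrated with a plain Python class. `LazyStandIn` is a hypothetical stand-in, not `ReflectedModel`: its `__getattr__` simply counts how often a missing attribute would have triggered a load.

```python
class LazyStandIn:
    """Hypothetical stand-in for a model with deferred loading."""

    def __init__(self) -> None:
        self.loads = 0

    def __getattr__(self, name: str):
        # Only called when normal attribute lookup fails.
        self.loads += 1  # simulated database load
        raise AttributeError(name)


model = LazyStandIn()

# hasattr goes through __getattr__ and triggers the simulated load.
via_hasattr = hasattr(model, "foo")
loads_after_hasattr = model.loads

# Checking the instance dictionary does not touch __getattr__ at all.
via_dict = "foo" in model.__dict__
loads_after_dict = model.loads
```

Both checks report that `foo` is absent, but only the `hasattr` call pays for a load in this sketch.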

## Hangs

Hangs typical occur when there is only **one** connection available or the database is blocked.
This is normally easily debuggable often with the same ways like mentioned before because of the same reasons.
If it has hard to debug stack traces, it seems that threads and asyncio are mixed.
Hangs typically occur when only **one** connection is available or the database is blocked. They are usually easy to debug with the methods described earlier, since the causes are the same. Hard-to-debug stack traces suggest that threads and asyncio are being mixed.

Here you can enforce hard timeouts via the `DATABASEZ_RESULT_TIMEOUT` environment variable.
Here, you can enforce hard timeouts via the `DATABASEZ_RESULT_TIMEOUT` environment variable.