Can you say anything about what cases those are? Are we able to detect them and not cache them, at least?
I think we want to do the recompilation before starting the postgres transaction, in order to avoid keeping the postgres transaction open for a long time. (This means that we might miss recompiling a query that came in in between, but that's probably fine.) I'm also a little nervous about potential transaction serialization errors resulting from the cache, so we'll need to be potentially careful about that.
Motivation
The idea is to move the EdgeQL query cache from the EdgeDB server memory to the backend database.
Query cache entries are mostly compiled SQL ready to be executed. They don't change unless the database schema is updated, so they should survive a server restart to save re-compilation time (a.k.a. cache hydration). Storing the query cache in the database as stored functions also saves network traffic between the server and the database: when an EdgeQL query is executed repeatedly, the server only needs to send the function name with arguments. The server memory footprint is also reduced without the query cache (up to 1000+1000 cache entries).
Design
Headlines:
- Store compiled queries in the backend database as CREATE FUNCTION
- Use NOTIFY to sync between tenants
Compile Request Serialization
The compile request is a combination of the query itself and all the parameters that affect compilation, e.g. the output format and the expected result cardinality. It needs to be serialized for several reasons; notably, it is stored in the cache table and replayed for recompilation after DDL.
The original query string is included in the serialized data, but only the normalized query hash is considered in the cache key calculation, because constant extraction normalizes queries like select 42 and select 66 into the same form: they share a single cache entry even though the literal query texts differ, while the original string remains available for recompilation.
Finally, a numerical serialization version is also added to the binary, so that we can keep backward compatibility while changing the data format in the future.
The schema version is taken care of separately; it neither exists in the serialized request nor affects the cache key.
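A minimal sketch of what the serialization and key derivation could look like, assuming hypothetical helper names and a hypothetical binary layout (the real hand-written Cython format differs):

```python
import hashlib
import struct

SERIALIZATION_VERSION = 1  # bumped on incompatible format changes

def serialize_request(query: str, normalized_hash: bytes, params: bytes) -> bytes:
    # Version-prefixed binary; the original query string is kept so the
    # request can be replayed for recompilation after DDL.
    out = struct.pack("!H", SERIALIZATION_VERSION)
    out += struct.pack("!I", len(normalized_hash)) + normalized_hash
    q = query.encode("utf-8")
    out += struct.pack("!I", len(q)) + q
    out += params  # everything else that affects compilation
    return out

def cache_key(normalized_hash: bytes, params: bytes) -> bytes:
    # The raw query string is deliberately excluded: `select 42` and
    # `select 66` normalize to the same query and share one entry.
    h = hashlib.blake2b(digest_size=16)
    h.update(normalized_hash)
    h.update(params)
    return h.digest()
```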
We reviewed several serialization frameworks (see also the rejected ideas below) and decided on a hand-written custom Cython serialization, because we already maintain similar hand-written serialization code (e.g. sertypes.py).
Stored Functions
In the "high-level compiler" after a successful compilation of a cache-able query, the result SQL is wrapped in a
CREATE FUNCTION
, where:__qh_{cache_key_hex}
record
for object types, corresponding PG types for scalars,json
if the output format is JSONsetof ...
if the result cardinality is multi, or the output format isJSON_ELEMENTS
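For illustration, roughly what the generated SQL could look like for a trivial single-parameter scalar query; shapes and values here are placeholders, not the compiler's actual output:

```python
cache_key_hex = "f3a91c0d5b7e42d8"  # fake cache key for illustration

# The compiled query body; a trivial stand-in for real compiled SQL.
compiled_sql = "SELECT ($1::int8) + 1"

create_sql = f"""
CREATE FUNCTION "__qh_{cache_key_hex}"(int8)
RETURNS int8  -- record for object types, json for JSON output format,
              -- wrapped in setof if cardinality is multi / JSON_ELEMENTS
AS $$ {compiled_sql} $$ LANGUAGE sql;
"""

# Generated together with CREATE, since dropping needs the full argument
# signature; stored for later eviction.
drop_sql = f'DROP FUNCTION "__qh_{cache_key_hex}"(int8);'

# On a cache hit, the server only sends the call:
call_sql = f'SELECT "__qh_{cache_key_hex}"($1)'
```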
At the same time, a DROP FUNCTION SQL is also created (and stored in the registration table in the next step), because dropping a PG function needs not only the name but also the argument signature. When evicting a cache entry, we can use this drop SQL directly.
Pending issue: in certain cases, Postgres is not satisfied with just record without a specific shape.
The CREATE FUNCTION and DROP FUNCTION SQL texts are added to each QueryUnit as a new field cache_sql, while the original sql field stores the function-call SQL, so that the majority of the execution code remains the same.
Performance-wise, PostgreSQL stores the AST of a function's implementation, which saves some parsing time on execution. The EdgeDB server uses the same prepared-statement handling logic for calling (cached) functions, so this part doesn't change. See also the benchmarks section on a large number of functions.
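A simplified sketch of the QueryUnit change; the real class carries many more fields, and the exact types here are assumptions:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class QueryUnit:
    sql: str                                     # now the function-call SQL
    cache_sql: Optional[Tuple[str, str]] = None  # (CREATE FUNCTION, DROP FUNCTION)
```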
Cache Management
This design proposes a new internal table edgedb._query_cache, storing for each entry the cache key, the user-schema version, the serialized compile request (input), the serialized QueryUnitGroup, and the corresponding DROP FUNCTION SQL for eviction (see the sketch below).
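A rough sketch of the table shape; column names and types are assumptions based on the description in this section, not the final schema:

```python
# DDL the server could issue at bootstrap (assumed shape).
QUERY_CACHE_DDL = """
CREATE TABLE edgedb._query_cache (
    key            uuid PRIMARY KEY,  -- not composite with schema_version
    schema_version uuid NOT NULL,     -- user-schema version of this entry
    input          bytea NOT NULL,    -- serialized compile request
    output         bytea NOT NULL,    -- pickled QueryUnitGroup
    evict          text NOT NULL      -- stored DROP FUNCTION statement
);
"""
```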
All rows are loaded at server start into the in-memory LRU cache. EdgeDB queries will only look at the LRU cache for execution.
The compiled result of a fresh compile is persisted (inserted into this table, with the actual function created) before execution, in the same transaction. Only after this transaction commits successfully do we update the EdgeDB server's in-memory LRU cache, so that concurrent queries won't use the cache before it's actually ready in the database.
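A sketch of that flow, assuming an asyncpg-style connection; the function and argument names are illustrative:

```python
import pickle

async def persist_and_execute(conn, lru_cache, key, schema_version,
                              request_bytes, unit_group,
                              create_sql, call_sql, drop_sql):
    async with conn.transaction():
        await conn.execute(create_sql)        # create the stored function
        await conn.execute(
            'INSERT INTO edgedb._query_cache '
            '(key, schema_version, input, output, evict) '
            'VALUES ($1, $2, $3, $4, $5)',
            key, schema_version, request_bytes,
            pickle.dumps(unit_group), drop_sql,
        )
        result = await conn.fetch(call_sql)   # execute via the function call
    # Update the in-memory LRU only after a successful commit, so concurrent
    # queries never see an entry that isn't ready in the database.
    lru_cache[key] = unit_group
    return result
```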
There's a new function edgedb._evict_query_cache() that deletes one row and drops its registered function; the server calls it asynchronously after inserting into the in-memory LRU cache, if that cache is full.
key and schema_version do NOT form a composite primary key, meaning we only keep the cache for one version of the user schema. Before executing DDLs, we call a new function edgedb._clear_query_cache() that clears this table, drops all registered functions, and returns the input of all rows, in the same transaction. After the DDL applies successfully, we send those already-serialized compile requests to the compiler in parallel with the new schema for recompilation. We re-create the cache before the DDL transaction ends, keeping only the successful compilations and discarding cache entries that can no longer be compiled after the DDL.
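A sketch of the DDL path under these assumptions; the recompile_all helper and the entry fields are hypothetical:

```python
async def run_ddl(conn, compiler, ddl_sql):
    async with conn.transaction():
        # _clear_query_cache() empties the table, drops all registered
        # functions, and returns the serialized compile request of each row.
        rows = await conn.fetch('SELECT input FROM edgedb._clear_query_cache()')
        await conn.execute(ddl_sql)
        # Recompile the old requests against the new schema in parallel;
        # requests that no longer compile are discarded.
        for entry in await compiler.recompile_all([r['input'] for r in rows]):
            await conn.execute(entry.create_sql)  # re-create stored function
            await conn.execute(
                'INSERT INTO edgedb._query_cache '
                '(key, schema_version, input, output, evict) '
                'VALUES ($1, $2, $3, $4, $5)',
                entry.key, entry.schema_version, entry.input,
                entry.output, entry.drop_sql,
            )
```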
After any change to the database query cache, the EdgeDB server issues a PG notification query-cache-changes to the other tenants on the same backend, so that they get a chance to load the changes into their in-memory LRU caches. We don't keep centralized LRU statistics across tenants, to avoid the complication, trusting each tenant to see roughly the same load-balanced LRU behavior.
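A sketch of the sync using asyncpg's LISTEN/NOTIFY support; the channel name comes from the proposal, while the payload handling and helper names are illustrative:

```python
async def setup_cache_sync(conn, lru_cache, load_entry):
    def on_change(connection, pid, channel, payload):
        # Another tenant changed the shared cache; refresh our LRU
        # (load_entry is a hypothetical helper).
        load_entry(lru_cache, payload)
    await conn.add_listener('query-cache-changes', on_change)

async def announce_change(conn, key_hex):
    # Called after any change to the database query cache.
    await conn.execute("SELECT pg_notify('query-cache-changes', $1)", key_hex)
```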
Compile Result
The result of compilation is a QueryUnitGroup object. Besides the SQL, it carries a lot of information used by the I/O server to complete query execution.
Currently, QueryUnitGroup is simply pickled; ideally, this should be replaced with a custom serialization like the one for the request. However, the only reason to do that is compatibility, so that we don't have to drop the old cache after server updates.
Compilation and transactions (TBD)
In general, compilation results produced inside transactions should be stored temporarily in memory and reused within the transaction if the same command is issued multiple times. Those produced after a DDL could be persisted when the transaction commits successfully. This optimization was broken before this proposal, and I'm planning to add it back properly.
Scripts (TBD)
Scripts can be cached per statement under certain circumstances, for a better cache hit rate (hence faster compilation) and less cache storage. This requires handling DDLs carefully, but it's overall doable and similar to compilation in transactions.
Benchmarks (TBD)
Implementation
Rejected Ideas
Centralized concurrency control
Assume the function exists and try to execute optimistically, compiling only while holding cache-table row locks. This introduces too much complexity and performance impact, all for a cross-tenant consistency that isn't that important.
Serialization Libraries
Several off-the-shelf serialization libraries were evaluated and rejected over practical issues (e.g. required fields not working, wrong imports generated with --gen-onefile).