Add query performance statistics #7869
…38a38408e8adec971740966
* Add query info JSON to EdgeQL-compiled SQL
* Extract info in the extension and track original query
* Add `cache_key` in the stats hash table
* Add view of `sys::QueryStats`
* Add basic test
OK this all looks good except I realize I might not understand the memory management model in play here.
As discussed on the call, we need to make a few adjustments.
The plan for tagging is as follows:
Tags should be documented as arbitrary strings. We can reserve I think we should put a max-length constraint on this field and limit tags to 200-or-something characters. cc @elprans @msullivan and obviously @fantix
I think Elvis wanted to inject the tag on the server side, to avoid compiler round trips.
If the same key appears on multiple lines, the first occurrence takes effect and the rest are ignored. Once all expected keys have been found, the remaining lines are ignored as well.
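A minimal sketch of the "first occurrence wins" rule described above, using a bitmask of keys that were already seen. The names (`SketchInfo`, `sketch_record`, `sketch_complete`) and the two example keys are hypothetical and only illustrate the rule; the actual extension may track different keys and store them differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical key bits; the real extension's keys and names may differ. */
enum {
    SKETCH_FOUND_ID    = 1 << 0,
    SKETCH_FOUND_QUERY = 1 << 1,
    SKETCH_REQUIRED    = SKETCH_FOUND_ID | SKETCH_FOUND_QUERY
};

typedef struct {
    uint32_t found;      /* bitmask of keys already recorded */
    char     id[64];
    char     query[256];
} SketchInfo;

/* Record one key/value pair; a key that was already seen is ignored. */
static void sketch_record(SketchInfo *info, const char *key, const char *value)
{
    assert(info != NULL && key != NULL && value != NULL);

    if (strcmp(key, "id") == 0 && !(info->found & SKETCH_FOUND_ID)) {
        snprintf(info->id, sizeof(info->id), "%s", value);
        info->found |= SKETCH_FOUND_ID;
    } else if (strcmp(key, "query") == 0 && !(info->found & SKETCH_FOUND_QUERY)) {
        snprintf(info->query, sizeof(info->query), "%s", value);
        info->found |= SKETCH_FOUND_QUERY;
    }
    /* Unknown keys and repeated keys fall through without any effect. */
}

/* True once every required key was seen; later lines can then be skipped. */
static int sketch_complete(const SketchInfo *info)
{
    return (info->found & SKETCH_REQUIRED) == SKETCH_REQUIRED;
}
```

A caller would zero-initialize a `SketchInfo`, feed it the pairs from each info line in order, and stop reading lines as soon as `sketch_complete()` returns true.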
Also, stop using `cache_key` as the stats entry ID; calculate the hash from the JSON string instead.
This looks basically good but I want to make sure I understand the behavior in some of the edge cases where the input might be malformed.
In particular, what happens if `uuid_in` fails because the `id` is malformed?
(To some extent, this is just me not knowing postgres internals well. It might be that the answer is simple? Does failure do a longjmp or something and just abort everything?)
JsonParseErrorType parse_rv = pg_parse_json(lex, &sem);
freeJsonLexContext(lex);

if (parse_rv == JSON_SUCCESS)
Does the state get mutated even if there is an error? I guess that's probably fine?
What happens to `info_len` on an error? Is it still updated with however much got consumed?
> Does the state get mutated even if there is an error? I guess that's probably fine?

Yes, it'll keep the parsed values and continue to the next line, which I think is fine.

> What happens to `info_len` on an error? Is it still updated with however much got consumed?

On error, `info_len` will be updated to skip the whole failing line and restart on the next line.
if ((state.found & EDB_STMT_INFO_PARSE_REQUIRED) == EDB_STMT_INFO_PARSE_REQUIRED)
    return info->query_id != UINT64CONST(0) ? info : NULL;

info_str += info_len + 1;
The +1 makes me nervous about the case where there isn't a newline at the end of an info line?
I'm not sure how likely it is that untrusted entries end up in the query log, but this is C, so I want to be extra careful about our boundary cases.
Right, good question. The `edbss_extract_info_line()` function will tag the `\n` or the end of the `query_str`, so this `+1` will either skip the `\n` properly, or go beyond the end of the `query_str` and cause the next call to `edbss_extract_info_line()` to return `NULL`.
Because len is negative, at that point?
Yes exactly
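To make the boundary case discussed in this thread concrete, here is a rough sketch of a line walker with the same `+ 1` step. `sketch_extract_line()` is a hypothetical stand-in for `edbss_extract_info_line()`, and the `"-- "` prefix check and the signature are assumptions based on the log output in this PR, not the extension's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for edbss_extract_info_line(). If the remaining input
 * starts with "-- ", return a pointer just past that prefix and store in
 * *line_len the distance from s to the '\n' (or to the end of the input when
 * no newline is left). Return NULL when fewer than 3 bytes remain (this
 * includes a zero or negative len) or when the prefix is missing.
 */
static const char *sketch_extract_line(const char *s, int len, int *line_len)
{
    const char *nl;

    assert(line_len != NULL);
    if (len < 3 || strncmp(s, "-- ", 3) != 0)
        return NULL;            /* s is never read when len is too small */

    nl = memchr(s, '\n', (size_t) len);
    *line_len = nl ? (int) (nl - s) : len;
    return s + 3;
}

/* Count the leading info lines of a NUL-terminated query string. */
static int sketch_count_info_lines(const char *query_str)
{
    const char *s = query_str;
    int len = (int) strlen(query_str);
    int line_len;
    int count = 0;

    while (sketch_extract_line(s, len, &line_len) != NULL) {
        count++;
        /*
         * Skip the line and its '\n'. On a final line without a newline this
         * leaves len == -1 and s one past the terminating NUL; that pointer
         * is never dereferenced because the next call returns NULL first.
         */
        s += line_len + 1;
        len -= line_len + 1;
    }
    return count;
}
```

The sketch relies on the input being NUL-terminated: stepping one past the NUL still yields a valid one-past-the-end pointer, and the length check runs before any read.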
Datum id_datum = DirectFunctionCall1(uuid_in, CStringGetDatum(token));
pg_uuid_t *id_ptr = DatumGetUUIDP(id_datum);
Can these fail?
Yes. Like you said, the failure does a longjmp to the outermost `PG_CATCH` in the current session, which recovers by sending an error to the peer. Such an error will be propagated to the client as an `edb.errors.InvalidValueError`:
ERROR 116169 - 2024-11-20T11:22:27.377 postgres: invalid input syntax for type uuid: "b2f8e457-a4f8-ab73-1979-afb333f9c"
INFO 116169 - 2024-11-20T11:22:27.377 postgres: -- {"query": "select\n (<__std__::int64>$0 + <__std__::int64>$1)", "type": 1, "extras": "{\"cc\": {\"__internal_no_apply_query_rewrites\": false, \"__internal_query_reflschema\": false, \"__internal_testmode\": false, \"allow_bare_ddl\": \"AlwaysAllow\", \"allow_dml_in_functions\": false, \"allow_user_specified_id\": false, \"apply_access_policies\": true, \"force_database_error\": \"false\", \"query_cache_mode\": \"Default\", \"simple_scoping\": null, \"store_migration_sdl\": \"NeverStore\", \"warn_old_scoping\": null}, \"pv\": [3, 0], \"of\": \"BINARY\", \"e1\": false, \"il\": 101, \"ii\": false, \"in\": true, \"io\": false, \"dn\": \"default\"}", "id": "b2f8e457-a4f8-ab73-1979-afb333f9c"}
INFO 116169 - 2024-11-20T11:22:27.377 postgres: SELECT edgedb_v6_2f20a50ab0.__qh_bd20a1eba9bb696335db87182e5b207f(($1)::int8, ($2)::int8)
---------------------------------------------------------------------- Exception occurred: invalid input syntax for type std::uuid: "b2f8e457-a4f8-ab73-1979-afb333f9c" ----------------------------------------------------------------------
1. edb.errors.InvalidValueError: invalid input syntax for type std::uuid: "b2f8e457-a4f8-ab73-1979-afb333f9c"
------------------------------------------------------------------------------------------------------------------ Details -------------------------------------------------------------------------------------------------------------------
edb.errors.InvalidValueError: invalid input syntax for type std::uuid: "b2f8e457-a4f8-ab73-1979-afb333f9c"
ERROR 116111 _localdev 2024-11-20T11:22:27.377 asyncio: an error in edgedb protocol
protocol: <edb.server.protocol.binary.EdgeConnection object at 0x73c6cfabf370>
transport: <uvloop.loop._SSLProtocolTransport object at 0x73c6d48086c0>
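The control flow behind that error can be sketched with plain `setjmp`/`longjmp`. This is standalone C, not Postgres code: `sketch_handler` stands in for the outermost handler, and `sketch_uuid_check()` is a hypothetical validator that only accepts the canonical 8-4-4-4-12 text form, which is stricter than the real `uuid_in`.

```c
#include <assert.h>
#include <ctype.h>
#include <setjmp.h>
#include <string.h>

/* Stands in for the outermost error handler of the session. */
static jmp_buf sketch_handler;

/* Validate the 8-4-4-4-12 text form; jump to the handler on bad input. */
static void sketch_uuid_check(const char *text)
{
    int i;

    assert(text != NULL);
    if (strlen(text) != 36)
        longjmp(sketch_handler, 1);

    for (i = 0; i < 36; i++) {
        int want_dash = (i == 8 || i == 13 || i == 18 || i == 23);

        if (want_dash ? text[i] != '-'
                      : !isxdigit((unsigned char) text[i]))
            longjmp(sketch_handler, 1);
    }
}

/* Returns 1 if the id was accepted, 0 if the error path was taken. */
static int sketch_try_uuid(const char *text)
{
    if (setjmp(sketch_handler) != 0)
        return 0;   /* "catch" side: everything after the check was skipped */

    sketch_uuid_check(text);
    return 1;       /* reached only when no longjmp happened */
}
```

With the malformed id from the log above, `sketch_try_uuid()` takes the error path: none of the code after the failing check runs, which mirrors how the statement is aborted and an error is reported instead.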
If you merge this now, please update #7725 with the remaining pending tasks so we can track them.
This is the second take after #7814, forking the built-in pg_stat_statements extension from the upstream master branch. It differs in that the JSON query info is extracted only once per query across parse/plan/execute runs (unless a reset cleared the hash table row), and some custom stats columns are stored directly as hash table columns.
Please review each commit separately.
Add the edb_stat_statements Postgres extension (forked from the master
branch of the upstream pg_stat_statements extension) to handle custom
query performance statistics. `sys::QueryStats` is added as a view of
the statistics.
For each stats-significant SQL statement we send to the backend, one or
more comment lines of "query stats info" JSON are prepended, so that the
Postgres extension can ingest them and record them in the modified
statistics hash table. Among the stats info fields, `id: uuid` is
especially important: it identifies distinct queries, so that stats of
the same query accumulate on the same hash table entry. The `id`
reflects some settings that affected the compilation (excluding the user
schema version, so that stats are grouped across schema versions). In
particular, the Postgres extension also uses the first 8 bytes of `id`
to replace the underlying `queryId` of the SQL statement, so that the
same frontend query can be recognized across all PARSE/EXECUTE
operations in Postgres for stats recording.
System queries, DDLs, and unrecognized queries are not recorded.
Refs #7725