apollo-server-core: unified Studio reporting (#4142)
The usage reporting plugin in `apollo-server-core` is not the first tool Apollo built to report usage to Studio. Previous iterations such as `optics-agent` and `engineproxy` reported a combination of detailed per-field, single-operation performance *traces* and summarized *stats* of operations to Apollo's servers. When we built this TypeScript usage reporting plugin in 2018, for the sake of expediency we did something different: it only sent traces to Apollo's servers. This meant that the performance of every single user operation was described in detail to Apollo's servers.

Studio is not an exhaustive trace warehouse: we have always *sampled* the traces received, making only some of them available via Studio's Traces UI. The other traces were converted to stats inside Studio's servers. While this made the reporting agent simpler than the previous implementations (it did not need to be able to describe performance statistics), it also meant that the protocol used to talk to Studio consumed a lot more bandwidth (as well as CPU time for encoding traces).

This PR returns us to the world where Studio usage is reported as a combination of stats and traces. It takes a slightly different approach than the previous implementations: instead of reporting stats and traces in parallel, each usage report contains both stats and traces. Each GraphQL operation is described either as a trace or as stats, not both. We expect this to significantly reduce the network and CPU requirements of sending usage reports to Studio. It should not significantly affect the experience of using Studio: we have always heavily sampled traces in Studio before saving them to the trace warehouse, and the default heuristic for deciding which operations to send as traces works similarly to the heuristic used in Studio's servers.

This PR introduces an option, `experimental_sendOperationAsTrace`, that lets you control whether a given operation is sent as a trace or as stats.
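The per-operation routing described above can be sketched as follows. This is a minimal, self-contained TypeScript sketch, not the plugin's code: `OperationSample`, `UsageReportSketch`, `makeDefaultDecider`, and the "one trace per operation, duration bucket, error state, and minute" heuristic are all hypothetical, and the decider's one-argument signature is an assumption rather than the real signature of `experimental_sendOperationAsTrace`.

```typescript
// Hypothetical, simplified operation record. The real plugin works with
// protobuf-generated Trace objects; these field names are illustrative only.
interface OperationSample {
  statsReportKey: string; // groups operations, e.g. "# Q\nquery Q{a}"
  durationNs: number;
  endTimeMs: number;
  hasErrors: boolean;
}

// Simplified stand-in for the per-operation stats a report carries.
interface OperationStatsSketch {
  requestCount: number;
  errorCount: number;
  totalDurationNs: number;
}

// Plays the role of experimental_sendOperationAsTrace, with an assumed signature.
type SendAsTraceDecider = (op: OperationSample) => boolean;

// Illustrative default: send at most one trace per operation, per duration
// bucket (each 10% wider than the last, an assumption), per error state,
// per wall-clock minute. The unbounded Set keeps the sketch short; a real
// implementation would bound its memory.
function makeDefaultDecider(): SendAsTraceDecider {
  const alreadySent = new Set<string>();
  return (op) => {
    const bucket = Math.max(0, Math.ceil(Math.log(op.durationNs / 1000) / Math.log(1.1)));
    const minute = Math.floor(op.endTimeMs / 60000);
    const key = JSON.stringify([op.statsReportKey, bucket, minute, op.hasErrors]);
    if (alreadySent.has(key)) return false;
    alreadySent.add(key);
    return true;
  };
}

// Each operation lands in exactly one place: the traces list or the stats.
class UsageReportSketch {
  readonly tracesPerKey = new Map<string, OperationSample[]>();
  readonly statsPerKey = new Map<string, OperationStatsSketch>();

  addOperation(op: OperationSample, sendAsTrace: SendAsTraceDecider): void {
    if (sendAsTrace(op)) {
      const traces = this.tracesPerKey.get(op.statsReportKey) ?? [];
      traces.push(op);
      this.tracesPerKey.set(op.statsReportKey, traces);
      return;
    }
    const stats = this.statsPerKey.get(op.statsReportKey) ??
      { requestCount: 0, errorCount: 0, totalDurationNs: 0 };
    stats.requestCount += 1;
    if (op.hasErrors) stats.errorCount += 1;
    stats.totalDurationNs += op.durationNs;
    this.statsPerKey.set(op.statsReportKey, stats);
  }
}

// Usage: two identical 5 ms operations that finish in the same minute.
const report = new UsageReportSketch();
const decide = makeDefaultDecider();
const op: OperationSample = {
  statsReportKey: "# Q\nquery Q{a}",
  durationNs: 5000000,
  endTimeMs: 1600000000000,
  hasErrors: false,
};
report.addOperation(op, decide); // first in its bucket: kept as a trace
report.addOperation({ ...op }, decide); // same bucket and minute: folded into stats
```

Passing a decider that always returns `false` would, in this sketch, send every operation as stats, which is the same outcome the plugin falls back to for plans without trace support.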
`experimental_sendOperationAsTrace` is truly experimental and may change at any time. For example, you should not rely on the fact that it is called on every operation after the operation has completed with a full trace, on its signature, or even on its existence. It is likely that future improvements to the usage reporting plugin will change how operations are observed so that we don't have to collect a full trace before deciding how to represent the operation.

Some other notes:

- Upgrade our fork, `@apollo/protobufjs`, with a few improvements:
  - New `js_use_toArray` option, which lets you encode repeated fields from objects that aren't stored in memory as arrays but expose `toArray` methods. We use this so that we can build up `DurationHistogram`s and map-like objects in a non-array form and only convert them to arrays at encoding time.
  - New `js_preEncoded` option, which allows you to encode messages in repeated fields as buffers (`Uint8Array`s). This helps amortize the cost of encoding a large message over time instead of blocking the event loop to encode the whole message at once. This replaces an old hack we used for one field with something built into the protobuf compiler (including correct TypeScript typings).
  - New `--no-from-object` flag, which we use to reduce the size of generated code (we don't use the `fromObject` protobuf.js API).
- To help us validate that the trace-to-stats code in this PR matches similar code in Studio's servers, the flag `internal_includeTracesContributingToStats` sends the traces that contribute to stats in a special field. We only use this as part of our own validation in our servers; for your graphs it has no effect other than increasing message size.
- Viewing traces in Studio is only available on paid plans.
  The usage-reporting endpoint now tells the plugin whether traces are supported on your graph's plan; if they are not, the plugin switches to sending all operations as stats (regardless of the value of `experimental_sendOperationAsTrace`) after the first report.
- To compare the message size against `maxUncompressedReportSize`, we use a rough estimate of how big the leaf nodes of the stats messages will be rather than carefully counting the space used by each number and histogram. We do take the lengths of all strings into account.
- By mistake, this plugin never sent the cache policy on traces, so visualizing cache-specific stats in Studio did not work. This is now fixed.

This project was begun by @jsegaran and completed by @glasser.
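As a rough illustration of the `js_use_toArray` note, here is a hypothetical sparse duration histogram that accumulates counts in a `Map` and only materializes a dense array when `toArray()` is called, as a generated encoder might do at encoding time. The bucket layout (384 buckets, each 10% wider than the last, with bucket 0 catching everything up to 1 microsecond) and the name `SparseDurationHistogram` are assumptions for this sketch, not the plugin's actual `DurationHistogram`.

```typescript
// Assumed bucket layout: 384 buckets, each 10% wider than the previous one.
const BUCKET_COUNT = 384;
const EXPONENT_LOG = Math.log(1.1);

// Map a duration in nanoseconds to a bucket index, clamping at both ends.
function durationToBucket(durationNs: number): number {
  const unbounded = Math.ceil(Math.log(durationNs / 1000) / EXPONENT_LOG);
  if (Number.isNaN(unbounded) || unbounded <= 0) return 0;
  return Math.min(unbounded, BUCKET_COUNT - 1);
}

class SparseDurationHistogram {
  // Only non-empty buckets are stored; most of the 384 stay at zero.
  private readonly counts = new Map<number, number>();

  incrementDuration(durationNs: number, value = 1): this {
    const bucket = durationToBucket(durationNs);
    this.counts.set(bucket, (this.counts.get(bucket) ?? 0) + value);
    return this;
  }

  // What an encoder would call; trailing zero buckets are omitted.
  toArray(): number[] {
    let maxBucket = -1;
    this.counts.forEach((_count, bucket) => {
      maxBucket = Math.max(maxBucket, bucket);
    });
    const dense: number[] = [];
    for (let i = 0; i <= maxBucket; i++) {
      dense.push(this.counts.get(i) ?? 0);
    }
    return dense;
  }
}

const hist = new SparseDurationHistogram().incrementDuration(500).incrementDuration(2000);
console.log(hist.toArray().join(",")); // prints "1,0,0,0,0,0,0,0,1"
```

Because only touched buckets are stored, a histogram for an operation seen a handful of times stays small in memory until it is encoded.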
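The next sketch combines two of the notes above: traces are pre-encoded into length-delimited protobuf chunks as they arrive (the idea behind `js_preEncoded`), so their exact byte size is always known, while the size of stats is only estimated by counting string bytes exactly and charging a flat cost per numeric leaf. The field number, `ESTIMATED_BYTES_PER_LEAF`, `PreEncodedTraces`, and `shouldFlush` are hypothetical; a real implementation would use code generated by `@apollo/protobufjs` rather than hand-rolled framing. The framing itself (a varint key of `(fieldNumber << 3) | 2`, then a varint length, then the payload bytes) is the standard protobuf wire encoding of a repeated message field.

```typescript
// Hypothetical field number for a repeated `traces` field in a report message.
const TRACE_FIELD_NUMBER = 1;
const WIRE_TYPE_LENGTH_DELIMITED = 2;
// Hypothetical flat cost charged for each numeric leaf or histogram bucket.
const ESTIMATED_BYTES_PER_LEAF = 2;

// Standard protobuf base-128 varint for a non-negative integer.
function encodeVarint(value: number): number[] {
  const bytes: number[] = [];
  let v = value;
  while (v > 0x7f) {
    bytes.push((v % 128) | 0x80); // low 7 bits plus continuation bit
    v = Math.floor(v / 128);
  }
  bytes.push(v);
  return bytes;
}

// Collects already-encoded Trace messages so building the report is a cheap copy.
class PreEncodedTraces {
  private readonly chunks: Uint8Array[] = [];
  private byteLength = 0;

  add(encodedTrace: Uint8Array): void {
    const key = (TRACE_FIELD_NUMBER << 3) | WIRE_TYPE_LENGTH_DELIMITED;
    const header = new Uint8Array(encodeVarint(key).concat(encodeVarint(encodedTrace.length)));
    this.chunks.push(header, encodedTrace);
    this.byteLength += header.length + encodedTrace.length;
  }

  get size(): number {
    return this.byteLength;
  }

  toBytes(): Uint8Array {
    const out = new Uint8Array(this.byteLength);
    let offset = 0;
    this.chunks.forEach((chunk) => {
      out.set(chunk, offset);
      offset += chunk.length;
    });
    return out;
  }
}

// Exact UTF-8 byte length, without relying on TextEncoder or Buffer.
function utf8ByteLength(s: string): number {
  let n = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) n += 1;
    else if (c < 0x800) n += 2;
    else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length &&
      s.charCodeAt(i + 1) >= 0xdc00 && s.charCodeAt(i + 1) <= 0xdfff) {
      n += 4; // surrogate pair encodes one 4-byte code point
      i++;
    } else n += 3; // other BMP characters and lone surrogates (U+FFFD)
  }
  return n;
}

// Rough stats size: strings counted exactly, numbers and buckets at a flat rate.
function estimateStatsSize(strings: string[], numericLeaves: number): number {
  let total = numericLeaves * ESTIMATED_BYTES_PER_LEAF;
  strings.forEach((s) => {
    total += utf8ByteLength(s);
  });
  return total;
}

function shouldFlush(traces: PreEncodedTraces, statsEstimate: number, maxUncompressedReportSize: number): boolean {
  return traces.size + statsEstimate >= maxUncompressedReportSize;
}

// Usage with stand-in data: one 200-byte "trace" and one small stats entry.
const traces = new PreEncodedTraces();
traces.add(new Uint8Array(200));
const statsEstimate = estimateStatsSize(["# Q\nquery Q{a}"], 10);
const flushNow = shouldFlush(traces, statsEstimate, 4 * 1024 * 1024);
```

The trade-off mirrors the note above: trace bytes are exact because they are already encoded, whereas the stats side accepts some error in exchange for not walking every number and histogram.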