Add additional columns to Almanac Requests table #1293
Comments
Yeah I don't know why those fields have such weird values. I think making the |
I've checked and they are wrong in the HAR. If we could update it ASAP, it would be really useful and save @gregorywolf (and therefore your coupons!) a lot. However, if that's a lot of effort then we can just query this from the And if we added these columns while we are at it, then perhaps no one would need to look at the
But some of these might be less useful so maybe should just take the hit there rather than clogging up the table with so many columns. |
Actually I was wrong - all 17 queries need this expensive column at the moment and not just 16/17 |
@rviscomi I think getting this change done ASAP would be great. I also worked with @paulcalvano this morning and he created a new sample_data.requests_withprotocol table which confirmed that there is a massive reduction in cost. In addition to the protocol field it would be great to add the other fields that @tunetheweb has identified in the above comment |
I'm supportive of this change. It'd be good to see if there are any other easy optimization wins that we can do all at once. FWIW I've made it about halfway through running/optimizing the H2 queries. The results are saved to the H2 sheet, so we don't need to rerun those queries unless there are any other structural changes. |
Here's some more fields from security chapter last year:
I also discovered they are looking at the |
@rviscomi Thank you for running the queries for me and dropping the results into the H2 sheet. I am reviewing all of @tunetheweb comments in my PR. There may be reason to rerun a couple of the queries (i.e., combining QUIC & http/2+quic/46 as H3). If/when I decide to rerun some queries should I ask you or @paulcalvano or will there be a new table created that contains the fields of interest as requested above? |
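For the idea of combining QUIC and http/2+quic/46 as H3 mentioned above, here is a minimal sketch of how the raw protocol strings could be bucketed before aggregating. The helper name `bucket_protocol` and the "unknown" label are hypothetical; only the two aliases named in the comment are mapped, and every other value is passed through unchanged.

```python
# Hypothetical helper, not part of any Almanac query: bucket raw
# $._protocol strings so the QUIC variants are reported together.
H3_ALIASES = {"quic", "http/2+quic/46"}  # the two values named in this thread


def bucket_protocol(raw):
    """Return "H3" for the QUIC aliases, "unknown" for empty input,
    and the trimmed original value otherwise."""
    if raw is None:
        return "unknown"
    value = raw.strip()
    if not value:
        return "unknown"
    if value.lower() in H3_ALIASES:
        return "H3"
    return value
```

The same mapping could be expressed directly in the query; the Python form is only meant to make the grouping rule explicit.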
To answer my own question, it looks like it does! @gregorywolf you should move your queries to using these instead of the @paulcalvano / @rviscomi I've updated the first comment with the complete list as far as I can see. Can we get requests table updated with these? And also update the request table definition at the same time. |
Chatted with @gregorywolf about this. Given that we're so far into analysis, I wouldn't want to make analysts rewrite their queries to be more efficient and use these fields. In the H2 queries, I've already run them and saved results, so efficiency isn't a blocker there. If any analysts are running out of quota due to these expensive queries, I'm happy to do the query execution using my unlimited quota. I do think these are good changes to make for 2021 analysis, so we should update the |
FYI: I think I've found the reason why the HTTP Version fields are not set correctly and what the odd values are. Most of them are HTTP/2 pseudo fields incorrectly being parsed under the assumption they are the initial HTTP/1 lines ( The blank ones are HTTP/1.1 where the blank line at the end was being incorrectly parsed to get the version. I've submitted a PR to fix this so probably wanna keep the Still think the other columns (including |
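To illustrate the failure mode described above, here is a rough sketch of a version parser that skips HTTP/2 pseudo fields and blank lines instead of treating them as an HTTP/1 start line. This is not the code from the PR; the function name and the input shape (a list of raw header lines) are assumptions made for the example.

```python
import re

# Matches version tokens such as HTTP/1.1 or HTTP/2.
_VERSION_RE = re.compile(r"HTTP/\d(?:\.\d)?")


def http_version_from_header_lines(lines):
    """Return the version from an HTTP/1 start line, or None if absent.

    Hypothetical helper: blank lines and HTTP/2 pseudo fields (which
    start with ":") are skipped rather than parsed as a start line.
    """
    for line in lines:
        text = line.strip()
        if not text or text.startswith(":"):
            continue
        tokens = text.split()
        # Response status line, e.g. "HTTP/1.1 200 OK".
        if _VERSION_RE.fullmatch(tokens[0]):
            return tokens[0]
        # Request line, e.g. "GET /index.html HTTP/1.1".
        if (len(tokens) == 3 and not tokens[0].endswith(":")
                and _VERSION_RE.fullmatch(tokens[2])):
            return tokens[2]
        # First real line is an ordinary header: no HTTP/1 start line.
        return None
    return None
```

When this returns None, a caller would need to take the version from another source, such as the protocol field discussed in this issue.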
Great catch, and thanks for the fix! |
@rviscomi what do you think about adding Speaking to @tomvangoethem as his Security PR has a lot of querying of those and the We could also replace Also presume this is for next year? In which case we'll need to run @tomvangoethem 's queries for him this year if you're OK with that. |
I'm open to more granular summary data, but let's not block on it for this year. |
OK added those two columns to first comment for when we return to this later. |
The httparchive.almanac.requests table includes a respHttpVersion and a reqHttpVersion column, neither of which contains any meaningful data.

All the HTTP/2 queries last year were based on the JSON_EXTRACT_SCALAR(payload, "$._protocol") column, which does appear to be mostly accurate. We should add the protocol column to the httparchive.almanac.requests table and/or replace the respHttpVersion and reqHttpVersion columns with this data to make it easier and cheaper to query HTTP version requests.

Edit. Complete list of headers we should add if possible:
$._protocol
$._was_pushed
$._tls_version
$._tls_cipher_suite
$._securityDetails.issuer
$._securityDetails.keyExchange
$._securityDetails.cipher
$._securityDetails.protocol - not sure how this is different than TLS Version to be honest
$.request.headers - in JSON format rather than like reqOtherHeaders
$.response.headers - in JSON format rather than like respOtherHeaders
summary_pages
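
As a rough illustration of what precomputing these fields could look like, the sketch below pulls the listed paths out of a single request payload once, so that later queries would not need to parse the full JSON. The function name, the output column names, and the sample payload in the usage note are hypothetical; the real table would be built in BigQuery rather than in Python.

```python
import json

# Scalar paths from the list above, written without the leading "$.".
SCALAR_PATHS = [
    "_protocol",
    "_was_pushed",
    "_tls_version",
    "_tls_cipher_suite",
    "_securityDetails.issuer",
    "_securityDetails.keyExchange",
    "_securityDetails.cipher",
    "_securityDetails.protocol",
]

# Header collections that the list asks to keep in JSON format.
HEADER_PATHS = ["request.headers", "response.headers"]


def _lookup(payload, path):
    """Walk a dotted path through nested dicts; None if any key is missing."""
    node = payload
    for key in path.split("."):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
        if node is None:
            return None
    return node


def extract_fields(payload_json):
    """Return one flat row (dict) with the proposed extra columns."""
    payload = json.loads(payload_json)
    row = {path: _lookup(payload, path) for path in SCALAR_PATHS}
    for path in HEADER_PATHS:
        headers = _lookup(payload, path)
        # Store headers as a JSON string, or None when they are absent.
        row[path] = json.dumps(headers) if headers is not None else None
    return row
```

Usage, with a made-up payload: `extract_fields('{"_protocol": "HTTP/2"}')["_protocol"]` gives `"HTTP/2"`, and any path missing from the payload comes back as `None`.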
Also fix the startedDateTime field, as it is off by 1000.

CC: @gregorywolf