* Recycle 2020 ecommerce queries Updating the 2020 queries to make it ready for 2021
* Fix linting errors
* domains that are marked as payment processors but not ecommerce
* update all the new 2021 files to use 2021_07_01 or 2020_08_01 or 2019_07_01 where needed
* fixing the queries after running the updated linter
* update the readme
* add some new queries; update the readme
* Update all_categories.sql fix a typo changing from `app` to `category`
* add hreflang queries
* rename file
* update the README
* add ecomm + csp query
* add aug and sep versions of cmp query
* add app links query update
* updated readme
* update readme
* update the top vendors file to use ranks
* fixing the query to use well-known
* Update README.md Updating list of queries used, lists created for testing and unused items
* remove unused/informational queries
* fix this to use production tables instead of sample data
* fix to not use the ROUND function
* update README
* fix to stop using ROUND
* fixing this query to use UNNEST and remove some redundancy
* fix linter issue
* fix query and fix linting again
* fix the ecomm category matches
* Update sql/2021/ecommerce/core_web_vitals_passingmetrics_byvendor_bydevice.sql Updating the query as its possible FID could be null

Co-authored-by: Rick Viscomi <[email protected]>
Co-authored-by: Barry <[email protected]>
Co-authored-by: Rick Viscomi <[email protected]>
1 parent 191dece, commit c429654
Showing 47 changed files with 2,017 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
#standardSQL | ||
# This query uses custom metric '_well-known' - https://github.com/HTTPArchive/legacy.httparchive.org/blob/master/custom_metrics/well-known.js | ||
# Note that this query has a subtle bug: a site can serve an empty /.well-known/assetlinks.json or /.well-known/apple-app-site-association file, which leads to overcounting sites with native app links | ||
# An example is https://www.allbirds.com/.well-known/assetlinks.json, which has a payload of "[]" | ||
# Fixing this would require response body parsing in well-known.js | ||
|
||
SELECT | ||
client, | ||
COUNTIF(android_app_links) AS android_app_links, | ||
COUNTIF(ios_universal_links) AS ios_universal_links, | ||
COUNT(0) AS total, | ||
COUNTIF(android_app_links) / COUNT(0) AS pct_android_app_links, | ||
COUNTIF(ios_universal_links) / COUNT(0) AS pct_ios_universal_links | ||
FROM ( | ||
SELECT DISTINCT | ||
_TABLE_SUFFIX AS client, | ||
url | ||
FROM | ||
`httparchive.technologies.2021_07_01_*` | ||
WHERE | ||
category = 'Ecommerce' AND | ||
(app != 'Cart Functionality' AND | ||
app != 'Google Analytics Enhanced eCommerce')) | ||
JOIN ( | ||
SELECT | ||
_TABLE_SUFFIX AS client, | ||
url, | ||
JSON_VALUE(JSON_EXTRACT_SCALAR(payload, '$._well-known'), '$."/.well-known/assetlinks.json".found') = 'true' AS android_app_links, | ||
JSON_VALUE(JSON_EXTRACT_SCALAR(payload, '$._well-known'), '$."/.well-known/apple-app-site-association".found') = 'true' AS ios_universal_links | ||
FROM | ||
`httparchive.pages.2021_07_01_*`) | ||
USING | ||
(client, url) | ||
GROUP BY | ||
client | ||
ORDER BY | ||
client | ||
|
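The overcounting caveat in the header comment arises because a file that is merely found counts as proof of app links. As a rough illustration only (this is not part of the commit, and `has_nonempty_assetlinks` is a hypothetical helper), a body-level check could look like the Python sketch below. It assumes the raw response body of /.well-known/assetlinks.json is available as a string, which the custom metric does not currently expose.

```python
import json


def has_nonempty_assetlinks(body: str) -> bool:
    """Return True only if the body parses as a non-empty JSON array.

    Hypothetical helper: assumes the raw response body of
    /.well-known/assetlinks.json is available as a string.
    """
    try:
        statements = json.loads(body)
    except (TypeError, ValueError):
        # Not valid JSON (for example an HTML error page served with a 200 status).
        return False
    return isinstance(statements, list) and len(statements) > 0


# An empty payload, as in the allbirds.com example from the query comment.
print(has_nonempty_assetlinks("[]"))
# A payload with one statement; the relation string is only illustrative.
print(has_nonempty_assetlinks('[{"relation": ["delegate_permission/common.handle_all_urls"]}]'))
```

A similar check for apple-app-site-association would need to inspect a JSON object rather than an array, so it is left out of this sketch.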
41 changes: 41 additions & 0 deletions
sql/2021/ecommerce/core_web_vitals_distribution_byvendor_bydevice.sql
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
#standardSQL | ||
# Core Web Vitals distribution by Ecommerce vendor | ||
# | ||
# Note that this is an unweighted average of all sites per Ecommerce vendor. | ||
# Performance of sites with millions of visitors is weighted the same as small sites. | ||
SELECT | ||
client, | ||
ecomm, | ||
COUNT(DISTINCT origin) AS origins, | ||
SUM(fast_lcp) / (SUM(fast_lcp) + SUM(avg_lcp) + SUM(slow_lcp)) AS good_lcp, | ||
SUM(avg_lcp) / (SUM(fast_lcp) + SUM(avg_lcp) + SUM(slow_lcp)) AS ni_lcp, | ||
SUM(slow_lcp) / (SUM(fast_lcp) + SUM(avg_lcp) + SUM(slow_lcp)) AS poor_lcp, | ||
SUM(fast_fid) / (SUM(fast_fid) + SUM(avg_fid) + SUM(slow_fid)) AS good_fid, | ||
SUM(avg_fid) / (SUM(fast_fid) + SUM(avg_fid) + SUM(slow_fid)) AS ni_fid, | ||
SUM(slow_fid) / (SUM(fast_fid) + SUM(avg_fid) + SUM(slow_fid)) AS poor_fid, | ||
SUM(small_cls) / (SUM(small_cls) + SUM(medium_cls) + SUM(large_cls)) AS good_cls, | ||
SUM(medium_cls) / (SUM(small_cls) + SUM(medium_cls) + SUM(large_cls)) AS ni_cls, | ||
SUM(large_cls) / (SUM(small_cls) + SUM(medium_cls) + SUM(large_cls)) AS poor_cls | ||
FROM | ||
`chrome-ux-report.materialized.device_summary` | ||
JOIN ( | ||
SELECT DISTINCT | ||
_TABLE_SUFFIX AS client, | ||
url, | ||
app AS ecomm | ||
FROM | ||
`httparchive.technologies.2021_07_01_*` | ||
WHERE | ||
category = 'Ecommerce' AND | ||
(app != 'Cart Functionality' AND | ||
app != 'Google Analytics Enhanced eCommerce')) | ||
ON | ||
CONCAT(origin, '/') = url AND | ||
IF(device = 'desktop', 'desktop', 'mobile') = client | ||
WHERE | ||
date = '2021-07-01' | ||
GROUP BY | ||
client, | ||
ecomm | ||
ORDER BY | ||
origins DESC |
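The "unweighted average" note in the query header can be made concrete with a small Python sketch. It is not part of the commit: `good_share` is a hypothetical helper and the two origins are made-up values. It only mirrors the `SUM(fast) / (SUM(fast) + SUM(avg) + SUM(slow))` arithmetic for a single metric.

```python
def good_share(origins):
    """Mirror SUM(fast) / (SUM(fast) + SUM(avg) + SUM(slow)) over origins.

    `origins` is a list of (fast, avg, slow) density tuples, one per origin.
    Returns None when there is no data at all.
    """
    fast = sum(o[0] for o in origins)
    total = sum(o[0] + o[1] + o[2] for o in origins)
    return fast / total if total else None


# Two hypothetical origins. Traffic volume never enters the calculation,
# so a very popular site and a tiny one pull the result equally.
popular_site = (0.90, 0.05, 0.05)
tiny_site = (0.50, 0.30, 0.20)
print(good_share([popular_site, tiny_site]))
```

Because each origin's densities sum to roughly 1, the result is close to the plain mean of the per-origin "good" fractions.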
63 changes: 63 additions & 0 deletions
sql/2021/ecommerce/core_web_vitals_passingmetrics_byvendor_bydevice.sql
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
#standardSQL | ||
# CrUX Core Web Vitals performance of Ecommerce vendors by device | ||
CREATE TEMP FUNCTION IS_GOOD (good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS ( | ||
good / (good + needs_improvement + poor) >= 0.75 | ||
); | ||
|
||
CREATE TEMP FUNCTION IS_NON_ZERO (good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS ( | ||
good + needs_improvement + poor > 0 | ||
); | ||
|
||
|
||
SELECT | ||
client, | ||
ecomm, | ||
COUNT(DISTINCT origin) AS origins, | ||
# Origins with good LCP divided by origins with any LCP. | ||
SAFE_DIVIDE( | ||
COUNT(DISTINCT IF(IS_GOOD(fast_lcp, avg_lcp, slow_lcp), origin, NULL)), | ||
COUNT(DISTINCT IF(IS_NON_ZERO(fast_lcp, avg_lcp, slow_lcp), origin, NULL))) AS pct_good_lcp, | ||
|
||
# Origins with good FID divided by origins with any FID. | ||
SAFE_DIVIDE( | ||
COUNT(DISTINCT IF(IS_GOOD(fast_fid, avg_fid, slow_fid), origin, NULL)), | ||
COUNT(DISTINCT IF(IS_NON_ZERO(fast_fid, avg_fid, slow_fid), origin, NULL))) AS pct_good_fid, | ||
|
||
# Origins with good CLS divided by origins with any CLS. | ||
SAFE_DIVIDE( | ||
COUNT(DISTINCT IF(IS_GOOD(small_cls, medium_cls, large_cls), origin, NULL)), | ||
COUNT(DISTINCT IF(IS_NON_ZERO(small_cls, medium_cls, large_cls), origin, NULL))) AS pct_good_cls, | ||
|
||
# Origins with good LCP, FID (when present), and CLS divided by origins with any LCP and CLS. | ||
SAFE_DIVIDE( | ||
COUNT(DISTINCT IF( | ||
IS_GOOD(fast_lcp, avg_lcp, slow_lcp) AND | ||
(NOT IS_NON_ZERO(fast_fid, avg_fid, slow_fid) OR IS_GOOD(fast_fid, avg_fid, slow_fid)) AND | ||
IS_GOOD(small_cls, medium_cls, large_cls), origin, NULL)), | ||
COUNT(DISTINCT IF( | ||
IS_NON_ZERO(fast_lcp, avg_lcp, slow_lcp) AND | ||
IS_NON_ZERO(small_cls, medium_cls, large_cls), origin, NULL))) AS pct_good_cwv | ||
FROM | ||
`chrome-ux-report.materialized.device_summary` | ||
JOIN ( | ||
SELECT | ||
_TABLE_SUFFIX AS client, | ||
url, | ||
app AS ecomm | ||
FROM | ||
`httparchive.technologies.2021_07_01_*` | ||
WHERE | ||
category = 'Ecommerce' AND | ||
(app != 'Cart Functionality' AND | ||
app != 'Google Analytics Enhanced eCommerce')) | ||
ON | ||
CONCAT(origin, '/') = url AND | ||
IF(device = 'desktop', 'desktop', 'mobile') = client | ||
WHERE | ||
date = '2021-07-01' | ||
GROUP BY | ||
client, | ||
ecomm | ||
ORDER BY | ||
origins DESC | ||
|
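The pass/fail rule behind `pct_good_cwv` can be sketched in Python as follows. This is an illustration, not the commit's code. It assumes a missing FID shows up as all-zero densities; how BigQuery treats NULL inputs is not modeled. The zero-total guard in `is_good` is also an assumption that the SQL function does not have.

```python
def is_good(good, needs_improvement, poor):
    """At least 75% of experiences are good (mirrors IS_GOOD)."""
    total = good + needs_improvement + poor
    # Assumption: with no data the metric cannot be good; the SQL UDF has no such guard.
    return total > 0 and good / total >= 0.75


def is_non_zero(good, needs_improvement, poor):
    """The metric has any data at all (mirrors IS_NON_ZERO)."""
    return good + needs_improvement + poor > 0


def passes_cwv(lcp, fid, cls):
    """LCP and CLS must be good; FID must be good only when it has data."""
    fid_ok = (not is_non_zero(*fid)) or is_good(*fid)
    return is_good(*lcp) and fid_ok and is_good(*cls)


# Hypothetical origin with good LCP and CLS and no FID data at all.
print(passes_cwv((0.8, 0.1, 0.1), (0.0, 0.0, 0.0), (0.9, 0.05, 0.05)))
# Same origin, but with FID data that misses the 75% threshold.
print(passes_cwv((0.8, 0.1, 0.1), (0.5, 0.3, 0.2), (0.9, 0.05, 0.05)))
```

The denominator in the query applies the same idea: an origin is eligible when it has any LCP and any CLS data, regardless of FID.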
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
#standardSQL | ||
# 13_03: Timeseries to show eCommerce growth acceleration due to Covid-19 | ||
# Excluding apps which are not eCommerce platforms/vendors themselves but are used to identify eCommerce sites. These signals were added to Wappalyzer in 2020 to get a better idea of the % of eCommerce sites, but they are not relevant for vendor % market share analysis | ||
SELECT | ||
IF(ENDS_WITH(_TABLE_SUFFIX, '_desktop'), 'desktop', 'mobile') AS client, | ||
COUNT(DISTINCT url) AS freq, | ||
total, | ||
COUNT(DISTINCT url) / total AS pct, | ||
2021 AS year, | ||
LEFT(_TABLE_SUFFIX, 2) AS month | ||
FROM | ||
`httparchive.technologies.2021_*` | ||
JOIN | ||
(SELECT | ||
_TABLE_SUFFIX, | ||
COUNT(DISTINCT url) AS total | ||
FROM | ||
`httparchive.summary_pages.2021_*` | ||
GROUP BY | ||
_TABLE_SUFFIX) | ||
USING (_TABLE_SUFFIX) | ||
WHERE | ||
category = 'Ecommerce' | ||
GROUP BY | ||
client, | ||
year, | ||
month, | ||
total | ||
|
||
UNION ALL | ||
|
||
SELECT | ||
IF(ENDS_WITH(_TABLE_SUFFIX, '_desktop'), 'desktop', 'mobile') AS client, | ||
COUNT(DISTINCT url) AS freq, | ||
total, | ||
COUNT(DISTINCT url) / total AS pct, | ||
2020 AS year, | ||
LEFT(_TABLE_SUFFIX, 2) AS month | ||
FROM | ||
`httparchive.technologies.2020_*` | ||
JOIN | ||
(SELECT | ||
_TABLE_SUFFIX, | ||
COUNT(DISTINCT url) AS total | ||
FROM | ||
`httparchive.summary_pages.2020_*` | ||
GROUP BY | ||
_TABLE_SUFFIX) | ||
USING (_TABLE_SUFFIX) | ||
WHERE | ||
category = 'Ecommerce' | ||
GROUP BY | ||
client, | ||
year, | ||
month, | ||
total | ||
|
||
ORDER BY | ||
year DESC, | ||
month DESC |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
#standardSQL | ||
# 13_04: Timeseries to show eCommerce vendors growth acceleration due to Covid-19 | ||
# Excluding apps which are not eCommerce platforms/vendors themselves but are used to identify eCommerce sites. These signals were added to Wappalyzer in 2020 to get a better idea of the % of eCommerce sites, but they are not relevant for vendor % market share analysis | ||
# Limiting to top 5000 records to continue further analysis in Google Sheets. Using HAVING clauses based on 'pct' results in missing data for certain months | ||
SELECT | ||
IF(ENDS_WITH(_TABLE_SUFFIX, '_desktop'), 'desktop', 'mobile') AS client, | ||
app, | ||
COUNT(DISTINCT url) AS freq, | ||
total, | ||
COUNT(DISTINCT url) / total AS pct, | ||
LEFT(_TABLE_SUFFIX, 4) AS year, | ||
SUBSTR(_TABLE_SUFFIX, 6, 2) AS month | ||
FROM | ||
`httparchive.technologies.*` | ||
JOIN | ||
(SELECT | ||
_TABLE_SUFFIX, | ||
COUNT(DISTINCT url) AS total | ||
FROM | ||
`httparchive.summary_pages.*` | ||
GROUP BY | ||
_TABLE_SUFFIX) | ||
USING (_TABLE_SUFFIX) | ||
WHERE | ||
category = 'Ecommerce' AND | ||
(app != 'Cart Functionality' AND | ||
app != 'Google Analytics Enhanced eCommerce') | ||
GROUP BY | ||
client, | ||
app, | ||
year, | ||
month, | ||
total | ||
ORDER BY | ||
pct DESC, | ||
client DESC, | ||
app DESC | ||
LIMIT 5000 |
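The `client`, `year`, and `month` columns in this query are all derived from `_TABLE_SUFFIX`. The Python sketch below shows the same slicing. It is not part of the commit: `parse_table_suffix` is a hypothetical name, and the suffix is assumed to follow the `YYYY_MM_DD_client` shape seen in the table names used elsewhere in this commit.

```python
def parse_table_suffix(suffix: str):
    """Split a suffix such as '2021_07_01_desktop' into (client, year, month)."""
    # IF(ENDS_WITH(_TABLE_SUFFIX, '_desktop'), 'desktop', 'mobile')
    client = "desktop" if suffix.endswith("_desktop") else "mobile"
    # LEFT(_TABLE_SUFFIX, 4)
    year = suffix[:4]
    # SUBSTR(_TABLE_SUFFIX, 6, 2); SQL positions are 1-based, Python slices are 0-based.
    month = suffix[5:7]
    return client, year, month


print(parse_table_suffix("2021_07_01_desktop"))
print(parse_table_suffix("2020_08_01_mobile"))
```

Note that any suffix not ending in `_desktop` is labeled `mobile`, exactly as the `IF` expression in the query does.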
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
#standardSQL | ||
# 13_20: Lighthouse category scores per eCommerce platform. Web Almanac runs Lighthouse only in mobile mode, hence the references to mobile tables | ||
SELECT | ||
app AS ecommVendor, | ||
COUNT(DISTINCT url) AS freq, | ||
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.performance.score') AS NUMERIC), 1000)[OFFSET(500)] AS median_performance, | ||
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.accessibility.score') AS NUMERIC), 1000)[OFFSET(500)] AS median_accessibility, | ||
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.pwa.score') AS NUMERIC), 1000)[OFFSET(500)] AS median_pwa, | ||
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.seo.score') AS NUMERIC), 1000)[OFFSET(500)] AS median_seo, | ||
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.best-practices.score') AS NUMERIC), 1000)[OFFSET(500)] AS median_best_practices | ||
FROM | ||
`httparchive.lighthouse.2021_07_01_mobile` | ||
JOIN | ||
`httparchive.technologies.2021_07_01_mobile` | ||
USING | ||
(url) | ||
WHERE | ||
category = 'Ecommerce' AND | ||
(app != 'Cart Functionality' AND | ||
app != 'Google Analytics Enhanced eCommerce') | ||
GROUP BY | ||
ecommVendor | ||
ORDER BY | ||
freq DESC |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
#standardSQL | ||
# 13_07: Distribution of HTML kilobytes per page | ||
SELECT | ||
_TABLE_SUFFIX AS client, | ||
percentile, | ||
APPROX_QUANTILES(bytesHtml, 1000)[OFFSET(percentile * 10)] / 1024 AS kbytes_html | ||
FROM | ||
`httparchive.summary_pages.2021_07_01_*` | ||
JOIN | ||
`httparchive.technologies.2021_07_01_*` | ||
USING (_TABLE_SUFFIX, url), | ||
UNNEST([10, 25, 50, 75, 90, 100]) AS percentile | ||
WHERE | ||
category = 'Ecommerce' AND | ||
app != 'Cart Functionality' AND | ||
app != 'Google Analytics Enhanced eCommerce' | ||
GROUP BY | ||
percentile, | ||
client | ||
ORDER BY | ||
percentile, | ||
client |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
#standardSQL | ||
# 13_06: Distribution of image stats for 2021 | ||
SELECT | ||
percentile, | ||
_TABLE_SUFFIX AS client, | ||
APPROX_QUANTILES(reqImg, 1000)[OFFSET(percentile * 10)] AS image_count, | ||
APPROX_QUANTILES(bytesImg, 1000)[OFFSET(percentile * 10)] / 1024 AS image_kbytes | ||
FROM | ||
`httparchive.summary_pages.2021_07_01_*` | ||
JOIN ( | ||
SELECT DISTINCT | ||
_TABLE_SUFFIX, | ||
url | ||
FROM `httparchive.technologies.2021_07_01_*` | ||
WHERE category = 'Ecommerce' AND | ||
(app != 'Cart Functionality' AND | ||
app != 'Google Analytics Enhanced eCommerce')) | ||
USING (_TABLE_SUFFIX, url), | ||
UNNEST([10, 25, 50, 75, 90, 100]) AS percentile | ||
GROUP BY | ||
percentile, | ||
client | ||
ORDER BY | ||
percentile, | ||
client |
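The distribution queries pick percentiles with `APPROX_QUANTILES(x, 1000)[OFFSET(percentile * 10)]`: 1000 buckets give 1001 boundaries, so percentile 50 sits at offset 500. The Python sketch below illustrates that indexing. It is not part of the commit: it uses an exact stand-in for the approximate aggregate, and the helper names and sample data are hypothetical.

```python
def quantile_boundaries(values, buckets=1000):
    """Exact stand-in for APPROX_QUANTILES(x, buckets): buckets + 1 boundaries."""
    ordered = sorted(values)
    last = len(ordered) - 1
    # Boundary i sits i/buckets of the way through the sorted values.
    return [ordered[round(i * last / buckets)] for i in range(buckets + 1)]


def percentile_kbytes(byte_counts, percentile):
    """Mirror APPROX_QUANTILES(bytes, 1000)[OFFSET(percentile * 10)] / 1024."""
    return quantile_boundaries(byte_counts)[percentile * 10] / 1024


# Hypothetical page weights in bytes: 0 KB, 1 KB, ..., 1000 KB.
sizes = [kb * 1024 for kb in range(1001)]
for p in [10, 25, 50, 75, 90, 100]:
    print(p, percentile_kbytes(sizes, p))
```

Offset 1000 (percentile 100) is the maximum and offset 0 would be the minimum, which is why the `UNNEST` list can include 100 without going out of range.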