Ecommerce 2021 queries (#2300)
* Recycle 2020 ecommerce queries

Update the 2020 queries to make them ready for 2021

* Fix linting errors

* domains that are marked as payment processors but not ecommerce

* update all the new 2021 files to use 2021_07_01 or 2020_08_01 or 2019_07_01 where needed

* fixing the queries after running the updated linter

* update the readme

* add some new queries; update the readme

* Update all_categories.sql

fix a typo changing from `app` to `category`

* add hreflang queries

* rename file

* update the README

* add ecomm + csp query

* add aug and sep versions of cmp query

* add app links query update

* updated readme

* update readme

* update the top vendors file to use ranks

* fixing the query to use well-known

* Update README.md

Updating list of queries used, lists created for testing and unused items

* remove unused/informational queries

* fix this to use production tables instead of sample data

* fix to not use the ROUND function

* update README

* fix to stop using ROUND

* fixing this query to use UNNEST and remove some redundancy

* fix linter issue

* fix query and fix linting again

* fix the ecomm category matches

* Update sql/2021/ecommerce/core_web_vitals_passingmetrics_byvendor_bydevice.sql

Updating the query as it's possible FID could be null

Co-authored-by: Rick Viscomi <[email protected]>

Co-authored-by: Barry <[email protected]>
Co-authored-by: Rick Viscomi <[email protected]>
3 people authored Oct 15, 2021
1 parent 191dece commit c429654
Showing 47 changed files with 2,017 additions and 0 deletions.
58 changes: 58 additions & 0 deletions sql/2021/ecommerce/README.md
@@ -8,3 +8,61 @@
Analysts: if helpful, you can use this README to give additional info about the queries.
-->

Current list of queries used

* Ecommerce comparison 2020 to 2021. - pct_ecommsites_bydevice_compare20202021.sql
* Top ecommerce platforms. - top_vendors.sql, top_vendors_crux_rank.sql
* Enterprise ecommerce platforms (desktop) - top_vendors.sql
* Enterprise ecommerce platforms - 2019 desktop
* Enterprise ecommerce platforms - 2020 desktop
* Ecommerce platform growth Covid-19 impact - ecomm_vendors_covid_growth.sql
* Page requests distribution. - pagestats_percentiles_bydevice.sql
* Page weight distribution. - pagestats_percentiles_bydevice.sql
* Median page requests by type. - pagestats_percentile_bydevice_format.sql
* Median page kilobytes by type. - pagestats_percentile_bydevice_format.sql
* Distribution of HTML bytes per ecommerce page - pagestats_html_bydevice.sql
* Distribution of image requests for ecommerce - pagestats_image_bydevice.sql
* Distribution of image bytes for ecommerce - pagestats_percentiles_bydevice.sql
* Popular image formats on ecommerce sites - pagestats_image_bydevice_format.sql
* Distribution of third-party requests - pct_3pusage_bydevice.sql
* Distribution of third-party bytes - pct_3pusage_bydevice.sql
* Real-user Largest Contentful Paint experiences - core_web_vitals_distribution_byvendor_bydevice.sql
* Real-user First Input Delay experiences - core_web_vitals_distribution_byvendor_bydevice.sql
* Real-user Cumulative Layout Shift experiences - core_web_vitals_distribution_byvendor_bydevice.sql
* Real-user Core Web Vitals experiences - core_web_vitals_passingmetrics_byvendor_bydevice.sql
* Top analytics solutions on ecommerce sites - top_analytics_providers_bydevice_wapp.sql
* Tag manager usage on ecommerce sites. - percent_of_ecommsites_using_each_tag_managers.sql
* Consent Management Platform adoption - percent_of_ecommsites_using_cmp.sql, percent_of_ecommsites_using_cmp_aug21.sql, percent_of_ecommsites_using_cmp_sep21.sql
* AMP usage on ecommerce sites (mobile). - pct_ampusage_bydevice_vendor.sql
* Web Push Notification acceptance rates - webpushstats_ecommsites.sql
* Top "JavaScript frameworks" - top_jsframework_providers_by_device.sql
* Top "JavaScript libraries" category - top_jslibs_by_device.sql
* Top CMS technology category - top_cms_by_device.sql
* Top “Page Builders” technology category - top_pagebuilders_bydevice.sql
* Top “A/B testing” technology category. - top_abtesting_bydevice.sql
* Top “Personalisation” technology category - top_personalisation_bydevice.sql
* Top “Loyalty & Rewards” technology category - top_loyaltyandrewards_bydevice.sql
* Median lighthouse scores for ecommerce - median_lighthouse_score_ecommsites.sql
* Ecommerce sites using hreflang value through headers - percent_of_ecommsites_using_hreflang_value_headers.sql
* Ecommerce sites using hreflang value through link rel - percent_of_ecommsites_using_hreflang_value_link.sql
* Presence of `Content-Security-Policy` and `Content-Security-Policy-Report-Only` on `Ecommerce` sites - percent_of_ecommsites_csp.sql
* App links association - android_ios_app_links_ecomm_sites.sql
* Ecomm covid growth: 2020-2021 - ecomm_covid_growth.sql
* A11y usage on Ecomm sites - percent_of_ecommsites_using_a11y_solutions.sql
* Webpush adoption stats - webpush_adoption_by_ecommsites.sql

Unused queries

* percent_of_ecommsites_using_each_a11y_solutions.sql
* percent_of_ecommsites_using_each_cmp.sql
* percent_of_ecommsites_using_each_payment_processors.sql
* top_adplatform_bydevice_vendor.sql
* top_adplatform_bydevice_vendor_wapp.sql
* top_analytics_bydevice_vendor.sql
* top_cdn_bydevice.sql
* top_cdn_bydevice_vendor_cdn.sql
* top_cdn_bydevice_vendor_wapp.sql
* pct_3pusage_bydevice_vendor.sql
* pct_3pusage_bydevice_vendor_category.sql
* pagestats_image_dimensions_bydevice.sql
38 changes: 38 additions & 0 deletions sql/2021/ecommerce/android_ios_app_links_ecomm_sites.sql
@@ -0,0 +1,38 @@
#standardSQL
# This query uses custom metric '_well-known' - https://github.com/HTTPArchive/legacy.httparchive.org/blob/master/custom_metrics/well-known.js
# Note: this query has a subtle bug. A site can serve an empty /.well-known/assetlinks.json or /.well-known/apple-app-site-association file, which leads to overcounting sites with native app links.
# For example, https://www.allbirds.com/.well-known/assetlinks.json has a payload of "[]".
# Fixing this would require parsing the response body in well-known.js.

SELECT
client,
COUNTIF(android_app_links) AS android_app_links,
COUNTIF(ios_universal_links) AS ios_universal_links,
COUNT(0) AS total,
COUNTIF(android_app_links) / COUNT(0) AS pct_android_app_links,
COUNTIF(ios_universal_links) / COUNT(0) AS pct_ios_universal_links
FROM (
SELECT DISTINCT
_TABLE_SUFFIX AS client,
url
FROM
`httparchive.technologies.2021_07_01_*`
WHERE
category = 'Ecommerce' AND
(app != 'Cart Functionality' AND
app != 'Google Analytics Enhanced eCommerce'))
JOIN (
SELECT
_TABLE_SUFFIX AS client,
url,
JSON_VALUE(JSON_EXTRACT_SCALAR(payload, '$._well-known'), '$."/.well-known/assetlinks.json".found') = 'true' AS android_app_links,
JSON_VALUE(JSON_EXTRACT_SCALAR(payload, '$._well-known'), '$."/.well-known/apple-app-site-association".found') = 'true' AS ios_universal_links
FROM
`httparchive.pages.2021_07_01_*`)
USING
(client, url)
GROUP BY
client
ORDER BY
client
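
The comments at the top of this query note that an empty `/.well-known/` file is still counted as an app link, and that a fix would need response body parsing. A minimal Python sketch of such a check follows. The helper name `has_app_links` and the assumption that the raw response body is available as text are hypothetical, introduced only for illustration; this is not how well-known.js currently works.

```python
import json


def has_app_links(body: str) -> bool:
    """Return True only if a /.well-known/ body parses to non-empty JSON.

    Hypothetical helper: assumes the raw response body is available as text.
    """
    try:
        parsed = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON at all, e.g. an HTML error page served with status 200.
        return False
    # An empty list or object (such as "[]") declares no app associations.
    return isinstance(parsed, (list, dict)) and len(parsed) > 0
```

With a check along these lines, a payload of `[]` would no longer count towards `android_app_links`, while a file with at least one entry still would.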

@@ -0,0 +1,41 @@
#standardSQL
# Core Web Vitals distribution by Ecommerce vendor
#
# Note that this is an unweighted average of all sites per Ecommerce vendor.
# Performance of sites with millions of visitors is weighted the same as small sites.
SELECT
client,
ecomm,
COUNT(DISTINCT origin) AS origins,
SUM(fast_lcp) / (SUM(fast_lcp) + SUM(avg_lcp) + SUM(slow_lcp)) AS good_lcp,
SUM(avg_lcp) / (SUM(fast_lcp) + SUM(avg_lcp) + SUM(slow_lcp)) AS ni_lcp,
SUM(slow_lcp) / (SUM(fast_lcp) + SUM(avg_lcp) + SUM(slow_lcp)) AS poor_lcp,
SUM(fast_fid) / (SUM(fast_fid) + SUM(avg_fid) + SUM(slow_fid)) AS good_fid,
SUM(avg_fid) / (SUM(fast_fid) + SUM(avg_fid) + SUM(slow_fid)) AS ni_fid,
SUM(slow_fid) / (SUM(fast_fid) + SUM(avg_fid) + SUM(slow_fid)) AS poor_fid,
SUM(small_cls) / (SUM(small_cls) + SUM(medium_cls) + SUM(large_cls)) AS good_cls,
SUM(medium_cls) / (SUM(small_cls) + SUM(medium_cls) + SUM(large_cls)) AS ni_cls,
SUM(large_cls) / (SUM(small_cls) + SUM(medium_cls) + SUM(large_cls)) AS poor_cls
FROM
`chrome-ux-report.materialized.device_summary`
JOIN (
SELECT DISTINCT
_TABLE_SUFFIX AS client,
url,
app AS ecomm
FROM
`httparchive.technologies.2021_07_01_*`
WHERE
category = 'Ecommerce' AND
(app != 'Cart Functionality' AND
app != 'Google Analytics Enhanced eCommerce'))
ON
CONCAT(origin, '/') = url AND
IF(device = 'desktop', 'desktop', 'mobile') = client
WHERE
date = '2021-07-01'
GROUP BY
client,
ecomm
ORDER BY
origins DESC
@@ -0,0 +1,63 @@
#standardSQL
# CrUX Core Web Vitals performance of Ecommerce vendors by device
CREATE TEMP FUNCTION IS_GOOD (good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS (
good / (good + needs_improvement + poor) >= 0.75
);

CREATE TEMP FUNCTION IS_NON_ZERO (good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS (
good + needs_improvement + poor > 0
);


SELECT
client,
ecomm,
COUNT(DISTINCT origin) AS origins,
# Origins with good LCP divided by origins with any LCP.
SAFE_DIVIDE(
COUNT(DISTINCT IF(IS_GOOD(fast_lcp, avg_lcp, slow_lcp), origin, NULL)),
COUNT(DISTINCT IF(IS_NON_ZERO(fast_lcp, avg_lcp, slow_lcp), origin, NULL))) AS pct_good_lcp,

# Origins with good FID divided by origins with any FID.
SAFE_DIVIDE(
COUNT(DISTINCT IF(IS_GOOD(fast_fid, avg_fid, slow_fid), origin, NULL)),
COUNT(DISTINCT IF(IS_NON_ZERO(fast_fid, avg_fid, slow_fid), origin, NULL))) AS pct_good_fid,

# Origins with good CLS divided by origins with any CLS.
SAFE_DIVIDE(
COUNT(DISTINCT IF(IS_GOOD(small_cls, medium_cls, large_cls), origin, NULL)),
COUNT(DISTINCT IF(IS_NON_ZERO(small_cls, medium_cls, large_cls), origin, NULL))) AS pct_good_cls,

# Origins with good LCP, FID (when FID data exists), and CLS divided by origins with any LCP and CLS.
SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_GOOD(fast_lcp, avg_lcp, slow_lcp) AND
(NOT IS_NON_ZERO(fast_fid, avg_fid, slow_fid) OR IS_GOOD(fast_fid, avg_fid, slow_fid)) AND
IS_GOOD(small_cls, medium_cls, large_cls), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(fast_lcp, avg_lcp, slow_lcp) AND
IS_NON_ZERO(small_cls, medium_cls, large_cls), origin, NULL))) AS pct_good_cwv
FROM
`chrome-ux-report.materialized.device_summary`
JOIN (
SELECT
_TABLE_SUFFIX AS client,
url,
app AS ecomm
FROM
`httparchive.technologies.2021_07_01_*`
WHERE
category = 'Ecommerce' AND
(app != 'Cart Functionality' AND
app != 'Google Analytics Enhanced eCommerce'))
ON
CONCAT(origin, '/') = url AND
IF(device = 'desktop', 'desktop', 'mobile') = client
WHERE
date = '2021-07-01'
GROUP BY
client,
ecomm
ORDER BY
origins DESC
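
The pass/fail rule in this query can be hard to follow inside the nested `COUNT(DISTINCT IF(...))` calls, so here is a rough Python sketch of the intended per-origin logic. It assumes each metric arrives as a `(good, needs_improvement, poor)` tuple and treats `None` the same as "no data". That is an assumption made for illustration: BigQuery applies SQL three-valued logic to `NULL`, which this sketch does not model. The function `passes_cwv` is hypothetical.

```python
GOOD_THRESHOLD = 0.75  # same cutoff as IS_GOOD in the query


def is_non_zero(good, needs_improvement, poor) -> bool:
    """True when the metric has any data; None is treated as no data (assumption)."""
    if good is None or needs_improvement is None or poor is None:
        return False
    return good + needs_improvement + poor > 0


def is_good(good, needs_improvement, poor) -> bool:
    """True when at least 75% of experiences fall in the 'good' bucket."""
    if not is_non_zero(good, needs_improvement, poor):
        return False
    return good / (good + needs_improvement + poor) >= GOOD_THRESHOLD


def passes_cwv(lcp, fid, cls) -> bool:
    """An origin passes with good LCP and CLS, and good FID whenever FID data exists."""
    fid_ok = (not is_non_zero(*fid)) or is_good(*fid)
    return is_good(*lcp) and fid_ok and is_good(*cls)
```

Under these assumptions, an origin with no FID data is judged on LCP and CLS alone, which matches the denominator of `pct_good_cwv` requiring only LCP and CLS data.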

60 changes: 60 additions & 0 deletions sql/2021/ecommerce/ecomm_covid_growth.sql
@@ -0,0 +1,60 @@
#standardSQL
# 13_03: Timeseries to show eCommerce growth acceleration due to Covid-19
# Excluding apps which are not eCommerce platforms/vendors themselves but are used to identify eCommerce sites. These are signals added in Wappalyzer in 2020 to get better idea on % of eCommerce sites but these are not relevant for vendor % market share analysis
SELECT
IF(ENDS_WITH(_TABLE_SUFFIX, '_desktop'), 'desktop', 'mobile') AS client,
COUNT(DISTINCT url) AS freq,
total,
COUNT(DISTINCT url) / total AS pct,
2021 AS year,
LEFT(_TABLE_SUFFIX, 2) AS month
FROM
`httparchive.technologies.2021_*`
JOIN
(SELECT
_TABLE_SUFFIX,
COUNT(DISTINCT url) AS total
FROM
`httparchive.summary_pages.2021_*`
GROUP BY
_TABLE_SUFFIX)
USING (_TABLE_SUFFIX)
WHERE
category = 'Ecommerce'
GROUP BY
client,
year,
month,
total

UNION ALL

SELECT
IF(ENDS_WITH(_TABLE_SUFFIX, '_desktop'), 'desktop', 'mobile') AS client,
COUNT(DISTINCT url) AS freq,
total,
COUNT(DISTINCT url) / total AS pct,
2020 AS year,
LEFT(_TABLE_SUFFIX, 2) AS month
FROM
`httparchive.technologies.2020_*`
JOIN
(SELECT
_TABLE_SUFFIX,
COUNT(DISTINCT url) AS total
FROM
`httparchive.summary_pages.2020_*`
GROUP BY
_TABLE_SUFFIX)
USING (_TABLE_SUFFIX)
WHERE
category = 'Ecommerce'
GROUP BY
client,
year,
month,
total

ORDER BY
year DESC,
month DESC
38 changes: 38 additions & 0 deletions sql/2021/ecommerce/ecomm_vendors_covid_growth.sql
@@ -0,0 +1,38 @@
#standardSQL
# 13_04: Timeseries to show eCommerce vendors growth acceleration due to Covid-19
# Excludes apps that are not eCommerce platforms/vendors themselves but are used to identify eCommerce sites. Wappalyzer added these signals in 2020 to better estimate the % of eCommerce sites, but they are not relevant to vendor market share analysis.
# Limited to the top 5000 records so the analysis can continue in Google Sheets. Filtering with a HAVING clause on 'pct' would drop data for certain months.
SELECT
IF(ENDS_WITH(_TABLE_SUFFIX, '_desktop'), 'desktop', 'mobile') AS client,
app,
COUNT(DISTINCT url) AS freq,
total,
COUNT(DISTINCT url) / total AS pct,
LEFT(_TABLE_SUFFIX, 4) AS year,
SUBSTR(_TABLE_SUFFIX, 6, 2) AS month
FROM
`httparchive.technologies.*`
JOIN
(SELECT
_TABLE_SUFFIX,
COUNT(DISTINCT url) AS total
FROM
`httparchive.summary_pages.*`
GROUP BY
_TABLE_SUFFIX)
USING (_TABLE_SUFFIX)
WHERE
category = 'Ecommerce' AND
(app != 'Cart Functionality' AND
app != 'Google Analytics Enhanced eCommerce')
GROUP BY
client,
app,
year,
month,
total
ORDER BY
pct DESC,
client DESC,
app DESC
LIMIT 5000
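
This query derives `year`, `month`, and `client` from `_TABLE_SUFFIX` by position. As a sketch of what those string operations do, the Python below applies the same slicing to a suffix such as `2021_07_01_mobile`. The function name `parse_table_suffix` is hypothetical, and the suffix layout is assumed from the table names used elsewhere in this commit.

```python
def parse_table_suffix(suffix: str) -> dict:
    """Split a suffix like '2021_07_01_mobile' into year, month and client."""
    return {
        # LEFT(_TABLE_SUFFIX, 4)
        "year": suffix[:4],
        # SUBSTR(_TABLE_SUFFIX, 6, 2); SQL positions are 1-indexed
        "month": suffix[5:7],
        # IF(ENDS_WITH(_TABLE_SUFFIX, '_desktop'), 'desktop', 'mobile')
        "client": "desktop" if suffix.endswith("_desktop") else "mobile",
    }
```

Note that any suffix not ending in `_desktop` is labelled `mobile`, the same fallback the query uses.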
24 changes: 24 additions & 0 deletions sql/2021/ecommerce/median_lighthouse_score_ecommsites.sql
@@ -0,0 +1,24 @@
#standardSQL
# 13_20: Lighthouse category scores per eCommerce platform. The Web Almanac runs Lighthouse only in mobile mode, hence the references to mobile tables.
SELECT
app AS ecommVendor,
COUNT(DISTINCT url) AS freq,
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.performance.score') AS NUMERIC), 1000)[OFFSET(500)] AS median_performance,
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.accessibility.score') AS NUMERIC), 1000)[OFFSET(500)] AS median_accessibility,
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.pwa.score') AS NUMERIC), 1000)[OFFSET(500)] AS median_pwa,
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.seo.score') AS NUMERIC), 1000)[OFFSET(500)] AS median_seo,
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.best-practices.score') AS NUMERIC), 1000)[OFFSET(500)] AS median_best_practices
FROM
`httparchive.lighthouse.2021_07_01_mobile`
JOIN
`httparchive.technologies.2021_07_01_mobile`
USING
(url)
WHERE
category = 'Ecommerce' AND
(app != 'Cart Functionality' AND
app != 'Google Analytics Enhanced eCommerce')
GROUP BY
ecommVendor
ORDER BY
freq DESC
22 changes: 22 additions & 0 deletions sql/2021/ecommerce/pagestats_html_bydevice.sql
@@ -0,0 +1,22 @@
#standardSQL
# 13_07: Distribution of HTML kilobytes per page
SELECT
_TABLE_SUFFIX AS client,
percentile,
APPROX_QUANTILES(bytesHtml, 1000)[OFFSET(percentile * 10)] / 1024 AS html_kbytes
FROM
`httparchive.summary_pages.2021_07_01_*`
JOIN
`httparchive.technologies.2021_07_01_*`
USING (_TABLE_SUFFIX, url),
UNNEST([10, 25, 50, 75, 90, 100]) AS percentile
WHERE
category = 'Ecommerce' AND
app != 'Cart Functionality' AND
app != 'Google Analytics Enhanced eCommerce'
GROUP BY
percentile,
client
ORDER BY
percentile,
client
25 changes: 25 additions & 0 deletions sql/2021/ecommerce/pagestats_image_bydevice.sql
@@ -0,0 +1,25 @@
#standardSQL
# 13_06: Distribution of image stats for 2021
SELECT
percentile,
_TABLE_SUFFIX AS client,
APPROX_QUANTILES(reqImg, 1000)[OFFSET(percentile * 10)] AS image_count,
APPROX_QUANTILES(bytesImg, 1000)[OFFSET(percentile * 10)] / 1024 AS image_kbytes
FROM
`httparchive.summary_pages.2021_07_01_*`
JOIN (
SELECT DISTINCT
_TABLE_SUFFIX,
url
FROM `httparchive.technologies.2021_07_01_*`
WHERE category = 'Ecommerce' AND
(app != 'Cart Functionality' AND
app != 'Google Analytics Enhanced eCommerce'))
USING (_TABLE_SUFFIX, url),
UNNEST([10, 25, 50, 75, 90, 100]) AS percentile
GROUP BY
percentile,
client
ORDER BY
percentile,
client
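
The percentile queries in this commit index into 1,000 quantile buckets with `[OFFSET(percentile * 10)]`. The Python sketch below shows the same indexing idea using exact quantiles from the standard library. It is only an analogy: `APPROX_QUANTILES` returns approximate boundaries, and the helper names here are hypothetical.

```python
import statistics


def quantile_boundaries(values, buckets=1000):
    """Return buckets + 1 boundaries: minimum, interior cut points, maximum."""
    ordered = sorted(values)
    cuts = statistics.quantiles(ordered, n=buckets, method="inclusive")
    return [ordered[0], *cuts, ordered[-1]]


def percentile_value(values, percentile):
    """Pick a whole-number percentile, like [OFFSET(percentile * 10)] on 1000 buckets."""
    return quantile_boundaries(values)[percentile * 10]
```

Because there are 1,001 boundaries, offset 500 is the median and offset 1000 is the maximum, which is why the queries can request the 100th percentile.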