Ecommerce 2021 queries #2300

Merged 40 commits on Oct 15, 2021

Commits
f43e1e3
Recycle 2020 ecommerce queries
rrajiv Aug 8, 2021
ac12fcf
Merge branch 'main' into ecommerce-sql-2021
rrajiv Aug 16, 2021
a978e46
Fix linting errors
tunetheweb Aug 16, 2021
b11147e
Merge branch 'main' into ecommerce-sql-2021
tunetheweb Aug 16, 2021
e1d6c17
domains that are marked as payment processors but not ecommerce
rrajiv Aug 17, 2021
a2171ad
update all the new 2021 files to use 2021_07_01 or 2020_08_01 or 2019…
rrajiv Aug 17, 2021
7caf4ca
Merge branch 'main' into ecommerce-sql-2021
rrajiv Aug 29, 2021
237fa68
fixing the queries after running the updated linter
rrajiv Aug 29, 2021
9aae852
update the readme
rrajiv Aug 30, 2021
cf71929
Merge branch 'main' into ecommerce-sql-2021
rrajiv Sep 7, 2021
9a0d08d
add some new queries; update the readme
rrajiv Sep 22, 2021
f100633
Update all_categories.sql
rrajiv Sep 22, 2021
a32c2b0
Merge branch 'main' into ecommerce-sql-2021
rrajiv Sep 24, 2021
3f23036
add hreflang queries
rrajiv Sep 26, 2021
84c8be1
rename file
rrajiv Sep 27, 2021
48c2797
update the README
rrajiv Sep 27, 2021
aa198b8
add ecomm + csp query
rrajiv Sep 27, 2021
d9a0f73
add aug and sep versions of cmp query
rrajiv Oct 4, 2021
f2442d4
add app links query update
rrajiv Oct 4, 2021
5b44b8e
updated readme
rrajiv Oct 4, 2021
f9f3c16
Merge branch 'main' into ecommerce-sql-2021
rrajiv Oct 4, 2021
f16d750
Merge branch 'main' into ecommerce-sql-2021
rrajiv Oct 4, 2021
818aa99
update readme
rrajiv Oct 5, 2021
9f68f4a
update the top vendors file to use ranks
rrajiv Oct 5, 2021
e54c836
Merge branch 'main' into ecommerce-sql-2021
rrajiv Oct 5, 2021
881f25b
fixing the query to use well-known
rrajiv Oct 5, 2021
f268b36
Update README.md
rrajiv Oct 5, 2021
06c1ee7
remove unused/informational queries
rrajiv Oct 5, 2021
391a6a5
fix this to use production tables instead of sample data
rrajiv Oct 5, 2021
fd7a699
fix to not use the ROUND function
rrajiv Oct 5, 2021
a32dad8
update README
rrajiv Oct 5, 2021
a87cf62
fix to stop using ROUND
rrajiv Oct 5, 2021
2784e9c
fixing this query to use UNNEST and remove some redundancy
rrajiv Oct 5, 2021
06aaeb6
fix linter issue
rrajiv Oct 6, 2021
3ff953c
Merge branch 'main' into ecommerce-sql-2021
rrajiv Oct 6, 2021
6a42c00
fix query and fix linting again
rrajiv Oct 6, 2021
6549d77
Merge branch 'main' into ecommerce-sql-2021
rrajiv Oct 11, 2021
06ea010
fix the ecomm category matches
rrajiv Oct 12, 2021
03eef8c
Update sql/2021/ecommerce/core_web_vitals_passingmetrics_byvendor_byd…
rrajiv Oct 14, 2021
ee21e87
Merge branch 'main' into ecommerce-sql-2021
rrajiv Oct 15, 2021
58 changes: 58 additions & 0 deletions sql/2021/ecommerce/README.md
@@ -8,3 +8,61 @@

Analysts: if helpful, you can use this README to give additional info about the queries.
-->

Current list of queries used

* Ecommerce comparison 2020 to 2021. - pct_ecommsites_bydevice_compare20202021.sql
* Top ecommerce platforms. - top_vendors.sql, top_vendors_crux_rank.sql
* Enterprise ecommerce platforms (desktop) - top_vendors.sql
* Enterprise ecommerce platforms - 2019 desktop
* Enterprise ecommerce platforms - 2020 desktop
* Ecommerce platform growth Covid-19 impact - ecomm_vendors_covid_growth.sql
* Page requests distribution. - pagestats_percentiles_bydevice.sql
* Page weight distribution. - pagestats_percentiles_bydevice.sql
* Median page requests by type. - pagestats_percentile_bydevice_format.sql
* Median page kilobytes by type. - pagestats_percentile_bydevice_format.sql
* Distribution of HTML bytes per ecommerce page - pagestats_html_bydevice.sql
* Distribution of image requests for ecommerce - pagestats_image_bydevice.sql
* Distribution of image bytes for ecommerce - pagestats_percentiles_bydevice.sql
* Popular image formats on ecommerce sites - pagestats_image_bydevice_format.sql
* Distribution of third-party requests - pct_3pusage_bydevice.sql
* Distribution of third-party bytes - pct_3pusage_bydevice.sql
* Real-user Largest Contentful Paint experiences - core_web_vitals_distribution_byvendor_bydevice.sql
* Real-user First Input Delay experiences - core_web_vitals_distribution_byvendor_bydevice.sql
* Real-user Cumulative Layout Shift experiences - core_web_vitals_distribution_byvendor_bydevice.sql
* Real-user Core Web Vitals experiences - core_web_vitals_passingmetrics_byvendor_bydevice.sql
* Top analytics solutions on ecommerce sites - top_analytics_providers_bydevice_wapp.sql
* Tag manager usage on ecommerce sites. - percent_of_ecommsites_using_each_tag_managers.sql
* Consent Management Platform adoption - percent_of_ecommsites_using_cmp.sql, percent_of_ecommsites_using_cmp_aug21.sql, percent_of_ecommsites_using_cmp_sep21.sql
* AMP usage on ecommerce sites (mobile). - pct_ampusage_bydevice_vendor.sql
* Web Push Notification acceptance rates - webpushstats_ecommsites.sql
* Top "JavaScript frameworks" - top_jsframework_providers_by_device.sql
* Top "JavaScript libraries" category - top_jslibs_by_device.sql
* Top CMS technology category - top_cms_by_device.sql
* Top "Page Builders" technology category - top_pagebuilders_bydevice.sql
* Top "A/B testing" technology category. - top_abtesting_bydevice.sql
* Top "Personalisation" technology category - top_personalisation_bydevice.sql
* Top "Loyalty & Rewards" technology category - top_loyaltyandrewards_bydevice.sql
* Median Lighthouse scores for ecommerce - median_lighthouse_score_ecommsites.sql
* Ecommerce sites using hreflang value through headers - percent_of_ecommsites_using_hreflang_value_headers.sql
* Ecommerce sites using hreflang value through link rel - percent_of_ecommsites_using_hreflang_value_link.sql
* Presence of `Content-Security-Policy` and `Content-Security-Policy-Report-Only` on `Ecommerce` sites - percent_of_ecommsites_csp.sql
* App links association - android_ios_app_links_ecomm_sites.sql
* Ecomm covid growth: 2020-2021 - ecomm_covid_growth.sql
* A11y usage on Ecomm sites - percent_of_ecommsites_using_a11y_solutions.sql
* Webpush adoption stats - webpush_adoption_by_ecommsites.sql

Unused queries

* percent_of_ecommsites_using_each_a11y_solutions.sql
* percent_of_ecommsites_using_each_cmp.sql
* percent_of_ecommsites_using_each_payment_processors.sql
* top_adplatform_bydevice_vendor.sql
* top_adplatform_bydevice_vendor_wapp.sql
* top_analytics_bydevice_vendor.sql
* top_cdn_bydevice.sql
* top_cdn_bydevice_vendor_cdn.sql
* top_cdn_bydevice_vendor_wapp.sql
* pct_3pusage_bydevice_vendor.sql
* pct_3pusage_bydevice_vendor_category.sql
* pagestats_image_dimensions_bydevice.sql
38 changes: 38 additions & 0 deletions sql/2021/ecommerce/android_ios_app_links_ecomm_sites.sql
@@ -0,0 +1,38 @@
#standardSQL
# This query uses custom metric '_well-known' - https://github.com/HTTPArchive/legacy.httparchive.org/blob/master/custom_metrics/well-known.js
# Note: this query has a subtle bug. A site can serve an empty /.well-known/assetlinks.json or /.well-known/apple-app-site-association file, which leads to overcounting sites with native app links.
# Example: https://www.allbirds.com/.well-known/assetlinks.json has a payload of "[]".
# Fixing this would require response body parsing in well-known.js.

SELECT
client,
COUNTIF(android_app_links) AS android_app_links,
COUNTIF(ios_universal_links) AS ios_universal_links,
COUNT(0) AS total,
COUNTIF(android_app_links) / COUNT(0) AS pct_android_app_links,
COUNTIF(ios_universal_links) / COUNT(0) AS pct_ios_universal_links
FROM (
SELECT DISTINCT
_TABLE_SUFFIX AS client,
url
FROM
`httparchive.technologies.2021_07_01_*`
WHERE
category = 'Ecommerce' AND
(app != 'Cart Functionality' AND
app != 'Google Analytics Enhanced eCommerce'))
JOIN (
SELECT
_TABLE_SUFFIX AS client,
url,
JSON_VALUE(JSON_EXTRACT_SCALAR(payload, '$._well-known'), '$."/.well-known/assetlinks.json".found') = 'true' AS android_app_links,
JSON_VALUE(JSON_EXTRACT_SCALAR(payload, '$._well-known'), '$."/.well-known/apple-app-site-association".found') = 'true' AS ios_universal_links
FROM
`httparchive.pages.2021_07_01_*`)
USING
(client, url)
GROUP BY
client
ORDER BY
client
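
The overcounting caveat in the query header can be illustrated outside BigQuery. The sketch below is not part of this PR and does not describe what well-known.js does today; it only shows, in Python, the kind of response-body check the comment says would be needed. The function name `has_app_links` and the idea that the body is available as a string are assumptions made for illustration.

```python
import json


def has_app_links(found: bool, body: str) -> bool:
    """Return True only if the file was found AND its JSON body is non-empty.

    `found` mirrors the boolean the query reads from the custom metric;
    `body` is a hypothetical response body the metric would need to capture.
    """
    if not found:
        return False
    try:
        parsed = json.loads(body)
    except (json.JSONDecodeError, TypeError):
        # Unparseable or missing body: treat as no usable association.
        return False
    # Empty containers such as [] or {} carry no app association.
    return bool(parsed)
```

Under this check, a payload of `[]` like the allbirds.com example would no longer count as an Android app link.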

41 changes: 41 additions & 0 deletions sql/2021/ecommerce/core_web_vitals_distribution_byvendor_bydevice.sql
@@ -0,0 +1,41 @@
#standardSQL
# Core Web Vitals distribution by Ecommerce vendor
#
# Note that this is an unweighted average of all sites per Ecommerce vendor.
# Performance of sites with millions of visitors is weighted the same as small sites.
SELECT
client,
ecomm,
COUNT(DISTINCT origin) AS origins,
SUM(fast_lcp) / (SUM(fast_lcp) + SUM(avg_lcp) + SUM(slow_lcp)) AS good_lcp,
SUM(avg_lcp) / (SUM(fast_lcp) + SUM(avg_lcp) + SUM(slow_lcp)) AS ni_lcp,
SUM(slow_lcp) / (SUM(fast_lcp) + SUM(avg_lcp) + SUM(slow_lcp)) AS poor_lcp,
SUM(fast_fid) / (SUM(fast_fid) + SUM(avg_fid) + SUM(slow_fid)) AS good_fid,
SUM(avg_fid) / (SUM(fast_fid) + SUM(avg_fid) + SUM(slow_fid)) AS ni_fid,
SUM(slow_fid) / (SUM(fast_fid) + SUM(avg_fid) + SUM(slow_fid)) AS poor_fid,
SUM(small_cls) / (SUM(small_cls) + SUM(medium_cls) + SUM(large_cls)) AS good_cls,
SUM(medium_cls) / (SUM(small_cls) + SUM(medium_cls) + SUM(large_cls)) AS ni_cls,
SUM(large_cls) / (SUM(small_cls) + SUM(medium_cls) + SUM(large_cls)) AS poor_cls
FROM
`chrome-ux-report.materialized.device_summary`
JOIN (
SELECT DISTINCT
_TABLE_SUFFIX AS client,
url,
app AS ecomm
FROM
`httparchive.technologies.2021_07_01_*`
WHERE
category = 'Ecommerce' AND
(app != 'Cart Functionality' AND
app != 'Google Analytics Enhanced eCommerce'))
ON
CONCAT(origin, '/') = url AND
IF(device = 'desktop', 'desktop', 'mobile') = client
WHERE
date = '2021-07-01'
GROUP BY
client,
ecomm
ORDER BY
origins DESC
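
To make the "unweighted" note in the query header concrete, here is a minimal Python sketch of the same `SUM(x) / (SUM(fast) + SUM(avg) + SUM(slow))` pooling. It is an illustration only: the function name `pooled_share` and the sample densities are hypothetical, and it handles only the `fast_`/`avg_`/`slow_` column naming (LCP and FID), not the `small_`/`medium_`/`large_` CLS columns.

```python
from typing import Dict, List


def pooled_share(origins: List[Dict[str, float]], metric: str = "lcp") -> Dict[str, float]:
    """Pool per-origin densities the way the SUM(...) / SUM(...) expressions do."""
    fast = sum(o[f"fast_{metric}"] for o in origins)
    avg = sum(o[f"avg_{metric}"] for o in origins)
    slow = sum(o[f"slow_{metric}"] for o in origins)
    total = fast + avg + slow
    return {"good": fast / total, "ni": avg / total, "poor": slow / total}


# Two hypothetical origins whose densities each sum to 1. Traffic volume is
# never an input, so a large site and a small site count the same.
sample = [
    {"fast_lcp": 0.9, "avg_lcp": 0.05, "slow_lcp": 0.05},
    {"fast_lcp": 0.3, "avg_lcp": 0.3, "slow_lcp": 0.4},
]
shares = pooled_share(sample)
```

A traffic-weighted variant would need a per-origin visit count, which this query does not use.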
63 changes: 63 additions & 0 deletions sql/2021/ecommerce/core_web_vitals_passingmetrics_byvendor_bydevice.sql
@@ -0,0 +1,63 @@
#standardSQL
# CrUX Core Web Vitals performance of Ecommerce vendors by device
CREATE TEMP FUNCTION IS_GOOD (good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS (
good / (good + needs_improvement + poor) >= 0.75
);

CREATE TEMP FUNCTION IS_NON_ZERO (good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS (
good + needs_improvement + poor > 0
);


SELECT
client,
ecomm,
COUNT(DISTINCT origin) AS origins,
# Origins with good LCP divided by origins with any LCP.
SAFE_DIVIDE(
COUNT(DISTINCT IF(IS_GOOD(fast_lcp, avg_lcp, slow_lcp), origin, NULL)),
COUNT(DISTINCT IF(IS_NON_ZERO(fast_lcp, avg_lcp, slow_lcp), origin, NULL))) AS pct_good_lcp,

# Origins with good FID divided by origins with any FID.
SAFE_DIVIDE(
COUNT(DISTINCT IF(IS_GOOD(fast_fid, avg_fid, slow_fid), origin, NULL)),
COUNT(DISTINCT IF(IS_NON_ZERO(fast_fid, avg_fid, slow_fid), origin, NULL))) AS pct_good_fid,

# Origins with good CLS divided by origins with any CLS.
SAFE_DIVIDE(
COUNT(DISTINCT IF(IS_GOOD(small_cls, medium_cls, large_cls), origin, NULL)),
COUNT(DISTINCT IF(IS_NON_ZERO(small_cls, medium_cls, large_cls), origin, NULL))) AS pct_good_cls,

# Origins with good LCP, FID, and CLS divided by origins with any LCP, FID, and CLS.
SAFE_DIVIDE(
COUNT(DISTINCT IF(
IS_GOOD(fast_lcp, avg_lcp, slow_lcp) AND
(NOT IS_NON_ZERO(fast_fid, avg_fid, slow_fid) OR IS_GOOD(fast_fid, avg_fid, slow_fid)) AND
IS_GOOD(small_cls, medium_cls, large_cls), origin, NULL)),
COUNT(DISTINCT IF(
IS_NON_ZERO(fast_lcp, avg_lcp, slow_lcp) AND
IS_NON_ZERO(small_cls, medium_cls, large_cls), origin, NULL))) AS pct_good_cwv
FROM
`chrome-ux-report.materialized.device_summary`
JOIN (
SELECT
_TABLE_SUFFIX AS client,
url,
app AS ecomm
FROM
`httparchive.technologies.2021_07_01_*`
WHERE
category = 'Ecommerce' AND
(app != 'Cart Functionality' AND
app != 'Google Analytics Enhanced eCommerce'))
ON
CONCAT(origin, '/') = url AND
IF(device = 'desktop', 'desktop', 'mobile') = client
WHERE
date = '2021-07-01'
GROUP BY
client,
ecomm
ORDER BY
origins DESC
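
The pass/fail logic in this query can be sketched in Python as follows. This mirrors the `IS_GOOD` and `IS_NON_ZERO` temp functions and the `pct_good_cwv` numerator condition, where FID is allowed to be absent. One deliberate difference, which is an assumption of the sketch rather than something the SQL does: `is_good` returns `False` when the three densities sum to zero instead of dividing by zero. The name `passes_cwv` is hypothetical.

```python
from typing import Tuple

Densities = Tuple[float, float, float]  # (good, needs_improvement, poor)


def is_good(good: float, needs_improvement: float, poor: float) -> bool:
    """At least 75% of experiences are good (mirrors IS_GOOD)."""
    total = good + needs_improvement + poor
    # Guard added for this sketch only: no data means not good.
    return total > 0 and good / total >= 0.75


def is_non_zero(good: float, needs_improvement: float, poor: float) -> bool:
    """The metric has any data at all (mirrors IS_NON_ZERO)."""
    return good + needs_improvement + poor > 0


def passes_cwv(lcp: Densities, fid: Densities, cls: Densities) -> bool:
    """Good LCP and CLS are required; FID must be good only when it has data."""
    fid_ok = (not is_non_zero(*fid)) or is_good(*fid)
    return is_good(*lcp) and fid_ok and is_good(*cls)
```

For example, an origin with good LCP and CLS but no FID data passes, while the same origin with FID densities of `(2.0, 1.0, 1.0)` does not.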

60 changes: 60 additions & 0 deletions sql/2021/ecommerce/ecomm_covid_growth.sql
@@ -0,0 +1,60 @@
#standardSQL
# 13_03: Timeseries to show eCommerce growth acceleration due to Covid-19
# Excluding apps which are not eCommerce platforms/vendors themselves but are used to identify eCommerce sites. These are signals added in Wappalyzer in 2020 to get better idea on % of eCommerce sites but these are not relevant for vendor % market share analysis
SELECT
IF(ENDS_WITH(_TABLE_SUFFIX, '_desktop'), 'desktop', 'mobile') AS client,
COUNT(DISTINCT url) AS freq,
total,
COUNT(DISTINCT url) / total AS pct,
2021 AS year,
LEFT(_TABLE_SUFFIX, 2) AS month
FROM
`httparchive.technologies.2021_*`
JOIN
(SELECT
_TABLE_SUFFIX,
COUNT(DISTINCT url) AS total
FROM
`httparchive.summary_pages.2021_*`
GROUP BY
_TABLE_SUFFIX)
USING (_TABLE_SUFFIX)
WHERE
category = 'Ecommerce'
GROUP BY
client,
year,
month,
total

UNION ALL

SELECT
IF(ENDS_WITH(_TABLE_SUFFIX, '_desktop'), 'desktop', 'mobile') AS client,
COUNT(DISTINCT url) AS freq,
total,
COUNT(DISTINCT url) / total AS pct,
2020 AS year,
LEFT(_TABLE_SUFFIX, 2) AS month
FROM
`httparchive.technologies.2020_*`
JOIN
(SELECT
_TABLE_SUFFIX,
COUNT(DISTINCT url) AS total
FROM
`httparchive.summary_pages.2020_*`
GROUP BY
_TABLE_SUFFIX)
USING (_TABLE_SUFFIX)
WHERE
category = 'Ecommerce'
GROUP BY
client,
year,
month,
total

ORDER BY
year DESC,
month DESC
38 changes: 38 additions & 0 deletions sql/2021/ecommerce/ecomm_vendors_covid_growth.sql
@@ -0,0 +1,38 @@
#standardSQL
# 13_04: Timeseries to show eCommerce vendors growth acceleration due to Covid-19
# Excluding apps which are not eCommerce platforms/vendors themselves but are used to identify eCommerce sites. These are signals added in Wappalyzer in 2020 to get better idea on % of eCommerce sites but these are not relevant for vendor % market share analysis
# Limiting to top 5000 records to continue further analysis in Google Sheets. Using HAVING clauses based on 'pct' results in missing data for certain months
SELECT
IF(ENDS_WITH(_TABLE_SUFFIX, '_desktop'), 'desktop', 'mobile') AS client,
app,
COUNT(DISTINCT url) AS freq,
total,
COUNT(DISTINCT url) / total AS pct,
LEFT(_TABLE_SUFFIX, 4) AS year,
SUBSTR(_TABLE_SUFFIX, 6, 2) AS month
FROM
`httparchive.technologies.*`
JOIN
(SELECT
_TABLE_SUFFIX,
COUNT(DISTINCT url) AS total
FROM
`httparchive.summary_pages.*`
GROUP BY
_TABLE_SUFFIX)
USING (_TABLE_SUFFIX)
WHERE
category = 'Ecommerce' AND
(app != 'Cart Functionality' AND
app != 'Google Analytics Enhanced eCommerce')
GROUP BY
client,
app,
year,
month,
total
ORDER BY
pct DESC,
client DESC,
app DESC
LIMIT 5000
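
Because this query reads from the unrestricted `httparchive.technologies.*` wildcard, year, month, and client are all derived from the table suffix. The Python sketch below shows the same slicing on a suffix of the form `2021_07_01_mobile`; the `Crawl` type and `parse_suffix` name are hypothetical helpers, not anything defined in this PR.

```python
from typing import NamedTuple


class Crawl(NamedTuple):
    year: str
    month: str
    client: str


def parse_suffix(table_suffix: str) -> Crawl:
    """Split a suffix such as '2021_07_01_mobile' the way the query does."""
    year = table_suffix[:4]    # LEFT(_TABLE_SUFFIX, 4)
    month = table_suffix[5:7]  # SUBSTR(_TABLE_SUFFIX, 6, 2); SQL positions are 1-indexed
    # IF(ENDS_WITH(_TABLE_SUFFIX, '_desktop'), 'desktop', 'mobile')
    client = "desktop" if table_suffix.endswith("_desktop") else "mobile"
    return Crawl(year, month, client)
```

Note that ecomm_covid_growth.sql uses year-specific wildcards (`2021_*`, `2020_*`), so its suffix starts at the month and `LEFT(_TABLE_SUFFIX, 2)` is the equivalent slice there.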
24 changes: 24 additions & 0 deletions sql/2021/ecommerce/median_lighthouse_score_ecommsites.sql
@@ -0,0 +1,24 @@
#standardSQL
# 13_20: Lighthouse category scores per eCommerce platform. Web Almanac runs Lighthouse only in mobile mode, hence the references to mobile tables
SELECT
app AS ecommVendor,
COUNT(DISTINCT url) AS freq,
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.performance.score') AS NUMERIC), 1000)[OFFSET(500)] AS median_performance,
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.accessibility.score') AS NUMERIC), 1000)[OFFSET(500)] AS median_accessibility,
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.pwa.score') AS NUMERIC), 1000)[OFFSET(500)] AS median_pwa,
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.seo.score') AS NUMERIC), 1000)[OFFSET(500)] AS median_seo,
APPROX_QUANTILES(CAST(JSON_EXTRACT_SCALAR(report, '$.categories.best-practices.score') AS NUMERIC), 1000)[OFFSET(500)] AS median_best_practices
FROM
`httparchive.lighthouse.2021_07_01_mobile`
JOIN
`httparchive.technologies.2021_07_01_mobile`
USING
(url)
WHERE
category = 'Ecommerce' AND
(app != 'Cart Functionality' AND
app != 'Google Analytics Enhanced eCommerce')
GROUP BY
ecommVendor
ORDER BY
freq DESC
22 changes: 22 additions & 0 deletions sql/2021/ecommerce/pagestats_html_bydevice.sql
@@ -0,0 +1,22 @@
#standardSQL
# 13_07: Distribution of HTML kilobytes per page
SELECT
_TABLE_SUFFIX AS client,
percentile,
APPROX_QUANTILES(bytesHtml, 1000)[OFFSET(percentile * 10)] / 1024 AS html_kbytes
FROM
`httparchive.summary_pages.2021_07_01_*`
JOIN
`httparchive.technologies.2021_07_01_*`
USING (_TABLE_SUFFIX, url),
UNNEST([10, 25, 50, 75, 90, 100]) AS percentile
WHERE
category = 'Ecommerce' AND
app != 'Cart Functionality' AND
app != 'Google Analytics Enhanced eCommerce'
GROUP BY
percentile,
client
ORDER BY
percentile,
client
25 changes: 25 additions & 0 deletions sql/2021/ecommerce/pagestats_image_bydevice.sql
@@ -0,0 +1,25 @@
#standardSQL
# 13_06: Distribution of image stats for 2021
SELECT
percentile,
_TABLE_SUFFIX AS client,
APPROX_QUANTILES(reqImg, 1000)[OFFSET(percentile * 10)] AS image_count,
APPROX_QUANTILES(bytesImg, 1000)[OFFSET(percentile * 10)] / 1024 AS image_kbytes
FROM
`httparchive.summary_pages.2021_07_01_*`
JOIN (
SELECT DISTINCT
_TABLE_SUFFIX,
url
FROM `httparchive.technologies.2021_07_01_*`
WHERE category = 'Ecommerce' AND
(app != 'Cart Functionality' AND
app != 'Google Analytics Enhanced eCommerce'))
USING (_TABLE_SUFFIX, url),
UNNEST([10, 25, 50, 75, 90, 100]) AS percentile
GROUP BY
percentile,
client
ORDER BY
percentile,
client
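
The `APPROX_QUANTILES(x, 1000)[OFFSET(percentile * 10)]` idiom used in the pagestats queries works because the function returns 1001 boundaries (minimum through maximum), so percentile p sits at offset p * 10. The Python sketch below reproduces that indexing with an exact nearest-rank calculation. It is a stand-in for illustration: BigQuery's result is approximate and may differ slightly, and the names `quantile_boundaries` and `percentile` are hypothetical.

```python
from typing import List, Sequence


def quantile_boundaries(values: Sequence[float], buckets: int = 1000) -> List[float]:
    """Return buckets + 1 boundaries, from the minimum to the maximum."""
    ordered = sorted(values)
    last = len(ordered) - 1
    # Nearest-rank pick for each boundary (exact, unlike APPROX_QUANTILES).
    return [ordered[round(i * last / buckets)] for i in range(buckets + 1)]


def percentile(values: Sequence[float], pct: int, buckets: int = 1000) -> float:
    """With 1000 buckets, percentile `pct` sits at offset pct * 10."""
    return quantile_boundaries(values, buckets)[pct * buckets // 100]


# Hypothetical data: 1001 pages with 0..1000 image requests each.
image_counts = list(range(1001))
median_images = percentile(image_counts, 50)
```

The sketch assumes a non-empty input; the queries above avoid that case by grouping only over pages that matched the Ecommerce filter.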