Rationalize exported crawlSpecs function behavior #1812

tidoust · 2025-04-14T09:27:41Z

The exported `crawlSpecs` function has always behaved differently depending on whether the first parameter it receives is an array of specs or an object that sets crawl options. Some of these differences are historical artefacts more than anything else, do not make any sense, and could bite ;) (for example, in browser-specs, I cannot access the markdown summaries that were added by the previous release of Reffy, because they are only reported to the console...)

This makes a minimal set of updates to rationalize the behavior of the function (which maps internally to `crawlList` and `crawlSpecs`). In particular:

- `crawlList` now accepts spec shortnames, series shortnames or URLs as input, in short the same format as `crawlSpecs`.
- `crawlList` now also processes the `summary` option to add markdown crawl summaries to each spec, as done by `crawlSpecs`.
- `crawlSpecs` now accepts a specific `{return}` value for the `output` option that makes it return the index of crawl results to the caller without outputting anything to the console or to files (previously, the function could only report to the console or to files, and there was no way for a caller to access crawl results). If post-processing modules need to run at the `crawl` level, their results are now reported in a `post` property.

Differences that remain are:

1. Given an array as first parameter, the function returns an array of results (and not an index of crawl results). That seems somewhat logical. Array in, array out. Object in, object out.
2. Given an array as first parameter, the function does not run post-processing modules that run at the `crawl` level. That's also somewhat logical as there would be no way to report the results in the returned array.
3. Given an array as first parameter, the function does not output anything to the console or to files. That remains imperfect, but I can live with it for now. The `{return}` value (remains clunky but) goes in the other direction and forces the function to return an index of crawl results so that a caller may process them further.
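
To make the resulting behavior concrete, here is a rough sketch of the dispatch described above. It is not Reffy's actual code: `crawlSketch` and `fakeCrawlSpec` are hypothetical stand-ins, the sketch is synchronous for brevity, and the `specs` and `post` option names and the shape of the returned index (`results`, `post`) are assumptions made for illustration. Only the `output: '{return}'` value and the array/object distinction come from the description.

```javascript
// Hypothetical stand-in for crawling one spec (not Reffy's implementation).
function fakeCrawlSpec(spec) {
  return { spec, crawled: true };
}

// Hypothetical dispatcher mirroring the behavior described in this PR.
function crawlSketch(input) {
  if (Array.isArray(input)) {
    // Array in, array out: no crawl-level post-processing,
    // nothing written to the console or to files.
    return input.map(fakeCrawlSpec);
  }

  // Object in, object out: build an index of crawl results.
  // The `specs` option name and `results` property are assumptions.
  const index = { results: (input.specs ?? []).map(fakeCrawlSpec) };

  if (Array.isArray(input.post) && input.post.length > 0) {
    // Results of crawl-level post-processing go in a `post` property
    // (the shape of each entry here is an assumption).
    index.post = input.post.map(mod => ({ mod, result: null }));
  }

  if (input.output === '{return}') {
    // Hand the index back to the caller, report nothing.
    return index;
  }

  // Any other `output` value: report instead of returning
  // (this sketch only prints to the console).
  console.log(JSON.stringify(index, null, 2));
}
```

With this sketch, `crawlSketch(['fetch', 'dom'])` gives back a two-item array, whereas `crawlSketch({ specs: ['fetch'], output: '{return}' })` gives back an object whose `results` array holds one entry.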

Changes should be non-breaking.


tidoust · 2025-04-14T09:33:49Z

Taking the liberty to merge to progress inclusion in browser-specs :)

New features:
- Markdown report: improve dfns, fix links, sort summary (#1813)
- Rationalize exported crawlSpecs function behavior (#1812)

Dependencies bumped:
- Bump rollup from 4.39.0 to 4.40.0 (#1814)

tidoust merged commit cc5ff45 into main Apr 14, 2025
1 check passed

tidoust deleted the rationalize-crawl-fn branch April 14, 2025 09:33
