
Rationalize exported crawlSpecs function behavior #1812


Merged Apr 14, 2025 (1 commit)


tidoust (Member) commented Apr 14, 2025

The exported `crawlSpecs` function has always behaved differently depending on whether its first parameter is an array of specs or an object that sets crawl options. Some of these differences are historical artefacts that did not make any sense and could bite ;) (for example, in browser-specs, I cannot access the markdown summaries that were added by the previous release of Reffy, because they are only reported to the console...)

This makes a minimal set of updates to rationalize the behavior of the function (which maps internally to `crawlList` and `crawlSpecs`). In particular:

  • `crawlList` now accepts spec shortnames, series shortnames or URLs as input, in short the same format as `crawlSpecs`.
  • `crawlList` now also processes the `summary` option to add markdown crawl summaries to each spec, as done by `crawlSpecs`.
  • `crawlSpecs` now accepts a specific `{return}` value for the `output` option that makes it return the index of crawl results to the caller without outputting anything to the console or to files. Until now, the function could only report to the console or to files, and there was no way for a caller to access crawl results. If post-processing modules need to run at the `crawl` level, their results are now reported in a `post` property.
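
To illustrate the `{return}` value from a caller's point of view, here is a minimal sketch. It does not call Reffy: `crawlSpecsStub` is a hypothetical, synchronous stand-in that only mimics the behavior described above, and the `specs` option name, the `results` array in the index, and the summary text are assumptions made for the example.

```javascript
// Hypothetical stand-in for the object form of the exported function.
// NOT Reffy's code: it only mimics the behavior described in this PR.
function crawlSpecsStub(options) {
  // Assumption: specs are listed under a `specs` option.
  const results = options.specs.map(shortname => ({ shortname }));

  // The `summary` option adds a markdown summary to each spec.
  if (options.summary) {
    for (const spec of results) {
      spec.summary = `Crawl summary for ${spec.shortname}`; // placeholder text
    }
  }

  // Assumption: the index exposes crawl results in a `results` array.
  const index = { results };

  if (options.output === '{return}') {
    // Hand the index back to the caller, print nothing.
    return index;
  }

  // Any other case in this sketch: report to the console only.
  console.log(JSON.stringify(index, null, 2));
  return undefined;
}

// Caller side: summaries are now reachable programmatically.
const index = crawlSpecsStub({
  specs: ['fetch', 'dom'],
  summary: true,
  output: '{return}'
});
const summaries = index.results.map(spec => spec.summary);
```

With this shape, a project such as browser-specs could read the summaries from the returned index instead of scraping console output.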

Differences that remain are:

  1. Given an array as first parameter, the function returns an array of results (and not an index of crawl results). That seems somewhat logical: array in, array out; object in, object out.
  2. Given an array as first parameter, the function does not run post-processing modules that run at the `crawl` level. That's also somewhat logical, as there would be no way to report the results in the returned array.
  3. Given an array as first parameter, the function does not output anything to the console or to files. That remains imperfect, but I can live with it for now. The `{return}` value remains clunky, but it goes in the other direction and forces the function to return an index of crawl results so that a caller can process them further.
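
The remaining differences can be sketched as a simple dispatch on the type of the first parameter. Again, this is a hypothetical, synchronous stand-in and not Reffy's implementation: `crawlStub`, `crawlListStub`, the `specs` and `post` option names, the second `options` argument in the array form, and the shape of the `post` entries are all assumptions.

```javascript
// Hypothetical stand-ins, NOT Reffy's implementation.

// Processes a list of specs and returns one result per spec.
function crawlListStub(specs, options = {}) {
  return specs.map(spec => {
    const result = { shortname: spec };
    if (options.summary) {
      result.summary = `Summary for ${spec}`; // placeholder text
    }
    return result;
  });
}

function crawlStub(input, options = {}) {
  if (Array.isArray(input)) {
    // Array in, array out: no console or file output,
    // and no post-processing at the crawl level.
    return crawlListStub(input, options);
  }

  // Object in, object out: build an index of crawl results.
  const index = { results: crawlListStub(input.specs || [], input) };

  // Crawl-level post-processing results go in a `post` property
  // (assumed shape: one entry per requested module).
  if (input.post) {
    index.post = input.post.map(name => ({ name, result: null }));
  }

  // Only the `{return}` value hands the index back to the caller.
  return input.output === '{return}' ? index : undefined;
}

const fromArray = crawlStub(['fetch', 'dom'], { summary: true });
const fromObject = crawlStub({
  specs: ['fetch'],
  post: ['some-module'], // hypothetical module name
  output: '{return}'
});
```

Here `fromArray` is a plain array with no `post` data, while `fromObject` is an index object that carries both `results` and `post`.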

Changes should be non-breaking.

tidoust (Member, Author) commented Apr 14, 2025

Taking the liberty of merging to make progress on inclusion in browser-specs :)

@tidoust tidoust merged commit cc5ff45 into main Apr 14, 2025
1 check passed
@tidoust tidoust deleted the rationalize-crawl-fn branch April 14, 2025 09:33
tidoust added a commit that referenced this pull request Apr 14, 2025
New features:
- Markdown report: improve dfns, fix links, sort summary (#1813)
- Rationalize exported crawlSpecs function behavior (#1812)

Dependencies bumped:
- Bump rollup from 4.39.0 to 4.40.0 (#1814)