Store KIT IPD files in folders according to HTML heading structure #99
Until now, the `kit-ipd` crawlers have stored all files presented in an HTML table together in one folder, named after the HTML heading that precedes the table. This is specifically tailored to sites such as this.

This PR introduces a more general approach that can construct a nested folder structure. The folder names are derived from the HTML structure of (nested) headings, without relying on the presence of table elements.
For the example website above, the following file tree is created:

This might be considered a breaking change, as it is incompatible with the file trees produced by previous runs. (?)
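
To illustrate the heading-based nesting described above, here is a minimal sketch using only Python's standard-library `html.parser`. It is not the code from this PR: the names `HeadingPathParser` and `collect_files` are hypothetical, and it assumes that every `<a href>` outside a heading is a file belonging to the most recent chain of headings. A heading of level n closes all open headings of level n or deeper.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath


class HeadingPathParser(HTMLParser):
    """Hypothetical helper: maps each link to a folder built from nested headings."""

    HEADINGS = {f"h{n}": n for n in range(1, 7)}

    def __init__(self):
        super().__init__()
        self.stack = []            # open headings as (level, title), outermost first
        self.current_level = None  # level of the heading being read, if any
        self.buffer = []           # text fragments of the heading being read
        self.files = []            # collected (folder, href) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.current_level = self.HEADINGS[tag]
            self.buffer = []
        elif tag == "a" and self.current_level is None:
            href = dict(attrs).get("href")
            if href:
                folder = PurePosixPath(*(title for _, title in self.stack))
                self.files.append((folder, href))

    def handle_data(self, data):
        if self.current_level is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if self.current_level is not None and self.HEADINGS.get(tag) == self.current_level:
            # Assumption: "/" in a title is replaced so it cannot create extra folders.
            title = "".join(self.buffer).strip().replace("/", "_")
            # A new heading closes every open heading of the same or a deeper level.
            while self.stack and self.stack[-1][0] >= self.current_level:
                self.stack.pop()
            self.stack.append((self.current_level, title))
            self.current_level = None


def collect_files(html):
    """Return (folder, href) string pairs in document order."""
    parser = HeadingPathParser()
    parser.feed(html)
    return [(str(folder), href) for folder, href in parser.files]
```

With this sketch, a link under an `<h2>` that follows an `<h1>` lands in `<h1 title>/<h2 title>`, and a later `<h1>` starts a fresh top-level folder. Links that appear before any heading end up in the current directory (`.`).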