This folder contains Groovy sources for the Stencils: Public UI hook scripts, custom filters and workflow scripts.
- The main folder contains the sources
- The test folder contains unit tests for the various features
Please consider writing unit tests for every newly implemented feature.
To make use of these hook scripts, the collection hook scripts need to reference a global Stencil hook that will trigger the relevant code. This allows making use of the Stencil hooks without having to copy the code into every collection.
To call the global Stencil hook, use this one-liner inside the collection hook_*.groovy files:
new com.funnelback.stencils.hook.StencilHooks().apply(transaction, binding.hasVariable("hook") ? hook : null)
The binding.hasVariable() part is for compatibility with older versions of Funnelback, which did not pass a hook variable indicating which hook is currently running (pre/post datafetch, pre/post process, etc.).
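The fallback can be tried outside Funnelback with a plain Groovy Binding. This is a minimal sketch, not part of the Stencils code: it only evaluates the same guard expression with and without a hook variable, and the value "pre_process" is an arbitrary placeholder rather than a documented hook name.

```groovy
// Sketch: how binding.hasVariable("hook") guards against a missing variable.
// The expression mirrors the guard used in the one-liner; "pre_process" is a made-up value.
def expression = 'binding.hasVariable("hook") ? hook : null'

// Simulates an older Funnelback: no "hook" variable in the script binding
def oldBinding = new Binding()
def oldResult = new GroovyShell(oldBinding).evaluate(expression)

// Simulates a newer Funnelback: the "hook" variable is provided
def newBinding = new Binding([hook: 'pre_process'])
def newResult = new GroovyShell(newBinding).evaluate(expression)

println "without hook variable: ${oldResult}"
println "with hook variable: ${newResult}"
```

Without the guard, referencing hook directly in a script whose binding lacks that variable would throw a MissingPropertyException.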
Once the one-liner has been added to the hook scripts, edit collection.cfg to indicate which hooks should run for the collection via the stencils parameter:
stencils=facebook,facets
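The value is a comma-separated list of Stencil names. The sketch below only illustrates that format by parsing it with java.util.Properties; it is an assumption for illustration and not how Funnelback itself loads collection.cfg.

```groovy
// Sketch: reading a comma-separated "stencils" value from properties-style text.
// Illustrates the value format only; Funnelback's own config loading is not shown here.
def cfgText = 'stencils=facebook,facets'

def props = new Properties()
props.load(new StringReader(cfgText))

// Split on commas, trim whitespace, drop empty entries
def enabled = (props.getProperty('stencils') ?: '')
        .split(',')
        .collect { it.trim() }
        .findAll { it }

println enabled
```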
See the individual hook documentation below to find out which ones need to be manually enabled.
Hook scripts are provided to support the various Stencils:
To make use of the filters, modify the collection configuration, depending on the type of filter:
- filter.classes for regular filters (Groovy class extends ScriptFilterProvider)
- filter.jsoup.classes for Jsoup filters (Groovy class implements IJSoupFilter)
Use the full class name when specifying it in the parameter, e.g.:
filter.classes=...:com.funnelback.stencils.filter.scraper.MetadataScraperFilter
- Metadata scraper: Scrape metadata from web pages with CSS Selectors.
  ⚠️ Note that this filter is available in Funnelback as standard since v15.8; using the native product filter is preferred.
- Title prefix / suffix remover: Remove SEO prefixes and suffixes from titles (e.g. "Apply to FBU | Funnelback University")
- XML element HTML wrapper: Wrap specific XML tags in <html>...</html> tags for PADRE to index them as inner documents
- Social media Date filter: Filter out social media posts by date
- HTML document date filter: Filter out HTML documents by date
- Content-Length extraction: Inject the correct file size as metadata for larger documents
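To make the title prefix / suffix remover concrete, here is a minimal sketch of the idea in plain Groovy. The removeAffixes method and its parameters are hypothetical names chosen for illustration; the actual filter's configuration and API may differ.

```groovy
import java.util.regex.Pattern

// Sketch: strip known SEO prefixes and suffixes from a title.
// removeAffixes and its parameters are hypothetical, not the filter's API.
String removeAffixes(String title, List<String> prefixes, List<String> suffixes) {
    def result = title
    // Pattern.quote makes characters such as "|" match literally
    prefixes.each { p -> result = result.replaceFirst('^' + Pattern.quote(p), '') }
    suffixes.each { s -> result = result.replaceFirst(Pattern.quote(s) + '$', '') }
    return result.trim()
}

def cleaned = removeAffixes('Apply to FBU | Funnelback University', [], [' | Funnelback University'])
println cleaned
```

With the example title from the list above, only the "Apply to FBU" part remains.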
To use the Stencil workflow scripts, call the relevant script from the collection workflow commands, e.g.:
post_index_command=$GROOVY_COMMAND $SEARCH_HOME/share/stencils/src/main/groovy/com/funnelback/stencils/workflow/.../myWorkflow.groovy -arg1 X -arg2 Y
- CSV Autocompletion Workflow: Generic workflow to generate CSV auto-completion for profiles
- Instagram Gathering: Helper class to gather Instagram content
General web utility classes are provided. Please consult the documentation of each utility class.
- X-Forwarded-For altering servlet filter: Remove the first or last value from the X-Forwarded-For header
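The header manipulation itself can be sketched in a few lines of Groovy. This is an illustration under stated assumptions, not the servlet filter's code: dropForwardedFor and the removeFirst flag are hypothetical names, and returning an empty string when only one value is present is an assumed behaviour.

```groovy
// Sketch: drop the first or last entry of a comma-separated X-Forwarded-For value.
// dropForwardedFor and removeFirst are hypothetical names for illustration.
String dropForwardedFor(String headerValue, boolean removeFirst) {
    def values = headerValue.split(',').collect { it.trim() }.findAll { it }
    if (values.size() <= 1) {
        return ''   // assumption: nothing is left once the only value is removed
    }
    def kept = removeFirst ? values.drop(1) : values.take(values.size() - 1)
    return kept.join(', ')
}

// Addresses below come from the documentation-only IP ranges
def header = '203.0.113.7, 198.51.100.4, 192.0.2.1'
println dropForwardedFor(header, true)
println dropForwardedFor(header, false)
```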