Tika Pipes can pull binary data from a variety of sources.
Tika Pipes reads from these input sources using the Pipe Iterators library.
Each Tika Pipes fetcher is implemented as a PF4J plugin.
There are two PF4J extensions for fetchers: one for the Fetcher code and one for the FetcherConfig.
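The following minimal, self-contained Java sketch shows how the two extensions relate. PF4J's real types are org.pf4j.ExtensionPoint and the @org.pf4j.Extension annotation. To keep the example runnable without dependencies, it declares local stand-ins for both. The Fetcher and FetcherConfig interfaces, the fetch signature, the YourFetcher classes, and the baseUrl property are all hypothetical and are not Tika's actual API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.nio.charset.StandardCharsets;

public class FetcherExtensionSketch {

    // Stand-in for org.pf4j.ExtensionPoint (a marker interface in PF4J).
    interface ExtensionPoint {}

    // Stand-in for org.pf4j.Extension, which marks an implementation for discovery.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Extension {}

    // Hypothetical extension point for fetcher configuration.
    interface FetcherConfig extends ExtensionPoint {
        String getPluginId();
    }

    // Hypothetical extension point for the fetcher itself.
    interface Fetcher extends ExtensionPoint {
        InputStream fetch(FetcherConfig config, String fetchKey) throws IOException;
    }

    // Hypothetical config extension holding the fetcher's custom properties.
    @Extension
    static class YourFetcherFetcherConfig implements FetcherConfig {
        private String baseUrl = "https://example.com"; // example custom property

        public String getPluginId() { return "yourfetcher-fetcher"; } // hypothetical ID
        public String getBaseUrl() { return baseUrl; }
        public void setBaseUrl(String baseUrl) { this.baseUrl = baseUrl; }
    }

    // Hypothetical fetcher extension that reads its settings from the config.
    @Extension
    static class YourFetcherFetcher implements Fetcher {
        @Override
        public InputStream fetch(FetcherConfig config, String fetchKey) {
            if (!(config instanceof YourFetcherFetcherConfig)) {
                throw new IllegalArgumentException("Expected YourFetcherFetcherConfig");
            }
            YourFetcherFetcherConfig c = (YourFetcherFetcherConfig) config;
            // A real fetcher would stream bytes from the remote source here;
            // this sketch only echoes the location it would read from.
            String location = c.getBaseUrl() + "/" + fetchKey;
            return new ByteArrayInputStream(location.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Reads a stream fully as UTF-8 text, closing it afterwards.
    static String readAll(InputStream in) {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        YourFetcherFetcherConfig config = new YourFetcherFetcherConfig();
        Fetcher fetcher = new YourFetcherFetcher();
        System.out.println(readAll(fetcher.fetch(config, "docs/report.pdf")));
    }
}
```

Running main prints https://example.com/docs/report.pdf, because the hypothetical fetcher only echoes the location it would read from. Keeping configuration in its own extension lets the fetcher logic stay free of property parsing.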
Fetchers are Maven Java JAR modules that contain the following key files:
tika-pipes-fetchers/tika-fetcher-YOURFETCHER/src/main/assembly.xml
- Maven assembly descriptor. Tells Maven how to build the package.
tika-pipes-fetchers/tika-fetcher-YOURFETCHER/src/main/java/org/apache/tika/pipes/fetchers/YOURFETCHER/config/YOURFETCHERFetcherConfig.java
- Custom configuration properties for the fetcher.
tika-pipes-fetchers/tika-fetcher-YOURFETCHER/src/main/java/org/apache/tika/pipes/fetchers/YOURFETCHER/YOURFETCHERFetcher.java
- The fetcher code.
tika-pipes-fetchers/tika-fetcher-YOURFETCHER/src/main/java/org/apache/tika/pipes/fetchers/YOURFETCHER/YOURFETCHERPlugin.java
- The PF4J plugin, with start/stop event handlers at the PF4J plugin level.
tika-pipes-fetchers/tika-fetcher-YOURFETCHER/src/main/resources/plugin.properties
- PF4J plugin properties file (see https://pf4j.org/doc/plugins.html).
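The plugin class is where PF4J's start/stop lifecycle hooks live. The sketch below assumes the plugin extends a class shaped like org.pf4j.Plugin, which exposes start() and stop(). A local stand-in replaces that class so the code runs without PF4J on the classpath. YourFetcherPlugin and its event log are purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class PluginLifecycleSketch {

    // Stand-in for org.pf4j.Plugin, whose start() and stop() hooks are called
    // when the plugin manager starts or stops the plugin.
    static abstract class Plugin {
        public void start() {}
        public void stop() {}
    }

    // Hypothetical YOURFETCHERPlugin: acquires shared resources on start
    // and releases them on stop.
    static class YourFetcherPlugin extends Plugin {
        final List<String> events = new ArrayList<>();
        private boolean running;

        @Override
        public void start() {
            // e.g. create a shared HTTP client or connection pool here
            running = true;
            events.add("start");
        }

        @Override
        public void stop() {
            // release whatever start() acquired
            running = false;
            events.add("stop");
        }

        boolean isRunning() { return running; }
    }

    public static void main(String[] args) {
        YourFetcherPlugin plugin = new YourFetcherPlugin();
        plugin.start();
        plugin.stop();
        System.out.println(plugin.events); // prints [start, stop]
    }
}
```

Plugin-level hooks suit resources shared by every fetch, such as clients or pools. Per-request work belongs in the fetcher itself.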
When packaged, fetchers are built as .zip files.
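To sanity-check a packaged fetcher, you can open the .zip with java.util.zip and confirm that the plugin descriptor is present. This sketch assumes the assembly places plugin.properties at the root of the archive, as in PF4J's demo plugin layout. Check your own assembly.xml for the actual layout. The writeDemoZip method only builds a throwaway archive so the example runs on its own.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class PluginZipCheck {

    // Returns true if the zip contains an entry with exactly this name.
    static boolean hasEntry(Path zip, String entryName) throws IOException {
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            return zf.getEntry(entryName) != null;
        }
    }

    // Writes a tiny demo zip: a root plugin.properties plus an empty lib/ directory.
    static Path writeDemoZip() throws IOException {
        Path zip = Files.createTempFile("tika-fetcher-demo", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry("plugin.properties"));
            out.write("plugin.id=demo-fetcher\n".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("lib/"));
            out.closeEntry();
        }
        return zip;
    }

    public static void main(String[] args) throws IOException {
        // Check a real plugin zip if a path is given, otherwise the demo archive.
        Path zip = args.length > 0 ? Path.of(args[0]) : writeDemoZip();
        System.out.println(zip + " has plugin.properties: " + hasEntry(zip, "plugin.properties"));
    }
}
```

Pass the path of a built plugin .zip as the first argument to check a real package instead of the demo archive.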
Each fetcher in Tika Pipes is a PF4J plugin, and each one has a plugin.properties file that describes the plugin (see https://pf4j.org/doc/plugins.html).
Most importantly, this file defines the plugin's ID:
plugin.id=microsoft-graph-fetcher
plugin.class=org.apache.tika.pipes.fetchers.microsoftgraph.MicrosoftGraphPlugin
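Because plugin.properties is a standard Java properties file, the ID can be read with java.util.Properties. The pluginId helper below is a hypothetical utility for illustration. It is not part of Tika or PF4J.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PluginIdReader {

    // Parses plugin.properties content and returns the plugin.id value.
    static String pluginId(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String id = props.getProperty("plugin.id");
        if (id == null || id.isBlank()) {
            throw new IllegalArgumentException("plugin.properties has no plugin.id");
        }
        return id.trim(); // Properties keeps trailing spaces, so trim them
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "plugin.id=microsoft-graph-fetcher",
                "plugin.class=org.apache.tika.pipes.fetchers.microsoftgraph.MicrosoftGraphPlugin");
        System.out.println(pluginId(text)); // prints microsoft-graph-fetcher
    }
}
```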
When you refer to a fetcher in the Tika Pipes service, you use its plugin ID.
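The following sketch illustrates lookup by plugin ID with an in-memory map. The registry and its Fetcher interface are hypothetical. A real host would more likely ask PF4J's PluginManager for the extensions belonging to a given plugin ID rather than maintain its own map.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class FetcherRegistrySketch {

    // Hypothetical minimal fetcher contract, used only in this sketch.
    interface Fetcher {
        String describe(String fetchKey);
    }

    private final Map<String, Fetcher> byPluginId = new ConcurrentHashMap<>();

    // Registers a fetcher under the plugin.id from its plugin.properties.
    void register(String pluginId, Fetcher fetcher) {
        if (byPluginId.putIfAbsent(pluginId, fetcher) != null) {
            throw new IllegalStateException("Duplicate plugin ID: " + pluginId);
        }
    }

    // Looks up a fetcher by plugin ID, the name callers use to refer to it.
    Optional<Fetcher> find(String pluginId) {
        return Optional.ofNullable(byPluginId.get(pluginId));
    }

    public static void main(String[] args) {
        FetcherRegistrySketch registry = new FetcherRegistrySketch();
        registry.register("microsoft-graph-fetcher", key -> "graph:" + key);
        System.out.println(registry.find("microsoft-graph-fetcher")
                .map(f -> f.describe("item-123"))
                .orElse("no such fetcher")); // prints graph:item-123
    }
}
```

Rejecting duplicate registrations follows from the plugin ID acting as the fetcher's name. If two fetchers shared an ID, a lookup by that ID would be ambiguous.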
To create a new fetcher:
- Copy the existing folder in tika-pipes-fetchers that most closely matches your new fetcher to tika-pipes-fetchers/tika-fetcher-YOURFETCHER.
- Update tika-pipes-fetchers/tika-fetcher-YOURFETCHER/pom.xml:
  - Update the groupId and artifactId to match your project.
  - Update the Maven project dependencies:
    - Remove the dependencies you do not need from the fetcher you copied.
    - Add the dependencies your project needs.
- Refactor all the Java classes for the FetcherConfig, fetcher, and plugin to use your fetcher's name.
- Update tika-pipes-fetchers/pom.xml to include your new fetcher module in the <modules> section.
- Add a CLI to tika-pipes-cli/src/main/java/pipes.
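When refactoring the copied classes, it can help to write down the target names first. This hypothetical helper derives them from an UpperCamelCase fetcher name, following the YOURFETCHER placeholders in the file layout. The lowercase package segment matches the microsoftgraph example in plugin.class. The lowercase module directory name is an assumption, so follow whatever naming the existing fetcher modules use.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class FetcherNaming {

    // Derives module and class names for a new fetcher (hypothetical helper).
    // The module directory casing is an assumption; match the existing modules.
    static Map<String, String> namesFor(String fetcherName) {
        if (!fetcherName.matches("[A-Z][A-Za-z0-9]*")) {
            throw new IllegalArgumentException("Use an UpperCamelCase name, e.g. MyStore");
        }
        String pkgSegment = fetcherName.toLowerCase(Locale.ROOT);
        String pkg = "org.apache.tika.pipes.fetchers." + pkgSegment;
        Map<String, String> names = new LinkedHashMap<>(); // keeps insertion order
        names.put("module", "tika-pipes-fetchers/tika-fetcher-" + pkgSegment);
        names.put("config", pkg + ".config." + fetcherName + "FetcherConfig");
        names.put("fetcher", pkg + "." + fetcherName + "Fetcher");
        names.put("plugin", pkg + "." + fetcherName + "Plugin");
        return names;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "MyStore"; // hypothetical fetcher name
        namesFor(name).forEach((role, value) -> System.out.println(role + ": " + value));
    }
}
```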