This repository was archived by the owner on Sep 16, 2024. It is now read-only.

DMSDK Jobs

rjrudin edited this page Dec 26, 2017 · 2 revisions

Version 3.3.0 of ml-javaclient-util introduces a job abstraction that simplifies how Data Movement SDK (DMSDK) jobs are instantiated, configured, and executed. For now the focus is on jobs that use a QueryBatcher rather than a WriteBatcher; supporting the latter is feasible but not yet a priority.

Jobs that use a QueryBatcher implement the QueryBatcherJob interface, which declares a single method:

QueryBatcherJobTicket run(DatabaseClient client);

Job usage should generally look like this:

DatabaseClient client = ... // create this any way you'd like
new AddCollectionsJob("blue", "green")
  .setWhereCollections("red")
  .run(client);

The intent is that a QueryBatcherJob implementation is instantiated with its required arguments, after which (if needed) a "setWhere*" method specifies the URIs to select and "run(client)" runs the job. Job classes typically reuse the Listener implementations first added in version 3.0.0, with new ones added in subsequent releases.
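The constructor-then-setWhere-then-run pattern can be tried without a MarkLogic connection. The following is a minimal, self-contained sketch and not the library's code: FakeClient, JobSketch, and AddCollectionsSketch are hypothetical stand-ins for DatabaseClient, QueryBatcherJob, and AddCollectionsJob, and an in-memory map plays the role of the database.

```java
import java.util.*;

// Hypothetical stand-in for DatabaseClient: maps each document URI to its collections.
class FakeClient {
    final Map<String, Set<String>> collectionsByUri = new TreeMap<>();

    FakeClient put(String uri, String... collections) {
        collectionsByUri.put(uri, new TreeSet<>(Arrays.asList(collections)));
        return this;
    }
}

// Hypothetical stand-in for QueryBatcherJob; the real run method returns a QueryBatcherJobTicket.
interface JobSketch {
    int run(FakeClient client); // in this sketch: the number of documents touched
}

// Required arguments go in the constructor; the URI selection is set fluently.
class AddCollectionsSketch implements JobSketch {
    private final List<String> collectionsToAdd;
    private List<String> whereCollections = Collections.emptyList();

    AddCollectionsSketch(String... collectionsToAdd) {
        this.collectionsToAdd = Arrays.asList(collectionsToAdd);
    }

    AddCollectionsSketch setWhereCollections(String... collections) {
        this.whereCollections = Arrays.asList(collections);
        return this; // returning this is what allows chaining
    }

    @Override
    public int run(FakeClient client) {
        int touched = 0;
        for (Set<String> docCollections : client.collectionsByUri.values()) {
            // Select documents that are in any of the "where" collections.
            if (!Collections.disjoint(docCollections, whereCollections)) {
                docCollections.addAll(collectionsToAdd);
                touched++;
            }
        }
        return touched;
    }
}

class FluentJobDemo {
    public static void main(String[] args) {
        FakeClient client = new FakeClient()
            .put("/a.xml", "red")
            .put("/b.xml", "yellow");
        int touched = new AddCollectionsSketch("blue", "green")
            .setWhereCollections("red")
            .run(client);
        System.out.println(touched + " document(s) updated: " + client.collectionsByUri);
    }
}
```

Only the document in the "red" collection is selected, so it alone gains "blue" and "green". The real jobs delegate selection and batching to a QueryBatcher rather than looping in memory as this sketch does.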

Each QueryBatcherJob implementation is likely to extend AbstractQueryBatcherJob, which provides access to a number of methods for configuring the job:

new AddCollectionsJob("blue", "green")
  .setAwaitCompletion(true)
  .setConsistentSnapshot(false)
  .setStopJobAfterCompletion(true)
  .setJobName("my-job")
  .setBatchSize(500) // defaults to 100
  .setThreadCount(32) // defaults to 8
  .setForestConfig(new ForestConfiguration(...))
  .run(client);

Several "setWhere*" methods are available for selecting URIs. Only one should be used per job; the snippet below chains all four solely to list them, and it omits the final "run(client)" call:

new AddCollectionsJob("blue", "green")
  .setWhereUris("doc1", "doc2")
  .setWhereCollections("coll1", "coll2")
  .setWhereUriPattern("/test/*.xml")
  .setWhereUrisQuery("cts:element-value-query(xs:QName('hello'), 'world')");

Version 3.3.0 includes the following jobs:

  1. AddCollectionsJob
  2. AddPermissionsJob
  3. DeleteCollectionsJob
  4. ExportBatchesToDirectoryJob
  5. ExportBatchesToZipsJob
  6. ExportToFileJob
  7. ExportToZipJob
  8. RemoveCollectionsJob
  9. RemovePermissionsJob
  10. SetCollectionsJob
  11. SetPermissionsJob
  12. SimpleExportJob (allows for using any Consumer with DMSDK's ExportListener)
  13. SimpleQueryBatcherJob (allows for using any QueryBatchListener)

You can also create your own job class, most likely by extending AbstractQueryBatcherJob.
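As a rough illustration of what writing your own job involves, here is a self-contained sketch of the underlying idea: a base class owns the shared settings and the batching loop, and a subclass supplies only the per-batch work. AbstractJobSketch, processBatch, and CollectJsonUrisJob are hypothetical names; this page does not show the methods that AbstractQueryBatcherJob actually exposes to subclasses, so treat the shape below as an assumption rather than the library's API.

```java
import java.util.*;

// Hypothetical stand-in for AbstractQueryBatcherJob: holds shared settings and drives batching.
abstract class AbstractJobSketch {
    private int batchSize = 100; // same default this page gives for setBatchSize

    AbstractJobSketch setBatchSize(int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        return this;
    }

    // Subclasses only say what to do with each batch of URIs.
    protected abstract void processBatch(List<String> uris);

    // Splits the URIs into batches, hands each one to the subclass, and returns the batch count.
    int run(List<String> uris) {
        int batches = 0;
        for (int start = 0; start < uris.size(); start += batchSize) {
            int end = Math.min(start + batchSize, uris.size());
            processBatch(uris.subList(start, end));
            batches++;
        }
        return batches;
    }
}

// A custom job: collects the URIs that end in ".json".
class CollectJsonUrisJob extends AbstractJobSketch {
    final List<String> jsonUris = new ArrayList<>();

    @Override
    protected void processBatch(List<String> uris) {
        for (String uri : uris) {
            if (uri.endsWith(".json")) jsonUris.add(uri);
        }
    }
}

class CustomJobDemo {
    public static void main(String[] args) {
        CollectJsonUrisJob job = new CollectJsonUrisJob();
        int batches = job.setBatchSize(2).run(Arrays.asList("/a.json", "/b.xml", "/c.json"));
        System.out.println(batches + " batches, matched " + job.jsonUris);
    }
}
```

The sketch processes batches sequentially on one thread, whereas a real QueryBatcher distributes batches across the configured thread count, so a real listener needs to be thread-safe.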

Configuring a job via Properties

To simplify using a job in a context like Gradle, a job can implement the ConfigurableJob interface, which means the job can be configured via a Properties object. More importantly, a job can also describe the properties that it supports. The ConfigurableJob interface has the following methods:

List<String> configureJob(Properties props);
List<JobProperty> getJobProperties();

Given a Properties instance, "configureJob" configures the job and returns a list of validation error messages (for example, for missing properties). A tool like ml-gradle can call "getJobProperties" and print the returned list so that a Gradle user knows which properties each job supports.
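To make the two responsibilities concrete, the sketch below shows one way a job might validate a Properties object and describe its own properties. It is self-contained and uses hypothetical stand-ins: JobPropertySketch (with assumed name, description, and required fields) replaces JobProperty, whose real methods are not shown on this page, and ConfigurableJobSketch mirrors the two method signatures quoted above.

```java
import java.util.*;

// Hypothetical stand-in for JobProperty; the fields here are assumptions.
class JobPropertySketch {
    final String name;
    final String description;
    final boolean required;

    JobPropertySketch(String name, String description, boolean required) {
        this.name = name;
        this.description = description;
        this.required = required;
    }
}

// Mirrors the two ConfigurableJob methods, using the stand-in property type.
interface ConfigurableJobSketch {
    List<String> configureJob(Properties props);
    List<JobPropertySketch> getJobProperties();
}

class AddCollectionsConfigSketch implements ConfigurableJobSketch {
    String[] collections = new String[0];
    int batchSize = 100;

    @Override
    public List<JobPropertySketch> getJobProperties() {
        return Arrays.asList(
            new JobPropertySketch("collections", "Comma-delimited collections to add", true),
            new JobPropertySketch("batchSize", "Number of URIs per batch", false));
    }

    @Override
    public List<String> configureJob(Properties props) {
        List<String> errors = new ArrayList<>();
        // Report every missing required property instead of failing on the first one.
        for (JobPropertySketch p : getJobProperties()) {
            if (p.required && props.getProperty(p.name) == null) {
                errors.add("Missing required property: " + p.name);
            }
        }
        String value = props.getProperty("collections");
        if (value != null) collections = value.split(",");
        value = props.getProperty("batchSize");
        if (value != null) {
            try {
                batchSize = Integer.parseInt(value.trim());
            } catch (NumberFormatException e) {
                errors.add("batchSize must be an integer: " + value);
            }
        }
        return errors;
    }
}

class ConfigurableJobDemo {
    // What a build tool might print so that users can discover a job's properties.
    static String describe(ConfigurableJobSketch job) {
        StringBuilder sb = new StringBuilder();
        for (JobPropertySketch p : job.getJobProperties()) {
            sb.append(p.name).append(p.required ? " (required): " : ": ")
              .append(p.description).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        AddCollectionsConfigSketch job = new AddCollectionsConfigSketch();
        System.out.print(describe(job));
        Properties props = new Properties();
        props.setProperty("batchSize", "fifty"); // invalid, and "collections" is missing
        System.out.println(job.configureJob(props));
    }
}
```

Returning messages instead of throwing lets the caller report every problem in one pass, which suits a command-line tool where the user fixes properties and reruns.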

Here's an example of adding collections and configuring the job via a Properties object:

Properties props = new Properties();
props.setProperty("collections", "red,green");
props.setProperty("whereCollections", "test");
props.setProperty("batchSize", "50");
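AddCollectionsJob job = new AddCollectionsJob();
// configureJob returns validation error messages, e.g. for missing properties
List<String> errors = job.configureJob(props);
if (errors.isEmpty()) {
  job.run(databaseClient);
} else {
  errors.forEach(System.err::println);
}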

Of course, this is more verbose than simply calling methods directly on the job as shown in the examples at the top of this page. But this allows for a tool like ml-gradle to simply pass all the properties it has to the "configureJob" method and remain unaware of each job's configuration details.
