Oplog Progress File

lovett89 edited this page Nov 12, 2014 · 9 revisions

The oplog progress file keeps track of the latest oplog entry seen for each replica set to which Mongo Connector is connected. On startup, Mongo Connector uses this file to decide where to begin reading the oplog of each of those replica sets. Note that Mongo Connector will continue normal operation even if the file is deleted or corrupted while the connector is running.

Nota Bene

The format for oplog progress files was recently changed for sharded clusters. This fixes a bug where Mongo Connector was unable to parse the progress file, and thus would raise an Exception instead of beginning to tail the oplog at the proper place. This change does not affect users who have only run the connector against replica sets, and it does not impact any replicated data. Users who run against sharded clusters will need to allow Mongo Connector to create a new oplog progress file by following the steps in the Creating an Oplog Progress File section.

Collection Dumps and the Oplog Progress File

When the oplog progress file cannot be found, or if it is empty, Mongo Connector will begin pulling data from all MongoDB collections (or the ones given in --namespace-set) in the "collection dump" phase. The oplog progress file is then updated with the most recent timestamp recorded before the dump started. Mongo Connector then applies the oplog operations from that timestamp onward, so that the copied documents will be up-to-date with what's on MongoDB.

You can therefore force a collection dump by pointing the --oplog-ts option at an empty or non-existent file. You may want to re-sync in this way if Mongo Connector falls behind the last record in the oplog. This may happen under a very high write load or after Mongo Connector has been stopped for a long time.
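The condition described above (a missing or empty progress file leads to a collection dump) can be checked before starting the connector. The following is a minimal sketch; `needs_collection_dump` is a hypothetical helper written for illustration and is not part of Mongo Connector.

```python
import os


def needs_collection_dump(progress_path):
    """Return True if the progress file is missing or empty.

    Hypothetical helper: it mirrors the rule described above and does
    not call into Mongo Connector itself.
    """
    if not os.path.exists(progress_path):
        return True
    return os.path.getsize(progress_path) == 0
```

A check like this may be useful in a deployment script, to warn before a restart that would trigger a full re-sync.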

Format

The exact format of this file depends on MongoDB's topology. For a single replica set, the format is:

["oplog name", timestamp]

For a sharded cluster, there is one such entry for each replica set shard:

[["oplog 1 name", timestamp 1], ["oplog 2 name", timestamp 2], ...]
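To illustrate how the two layouts differ, the following sketch normalizes either one into a dictionary keyed by oplog name. It assumes that the file content is JSON and that each timestamp is stored as a plain number; neither is stated above. `parse_progress` is a hypothetical helper, not part of Mongo Connector.

```python
import json


def parse_progress(text):
    """Return {oplog_name: timestamp} for either progress-file layout.

    Assumptions: the content is JSON and timestamps are numbers.
    """
    if not text.strip():
        return {}  # an empty file carries no progress
    data = json.loads(text)
    if not data:
        return {}
    # Single replica set: ["oplog name", timestamp]
    if isinstance(data[0], str):
        name, timestamp = data
        return {name: timestamp}
    # Sharded cluster: [["oplog 1 name", ts1], ["oplog 2 name", ts2], ...]
    return {name: timestamp for name, timestamp in data}
```

Distinguishing the layouts by the type of the first element keeps the check simple: a string means a single flat entry, while a list means one entry per shard.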

Creation and Update

The oplog progress file is created as the final step of Mongo Connector's initialization, whether or not a collection dump takes place. The main thread of the connector monitors the progress of each oplog-tailing thread, updating the progress file once per second. Oplog-tailing threads publish their progress at the following times:

  • After every --batch-size oplog records processed
  • After processing all available oplog records
  • When an oplog-tailing thread's connection to MongoDB is interrupted
  • Immediately upon startup (progress is reported as the most recent oplog record)
  • Immediately after a rollback
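The first two items in the list above can be sketched as a simple loop. This is an illustration only and not Mongo Connector's actual implementation: `publish_progress` and its `publish` callback are hypothetical names, and `batch_size` stands in for the value of --batch-size.

```python
def publish_progress(timestamps, batch_size, publish):
    """Call publish() after every batch_size records and once more
    after all available records have been processed.

    Hypothetical sketch: timestamps is an iterable of oplog timestamps
    in processing order; publish is a callback taking one timestamp.
    """
    last_seen = None
    last_published = None
    for count, ts in enumerate(timestamps, start=1):
        last_seen = ts
        if count % batch_size == 0:
            publish(ts)
            last_published = ts
    # Publish the final position unless the last batch already did.
    if last_seen is not None and last_seen != last_published:
        publish(last_seen)
```

Publishing per batch bounds how much work would be repeated after a crash, while the final publish keeps the recorded position current once the tailer has caught up.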

Note: Before each write to the progress file, the main thread creates a backup copy of it. The backup has the same name as the progress file, with ".backup" appended.
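The backup-then-write behavior can be sketched as follows. This is a minimal illustration, assuming the file is written as JSON in the layouts shown in the Format section; `write_progress` is a hypothetical helper and does not reproduce Mongo Connector's own code.

```python
import json
import os
import shutil


def write_progress(path, entries):
    """Back up the existing progress file, then write new progress.

    Hypothetical sketch. entries is a list of [oplog_name, timestamp]
    pairs; JSON output is an assumption.
    """
    # Copy the current file to "<path>.backup" before overwriting it.
    if os.path.exists(path):
        shutil.copyfile(path, path + ".backup")
    # One entry is written flat, several entries as a list of pairs.
    payload = entries[0] if len(entries) == 1 else entries
    with open(path, "w") as f:
        json.dump(payload, f)
```

With this arrangement, the ".backup" file always holds the progress from the previous write, which gives a fallback if the main file is damaged mid-write.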

Creating an Oplog Progress File

Creating an oplog progress file that starts at the most recent oplog record can be useful if your previous progress file is accidentally deleted or becomes corrupted. You should only do this if you're confident that Mongo Connector has successfully replicated all operations up to that point; otherwise, you should re-sync the connector by deleting the file and restarting mongo-connector. You can force Mongo Connector to create an oplog progress file containing the most recent oplog record using the following method:

  1. Stop Mongo Connector, if it is running.
  2. Start Mongo Connector again with:
    • --oplog-ts pointing to an empty or non-existent file
    • --no-dump so that Mongo Connector will not attempt to copy data.
  3. Stop Mongo Connector.
  4. Restart Mongo Connector with your usual options, and make sure to point --oplog-ts at the new progress file.
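The two invocations in the steps above differ only in the --no-dump flag. The sketch below builds both command lines as argument lists without running anything. The function names are hypothetical, and `usual_options` is a placeholder for whatever options your deployment normally passes.

```python
def bootstrap_command(progress_path, usual_options=()):
    """Command for step 2: create a fresh progress file, skip the dump.

    progress_path should point to an empty or non-existent file.
    """
    return ["mongo-connector", *usual_options,
            "--oplog-ts", progress_path, "--no-dump"]


def normal_command(progress_path, usual_options=()):
    """Command for step 4: regular run against the new progress file."""
    return ["mongo-connector", *usual_options, "--oplog-ts", progress_path]
```

Either list could be handed to a process launcher such as `subprocess.run`, with the connector stopped between the two runs as described in steps 1 and 3.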