Revision tree repair tool #2857

Closed
5 tasks done
tleyden opened this issue Aug 29, 2017 · 6 comments · Fixed by #2869
@tleyden
Contributor

tleyden commented Aug 29, 2017

To repair the revision trees with cycles caused by #2847

  • Dry run mode
  • Call Import View to walk keys (UpdateAllDocChannels) -- current doc path
  • CAS-safe write if changes are identified
  • Instructions: take the bucket offline, make the changes, then restart. This solves the stale revcache issue.
  • Wire up to endpoint
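
The dry-run and CAS-safe write items above can be sketched roughly as follows. This is only an illustration under stated assumptions: `casStore`, `transformer`, and `repairKeys` are hypothetical names, and the in-memory store stands in for a real bucket. It is not the Sync Gateway bucket API or the actual `RepairBucket` implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// casStore is a hypothetical in-memory stand-in for a bucket that supports
// compare-and-swap (CAS) writes. It is not the Sync Gateway bucket API.
type casStore struct {
	docs map[string][]byte
	cas  map[string]uint64
}

var errCasMismatch = errors.New("cas mismatch")

func newCasStore() *casStore {
	return &casStore{docs: map[string][]byte{}, cas: map[string]uint64{}}
}

// get returns the current value and CAS counter for key.
func (s *casStore) get(key string) ([]byte, uint64) {
	return s.docs[key], s.cas[key]
}

// put writes unconditionally and bumps the CAS counter.
func (s *casStore) put(key string, val []byte) {
	s.docs[key] = val
	s.cas[key]++
}

// writeCas stores val only if the doc has not changed since cas was read.
func (s *casStore) writeCas(key string, val []byte, cas uint64) error {
	if s.cas[key] != cas {
		return errCasMismatch
	}
	s.docs[key] = val
	s.cas[key]++
	return nil
}

// transformer returns the repaired bytes and whether anything changed.
type transformer func(raw []byte) (repaired []byte, changed bool)

// repairKeys walks keys, applies fix, and writes changed docs unless dryRun
// is set. It returns the keys that were (or, in a dry run, would be) repaired.
func repairKeys(s *casStore, keys []string, fix transformer, dryRun bool) []string {
	var repaired []string
	for _, key := range keys {
		for {
			raw, cas := s.get(key)
			out, changed := fix(raw)
			if !changed {
				break
			}
			if dryRun {
				repaired = append(repaired, key)
				break
			}
			err := s.writeCas(key, out, cas)
			if err == nil {
				repaired = append(repaired, key)
				break
			}
			if !errors.Is(err, errCasMismatch) {
				break // unexpected error: give up on this key
			}
			// CAS mismatch: the doc changed underneath us, so re-read and retry.
		}
	}
	return repaired
}

func main() {
	s := newCasStore()
	s.put("doc1", []byte("cycle"))
	s.put("doc2", []byte("fine"))
	fix := func(raw []byte) ([]byte, bool) {
		if string(raw) == "cycle" {
			return []byte("fine"), true
		}
		return raw, false
	}
	fmt.Println(repairKeys(s, []string{"doc1", "doc2"}, fix, true))  // dry run, nothing written
	fmt.Println(repairKeys(s, []string{"doc1", "doc2"}, fix, false)) // real run, doc1 rewritten
}
```

Re-reading the document on a CAS mismatch means the transform always runs against the latest bytes, so a concurrent writer is never silently overwritten.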

Instructions

See https://github.com/couchbase/sync_gateway/wiki/Issue-2857-Repair-Tool

Test Instructions

  • Run Couchbase Server 4.x, with an empty bucket named data-bucket
  • Run Sync Gateway with this config
  • wget https://gist.githubusercontent.com/tleyden/bc4d9cae6ac8bfe408c373e673ff58e9/raw/659700bc6287ed44cdd8158a9e8a82a33cb80e08/data_for_test
  • cbc cp data_for_test -U couchbase://localhost/data-bucket
  • Verify that the doc has cycles in its revtree by running curl http://{{host}}:{{port_admin}}/{{db}}/data_for_test?revs=true and looking for "reason": "Internal error: getHistory found cycle in revision tree"
  • Repair the bucket via the instructions
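
The verification step above could be automated with a small check like the one below. It is a sketch that assumes the admin API returns a JSON object with a string `reason` field, as the quoted error suggests. `hasCycleError` is a hypothetical helper name, not part of Sync Gateway.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// hasCycleError reports whether a response body carries the revision tree
// cycle error quoted in the test instructions. Only the "reason" field is
// inspected; any other response shape is treated as "no cycle error".
func hasCycleError(body []byte) bool {
	var resp struct {
		Reason string `json:"reason"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return false
	}
	return strings.Contains(resp.Reason, "getHistory found cycle in revision tree")
}

func main() {
	// Sample body for illustration only; the "error" value is an assumption.
	body := []byte(`{"error":"Internal Server Error","reason":"Internal error: getHistory found cycle in revision tree"}`)
	fmt.Println(hasCycleError(body))
}
```

Running the same check after the repair should return false for the test document if the repair succeeded.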
@tleyden
Contributor Author

tleyden commented Aug 29, 2017

@tleyden
Contributor Author

tleyden commented Aug 29, 2017

POST /_repair
{
  "dry_run": true,
  "revTreeCycle": {}
}
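
For illustration, the proposed request body could be built in Go as below. The field names come from the proposal in this comment; the endpoint that was eventually shipped may accept different parameters, so `repairRequest` and `buildRepairBody` are hypothetical and should be checked against the wiki page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// repairRequest mirrors the body proposed in this comment. The shape is an
// assumption taken from the proposal, not a confirmed API contract.
type repairRequest struct {
	DryRun       bool                   `json:"dry_run"`
	RevTreeCycle map[string]interface{} `json:"revTreeCycle"`
}

// buildRepairBody marshals the proposed request body to JSON.
func buildRepairBody(dryRun bool) ([]byte, error) {
	return json.Marshal(repairRequest{
		DryRun:       dryRun,
		RevTreeCycle: map[string]interface{}{}, // empty options object
	})
}

func main() {
	body, err := buildRepairBody(true)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // prints {"dry_run":true,"revTreeCycle":{}}
}
```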


@tleyden
Contributor Author

tleyden commented Aug 30, 2017

@adamcfraser here's what I have so far:

release/1.3.1.3...feature/issue_2847_repair_tool

Can you take a quick pass to make sure we're on the same page? In particular the RepairBucket method

What's still pending:

  • CAS-safe write if changes are identified
  • Get doc to refresh rev cache (or drop rev cache)
  • Wire up to endpoint

tleyden self-assigned this Aug 30, 2017
tleyden added the review label Aug 30, 2017
tleyden added a commit that referenced this issue Aug 31, 2017
* TestRepairRevsHistoryWithCycles()

* Add Repair() — unit test passes now

* Add repair bucket (in progress)

* Test passes

* Fix dry run

* TestRepairBucketRevTreeCycles() passes

* Add _repair endpoint

* Fix invocation of InitFrom

* Refactor RepairBucket() to return repaired docs to enable more test assertions

* WriteRepairedDocsToDisk by default to make diagnosis of repair tool easier

* Run gofmt + goimports

* PR feedback, remove redundant repair_job

* DocTransformer takes raw bytes instead of the marshalled document

* Return RepairBucketResult with doc and repair job

* Update _repair endpoint to marshal result to response

* Add WriteRepairedDocsToBucket(), fix super nasty dev bug along the way

* Fix bug in WriteRepairedDocsToBucket()

* Change to 24 hours and fix up _sync: doc id

* gofmt

* Return DryRun and BackupOrDryRunDocId in results

* Repair() -> RepairCycles()

* More test docs

* Handle bucket.Update err

* Add TestRepairBucketDryRun()

* Use bucket.GetRaw()

* Gofmt + goimports

* TestRepairBucket had wrong number of docs in assertion

+ saw error on drone that made me think there is interference between walrus buckets w/ same name.

* Fix compile error
tleyden added a commit that referenced this issue Aug 31, 2017
* Fixes #2857: Revision tree repair tool  (#2866)

* TestRepairRevsHistoryWithCycles()

* Add Repair() — unit test passes now

* Add repair bucket (in progress)

* Test passes

* Fix dry run

* TestRepairBucketRevTreeCycles() passes

* Add _repair endpoint

* Fix invocation of InitFrom

* Refactor RepairBucket() to return repaired docs to enable more test assertions

* WriteRepairedDocsToDisk by default to make diagnosis of repair tool easier

* Run gofmt + goimports

* PR feedback, remove redundant repair_job

* DocTransformer takes raw bytes instead of the marshalled document

* Return RepairBucketResult with doc and repair job

* Update _repair endpoint to marshal result to response

* Add WriteRepairedDocsToBucket(), fix super nasty dev bug along the way

* Fix bug in WriteRepairedDocsToBucket()

* Change to 24 hours and fix up _sync: doc id

* gofmt

* Return DryRun and BackupOrDryRunDocId in results

* Repair() -> RepairCycles()

* More test docs

* Handle bucket.Update err

* Add TestRepairBucketDryRun()

* Use bucket.GetRaw()

* Gofmt + goimports

* TestRepairBucket had wrong number of docs in assertion

+ saw error on drone that made me think there is interference between walrus buckets w/ same name.

* Fix compile error

* Fix null return value when no repairs done

* Change from Warn -> Crud
@ArihantRk

ArihantRk commented Sep 8, 2017

@tleyden I went through #2847 about the dangling parent.

I am more interested in identifying documents that have revision tree issues.
Please let us know how the repair tool works. Does it apply to all documents, or only to documents that have issues?
How do we know that a document has an issue and that it can be repaired by this tool?

@tleyden
Contributor Author

tleyden commented Sep 8, 2017

@ArihantRk the docs are here: https://github.com/couchbase/sync_gateway/wiki/Issue-2857-Repair-Tool

If you do a dry run, it will show you which docs it would repair (but not actually make any changes)

@adamcfraser
Collaborator

@ArihantRk The tool is only intended to be used in conjunction with data affected by specific maintenance patches, and only if that data can't otherwise be rolled back. It's not intended for general detection of 'revision tree issues'. If you feel like you're in a situation that requires this tool, you should review it with support.

tleyden closed this as completed Sep 27, 2017
tleyden removed the review label Sep 27, 2017
tleyden added a commit that referenced this issue Sep 29, 2017
tleyden added a commit that referenced this issue Oct 3, 2017
adamcfraser pushed a commit that referenced this issue Oct 3, 2017
…go into infinite loop + rev tree repair tool) (#2869)

* Fixes #2847 Getting doc history can go into infinite loop: Backport for 1.4.1.2 (#2858)

* Fixes #2847 Getting doc history can go into infinite loop: Backport for 1.3.1.2 (#2856)

* Repro case for #2847 - Getting doc history can go into infinite loop

#2847

* Add revTree.Validate() method

* Invoke rawDoc.History.RenderGraphvizDot(), change timeout checking

* Add new test

* Cherry pick commit from feature/issue_2847_cycles

Cherry pick commit d8feb1d from feature/issue_2847_cycles, which is based on the master branch

* Fixes issue #2847 by switching the dangling parent check in pruneRevisions to happen after branches are deleted

#2847 (comment)

* Comments on test

* Fixes issue #2847 by fixing marshalJSON to better handle dangling parents during marshal

#2847 (comment)

* Replace getHistory with getValidatedHistory

TestRevsHistoryInfiniteLoop now passes

* Rename getValidatedHistory -> getHistory

* Gofmt

* Remove parent rev from history string

* Remove unneeded fmt.Sprintf()

* Run sg-accel tests

* Try pointing to specific commit to fix build failure

* Use git url instead of ssh for sga accel repo

* Revert "Use git url instead of ssh for sga accel repo"

This reverts commit 1c5e061.

* Revert "Try pointing to specific commit to fix build failure"

This reverts commit dd3f9d9.

* Revert "Run sg-accel tests"

This reverts commit d5cc940.

* Remove rawDoc.History.Validate().  Does not help test catch any issues.

* Remove commented log

* Remove unneeded import

# Conflicts:
#	db/revtree_test.go

* Fix compile errors

# Conflicts:
#	db/crud.go
#	db/revtree_test.go

* Fixes #2857: Revision tree repair tool  (#2866)

* TestRepairRevsHistoryWithCycles()

* Add Repair() — unit test passes now

* Add repair bucket (in progress)

* Test passes

* Fix dry run

* TestRepairBucketRevTreeCycles() passes

* Add _repair endpoint

* Fix invocation of InitFrom

* Refactor RepairBucket() to return repaired docs to enable more test assertions

* WriteRepairedDocsToDisk by default to make diagnosis of repair tool easier

* Run gofmt + goimports

* PR feedback, remove redundant repair_job

* DocTransformer takes raw bytes instead of the marshalled document

* Return RepairBucketResult with doc and repair job

* Update _repair endpoint to marshal result to response

* Add WriteRepairedDocsToBucket(), fix super nasty dev bug along the way

* Fix bug in WriteRepairedDocsToBucket()

* Change to 24 hours and fix up _sync: doc id

* gofmt

* Return DryRun and BackupOrDryRunDocId in results

* Repair() -> RepairCycles()

* More test docs

* Handle bucket.Update err

* Add TestRepairBucketDryRun()

* Use bucket.GetRaw()

* Gofmt + goimports

* TestRepairBucket had wrong number of docs in assertion

+ saw error on drone that made me think there is interference between walrus buckets w/ same name.

* Fix compile error

* Fix null return value when no repairs done

* Change from Warn -> Crud

* Disable _repair endpoint

* Fix test compile errors

* Fixes #2892 repair tool efficiency (#2893)

* Fixes #2892 - Repair tool efficiency improvements

Initial first pass at iterating over the view w/ paging

* Add documentation

* More comments

* Comment regarding min pageSIzeViewResult

* Pull ViewQueryPageSize out to a constant. Enhance tests to add more docs to exercise the view iteration

* Address PR feedback

* Update comment

* Change constant to DefaultViewQueryPageSize

* Run gofmt

# Conflicts:
#	base/constants.go
#	db/repair_bucket_test.go

* Adds support for custom TTL of repaired docs and ViewQueryPageSize defaults (#2902)

Adds support for custom TTL of repaired docs and ViewQueryPageSize defaults

* correctly apply _repair parameter value for params.RepairedFileTTL (#2909)

* Fixes #2919 Revtree repair tool gets stuck if node points to itself a… (#2920)

* Fixes #2919 Revtree repair tool gets stuck if node points to itself as parent

* PR feedback
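
The self-parent case from #2919 is easy to miss when walking parent links. The sketch below shows one way to cut cyclic links, including a node that names itself as parent, using a visited set. It assumes a simplified tree where each revision stores only its parent ID; `revNode` and `repairCycles` are hypothetical names, and this is not the `RepairCycles()` method from the commits above.

```go
package main

import (
	"fmt"
	"sort"
)

// revNode is a hypothetical, simplified revision tree entry: each revision
// records only the ID of its parent ("" for a root).
type revNode struct {
	Parent string
}

// repairCycles cuts any parent link that closes a cycle, including a node
// that points to itself, and returns the IDs whose parent link was cleared.
func repairCycles(tree map[string]*revNode) []string {
	// Sort IDs so the result is deterministic regardless of map order.
	ids := make([]string, 0, len(tree))
	for id := range tree {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	var cut []string
	for _, start := range ids {
		seen := map[string]bool{}
		id := start
		for id != "" {
			node, ok := tree[id]
			if !ok {
				break // dangling parent: nothing more to walk
			}
			seen[id] = true
			if seen[node.Parent] {
				// The parent was already visited on this walk (or is the
				// node itself), so following it would loop forever.
				node.Parent = ""
				cut = append(cut, id)
				break
			}
			id = node.Parent
		}
	}
	return cut
}

func main() {
	tree := map[string]*revNode{
		"1-a": {Parent: ""},
		"2-b": {Parent: "2-b"}, // points to itself
		"3-c": {Parent: "4-d"}, // two-node cycle with 4-d
		"4-d": {Parent: "3-c"},
	}
	fmt.Println(repairCycles(tree))
}
```

Every step of the walk either reaches an unvisited node or cuts a link and stops, so the loop terminates even on a tree that contains cycles.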