Confluence transfer stuck near end #46
Comments
@craigjm Good question why it skips those. The log (
Two approaches.
The first approach to finding the missing pages: diffing page IDs manually
Check the number of migrated pages for this space; maybe the progress bar is off but the number of pages is not. This can be done by filtering the Site Pages library by space key; you can then export the list and analyze it further, e.g. in Excel. Is the number of page IDs equal to 3588? Or higher? Now the missing pages can be found via the REST endpoint in Confluence, just like WikiTraccs gets them:
The page IDs returned by Confluence that do not appear in the Excel export are the missing pages. With those page IDs I would search the log for messages, especially any errors.
The second approach to finding the missing pages: wait for the next release
WikiTraccs should make this easier. A release is scheduled for the coming week. I'll integrate the diffing of what has been migrated and what still needs to be migrated into this next release, so the log file will tell which pages are yet to be migrated. This can then be used for further diagnosis. Ultimately this information should probably go into the space inventory as migration result information. Sorry that it's not easier at the moment. It should be, and it will be. |
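The manual diff described above can be sketched in a few lines of Python. This is only an illustration, not how WikiTraccs does it internally: the function name and the sample IDs are made up, and it assumes you already have both lists of page IDs at hand (one from the Confluence REST endpoint, one from the Site Pages export).

```python
def missing_page_ids(confluence_ids, sharepoint_ids):
    """Return the Confluence page IDs that have no counterpart in SharePoint."""
    # Normalize to stripped strings so "1001" and " 1001 " compare as equal.
    source = {str(pid).strip() for pid in confluence_ids}
    target = {str(pid).strip() for pid in sharepoint_ids}
    return sorted(source - target)


# Hypothetical sample data; real IDs would come from Confluence and the export.
confluence_ids = ["1001", "1002", "1003", "1004"]
sharepoint_ids = ["1001", "1003"]

for page_id in missing_page_ids(confluence_ids, sharepoint_ids):
    print(page_id)
```

The resulting IDs are the ones to search for in the log.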
@craigjm Would you please try the latest release v1.1.1 and have a look at the progress log files for the space in question? I'd like to know whether the progress bar is just off or whether something is really missing. If so, the new progress log files will tell exactly which pages are missing. This can then be the basis for further investigation. Just start the migration again and WikiTraccs logs information about all spaces that are marked for migration. |
@craigjm I observed the same behavior in another context. Those 200 pages have been migrated to SharePoint before and have been updated in Confluence since then. Then, when running another migration for the same space, WikiTraccs will happily migrate any new pages that were created in Confluence. But it won't overwrite existing pages from a previous migration run, so as not to overwrite any changes that have since been made on the SharePoint side. The progress log files added in release v1.1.1 should tell exactly which pages are affected: all 200 should be in the update-state-of-migrated-pages.txt file, marked as needsupdate. You can delete the SharePoint pages that should be updated and restart the migration. It will create the pages again. Looking into the future: would a "force-overwrite" mode help? Are you migrating all Confluence pages at once, or are you doing it in waves? Is there a risk of interfering with user-made page edits, or can't users access the site during the migration anyway? |
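To make the three outcomes described above concrete, here is a rough sketch. It is an assumption-laden illustration, not WikiTraccs code: the function name, the dictionary layout, and the idea of comparing a Confluence "last modified" time with the time of the earlier migration are all hypothetical. Only the label needsupdate is taken from the log file mentioned above.

```python
from datetime import datetime


def classify_pages(confluence_pages, migrated_pages):
    """Sort page IDs into 'migrate', 'needsupdate' and 'uptodate'.

    confluence_pages: page ID -> last modification time in Confluence
    migrated_pages:   page ID -> time the page was migrated to SharePoint
    """
    result = {"migrate": [], "needsupdate": [], "uptodate": []}
    for page_id, modified in confluence_pages.items():
        if page_id not in migrated_pages:
            # Not in SharePoint yet: a new run would create it.
            result["migrate"].append(page_id)
        elif modified > migrated_pages[page_id]:
            # Already in SharePoint but changed in Confluence since: skipped.
            result["needsupdate"].append(page_id)
        else:
            result["uptodate"].append(page_id)
    return result


# Hypothetical sample data.
confluence_pages = {
    "1001": datetime(2023, 5, 1),
    "1002": datetime(2023, 5, 20),
    "1003": datetime(2023, 5, 25),
}
migrated_pages = {
    "1001": datetime(2023, 5, 10),
    "1002": datetime(2023, 5, 10),
}
print(classify_pages(confluence_pages, migrated_pages))
```

In this picture, deleting a page in SharePoint moves it from needsupdate back to migrate, which is why the next run creates it again.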
We are still in the testing phase of these migrations, so I'm migrating one Confluence space at a time so the owner can check it out, knowing that we might need to remigrate to fix issues. I do not expect to make changes in SharePoint and then migrate again, but people will make changes in Confluence before the final migration. For this particular space, it is another migration of the Confluence space, but to a new site in SharePoint. Does that count as another migration? I also ended up trying to run it again to see if it would finish. When I look at Site Pages in SharePoint, I see that a lot of the problem pages have multiple Failed Transformations. Some have 100% Text Transferred, but others have less. Some of the pages are in the -not-yet-migrated-pages log, others are in the -update-state-of-migrated-pages log. A "force-overwrite" mode would definitely be useful for me, because I will always want to replace what is in SharePoint with the final migration from Confluence. |
@craigjm This sounds like an interesting source Confluence. WikiTraccs skips Confluence pages if it finds those pages already present in the target SharePoint site, identified by page ID. If there is no page yet, it should migrate. The not-yet-migrated pages should be created one after another when starting the migration. They are waiting to be migrated. If they aren't created then something happened, preventing them from being created. Something like #3 comes to mind (too long page titles). I could read this from the log file. So I see different topics here:
I'd like to look into any of those topics depending on your time and priorities. The page Troubleshooting Strategies shows which diagnosis information can be found where and how to get the storage format of pages. I'd be very interested to look at the log files and the storage format of pages whose Text Transferred Percent is off. Often those share a similarity in structure, or the same macro that WikiTraccs cannot (yet) handle. |
Ok, first I'll upload the log files from the new version. I'll take a look at some pages with < 100% transferred and get the storage format with the info from the site pages library next week. Thanks for the assistance! |
@craigjm Thanks, please send the other log files from the logs directory as well. I'd suggest sending them via email to [email protected], as their content is usually not for the public eye. |
@craigjm There is already something interesting in the logs you provided. About 200 pages are listed twice. To me it looks like WikiTraccs migrated all pages, but somehow some pages were marked for migration twice and then skipped once (because already migrated), thus skewing the count. What I don't know yet is whether Confluence already provides the duplicates to WikiTraccs, or whether it's happening further down the road. |
@craigjm Would you please run the migration for the large space again using the latest release v1.3.7? From reading the log I get the impression that all pages were migrated from Confluence to SharePoint, but some pages are coming back twice from Confluence. WikiTraccs now logs all page IDs it gets from Confluence for each space, and also actively checks for duplicates, so duplicates should show up in the logs. Please send me the logs for this run. |
It got further this time! 3338/3779. I'm mailing the logs over now. |
@craigjm Quick update on the issue: when WikiTraccs asks Confluence for page IDs, Confluence returns 400 duplicate page IDs for the "cit" space. Those duplicates mess with the overall bookkeeping of how many pages have been migrated, how many are still due, etc. The obvious thing to do is to add a duplicate removal step to WikiTraccs, which will require an update. This should fit into the maintenance release planned for next week. |
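For illustration, a duplicate check and removal step over a list of page IDs could look like the sketch below. It is not the actual WikiTraccs implementation; the function names and sample IDs are hypothetical, and it simply assumes the IDs are available as a list in the order Confluence returned them.

```python
from collections import Counter


def find_duplicates(page_ids):
    """Return a mapping of page ID -> count for IDs that occur more than once."""
    counts = Counter(page_ids)
    return {pid: n for pid, n in counts.items() if n > 1}


def deduplicate(page_ids):
    """Drop repeated IDs, keeping the first occurrence and the original order."""
    return list(dict.fromkeys(page_ids))


# Hypothetical sample data with one page ID returned twice.
page_ids = ["1001", "1002", "1002", "1003"]
print(find_duplicates(page_ids))
print(len(page_ids), "IDs returned,", len(deduplicate(page_ids)), "unique")
```

Counting against the unique IDs rather than the raw list keeps the expected total from being inflated by pages that are returned twice.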
@craigjm Page de-duplication has been added to the latest release v1.3.13. Could you please check if this makes the progress bar reach its end? At least the behavior should change compared to last time. |
I started a run on the newest release (1.3.13). It looks like progress is still stuck at 3334/3381, but hopefully the log files I sent have some useful information.
|
@craigjm The latest release v1.4.6 contains progress bar improvements. Outdated pages are now skipped. This has the chance - together with the previously added duplicate removal - to push the progress bar to 100%. |
Note: I found an issue in the Atlassian community that is describing duplicate pages being returned, as well as pages being missing. One possible solution from Atlassian support is to rebuild the content index. |
This issue is stale because it has been open 20 days with no activity. Remove stale label or comment, or this will be closed in 10 days. |
This issue was closed because it has been stalled for 10 days with no activity. |
I have a space I am trying to migrate, but it seems like WikiTraccs gets stuck with 200 pages left and doesn't finish. I can cancel the job at the console, but is there a way to find out what the problem is and finish?