Mattermost export compatibility #107

Closed
rusq opened this issue Aug 12, 2022 · 26 comments

@rusq
Owner

rusq commented Aug 12, 2022

As per https://t.me/slackdump/185, investigate the format of the Mattermost-compatible export (the one that includes attachments). Tool name: slack-advanced-exporter

@rusq rusq self-assigned this Aug 12, 2022
@rusq rusq added the enhancement New feature or request label Aug 12, 2022
@ChenSun-Phys

ChenSun-Phys commented Aug 12, 2022

Hi, I'm in the middle of testing slack advanced exporter. In particular, I'm trying to supply emails and attachments to the Slackdump export. It seems that the emails can be correctly fetched yet the attachments are not downloaded properly.

Test 1:
./slack-advanced-exporter -i ../official_export.zip -o ./official_attachments.zip fetch-attachments
This gives me an export zip file enhanced with attachments in the __uploads folder.

Test 2:
./slack-advanced-exporter -i ../slackdump_export.zip -o ./slackdump_attachments.zip fetch-attachments
This creates the corresponding file names in the __uploads folder, yet each file is merely an HTML file. The terminal output from slack-advanced-exporter seems to be the same as before: "Downloaded attachment into output archive: F03D3XXXXX." Running it with --api-token doesn't seem to help.

By skimming through the file structure I'm not sure what causes the difference.
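A quick way to confirm the symptom from Test 2 is to scan the output archive for entries under __uploads whose content starts like an HTML page. This is only a sketch: `find_html_uploads` and `looks_like_html` are hypothetical helper names, and a genuine HTML attachment would be flagged too.

```python
import zipfile

HTML_MARKERS = (b"<!doctype html", b"<html")


def looks_like_html(head: bytes) -> bool:
    """Return True if the first bytes of a file look like an HTML page."""
    return head.lstrip().lower().startswith(HTML_MARKERS)


def find_html_uploads(archive, prefix="__uploads/"):
    """List archive members under `prefix` whose content starts like HTML."""
    suspicious = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.startswith(prefix):
                continue
            with zf.open(info) as member:
                # Only the first bytes are needed to spot a saved login page.
                if looks_like_html(member.read(512)):
                    suspicious.append(info.filename)
    return suspicious
```

Calling `find_html_uploads("slackdump_attachments.zip")` returns the suspicious entries; an empty list means nothing under __uploads looks like a saved web page.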

Edit: It seems my test 1 is consistent with the result in #105 but not sure test 2 was done there.

@ChenSun-Phys

ChenSun-Phys commented Aug 12, 2022

Here's a clue. There's some difference when grep-ing the same file ID in the Slack official export and the Slackdump export.

Official export:

./test_official/general/2022-01-21.json:39:                "url_private": "https:\/\/files.slack.com\/files-pri\/T02UXK672P8-F02UXQQJTPH\/math-with-slack.py?t=xoxe-298...<some_more_numbers>",
./test_official/general/2022-01-21.json:40:                "url_private_download": "https:\/\/files.slack.com\/files-pri\/T02UXK672P8-F02UXQQJTPH\/download\/math-with-slack.py?t=xoxe-298...<some_more_numbers>",
./test_official/general/2022-01-21.json:41:                "permalink": "https:\/\/myslackname.slack.com\/files\/U0302UGGXUZ\/F02UXQQJTPH\/math-with-slack.py",
./test_official/general/2022-01-21.json:42:                "permalink_public": "https:\/\/slack-files.com\/T02UXK672P8-F02UXQQJTPH-d8147370cf",
./test_official/general/2022-01-21.json:43:                "edit_link": "https:\/\/myslackname.slack.com\/files\/U0302UGGXUZ\/F02UXQQJTPH\/math-with-slack.py\/edit",

Slackdump export:

./test/general/2022-01-21.json:86:        "url_private": "https://files.slack.com/files-pri/T02UXK672P8-F02UXQQJTPH/math-with-slack.py",
./test/general/2022-01-21.json:87:        "url_private_download": "https://files.slack.com/files-pri/T02UXK672P8-F02UXQQJTPH/download/math-with-slack.py",
./test/general/2022-01-21.json:109:        "permalink": "https://myslackname.slack.com/files/U0302UGGXUZ/F02UXQQJTPH/math-with-slack.py",
./test/general/2022-01-21.json:110:        "permalink_public": "https://slack-files.com/T02UXK672P8-F02UXQQJTPH-d8147370cf",
./test/general/2022-01-21.json:111:        "edit_link": "https://myslackname.slack.com/files/U0302UGGXUZ/F02UXQQJTPH/math-with-slack.py/edit",

Note that "url_private" and "url_private_download" are different as Slackdump doesn't contain what I assume is the token (the ?t=... part.)

It seems that either "url_private" or "url_private_download" is exactly the address slack-advanced-exporter downloads from. This can be seen from their fetch_attachments.go line 139.

To back it up:

  • if I append the url_private_download in Slackdump with ?t= string, I can download it in a fresh browser session (i.e. chrome Incognito mode.)
  • if I use the value of url_private_download Slackdump gives me as is, I can download it in a browser session where I previously have logged into my workspace.
  • if I use the value of url_private_download Slackdump gives me as is and try to download it in a fresh browser session, I'm redirected to the Slack login page. I think this is the page that gets mistakenly saved by slack-advanced-exporter in previous runs. Each of those HTML files is about 144 KB.

I'm not familiar enough with the token system so not sure how to fix it elegantly other than hard coding my token into the slack-advanced-exporter. Perhaps this can serve as a starting point. I'm sure there are better solutions.
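As a starting point, the "append ?t= to the URL" experiment above can be sketched with the standard library. `with_token` is a hypothetical helper, not part of Slackdump or slack-advanced-exporter; it leaves URLs that already carry a `t` parameter untouched.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_token(url: str, token: str) -> str:
    """Return `url` with a `t=<token>` query parameter, unless one is present."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if any(key == "t" for key, _ in query):
        return url  # already tokenised, e.g. a URL from an official export
    query.append(("t", token))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Going through `urlsplit` rather than plain string concatenation also keeps any query parameters the URL may already have.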

@rusq
Owner Author

rusq commented Aug 13, 2022

Hey @ChenSun-Phys thanks for your investigation, it seems you solved it.

When I was running the Slack Export (when implementing export function), I noticed the following:

  1. When an export is requested, Slack generates an xoxe- token.
  2. Slack appends this token to each attachment URL.

I think the first option in this case would be to provide an additional parameter that allows one to specify some token (xoxe, or xoxp etc.), that makes the file available for download externally.

The second option - to generate the export file with __uploads directory, to make it compatible with Mattermost "out of the box".

@ChenSun-Phys

Thanks for the comment @rusq . I realized that the token was generated during the export on Slack.com. This is displayed on the page where the export can be downloaded.

Your Workspace’s Export File Download Tokens
Exported history will include links to private file urls and thumbnails that have a special token attached for access to those files. You can revoke the tokens at a later time (like say, after an import has been completed) for added security of your message history.

For now, I just wrote a short Python script to add the token to the URLs in the Slackdump export. It works well with slack-advanced-exporter.

@luvwinnie

luvwinnie commented Aug 14, 2022

For my Slack channel, slackdump's export data seems to work without problems with slack-advanced-exporter, including emails and attachments.

Would it take long to implement export with an API token? It seems the authentication can take a token argument, but even when I pass a token it still asks for a browser login.

@rusq
Owner Author

rusq commented Aug 14, 2022

I was planning to do it yesterday, but I've been quite busy moving house, so most likely I will look at it closer to the end of next week. @ChenSun-Phys mentioned that he had made a Python program that updates the URLs in the message JSON. If he would consider committing it to the "examples" of this repo, or sharing it some other way, it would be a quick solution for your problem.

@ChenSun-Phys

Here's my quick and dirty fix. I'm attaching my regex script below. I used it to migrate from Slack to Mattermost with success. Feel free to ping me about possible issues if you're in a similar transition process, be it related to this token issue or not.

import os
import re
import shutil
import zipfile

slackdump_path = "your_export.zip"
output_path = "your_export_with_token.zip"
pub_token = "xoxe-<fill_your_token>"

# Unpack the export into a scratch "tmp" directory next to the archive.
parent_path = os.path.dirname(slackdump_path)
work_path = os.path.join(parent_path, "tmp")
shutil.rmtree(work_path, ignore_errors=True)
os.makedirs(work_path)
print(work_path)

with zipfile.ZipFile(slackdump_path, 'r') as zip_ref:
    zip_ref.extractall(work_path)

# Matches a "url_private" or "url_private_download" line, e.g.
#     "url_private": "https://files.slack.com/...",
url_re = re.compile(r'(.*url_private(?:_download)?": )"(https.*)",')

counter = 0

for root, dirs, files in os.walk(work_path):
    for file in files:
        if not file.endswith(".json"):
            continue
        target_path = os.path.join(root, file)
        with open(target_path, 'r', encoding='utf-8') as f:
            lines = f.readlines()

        for linum, line in enumerate(lines):
            m = url_re.search(line)
            if m is None:
                continue
            # Skip URLs that already carry a token.
            if "?t=xo" in m.group(2):
                print('pass and continue')
                continue

            counter += 1
            line_new = m.group(1) + '"' + m.group(2) + "?t=" + pub_token + '",\n'
            print(counter)
            print(target_path, "#", linum, ":")
            print(line)
            print(line_new)
            lines[linum] = line_new

        # write them back
        with open(target_path, 'w', encoding='utf-8') as f:
            f.writelines(lines)

# zip it, storing paths relative to the scratch directory
with zipfile.ZipFile(output_path, 'w') as new_zip:
    for root, dirs, files in os.walk(work_path):
        for file in files:
            full_path = os.path.join(root, file)
            new_zip.write(full_path, arcname=os.path.relpath(full_path, work_path))
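The line-by-line regex depends on how the JSON happens to be pretty-printed (one key per line, trailing comma). A possible alternative, sketched below, is to parse each message file and walk the resulting structure. `add_token` is a hypothetical helper, and the key names are the two discussed in this thread.

```python
import json


def add_token(value, token, keys=("url_private", "url_private_download")):
    """Recursively append ?t=<token> to the given URL keys in parsed JSON."""
    if isinstance(value, dict):
        return {
            k: (v + "?t=" + token
                if k in keys and isinstance(v, str) and "?t=" not in v
                else add_token(v, token, keys))
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [add_token(item, token, keys) for item in value]
    return value  # strings, numbers, booleans and None are left as they are


if __name__ == "__main__":
    sample = json.loads('[{"files": [{"url_private": "https://files.slack.com/a.py"}]}]')
    print(json.dumps(add_token(sample, "xoxe-<fill_your_token>"), indent=2))
```

The function returns a new structure and leaves its input unchanged. Note that writing the result back with `json.dumps` may change whitespace and escaping compared to the original file.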
            

@luvwinnie

luvwinnie commented Aug 14, 2022

Thank you for the script. I would like to take a look at it.

I have made an exporter in Python that uses an API token with full access permission.
Maybe this Go exporter repository can be improved by including the members list in the channels/mpims/groups JSON.

Should I make a PR to put this script in the examples folder?
https://gist.github.com/luvwinnie/e985fe90ece2c2aaf6630310a33bad8d

@ChenSun-Phys

Awesome, the script looks good, although I haven't had time to test it carefully. Adding it through a PR sounds like a good idea.

@ChenSun-Phys

ChenSun-Phys commented Aug 15, 2022

Another thought is that perhaps it's easier to do away with the token requirement for downloading the attachments altogether. This is based on the observation that if I take the "url_private_download" or "url_private" value without the token, I can download it if I have previously logged into the workspace in the same browser.

Since slackdump is based on EZ-Login 3000, it might be easier to make slackdump download the attachments without the token than (1) generating the token by hand and (2) regexing it into all the "url_private" keys then (3) using slack-advanced-exporter. The former sounds like a one-stop solution while the latter three stops.

I've done the latter three-step export many times by now with great success but I'm not happy with it. :-) Besides the added elegance in the one-step solution, it addresses a burning issue that I have in a few of my workspaces: generating the token by hand (i.e. step 1) requires at least admin privilege. Even though I have access to all the attachments in the first place, the slack-advanced-exporter solution will require either asking the group owner/admin to generate an export and send it to me in the first place just to expose the token for attachments or asking them to raise my privilege so I can do that myself.
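For comparison, Slack's file documentation describes fetching url_private with an `Authorization: Bearer <token>` header instead of a `?t=` parameter. The sketch below assumes a token that is allowed to read files; it is not how Slackdump itself downloads attachments, and `build_file_request` and `download_file` are hypothetical names.

```python
import shutil
import urllib.request


def build_file_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request for a Slack private file URL with a bearer token."""
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + token})


def download_file(url: str, token: str, dest: str) -> None:
    """Stream the file at `url` to `dest`; refuse HTML (likely a login page)."""
    with urllib.request.urlopen(build_file_request(url, token)) as resp:
        # An unauthenticated request is answered with the HTML login page,
        # so treat text/html as a failure rather than saving it.
        if resp.headers.get_content_type() == "text/html":
            raise RuntimeError("got an HTML page instead of the file: " + url)
        with open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)
```

The HTML check would also reject a genuine HTML attachment, so it is a guard against the login-page problem described earlier in the thread rather than a general rule.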

@rusq
Owner Author

rusq commented Aug 17, 2022

@ChenSun-Phys I agree, that would be the best approach. I ran slack-advanced-exporter and examined the structure of its output. Tomorrow I'll have some time to start working on this.

@rusq
Owner Author

rusq commented Aug 19, 2022

Hey @ChenSun-Phys, I have implemented Mattermost export format support in the https://github.com/rusq/slackdump/tree/i107-mattermost branch.

I still need to add unit tests before I release it, but if you want to give it a go, I'm attaching slackdump-mattermost.zip. I would be keen to hear your feedback.

Run it like this:

./slackdump -export test.zip -export-type mattermost

and it should generate the compatible export file. I'm considering making it the default export mode, as the "standard" export type is not supported by any third-party software out in the wild.

@rusq
Owner Author

rusq commented Aug 19, 2022

@luvwinnie would this address your problem of having an API token in the export? Were the attachments the only reason for having the API token?

@rusq
Owner Author

rusq commented Aug 20, 2022

I was able to set up a Mattermost instance locally and test it. It seems to work, but it requires the use of mmetl, which is still too many steps.

@luvwinnie

luvwinnie commented Aug 20, 2022

@rusq sorry for the late reply, I'm busy working on other projects. I would like to test it later.

One more thing: maybe you can try the following command to import into Mattermost.
I used it myself.

$ mattermost import slack <mattermost team name> slack_export.zip

@rusq
Owner Author

rusq commented Aug 20, 2022

@luvwinnie no worries at all. Thank you for the suggestion. I tried it too while following the documentation, and it does require the token, so now I do understand the need for this. Interestingly, it couldn't import attachments even with the official export. Maybe that was some shenanigans of my Docker instance, I'll have to mess around with it more.

@ChenSun-Phys

Sorry for being late to the party. A few Q's/comments

  • @rusq did you examine the zip file and verify the attachments are correctly downloaded?
  • After you used mmetl to convert it to mattermost format, and before using mmctl to import, did you create a ./data folder and put the attachments inside it? There's a bug there. See this issue.

@rusq
Owner Author

rusq commented Aug 20, 2022

That's alright :)

  • Yes, and found a bug where I was chucking files in "__upload" dir, instead of "__uploads" - fixed.
  • I performed the following steps:
./mmetl transform slack -t slackdump -d bulk-export-attachments -f test.zip -o mattermost_import.jsonl

# this seemed like a workaround step for the issue that you've linked
mkdir data
mv bulk-export-attachments data
7z a bulk.zip data mattermost_import.jsonl

mmctl import upload bulk.zip
mmctl import process dx9bg37wgty4pb14hywywde3rh_bulk.zip

# then mmctl import job list to ensure that it completes.

I haven't run into the aforementioned issue, and all images were uploaded.
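For anyone without 7z at hand, the packaging step above (mkdir data, mv, 7z a) can be approximated in Python. This is a sketch under the assumption that the archive layout produced by those three commands is what matters: the JSONL file at the root and the attachments under data/<attachments dir>/. `build_bulk_zip` is a hypothetical helper.

```python
import os
import zipfile


def build_bulk_zip(jsonl_path, attachments_dir, output_path):
    """Pack the JSONL file and attachments as <zip>/data/<attachments_dir>/..."""
    base = os.path.basename(os.path.normpath(attachments_dir))
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # The import file sits at the root of the archive.
        zf.write(jsonl_path, arcname=os.path.basename(jsonl_path))
        for root, _dirs, files in os.walk(attachments_dir):
            for name in files:
                full = os.path.join(root, name)
                rel = os.path.relpath(full, attachments_dir)
                # Zip entry names always use forward slashes.
                zf.write(full, arcname="/".join(["data", base, *rel.split(os.sep)]))
```

With the file names used above, this would be called as `build_bulk_zip("mattermost_import.jsonl", "bulk-export-attachments", "bulk.zip")`, without moving the attachments directory first.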

The only issue I had is that my email in the export was the same one I used for my Mattermost account, so the import complained and I had to edit the jsonl file.
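One possible way to make that edit without opening the file by hand is sketched below. It assumes the Mattermost bulk-import layout where each line is a JSON object and user records look like {"type": "user", "user": {"email": ...}}; `replace_user_email` is a hypothetical helper, and whether a changed address is the right fix depends on the migration.

```python
import json


def replace_user_email(lines, old_email, new_email):
    """Yield JSONL lines, swapping `old_email` for `new_email` on user records."""
    for line in lines:
        if not line.strip():
            yield line
            continue
        record = json.loads(line)
        user = record.get("user") if record.get("type") == "user" else None
        if isinstance(user, dict) and user.get("email") == old_email:
            user["email"] = new_email
            line = json.dumps(record) + "\n"
        # Lines that are not the matching user record pass through unchanged.
        yield line
```

Reading `mattermost_import.jsonl` line by line, passing the lines through this generator, and writing the result to a new file leaves every other record byte-for-byte intact.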

@ChenSun-Phys

Thanks for the follow-up! Indeed, I also saw that the imported accounts cannot have the same email address that's already in mattermost.

Just to be sure, do you still need to generate and add the xoxe token to the export by slackdump, or is it automatically added?

@rusq
Owner Author

rusq commented Aug 20, 2022

No problem :)

Just to be sure, do you still need to generate and add the xoxe token to the export by slackdump, or is it automatically added?

No, there's no need for xoxe token, and there's no need to run slack-advanced-exporter on the slackdump generated archive, as slackdump creates the mmetl compatible export: all files are placed in "__uploads" directory, which mmetl understands.
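To double-check an export before handing it to mmetl, one could compare the file IDs referenced in the message JSON with what is actually present under "__uploads". The sketch assumes the layout __uploads/<file id>/<file name> and message files at <channel>/<date>.json, neither of which is spelled out in this thread; `missing_uploads` is a hypothetical helper.

```python
import json
import zipfile


def missing_uploads(archive, uploads_dir="__uploads"):
    """Return IDs of files referenced in messages but absent from uploads_dir."""
    prefix = uploads_dir + "/"
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        # Assumed layout: __uploads/<file id>/<file name>
        present = {n.split("/")[1] for n in names
                   if n.startswith(prefix) and n.count("/") >= 2
                   and not n.endswith("/")}
        referenced = set()
        for name in names:
            # Assumed layout: <channel>/<date>.json holds a list of messages.
            if name.startswith(prefix) or "/" not in name or not name.endswith(".json"):
                continue
            for message in json.loads(zf.read(name)):
                for f in message.get("files") or []:
                    if "id" in f:
                        referenced.add(f["id"])
    return sorted(referenced - present)
```

An empty result means every referenced file ID has at least one file stored under the uploads directory.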

So the high level steps would be:

  1. Run slackdump -export my_slack.zip -export-type mattermost -download
  2. run mmetl transform slack ...
  3. run mmctl import upload ... and mmctl import process ...

There is also a highly experimental branch i107-mattermost-mmetl, it reduces the steps to

  1. Run slackdump -export my_mm_export.zip -export-type mattermost -team-name test_team -download
  2. run mmctl import upload ... and mmctl import process ...

The problem is that there's no licence in the mmetl repository, so it is not clear whether it can be imported into other packages. I opened an issue about the licence type, let's see what they reply: "What is the licence type?" (mattermost/mmetl#20)

The drawback for this is that it uses the packages from mattermost repo and it pulls LOTS of dependencies. The executable size increases dramatically, if compiled with mmetl version 0.0.1 it becomes 36MB and with the master branch - 57MB. I'm undecided on whether I want to proceed down this path.

@rusq
Owner Author

rusq commented Aug 21, 2022

Binaries available https://github.com/rusq/slackdump/releases/tag/v2.1.2

It's still alpha, as there are no unit tests and it has only been tested manually. I'm releasing it for the sake of speed.

I have also updated the doc, adding a quick guide for mattermost migration

@rusq
Owner Author

rusq commented Aug 21, 2022

@luvwinnie I've been experimenting with mattermost import command, and here are my discoveries:

mattermost import seems to understand the __uploads directory, however, there are problems:

  1. larger files are not being imported
  2. small files (i.e. 16 bytes) seem to import fine, and are present in data/.../*.* directory, but when trying to access them through the web UI, it reports 404 on screen and in the log file

The same file imported with mmetl/mmctl doesn't give any attachment problems. It seems that mattermost import slack has some internal problems.

@ChenSun-Phys

I haven't tried mattermost import slack yet, but I recall that the Mattermost documentation recommends mmctl over it. What's the reasoning for using this instead of mmctl?

@rusq
Owner Author

rusq commented Aug 23, 2022

Out of interest, as doc mentions it and @luvwinnie pointed to it as well. It seemed like one step of running mattermost import would be easier than doing the mmetl/mmctl dance, but the inability to import attachments properly is a showstopper really.

@rusq
Owner Author

rusq commented Aug 28, 2022

@ChenSun-Phys I'll close this as this has been merged, let me know if there are issues, and we can revisit this.

@rusq rusq closed this as completed Aug 28, 2022
@ChenSun-Phys

ChenSun-Phys commented Aug 31, 2022

Sorry for the slow response. I just tested this in v2.2.0-alpha. It works like a charm. Thanks a lot!
