
Add support for HWP/HWPX document formats #460

Merged
merged 7 commits into from
Aug 1, 2023


OctopusET
Contributor

OctopusET commented Jun 26, 2023

HWP

HWP is the most popular document format in South Korea, despite the controversy over its closed nature. Almost all South Korean government documents are written in HWP/HWPX format.

Test files

Sample HWP files are available here (in the attachments table):
https://bugs.documentfoundation.org/show_bug.cgi?id=144747

H2Orestart

https://github.com/ebandal/H2Orestart/

This extension helps to convert the hwp/hwpx document files to other formats supported by LibreOffice.

The good news is that it was recently open-sourced (hence the 'restart' in its name).

It's a Java-based LibreOffice plugin, so it needs a JRE. Fortunately, the Dangerzone container image already contains the Java 8 JRE, so no additional package needs to be included in the image.

Problems with this PR

Errors (Update: Fixed! #460 (comment))

  • PDF conversion works.
  • It fails as corrupted when pdfinfo runs, and at pdftoppm when pdfinfo is bypassed.
  • When I run pdfinfo/pdftoppm manually, there's no error.
  • Need some more sample files for testing.

MIME types

HWP and HWPX use custom MIME types that are not registered with IANA. Moreover, each format has multiple MIME types in use, so they all need to be added. Some recommend application/vnd.hancom.*, but wildcards may not be supported in this code base and could lead to security problems.

  • hwp
    application/x-hwp, application/haansofthwp, application/vnd.hancom.hwp
  • hwpx
    application/haansofthwpx, application/vnd.hancom.hwpx
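To illustrate the exact-match alternative to a wildcard, here is a minimal Python sketch. The constant and function names are hypothetical (this is not Dangerzone's actual code): it lists the MIME types above and rejects anything not on the list, including unknown `application/vnd.hancom.*` subtypes.

```python
from typing import Optional

# Hypothetical allowlists built from the MIME types listed above.
HWP_MIME_TYPES = frozenset({
    "application/x-hwp",
    "application/haansofthwp",
    "application/vnd.hancom.hwp",
})
HWPX_MIME_TYPES = frozenset({
    "application/haansofthwpx",
    "application/vnd.hancom.hwpx",
})


def classify_hancom_mime(mime_type: str) -> Optional[str]:
    """Return "hwp", "hwpx", or None, using exact matching only (no wildcards)."""
    # MIME types are case-insensitive, so normalize before comparing.
    normalized = mime_type.strip().lower()
    if normalized in HWP_MIME_TYPES:
        return "hwp"
    if normalized in HWPX_MIME_TYPES:
        return "hwpx"
    # Unknown subtypes such as "application/vnd.hancom.something" are rejected.
    return None
```

With an exact allowlist, a new Hancom subtype stays unsupported until someone adds it deliberately, which is the safer failure mode for a document sanitizer.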

Reference (in Korean)

CJK Fonts

Update: Related issue: #468

I tried to convert some documents and then all the Korean characters were rendered as 'tofu'. I installed font-noto-cjk and it's all gone. I think it would be better to install font-noto-cjk-extra as well, just in case. https://pkgs.alpinelinux.org/package/edge/community/x86/font-noto-cjk-extra

Test needed on other system

I haven't tested this on macOS or Windows.

I think this support would be very helpful not only for journalists but also for the many people who use HWP formats.

Additional information

The following information is provided to better understand these formats.

HWPX

It's one of the South Korean national standards, KS X 6101 (Archive).

Its actual name is OWPML (Open Word-Processor Markup Language); HWPX is Hancom's branding for it.
HWPX is a fairly new format that is increasingly being adopted. Hancom has changed its default document format to .hwpx.

References (in Korean):

Security attack increase

Security attacks using HWP/HWPX formats have been around for a long time. They continue to grow and become more complex.

News:

Actually...

LibreOffice does support the HWP format, but only a very old version, HWP 3.0 (released in 1997).

Other References

Fixes #468

@OctopusET
Contributor Author

I think the commits "Add h2orestart on Dockerfile" and "Add hwp hwpx support" can just be squashed.

@OctopusET
Contributor Author

OctopusET commented Jun 26, 2023

Closes #243

Contributor

apyrgio left a comment


Thank you very much for this contribution @OctopusET! You've done lots of research which makes the problem area much clearer.

I have commented on the code, but I also have some general comments:

  1. From the links you sent in the PR, it seems that the .hwp{x} files may not be initially present in the system. The real problem is .lnk files, which execute arbitrary code and may open benign .hwp{x} files as a decoy. In this PR, we will protect users against directly opening a malicious .hwp{x} document, correct? Did you perhaps have a suggestion for tackling .lnk files (aside from "please, don't open them" :-))?
  2. Is there any LibreOffice hardening that people employ when opening an .hwp{x} file? Some sort of extension sandboxing, for instance?

Also, thanks a lot for providing sample files. I'll test your implementation soon and try to trace the source of errors.

@deeplow
Contributor

deeplow commented Jun 27, 2023

Thanks as well for the contribution, this seems like it could be impactful indeed.

However, I was unable to use this extension to convert the demo file you provided. It failed at the pdfinfo step like yours did, but in my case I think the conversion in the previous step never finished, so pdfinfo failed because it was running on a non-existent file.

$ ./install/linux/build-image.sh
$ export INPUT_FILE=file.hwp
$ podman run --network none -u dangerzone --security-opt no-new-privileges --userns keep-id --cap-drop all -v $INPUT_FILE:/tmp/input_file --rm dangerzone.rocks/dangerzone libreoffice --headless --safe-mode --convert-to pdf --outdir /tmp /tmp/input_file && ls /tmp/input*
Error: source file could not be loaded

@OctopusET if you run the same command, does it work for you?

@OctopusET
Contributor Author

OctopusET commented Jun 27, 2023

$ ./install/linux/build-image.sh
$ export INPUT_FILE=file.hwp
$ podman run --network none -u dangerzone --security-opt no-new-privileges --userns keep-id --cap-drop all -v $INPUT_FILE:/tmp/input_file --rm dangerzone.rocks/dangerzone libreoffice --headless --safe-mode --convert-to pdf --outdir /tmp /tmp/input_file && ls /tmp/input*
Error: source file could not be loaded

This command doesn't work for me either; even a .pdf file fails. @deeplow

@deeplow
Contributor

deeplow commented Jun 27, 2023

This command doesn't work for me either; even a .pdf file fails. @deeplow

Strange. At least our results are consistent. But if I'm not mistaken, that's the command Dangerzone runs under the hood to convert documents.

To make this work, I'd say we first need to be able to call that command successfully. Later, I'll try manually installing LibreOffice and that extension in a disposable VM to see if I can export the file as PDF through the graphical version of LibreOffice.

@deeplow
Contributor

deeplow commented Jun 27, 2023

My bad. There were several issues with that command. One was a typo that caused it to create a directory at /tmp/input_file instead of mounting the file.
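The exact typo isn't spelled out here. One plausible cause (an assumption, based on how Podman and Docker interpret `-v` sources) is that a bare relative name like `file.hwp` is treated as a named volume rather than a host path, so an empty volume gets mounted as a directory at /tmp/input_file. A minimal Python sketch that builds the same test command with an absolute host path; the function name is hypothetical:

```python
import os
import shlex
from typing import List


def podman_convert_cmd(input_file: str, image: str = "dangerzone.rocks/dangerzone") -> List[str]:
    """Build the test command from this thread with an absolute bind-mount source."""
    # An absolute path makes Podman bind-mount the host file instead of
    # (potentially) treating a bare name as a named volume.
    host_path = os.path.abspath(input_file)
    return [
        "podman", "run",
        "--network", "none",
        "-u", "dangerzone",
        "--security-opt", "no-new-privileges",
        "--userns", "keep-id",
        "--cap-drop", "all",
        "-v", f"{host_path}:/tmp/input_file",
        "--rm", image,
        "libreoffice", "--headless", "--safe-mode",
        "--convert-to", "pdf", "--outdir", "/tmp", "/tmp/input_file",
    ]


print(shlex.join(podman_convert_cmd("file.hwp")))
```

Resolving the path up front also avoids surprises when the command is run from a different working directory.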

@OctopusET
Contributor Author

OctopusET commented Jun 27, 2023

Oh, I missed that too. Thank you! @deeplow

This is the command I tried, and the conversion works great:

mkdir ./tmp
podman run --network none -u dangerzone --security-opt no-new-privileges --userns keep-id --cap-drop all -v ./tmp:/tmp --rm dangerzone.rocks/dangerzone libreoffice --headless --safe-mode --convert-to pdf --outdir /tmp /tmp/$INPUT_FILE && ls ./tmp

apyrgio added this to the 0.4.2 milestone Jun 28, 2023
apyrgio added the enhancement (New feature or request) and stretch goal labels Jun 28, 2023
@deeplow
Contributor

deeplow commented Jun 28, 2023

  • PDF conversion works.
  • It fails as corrupted when pdfinfo runs, and at pdftoppm when pdfinfo is bypassed.
  • When I run pdfinfo/pdftoppm manually, there's no error.
  • Need some more sample files for testing.

I've figured it out 🥳. Basically, the extension appears to rely on the file extension and not the MIME type. In our case, the input file is /tmp/input_file (which has no extension), so it was erroring out with:

/tmp $ libreoffice --headless --safe-mode --convert-to pdf --outdir /tmp /tmp/input_file
[06-28 16:18] (OrestartImpl.rese) INFO: Resetting Page info.
[06-28 16:18] (OrestartImpl.rese) INFO: Resetting Numbering info.
[06-28 16:18] (OrestartImpl.rese) INFO: Resetting Paragraph info.
[06-28 16:18] (OrestartImpl.rese) INFO: Resetting Equasion info.
[06-28 16:18] (OrestartImpl.rese) INFO: Resetting Graphics info.
[06-28 16:18] (OrestartImpl.rese) INFO: Resetting Table info.
[06-28 16:18] (OrestartImpl.rese) INFO: Resetting Footnote info.
[06-28 16:18] (OrestartImpl.rese) INFO: Cleaning temporary folder.
Error: source file could not be loaded

Note: the command right after libreoffice (pdfinfo) is the one that appears to fail, but in reality it's libreoffice that failed. It just exits with code 0 even though there is an error. No idea why 🤷.
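Since the exit code can't be trusted here, one defensive option is to also check that the expected PDF actually exists after the conversion. A minimal sketch; `convert_and_verify` is a hypothetical helper, not part of Dangerzone:

```python
import os
import subprocess
from typing import List


def convert_and_verify(args: List[str], expected_output: str) -> str:
    """Run a conversion command, then confirm the expected output file exists.

    The exit code alone is not trusted: libreoffice was observed exiting
    with 0 even after printing "Error: source file could not be loaded".
    """
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"conversion exited with code {result.returncode}")
    if not os.path.isfile(expected_output) or os.path.getsize(expected_output) == 0:
        raise RuntimeError(
            f"conversion exited with 0 but {expected_output} is missing or empty:\n"
            f"{result.stdout}{result.stderr}"
        )
    return expected_output
```

For example, it could be called with the `libreoffice --headless --convert-to pdf --outdir /tmp /tmp/input_file` arguments and `/tmp/input_file.pdf` as the expected output, since LibreOffice names the PDF after the input's base name (as in the `/tmp/file.hwp -> /tmp/file.pdf` log line).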

To fix this, all we have to do is rename the file to file.hwp:

/tmp $ mv input_file file.hwp
/tmp $ libreoffice --headless --safe-mode --convert-to pdf --outdir /tmp /tmp/file.hwp 
[06-28 16:19] (OrestartImpl.rese) INFO: Resetting Page info.
[06-28 16:19] (OrestartImpl.rese) INFO: Resetting Numbering info.
[06-28 16:19] (OrestartImpl.rese) INFO: Resetting Paragraph info.
[06-28 16:19] (OrestartImpl.rese) INFO: Resetting Equasion info.
[06-28 16:19] (OrestartImpl.rese) INFO: Resetting Graphics info.
[06-28 16:19] (OrestartImpl.rese) INFO: Resetting Table info.
[06-28 16:19] (OrestartImpl.rese) INFO: Resetting Footnote info.
[06-28 16:19] (OrestartImpl.rese) INFO: Cleaning temporary folder.
[06-28 16:19] (OrestartImpl.rese) INFO: Resetting Page info.
[06-28 16:19] (OrestartImpl.rese) INFO: Resetting Numbering info.
[06-28 16:19] (OrestartImpl.rese) INFO: Resetting Paragraph info.
[06-28 16:19] (OrestartImpl.rese) INFO: Resetting Equasion info.
[06-28 16:19] (OrestartImpl.rese) INFO: Resetting Graphics info.
[06-28 16:19] (OrestartImpl.rese) INFO: Resetting Table info.
[06-28 16:19] (OrestartImpl.rese) INFO: Resetting Footnote info.
[06-28 16:19] (OrestartImpl.rese) INFO: Cleaning temporary folder.
4 paras
convert /tmp/file.hwp -> /tmp/file.pdf using filter : writer_pdf_Export

Applying this patch to the code makes the full conversion work, but it's a temporary hack. We have to think of a more permanent solution, potentially opening an issue upstream to use MIME types instead.

--- a/dangerzone/conversion/doc_to_pixels.py
+++ b/dangerzone/conversion/doc_to_pixels.py
@@ -142,6 +142,7 @@ class DocumentToPixels(DangerzoneConverter):
             pdf_filename = "/tmp/input_file"
         elif conversion["type"] == "libreoffice":
             self.update_progress("Converting to PDF using LibreOffice")
+            shutil.copy('/tmp/input_file', '/tmp/input_file.hwp')
             args = [
                 "libreoffice",
                 "--headless",
@@ -150,8 +151,9 @@ class DocumentToPixels(DangerzoneConverter):
                 "pdf",
                 "--outdir",
                 "/tmp",
-                "/tmp/input_file",
+                "/tmp/input_file.hwp",
             ]
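A slightly more general variant of the same hack, as a sketch: derive the extension from the detected MIME type instead of hard-coding `.hwp`, so that HWPX input isn't mislabeled. The mapping and helper name are hypothetical, and whether H2Orestart needs `.hwpx` specifically for HWPX files is an assumption:

```python
import shutil

# Hypothetical mapping; the MIME strings are the ones discussed in this PR.
EXTENSION_FOR_MIME = {
    "application/x-hwp": ".hwp",
    "application/x-hwp+zip": ".hwpx",
}


def copy_with_extension(input_path: str, mime_type: str) -> str:
    """Copy input_path to input_path + <extension> when the type needs one.

    Returns the original path unchanged when no extension is required.
    """
    ext = EXTENSION_FOR_MIME.get(mime_type)
    if ext is None:
        return input_path
    target = input_path + ext
    shutil.copy(input_path, target)
    return target
```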

@deeplow
Contributor

deeplow commented Jun 28, 2023

Since this is a security-focused project, this requires some adversarial skepticism. One scenario I'm considering is a malicious .oxt extension. As I understand it, once LibreOffice is loaded with the extension installed, the extension can have full code execution in the container. So this is yet another developer we need to trust, and that doesn't scale well.

I have now implemented a proof of concept for dynamically loading LibreOffice extensions. This reduces the damage that a potentially malicious .oxt file can do. The proof-of-concept code is available here (it includes patches that should make it work out of the box).
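The core idea can be sketched in a few lines: decide per file type whether a third-party extension is needed, and only then install it. This sketch assumes LibreOffice's `unopkg add` command as the install mechanism and a hypothetical `.oxt` path; the proof of concept may do it differently:

```python
from typing import List, Optional

# Hypothetical mapping from detected MIME type to the extension it requires.
# The /libreoffice_ext/ path is an illustrative assumption.
REQUIRED_EXTENSION = {
    "application/x-hwp": "/libreoffice_ext/h2orestart.oxt",
    "application/x-hwp+zip": "/libreoffice_ext/h2orestart.oxt",
}


def extension_install_cmd(mime_type: str) -> Optional[List[str]]:
    """Return an `unopkg add` command for the extension this file type needs.

    Returns None when plain LibreOffice suffices, so no third-party
    extension code is loaded for that conversion.
    """
    oxt_path = REQUIRED_EXTENSION.get(mime_type)
    if oxt_path is None:
        return None
    return ["unopkg", "add", oxt_path]
```

Only when the returned command is not None would it be run (for example with `subprocess.run(cmd, check=True)`) before invoking `libreoffice`, so PDF or DOCX conversions never load H2Orestart at all.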

@OctopusET
Contributor Author

Great work! Thank you so much. I think your work should also be included in this PR.

I'm writing an answer to @apyrgio's first comment and looking into some cases. I will comment soon.

Before that, @deeplow, what do you think about the binary header? I'm not sure about HWPX, but HWP has a unique format header. Should it also be used to identify HWP files?
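To make the question concrete, here is a sketch of header sniffing. The signatures are assumptions drawn from public format descriptions: HWP 3.x files begin with the text `HWP Document File`, HWP 5.x uses the OLE2 Compound File container, and HWPX is a ZIP archive. The OLE2 and ZIP magics are shared with .doc and .docx, so a header check alone only narrows down the container:

```python
# Assumed signatures; verify against the HWP format documentation before relying on them.
HWP3_SIGNATURE = b"HWP Document File"           # start of an HWP 3.x file
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary (HWP 5.x, also .doc/.xls)
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP local file header (HWPX, also .docx/.odt)


def sniff_container(header: bytes) -> str:
    """Classify the first bytes of a file by container type only."""
    if header.startswith(HWP3_SIGNATURE):
        return "hwp3"
    if header.startswith(OLE2_MAGIC):
        return "ole2"  # may be HWP 5.x; inner streams must be inspected to confirm
    if header.startswith(ZIP_MAGIC):
        return "zip"   # may be HWPX; the archive contents must be inspected to confirm
    return "unknown"
```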

I will start writing a PR for H2Orestart soon.

@OctopusET
Contributor Author

I just found out that ClamAV also supports HWP/HWPX (including very old versions).

https://blog.clamav.net/2016/03/clamav-0991-hangul-word-processor-hwp.html

This doesn't seem to rely on Hancom's published format documentation (not sure about its license).

It seems to be Cisco's own reverse-engineering work.

I guess this could also be used for better file detection.

https://github.com/Cisco-Talos/clamav/blob/b778a6b12e5592d57ba9f0f11e85b4c5a281540b/libclamav/hwp.h#L42

@deeplow
Contributor

deeplow commented Jun 29, 2023

I will start writing a PR for H2Orestart soon.

Nice! I have opened an issue upstream ebandal/H2Orestart#7

Before that, @deeplow, what do you think about the binary header? I'm not sure about HWPX, but HWP has a unique format header. Should it also be used to identify HWP files?

That's what MIME detection is for 🙂. The MIME type should be detected by the MIME library (at least in my case it was). If for some reason it isn't, that's an upstream bug in either https://github.com/file/file or https://github.com/python/cpython/blob/3.11/Lib/mimetypes.py. These are the two MIME type libraries that we use:

https://github.com/freedomofpress/dangerzone/blob/cfdaec2/dangerzone/conversion/doc_to_pixels.py#L103-L109
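For the `mimetypes` side, the stdlib lets you register extra types, but it only looks at file names, never contents. A sketch using the MIME types that, according to this thread, the `file` command uses for HWP/HWPX; registering them this way is an illustration, not what Dangerzone does:

```python
import mimetypes

# Map the extensions to the types that `file` reports (per this discussion).
mimetypes.add_type("application/x-hwp", ".hwp")
mimetypes.add_type("application/x-hwp+zip", ".hwpx")

# Name-based lookups now work...
print(mimetypes.guess_type("report.hwp"))  # ('application/x-hwp', None)
# ...but a file without an extension, like /tmp/input_file, can't be identified.
print(mimetypes.guess_type("/tmp/input_file"))  # (None, None)
```

This is why content-based detection via libmagic has to do the real work for a file copied to /tmp/input_file.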

From your comment here you seem to imply that there are issues with detecting the file types, but I'm skeptical about adding any extra dependencies or custom code just for this specific bit. I think solutions for better file detection have to be made upstream in the MIME type detection libraries, as I stated above.

@deeplow
Contributor

deeplow commented Jun 29, 2023

Great work! Thank you so much. I think your work should also be included in this PR.

OK. If you don't mind, I'll push it here.

@OctopusET
Contributor Author

OK. If you don't mind, I'll push it here.

Great, go ahead.

@deeplow
Contributor

deeplow commented Jun 29, 2023

CJK Fonts

I tried to convert some documents and then all the Korean characters were rendered as 'tofu'. I installed font-noto-cjk and it's all gone. I think it would be better to install font-noto-cjk-extra as well, just in case. https://pkgs.alpinelinux.org/package/edge/community/x86/font-noto-cjk-extra

The other (big) concern is the size of this dependency. Since we ship the container image, adding an 80MB package increases the download by that much for everyone who uses Dangerzone. @apyrgio may have an idea of how to solve this.

@OctopusET
Contributor Author

Nice! I have opened an issue upstream ebandal/H2Orestart#7

Yeah I checked that too, thank you.

From your comment #460 (comment) you seem to imply that there are issues with detecting the file types, but I'm skeptical about adding any extra dependencies or custom code just for this specific bit. I think solutions for better file detection have to be made upstream in the MIME type detection libraries, as I stated above.

Cool, I agree. Still, I want to mention that the HWP/HWPX formats are quite messy: they use some 'official' MIME types, even though these are not standard MIME types.

I don't think this will help much, but here's a Python library for HWP. It also supports PDF conversion, but it's unstable and hasn't been maintained for a long time. So I think sticking with H2Orestart would be better.
https://github.com/mete0r/pyhwp

@OctopusET
Contributor Author

OctopusET commented Jun 29, 2023

@deeplow Hmm, I think the font support is necessary.
I made some test files in LibreOffice filled with Chinese, Japanese, and Korean characters.

When I test them with the latest main branch of Dangerzone, all I get is tofu,
except for the PDF -> safe PDF conversion.

test.odt
test.docx
test.pdf

Have there been any issues from users of non-Latin scripts before?
And what about non-CJK fonts, like Thai?
If it's an existing problem, it would be better to handle it in a separate PR.

I think the best solution would be to include only the Regular weight of the font, or to add a download option; if that's possible, I think the OCR data could also be downloaded that way.

deeplow added a commit to deeplow/dangerzone that referenced this pull request Jun 29, 2023
In the case of 'hwp' files (Hancom Office), they are read in LibreOffice through
the H2Orestart extension. However, the extension doesn't guess the file type based
on the file's contents; because of that, the type has to be inferred from the file
extension [1]. An upstream bug has been reported.

[1]: freedomofpress#460 (comment)
@deeplow
Contributor

deeplow commented Jun 29, 2023

Apparently I can't push to this branch. @OctopusET is there an option that you can check for allowing contributions from maintainers on the branch? It should have shown up when you created the PR, I think.

Either that, or merge this branch onto yours: https://github.com/deeplow/dangerzone/tree/hwp-support-dynamic-extension

@deeplow
Contributor

deeplow commented Jun 29, 2023

@deeplow Hmm, I think the font support is necessary.

Have there been any issues from users of non-Latin scripts before?
And what about non-CJK fonts, like Thai?

We haven't heard of any issues, but that could be because we may not have many users in the region due to the lack of support. So I'm interested in tackling this issue. But I want to go about it in a way that scales with adding support for more languages without significantly increasing the container size. I'll have to read up some more on font support.

But I'd say it's its own issue.

Note that I'm out of time this week, so I'll be back on Monday to continue this conversation.

@OctopusET
Contributor Author

Apparently I can't push to this branch. @OctopusET is there an option that you can check for allowing contributions from maintainers on the branch? It should have shown up when you created the PR, I think.

Hmm, I did enable it. I think GitHub thinks you don't have write permission for this repo.
I will merge it manually.

Note that I'm out of time this week, so I'll be back on Monday to continue this conversation.

Okay, I'll continue investigating and answer the first comment this weekend.

@deeplow
Contributor

deeplow commented Jul 3, 2023

I just pushed a lint fix, but overall I'd say the only thing left is changing it to also work on Qubes. I can work on this over the next few days. Anything else you think is missing, @OctopusET?

UPDATE: and also some tests

@deeplow
Contributor

deeplow commented Jul 3, 2023

Regarding the font support, I talked to @apyrgio and we agree that for CJK we'll bite the bullet and accept the image size increase. We created an issue (#465) to explore the size costs of additional language support and how to go about it.

@OctopusET
Contributor Author

And I need to clean up the MIME types. I'm working on it right now.

@deeplow
Contributor

deeplow commented Jul 27, 2023

I think the commit messages could be better. Do you think they should be edited?

If you're fine with doing it, sure. If not, we can polish them prior to merging.

My bad. Sorry about that. I misread something in your code and thought it wouldn't work (I read magic.read_from_file as mime.read_from_file).

Oh, no problem. Then ideally it would work after the new file release, right?

Ideally, yes, but I don't fully understand how those libraries are chained. If it does, then good :) If not, at least we have something that can detect that edge case.

As a matter of fact, this is not the first time that an office file is simply detected as application/zip. We discussed that in issue #369.

@deeplow
Contributor

deeplow commented Jul 27, 2023

@OctopusET for the MIME types that you just commented out, can you please add a comment there linking to this discussion? That way, it will be immediately apparent in the future why they are commented out.

@OctopusET
Contributor Author

That's absolutely right. I'll add it.
Sorry for continuing to push to the PR. I think it's better to let you see the changes.

@deeplow
Contributor

deeplow commented Jul 27, 2023

Sorry for continuing to push to the PR. I think it's better to let you see the changes.

No worries, feel free to edit it however is most comfortable for you. We can always polish it on our end after you're done with all the changes.

@deeplow
Contributor

deeplow commented Jul 27, 2023

@OctopusET we just decided to push the release candidate back one week, so we'll be doing the feature freeze on Monday instead. So you still have some time to work on this if you want it included in the next release.

Sorry for the change of plans.

@OctopusET
Contributor Author

@deeplow No problem, I'm glad to have more time to work on some improvements for commit messages.

I don't think any more feature updates are needed, but I'll look for more test cases, if there are any.

I will soon post the answer to the first comment that I've been writing.
Sorry again for the delay.

And I'll tell you when I think it's done.

Thank you for all your great work!

@OctopusET
Contributor Author

Upstream, the file command changed the MIME type of HWPX from application/hwp+zip to application/x-hwp+zip. file/file@ceef7ea

I will update that too. Still, I will leave application/hwp+zip in (or at least in a comment).

There might be a chance that a file will still be detected as application/hwp+zip, because the HWPX format contains a file called mimetype, which is what caused files to be detected as application/hwp+zip. (A detailed description will be included in my upcoming answer to the first comment.)

I'm glad it changed before the new version of the file command came out.

Related issue: https://bugs.astron.com/view.php?id=467
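The `mimetype` entry mentioned above can be inspected directly. A minimal stdlib sketch that reads only that entry from a ZIP archive; the helper name is hypothetical, and this is not how `file` or Dangerzone detect HWPX:

```python
import io
import zipfile
from typing import Optional


def read_zip_mimetype(data: bytes, max_size: int = 256) -> Optional[str]:
    """Return the text of the archive's 'mimetype' entry, or None.

    Only that single small entry is read; nothing else is extracted.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            try:
                info = zf.getinfo("mimetype")
            except KeyError:
                return None
            # A MIME string is short; skip entries whose declared size is large.
            if info.file_size > max_size:
                return None
            return zf.read(info).decode("ascii", errors="replace").strip()
    except zipfile.BadZipFile:
        return None
```

For an HWPX file this would be expected to return `application/hwp+zip`, the string that made libmagic report that type.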

@OctopusET
Contributor Author

Known MIME types of HWP/HWPX

HWP

  • application/x-hwp
  • application/haansofthwp
  • application/vnd.hancom.hwp

HWPX

  • application/haansofthwpx
  • application/vnd.hancom.hwpx
  • application/hwp+zip (It's in the mimetype file, inside the .hwpx format)
  • application/x-hwp+zip (added at file/file@ceef7ea)

file

MIME types that the `file` command actually uses upstream (file/file@1fc9175)

HWP

  • application/x-hwp

HWPX

  • application/x-hwp+zip

But I also added this one, because a file may be detected as this type due to the 'mimetype' file:

  • application/hwp+zip

@deeplow
Contributor

deeplow commented Jul 31, 2023

Upstream, the file command changed the MIME type of HWPX from application/hwp+zip to application/x-hwp+zip. file/file@ceef7ea

Nice to see this. Thanks for keeping an eye on it.

@OctopusET what's the current status of this PR. Is it ready for final review and merge?

@OctopusET
Contributor Author

Yes! I think it's ready to be reviewed.

OctopusET and others added 7 commits August 1, 2023 14:07
H2ORestart is a LibreOffice extension which adds Hancom HWP/HWPX (Hangul Word Processor)
support to LibreOffice. This format is widely used in South Korea.

Version: v0.5.7
Extension Repository: https://github.com/ebandal/H2Orestart/releases

hwp/hwpx have several custom MIME types

.hwp:
 - application/x-hwp
 - application/haansofthwp
 - application/vnd.hancom.hwp

.hwpx:
 - application/haansofthwpx
 - application/vnd.hancom.hwpx
 - application/hwp+zip

Fixes #243

Only load the LibreOffice extension for opening hwp/hwpx when it is
actually needed. Adding an extension to LibreOffice may allow it to
run arbitrary code. This makes trust more scalable by trusting
LibreOffice extensions only for the file types which they target.

Reasoning
---------

Assuming a malicious `.oxt` extension, this means that the extension has
arbitrary code execution in the container. While this is not an
existential threat in itself, we should not expose every Dangerzone user
to it. This is achieved by dynamically loading the extension at runtime
only when needed.

This ensures that a compromised extension can, in its least malicious
form, modify the visual content of Hancom Office files, but not of
*every file*. In its more malicious form, if the code execution
manages to escape the container, it will only affect users who have
converted a Hancom Office file.

The HWPX MIME type is recognized as 'application/zip' by the current version of the file command (file-5.44).
It will be recognized as 'application/hwp+zip' once a new version of file is released.

As a temporary fix, when the MIME type of a file is 'application/zip',
check the file type again (without the MIME option),
and then check whether it's 'Zip data (MIME type "application/hwp+zip"?)' or not.
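The temporary fix described in this commit can be sketched as a pure function over the two outputs of `file` (the MIME type result and the plain description). The helper name is hypothetical, and the description string is the one quoted in the commit message:

```python
def refine_zip_mime(mime_type: str, description: str) -> str:
    """Refine a generic application/zip result using the plain `file` description.

    Only the exact hint quoted in the commit message is trusted; any other
    result is returned unchanged.
    """
    if mime_type == "application/zip" and 'MIME type "application/hwp+zip"' in description:
        return "application/hwp+zip"
    return mime_type
```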

Use the MIME types actually used by the `file` command, which was
recently changed for the detection of the HWPX format [1].

application/hwp+zip -> application/x-hwp+zip

But the HWPX format includes a 'mimetype' file, which contains the
MIME type string "application/hwp+zip", so that type was kept, since
a file may still be detected as "application/hwp+zip".

[1]: file/file@ceef7ea

Add extra files and base64-encode externally contributed docs. This
prevents the accidental opening of such documents, since they couldn't
be rebuilt by the Dangerzone developers to ensure their safety.
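A sketch of the base64 round trip for test fixtures, assuming a hypothetical `.b64` suffix convention and helper names; tests would decode a document into a temporary directory only when they need it:

```python
import base64
from pathlib import Path


def encode_fixture(src: Path, dst_dir: Path) -> Path:
    """Write src as base64 text to dst_dir/<name>.b64 so it can't be opened by accident."""
    dst = dst_dir / (src.name + ".b64")
    dst.write_text(base64.b64encode(src.read_bytes()).decode("ascii"))
    return dst


def decode_fixture(b64_path: Path, out_dir: Path) -> Path:
    """Recreate the original document in out_dir (e.g. a temporary test directory)."""
    if b64_path.suffix != ".b64":
        raise ValueError(f"expected a .b64 file, got {b64_path.name}")
    out = out_dir / b64_path.with_suffix("").name
    out.write_bytes(base64.b64decode(b64_path.read_text()))
    return out
```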
@deeplow
Contributor

deeplow commented Aug 1, 2023

Just rebased, polished some commit messages and reordered some commits (squashing some other unneeded ones).

Contributor

apyrgio left a comment


Looks good to me as well. Thanks a lot @OctopusET for this contribution :-).

@OctopusET
Contributor Author

I wrote this around the end of July 2023 and have only changed parts of it, so there may be some errors. But I'm posting it in case it's helpful to someone, and because I said I would answer the first comment.

This is my answer to the first comment from @apyrgio #460 (review)

Hello, @apyrgio thank you again for the comment.

  1. From the links you sent in the PR, it seems that the .hwp{x} files may not be initially present in the system. The real problem is .lnk files, which execute arbitrary code and may open benign .hwp{x} files as a decoy. In this PR, we will protect users against directly opening a malicious .hwp{x} document, correct? Did you perhaps have a suggestion for tackling .lnk files (aside from "please, don't open them" :-))?

Honestly, I'm not sure.

But since there are some security improvements around the extension, I think it's better now, though it should still be verified.

FYI, I didn't mention this article before because it's in Korean, but recently there was an attack using HWP on macOS targeting an activist for North Korean human rights in South Korea (https://www.genians.co.kr/blog/threat_intelligence_report_macos), using a method similar to the one you mentioned.

You might be able to translate the PDF file into English. If you can't, or you don't want to use a proprietary PDF translator, I can translate it for you.

  2. Is there any LibreOffice hardening that people employ when opening an .hwp{x} file? Some sort of extension sandboxing, for instance?

Unfortunately, AFAIK, there is no established hardening solution. This extension is very new and not yet popular. That's partly because LibreOffice is not that popular in South Korea; people just use Hancom Office (or a pirated version) or other web solutions.

There is also almost no discussion about hardening against HWP documents, since (almost) all HWP viewers and editors are proprietary software, and security gets little attention because the format is used only in South Korea.

Still, opening HWP documents in a browser could get the benefit of the browser sandbox,
like these solutions:

These are not web versions, but still worth mentioning:

  • PolarisOffice
  • Naver Whale (Naver's version of Chrome)
  • Hancom viewer (read-only HWP/HWPX viewer, freeware)

@apyrgio
Contributor

apyrgio commented Dec 12, 2023

Thanks a lot for the detailed answer @OctopusET. It reinforces the point that Dangerzone will be helpful to South Korean users 👍 .

Labels: enhancement (New feature or request), stretch goal
Issues closed by this pull request: Support CJK characters
4 participants