Implement vectorscan support to improve malware scan performance #236

Closed
akenion opened this issue Feb 8, 2024 · 9 comments · Fixed by #242, #243, #246, #247 or #248
Assignees
Labels
dev-complete Development work to resolve this issue is complete enhancement New feature or request qa-passed QA has tested and confirmed the fix for this issue
Milestone

Comments

@akenion
Contributor

akenion commented Feb 8, 2024

Allow optionally using vectorscan instead of PCRE for malware scanning when available.

https://github.com/VectorCamp/vectorscan

@akenion akenion added the enhancement New feature or request label Feb 8, 2024
@akenion akenion self-assigned this Feb 8, 2024
@akenion akenion added this to the v4.0.1 milestone Apr 30, 2024
@akenion akenion linked a pull request Apr 30, 2024 that will close this issue
@akenion akenion added the dev-complete Development work to resolve this issue is complete label Apr 30, 2024
@akenion akenion linked a pull request May 2, 2024 that will close this issue
@davidnuzik

v4.0.1-rc1

Re-opening this case per a Slack discussion between Alex and me.

These issues only happen on an arm64 environment for me (aws ec2 instance, aarch64 cpu arch).

Try to run:
./wordfence malware-scan demohackedsite/ --noc1-url https://noc1.wordfence.ninja/v2.27/ -d --match-engine vectorscan
Will output:

Traceback (most recent call last):
  File "wordfence/util/caching.py", line 153, in _load
FileNotFoundError: [Errno 2] No such file or directory: '/home/david/.cache/wordfence/7072652D636F6D70696C65642D7369676E6174757265732D766563746F727363616E'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "wordfence/util/caching.py", line 201, in get
  File "wordfence/util/caching.py", line 58, in get
  File "wordfence/util/caching.py", line 166, in _load
wordfence.util.caching.NoCachedValueException

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 4, in <module>
  File "wordfence/cli/cli.py", line 187, in main
  File "wordfence/cli/cli.py", line 43, in process_exception
  File "wordfence/cli/cli.py", line 185, in main
  File "wordfence/cli/cli.py", line 178, in invoke
  File "wordfence/cli/malwarescan/malwarescan.py", line 272, in invoke
  File "wordfence/cli/malwarescan/malwarescan.py", line 90, in _get_signatures
  File "wordfence/cli/malwarescan/malwarescan.py", line 81, in _get_pre_compiled_signatures
  File "wordfence/util/caching.py", line 206, in get
  File "wordfence/util/caching.py", line 197, in _initialize_value
  File "wordfence/cli/malwarescan/malwarescan.py", line 71, in fetch_pre_compiled
  File "wordfence/api/noc1.py", line 173, in get_precompiled_malware_signatures
AttributeError: 'NoneType' object has no attribute 'key'
[4228] Failed to execute script 'main' due to unhandled exception!

I do not have this issue when I use pcre. I believe we discussed this and it was due to the CPU architecture being reported as aarch64 when arm64 was expected.
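As a side note on the FileNotFoundError above: the cache file name looks like an uppercase hex encoding of the cache key. The short sketch below decodes it; `decode_cache_key` is a hypothetical helper written for this note, not a function from the CLI, and the idea that the CLI hex-encodes its keys is inferred only from this one file name.

```python
def decode_cache_key(filename):
    """Decode a hex-encoded cache file name back to its text key."""
    # bytes.fromhex accepts both upper- and lowercase hex digits.
    return bytes.fromhex(filename).decode("utf-8")


# File name copied from the traceback above.
cache_file = "7072652D636F6D70696C65642D7369676E6174757265732D766563746F727363616E"
print(decode_cache_key(cache_file))  # prints: pre-compiled-signatures-vectorscan
```

So the first exception only reports that no pre-compiled vectorscan signature set had been cached yet; the actual failure is the later AttributeError raised while trying to fetch one.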


The second issue is also in the same arm64 (aarch64) environment. If I include --compile-local I get a segfault:
./wordfence malware-scan demohackedsite/ --noc1-url https://noc1.wordfence.ninja/v2.27/ -d --no-verbose --match-engine vectorscan --compile-local
Outputs:

Filtered signature count: 5778
Compiling 5778 pattern(s) to vectorscan database...
Segmentation fault (core dumped)

You suggested using gdb to get more details and helped out by running a backtrace, which output:

Program received signal SIGSEGV, Segmentation fault.
0x0000fffff638ba04 in ?? () from /lib/aarch64-linux-gnu/libhs.so.5
(gdb) bt
#0  0x0000fffff638ba04 in ?? () from /lib/aarch64-linux-gnu/libhs.so.5
#1  0x0000fffff638c774 in ?? () from /lib/aarch64-linux-gnu/libhs.so.5
#2  0x0000fffff632cf00 in ?? () from /lib/aarch64-linux-gnu/libhs.so.5
#3  0x0000fffff632f060 in ?? () from /lib/aarch64-linux-gnu/libhs.so.5
#4  0x0000fffff632f644 in hs_compile_multi () from /lib/aarch64-linux-gnu/libhs.so.5
#5  0x0000fffff6836e10 in ?? () from /lib/aarch64-linux-gnu/libffi.so.8
#6  0x0000fffff6833a94 in ?? () from /lib/aarch64-linux-gnu/libffi.so.8
#7  0x0000fffff68621c8 in ?? () from /usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-aarch64-linux-gnu.so
#8  0x0000fffff6860974 in ?? () from /usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-aarch64-linux-gnu.so
#9  0x0000aaaaaab9aca0 in _PyObject_MakeTpCall ()
#10 0x0000aaaaaab91af4 in _PyEval_EvalFrameDefault ()
#11 0x0000aaaaaaba5348 in _PyFunction_Vectorcall ()
#12 0x0000aaaaaab8e2ac in _PyEval_EvalFrameDefault ()
#13 0x0000aaaaaaba5348 in _PyFunction_Vectorcall ()
#14 0x0000aaaaaab8d3f8 in _PyEval_EvalFrameDefault ()
#15 0x0000aaaaaaba5348 in _PyFunction_Vectorcall ()
#16 0x0000aaaaaab8d3f8 in _PyEval_EvalFrameDefault ()
#17 0x0000aaaaaaba5348 in _PyFunction_Vectorcall ()
#18 0x0000aaaaaab9177c in _PyEval_EvalFrameDefault ()
#19 0x0000aaaaaaba5348 in _PyFunction_Vectorcall ()
#20 0x0000aaaaaab8d3f8 in _PyEval_EvalFrameDefault ()
#21 0x0000aaaaaaba5348 in _PyFunction_Vectorcall ()
#22 0x0000aaaaaab8d3f8 in _PyEval_EvalFrameDefault ()
#23 0x0000aaaaaabb4148 in ?? ()
#24 0x0000aaaaaab8e2ac in _PyEval_EvalFrameDefault ()
#25 0x0000aaaaaaba5348 in _PyFunction_Vectorcall ()
#26 0x0000aaaaaab8d3f8 in _PyEval_EvalFrameDefault ()
#27 0x0000aaaaaaba5348 in _PyFunction_Vectorcall ()
#28 0x0000aaaaaab8d3f8 in _PyEval_EvalFrameDefault ()
#29 0x0000aaaaaaba5348 in _PyFunction_Vectorcall ()
#30 0x0000aaaaaab8d2bc in _PyEval_EvalFrameDefault ()
#31 0x0000aaaaaac89760 in ?? ()
#32 0x0000aaaaaac895e4 in PyEval_EvalCode ()
#33 0x0000aaaaaac9186c in ?? ()
#34 0x0000aaaaaaba559c in ?? ()
#35 0x0000aaaaaab8d2bc in _PyEval_EvalFrameDefault ()
#36 0x0000aaaaaaba5348 in _PyFunction_Vectorcall ()
#37 0x0000aaaaaab8d2bc in _PyEval_EvalFrameDefault ()
#38 0x0000aaaaaaba5348 in _PyFunction_Vectorcall ()
#39 0x0000aaaaaacad824 in ?? ()
#40 0x0000aaaaaacabe90 in Py_RunMain ()
#41 0x0000aaaaaac7a748 in Py_BytesMain ()
#42 0x0000fffff7d273fc in __libc_start_call_main (main=main@entry=0xaaaaaac7a720, argc=argc@entry=12, argv=argv@entry=0xfffffffff4f8)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#43 0x0000fffff7d274cc in __libc_start_main_impl (main=0xaaaaaac7a720, argc=12, argv=0xfffffffff4f8, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:392
#44 0x0000aaaaaac7a630 in _start ()

I just thought I'd record this here for posterity. Thanks for looking into this! So far all testing on AMD64 looks good.

@davidnuzik davidnuzik self-assigned this May 2, 2024
@davidnuzik

v4.0.1-rc1

Details of a new issue found per our Slack discussion are below. I am getting different results for each vectorscan run:

/tmp/BATS/wordfence malware-scan ~/qa/malware-sets-for-testing-cli/ -a --no-verbose (USES VECTORSCAN per INI FILE)
Found 34114 suspicious file(s) after processing 59228 file(s) containing 2.7 GiB over 5 second(s)
Found 34123 suspicious file(s) after processing 59228 file(s) containing 2.7 GiB over 5 second(s)
Found 34144 suspicious file(s) after processing 59228 file(s) containing 2.7 GiB over 5 second(s)
Found 34097 suspicious file(s) after processing 59228 file(s) containing 2.7 GiB over 5 second(s)
/tmp/BATS/wordfence malware-scan ~/qa/malware-sets-for-testing-cli/ -a --no-verbose --match-engine pcre (OVERRIDE INI, specify pcre)
Found 34498 suspicious file(s) after processing 59228 file(s) containing 2.7 GiB over 85 second(s)
Found 34498 suspicious file(s) after processing 59228 file(s) containing 2.7 GiB over 84 second(s)
Found 34498 suspicious file(s) after processing 59228 file(s) containing 2.7 GiB over 84 second(s)
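A quick way to check for this kind of run-to-run drift is to pull the suspicious-file count out of each summary line and compare them. This is only a sketch: the regular expression is written against the summary lines pasted above, and `suspicious_counts` and `is_consistent` are hypothetical helper names.

```python
import re

# Matches the summary line format shown in the output above.
SUMMARY = re.compile(
    r"Found (\d+) suspicious file\(s\) after processing (\d+) file\(s\)"
    r" .* over (\d+) second\(s\)"
)


def suspicious_counts(output):
    """Return the suspicious-file count from every summary line in output."""
    return [int(match.group(1)) for match in SUMMARY.finditer(output)]


def is_consistent(output):
    """True when every run in output reported the same suspicious-file count."""
    return len(set(suspicious_counts(output))) <= 1


vectorscan_runs = """\
Found 34114 suspicious file(s) after processing 59228 file(s) containing 2.7 GiB over 5 second(s)
Found 34123 suspicious file(s) after processing 59228 file(s) containing 2.7 GiB over 5 second(s)
"""
print(suspicious_counts(vectorscan_runs), is_consistent(vectorscan_runs))
```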

@davidnuzik

davidnuzik commented May 6, 2024

v4.0.1-rc1
EDIT: CORRECTION - v3.0.2 has this issue as well. Scanning a large signature set was not part of common regression testing on MacOS; I have added it to the tests going forward.

On my ARM64 (M2 chip) Mac Mini I'm also encountering a segfault. In this case I get it specifically with pcre; vectorscan actually works for me. I could not reproduce this on Linux, including when I install via pip3 as is required on MacOS.

In this case I simply execute:
wordfence malware-scan malwaredemo-25904-files/ -a --match-engine pcre

and right after scanning file Processing file: /Users/david/qa/malware-sets-for-testing-cli/malwaredemo-25904-files/kali/Desktop/sigsamples2/9072/cfef519f80e8db01ce31abbb60241b702e26b67bfed525102107ee4670cffec2 (/Users/david/qa/malware-sets-for-testing-cli/malwaredemo-25904-files/kali/Desktop/sigsamples2/5821/a7d9665059de3f39eec37e89623887b1774c687aae46227eaf041350f8d31011,5821,Suspicious:TXT/ziplike.5821,Suspicious file encoding seen in malware infections)

I will see the output halt in my terminal (though it appears to still be running), and only when I remote into the machine (VNC) do I see that python crashed with the following long error report (attached due to length). I tried to compare it against the other segfault I reported earlier on arm64 Linux, but it's not clear to me whether it's the same issue. I triggered the earlier segfault via specific arguments, including the vectorscan match engine -- this one is with pcre on this Mac Mini.

Note that I have no traceback or other useful info -- I just have this error report from the Mac error reporting system. I am happy to assist further if there is anything I can do to help. The issue might be environmental, but I don't think it is due to endpoint protection or similar on this system -- not from what I can see in the logs.

Error report:
Davids-mac-mini-python-error-report_redacted.txt

@akenion
Contributor Author

akenion commented May 8, 2024

The architecture detection has been fixed and should work correctly for aarch64 now. Additionally, it's been updated to skip fetching the pre-compiled database (a warning will be logged) and simply compile locally if the architecture is unrecognized.
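The behaviour described above can be sketched roughly as follows. This is not the CLI's actual code: the alias table, the function names, and the returned strings are assumptions made for illustration. The only facts relied on are that `platform.machine()` reports `aarch64` on ARM Linux and `arm64` on ARM macOS.

```python
import platform

# Assumed alias table mapping OS-reported machine strings to canonical names.
ARCHITECTURE_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_architecture(machine=None):
    """Return a canonical architecture name, or None if it is unrecognized."""
    if machine is None:
        machine = platform.machine()
    return ARCHITECTURE_ALIASES.get(machine.lower())


def choose_database_source(machine=None, log=print):
    """Pick a pre-compiled database when possible, else fall back to local compilation."""
    architecture = normalize_architecture(machine)
    if architecture is None:
        # Unknown architecture: warn and compile locally instead of failing.
        log("warning: unrecognized architecture, compiling signatures locally")
        return "compile-local"
    return "pre-compiled:" + architecture


print(choose_database_source("aarch64"))
```

With a table like this, both spellings of the ARM architecture resolve to the same pre-compiled database, and anything unexpected degrades to a slower local compile rather than an unhandled exception.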

The seg fault on ARM64 appears to be an issue with the build of the library on Ubuntu, specifically. I've opened a case there, but for now there doesn't seem to be a workaround on our side, so this will not be addressed.
https://bugs.launchpad.net/ubuntu/+source/vectorscan/+bug/2064951

The issue with inconsistent results when using vectorscan has been addressed as well. This was due to an issue with the ordering of the call to reset the vectorscan stream and the one to report results. These should be consistent now.
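The comment above does not say which order was wrong, so the following is only one plausible illustration of why the order can matter. In the Hyperscan/vectorscan streaming API, resetting a stream may itself deliver end-of-data matches through the match callback, so results read before the reset can miss them. `FakeStream` below is a stand-in written for this sketch and is not the vectorscan API.

```python
class FakeStream:
    """Stand-in for a streaming matcher that delivers some matches only on reset."""

    def __init__(self, on_match):
        self._on_match = on_match
        self._pending = []

    def scan(self, chunk):
        # Pretend this match depends on end-of-data and is held back until reset.
        if b"evil" in chunk:
            self._pending.append("evil-signature")

    def reset(self):
        # Flush held-back matches through the callback, then clear stream state.
        for match in self._pending:
            self._on_match(match)
        self._pending = []


def scan_file(chunks, report_before_reset):
    """Scan chunks and return the matches that would be reported for the file."""
    matches = []
    stream = FakeStream(matches.append)
    for chunk in chunks:
        stream.scan(chunk)
    if report_before_reset:
        reported = list(matches)  # too early: reset has not flushed yet
        stream.reset()
    else:
        stream.reset()
        reported = list(matches)  # reset first, then report
    return reported


print(scan_file([b"an evil payload"], report_before_reset=True))
print(scan_file([b"an evil payload"], report_before_reset=False))
```

If a stream is reused across files, matches flushed late could also be attributed to a different file depending on scheduling, which would be consistent with counts that vary between runs, though that is speculation here.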

The issue on MacOS/ARM64 also appears to be an internal issue with the PCRE library. To address it, I've added detection for crashed workers that will skip problematic files and start a new replacement worker rather than leaving the process hanging if a worker crashes. I don't see a reasonable solution or workaround for the internal issue with the library at present, so skipping files that fail in this way is likely the best path forward at this point.
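A minimal sketch of the crashed-worker idea, assuming a single worker process that handles one path at a time over pipes. The CLI's real worker pool is not shown in this thread, so `WORKER_SOURCE`, `start_worker`, and `scan_all` are hypothetical names, and the "crash" is simulated with SIGKILL (POSIX only) rather than a real segfault inside a native library.

```python
import subprocess
import sys

# Hypothetical worker: reads one path per line and answers "ok <path>".
# A path containing "crash" kills the worker, standing in for a native crash.
WORKER_SOURCE = r'''
import os, signal, sys
for line in sys.stdin:
    path = line.strip()
    if "crash" in path:
        os.kill(os.getpid(), signal.SIGKILL)
    print("ok " + path, flush=True)
'''


def start_worker():
    """Start a fresh worker process connected through text-mode pipes."""
    return subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def scan_all(paths):
    """Scan every path, skipping files that crash the worker and replacing it."""
    scanned, skipped = [], []
    worker = start_worker()
    for path in paths:
        try:
            worker.stdin.write(path + "\n")
            worker.stdin.flush()
            reply = worker.stdout.readline()
        except BrokenPipeError:
            reply = ""
        if reply.strip() == "ok " + path:
            scanned.append(path)
            continue
        # No reply means the worker died on this file: skip it and start over.
        worker.wait()
        skipped.append(path)
        worker = start_worker()
    worker.stdin.close()
    worker.wait()
    return scanned, skipped


if __name__ == "__main__":
    print(scan_all(["a.php", "crash.bin", "b.php"]))
```

The key point is that a dead worker is detected (here by end-of-file on its output) instead of being waited on forever, which is what left the scan hanging before the fix.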

@akenion akenion added the qa-ready Issue is ready for QA and included in the most recent release candidate label May 8, 2024
@davidnuzik

v4.0.1rc2 5/8/24

SUMMARY:
QA validation PASSED. All issues are resolved and both general regression testing as well as my cli automation pass on both AMD64 and ARM64 cpu architectures and various install methods.


I can confirm I now have no issues with arm64 (AKA aarch64), with the exception of the Ubuntu bug Alex referenced (segfault when trying to use --compile-local).

Additionally I can confirm that the inconsistent results issue is now fixed and on my MacOS/ARM64 machine (Mac Mini) I am able to scan even if the worker(s) crash due to problematic files.

I feel comfortable with releasing now that these issues have been resolved as well as the exit-code issues mentioned in case 244 and all regression/automation checks are green.

@davidnuzik davidnuzik added qa-passed QA has tested and confirmed the fix for this issue and removed qa-ready Issue is ready for QA and included in the most recent release candidate labels May 8, 2024
@akenion akenion removed the qa-passed QA has tested and confirmed the fix for this issue label May 10, 2024
@akenion
Contributor Author

akenion commented May 10, 2024

We encountered an issue when attempting to build the pre-compiled vectorscan databases due to attempting to move a file across filesystems.

INFO:AMD64/vectorscan-5.4.7:[RUN]: b"Error: [Errno 18] Invalid cross-device link: '/tmp/tmpcweo17s3' -> '/cache/vectorscan-5.4.7-amd64-free.1.db'\n"
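Errno 18 (EXDEV) is what the OS returns when a rename is attempted across filesystems, here from /tmp to /cache. The thread does not show how this was fixed. One common approach, sketched below under that caveat, is to create the temporary file in the destination directory so the final `os.replace` never crosses a filesystem boundary. `write_atomically` is a hypothetical helper, not a function from the CLI.

```python
import os
import tempfile


def write_atomically(destination, data):
    """Write data to destination via a temp file in the same directory."""
    directory = os.path.dirname(os.path.abspath(destination))
    # Creating the temp file next to the destination keeps both on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        os.replace(temp_path, destination)
    except BaseException:
        # Clean up the partial temp file before re-raising.
        if os.path.exists(temp_path):
            os.unlink(temp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        target = os.path.join(cache_dir, "example.db")
        write_atomically(target, b"database-bytes")
        print(os.listdir(cache_dir))
```

An alternative is `shutil.move`, which falls back to copy-and-delete when a plain rename is not possible, at the cost of the move no longer being atomic.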

@davidnuzik

v4.0.1rc5

I think we should make Compiling pattern(s) to vectorscan database... message INFO level so that -d, --debug arg does not need to be passed to see that the vectorscan database is compiling locally. Otherwise the user will just see the logo (provided they do not pass --no-banner) and the CLI will appear to have hung.

We should also do this for the Successfully compiled vectorscan database message, per our Slack chat.

My apologies again -- this is something I should have picked up earlier and prior to marking the issue as qa-passed. We'll have to do another RC now if we want this change.
I'll be adding an automation test case to check this going forward and also added some notes to the regression test doc to take care to exclude the -d / --debug flag for various tests including when compiling the vectorscan database locally.
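To illustrate the request, here is a small sketch using Python's standard `logging` module. It assumes the CLI shows INFO and above by default and DEBUG only with -d / --debug, as described in this comment; the logger name, the `ListHandler` class, and `visible_messages` are made up for the example.

```python
import logging


class ListHandler(logging.Handler):
    """Collect emitted records in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append((record.levelname, record.getMessage()))


def visible_messages(debug_enabled):
    """Return the messages a user would see with or without the debug flag."""
    logger = logging.getLogger("vectorscan-demo-%s" % debug_enabled)
    logger.propagate = False
    logger.setLevel(logging.DEBUG if debug_enabled else logging.INFO)
    handler = ListHandler()
    logger.addHandler(handler)
    # Current behaviour: the progress message is only emitted at DEBUG level.
    logger.debug("Compiling 5778 pattern(s) to vectorscan database...")
    # Requested behaviour: emit it at INFO so it shows without --debug.
    logger.info("Compiling 5778 pattern(s) to vectorscan database...")
    logger.removeHandler(handler)
    return handler.messages


print(visible_messages(debug_enabled=False))
```

Without the debug flag only the INFO copy of the message is visible, which is the difference between a CLI that looks hung and one that says what it is doing.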

@davidnuzik

v4.0.1rc6

I can confirm the new changes work as expected. When the vectorscan database needs to be compiled locally there is now a warning level message indicating no compatible pre-compiled signature set was found and that they will be compiled locally. Additionally on the info log level there is a notice now describing not just that the compilation is in progress but that it may take a substantial amount of time depending on the system performance. This is great as this should help to set expectations for users in the not too likely situation local compilation is necessary.

image

I also did some tests among various environments and the slowest I saw was about a 45 min compilation time and fastest about 15 minutes -- I did this just based on my own curiosity -- your experience may vary.

I also confirmed the INFO level message still shows if I specify --compile-local, but the warning message does not, which makes sense in this case.
Environments I tested on include both ARM64 and AMD64 of Ubuntu 22.04, Debian 12, and specifically AMD64 Fedora 39 and AMD64 RHEL9 as well as ARM64 MacOS. No issues observed in any environment except for ARM64 Ubuntu which has a known issue with their vectorscan deb package -- Alex has previously filed an issue about this (here) that he mentioned last week.

--

Once the pre-compiled vectorscan databases are ready for testing (everything all set server-side) I'll test this as well and ensure the warning and info messages no longer appear (note, however, that they could still appear in cases where a compatible vectorscan database isn't available).
All other tests including the full automation test suite as well pass. Once the pre-compiled databases are ready I'll test again. Once I have done so and double-checked my testing then I'll mark this case with the "qa-passed" label.

@davidnuzik davidnuzik added qa-passed QA has tested and confirmed the fix for this issue and removed qa-ready Issue is ready for QA and included in the most recent release candidate labels May 20, 2024
@davidnuzik

v4.0.1rc6 5/20/2024

All checks pass now - remaining checks involved testing pre-built vectorscan database on production. I tested in a few ARM64, AMD64 environments utilizing hyperscan5 and vectorscan5 and also in one MacOS environment. All checks/tests passed without issue in all environments and we're good to release! :)
