Milestone/4.0.1 #251

Merged
merged 60 commits into from
May 20, 2024
8dbf57c
Added basic support for vectorscan
akenion Feb 16, 2024
5c8ccf9
Made adjustments to pass flake8 validation
akenion Feb 16, 2024
6247459
Implemented caching for compiled vectorscan signatures
akenion Mar 11, 2024
173e16b
Updated vectorscan implementation to use streaming mode instead of bl…
akenion Mar 22, 2024
db62725
Added profiling support for malware scans
akenion Mar 27, 2024
63352ac
Improved signal handling when compiling vectorscan signatures
akenion Apr 2, 2024
c68f9e8
Made main.py executable
akenion Apr 2, 2024
0665b00
Added option to output profile results to file
akenion Apr 2, 2024
a4579a4
Fixed vectorscan availability output
akenion Apr 2, 2024
f7b87ea
Added lazy option to vectorscan match engine
akenion Apr 2, 2024
15f7e4c
Fixed additional issue where input queue can fill up and cause scans …
akenion Apr 2, 2024
f4b7227
Adjusted profiling options for malware scan command
akenion Apr 2, 2024
4567977
Corrected order of profile results in debug output
akenion Apr 3, 2024
ed25217
Added support for using direct IO to read files when scanning for mal…
akenion Apr 4, 2024
fef75bb
Changed profile output to use INFO log level instead of DEBUG
akenion Apr 5, 2024
62f724e
Fixed arithmetic error in calculating read offset for direct IO
akenion Apr 5, 2024
2699438
Added option to use custom path for precompiled pattern database
akenion Apr 12, 2024
e02c848
Fixed issue with pattern database compilation
akenion Apr 12, 2024
4aabdf7
Added support for downloading pre-compiled signatures
akenion Apr 29, 2024
6cf44b6
Skipped pre-compiled signature check when explicitly compiling signat…
akenion Apr 29, 2024
d46352d
Fixed issues with pre-compiled signature support
akenion Apr 29, 2024
2e20a16
Continued work on pre-compiled pattern support
akenion Apr 30, 2024
bce59dd
Fixed handling of license assignment for nested signature set instances
akenion Apr 30, 2024
b972229
Added support for filtering signatures with pre-compiled databases
akenion Apr 30, 2024
a2f0b71
Improved handling of incompatible pre-compiled signature databases
akenion Apr 30, 2024
b0a805f
Removed unused imports
akenion Apr 30, 2024
5e1914a
Adjusted signature compilation options
akenion Apr 30, 2024
fc1357c
Updated version to 4.0.1rc1
akenion Apr 30, 2024
218ff55
Made overwriting of external pattern database atomic
akenion May 2, 2024
54c6f9e
Fixed platform detection on ARM and corrected vectorscan parameter types
akenion May 7, 2024
68727c6
Added detection for scan workers that terminate abnormally
akenion May 7, 2024
0a6aa1d
Allowed processing to continue after skipping failed files
akenion May 7, 2024
b897889
Added libhyperscan5 recommendation to Debian package control file
akenion May 8, 2024
2cadd94
Fixed issue where vectorscan matcher was yielding inconsistent results
akenion May 8, 2024
28a8cf6
Merge branch 'gh-236' into milestone/4.0.1
akenion May 8, 2024
433ad87
Made exit code handling consistent between directly invoking wordfenc…
akenion May 8, 2024
e1b023e
Updated version to 4.0.1rc2
akenion May 8, 2024
c8cd1b2
Added documentation for Vectorscan integration.
barmat May 8, 2024
e53e176
Merge branch 'milestone/4.0.1' of github.com:wordfence/wordfence-cli …
barmat May 8, 2024
5ee4782
Copy updates for vectorscan docs.
barmat May 9, 2024
2711dff
Added installation instructions for Vectorscan/Hyperscan.
barmat May 9, 2024
56213ef
Create temp files in same directory as destinations to ensure they ar…
akenion May 10, 2024
d3a6f8f
Merge branch 'gh-236' into milestone/4.0.1
akenion May 10, 2024
1338973
Updated version to 4.0.1rc3
akenion May 10, 2024
976aafb
Corrected variable name
akenion May 10, 2024
2157cc8
Merge branch 'gh-236' into milestone/4.0.1
akenion May 10, 2024
fbfa5f6
Updated version to 4.0.1rc4
akenion May 10, 2024
d5b61c8
Added initial vectorscan benchmarks
akenion May 10, 2024
e5f0c23
Refactored vectorscan benchmark section
akenion May 10, 2024
24252ae
Added missing column to row in vectorscan benchmarks table
akenion May 10, 2024
7c3ad71
Added worker count to benchmarks
akenion May 10, 2024
51ead73
Fixed benchmarks table
akenion May 10, 2024
effea2d
Added additional benchmark data
akenion May 10, 2024
5f8b0a1
Updated vectorscan pre-compilation to respect umask when writing data…
akenion May 15, 2024
db7376a
Merge branch 'gh-236' into milestone/4.0.1
akenion May 15, 2024
8f9bad9
Updated version to 4.0.1rc5
akenion May 15, 2024
cc91005
Adjusted logging around signature database pre-compilation
akenion May 15, 2024
996aa1d
Merge branch 'gh-236' into milestone/4.0.1
akenion May 15, 2024
f372a70
Updated version to 4.0.1rc6
akenion May 15, 2024
0a76a06
Updated version to 4.0.1
akenion May 20, 2024
1 change: 1 addition & 0 deletions debian/control
@@ -16,6 +16,7 @@ X-Python3-Version: >= 3.8
 Package: wordfence
 Architecture: all
 Depends: ${python3:Depends}, libpcre3
+Recommends: libhyperscan5
 Description: Command-line malware scanner powered by Wordfence
  Wordfence CLI is a multi-process malware scanner written in Python. It's
  designed to have low memory overhead while being able to utilize multiple
2 changes: 2 additions & 0 deletions docker/build/entrypoint.sh
@@ -95,6 +95,8 @@ if [ "$PACKAGE_TYPE" = 'standalone' ]; then
         --hidden-import wordfence.cli.remediate.definition \
         --hidden-import wordfence.cli.countsites.countsites \
         --hidden-import wordfence.cli.countsites.definition \
+        --hidden-import wordfence.scanning.matching.pcre \
+        --hidden-import wordfence.scanning.matching.vectorscan \
         main.py

     # compress and copy to output volume
1 change: 1 addition & 0 deletions docs/README.md
@@ -18,6 +18,7 @@ Wordfence CLI is a high performance, multi-process, command-line malware scanner
 - **Malware Scanning**
     - [Subcommand Configuration](malware-scan/Configuration.md)
     - [Automatic Remediation](malware-scan/Remediation.md)
+    - [Faster Scanning with Vectorscan](malware-scan/Vectorscan.md)
     - [Examples](malware-scan/Examples.md)
         - [Scanning a single directory for malware](malware-scan/Examples.md#scanning-a-single-directory-for-malware)
         - [Writing malware scan results to a CSV](malware-scan/Examples.md#writing-malware-scan-results-to-a-csv)
1 change: 1 addition & 0 deletions docs/malware-scan/Configuration.md
@@ -36,6 +36,7 @@ In order to store malware scan specific configuration in the INI file, you shoul
 - `--allow-io-errors`: Allow scanning to continue if IO errors are encountered. Files that cannot be read will be skipped and a warning will be logged.
 - `-z`, `--chunk-size`: Size of file chunks that will be scanned. Use a whole number followed by one of the following suffixes: b (byte), k (kibibyte), m (mebibyte). Defaults to 3m.
 - `-M`, `--scanned-content-limit`: The maximum amount of data to scan in each file. Content beyond this limit will not be scanned. Defaults to 50 mebibytes. Use a whole number followed by one of the following suffixes: b (byte), k (kibibyte), m (mebibyte).
+- `--match-engine`: The regex engine to use for malware scanning. Options: `pcre`, `vectorscan` (default: `pcre`)
 - `--match-all`: If set, all possible signatures will be checked against each scanned file. Otherwise, only the first matching signature will be reported
 - `--pcre-backtrack-limit`: The regex backtracking limit for signature evaluation
 - `--pcre-recursion-limit`: The regex recursion limit for signature evaluation
1 change: 1 addition & 0 deletions docs/malware-scan/README.md
@@ -2,6 +2,7 @@

 - [Subcommand Configuration](Configuration.md)
 - [Automatic Remediation of discovered malware](Remediation.md)
+- [Faster Scanning with Vectorscan](Vectorscan.md)
 - [Examples](Examples.md)
     - [Scanning a directory for malware](Examples.md#scanning-a-directory-for-malware)
     - [Running Wordfence CLI in a cron](Examples.md#running-wordfence-cli-in-a-cron)
56 changes: 56 additions & 0 deletions docs/malware-scan/Vectorscan.md
@@ -0,0 +1,56 @@
# Faster Scanning with Vectorscan

The malware scan currently supports two signature-matching engines, [`libpcre`](https://www.pcre.org/) and [Vectorscan](https://github.com/VectorCamp/vectorscan). Both are highly performant, but Vectorscan is designed for exactly the workload malware scanning presents: running a large number of regular expressions against a stream of data.

## Benchmarks for Performance Increase using Vectorscan

Benchmarking indicates that a scan using Vectorscan can be more than 30 times as fast as the same scan on the same hardware using PCRE.

|Files|Matches|Data|Workers|PCRE Time|Vectorscan Time|Improvement|Library|CPU|
|-----|-------|----|-------|---------|---------------|-----------|-------|---|
|12,006|11,998|605.4 MiB|1|388s|16s|~25x|Hyperscan 5.2.1|AMD Ryzen 7 1700|
|3,499|1|54.7 MiB|4|42s|3s|14x|Hyperscan 5.2.1|AMD Ryzen 7 1700|
|25,905|25,001|1.7 GiB|32|38s|4s|~10x|Hyperscan 5.4.0|AMD Ryzen 9 5950X|
|25,905|25,001|1.7 GiB|8|98s|3s|~32x|Hyperscan 5.4.0|AMD Ryzen 9 5950X|
|25,905|25,001|1.7 GiB|4|191s|6s|~32x|Hyperscan 5.4.0|AMD Ryzen 9 5950X|
|25,905|25,001|1.7 GiB|1|710s|21s|~34x|Hyperscan 5.4.0|AMD Ryzen 9 5950X|

*(The above benchmarks were conducted using a free Wordfence CLI license)*
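
The Improvement column is the ratio of PCRE time to Vectorscan time. As a minimal sketch, the ratios can be recomputed from the timings in the table. The timings are shown in whole seconds and the reported figures are approximate, so the recomputed values can differ slightly from the table; the `speedup` helper is introduced here purely for illustration.

```python
# Timings copied from the benchmark table:
# (workers, pcre_seconds, vectorscan_seconds)
BENCHMARKS = [
    (1, 388, 16),
    (4, 42, 3),
    (32, 38, 4),
    (8, 98, 3),
    (4, 191, 6),
    (1, 710, 21),
]


def speedup(pcre_seconds: float, vectorscan_seconds: float) -> float:
    """Return how many times faster the Vectorscan run was."""
    return pcre_seconds / vectorscan_seconds


for workers, pcre_s, vs_s in BENCHMARKS:
    print(f"{workers:>2} workers: {speedup(pcre_s, vs_s):.1f}x")
```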


## Configuring CLI to use Vectorscan

By default, CLI will use `libpcre` for scanning. To configure CLI to use Vectorscan, pass the following command-line argument:

    wordfence malware-scan --match-engine=vectorscan

This can also be set in the INI file:

    [MALWARE-SCAN]
    match_engine=vectorscan
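
For illustration only, the INI setting can be read with Python's standard `configparser`. This is a sketch of the file format, not how Wordfence CLI itself loads its configuration; `read_match_engine` is a hypothetical helper, and the fallback to `pcre` mirrors the documented default.

```python
import configparser

INI_TEXT = """
[MALWARE-SCAN]
match_engine=vectorscan
"""


def read_match_engine(ini_text: str) -> str:
    """Return the configured match engine, or 'pcre' if it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback covers both a missing section and a missing option
    return parser.get("MALWARE-SCAN", "match_engine", fallback="pcre")


print(read_match_engine(INI_TEXT))
```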

## Installing Vectorscan/Hyperscan

On Debian-based distros, Vectorscan can be installed with `apt`:

    $ sudo apt install libvectorscan5

Vectorscan is a fork of Hyperscan and maintains a compatible API, so installing Hyperscan on Intel-based systems will work as well:

    $ sudo apt install libhyperscan5

We currently support both libraries, but this may change over time if Vectorscan's API diverges from Hyperscan's. Going forward, Vectorscan is the library we will officially support.

Additional installation instructions can be found [in the Vectorscan Wiki](https://github.com/VectorCamp/vectorscan/wiki/Installation-from-package).

## Known issues

### Segfaults on Ubuntu versions 22.04 and 24.04

The Vectorscan package provided by Ubuntu versions 22.04 and 24.04 will produce segfaults when performing malware scanning against certain files on some ARM systems. We've identified that this is an issue specifically with the packaged version. Compiling Vectorscan from scratch does not produce the same segfaults. We've created a [ticket on LaunchPad](https://bugs.launchpad.net/ubuntu/+source/vectorscan/+bug/2064951) for this issue.

### Negligible Performance Improvement over NFS

CLI often has to scan large filesystems made up of relatively small files, a workload that NFS is not well suited to serving. Scanning is then largely I/O bound while waiting on NFS to provide file contents, so there may be little discernible difference between `libpcre` and Vectorscan.

We've observed in our own benchmarks that Vectorscan does not provide a significant performance increase over `libpcre` when scanning filesystems on NFS shares.
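
One way to reason about this is Amdahl's law: if only the matching portion of a scan gets faster, the overall speedup is capped by the time spent waiting on I/O. The sketch below is a rough model, not a measurement; the I/O shares and the 30x engine speedup are made-up illustrative numbers.

```python
def overall_speedup(io_fraction: float, engine_speedup: float) -> float:
    """Amdahl's law: only the non-I/O share of the scan is accelerated."""
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    matching_fraction = 1.0 - io_fraction
    return 1.0 / (io_fraction + matching_fraction / engine_speedup)


# Hypothetical NFS-backed scan: 95% of wall time is spent waiting on reads.
print(f"NFS:   {overall_speedup(0.95, 30):.2f}x")
# Hypothetical local-disk scan: 5% of wall time is spent on I/O.
print(f"local: {overall_speedup(0.05, 30):.2f}x")
```

Under these assumed numbers the NFS-backed scan barely improves at all, while the local scan sees most of the engine's gain.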
Empty file modified main.py
100644 → 100755
Empty file.
12 changes: 9 additions & 3 deletions wordfence/api/licensing.py
@@ -1,4 +1,4 @@
-from typing import Union
+from typing import Union, Optional

 from .exceptions import ApiException

@@ -36,8 +36,14 @@ def __init__(self):

 class LicenseSpecific:

-    def __init__(self, license: License):
+    def __init__(self, license: Optional[License]):
         self.license = license

     def is_compatible_with_license(self, license: License):
-        return self.license == license
+        return self.license is None or self.license == license
+
+    def assign_license(self, license: Optional[License]):
+        self.license = license
+
+    def clear_license(self):
+        self.assign_license(None)
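
The behavioral effect of this change can be seen in a small stand-alone sketch. `License` below is a simplified stand-in (the real class lives in `wordfence/api/licensing.py` and is not shown in this diff); `LicenseSpecific` mirrors the methods from the hunk above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class License:
    # Simplified stand-in for illustration; the real License class differs.
    key: str


class LicenseSpecific:

    def __init__(self, license: Optional[License]):
        self.license = license

    def is_compatible_with_license(self, license: License) -> bool:
        # An unassigned (None) license is treated as compatible with any license.
        return self.license is None or self.license == license

    def assign_license(self, license: Optional[License]):
        self.license = license

    def clear_license(self):
        self.assign_license(None)


item = LicenseSpecific(None)
print(item.is_compatible_with_license(License("a")))  # no license assigned yet
item.assign_license(License("a"))
print(item.is_compatible_with_license(License("b")))  # assigned to a different key
```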
54 changes: 52 additions & 2 deletions wordfence/api/noc1.py
@@ -1,13 +1,17 @@
 import json
 import re
+import base64
 from typing import Callable, Optional

 from .noc_client import NocClient
 from .exceptions import ApiException
 from .licensing import License

-from ..intel.signatures import CommonString, Signature, SignatureSet
-from ..util.validation import DictionaryValidator, ListValidator, Validator
+from ..intel.signatures import CommonString, Signature, SignatureSet, \
+    PrecompiledSignatureSet, deserialize_precompiled_signature_set
+from ..util.validation import DictionaryValidator, ListValidator, Validator, \
+    OptionalValueValidator
+from ..util.platform import Platform

 NOC1_BASE_URL = 'https://noc1.wordfence.com/v2.27/'

@@ -137,6 +141,52 @@ def get_malware_signatures(self) -> SignatureSet:
                 ) from index_error
         return SignatureSet(common_strings, signatures, self.license)

+    def get_precompiled_patterns(
+        self,
+        platform: str,
+        library_version: str,
+        library_type: Optional[str] = None,
+        database_version: int = PrecompiledSignatureSet.VERSION
+    ) -> dict:
+        parameters = {
+            'platform': platform,
+            'library_version': library_version,
+            'database_version': database_version
+        }
+        if library_type is not None:
+            parameters['library_type'] = library_type
+        response = self.request('get_precompiled_patterns', parameters)
+        validator = DictionaryValidator({
+            'data': OptionalValueValidator(str)
+        })
+        self.validate_response(response, validator)
+        return response
+
+    def get_precompiled_malware_signatures(
+        self,
+        platform: Platform,
+        library_version: str,
+        library_type: Optional[str] = None,
+        database_version: int = PrecompiledSignatureSet.VERSION
+    ) -> Optional[PrecompiledSignatureSet]:
+        response = self.get_precompiled_patterns(
+            platform.key,
+            library_version,
+            library_type,
+            database_version
+        )
+        data = response['data']
+        if data is None:
+            return None
+        data = base64.b64decode(data)
+        signature_set = deserialize_precompiled_signature_set(data)
+        signature_set.assign_license(self.license)
+        if isinstance(signature_set, PrecompiledSignatureSet):
+            return signature_set
+        raise ApiException(
+            'Malformed signature set data received from Wordfence API'
+        )
+
     def ping_api_key(self) -> bool:
         return self.process_simple_request('ping_api_key')

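
The response handling in `get_precompiled_malware_signatures` comes down to two cases: a `data` value of `None` means no pre-compiled set is available, and any other value is base64-decoded before deserialization. A minimal sketch of just that step follows; `decode_payload` is a hypothetical helper, and deserialization is omitted because its format is not shown in this diff.

```python
import base64
from typing import Optional


def decode_payload(response: dict) -> Optional[bytes]:
    """Hypothetical helper: return raw bytes from response['data'], or None."""
    data = response['data']
    if data is None:
        # The API signalled that no pre-compiled database is available.
        return None
    return base64.b64decode(data)


encoded = base64.b64encode(b'example-bytes').decode('ascii')
print(decode_payload({'data': encoded}))
print(decode_payload({'data': None}))
```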
10 changes: 7 additions & 3 deletions wordfence/cli/cli.py
@@ -178,7 +178,7 @@ def invoke(self) -> int:
         return subcommand.invoke()


-def main():
+def invoke_cli():
     exception_handler = ExceptionHandler()
     try:
         cli = WordfenceCli(exception_handler)
@@ -189,6 +189,10 @@ def main():
         return 130


-if __name__ == '__main__':
-    exit_code = main()
+def main():
+    exit_code = invoke_cli()
     sys.exit(exit_code)
+
+
+if __name__ == '__main__':
+    main()
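
Splitting `invoke_cli()` from `main()` means that anything calling `main()` directly exits with the same code as running the module as a script. That this targets a packaging entry point is an assumption, since the commit title is truncated. A stripped-down sketch of the pattern follows; the `run` parameter is hypothetical and stands in for the real CLI body.

```python
import sys
from typing import Callable


def invoke_cli(run: Callable[[], int]) -> int:
    """Run the CLI body and translate Ctrl+C into exit code 130."""
    try:
        return run()
    except KeyboardInterrupt:
        return 130


def main(run: Callable[[], int] = lambda: 0) -> None:
    # Single place where the exit code is handed to the interpreter.
    sys.exit(invoke_cli(run))
```

Keeping `sys.exit()` out of `invoke_cli()` also makes the exit code easy to check in tests without terminating the interpreter.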
31 changes: 25 additions & 6 deletions wordfence/cli/context.py
@@ -2,7 +2,8 @@
 from typing import Optional, Any, Callable, Set, Union

 from ..version import __version__, __version_name__
-from ..util import pcre
+from ..util import pcre, vectorscan
+from ..util.text import yes_no
 from ..api import noc1, intelligence
 from ..util.caching import Cache, CacheDirectory, RuntimeCache, \
     InvalidCachedValueException, CacheException
@@ -156,17 +157,35 @@ def get_mailer(self) -> Mailer:
             self._mailer = Mailer(self.config)
         return self._mailer

+    def has_pcre(self) -> bool:
+        return pcre.AVAILABLE
+
+    def has_vectorscan(self) -> bool:
+        return vectorscan.AVAILABLE
+
     def display_version(self) -> None:
         if __version_name__ is None:
             name_suffix = ''
         else:
             name_suffix = f' "{__version_name__}"'
         print(f"Wordfence CLI {__version__}{name_suffix}")
-        jit_support_text = 'Yes' if pcre.HAS_JIT_SUPPORT else 'No'
-        print(
-            f"PCRE Version: {pcre.VERSION} - "
-            f"JIT Supported: {jit_support_text}"
-        )
+        has_pcre = self.has_pcre()
+        pcre_support_text = yes_no(has_pcre)
+        if has_pcre:
+            jit_support_text = yes_no(pcre.HAS_JIT_SUPPORT)
+            pcre_support_text += (
+                f" - PCRE Version: {pcre.VERSION}"
+                f" (JIT Supported: {jit_support_text})"
+            )
+        print(f'PCRE Supported: {pcre_support_text}')
+        has_vectorscan = self.has_vectorscan()
+        vectorscan_support_text = yes_no(has_vectorscan)
+        if has_vectorscan:
+            vectorscan_support_text += (
+                f' - Version: {vectorscan.VERSION} (API Version: '
+                f'{vectorscan.API_VERSION})'
+            )
+        print(f'Vectorscan Supported: {vectorscan_support_text}')

     def has_terminal_output(self) -> bool:
         return has_terminal_output()
76 changes: 76 additions & 0 deletions wordfence/cli/malwarescan/definition.py
@@ -2,6 +2,7 @@
     PCRE_DEFAULT_MATCH_LIMIT_RECURSION
 from wordfence.util.units import byte_length

+from ...scanning.matching import MatchEngine
 from ..subcommands import SubcommandDefinition, UsageExample
 from ..config.typing import ConfigDefinitions
 from .reporting import SCAN_REPORT_CONFIG_OPTIONS
@@ -146,6 +147,15 @@
             "value_type": byte_length
         }
     },
+    "match-engine": {
+        "description": "The regex engine to use for malware scanning.",
+        "context": "ALL",
+        "argument_type": "OPTION",
+        "default": MatchEngine.get_default_option(),
+        "meta": {
+            "valid_options": MatchEngine.get_options()
+        }
+    },
     "match-all": {
         "description": "If set, all possible signatures will be checked "
                        "against each scanned file. Otherwise, only the "
@@ -187,13 +197,79 @@
         "argument_type": "FLAG",
         "default": False
     },
+    "pre-compile": {
+        "description": "Pre-compile and cache the signature set without "
+                       "actually running a scan",
+        "context": "CLI",
+        "argument_type": "FLAG",
+        "default": False,
+        "category": "Signature Compilation"
+    },
+    "pre-compile-generic": {
+        "description": "Pre-compile and cache the signature set without "
+                       "any CPU-specific optimizations and without running "
+                       "a scan",
+        "context": "CLI",
+        "argument_type": "FLAG",
+        "default": False,
+        "category": "Signature Compilation"
+    },
+    "pattern-database-path": {
+        "description": "Use an alternate path for storage of the pattern "
+                       "database",
+        "context": "ALL",
+        "argument_type": "OPTION",
+        "meta": {
+            "accepts_file": True
+        },
+        "default": None,
+        "category": "Signature Compilation"
+    },
+    "compile-local": {
+        "description": "Always compile the signature set locally rather than "
+                       "attempting to download a pre-compiled set",
+        "context": "ALL",
+        "argument_type": "FLAG",
+        "default": False,
+        "category": "Signature Compilation"
+    },
+    "re-compile": {
+        "description": "Always re-compile the signature set rather than using "
+                       "a cached version (when compiling locally)",
+        "context": "CLI",
+        "argument_type": "FLAG",
+        "default": False,
+        "category": "Signature Compilation"
+    },
+    "profile": {
+        "description": "Profile scan performance",
+        "context": "CLI",
+        "argument_type": "FLAG",
+        "default": False,
+        "hidden": True
+    },
+    "profile-path": {
+        "description": "Path at which to save profiling results",
+        "context": "CLI",
+        "argument_type": "OPTION",
+        "default": None,
+        "hidden": True
+    },
+    "direct-io": {
+        "short_name": "D",
+        "description": "Use direct IO when opening files to avoid caching",
+        "context": "ALL",
+        "argument_type": "FLAG",
+        "default": False
+    }
 }


 cacheable_types = {
     'wordfence.intel.signatures.SignatureSet',
     'wordfence.intel.signatures.CommonString',
     'wordfence.intel.signatures.Signature',
+    'wordfence.intel.signatures.PrecompiledSignatureSet',
     'wordfence.api.licensing.License'
 }

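
As a rough analogy, the `match-engine` definition behaves like an `argparse` option restricted to a fixed set of choices, and `compile-local` like a boolean flag. The sketch below is not how Wordfence CLI builds its parser (it uses its own config definitions); the option values and the `pcre` default are taken from the documentation added in this PR, and `build_parser` is a hypothetical helper.

```python
import argparse

# Values documented for --match-engine in this PR.
MATCH_ENGINE_OPTIONS = ['pcre', 'vectorscan']


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog='malware-scan-sketch')
    parser.add_argument(
        '--match-engine',
        choices=MATCH_ENGINE_OPTIONS,
        default='pcre',
        help='The regex engine to use for malware scanning.'
    )
    parser.add_argument(
        '--compile-local',
        action='store_true',
        help='Always compile the signature set locally.'
    )
    return parser


args = build_parser().parse_args(['--match-engine=vectorscan'])
print(args.match_engine, args.compile_local)
```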