[Busybox] "Invalid regexp" after applying the busybox patch from FAQ #17

Closed
pihug12 opened this issue Dec 6, 2020 · 3 comments
pihug12 commented Dec 6, 2020

Hello,

I'm running BusyBox on Alpine 3.12.1. I still get an "Invalid regexp" error after applying the BusyBox patch from the FAQ (sed -i "s#\\\000#\\\001#g" JSON.awk).

Before patch:

/ # wget -qO- http://localhost:8080/actuator/metrics/jvm.memory.committed | awk -f JSON.awk -
awk: bad regex '^|^��|^��|"[^"\\': Invalid regexp

After patch (sed -i "s#\\\000#\\\001#g" JSON.awk):

/ # wget -qO- http://localhost:8080/actuator/metrics/jvm.memory.committed | awk -f JSON.awk -
awk: bad regex '^|^��|^��|"[^"\\╔-]*((\\[^u╔-]|\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])[^"\\╔-]*)*"|-?(0|[1-9][0-9]*)([.][0-9]+)?([eE][+-]?[0-9]+)?|null|false|true|[
]+|.': Invalid regexp
/ #
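
For readers unsure what the FAQ substitution does: inside double quotes the shell reduces `\\\000` to `\\000`, so sed matches a literal backslash followed by `000` and rewrites it to a backslash followed by `001`. The sketch below applies the same expression to standard input rather than editing JSON.awk in place; `patch_nul_escape` and the sample line are hypothetical and are not taken from JSON.awk.

```shell
# Hypothetical helper: run the FAQ substitution on stdin instead of using sed -i.
patch_nul_escape() {
    # After shell quoting, sed receives: s#\\000#\\001#g
    # i.e. replace the four characters \000 with the four characters \001.
    sed "s#\\\000#\\\001#g"
}

# Made-up sample line containing the escape sequence \000 as plain text.
printf '%s\n' 'a\000b' | patch_nul_escape
# prints: a\001b
```

The after-patch error message shows that the BOM alternatives at the start of the regex are untouched by this substitution, which is consistent with the error persisting.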

Environment:

/ # cat /etc/alpine-release
3.12.1
/ # awk --help
BusyBox v1.31.1 () multi-call binary.

Usage: awk [OPTIONS] [AWK_PROGRAM] [FILE]...

        -v VAR=VAL      Set variable
        -F SEP          Use SEP as field separator
        -f FILE         Read program from FILE
        -e AWK_PROGRAM
/ #
Owner

step- commented Dec 6, 2020

Thank you for your report. I can reproduce this issue on Alpine Linux 3.12.1 with busybox 1.31.1.
I can't reproduce it as easily on Linux with busybox 1.27.0 (+patches). However, I did find a test case that triggers the same error on Linux:

busybox awk -f ~/bin/JSON.awk test-cases/20170131-issue-007-test.json 2>&1 | xxd
00000000: 6177 6b3a 2062 6164 2072 6567 6578 2027  awk: bad regex '
00000010: 5eef bbbf 7c5e fffe 7c5e feff 7c22 5b5e  ^...|^..|^..|"[^
00000020: 225c 5c27 3a20 556e 6d61 7463 6865 6420  "\\': Unmatched
00000030: 5b20 6f72 205b 5e0a                      [ or [^.

The issue is that busybox awk doesn't seem to accept the encoded byte order marks (BOMs) in the regex. If you know that your input JSON will not include a BOM, that is, the JSON file isn't generated by a Windows application, you can work around this bug by changing JSON.awk as follows:

I tested the changes on Alpine Linux with test-cases/20170131-issue-007-test.json. I got a parsing error from JSON.awk, not from busybox awk, which tells me two things: (1) the regex without the BOM alternatives is now valid, and (2) busybox awk is probably running out of memory before it has finished parsing the JSON object. This test file is fairly large.

This patch is temporary to enable you to further assess JSON.awk for your needs. I'm going to leave this issue open, and I will come back to it after further investigating BOM encoding options for busybox awk. Feel free to add more comments to this conversation.
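
Before relying on a BOM-free workaround, it may help to confirm that the input really has no BOM. The hex dump above shows the three byte sequences the regex tries to match: `ef bb bf` (UTF-8), `ff fe` (UTF-16 little-endian) and `fe ff` (UTF-16 big-endian). Below is a minimal sketch, assuming `head -c`, `od -An -tx1` and `tr` are available (not verified here on every busybox build); `bom_type` is a hypothetical helper name, not part of JSON.awk.

```shell
# Hypothetical helper: report which BOM, if any, starts the given file.
bom_type() {
    # Read at most the first three bytes and render them as lowercase hex
    # with no separators, e.g. "efbbbf".
    hex=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$hex" in
        efbbbf) echo utf-8 ;;
        fffe*)  echo utf-16le ;;   # ff fe also begins a UTF-32 LE BOM
        feff*)  echo utf-16be ;;
        *)      echo none ;;
    esac
}
```

For example, `bom_type input.json` (with `input.json` standing in for your own file) prints `none` for a file that starts directly with JSON text. If it prints `utf-8`, one option is to skip the three BOM bytes before parsing, for instance `tail -c +4 input.json | awk -f JSON.awk -`.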

step- added the bug label Dec 6, 2020
Author

pihug12 commented Dec 6, 2020

Thank you for the quick and detailed feedback!

As a workaround, I switched back to v1.4 (https://github.com/step-/JSON.awk/blob/1.4/JSON.awk).

step- closed this as completed in c0331a1 Dec 14, 2020
Owner

step- commented Dec 14, 2020

I tested 1.4.2 on Alpine Linux 3.12.1 with busybox 1.31.1.
