Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(parser): handle negative numeric tokens #62

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 44 additions & 10 deletions index.js
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ const {
StringPrototypeCharAt,
StringPrototypeIncludes,
StringPrototypeIndexOf,
StringPrototypeMatch,
StringPrototypeSlice,
StringPrototypeStartsWith,
} = require('./primordials');
Expand All @@ -19,6 +20,11 @@ const {
validateObject
} = require('./validators');

const TOKENS = {
FLAG: 1,
NUMERIC: 2
};

function getMainArgs() {
// This function is a placeholder for proposed process.mainArgs.
// Work out where to slice process.argv for user supplied arguments.
Expand Down Expand Up @@ -98,9 +104,13 @@ const parseArgs = (
let pos = 0;
while (pos < argv.length) {
let arg = argv[pos];

if (StringPrototypeStartsWith(arg, '-')) {
if (arg === '-') {
if (token(arg) === TOKENS.NUMERIC) {
// Positional numerics, e.g., 33, 9995, -33.
result.positionals = ArrayPrototypeConcat(result.positionals, arg);
++pos;
continue;
} else if (arg === '-') {
// '-' commonly used to represent stdin/stdout, treat as positional
result.positionals = ArrayPrototypeConcat(result.positionals, '-');
++pos;
Expand All @@ -118,20 +128,23 @@ const parseArgs = (
if (arg.length > 2) {
for (let i = 2; i < arg.length; i++) {
const short = StringPrototypeCharAt(arg, i);
// Case of `-o=foo`:
Copy link
Collaborator

@shadowspawn shadowspawn Feb 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not consider this a short option and its value in the way you are assuming. The = is for long options not for short options. The four possible forms for specifying the expected value are:

-v VVV
-vVVV // if we support it
--value VVV
--value=VVV

(Or -v on the end of a combined short group with following argument.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanding the = to an argument seems off to me, so I think think that two options are either throw a parsing error, or treat it the same as:

-v foo

What does POSIX do?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I learnt something rereading the Utility Conventions. = is not a valid short option name, so we are into undefined behaviour. (I thought I saw something about "printable" somewhere, didn't find that just now.)

I have not checked getopt to see how it deals with this.

Guideline 3:
Each option name should be a single alphanumeric character (the alnum character classification) from the portable character set.

https://pubs.opengroup.org/onlinepubs/9699919799.2016edition/basedefs/V1_chap12.html#tag_12_01

if (short === '=') break;
// Add 'i' to 'pos' such that short options are parsed in order
// of definition:
ArrayPrototypeSplice(argv, pos + (i - 1), 0, `-${short}`);
}
}

const suffix = StringPrototypeSlice(arg, 2); // maybe -f=bar.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a test for this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suffix code is only for what I think is mistaken processing of =, per previous comments.

(On a separate note, we use arg fairly loosely though this code. We could try for some more meaningful names, but I suggest try that is a pure refactor and not when we are changing code.)

arg = StringPrototypeCharAt(arg, 1); // short
// Alias a short option to its long name.
if (options.short && options.short[arg])
arg = options.short[arg]; // now long!
// ToDo: later code tests for `=` in arg and wrong for shorts
if (suffix.startsWith('='))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (suffix.startsWith('='))
if (StringPrototypeStartsWith(suffix, '='))

arg += suffix; // Add =bar back to arg.
} else {
arg = StringPrototypeSlice(arg, 2); // remove leading --
}

if (StringPrototypeIncludes(arg, '=')) {
// Store option=value same way independent of `withValue` as:
// - looks like a value, store as a value
Expand All @@ -143,18 +156,17 @@ const parseArgs = (
StringPrototypeSlice(arg, 0, index),
StringPrototypeSlice(arg, index + 1),
result);
} else if (pos + 1 < argv.length &&
!StringPrototypeStartsWith(argv[pos + 1], '-')
) {
} else if (peek(argv, pos) !== TOKENS.FLAG) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Processing the last arg will now drop into this conditional, whereas previously it would drop through to the next test.

i.e. removed pos + 1 < argv.length test

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block is processing an option with a following argument available as a value, if needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bounds check was moved into peek, if we decide not to handle negative numbers I'll abandon this refactor any ways.

// withValue option should also support setting values when '=
// isn't used ie. both --foo=b and --foo b should work

// If withValue option is specified, take next position arguement as
// If withValue option is specified, take next position argument as
// value and then increment pos so that we don't re-evaluate that
// arg, else set value as undefined ie. --foo b --bar c, after setting
// b as the value for foo, evaluate --bar next and skip 'b'
const val = options.withValue &&
ArrayPrototypeIncludes(options.withValue, arg) ? argv[++pos] :
(ArrayPrototypeIncludes(options.withValue, arg) ||
ArrayPrototypeIncludes(options.multiples, arg)) ? argv[++pos] :
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have you added a test for multiples here? That does not look right for a multiple flag. The logic for multiples is in storeOptionValue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior of multiples mixing positional and '='s appeared to be buggy, I can add a test for this separately.

undefined;
storeOptionValue(options, arg, val, result);
} else {
Expand All @@ -175,6 +187,28 @@ const parseArgs = (
return result;
};

// Look ahead to next token in argv array:
function peek(argv, pos) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC: bakkot, @shadowspawn, @ljharb, a step towards a token based parser, as discussed in #52.

Benefit is it encapsulates the logic, and simplifies reading the codebase, e.g.,:

  } else if (peek(argv, pos) !== TOKENS.FLAG) {

if (pos + 1 >= argv.length) {
return 0;
}
return token(argv[pos + 1]);
}

// Guess token type based on individual string from argv:
function token(arg) {
const chr = StringPrototypeCharAt(arg, 0);
const nextChr = StringPrototypeCharAt(arg, 1);
if (StringPrototypeMatch(chr, /^[0-9]$/)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (StringPrototypeMatch(chr, /^[0-9]$/)) {
if (RegExpPrototypeSymbolMatch(/^[0-9]$/, chr)) {

return TOKENS.NUMERIC;
} else if (chr === '-' && StringPrototypeMatch(nextChr, /^[0-9]$/)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else if (chr === '-' && StringPrototypeMatch(nextChr, /^[0-9]$/)) {
} else if (chr === '-' && RegExpPrototypeSymbolMatch(/^[0-9]$/, nextChr)) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also use test

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only testing for one digit? So -1foo is returning numeric? It does not actually matter too much since to support negative numbers we are dropping support for digits, but I feel misleading to call it NUMERIC based on that test.

Comment on lines +202 to +204
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Can drop ^ and $ since were only matching one character (Unless I'm overlooking something).
  • Should we use \d for increased unicode support, or does that add more complexity/edge cases?
  • Should we be using .test() since were not using the result of .match()
    e.g. /[0-9]/.test(chr)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good call, this should use RegExpPrototypeTest

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to double check, but I believe \d is the same as [0-9] so a matter of taste.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to double check, but I believe \d is the same as [0-9] so a matter of taste.

Needed to double checked myself! Got my languages mixed up (: https://regex101.com/r/PhKpmF/1

return TOKENS.NUMERIC;
} else if (chr === '-') {
return TOKENS.FLAG;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two other things we could identify using token() are - and --. But the focus of this PR is negative numbers, so leave that for future refactor.

}
return 0;
}

module.exports = {
parseArgs
};