Type named capture groups better #32098
Comments
I could have sworn there was already another ticket discussing this exact issue at length (there were a lot of tradeoffs etc.), but I can’t find it now. |
There may have been one for group arity? |
It would be kinda neat with this to make types like |
This sounds like dependent typing to me, which is a rather large can of worms to open. |
I don't think so. I think it would be as simple as something like:
interface Match<G extends { [key: string]: string } | undefined> {
// Note that G is only undefined when the RegExp has no named capture groups
groups: G,
}
interface RegExp<G extends { [key: string]: string } | undefined = undefined> {
exec(s: string): null | Match<G>,
}
// and so on for String .match/.matchAll/etc
Note that all of the type information is already encoded in the regular expression literal.
EDIT: Fixed code. |
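As a rough illustration that the names really are recoverable from the pattern text, here is a sketch that works today for patterns passed as string literals. It assumes TypeScript 4.1+ template literal types; `GroupNames`, `NamedGroups` and `namedGroups` are hypothetical helpers, not lib.d.ts APIs, and the parsing is deliberately naive.

```typescript
// Extracts named-group names from a pattern's source text (TypeScript 4.1+).
// Deliberately naive: escaped "\(?<" and lookbehinds such as (?<=x) are not handled.
type GroupNames<S extends string> =
  S extends `${string}(?<${infer Name}>${infer Rest}`
    ? Name | GroupNames<Rest>
    : never;

// Optional string properties, one per inferred name ("& string" keeps the keys valid).
type NamedGroups<P extends string> = { [K in GroupNames<P> & string]?: string };

// Runtime helper whose result type is derived from the pattern literal.
function namedGroups<P extends string>(pattern: P, input: string): NamedGroups<P> | null {
  const match = new RegExp(pattern).exec(input);
  if (match === null) return null;
  // .groups is undefined when the pattern has no named groups at all.
  return (match.groups ?? {}) as NamedGroups<P>;
}

const date = namedGroups("(?<year>\\d{4})-(?<month>\\d{2})", "2019-06");
// date has type { year?: string; month?: string } | null
console.log(date?.year, date?.month); // 2019 06
// date?.day would be a compile-time error: "day" is not an inferred name.
```

A real implementation would parse the literal in the compiler instead of re-parsing a string at the type level, which is what the proposal asks for.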
Might be nice to add the same stronger typing for numbered groups at the same time, so that "foo".match(/(f)(oo)/) would only have valid indexers |
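Numbered groups can be approximated in user land by having the caller state how many groups the pattern has and checking that count at runtime. A sketch, assuming a non-global regexp; `MatchTuple` and `matchGroups` are hypothetical helpers:

```typescript
// Tuple type for a match with N capture groups: index 0 is the whole match,
// and each group may be undefined if it did not participate.
type MatchTuple<N extends number, Acc extends (string | undefined)[] = []> =
  Acc["length"] extends N
    ? [string, ...Acc]
    : MatchTuple<N, [...Acc, string | undefined]>;

function matchGroups<N extends number>(
  input: string,
  re: RegExp,
  groupCount: N
): MatchTuple<N> | null {
  const match = input.match(re);
  if (match === null) return null;
  // For a non-global regexp, length is 1 (whole match) + number of groups.
  if (match.length !== groupCount + 1) {
    throw new RangeError(`Expected ${groupCount} groups, got ${match.length - 1}`);
  }
  return Array.from(match) as unknown as MatchTuple<N>;
}

const m = matchGroups("foo", /(f)(oo)/, 2);
if (m !== null) {
  const [whole, first, second] = m; // string, string | undefined, string | undefined
  console.log(whole, first, second); // foo f oo
  // m[3] is a compile-time error: the tuple has length 3.
}
```

The group count still has to be kept in sync by hand; the runtime check only catches a mismatch when a match actually happens.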
There is an ESLint rule that enforces the use of named capture groups to avoid bugs and improve readability: https://eslint.org/docs/rules/prefer-named-capture-group. Paired with this feature, it would be amazing. |
These helper methods kinda smell "any-ish" because of how their generic is used, but they should be safer and easier to use than trying to directly read from
/**
* Wrapper for functions to be given to `String.prototype.replace`, to make working
* with named captures easier and more type-safe.
*
* @template T the capturing groups expected from the regexp. `string` keys are named,
* `number` keys are ordered captures. Note that named captures occupy their place
* in the capture order.
* @param replacer The function to be wrapped. The first argument will have the
* shape of `T`, and its result will be forwarded to `String.prototype.replace`.
*/
export function named<T extends Partial<Record<string | number, string>> = {}>(
replacer: (
captures: { 0: string } & T,
index: number,
original: string
) => string
) {
const namedCapturesWrapper: (match: string, ...rest: any[]) => string = (
...args
) => {
const { length } = args
const named: string | Partial<Record<string, string>> = args[length - 1]
const captures: { 0: string } & T = Object.create(null)
if (typeof named === "string") {
// the regexp used does not use named captures at all
args.slice(0, -2).forEach((value, index) => {
Object.defineProperty(captures, index, {
configurable: true,
writable: true,
value
})
})
return replacer(captures, args[length - 2], named)
}
// the regexp has named captures; copy named own properties to captures,
// then copy the numeric matches.
Object.assign(captures, named)
args.slice(0, -3).forEach((value, index) => {
if (index in captures) {
throw new RangeError(
`Numeric name ${index} used as a regexp capture name`
)
}
Object.defineProperty(captures, index, {
configurable: true,
writable: true,
value
})
})
return replacer(captures, args[length - 3], args[length - 2])
}
return namedCapturesWrapper
}
// the first overload is here to preserve refinements if `null` was already
// checked for and excluded from the type of exec/match result.
/**
* Helper to extract the named capturing groups from the result of
* `RegExp.prototype.exec` or `String.prototype.match`.
*
* @template T type definition for the available capturing groups
* @param result the result of `RegExp.prototype.exec` or `String.prototype.match`
* @returns the contents of the `.groups` property but typed as `T`
* @throws if `.groups` is `undefined`; this only happens on regexps without named captures
*/
export function groups<T extends Partial<Record<string, string>> = {}>(
result: RegExpMatchArray | RegExpExecArray
): T
/**
* Helper to extract the named capturing groups from the result of
* `RegExp.prototype.exec` or `String.prototype.match`.
*
* @template T type definition for the available capturing groups
* @param result the result of `RegExp.prototype.exec` or `String.prototype.match`
* @returns the contents of the `.groups` property but typed as `T`, or `null` if
* there was no match
* @throws if `.groups` is `undefined`; this only happens on regexps without named captures
*/
export function groups<T extends Partial<Record<string, string>> = {}>(
result: RegExpMatchArray | RegExpExecArray | null
): T | null
/**
* Helper to extract the named capturing groups from the result of
* `RegExp.prototype.exec` or `String.prototype.match`.
*
* @template T type definition for the available capturing groups
* @param result the result of `RegExp.prototype.exec` or `String.prototype.match`
* @returns the contents of the `.groups` property but typed as `T`, or `null` if
* there was no match
* @throws if `.groups` is `undefined`; this only happens on regexps without named captures
*/
export function groups<T extends Partial<Record<string, string>> = {}>(
result: RegExpMatchArray | RegExpExecArray | null
): T | null {
if (result === null) {
return null
}
if (result.groups === undefined) {
throw new RangeError(
"Attempted to read the named captures of a Regexp without named captures"
)
}
return result.groups as T
}
There might be no need to copy the numeric captures, though; I only copied them because it seemed to make sense to put the matched substring in |
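The `named` wrapper relies on how `String.prototype.replace` invokes a replacer function: it passes the match, each positional capture, the offset and the whole input string, and appends a groups object only when the regexp contains named groups. A small sketch that records those arguments (`replacerArgs` is a hypothetical name):

```typescript
// Collects the arguments String.prototype.replace passes to a replacer function.
function replacerArgs(input: string, re: RegExp): unknown[] {
  let captured: unknown[] = [];
  input.replace(re, (...args: unknown[]) => {
    captured = args;
    return ""; // the replacement value does not matter here
  });
  return captured;
}

// Positional captures only: [match, p1, offset, input]
const plain = replacerArgs("a-b", /(b)/);

// One named group: [match, p1, offset, input, groups]
const withGroups = replacerArgs("a-b", new RegExp("(?<letter>b)"));

console.log(plain.length, withGroups.length); // 4 5
console.log((withGroups[4] as { letter: string }).letter); // b
```

This is why the wrapper can tell the two cases apart by checking whether the last argument is a string.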
I have an overall problem with the RegExp definition and with how objects and arrays are typed. From my point of view, allowing something like:
const x: {[x: string]: string} = {}
const y = x['foo'] // <= y is a string here
console.log(y.length)
> Uncaught TypeError: Cannot read property 'length' of undefined
Same for arrays, but, well, this one is very surprising:
const a: string[] = []
const b = a[0] // <= string - why why why?
const c = a.pop() // <= string | undefined
// and the other way around:
const t: [string] = ['foo']
const u = t[0] // <= string
const v = t.pop() // <= string | undefined - why why why? TS can infer from `if`, but not here?
In my view, this is a big misconception made for the sake of convenience. It reduces one of the greatest type systems ad absurdum. But I'm sure the core team has a different opinion on that, unfortunately. Based on the above, the definitions of RegExpMatch* aren't helpful:
interface RegExpMatchArray {
groups?: {
[key: string]: string
}
}
interface RegExpExecArray {
groups?: {
[key: string]: string
}
}
Inferring types from the regular expression is possible (from my point of view), but very complex. Instead, I would like to see more developer support for making it type-safe (pseudocode):
type RegExpMatch = {
[key: number]: string | undefined,
groups?: {
[key: string]: string | undefined
}
}
interface RegExp<T extends RegExpMatch> {
exec(string: string): T | null;
}
To make it more type-safe:
const regexp = new RegExp<{0: string, groups: {foo: string}}>('^/(?<foo>[^/]+)$')
const result = regexp.exec('/bar')
if (result !== null) {
// now you get the typings here
result[0] // <= string
result[1] // <= string | undefined (or maybe never?)
result.groups.foo // <= string
result.groups.test // <= string | undefined (or maybe never?)
}
If the developer makes a mistake in the typings, well, that's OK. It's still better than allowing everything. Somewhat related: #6579 |
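The "let the developer declare the shape" idea can be approximated without changing lib.d.ts by wrapping a RegExp. A minimal sketch, assuming the caller keeps the declared shape in sync with the pattern (nothing checks it); `MatchShape` and `typedRegExp` are hypothetical names:

```typescript
// Shape a caller may declare: numbered entries plus optional named groups.
type MatchShape = {
  [index: number]: string | undefined;
  groups?: { [name: string]: string | undefined };
};

// Wraps a RegExp so that exec() returns the declared shape T.
// The cast is unchecked: T is trusted, as in the proposal above.
function typedRegExp<T extends MatchShape>(source: string, flags?: string) {
  const re = new RegExp(source, flags);
  return {
    exec(input: string): T | null {
      return re.exec(input) as unknown as T | null;
    },
  };
}

const route = typedRegExp<{ 0: string; groups: { foo: string } }>("^/(?<foo>[^/]+)$");
const result = route.exec("/bar");
if (result !== null) {
  console.log(result[0], result.groups.foo); // /bar bar
  // result.groups.test would be a compile-time error.
}
```

Unlike the pseudocode, undeclared keys are rejected outright rather than typed as `string | undefined`.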
I think it'd be great to implement this alongside #38671, so that generic regexes keep their current typing, but regex literals have strongly typed capturing groups.
const re1 = /(?<year>[0-9]{4})-(?<month>[0-9]{2})/;
type Groups1 = NonNullable<ReturnType<typeof re1.exec>>['groups']; // Remains Record<string, string>
const re2 = /(?<year>[0-9]{4})-(?<month>[0-9]{2})/ as const;
type Groups2 = NonNullable<ReturnType<typeof re2.exec>>['groups']; // Would be { year: string, month: string }
And generalize them so that:
type hasYearAndMonth<T extends RegExp> = T extends RegExp<'year'|'month'> ? true : false;
const re1 = /(?<year>[0-9]{4})/ as const;
const re2 = /(?<year>[0-9]{4})-(?<month>[0-9]{2})/ as const;
type T1 = hasYearAndMonth<typeof re1>; // false
type T2 = hasYearAndMonth<typeof re2>; // true |
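Until literals carry group information, a comparable check can be sketched with a type-only brand whose names are asserted by the caller rather than inferred. `GroupedRegExp`, `withGroups` and `HasYearAndMonth` are hypothetical names. The check uses `infer` instead of `T extends GroupedRegExp<'year' | 'month'>`, because with this covariant encoding a regexp branded only with `'year'` would pass that test as well.

```typescript
// Type-only brand; nothing extra is stored on the RegExp at runtime.
declare const groupNames: unique symbol;
type GroupedRegExp<G extends string> = RegExp & { readonly [groupNames]?: G };

// Caller-asserted group names (an unchecked cast).
function withGroups<G extends string>(source: string, flags?: string): GroupedRegExp<G> {
  return new RegExp(source, flags) as GroupedRegExp<G>;
}

// True only when both "year" and "month" are among the declared names.
type HasYearAndMonth<T> = T extends GroupedRegExp<infer G>
  ? "year" | "month" extends G
    ? true
    : false
  : false;

const re1 = withGroups<"year">("(?<year>[0-9]{4})");
const re2 = withGroups<"year" | "month">("(?<year>[0-9]{4})-(?<month>[0-9]{2})");

// These annotations compile only if the computed types match.
const t1: HasYearAndMonth<typeof re1> = false;
const t2: HasYearAndMonth<typeof re2> = true;
console.log(t1, t2); // false true
```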
I really like the idea of extracting the named group static type information from RegExp literals. I'm curious how people imagine this 'metadata' would be associated with the RegExp literal before it's passed to
interface RegExpWithGroups<G extends { [name: string]: string }> extends RegExp {
__secret_groups_metadata__: G // don't actually try and access me this is 'hidden' type-only Metadata
}
const reg = /(?<FirstFour>.{4})(?<NextFour>.{4})/ as const;
// ^^^ RegExpWithGroups<{ FirstFour: string, NextFour: string }>
EDIT: Ah, ignore me. I re-read the thread properly and realized we don't need to expose it. |
The RegExp could be represented as a literal in the type system, e.g.:
function route(re: /(?<FirstFour>.{4})(?<NextFour>.{4})?/s) {
re.dotAll // true
const match = str.match(re);
if(match === null) return;
match[0] // string
match.groups.FirstFour // string
match[1] // string
match.groups.NextFour // string | undefined
match[2] // string | undefined
}
That'd be a lot less verbose than
Thankfully, JavaScript doesn't have branch reset groups. |
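The `string | undefined` for the optional `NextFour` group matches runtime behavior: a named group that does not participate in the match is still present on `.groups`, with the value `undefined`. A quick check, using a hand-written type in place of the proposed literal type (`RouteGroups` is a hypothetical name):

```typescript
// Hand-written version of the group types a regexp literal type could carry.
type RouteGroups = { FirstFour: string; NextFour: string | undefined };

// Same pattern and "s" (dotAll) flag as in the comment above.
const re = new RegExp("(?<FirstFour>.{4})(?<NextFour>.{4})?", "s");
console.log(re.dotAll); // true

const match = "abcd".match(re);
if (match !== null && match.groups !== undefined) {
  const groups = match.groups as RouteGroups; // unchecked cast
  console.log(groups.FirstFour); // abcd
  // The non-participating group is present, but undefined.
  console.log("NextFour" in groups, groups.NextFour); // true undefined
}
```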
For everyone who wants type safety and autocompletion on the
const output = 'hello_world.ts:13412:Missing ;.';
const m: RegExpMatchArrayWithGroups<{file: string, line: string; error: string}>
= output.match(/^(?<file>[^:]+):(?<line>[^:]+):(?<error>.*)/);
if (m && m.groups) {
// f: "hello_world.ts", l: "13412", e: "Missing ;."
console.log('f: "' + m.groups.file + '", l: "' + m.groups.line + '", e: "' + m.groups.error + '"');
// console.log(m.groups.filename);
// Property 'filename' does not exist on type '{ file: string; line: string; error: string; }'
}
The definitions:
type RegExpMatchArrayWithGroupsOnly<T> = {
groups?: {
// eslint-disable-next-line no-unused-vars
[key in keyof T]: string;
}
}
type RegExpMatchArrayWithGroups<T> = (RegExpMatchArray & RegExpMatchArrayWithGroupsOnly<T>) | null; |
Thanks @hlovdal. I took your idea and made it easier to use.
type RegExpGroups<T extends string[]> =
| (RegExpMatchArray & {
groups?:
| {
[name in T[number]]: string;
}
| {
[key: string]: string;
};
})
| null;
const output = "hello_world.ts:13412:Missing ;.";
const match: RegExpGroups<["file", "line", "error"]> = output.match(
/^(?<file>[^:]+):(?<line>[^:]+):(?<error>.*)/
);
if (match) {
const { file, line, error } = match.groups!;
console.log({ file, line, error });
} |
I personally prefer using a union type rather than an array of strings (it saves typing square brackets if you only have one group), so mine looks like this:
export type RegExpGroups<T extends string> =
| (RegExpMatchArray & {
groups?: { [name in T]: string } | { [key: string]: string };
})
| null;
Usage: |
Would it not be possible for TS to fully grok regexes instead of having to type them manually? If there's a named group, Since regexes are an integral part of JS, kinda makes sense? |
VS Code already highlights the group names inside a regex, so surely it's not hard to tokenize them inside TS and attach the names as part of the regex's type?
const r = /foo(?<qux>bar)/;
// current inference
r as RegExp
// can't it be inferred as this?
r as RegExp<{ qux: string }> |
I'd be glad to put effort towards that feature if there's not already work being done towards it, as long as someone can help me figure out where to get started in the codebase. (I've dug into it before, but that was like 8 years ago.) |
An iteration on the ideas from @hlovdal and others above, so that the type can be applied directly to the regex:
type RegExpMatchWithGroups<T extends string> = null | (Omit<RegExpExecArray, 'groups'> & { groups: { [name in T]: string | undefined } })
type RegExpWithGroups<T extends string> = Omit<RegExp, 'exec'> & {
exec(str: string): RegExpMatchWithGroups<T> | null
}
const lineMatcher = /^(?<file>[^:]+):(?<line>[^:]+):(?<error>.*)/ as RegExpWithGroups<'file' | 'line' | 'error'>
const match = lineMatcher.exec('hello_world.ts:13412:Missing ;.')
if (match) {
const {file, line, error} = match.groups
console.log(file, line, error)
} |
I've cloned TypeScript and am starting to look into how I might implement this. Any pointers on where to start looking would be appreciated to help speed it up. In the meantime, here's the simple function I'm using in imlib:
// helper
function matcher<T extends string>(regex: RegExp) {
return (str: string) => {
return str.match(regex)?.groups as { [key in T]: string } | undefined;
}
}
// examples
const isArrayFile = matcher<'ext' | 'slug'>(/\/.*(?<slug>\[.+\]).*(?<ext>\..+)\.js$/);
const isSingleFile = matcher<'ext'>(/(?<ext>\..+)\.js$/);
// usage
let match;
if (match = isArrayFile(file.path)) {
match.ext // string
match.slug // string
}
else if (match = isSingleFile(file.path)) {
match.ext // string
} |
Search Terms
named regexp, named capture groups
Motivation
Currently named capture groups are a bit of a pain in TypeScript:
.groups even when that is the only possibility.
Suggestion
I propose making RegExp higher order on its named capture groups so that .groups is well typed.
Checklist
My suggestion meets these guidelines: