Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect validation of non-utf8 characters in setHeader function #50213

Open
jk1z opened this issue Oct 17, 2023 · 4 comments
Open

Incorrect validation of non-utf8 characters in setHeader function #50213

jk1z opened this issue Oct 17, 2023 · 4 comments

Comments

@jk1z
Copy link

jk1z commented Oct 17, 2023

Version

>=18.16.0

Platform

All

Subsystem

_http_common.js

What steps will reproduce the bug?

NodeJS is actively preventing characters in the extended ascii table from passing in as a string.
Evidence: When using writeHead() with headers parameter. It throws an ERR_INVALID_CHAR error.

However, when we use setHeader(). It does not validate the characters passed in.

const { createServer } = require("http");

const requestListener = function (req, res) {
    res.setHeader("Content-Length", "310");
    res.setHeader("Content-Disposition", `attachment; filename="ÇÕÑêÿ Island"`);
    // res.writeHead(200, [
    //     "Content-Length", "310",
    //     "Content-Disposition", `attachment; filename="ÇÕÑêÿ Island"`
    // ]);
    res.end();
};
const server = createServer(requestListener);
server.listen(8080);

How often does it reproduce? Is there a required condition?

It always occurs when using the setHeader() function with latin1 (extended ASCII table)

What is the expected behavior? Why is that the expected behavior?

The expected behaviour would be setHeader() function rejects non-ascii string.

Source: According to this PR http: unify header treatment by marco-ippolito · Pull Request #46528 · nodejs/node. Header value validation always rejects non-ascii characters. To support latin1 in the Content-Disposition HTTP header, the server has to parse the string as a binary string (within the ascii range). Just like what's written in the test https://github.com/nodejs/node/blob/main/test/parallel/test-http-server-non-utf8-header.js

What do you see instead?

setHeader() does not reject non-ascii string. Content-Disposition header got parsed incorrectly.

For example: An input of attachment; filename="ÇÕÑêÿ Island" becomes attachment; filename="ýýýýý Island"

Additional information

Through debugging, I found it really strange that checkInvalidHeaderChar() is behaving weirdly with the header value passed in from the writeHead() and setHeader(). writeHead() passes a string buffer which gets regex'ed/rejected correctly but setHeader() passes a plain string and it does not trigger the regex.

@marco-ippolito
Copy link
Member

marco-ippolito commented Oct 17, 2023

for reference nodejs/undici#2020 #47479
the current behavior is to convert to latin1 without validation that has been discussed here #46395

@jk1z
Copy link
Author

jk1z commented Oct 17, 2023

@marco-ippolito
Yes, but it converted the latin1 header incorrectly if the server bypassed the header validation. The PR that broke the conversion was this #46528. The changes were with good intentions of course.

However, when we buffer a string with latin1 encoding but without explicitly stating the encoding in toString(). It loses the character information in the end.

Minimum bug reproducible code below

Buffer.from(Buffer.from("ÇÕÑêÿ Island", 'latin1').toString(), 'latin1').toString('latin1')

Explanation of how it ended up this way:

The original 2 bytes per character latin1 Ç (hex 00C7) was trimmed into 1 byte per character UTF-8 � (hex 00). When sending HTTP response headers, NodeJS converts it again into latin1 by taking the first byte of character � (hex FD) which becomes ý in the latin1 set.

@jk1z
Copy link
Author

jk1z commented Oct 17, 2023

The current workaround I found is either

convert the string into the binary string before calling setHeader()

res.setHeader("Content-Disposition", Buffer.from("ÇÕÑêÿ Island").toString("binary"));

or
setHeader("Content-Disposition", "ÇÕÑêÿ Island")
before
setHeader("Content-length", "123")
to bypass

if (isContentDispositionField(key) && self._contentLength) {

@marco-ippolito
Copy link
Member

cc @ShogunPanda

@jk1z jk1z changed the title Incorrect validation of non-utf8 characters in setHeader function Incorrect validation of non-ascii characters in setHeader function Oct 17, 2023
@jk1z jk1z changed the title Incorrect validation of non-ascii characters in setHeader function Incorrect validation of non-utf8 characters in setHeader function Oct 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants