-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fs: add Buffer support in fs methods #5616
Conversation
3913d9d
to
0af3995
Compare
Sigh... CI is red all over. Suspected as much. I'll fix up the test case. Definitely not ready to land yet. |
or only the current directory. The applies when a directory is specified, and | ||
only on supported platforms (See [Caveats][]). | ||
* `encoding` {string} Specifies the character encoding to be used for the | ||
filename passed to the listener. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd list the possible values, or link to some other place that has them.
Why are you singling out fs.watch? Either add this option to every fs function, or don't add it anywhere. |
I agree with @seishun. I think it should work like this:
Point 1 can be simulated using |
New CI: https://ci.nodejs.org/job/node-test-pull-request/1889/ @seishun @bnoordhuis ... actually I've already starting looking at the expanded set. I definitely agree that we need to address them all but I'd like to separate it out across multiple PRs... or at least multiple commits within this PR. |
@bnoordhuis ... on point 1... if
Or is the encoding passed to |
052c669
to
5a6f1db
Compare
@bnoordhuis @seishun ... ok, PTAL. Started moving this in the direction of rounding out the Buffer support for all of fs... still in a work in progress. |
I'm still curious to know what actual real-world problem this solves that #3401 doesn't. |
@seishun ... as I understand it, #3401 assumes that the filenames are always utf8 which is not always the case. This approach, while a bit more complicated, provides quite a bit more flexibility and does not change the current default behavior the way #3401 would. Either way, I think I can live with either approach. I implemented this because the issue has been fairly long standing and an alternative fix hasn't yet landed. |
One other point... even if we land #3401, there is still value, I think, in allowing a dev to specify which encoding they want the filename to be in at the API level. |
Is there evidence that this has ever not been the case for any Node.js user?
The default would need to change anyway, since currently |
Yes. See nodejs/node-v0.x-archive#2387 for example |
What would likely be an even better fix would be to have readdir and watch not perform any decoding on the filename at all and instead have it return the filename as Buffer instances. Then in the fs code we can toString() as utf8 by default but allow the |
It feels strange to expose filenames as buffers through the Conceptually, filenames are strings and code which uses the Of course, this assumes that the |
@bjouhier ... because on certain systems it's impossible to know for sure because filenames don't inherently have an encoding. They're just byte sequences. On such systems, the filenames aren't simply strings. |
After talking it over with @trevnorris a bit, I'm going to back up and refactor this more. Will update when it's closer to being done! |
@bjouhier libuv can't make assumptions about the multi-byte-ness of the read filename. hence why: require('fs').watch('/tmp/ab\u0222', function(event, filename) {
console.log(filename);
}); prints |
@jasnell If filenames are just byte streams and the system does not impose their encoding then two applications that create files may assume different encodings. Then filenames created by app1 will look weird in app2, and vice versa. This is a mess and you need extra info (where?) to associate encodings with filenames. In OSX the encoding is imposed by the system (UTF8 encoding of unicode in fully decomposed form - which has its own problems BTW), so there is no ambiguity. Same story for Windows/NTFS (UTF16 pre-composed this time). Apparently the mess is on UNIX/Linux (I did not know it, just found out - http://unix.stackexchange.com/questions/39175/understanding-unix-file-name-encoding). Given that, yes, node should expose a buffer alternative, but for all |
It seems that user's filesystem is UTF-8 overall but has individual filenames encoded in a different encoding. I don't see how this PR would help them, unless they are going to use Buffers in all of
This Node.js shouldn't add APIs that only work on a single platform. |
This is not true. libuv never makes assumptions about a filename. It simply returns the bit stream it was handed by the OS. For example: fs.watch('path/\u0222-foo.txt', function(e, filename) {
console.log(filename);
}); Output:
It's node that makes an assumption in that particular API. |
0ff56bb
to
bc23340
Compare
@trevnorris @bnoordhuis @seishun @nodejs/ctc ... Ok, just pushed a significantly reworked update on this based on conversations with @trevnorris around the implementation. PTAL. @seishun, the reworked version assumes a default encoding of UTF8 so in the most common case, if encoding is not specified, filenames come through as UTF8. |
Changing to semver-major since this includes changes to the error handling and default encoding for fs.watch results. |
This makes several changes: 1. Allow path/filename to be passed in as a Buffer on fs methods 2. Add `options.encoding` to fs.readdir, fs.readdirSync, fs.readlink, fs.readlinkSync and fs.watch. 3. Documentation updates For 1... it's now possible to do: ```js fs.open(Buffer('/fs/foo/bar'), 'w+', (err, fd) => { }); ``` For 2... ```js fs.readdir('/fs/foo/bar', {encoding:'hex'}, (err,list) => { }); fs.readdir('/fs/foo/bar', {encoding:'buffer'}, (err, list) => { }); ``` encoding can also be passed as a string ```js fs.readdir('/fs/foo/bar', 'hex', (err,list) => { }); ``` The default encoding is set to UTF8 so this addresses the discrepency that existed previously between fs.readdir and fs.watch handling filenames differently. Fixes: #2088 Refs: #3519 PR-URL: #5616 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Trevor Norris <[email protected]>
PR-URL: #5616 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Trevor Norris <[email protected]>
Landed in 4d4f353, 060e5f0, and 53a95a5 |
Thanks @trevnorris and @bnoordhuis ! |
The following significant (semver-major) changes have been made since the previous Node v5.0.0 release. * Buffer * New Buffer constructors have been added [#4682](#4682) * Previously deprecated Buffer APIs are removed [#5048](#5048), [#4594](#4594) * Improved error handling [#4514](#4514) * Cluster * Worker emitted as first argument in 'message' event [#5361](#5361). * Crypto * Improved error handling [#3100](#3100), [#5611](#5611) * Simplified Certificate class bindings [#5382](#5382) * Improved control over FIPS mode [#5181](#5181) * pbkdf2 digest overloading is deprecated [#4047](#4047) * Dependencies * Reintroduce shared c-ares build support [#5775](#5775). * V8 updated to 5.0.71.31 [#6111](#6111). * DNS * Add resolvePtr API to query plain DNS PTR records [#4921](#4921). * Domains * Clear stack when no error handler [#4659](#4659). * File System * The `fs.realpath()` and `fs.realpathSync()` methods have been updated to use a more efficient libuv implementation. This change includes the removal of the `cache` argument and the method can throw new errors [#3594](#3594) * FS apis can now accept and return paths as Buffers [#5616](#5616). * Error handling and type checking improvements [#5616](#5616), [#5590](#5590), [#4518](#4518), [#3917](#3917). * fs.read's string interface is deprecated [#4525](#4525) * HTTP * 'clientError' can now be used to return custom errors from an HTTP server [#4557](#4557). * Modules * Current directory is now prioritized for local lookups [#5689](#5689) * Symbolic links are preserved when requiring modules [#5950](#5950) * Net * DNS hints no longer implicitly set [#6021](#6021). * Improved error handling and type checking [#5981](#5981), [#5733](#5733), [#2904](#2904) * Path * Improved type checking [#5348](#5348). * Process * Introduce process warnings API [#4782](#4782). * Throw exception when non-function passed to nextTick [#3860](#3860). * Readline * Emit key info unconditionally [#6024](#6024) * REPL * Assignment to `_` will emit a warning. [#5535](#5535) * Timers * Fail early when callback is not a function [#4362](#4362) * TLS * Rename 'clientError' to 'tlsClientError' [#4557](#4557) * SHA1 used for sessionIdContext [#3866](#3866) * TTY * Previously deprecated setRawMode wrapper is removed [#2528](#2528). * Util * Changes to Error object formatting [#4582](#4582). * Windows * Windows XP and Vista are no longer supported [#5167](#5167), [#5167](#5167).
The following significant (semver-major) changes have been made since the previous Node v5.0.0 release. * Buffer * New Buffer constructors have been added [#4682](#4682) * Previously deprecated Buffer APIs are removed [#5048](#5048), [#4594](#4594) * Improved error handling [#4514](#4514) * Cluster * Worker emitted as first argument in 'message' event [#5361](#5361). * Crypto * Improved error handling [#3100](#3100), [#5611](#5611) * Simplified Certificate class bindings [#5382](#5382) * Improved control over FIPS mode [#5181](#5181) * pbkdf2 digest overloading is deprecated [#4047](#4047) * Dependencies * Reintroduce shared c-ares build support [#5775](#5775). * V8 updated to 5.0.71.31 [#6111](#6111). * DNS * Add resolvePtr API to query plain DNS PTR records [#4921](#4921). * Domains * Clear stack when no error handler [#4659](#4659). * File System * The `fs.realpath()` and `fs.realpathSync()` methods have been updated to use a more efficient libuv implementation. This change includes the removal of the `cache` argument and the method can throw new errors [#3594](#3594) * FS apis can now accept and return paths as Buffers [#5616](#5616). * Error handling and type checking improvements [#5616](#5616), [#5590](#5590), [#4518](#4518), [#3917](#3917). * fs.read's string interface is deprecated [#4525](#4525) * HTTP * 'clientError' can now be used to return custom errors from an HTTP server [#4557](#4557). * Modules * Current directory is now prioritized for local lookups [#5689](#5689) * Symbolic links are preserved when requiring modules [#5950](#5950) * Net * DNS hints no longer implicitly set [#6021](#6021). * Improved error handling and type checking [#5981](#5981), [#5733](#5733), [#2904](#2904) * Path * Improved type checking [#5348](#5348). * Process * Introduce process warnings API [#4782](#4782). * Throw exception when non-function passed to nextTick [#3860](#3860). * Readline * Emit key info unconditionally [#6024](#6024) * REPL * Assignment to `_` will emit a warning. [#5535](#5535) * Timers * Fail early when callback is not a function [#4362](#4362) * TLS * Rename 'clientError' to 'tlsClientError' [#4557](#4557) * SHA1 used for sessionIdContext [#3866](#3866) * TTY * Previously deprecated setRawMode wrapper is removed [#2528](#2528). * Util * Changes to Error object formatting [#4582](#4582). * Windows * Windows XP and Vista are no longer supported [#5167](#5167), [#5167](#5167).
The following significant (semver-major) changes have been made since the previous Node v5.0.0 release. * Buffer * New Buffer constructors have been added [#4682](#4682) * Previously deprecated Buffer APIs are removed [#5048](#5048), [#4594](#4594) * Improved error handling [#4514](#4514) * Cluster * Worker emitted as first argument in 'message' event [#5361](#5361). * Crypto * Improved error handling [#3100](#3100), [#5611](#5611) * Simplified Certificate class bindings [#5382](#5382) * Improved control over FIPS mode [#5181](#5181) * pbkdf2 digest overloading is deprecated [#4047](#4047) * Dependencies * Reintroduce shared c-ares build support [#5775](#5775). * V8 updated to 5.0.71.31 [#6111](#6111). * DNS * Add resolvePtr API to query plain DNS PTR records [#4921](#4921). * Domains * Clear stack when no error handler [#4659](#4659). * File System * The `fs.realpath()` and `fs.realpathSync()` methods have been updated to use a more efficient libuv implementation. This change includes the removal of the `cache` argument and the method can throw new errors [#3594](#3594) * FS apis can now accept and return paths as Buffers [#5616](#5616). * Error handling and type checking improvements [#5616](#5616), [#5590](#5590), [#4518](#4518), [#3917](#3917). * fs.read's string interface is deprecated [#4525](#4525) * HTTP * 'clientError' can now be used to return custom errors from an HTTP server [#4557](#4557). * Modules * Current directory is now prioritized for local lookups [#5689](#5689) * Symbolic links are preserved when requiring modules [#5950](#5950) * Net * DNS hints no longer implicitly set [#6021](#6021). * Improved error handling and type checking [#5981](#5981), [#5733](#5733), [#2904](#2904) * OS X * MACOSX_DEPLOYMENT_TARGET has been bumped up to 10.7 [#6402](#6402). * Path * Improved type checking [#5348](#5348). * Process * Introduce process warnings API [#4782](#4782). * Throw exception when non-function passed to nextTick [#3860](#3860). * Readline * Emit key info unconditionally [#6024](#6024) * REPL * Assignment to `_` will emit a warning. [#5535](#5535) * Timers * Fail early when callback is not a function [#4362](#4362) * TLS * Rename 'clientError' to 'tlsClientError' [#4557](#4557) * SHA1 used for sessionIdContext [#3866](#3866) * TTY * Previously deprecated setRawMode wrapper is removed [#2528](#2528). * Util * Changes to Error object formatting [#4582](#4582). * Windows * Windows XP and Vista are no longer supported [#5167](#5167), [#5167](#5167).
The following significant (semver-major) changes have been made since the previous Node v5.0.0 release. * Buffer * New Buffer constructors have been added [#4682](#4682) * Previously deprecated Buffer APIs are removed [#5048](#5048), [#4594](#4594) * Improved error handling [#4514](#4514) * Cluster * Worker emitted as first argument in 'message' event [#5361](#5361). * Crypto * Improved error handling [#3100](#3100), [#5611](#5611) * Simplified Certificate class bindings [#5382](#5382) * Improved control over FIPS mode [#5181](#5181) * pbkdf2 digest overloading is deprecated [#4047](#4047) * Dependencies * Reintroduce shared c-ares build support [#5775](#5775). * V8 updated to 5.0.71.31 [#6111](#6111). * DNS * Add resolvePtr API to query plain DNS PTR records [#4921](#4921). * Domains * Clear stack when no error handler [#4659](#4659). * File System * The `fs.realpath()` and `fs.realpathSync()` methods have been updated to use a more efficient libuv implementation. This change includes the removal of the `cache` argument and the method can throw new errors [#3594](#3594) * FS apis can now accept and return paths as Buffers [#5616](#5616). * Error handling and type checking improvements [#5616](#5616), [#5590](#5590), [#4518](#4518), [#3917](#3917). * fs.read's string interface is deprecated [#4525](#4525) * HTTP * 'clientError' can now be used to return custom errors from an HTTP server [#4557](#4557). * Modules * Current directory is now prioritized for local lookups [#5689](#5689) * Symbolic links are preserved when requiring modules [#5950](#5950) * Net * DNS hints no longer implicitly set [#6021](#6021). * Improved error handling and type checking [#5981](#5981), [#5733](#5733), [#2904](#2904) * OS X * MACOSX_DEPLOYMENT_TARGET has been bumped up to 10.7 [#6402](#6402). * Path * Improved type checking [#5348](#5348). * Process * Introduce process warnings API [#4782](#4782). * Throw exception when non-function passed to nextTick [#3860](#3860). * Readline * Emit key info unconditionally [#6024](#6024) * REPL * Assignment to `_` will emit a warning. [#5535](#5535) * Timers * Fail early when callback is not a function [#4362](#4362) * TLS * Rename 'clientError' to 'tlsClientError' [#4557](#4557) * SHA1 used for sessionIdContext [#3866](#3866) * TTY * Previously deprecated setRawMode wrapper is removed [#2528](#2528). * Util * Changes to Error object formatting [#4582](#4582). * Windows * Windows XP and Vista are no longer supported [#5167](#5167), [#5167](#5167).
The following significant (semver-major) changes have been made since the previous Node v5.0.0 release. * Buffer * New Buffer constructors have been added [#4682](#4682) * Previously deprecated Buffer APIs are removed [#5048](#5048), [#4594](#4594) * Improved error handling [#4514](#4514) * Cluster * Worker emitted as first argument in 'message' event [#5361](#5361). * Crypto * Improved error handling [#3100](#3100), [#5611](#5611) * Simplified Certificate class bindings [#5382](#5382) * Improved control over FIPS mode [#5181](#5181) * pbkdf2 digest overloading is deprecated [#4047](#4047) * Dependencies * Reintroduce shared c-ares build support [#5775](#5775). * V8 updated to 5.0.71.31 [#6111](#6111). * DNS * Add resolvePtr API to query plain DNS PTR records [#4921](#4921). * Domains * Clear stack when no error handler [#4659](#4659). * File System * The `fs.realpath()` and `fs.realpathSync()` methods have been updated to use a more efficient libuv implementation. This change includes the removal of the `cache` argument and the method can throw new errors [#3594](#3594) * FS apis can now accept and return paths as Buffers [#5616](#5616). * Error handling and type checking improvements [#5616](#5616), [#5590](#5590), [#4518](#4518), [#3917](#3917). * fs.read's string interface is deprecated [#4525](#4525) * HTTP * 'clientError' can now be used to return custom errors from an HTTP server [#4557](#4557). * Modules * Current directory is now prioritized for local lookups [#5689](#5689) * Symbolic links are preserved when requiring modules [#5950](#5950) * Net * DNS hints no longer implicitly set [#6021](#6021). * Improved error handling and type checking [#5981](#5981), [#5733](#5733), [#2904](#2904) * OS X * MACOSX_DEPLOYMENT_TARGET has been bumped up to 10.7 [#6402](#6402). * Path * Improved type checking [#5348](#5348). * Process * Introduce process warnings API [#4782](#4782). * Throw exception when non-function passed to nextTick [#3860](#3860). * Readline * Emit key info unconditionally [#6024](#6024) * REPL * Assignment to `_` will emit a warning. [#5535](#5535) * Timers * Fail early when callback is not a function [#4362](#4362) * TLS * Rename 'clientError' to 'tlsClientError' [#4557](#4557) * SHA1 used for sessionIdContext [#3866](#3866) * TTY * Previously deprecated setRawMode wrapper is removed [#2528](#2528). * Util * Changes to Error object formatting [#4582](#4582). * Windows * Windows XP and Vista are no longer supported [#5167](#5167), [#5167](#5167).
Pull Request check-list
Please make sure to review and check all of these items:
make -j8 test
(UNIX) orvcbuild test nosign
(Windows) pass withthis change (including linting)?
test (or a benchmark) included?
existing APIs, or introduces new ones)?
Affected core subsystem(s)
fs (/cc @trevnorris @bnoordhuis )
Description of change
Updated: Reworked the implementation based on @trevnorris feedback. This now does several things:
Fixes: #2088
Ref: #3519
See Also: #3401