-
Notifications
You must be signed in to change notification settings - Fork 3k
Enable automated running of dedupe #4761
Comments
This makes sense to me, although I would approach it more from first principles. You are going from the particular to the general in a direction I'm not that excited about. (After all, people can use their favorite watch utility: What I would be more interested in is making sure npm always minimizes duplication. That is, @isaacs has mentioned that there was a conscious choice in the past not to do this, presumably because the two-stage download is inefficient in some cases. I would ask if he wants to reconsider. Otherwise, I'd suggest making this a config you could set on all package-modifying operations. An alternative, more ambitious idea: Right now we kind of have the worst of both worlds: it sometimes dedupes automatically, if it can, and it's convenient. But usually not. This inconsistency means you get neither the benefits of complete-deduping, nor the benefits of complete isolation. (I know @defunctzombie is a fan of the latter, and has in fact written npm-install to carry that out.) It might be nice to make npm do only one of these two modes, I guess controlled by a flag. Which the default would be is unclear to me; I'd prefer dedupe, but isolated would be closer to the current default. |
I like the config flag idea, though that precludes automating the use of something like lockdown. A hook is more general purpose and opens the door to more workflow customizations we may not have considered. I'm also hesitant to endorse any of the more ambitious suggestions, simply because they seem likely to delay any resolution by quite a bit of time. |
I really think the general solution for the general problem you describe is to just use a watch utility. |
I think a watch utility on a project of any significant size is going to run into "too many open files" issues and be a regular source of headaches. It's also a barrier to entry of being able to say "just use npm... oh, and set up a watch utility, configure it to do x and y in response to these events, etc.". It will have the same issues hook scripts do now, which is not knowing when an operation has really completed. A watch utility receiving 73 update events and 284 add events is far from useful. |
Localized dependencies make module maintenance and dependency management wonderful and I never understood why npm dedupes anything at all during install time. This seems like a premature optimization at best and leads to broken, non-deterministic installs at worst (yes, I have faced these issues myself). IMO npm should not do any sort of deduplication at any time and should always install the full tree (note, this does not mean that modules have to be downloaded over and over, we already have caching to solve this). Deduplication should be opt-in and not even something npm needs to do, deferring to external tools. To answer the comment about bower, I again don't see the issue. People are complaining that the same module version is installed several times? This is a job for the package tool (browserify already handles de-duping the same code which is a far better approach than some sort of arbitrary version de-duplication). The safety net with isolated modules and dependencies allows us to move away from the world of (install this giant framework on your page to use my module) and helps keep modules small, maintainable, and tested. I will also add that I think the notion of some duplicate code in client js is really a small nitpick trying to spread into FUD as to why npm can't work for client side modules. The reality is that it works fine and the few instances where you run into some duplicate code, it doesn't matter! If it does start to matter, you can take a look at your deps and see where you might have issues, but avoiding a great module and dependency system just to save 1k is not gonna make your break your hobby project or your production website. |
For the record, I completely agree with you that the concerns over duplication are overblown. The example always cited is multiple (conflicting?) versions of jQuery, leading to a bundle larger than you ideally want to serve, but I think the chances of that happening, especially without someone noticing, are very slim. That being said, it is something that comes up a lot and simply providing a hook that can be used seems like a decent way to appease people who feel like it is a reason to avoid npm. |
I've been screwing around with dedupe on some random projects of mine and have noticed it drops the size of node_modules nearly 75%. However, it is a compelling case that browserify handles this. Perhaps it is a job for the bundling script, not the package installer. |
LOL, Bower appears to have shipped this feature today. bower/bower#718 |
@bclinkinbeard I had a quick look into what would be involved in implementing the additional hooks, seems easy enough, maybe start with this, add tests, make names better, add remove hooks, and send a PR – Might be a waste of time but it's easier to discuss concrete implementations rather than hypotheticals. |
FWIW: I am ok with deduping in the process of installation. It's basically a rewrite of lib/install.js, which is a sensitive thing to go around rewriting, but it's a rewrite that can certainly be valuable. It'd be best to approach as a multi-stage process. The first step builds the tree, doing lots of http/git stuff, then it prunes and dedupes, then fetch all the things and lay them down on the file system, then walk through the tree three times to run preinstall/install/postinstall scripts. The other benefit is that we could conceivably do a bunch of the fs stuff synchronously, and have less locking and other monkey business. |
Also, then we could have progress bars, which is pretty much even theoretically impossible with the current approach. |
For me dedupe on install isn't as big of a deal as dedupe on publish. I would rather rest assured that this will be done at publish time. |
Running dedupe and lock at publish time would be ideal for all the use cases I have. |
I don't understand dedupe on publish; published packages don't contain dependencies, so what would that accomplish? |
You can already do anything at publish time via the prepublish script: https://www.npmjs.org/doc/misc/npm-scripts.html
https://www.npmjs.org/doc/misc/npm-developers.html#Keeping-files-out-of-your-package |
@isaacs Does that mean you would prefer not to create a new hook and/or change the conditions under which postinstall runs? Or does integrating dedupe not necessarily preclude those as well? I don't know that I have a preference either way at this point, just curious. |
That will teach me to try and comment on an issue from the train. I conflated the need to educate package publishers with issues bower proponents made for needing to not include multiple versions of a library. I.e. Compiling two versions of jQuery into your bundle etc. |
I'd complain that tree-flattening/deduping will lead to problems if I If a tree-flattening dedupe were made part of installation, I can still kinda foresee issues with people writing code that (ie. Then, later, a conflict introduced by dependency changes in an explicit dependency's patch bump makes that shared junction disappear (eg. This could warrant adding a warning to the core Node |
@stuartpb If you require from a The issue with tree-flattening is more subtle, but I'd say that's an issue of education mainly. Meddling with node's module loader comes with the usual caveat of it being frozen, though node core folk might be more open to adding a warning like that. |
If not, another alternative would be to have a "nanny" bot on the npm registry that scans |
This is going to be implemented under #6912. As such, I'm going to close this ticket and any further discussion should occur over there. |
The nested versus flat dependency tree argument is literally the only thing people regularly bring up as a reason to use Bower over npm. The only reply ever given is to use
dedupe
, but there is no way to automate it. If npm is going to significantly improve adoption for client side use, I think this needs to change. You can't make the argument that people should use a different tool than they currently are while rejecting their top complaint about the tool being offered.Currently, the primary hurdle to automating
dedupe
is that running a command likenpm install --save foo
does not trigger thepostinstall
hook. (I personally think that should be considered a bug, but that is neither here nor there.) There also seem to be issues with thenpm update
hook being inconsistent but I've yet to document their exact nature.What it comes down to is there needs to be a way to run a command whenever the contents of
node_modules
changes for any reason (as a result of an npm command). This could be used to rundedupe
for people worried about such things, or even something like lockdown.I want npm to "win" as badly as anyone, but refusing to address this issue just seems like a non-starter.
/cc @isaacs @domenic @timoxley @kristoferjoseph @techwraith @robdodson @IgorMinar
The text was updated successfully, but these errors were encountered: