Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gitignoringGlob exceeding call stack - add explicit package globs to workspace? #1231

Closed
roryc89 opened this issue Jun 17, 2024 · 9 comments · Fixed by #1234
Closed

gitignoringGlob exceeding call stack - add explicit package globs to workspace? #1231

roryc89 opened this issue Jun 17, 2024 · 9 comments · Fixed by #1234

Comments

@roryc89
Copy link
Contributor

roryc89 commented Jun 17, 2024

On a very large monorepo project we find that Glob.gitignoringGlob exceeds the maximum callstack.

We can work around this by increasing the maximum stack size on node but we would rather not have to do this.

Another possible solution would be to allow explicit setting of globs to find packages and to not examine .gitignore files.

This would also allow generated, gitignored packages to be part of a workspace.

roryc89 added a commit to OxfordAbstracts/spago-with-package-globs that referenced this issue Jun 17, 2024
roryc89 added a commit to OxfordAbstracts/spago-with-package-globs that referenced this issue Jun 17, 2024
roryc89 added a commit to OxfordAbstracts/spago-with-package-globs that referenced this issue Jun 17, 2024
@f-f
Copy link
Member

f-f commented Jun 17, 2024

Let's unpack this a little:

On a very large monorepo project we find that Glob.gitignoringGlob exceeds the maximum callstack.

I'd consider this a bug. Can we rewrite Glob.gitignoringGlob to be stack-safe?

This would also allow generated, gitignored packages to be part of a workspace.

Can I ask you what's the usecase for generating packages?

In any case, you should be able to include them in the build as extra packages (i.e. not autodetected), by pointing at their path, e.g.

workspace:
  extraPackages:
    some-package:
      path: "generated-packages/some-package"

@cakekindel
Copy link
Contributor

I recently touched the gitignoringglob implementation and have a hunch that it's a traverse that i added, i'll look now and add a stack-safe PR

@cakekindel
Copy link
Contributor

@roryc89 can you try the commit in #1234 and let me know if that works on your repo? (also make a wish! 1234!)

@cakekindel
Copy link
Contributor

Here's the fork with the bundle (./bin/bundle.js) if you don't want to clone and build it yourself:
spago-4251576.zip

@roryc89
Copy link
Contributor Author

roryc89 commented Jun 18, 2024

@f-f

I'd consider this a bug. Can we rewrite Glob.gitignoringGlob to be stack-safe?

If it's possible to do this and keep it reasonably performant on a large project, sounds good to me.

Can I ask you what's the usecase for generating packages?

They are for graphql schema .purs files that we generate from schema introspection.

Your example using extraPackages would work fine for us so it wouldn't be needed for us. Having to add 100s of extra packages will be a bit noisy but I understand our use case is probably not common.

@roryc89
Copy link
Contributor Author

roryc89 commented Jun 18, 2024

@cakekindel Sorry, this was due to a mistake on my part.

We had our .gitignore in the parent directory of our purs workspace. So this meant that we were traversing all of the dirs outside of version control.

Your changes did prevent the call stack being exceeded but it took too long to run and after a few minutes of waiting I cancelled the compilation.

@f-f I still think that allowing explicit package globs would be useful in our case as we prefer not to have multiple .gitignores in our repo but also this is not a big deal and I'd be happy to close this issue if it's not something that is relevant to most users.

@f-f
Copy link
Member

f-f commented Jun 18, 2024

@roryc89 I am interested in learning about your usecase because often it's possible to arrange the build in a way that can be addressed by existing features. That's preferrable to adding features to precisely address any given build, since in the long term it leads to a situation where there are many switches that address overlapping concerns.

For example, would it be possible for you to generate all the sources in a single project instead of creating 100s of different projects?
In that way you could only have one extra package to add to the workspace file.

Is there a reason for avoiding multiple gitignores? I would consider these few lines of gitignore worthwhile if they would avoid implementing an entirely new feature in spago.
We used to ask git if a file was gitignored, but that was not very performant. I wonder if there's a way to ask git about the ignored state of an entire tree. Or I wonder if we could climb up the tree and find the root .git folder, and start reading .gitignore files from there.

@roryc89
Copy link
Contributor Author

roryc89 commented Jun 18, 2024

@f-f sure thing.

For example, would it be possible for you to generate all the sources in a single project instead of creating 100s of different projects?

We have many apps that have quite fine grained permissions on what parts of the Graphql APIs they can access. We prefer to not to have unnecessary dependencies (for faster packages builds) so we have split out the generated schema files into quite a few packages. Most of the packages are shared dependencies of the various schema packages and we typically install a few schema packages per app.

Is there a reason for avoiding multiple gitignores?

We originally found it easier to manage all our gitignore paths in a single file. With multiple .gitignoress duplication tends to creep in. But, to be honest, we have ended up with a some extra ones anyway (in others parts of the repo), so it's not something we have been strict on and I really don't see it as a big deal.

I think that allowing spago workspaces to be decoupled from git could be useful for other users but I also understand that it may not worth building/maintaining the extra config option if it's only useful to a few users.

@f-f f-f closed this as completed in #1234 Jun 24, 2024
@f-f
Copy link
Member

f-f commented Jun 24, 2024

We have many apps that have quite fine grained permissions on what parts of the Graphql APIs they can access. We prefer to not to have unnecessary dependencies (for faster packages builds) so we have split out the generated schema files into quite a few packages. Most of the packages are shared dependencies of the various schema packages and we typically install a few schema packages per app.

Ah, this makes a lot of sense - thanks for the context!

I think that allowing spago workspaces to be decoupled from git could be useful for other users but I also understand that it may not worth building/maintaining the extra config option if it's only useful to a few users.

Indeed - I've been trying to keep this part of Spago as hands-free for the user as possible, but the line between "it's magic so I don't have to worry about it" and "it's magic so I don't understand how to bend it to do my thing" is very fine sometimes.
I think a part of this comes down to convention-vs-configuration, and I'd still like to lean on the side of least-configuration for as long as it's comfortable.

Re git coupling: I think "parsing the gitignore" is the sweet spot here; we used to call git to ask if a file was gitignored, but that was slow and coupled to git. Now we just parse the files, so even if you're not using git you can still configure the ignored folders in the .gitignore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants