Project Limit Request: scancode-toolkit - 20 GB #3027
Comments
Gentle ping... we are a bit stuck and unable to do any new release at this stage... Anything I can do to help?
@cmaureir gentle ping! We have been stuck here for three weeks; it would be great if you could help us out 😄
Hey @pombredanne and @AyanSinhaMahapatra, sorry for not being able to attend to this earlier; I was out for the last couple of weeks.
@cmaureir Thank you ++ for dealing with it... I assumed this was vacation time :)
Note that we will implement some wheel-splitting ideas from @AyanSinhaMahapatra in any case, so we will make each release smaller soon too!
@cmaureir thanks a lot for increasing our limit 😄 🚀
Project URL
https://pypi.org/project/scancode-toolkit/
Does this project already exist?
New limit
20
Update issue title
Which indexes
PyPI
About the project
scancode-toolkit is a leading code-scanning tool with the most accurate license detection, and it is widely used across the industry and open-source organizations. The main project has been on PyPI for ~6 years and has been open source for longer, as the first release was in 2015.
Most of the size of scancode-toolkit comes from its database and index of license texts and rules, and from its pre-built license model, which supports one of its key features. Bundling the pre-built model saves time and computing resources, as building it would otherwise take several minutes on each installation.
Since this tool (and other tools that use the library as a dependency) is also used in CI, bundling this data and the index saves valuable time and computing resources.
We made similar requests for the main project at #2926 and for another special build at #2961 for https://pypi.org/project/scancode-toolkit/, and the reasons are basically the same. We are making this request because we have recently hit the project size limit as well: https://github.com/nexB/scancode-toolkit/actions/runs/5546159893/jobs/10126345730#step:6:791
How large is each release?
The latest releases are about 550 MB, with 5 wheels for the supported Python versions. This is required as there are non-native dependencies. See https://pypi.org/project/scancode-toolkit/32.0.6/#files
We are actively compressing every file we can.
We also reduced the number of bundled license files by two, which limited the growth of the wheel size.
As we are hitting more limits here, we will now start creating a separate project and release for all our license data, and publish it separately as one wheel. After that, the release size of this main project will be drastically reduced, since the license data was responsible for most of the size. Note also that this new project will contain one wheel per release, as the license data does not depend on Python versions. The original project will still have multiple wheels, but they should then be <10 MB each, so we should be good.
We need some time to be able to make these changes, and hence the project size limit increase request.
We only vendored a temporary version of attrs to work around version conflicts in pickling.
We are not bundling example data.
How frequently do you make a release?
Usually we had about 6-9 releases yearly, but for the last two years we have had almost 2 releases per month, due to many updates and release candidates for major updates (more so in 2023).
We expect that there would be about 1 release per month for the next year.
See https://pypi.org/project/scancode-toolkit/#history for more details.
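For a rough sense of how far the requested limit goes, here is a back-of-the-envelope sketch in Python. It assumes the ~550 MB figure is the total size of one release (all 5 wheels together), that 1 GB is 1000 MB, and it ignores by default the space already used by existing releases. The function name and the `used_mb` parameter are hypothetical, for illustration only.

```python
def releases_until_limit(limit_gb: float, release_mb: float, used_mb: float = 0.0) -> int:
    """Return how many whole releases fit in the remaining project quota.

    Assumes 1 GB = 1000 MB. `used_mb` is the space already taken by
    existing releases (hypothetical parameter, defaults to 0).
    """
    remaining_mb = limit_gb * 1000 - used_mb
    return max(0, int(remaining_mb // release_mb))


# Figures from this request: 20 GB limit, ~550 MB per release.
current = releases_until_limit(20, 550)

# Upper bound for the main project after the split: 5 wheels of <10 MB each.
# The separate license-data wheel is not counted here, as its size is not stated.
after_split = releases_until_limit(20, 5 * 10)

print(current, after_split)
```

At about 1 release per month, the first number is also roughly the number of months an empty 20 GB quota would last at the current release size; the real figure is lower, since existing releases already count against the limit.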
Code of Conduct