Project Limit Request: scancode-toolkit - 20 GB #3027

Closed
3 tasks done
AyanSinhaMahapatra opened this issue Jul 18, 2023 · 6 comments

@AyanSinhaMahapatra

AyanSinhaMahapatra commented Jul 18, 2023

Project URL

https://pypi.org/project/scancode-toolkit/

Does this project already exist?

  • Yes

New limit

20

Update issue title

  • I have updated the title.

Which indexes

PyPI

About the project

scancode-toolkit is a leading code-scanning tool with the most accurate license detection, and it is widely used across industry and open-source organizations. The main project has been on PyPI for ~6 years and has been open source for longer; its first release was in 2015.

Most of the size of scancode-toolkit comes from its database and index of license texts and rules, and from its pre-built license model, which supports one of its key features. Bundling the pre-built model saves time and computing resources, as building it would otherwise take several minutes on each installation.
Since this tool (and other tools that use it as a library dependency) is also used in CI, bundling this data and the index saves valuable time and computing resources there too.

We have made similar requests before, for the main project at #2926 and for another special build at #2961 for https://pypi.org/project/scancode-toolkit/, and the reasons are basically the same. We are making this request because we have also recently hit the project size limit: https://github.com/nexB/scancode-toolkit/actions/runs/5546159893/jobs/10126345730#step:6:791

How large is each release?

The latest releases are about 550 MB each, with 5 wheels for the supported Python versions. This is required as there are non-native dependencies. See https://pypi.org/project/scancode-toolkit/32.0.6/#files
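As a rough sanity check on the requested limit, here is a small sketch that estimates how many ~550 MB releases fit into a given project size budget. It assumes decimal units (1 GB = 1,000 MB) and equally sized releases, and `used_mb` (space already taken by earlier releases) is a hypothetical input; this is an illustration, not PyPI's actual quota accounting.

```python
def releases_that_fit(limit_gb: float, release_mb: float, used_mb: float = 0.0) -> int:
    """Return how many whole releases of `release_mb` fit in the remaining budget.

    Assumes decimal units (1 GB = 1000 MB) and that every release has the same size.
    """
    remaining_mb = limit_gb * 1000 - used_mb
    if remaining_mb <= 0:
        return 0
    # Floor division: a release that only partially fits cannot be uploaded.
    return int(remaining_mb // release_mb)


if __name__ == "__main__":
    release_mb = 550  # approximate size of a recent release (5 wheels)
    for limit_gb in (10, 20):
        n = releases_that_fit(limit_gb, release_mb)
        print(f"{limit_gb} GB budget -> about {n} releases of {release_mb} MB")
```

Starting from an empty budget, 20 GB holds about 36 such releases (roughly three years at one release per month), versus about 18 for 10 GB; passing `used_mb` subtracts whatever earlier releases already occupy.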

If you have made efforts to reduce the size of your PyPI releases, mention it here.

We are actively compressing every file we can.
We also reduced the number of bundled license files by half, which limited the growth of the wheel size.

As we keep hitting limits here, we will now start creating a separate project and release for all our license data, and publish it as a single wheel. After that, the release size of this main project will be drastically reduced, since the license data is responsible for most of it. Note also that the new project will have one wheel per release, as the license data does not depend on the Python version. The original project will still have multiple wheels, but each should then be under 10 MB, so we should be fine.
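The effect of the planned split on per-release size can be sketched with simple arithmetic. Assumptions: today's ~550 MB release is treated as 5 equal wheels of about 110 MB, most of which is license data; the standalone data wheel is hypothetically assumed to stay about that size; and the remaining per-Python-version wheels are taken at the 10 MB upper bound mentioned above. `ReleaseLayout` is a made-up helper for illustration, and real sizes will differ.

```python
from dataclasses import dataclass


@dataclass
class ReleaseLayout:
    """Hypothetical description of one release's wheels (sizes in MB)."""
    version_specific_wheels: int      # one wheel per supported Python version
    version_specific_wheel_mb: float  # size of each of those wheels
    shared_wheels: int = 0            # version-independent wheels (e.g. a data-only wheel)
    shared_wheel_mb: float = 0.0

    def total_mb(self) -> float:
        # Total upload size of one release across all its wheels.
        return (self.version_specific_wheels * self.version_specific_wheel_mb
                + self.shared_wheels * self.shared_wheel_mb)


# Current layout: license data is bundled in each of the 5 version-specific wheels.
current = ReleaseLayout(version_specific_wheels=5, version_specific_wheel_mb=110)

# Planned layout (assumed sizes): one shared data wheel plus 5 small code wheels.
split = ReleaseLayout(version_specific_wheels=5, version_specific_wheel_mb=10,
                      shared_wheels=1, shared_wheel_mb=110)

print(f"current: ~{current.total_mb():.0f} MB per release")
print(f"split:   ~{split.total_mb():.0f} MB per release (upper-bound estimate)")
```

The saving comes from storing the license data once per release instead of once per Python version, so the gap grows with each additional supported Python version.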

We need some time to make these changes, hence this request for a project size limit increase.

If you bundle other packages in your project, mention it here.

We only vendored a temporary version of attrs to work around version conflicts in pickling.

If you bundle example data in your project, mention it here.

We are not bundling example data.

How frequently do you make a release?

We usually made about 6-9 releases per year, but over the last two years we have had almost 2 releases per month due to many updates and release candidates for major versions (more so in 2023).
We expect about 1 release per month over the next year.
See https://pypi.org/project/scancode-toolkit/#history for more details.

Code of Conduct

  • I agree to follow the PSF Code of Conduct
@pombredanne

Gentle ping... we are a bit stuck and unable to do any new release at this stage... Anything I can do to help?

@AyanSinhaMahapatra
Author

@cmaureir gentle ping! We have been stuck here for three weeks; it would be great if you could help us out 😄

@cmaureir
Member

cmaureir commented Aug 7, 2023

Hey @pombredanne and @AyanSinhaMahapatra, sorry for not getting to this sooner, but I was out for the last couple of weeks.
I have set the new project limit for scancode-toolkit on PyPI to 20 GB.
Have a nice week 🎉

@cmaureir cmaureir closed this as completed Aug 7, 2023
@pombredanne

@cmaureir Thank you ++ for dealing with it... I assumed this was vacation time :)
We just slowed down our releases, so no harm done!

@pombredanne

Note that we will implement some wheel-splitting ideas from @AyanSinhaMahapatra in any case, so each release will soon be smaller too!

@AyanSinhaMahapatra
Author

@cmaureir thanks a lot for increasing our limit 😄 🚀
Really appreciate it! We will be working on splitting the wheel: this should produce one bigger wheel (the data + index, which does not need separate wheels for different Python versions) and a much, much smaller main wheel (still one per Python version).

@cmaureir cmaureir self-assigned this Jun 24, 2024
3 participants