Azure SDK is over 500MB and growing on each release. #17801
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @aznetsuppgithub.
Issue Details
The Azure SDK is ridiculously large for reasons that I have a hard time understanding. We pip install it for our CI pipelines, and the vast majority of our container's size comes from the Azure SDK. Within the SDK, the network directory takes up almost half of the space, because it contains 39 versions of the SDK. I have never seen anyone take such a strange approach to versioning their API clients. I fail to understand why anyone would even want to use the client from 2015 on a cloud product like Azure.
Can the default release provide only the latest version of the client libraries, or at least provide a 'lean' version of the SDK? This release model is certainly not sustainable and is causing needless grief to your users.
Hi @sodul, thanks for the feedback. We'll investigate ASAP.
Previously reported in #11149.
To clarify, #11149 is only about azure-mgmt-network, which is the largest directory, but the problem is present across the entire Azure SDK. I understand the reasoning behind keeping everything for backward compatibility, but customers who depend on the old versions should pin their requirements to the old pypi.org releases of the Azure SDK rather than force everyone to keep a copy of everything around. How about providing two versions of the SDK: a large one with everything, and a small one with just the latest version?
Hey, is there any update?
I wrote a script that we run after If there is interest, I can open source the script.
We have released our script on GitHub. It deletes a good chunk of the API folders, but not all of them. With the script, the Azure directory is now just under 300MB instead of over 700MB. It is compatible with most, but not all, third-party packages, as long as they do not point to a version that has been trimmed.
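As a rough illustration of where the space goes, the sketch below sums the bytes held by each per-API-version subfolder of an installed multi-API package and reports how much would be freed by keeping only the newest one. It is a hypothetical helper, not the azure-sdk-trim script. It assumes version subfolders are named like `v2023_09_01` (optionally with a `_preview` suffix), and the function names are made up for this example.

```python
import re
from pathlib import Path
from typing import Dict, Tuple

# Assumed naming scheme for per-API-version subpackages, e.g. "v2023_09_01"
# or "v2022_09_02_preview".
API_VERSION_DIR = re.compile(r"^v\d{4}_\d{2}_\d{2}(?:_preview)?$")


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def api_version_sizes(package_dir: Path) -> Dict[str, int]:
    """Map each API-version subfolder of package_dir to its size in bytes."""
    return {
        child.name: dir_size(child)
        for child in sorted(package_dir.iterdir())
        if child.is_dir() and API_VERSION_DIR.match(child.name)
    }


def _version_key(name: str) -> Tuple[str, bool]:
    # "v2023_09_01" -> ("2023_09_01", True); the zero-padded date sorts correctly
    # as a string, and a stable release beats a preview from the same date.
    return (name[1:11], not name.endswith("_preview"))


def reclaimable_bytes(package_dir: Path) -> int:
    """Bytes held by every API-version folder except the newest one."""
    sizes = api_version_sizes(package_dir)
    if not sizes:
        return 0
    newest = max(sizes, key=_version_key)
    return sum(size for name, size in sizes.items() if name != newest)
```

For example, pointing `reclaimable_bytes` at the directory of an installed `azure.mgmt.network` (such as `Path(azure.mgmt.network.__file__).parent`) would estimate the savings for that one package. It only measures and deletes nothing.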
@kristapratico Following up to see if there is any update on this issue. Thank you!
@KranthiPakala-MSFT we are working on this, and there is ongoing discussion on the issue to make sure we consider every possible impact of any decision, so that nobody is broken by it.
@lmazuel I think one old proposal that won't break anything is to release separate
Removing non-latest APIs would remove about 60% of the disk space needed. A further design issue is that some of the API definitions import prior APIs in order to have a complete set of objects. I have no idea why these API definitions were designed this way, but it is definitely not very good. I did not think of stripping comments, which means that we could probably extend
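Because newer API versions sometimes import objects from older ones, a trim step cannot simply delete every version except the newest. A minimal sketch of a safer selection, assuming such cross-version references show up as import lines that name a `vYYYY_MM_DD` subpackage: start from the newest version and transitively keep every version it imports. The helper names are hypothetical, and this is not how azure-sdk-trim is implemented.

```python
import re
from pathlib import Path
from typing import Set

# Assumed per-API-version subpackage name, e.g. "v2022_01_01" or "v2022_01_01_preview".
VERSION_NAME = re.compile(r"\bv\d{4}_\d{2}_\d{2}(?:_preview)?\b")
# Whole "import ..." / "from ... import ..." lines; comments and docstrings are ignored.
IMPORT_LINE = re.compile(r"^\s*(?:from|import)\s.*$", re.MULTILINE)


def referenced_versions(version_dir: Path) -> Set[str]:
    """Version subpackages named in import lines of any .py file under version_dir."""
    found: Set[str] = set()
    for py_file in version_dir.rglob("*.py"):
        source = py_file.read_text(encoding="utf-8", errors="ignore")
        for line in IMPORT_LINE.findall(source):
            found.update(VERSION_NAME.findall(line))
    return found


def versions_to_keep(package_dir: Path, newest: str) -> Set[str]:
    """The newest version plus every version it imports from, followed transitively."""
    keep: Set[str] = set()
    pending = [newest]
    while pending:
        name = pending.pop()
        if name in keep or not (package_dir / name).is_dir():
            continue
        keep.add(name)
        pending.extend(referenced_versions(package_dir / name) - keep)
    return keep
```

Scanning only import lines is a deliberate simplification: it misses dynamic imports such as `importlib.import_module` with a computed name, so any folder it marks as removable should still be verified by importing the trimmed package.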
@sodul Yeah, agreed. So far I have seen only keyvault being broken by your tool (which should be fixed soon, I guess: #21623). I think there are actually two scenarios we're talking about. Development: I agree, comments and docstrings are useful.
Hi @sodul. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “
/unresolve |
Thanks @iscai-msft for keeping it open.
I've noticed a significant improvement with the more recent releases of the SDK. The space used has been pretty much halved from 1.2GB to 600MB and with azure-sdk-trim we went from 600MB to 300MB.
This was with
Amazing! It's still an ongoing process, but the SDK and CLI teams have both been working on reducing the package size. Glad that you're able to see the difference!
Hi @sodul, we deeply appreciate your input into this project. Regrettably, this issue has remained unresolved for over 2 years and inactive for 30 days, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.
The latest SDK releases are back to 1.2GB somehow. Output from running https://github.com/clumio-code/azure-sdk-trim:
@msyyc do you know why the size would double?
I think it is still related to some multi-API packages (e.g. azure-mgmt-network/web/containerservice). These packages are updated frequently with new API versions, so their size keeps growing.
@msyyc @iscai-msft is there a hard limit on the size at which it will be deemed unacceptable and made a blocker for new releases? Is it 1.5GB, 2GB, 5GB, 10GB? Unless there are some drastic changes to the current SDK model, these sizes will be reached. I can't see this path being sustainable, especially in the modern container-based world.
Version 2.64.0 installed on Debian takes 1.9G with all
I will contact the CLI team for more discussion about the size issue.
There were similar issues with the ruby gem. It just kept growing and growing until support was dropped. With rapidly iterating code, it is tricky to version APIs.
One problem is that the Azure SDK team insists on bundling every single past release of the SDKs as a whole. The reasoning is that some customers might be stuck on a specific version, which I personally find dubious, since I find it difficult to believe that anyone would want to use the 9+ year old version of the networking SDK for their work. Does it even work with the API in place in 2024? If such customers insist on using such old versions, they probably also use very old, long-deprecated versions of Python and other libraries, likely full of known CVEs as well. This causes a massive inconvenience to the vast majority of the users of the SDK and

The current workaround to help reduce the size is to have newer SDKs import prior versions of the SDKs and add or override methods on top of them. While this helps reduce the incremental size, it does not fix the problem that new releases carry unnecessary files that are useless to the vast majority of the users of the SDK. If prior versions of the SDKs are still required under the same model, at the very least introduce a deprecation model where

Sorry for this long post, but a lot of the SDK and az cli users are facing very concrete issues caused by this unusual approach to SDK maintenance, and this needs to be addressed as a higher priority by the SDK PMs.
I agree, and I apologize again for all of the issues you guys have been dealing with. I'm going to reach out more to the folks working on this. Really sorry that this is still an issue.
I think your commitment to move the needle in the right direction is great.
Thanks for commenting on this thread; this is highly appreciated. We're 100% aware of this, and we have tried different approaches to solve the problem (you may have noticed that azure-mgmt-network is smaller than it used to be), but unfortunately a few tools (ironically, this includes the CLI) are designed on the assumption that they can install a multi-API SDK and load an API version by module name. I hope we can soon reach a conclusion that solves this problem for all parties involved. I'll keep you guys updated; this issue is bookmarked in my top bucket list.
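To make the "load an API version by module name" dependency concrete, here is a hypothetical loader of that kind. It assumes the multi-API layout in which each version lives in a `vYYYY_MM_DD` subpackage with its own `models` module; it is not the CLI's actual code. Once a version folder is trimmed or no longer shipped, the import fails, which this sketch turns into `None`.

```python
import importlib
from types import ModuleType
from typing import Optional


def api_version_to_module(api_version: str) -> str:
    """Map a REST api-version string to the assumed subpackage name.

    "2023-09-01" -> "v2023_09_01", "2021-02-01-preview" -> "v2021_02_01_preview"
    """
    return "v" + api_version.replace("-", "_")


def load_api_models(package: str, api_version: str) -> Optional[ModuleType]:
    """Import <package>.<vYYYY_MM_DD>.models, or return None if it is not installed."""
    name = f"{package}.{api_version_to_module(api_version)}.models"
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        # The version folder (or the whole package) is absent, e.g. after trimming.
        return None
```

A call such as `load_api_models("azure.mgmt.network", "2023-09-01")` only succeeds if that version's folder is still installed, which is why tooling written this way breaks when packages stop shipping every API version.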
Status update:
Additional update: we have all agreed, including the CLI team members, that the SDK should stop shipping multi-API SDKs entirely. An SDK will target the last known API version as the expected contract. It will still be possible to pass an We're rolling this out on all packages over the next couple of months. As part of this effort, we will also split some packages into sub-packages. For instance, We will do this split on a few packages; the exact list is TBD, but if you're aware of an SDK with a mix of various API versions, it's probably on the future list.