[libc++] Extreme preprocessed size of core headers (`vector`, `string`, etc.) #80196
Comments
We are aware that the header sizes are increasing continuously, and we are working on reducing them. This isn't a huge priority though, since it takes quite a lot of time to work out where we can avoid includes, and we already have a lot to do with actually implementing new features.
This is needed for … I tried to extract a small part of …
Thank you for the quick response. What I take away from these messages is that: …

I am spread too thin with my existing open source commitments to take on libc++ as an extra responsibility. But I must say that getting this under control seems more important than any shiny new C++23 feature could ever be. My hope with this issue was to draw attention to this problem.
I wouldn't even know how to track header file sizes in the CI without having a bunch of noise all the time.
Most of the contributions are from volunteers, and they tend to care more about the new features they want than about include times. We also expect people to move to newer standards, so improving include times for C++11–C++17 is a high cost that still doesn't solve the problem in the end. People also complained when switching from C++14 to C++17, but nobody stepped up to try to improve things.
OK, I didn't realize how bad it was (the numbers posted don't really mean much to me). The last time I checked, including …
@wjakob Out of curiosity, have you come across headers other than the ones mentioned in this issue?

I just spoke with @philnik777 and I think we may have found a solution that would really improve this issue while being easy to maintain and unlikely to cause bugs. Roughly speaking, we currently have three types of headers in the library:
From a header that contains actual code (categories 1 and 2 above), we don't want to conditionally include dependencies, because that's unmaintainable and will lead to bugs. However, conditionally including implementation-detail headers from umbrella headers is really straightforward. For example, the `<algorithm>` umbrella header could look like this:

```cpp
/*
    algorithm synopsis
    ...
*/
#include <__config>
#include <version>
#include <__algorithm/adjacent_find.h>
#include <__algorithm/all_of.h>
#include <__algorithm/any_of.h>
...
#include <__algorithm/unwrap_iter.h>
#include <__algorithm/upper_bound.h>
#if _LIBCPP_STD_VER >= 20
#  include <__algorithm/ranges_adjacent_find.h>
...
#  include <__algorithm/ranges_upper_bound.h>
#endif
```

If we did the same for e.g. the other umbrella headers, we may find issues with this approach, but it's worth trying. In particular, I think this will cause a massive change in our transitive includes, which could be downright prohibitive for adoption reasons, but we should at least try it out with a patch on a large codebase to see whether it's a possibility.
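To make the intended effect concrete, here is a small illustration of my own (not from the thread): with a guarded umbrella header like the one sketched above, a translation unit compiled as C++17 that only uses classic algorithms would never see the `ranges_*` detail headers, while C++20 users keep getting them automatically.

```cpp
// Illustration (mine, not from the thread): a C++17-only consumer of <algorithm>.
// Under the guarded umbrella above, none of the <__algorithm/ranges_*.h> detail
// headers would be preprocessed for this translation unit.
#include <algorithm>
#include <vector>

int count_zeros(const std::vector<int>& v) {
    return static_cast<int>(std::count(v.begin(), v.end(), 0));
}
```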
@ldionne I've run a few more measurements. GCC with libstdc++ does a lot better on many of these, with output roughly 3-5x smaller.
By the way, sorting is maybe not the best way to approach this problem. So, for example, the fact that …

@philnik777 Might I suggest that preprocessed file size is exactly the thing that should be tracked? The commits you referenced specify milliseconds of compilation time, which is a completely arbitrary and difficult-to-reproduce quantity. Preprocessed file size, by contrast, refers to the raw source code size with all the comments etc. stripped out. It is directly proportional to the number of tokens, i.e. the thing that actually causes work for the lexer and parser. This will have much less noise than other quantities, and you can nicely plot it over time. Nikita Popov does something conceptually similar by tracking the number of CPU instructions retired in user space instead of the raw time, which would be noisy. You can see an example here, including his commentary on a significant drop in compilation performance caused by C++17-related additions: https://www.npopov.com/2022/12/20/This-year-in-LLVM-2022.html. In any case, tracking preprocessed file size isolates the libc++-specific header size concern from other changes in LLVM/Clang itself that might affect raw compilation performance.

This is how I run these benchmarks:

```sh
clang++-17 -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES test.cpp -stdlib=libc++ -std=c++17 -E -o- | wc -c
```
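For completeness, the contents of `test.cpp` aren't shown in the thread; presumably it is just a one-line include of the header being measured, e.g.:

```cpp
// test.cpp -- assumed contents (not shown in the thread): just the header under test
#include <vector>
```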
Your metric is actually quite arbitrary too. It includes stuff like line information that doesn't take a significant amount of time to process, e.g. a file …
I'm not suggesting that this would be a benchmark to run on the user's end. It could be done just like it is done for compile time here: https://llvm-compile-time-tracker.com/graphs.php?startDate=2021-12-01&interval=100&relative=on — in other words, on a reference machine that runs …
I wouldn't call it a design failure. WG21 has created a solution for the growing header sizes: modules. So I'm not too surprised that header size is not considered a huge issue. (I'm well aware of the status of modules in implementations.)
Care to elaborate on why that would be unmaintainable? I think that would work. I created a patch with that approach a while ago: https://reviews.llvm.org/D157298
Closing, since the biggest problems have been addressed and this will be an ongoing effort, not something that will end at any particular point. If there is still a problem, please open an issue with a specific complaint.
Dear libc++ team,
I'm really concerned about the growth in preprocessed file size of core STL header files. For example, preprocessing a one-liner that does nothing but include `<vector>` with `-D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES` expands to 2,077,167 bytes (~2 megabytes)! Without the transitive-include define, it is even larger at 2,467,776 bytes (~2.4 megabytes). The problem with this is the cost it imposes on every compilation unit for using something as simple as a `std::vector`. Of those two megabytes of preprocessed code, only a tiny portion is ultimately concerned with `std::vector`.

Here are some of the things that get pulled in, for unclear reasons:
Using `vector` pulls in `string` (why?), which pulls in `__compare/strong_order.h`, which pulls in the entire C & C++ math library, which is huge. Major parts of `tuple`, `locale`, `atomic`, `mutex`, `ctime`, `typeinfo`, and `memory` are also included.

Much of this growth seems related to C++20 features that have started to affect C++17 builds. For example, all of the code in `partial_order`, `weak_order`, `strong_order`, etc. is actually `#ifdef`-ed out when compiling in C++17 mode. But that is only true for the body part; all of the dependent header files are still included. Here is an example from `__compare/strong_order.h`: basically all of the body code is disabled by `#if _LIBCPP_STD_VER >= 20`, but the `#include` directives at the top aren't.
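To illustrate the pattern being described (a simplified sketch of my own, not the verbatim libc++ source): the includes sit outside the language-version guard, so a C++17 build pays for them even though the guarded body compiles to nothing.

```cpp
// Simplified, illustrative sketch of the shape of __compare/strong_order.h
// (not the verbatim header): the #include directives are unconditional,
// while the implementation is guarded by the standard-version check.
#include <__config>
#include <cmath>   // the math machinery the issue complains about, pulled in unconditionally

#if _LIBCPP_STD_VER >= 20
// ... std::strong_order implementation, only compiled in C++20 and later ...
#endif
```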
This issue is a plea for some attention to this problem. Would it make sense to benchmark libc++ header growth for central header files needed by many projects, and to start working more actively against regressions? For projects that want to avoid the C++20+ header file growth, it would be great to trim these headers to what is truly needed in C++11, 14, or 17 builds.
Here are a few more examples, again all compiled in C++17 mode:

- `vector`
- `iostream`
- `algorithm`
- `string`
- `list`
- `forward_list`
- `iterator`
- `array`
- `variant`
- `tuple`
- `utility`