[C++] Investigate scalar.h usage and reduce cost of function.h include #36246
I did some investigation on this. With the change drafted in #39312, the results show a small reduction in build time (1350.7s vs. 1298.6s) and a noticeable reduction in the inclusion of
Here are the detailed analysis results before and after #39312:
I'm not sure whether we should proceed with the PR, though.
One thing to note, as you can see in
Update: I found a problem in CI that we should probably pay attention to:
I'm worried this is a breaking change. Thoughts?
Seems
Right, they essentially have the same problem. When user code looks like the following:
However, note that if we include
I agree that
I think we could also investigate other improvements:
Sorry, but I think you meant
Of course, will do. (Assuming we still need to keep
Well, that also depends on whether we can make
I think what we were talking about is NOT making legacy user code fail to compile, i.e., "unknown
So how does "whether
We can just keep it in
So if we can't make
Forgive my ignorance here; I'm not sure whether this is an acceptable compatibility change. I'd appreciate any clarification. Thanks.
Well, we don't really have an established policy on this. If the function still exists and there is a common spelling that works across versions (here,
@js8544 @felipecrv What do you think?
I get your point now. Really appreciate the clarification!
@zanmato1984 That's a 50-second diff. I think it's significant. Note that improvements in this area become more beneficial as the number of Acero functions grows: the cost of parsing these headers grows as O(# of functions). When build issues start showing up in build benchmarks, it's often too late.
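For scale, the figures reported in this thread translate into the following relative savings. The first two lines are plain arithmetic on those numbers; the third line is only a simplifying model of the O(# of functions) remark, where $T$ (translation units that include the header), $c$ (parse cost per declared function) and $n$ (number of functions) are assumed symbols, not measured quantities.

```math
\begin{aligned}
\text{build time (\#39312 draft):}\quad & 1350.7\,\mathrm{s} - 1298.6\,\mathrm{s} = 52.1\,\mathrm{s} \approx 3.9\% \text{ of } 1350.7\,\mathrm{s} \\
\text{parse time (final PR):}\quad & 722.3\,\mathrm{s} - 709.7\,\mathrm{s} = 12.6\,\mathrm{s} \approx 1.7\% \text{ of } 722.3\,\mathrm{s} \\
\text{assumed model:}\quad & C_{\text{parse}} \approx T \cdot c \cdot n = O(n) \text{ for fixed } T
\end{aligned}
```

Under that model, each translation unit that stops including the header saves its whole $c \cdot n$ share, so the payoff of trimming includes grows with $n$.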
IIUC, the question is about removing
Thanks for replying. Note that the time difference is not exact: ClangBuildAnalyzer results vary from run to run, probably depending on how idle the build machine is during the experiment. Still, a time reduction shows up almost every time, sometimes larger and sometimes smaller. And the inclusion times of the headers in question are definitely reduced, as shown in the attachments in my other comment.
The question is actually the opposite. The header
@felipecrv, could you please help confirm again? Thanks.
I see. I think it's a valid "breaking" change. If you don't include what you use, and compilation succeeds only because of transitive inclusion, there is little guarantee that things will keep building after the headers change. It's a common problem in C and C++, so I wouldn't block header hygiene improvements on it. If we cared a lot about not breaking header inclusion expectations, we could do the cleanup like Microsoft did with
Assuming we are going to proceed with this work, here is the most recent comparison of the include analysis, trimmed to list only Arrow's own headers. Without this PR:
With this PR:
We can see that:
The cost of including
Including
Is this result good enough already, or should I keep investigating? Thanks.
We can merge these first, and you can continue looking for more improvements if you want. Just create a more focused issue to reference in your PR (by changing the title of the PR), and keep this one open after your PR is merged.
Thanks. I've created #39357 as you suggested to tackle
### Rationale for this change

As proposed in #36246, splitting the function option structs out of `function.h` reduces how often `function.h` is included, which in turn reduces total build time. In the attached measurements, total parse time drops from 722.3s to 709.7s, and `function.h`, along with its transitive inclusion of `kernel.h`, no longer shows up among the expensive headers.

The detailed analysis results before and after this PR are attached:

[analyze-before.txt](https://github.com/apache/arrow/files/13756923/analyze-before.txt)
[analyze-after.txt](https://github.com/apache/arrow/files/13756924/analyze-after.txt)

Disclaimer (quote from #36246 (comment)):

> Note that the time diff is not absolute. The ClangBuildAnalyzer result differs from time to time. I guess it depends on the idle-ness of the building machine when doing the experiment. But the time reduction is almost certain, though sometimes more sometimes less. And the inclusion times of the questioning headers are reduced for sure, as shown in the attachments in my other comment.

### What changes are included in this PR?

Move the function option structs into their own header, `compute/options.h`, and include `options.h` instead of `function.h` wherever that suffices.

### Are these changes tested?

The build itself serves as the test.

### Are there any user-facing changes?

User code may fail to build (quote from #36246 (comment)):

> The header function.h remains in compute/api.h, with and without this PR. The proposed PR removes function.h from api_xxx.h (then includes options.h instead), as proposed in the initial description of this issue. This results in compile failures for user code which includes only compute/api_xxx.h but not compute/api.h, and meanwhile uses CallFunction which is declared in function.h.

But I think this is OK, as described in #36246 (comment).

* Closes: #39357

Authored-by: zanmato <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
Describe the enhancement requested
Scalars are usually passed around via `shared_ptr`, so we can often get away with a forward declaration. However, ClangBuildAnalyzer reports that the `scalar.h` header is included quite often. We should investigate why this is and see if we can shave a bit of time off our builds by fixing it.
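A minimal sketch of the forward-declaration pattern this paragraph relies on. Everything here is a hypothetical, simplified stand-in, not Arrow's actual `Scalar` or its headers: one file whose sections play the role of a lightweight header and of `scalar.h` plus its `.cc` file. Code that only stores or passes `std::shared_ptr<Scalar>` compiles against the bare declaration `class Scalar;`; only code that constructs a `Scalar` or calls its members needs the full definition.

```cpp
#include <cassert>
#include <memory>

// ---- Stand-in for a lightweight header ----
// Only a forward declaration is visible here. That is enough to declare
// functions and members that deal in std::shared_ptr<Scalar>.
class Scalar;

// Hypothetical holder type, for illustration only.
struct ScalarHolder {
  std::shared_ptr<Scalar> value;
};

std::shared_ptr<Scalar> MakeScalar(int v);
int ScalarValue(const std::shared_ptr<Scalar>& s);

// ---- Stand-in for the heavy header (scalar.h) and its .cc file ----
// Only translation units that construct or inspect a Scalar need this.
class Scalar {
 public:
  explicit Scalar(int v) : value_(v) {}
  int value() const { return value_; }

 private:
  int value_;
};

std::shared_ptr<Scalar> MakeScalar(int v) {
  return std::make_shared<Scalar>(v);
}

int ScalarValue(const std::shared_ptr<Scalar>& s) {
  assert(s != nullptr);  // precondition: caller passes a non-null scalar
  return s->value();
}
```

For example, `ScalarHolder h{MakeScalar(7)};` followed by `ScalarValue(h.value)` yields 7. Because `std::shared_ptr` type-erases its deleter, copying and destroying `ScalarHolder` does not require `Scalar` to be complete; a `std::unique_ptr<Scalar>` member would need the full definition wherever the holder is destroyed.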
In addition, the `function.h` include is rather heavy. It is included often because the `api_xyz.h` files in the compute module need it. However, these files only need the function options. We should see whether moving the function options into their own file helps shave down the build time.
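A minimal sketch of the proposed split, assuming a single file whose sections stand in for separate headers. All names (`FunctionOptionsSketch`, `RoundOptionsSketch`, `RoundSketch`, `CallFunctionSketch`) are hypothetical and only loosely modeled on the compute API; the rounding logic is a placeholder. The API-level declaration needs the option type to be complete (for its default argument), but it needs neither the dispatcher nor any kernel machinery, so an `api_xyz.h`-style header could include just the options header.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// ---- Stand-in for a small "options" header ----
// Holds only the option structs; cheap to parse.
struct FunctionOptionsSketch {
  virtual ~FunctionOptionsSketch() = default;
};

struct RoundOptionsSketch : FunctionOptionsSketch {
  explicit RoundOptionsSketch(int digits = 0) : ndigits(digits) {}
  int ndigits;  // number of decimal digits to keep
};

// ---- Stand-in for an api_xyz.h header ----
// Depends only on the options section above: the default argument needs
// RoundOptionsSketch to be a complete type, nothing more.
double RoundSketch(double x,
                   const RoundOptionsSketch& options = RoundOptionsSketch());

// ---- Stand-in for the heavy "function" header and its .cc file ----
// In a real code base, registries and kernels would live here, and only
// implementation files would need to include it.
double CallFunctionSketch(const std::string& name, double arg,
                          const FunctionOptionsSketch* options) {
  assert(name == "round");  // this sketch only knows one function
  const auto* opts = static_cast<const RoundOptionsSketch*>(options);
  const double scale = std::pow(10.0, opts->ndigits);
  return std::round(arg * scale) / scale;  // rounds half away from zero
}

double RoundSketch(double x, const RoundOptionsSketch& options) {
  return CallFunctionSketch("round", x, &options);
}
```

The trade-off is the one raised in this thread: code that previously got the dispatcher declaration transitively through an API header must now include the heavier header itself.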
Component(s)
C++