-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jitify2 parsing and preprocessing overhaul #131
base: jitify2
Are you sure you want to change the base?
Conversation
#90 refers to Jitify1. I take it this is in response to the performance regression between CUDA 12.1 and 12.2 (in a test case I've been investigating calls to I've not looked significantly into Jitify2 since I created #90, I presume it's now worth me looking more seriously into the viability of migrating our code to use Jitify2. |
@Robadob This is actually not in response to that issue; I only just became aware of it and do not know the root cause (possibly some header hierarchy changed and became pathological for Jitify). All significant new work is expected to be on Jitify2, so I would encourage you to migrate, and would be happy to answer any questions that come up. |
Fair, the timing must just be a coincidence. I assume the same re: header-hierarchy, but I haven't looked into it too deeply beyond tracking the number of calls to nvrtc compile program.
I'm now planning to explore the migration today (having discussed it with my team), will create a new issue if I have any problems. Edit: Have managed to migrate to |
jitify2.hpp
Outdated
*full_path = include.nonlocal_full_path(kJitifyCallbackHeaderPrefix); | ||
*full_path = path_simplify(*full_path); | ||
if (already_loaded(*full_path)) return HeaderLoadStatus::kAlreadyLoaded; | ||
if (header_callback and header_callback(include, &source)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (header_callback and header_callback(include, &source)) { | |
if (header_callback && header_callback(include, &source)) { |
I've been doing our migration with this branch, this line errors in Visual Studio 2022.
case '"': return in_include_directive_ ? quote_include() : string(); | ||
case 'u': match('8'); | ||
// fall-through | ||
[[gnu::fallthrough]]; // Not sure why gcc complains here without this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[[gnu::fallthrough]]; // Not sure why gcc complains here without this | |
#ifdef __GNUC__ | |
[[gnu::fallthrough]]; // Not sure why gcc complains here without this | |
#endif |
Visual Studio 2022
jitify2.hpp(6192): warning C5030: attribute 'gnu::fallthrough' is not recognized
jitify2.hpp
Outdated
@@ -100,16 +103,19 @@ | |||
#endif | |||
|
|||
#include <algorithm> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <algorithm> | |
#include <algorithm> | |
#include <array> |
Incomplete type jitify2.hpp:6549
(std::array<value_type, Size> data_ = {};
) under visual studio 2022.
jitify2.hpp
Outdated
Token new_tokens[kMaxNewTokens]; | ||
int j = 0; | ||
Iterator before_where = where; | ||
--before_where; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assertion here under Visual Studio 2022 Cannot decrement begin iterator
.
The begin iterator is coming from jitify2::parser::process_cuda_source()
s first call to insert_directive();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fiddling with the code (e.g. removing this line), later had a line below assert Cannot deref end iterator
when the same insert_directive()
was being called from with .end()
.
I have however got jitify2
branch working, see an edit to my earlier comment.
e1334e5
to
128e055
Compare
@Robadob Thanks for the help with this. I've pushed fixes for the issues you identified as well as some more that I found with manylinux2014. |
Visual Studio 2022 (CUDA 12.0) Log
Log
gcc (Ubuntu 10.5.0-1ubuntu1~20.04) 10.5.0 (CUDA 12.3) // get the dynamically generated header from curve rtc
headers.emplace("dynamic/curve_rtc_dynamic.h", dynamic_header);
getKnownHeaders(flamegpu_include_dir, headers); // Temp hack that prespecifies known headers
// jitify to create program (with compilation settings)
const std::string program_name = func_name + "_program"; // Does this name actually matter?
jitify2::PreprocessedProgram program = jitify2::Program(program_name, kernel_src, headers)->preprocess(options);
if (!program.ok()) { Log
ManyLinux2014 (CUDA 11.2/12.0) Now builds on CI after setting definition Visual Studio 2019 Still got the variadic template error Log
|
I think the I'm guessing the I need to find a way to reproduce the variadic template error. |
Yes, removing that line has moved Visual Studio 2022 to the same error as previously reported with gcc.
I made this change, and now it's unable to find a regular file included by #line 1 "outputdata_impl_curve_rtc_dynamic.h"
#ifndef CURVE_RTC_DYNAMIC_H_
#define CURVE_RTC_DYNAMIC_H_
#include "flamegpu/exception/FLAMEGPUDeviceException.cuh"
#include "flamegpu/detail/type_decode.h"
#include "flamegpu/runtime/detail/curve/Curve.cuh"
#include "flamegpu/util/dstring.h"
Whilst investigating this, I removed the preloading of all FLAMEGPU headers (which I presume shouldn't be required for speedup after this PR), and found it doesn't appear to be using our include directories passed via options to
It would appear I need to rewrite all of our internal includes as system includes to get any further. (This is rather undesirable, and feels like quite the breaking-change)
It's possibly a case of Visual Studio 2019 simply having poor support for If you have a windows box with Visual Studio 2022, you should be able to install the VS 2019 build tools rather than the whole of Visual Studio 2019. |
I think you need to pass The keys in the
So a key of Note that the |
I've tried this, it has no impact. It only appears able to find FLAMEGPU includes, if they are preloaded and passed via the headers map. All of our FLAMEGPU includes (and their includes) are relative to the FLAMEGPU include directory. In my local build this is
I've done this myself. jitify.hpp:7162 // Try loading from include directories.
for (const std::string& include_path : include_paths) {
*full_path = include.nonlocal_full_path(include_path);
*full_path = path_simplify(*full_path);
if (already_loaded(*full_path)) return HeaderLoadStatus::kAlreadyLoaded;
if (read_text_file(*full_path, &source)) {
return newly_loaded(std::move(source));
} else {
fprintf(stderr,
"Did not find #include %s at %s\n", include.name().c_str(), full_path->c_str());
}
} The first file it can't find is
Somehow the relative path is being interpreted wrong. That path should be (Note the extra So I've updated our code, so that the absolute path to the include directory is passed instead I think this is a case of relative to executable, rather than working directory. The first file that can't be found now has changed.
That file does exist on disk.
Digging into
However, if I make that change for all files loaded by TLDR: 2 issues
|
Thanks for the debugging. I've pushed fixes for preincludes, absolute filenames, and the line endings issue. The paths being relative to the executable directory is intentional, but I'm now re-thinking whether that's a good idea based on your feedback. |
I've now guarded out the offending code, and with the other changes discussed from my previous comment (not clear why jitify1/vanilla-jitify2 weren't complaining). The code I've been testing with now works under both the Windows and Linux. The Linux/CUDA12.3 RTC times are down to 961 & 1196ms from 24009 & 23543ms! (with preloaded FLAMEGPU headers disabled in both cases, which was the previous hack to accelerate compilation, previously 18793 & 17782ms with the preloading although it has negligible impact to performance with this branch) I don't have an immediate RTC model to test GLM with (the original thing that spurred #90), but I expect it will fare similarly. The main difference with GLM is that it's many headers all use relative paths to include each-other, which prevented it from benefiting from the header preload hack. I look forward to seeing this merged, thanks for all the help getting it working. The VS2022 warnings (in #132) are still upsetting CI, and the VS2019 std::function/varadic-template issue persists. But we can probably guard the VS2022 warnings if necessary (VS via CMake doesn't seem to obey ISystem), and we may drop support for VS2019 if necessary (that requires some internal discussion though). |
Realised we have something in our test suite, that reported 1349 & 1318ms for the two (different) kernels under CUDA 12.3. Significantly better than the original >60s reported in #90. To get any further I expected we'd need to be flattening the headers, so this is great. |
cc: @kmaehashi for vis (I need to find time to migrate to jitify2 so as to test this branch) |
410c320
to
0d42248
Compare
Hi, I've picked back up our FLAMEGPU branch that uses this version of Jitify as we're running an event soon and would benefit from the shorter compile times. A couple of (minor) issues that I've spotted, these are niggles, rather than anything we can't work around. compile_log() When compilation fails, if we call In the debugger, it appears these properties aren't even set within PreprocessedProgram, so I guess that explains why. werror Our GCC12 WError build raises this spurious looking warning here.
|
I've identified the other bug, that was getting me confused. When The Lines 193 to 195 in bf1c8c0
This is two statements. In both instances of Lines 1169 to 1177 in bf1c8c0
|
- Replaces C++ lexing/parsing/patching code with a proper lexer implementation, which significantly improves robustness and maintainability. - Replaces minification logic with robust token-based minification. - Replaces preprocessing logic with a new approach that uses custom parsing to find include directives. This only requires invoking NVRTC (and only its preprocessor) once per preprocess, which speeds up preprocessing by 50x in some cases. - Fixes include directory handling. Relative include paths are now handled robustly, and there is no longer any ambiguity between external and built-in headers. Note that relative paths (including in -I options) now start from the current executable directory instead of the current working directory. - These changes should be almost completely backwards compatible.
- This is required for building on some systems, including manylinux2014, and it doesn't hurt on other systems.
- This makes it more robust to changes in formatting in the compiler log output between different nvrtc versions.
- This appears to be a bug related to inline namespaces. - Also attempts to avoid warnings about incompatible redefinitions of constants.
- This has been fixed in recent versions.
- Adds nvtx3 to the build. - Adds NVTX ranges to key functions and ProgramCache hits/misses, gated by a macro JITIFY_ENABLE_NVTX (disabled by default). - Adds tracking of file-cache hits/misses. - Improves ProgramCache tests. Ensures that mem-miss file-hit is tested.
- Most people are using at least 17 now, and we have code that is only enabled from 17 onward.
- This was breaking some single-line if statements (thanks to @Robadob for pointing this out).
- This allows users to access the error log, options, and header info separately from the full error message. - Default error messages now no longer include options and header info; define JITIFY_VERBOSE_ERRORS=1 to restore this behaviour.
1a4bac2
to
99a3cc5
Compare
- It's unclear exactly why this is necessary, but it apparently solves a warning with GCC 12.
@Robadob I had a go at working out that |
Yeah, I wouldn't worry too much. I'd personally be happy to pull support for VS2019 in our codebase, though I can't speak for my colleagues. Just reviewed my earlier comments in this PR and I first noticed it in November 2023 so it's not a new issue. |
It looks like this is an CUDA/NVCC issue that was fixed in CUDA 11.2 after running a larger CI matrix (CUDA 11.8 with VS2019 was previoussly fine so I tweaked a workflow to run 11.0-11.8): https://github.com/FLAMEGPU/FLAMEGPU2/actions/runs/13810766631/job/38631646409 (this log will expire at some point) So will just be worth adding this as a known issue / ammending the CUDA support claims in the readme to reflect this? 11.0 is not supported by NVIDIA/CCCL already due to compiler issues (though potentially not msvc specific), and 11.3 fixes the sonames/dll names to 11.2, so 11.2 is the minimum we (FLAMEGPU) actually care about anyway. It's also worth being aware that our CI only compiles code, it does not run it (as we don't use gpu-backed ci runners). |
@benbarsdell I've tested this branch against CUDF, we have runtime-generated sources that are included as ** extra headers** and the source is then included during jit compilation. The header will have a function name, i.e. |
Fixes #90 and #107.
(This PR requires #128 and #129 to be merged first).