
[WebGPU] drawIndirect and drawIndexedIndirect #28389

Open
AIFanatic opened this issue May 15, 2024 · 13 comments

@AIFanatic

Description

Hi, are there any plans to support drawIndirect and drawIndexedIndirect? I have searched the issues and the code base and could not find any references to either.

Solution

Not entirely sure what the best approach would be here, but maybe provide a renderer.renderIndirect method that allows an array/buffer reference to be passed? I guess the WebGL backend would have to fall back to drawElements or an equivalent.
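
For reference, this is roughly the data such a method would need to accept at the WebGPU level (a sketch in plain WebGPU, assuming a GPUDevice named device; the argument layout is defined by the WebGPU spec, the other names are only illustrative):

// drawIndirect consumes 4 u32s per draw: vertexCount, instanceCount, firstVertex, firstInstance.
const drawArgs = new Uint32Array([36, 1, 0, 0]);

const indirectBuffer = device.createBuffer({
  size: drawArgs.byteLength,
  // STORAGE so a compute pass can also write the arguments on the GPU.
  usage: GPUBufferUsage.INDIRECT | GPUBufferUsage.COPY_DST | GPUBufferUsage.STORAGE,
});
device.queue.writeBuffer(indirectBuffer, 0, drawArgs);

// Inside a render pass:
// renderPass.drawIndirect(indirectBuffer, 0);
// drawIndexedIndirect uses 5 u32s instead:
// indexCount, instanceCount, firstIndex, baseVertex, firstInstance.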

Alternatives

I have implemented a Nanite equivalent in three.js, and the bottleneck is now the LOD test since it's done on the CPU. The algorithm is perfect for the GPU, and almost everything could be implemented in WebGL, but it would always require a GPU->CPU->GPU roundtrip to read how many meshlets should be displayed in order to properly call drawElements.

Additional context

No response

@CodyJasonBennett
Contributor

CodyJasonBennett commented May 15, 2024

I've been implementing exactly that, and you may take an interest in #28103 in lieu of indirect draw, or some of my GPU-driven experiments with meshlets and culling/visibility tests, particularly https://twitter.com/Cody_J_Bennett/status/1736555185541886407 and https://twitter.com/Cody_J_Bennett/status/1730911419707842973.

@AIFanatic
Author

Hi @CodyJasonBennett, I have come across your GPU-culling code before, and if I'm understanding it correctly, it only discards the fragment work by setting the w value to 0 in the vertex shader when an instance is not visible. This is my issue with current solutions, same with BatchedMesh.
I have done some tests, and discarding geometry in the vertex shader gives some performance boost but not a lot, especially considering that most meshlets won't be rendered. It would also be great to discard things at the geometry level and not at the per-vertex level: if a whole meshlet is not visible, I would like to discard it in one go instead of discarding each vertex, which wastes GPU work.
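
For clarity, the vertex-level rejection being discussed is roughly the following (a sketch, not the exact code from the linked repo; instanceVisible is an illustrative attribute written by the culling pass):

// Culled instances are collapsed to a degenerate position so the rasterizer
// skips their triangles, but every vertex invocation is still paid for.
const vertexSnippet = /* glsl */ `
  attribute float instanceVisible; // 0.0 when the culling pass rejected this instance
  void main() {
    gl_Position = projectionMatrix * modelViewMatrix * vec4( position, 1.0 );
    gl_Position *= instanceVisible; // w = 0 -> degenerate, nothing rasterized
  }
`;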

With drawIndirect we can do something like (each is a compute pass):

  1. With the meshlets' AABBs and indices, perform visibility culling
  2. Prefix sum
  3. Reindex
  4. Count visible meshlets
  5. Create indirect buffer based on above

After all of that is done, drawIndirect can be used with the GPU-resident data to render only the visible meshlets; there is no need to iterate over the invisible ones, and no GPU->CPU transfer is needed. Keep in mind that rendering the meshlets is not the issue: I have implemented something akin to an InstancedMesh for meshlets that can draw many instances in one draw call. The visibility culling is the problem; in non-web environments this is easily solvable by using mesh shaders.
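
For illustration, steps 4-5 could look roughly like this in plain WebGPU (not three.js; this assumes a GPUDevice named device, a visibleCount storage buffer filled by the culling/prefix-sum passes, an argument buffer created with INDIRECT | STORAGE usage, and a made-up fixed meshlet size):

const INDICES_PER_MESHLET = 128 * 3; // hypothetical fixed meshlet size

const writeArgsPipeline = device.createComputePipeline({
  layout: 'auto',
  compute: {
    entryPoint: 'main',
    module: device.createShaderModule({
      code: /* wgsl */ `
        struct DrawIndexedArgs {
          indexCount : u32,
          instanceCount : u32,
          firstIndex : u32,
          baseVertex : i32,
          firstInstance : u32,
        };
        @group(0) @binding(0) var<storage, read> visibleCount : u32;
        @group(0) @binding(1) var<storage, read_write> args : DrawIndexedArgs;

        @compute @workgroup_size(1)
        fn main() {
          args.indexCount = ${INDICES_PER_MESHLET}u;
          args.instanceCount = visibleCount; // one instance per visible meshlet
          args.firstIndex = 0u;
          args.baseVertex = 0;
          args.firstInstance = 0u;
        }
      `,
    }),
  },
});

// After dispatching the culling passes and this one-thread pass, the render
// pass reads the arguments straight from GPU memory:
// renderPass.drawIndexedIndirect(indirectArgsBuffer, 0);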

Off-topic, but I'm a big fan of four and am planning on using it as a base for what I'm trying to do above.

@CodyJasonBennett
Contributor

CodyJasonBennett commented May 15, 2024

FWIW, those experiments were as close as I could get to GPU-driven rendering, which is comparatively trivial in WebGPU. The use of transform feedback performs culling at the instance or batch level, and then the vertex shader will short-circuit if the resulting instance buffer is zeroed. That doesn't work so great for virtualized geometry, where you have a fixed number of vertices or a fixed amount of vertex buffer memory; you want to move data, and this can only be done with compute or carefully vectorized CPU code via WASM, with GPU driver overhead from the upload (which introduces latency). That may be the easiest path for you in the near term since you're already using METIS and Meshoptimizer.

On the topic of indirect drawing in WebGPU, or more specifically the WebGL compatibility side, my knee-jerk thought is to fall back to multi-draw (hence the backlink to my PR), but the data is different. WebGPU expects an interleaved buffer with draw arguments, whereas multi-draw expects separate buffers per argument, and they have to be consumed on the CPU. It may be best to consider this feature WebGPU-only and look at compatibility only when proven feasible (cc @RenaudRohlinger, curious about your opinion here). This is one of the flagship features of WebGPU that people migrate for specifically, alongside GPU shared memory, atomics, multisampled textures, etc. There is no WebGL 2 equivalent, even if it can be ported in a strictly worse fashion.
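
To make the layout mismatch concrete (the argument layouts come from the respective specs; everything else here is illustrative):

// WebGPU: one GPU-resident buffer, five u32s per draw, interleaved, and (as of
// this writing) only one record is consumed per drawIndexedIndirect call.
const webgpuArgs = new Uint32Array([
  // indexCount, instanceCount, firstIndex, baseVertex, firstInstance
  384, 1, 0,   0, 0, // draw 0
  384, 1, 384, 0, 0, // draw 1
]);

// WebGL 2 + WEBGL_multi_draw: separate CPU-side arrays per argument, consumed
// on the CPU by the driver.
const counts  = new Int32Array([384, 384]);
const offsets = new Int32Array([0, 384 * 4]); // byte offsets into the index buffer
// ext.multiDrawElementsWEBGL(gl.TRIANGLES, counts, 0, gl.UNSIGNED_INT, offsets, 0, 2);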

@RenaudRohlinger
Collaborator

Interesting topic here! drawIndirect is also part of my roadmap for achieving a new type of 'static' scene that I have in mind for three.js. I have started working on it here, with the support of the CAD software Plasticity.

Here is how I envision my roadmap:

  • Add RenderBundle support with this pull request.
  • Implement a single uniform buffer pass with @aardgoose's pull request (RFC: WebGPURenderer prototype single uniform buffer update / pass #27388) to batch all the writeBuffer commands into one command, already bundled by a RenderBundle.
  • Add a global UBO pipeline (a UBO for the camera and scene elements such as fog, etc.) so that only a single UBO is updated per frame, and multiply camera matrices by model matrices directly on the GPU instead of pre-calculating the modelViewMatrix on the CPU per mesh per frame (see the sketch at the end of this comment).
  • Add frustum culling support via indirect draw calls and compute shading as described in Toji's article Render Bundles and Object Culling.

Each feature on this roadmap should be cherry-pickable once shipped to three.js.
I have never studied Unreal's Nanite technology, but I believe all these features combined would be what's needed for this type of GPU rendering.
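
A rough sketch of the global UBO item above in plain WebGPU terms (not an existing three.js API; this assumes a GPUDevice named device, and the names and sizes are illustrative):

const MAX_OBJECTS = 1024; // illustrative

// One uniform buffer for the camera (view + projection, two 4x4 f32 matrices),
// written once per frame.
const cameraUBO = device.createBuffer({
  size: 2 * 64,
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});

// All model matrices live in one storage buffer instead of per-mesh uniforms.
const modelMatrices = device.createBuffer({
  size: MAX_OBJECTS * 64,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
});

function updateFrame(cameraMatricesF32, modelMatricesF32) {
  device.queue.writeBuffer(cameraUBO, 0, cameraMatricesF32);    // one write per frame
  device.queue.writeBuffer(modelMatrices, 0, modelMatricesF32); // skip if static
}

// WGSL side: the modelView multiply moves to the GPU.
// @group(0) @binding(0) var<uniform> camera : Camera;                  // view, projection
// @group(0) @binding(1) var<storage, read> models : array<mat4x4<f32>>;
// let mvp = camera.projection * camera.view * models[instanceIndex];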

@CodyJasonBennett
Contributor

I'm not sure how that relates to this issue. What's described here is indirect drawing, which would benefit from an interface with compute. That seems like a backlog of issues which affect WebGPURenderer generally.

@Mugen87 added the WebGPU label May 16, 2024
@CodyJasonBennett
Contributor

CodyJasonBennett commented May 16, 2024

I won't hijack this issue any further, but I'm happy to explore this topic in WebGPU and later WebGL 2 (with expected latency). Today, I would lean toward WASM + multi-draw, which could use #28103. This is an area I've incidentally been studying, and it was the motivation behind all the prior work I linked. Awesome progress, and great to see I'm not alone on the web, Bevy aside.

@Spiri0
Contributor

Spiri0 commented Sep 1, 2024

I'm working with AIFanatic's repo. I brought this up to r168 and WebGPU and cleaned it up a lot.
This is the first time I've heard of drawIndirect and drawIndexedIndirect. When I think of something like that, I immediately think of compute shaders. These can process very large amounts of data and store the geometry data directly in 1D/2D f32 textures.

I do this extensively in my ocean2 repo, which I upgraded to r167.1 three days ago, along with an error fix so that you can see the wireframe without issues.
It's just a mental habit that when one thinks of textures, one primarily thinks of images; f32 textures are excellent numeric data storage.
The geometry is updated with the computed geometry data from the compute shaders at every interval, which runs impressively well on the GPU for a topic as complex as IFFT.
Vertex shaders and fragment shaders can access the data textures of the compute shaders directly. So it all happens in the GPU.
That sounds to me like exactly what we're looking for here.
@AIFanatic I didn't make any pull requests in your repo because my changes to your code are very extensive. If you are interested in using compute shaders, please get in touch. I think I can help with that and with WebGPU.
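
Condensed, that pattern looks roughly like this in plain WebGPU (not the three.js node API; this assumes a GPUDevice named device, and the sizes and names are illustrative):

const GRID = 512; // illustrative resolution of the data texture

// Compute passes write positions into an rgba32float storage texture; the
// vertex shader reads it back with textureLoad, so the data stays on the GPU.
const positionTexture = device.createTexture({
  size: [GRID, GRID],
  format: 'rgba32float',
  usage: GPUTextureUsage.STORAGE_BINDING | GPUTextureUsage.TEXTURE_BINDING,
});

// Compute (WGSL):
//   @group(0) @binding(0) var positions : texture_storage_2d<rgba32float, write>;
//   textureStore(positions, id.xy, vec4f(displacedPosition, 1.0));
//
// Vertex (WGSL):
//   @group(0) @binding(0) var positions : texture_2d<f32>;
//   let p = textureLoad(positions, texelCoordFor(vertexIndex), 0).xyz;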

@CodyJasonBennett
Contributor

You should look into GPU-driven rendering and storage buffers (with mixed precision, since pure float data won't cut it). Mesh shaders can be emulated with compute shaders (with a performance penalty), but I'd think you also want a software rasterizer and a visibility buffer. VKGuide has a good intro, but this leans into very specialized engine territory, which you could build on top of an interface for indirect drawing and storage memory, which is what this issue describes. The rest is a separate exercise, probably a bit too involved for a PoC.

@Spiri0
Contributor

Spiri0 commented Sep 1, 2024

If I was wrong, I hereby apologize. I'm sorry.

@CodyJasonBennett
Contributor

I'm being bad and overloading the topic. These are just some resources for you, since very few people are interested in this area, especially within the constraints of either three.js or WebGPU. It would be great if this could be expressed with three.js, but there's a lot to it, paired with web/WebGPU limitations.

@AIFanatic
Author


As CodyJasonBennett mentioned, to make this work at top performance it would need multi-draw indirect calls, and those would have to be implemented at the WebGPU level; currently only single indirect draw calls are supported. The issue with single indirect draws is that the number of indices and vertices is fixed (think of instances), so meshes need to be split into chunks with the same number of triangles. This works well for meshes that have a lot of triangles, but for a simple cube it would end up invoking the vertex shader many times unnecessarily.
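
To make that cost concrete (made-up numbers, the real index data is omitted):

const cubeIndices = new Uint32Array(36);  // the cube's 12 triangles (data omitted)
const INDICES_PER_MESHLET = 128 * 3;      // hypothetical fixed meshlet size
// Pad the cube up to the fixed size with degenerate triangles: the unused slots
// just repeat one index, but the vertex stage still processes all of them.
const padded = new Uint32Array(INDICES_PER_MESHLET).fill(cubeIndices[35]);
padded.set(cubeIndices);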

Emulating this behavior in WebGL can be done, but the benefits are little to none since it always requires a GPU->CPU->GPU roundtrip. Think of culling: if there are 100 meshes in the scene but 90 of them are occluded, how can they be "filtered" out and not rendered? It can be done in the vertex shader, but the benefits are not great; I have benchmarked this.

Another approach I have considered is a WebGPU->WebGL2 communication layer; it can be done through canvases without the CPU roundtrip, but it's kind of a pain because of textures, etc.

Regardless, with drawIndirect a lot can be done on the GPU, such as culling, dynamic LOD, etc. I have started a fresh project based on this approach at Trident-WEBGPU.

@Spiri0
Contributor

Spiri0 commented Sep 7, 2024

I thought more about the topic after I looked more closely at AIFanatic's code and also looked more into drawIndirect/drawIndexedIndirect.

There are positionNode, normalNode, colorNode, outputNode, vertexNode, shadowNode, ... in three.js.

For this topic, I'm thinking about a drawNode that could then be used analogously to the other nodes.
I am aware that implementing this would involve a lot of effort. The drawNode would perhaps need a controlling node, like the texture node has for textures; for the drawNode it would be something like draw(indirectDrawBuffer).
This is roughly what I imagine:

const indirectDrawBuffer = ...

const drawShaderParams = {
   drawBuffer: draw(indirectDrawBuffer),
   // additional stuff
};

const material = new MeshBasicNodeMaterial();
material.drawNode = drawShader(drawShaderParams);       // compute shader
material.vertexNode = vertexShader(vertexShaderParams); // vertex shader
material.colorNode = fragmentShader(fragmentShaderParams); // fragment shader

I need to think about this more. @sunag and @RenaudRohlinger, what do you think about the idea of the drawNode?

@RenaudRohlinger
Collaborator

I'm also interested in implementing the indirectDraw API in the WebGPURenderer. First I will try it at the Renderer level.
Regarding the TSL usage, I guess a pattern like the MRTNode could indeed be interesting; it would also be great if we could avoid adding a new node to the material.
