Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EGL BadDisplay when using Wayland #5505

Open
valaphee opened this issue Apr 7, 2024 · 18 comments
Open

EGL BadDisplay when using Wayland #5505

valaphee opened this issue Apr 7, 2024 · 18 comments
Labels
backend: gles Issues with GLES or WebGL help required We need community help to make this happen. platform: linux Linux Specific Issues platform: wayland Issues with integration with linux/wayland type: bug Something isn't working

Comments

@valaphee
Copy link
Contributor

valaphee commented Apr 7, 2024

Description
When running any example on Wayland using WGPU_BACKEND=gl it crashes with:

[2024-04-07T11:57:37Z ERROR wgpu_hal::gles::egl] EGL 'eglMakeCurrent' code 0x3008: eglMakeCurrent
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: BadDisplay', wgpu-hal/src/gles/egl.rs:305:14

Repro steps
WGPU_BACKEND=gl cargo run --bin wgpu-examples

Expected vs observed behavior
Running like it does on GL/Vulkan on X11 and Vulkan onWayland

Extra materials
backtrace.txt

Platform

[2024-04-07T11:57:37Z DEBUG wgpu_hal::gles::adapter] Vendor: AMD
[2024-04-07T11:57:37Z DEBUG wgpu_hal::gles::adapter] Renderer: AMD Radeon RX 6900 XT (radeonsi, navi21, LLVM 17.0.6, DRM 3.57, 6.8.2-arch2-1)
[2024-04-07T11:57:37Z DEBUG wgpu_hal::gles::adapter] Version: OpenGL ES 3.2 Mesa 24.0.4-arch1.2
[2024-04-07T11:57:37Z DEBUG wgpu_hal::gles::adapter] SL version: OpenGL ES GLSL ES 3.2
@valaphee valaphee changed the title OpenGL EGL BadDisplay when using Wayland EGL BadDisplay when using Wayland Apr 7, 2024
@AuthorityFX
Copy link

I'm getting a similar issue.

Surface isn't supported by the adapter.

@Friz64
Copy link

Friz64 commented Jul 13, 2024

Okay I THINK I have acquired the arcane knowledge of what wgpu-hal/src/gles/egl.rs is trying to do.

I got it working, but I have committed some very cursed crimes in the process.

proof

image

Essentially the problem is exactly what the error is saying: BadDisplay. On Wayland, get_platform_display is currently fed with khronos_egl::DEFAULT_DISPLAY, here. When I use some "special unsafe debugging tricks" to get the winit window's display handle in there, that error disappears. And then when you completely disable this specialty here, it runs!

By "special unsafe debugging tricks" I mean I stored the display handle in a static beforehand, to be able to access it there. Obviously, a proper solution remains to be found.

I also tried modifying DisplayRef::Wayland to hold the display handle from test_wayland_display(), and then using that to get the platform display, but I think that behaves the same way as khronos_egl::DEFAULT_DISPLAY.

Sooo... I'm thinking the play might be to delay get_platform_display until we have access to the correct display handle in create_surface? Does that make sense? I don't know, I'm going to bed.

@tombh
Copy link

tombh commented Aug 24, 2024

I'm sure this must be obvious to most, but I got the examples working with: WGPU_BACKEND=vulkan cargo run --bin wgpu-examples bunnymark

I'm on Asahi Linux (ie ARM64 and no Vulkan support) and just wanted to see the examples running, even though they're merely being emulated on the CPU (device_type: Cpu, driver: "llvmpipe").

My own wpgu project works fine using OpenGL, it's just the majority of the examples that seem to have a problem.

@BartBM
Copy link

BartBM commented Sep 2, 2024

Bump ^^

Same issue

[2024-09-02T17:32:10Z ERROR wgpu_hal::gles::egl] EGL 'eglMakeCurrent' code 0x3008: eglMakeCurrent
thread 'main' panicked at wgpu-hal/src/gles/egl.rs:298:14:
called `Result::unwrap()` on an `Err` value: BadDisplay
[2024-09-02T17:32:10Z INFO  wgpu_core::instance] Adapter AdapterInfo { name: "AMD Radeon RX 7900 XTX (radeonsi, navi31, LLVM 17.0.6, DRM 3.57, 6.8.0-41-generic)", vendor: 4098, device: 0, device_type: Other, driver: "", driver_info: "4.6 (Core Profile) Mesa 24.0.9-0ubuntu0.1", backend: Gl }
[2024-09-02T17:32:10Z INFO  wgpu_examples::framework] Using AMD Radeon RX 7900 XTX (radeonsi, navi31, LLVM 17.0.6, DRM 3.57, 6.8.0-41-generic) (Gl)

@jimblandy
Copy link
Member

@MarijnS95 I don't want to distract you from the good work you're doing with #6152 and related topics, but: do you have any idea how we could make progress on this?

Is this just an upstream winit problem that we can't do anything about?

@MarijnS95
Copy link
Contributor

MarijnS95 commented Sep 4, 2024

@jimblandy I can surely take a look and comment on the implementation, when booted back into Linux in the coming days. It sounds like @Friz64 already grasped the problem in that passing DEFAULT_DISPLAY is not valid since the implementation now has no clue what Wayland "display connection" to use.

I did run raw-gles on Wayland successfully though (#6176 (comment)), but it uses its own initialization path and surface.

@MarijnS95
Copy link
Contributor

MarijnS95 commented Sep 5, 2024

Theoretically DEFAULT_DISPLAY is allowed, which simply creates a new socket as documented. But as also noted in "the specialty" linked by @Friz64, that wouldn't be compatible with a Surface created / "imported" from a different socket, hence this hack replaces the context in create_surface() 🤔

On a side-note, it's equally strange that eglMakeCurrent() is called with self.pbuffer = None, from Device::desotry_buffer() (while it lock()s/currents the context) during startup.


All in all it looks like wgpu got EGL/WGL context handling wrong. There is a reason raw-window-handle separated out RawWindowHandle from RawDisplayHandle, yet wgpu somehow only passes the latter around when a surface is created? This makes it impossible to correctly initialize a context API that is relevant to the current display connection (set up by winit or otherwise) and requires ugly hacks like fn open_x_display(), fn test_wayland_display() and recreating/overwriting the context.

To whomever is planning to fix these bugs, I recommend starting by deleting the entire gles module and using glutin to implement the context API going forward. A good starting point is its example which clearly highlights important platform differences.


EDIT: Just above the hack there's a call to ANativeWindow_setBuffersGeometry(). I equally don't understand why that's necessary, as the config/surface internally will make sure the backing buffer format of the buffer producer is selected/updated. From some testing very long ago (this may have been wrong), calling this function limits the available EGL configurations for that surface on future queries.

@BartBM
Copy link

BartBM commented Sep 5, 2024

Accessing the raw texture for sharing via DMA-BUF would still be possible after rewriting the gles module?
In this poc I export the raw WGPU OpenGL texture via DMA-BUF and use it in Slint or Glutin:
https://github.com/BartBM/wgpu-dma-buf/blob/main/src/texture_export_wgpu.rs

Some poc code for what I am using it
pub fn export_to_opengl_texture(texture: &Texture) -> Option<NativeTexture> {
    let mut native_texture: Option<NativeTexture> = None;
    unsafe {
        texture.as_hal::<hal::api::Gles, _, _>(|hal_texture| {
            if let Some(hal_texture) = hal_texture {
                if let wgpu_hal::gles::TextureInner::Texture { raw, target: _target } = hal_texture.inner {
                    native_texture = Some(raw);
                }
            }
        });
    }
    return native_texture;
}
pub fn export_to_dma_buf(adapter: &wgpu::Adapter, native_texture: NativeTexture, width: u32, height: u32) -> (TextureStorageMetadata, RawFd) {
    unsafe {
        adapter.as_hal::<hal::api::Gles, _, _>(|hal_adapter| {
            match hal_adapter {
                Some(adapter) => {
                    let raw_display = adapter.adapter_context().raw_display().unwrap().as_ptr();
                    let raw_context = adapter.adapter_context().raw_context();
                    let egl_instance = adapter.adapter_context().egl_instance().unwrap();

                    let fn_egl_create_image_khr = egl_instance.get_proc_address("eglCreateImageKHR");
                    let egl_create_image_khr = fn_egl_create_image_khr.expect("eglCreateImageKHR not present!");
                    let egl_function = std::mem::transmute_copy::<_, PFNEGLCREATEIMAGEKHRPROC>(&egl_create_image_khr);
                    let egl_image_khr: EGLImageKHR = (egl_function.unwrap())(
                        raw_display,
                        raw_context,
                        EGL_GL_TEXTURE_2D_KHR,
                        native_texture.0.get() as EGLClientBuffer,
                        std::ptr::null()
                    );
                    assert!(!egl_image_khr.is_null(), "eglCreateImageKHR failed");
...

@MarijnS95
Copy link
Contributor

MarijnS95 commented Sep 5, 2024

@BartBM the gles module has to be rewritten from scratch is what I'm saying :)

Depending on what your needs are, DMA-BUF sharing is also possible with Vulkan. Note that your repository appears to be private, I cannot access it.

@BartBM
Copy link

BartBM commented Sep 5, 2024

@MarijnS95 yes rewriting it is what I meant. Slint only supports displaying textures via GL, not Vulkan: https://releases.slint.dev/1.7.0/docs/rust/slint/struct.borrowedopengltexturebuilder that's why I was asking.. The other option is copying data via a SharedPixelBuffer but this is slow.

@grovesNL
Copy link
Collaborator

grovesNL commented Sep 5, 2024

I'm not sure if this is helpful, but for context some of the original workarounds were attempting to support surfaceless/headless GL contexts for initial queries, then being able to optionally associate the context with a surface afterwards through whatever context-specific APIs were available for that platform (e.g., context sharing). The implementation may have drifted over time but that was the original intent at least.

glutin and surfman didn't support this use case at the time, so wgpu would have needed a separate API for GL to fit glutin's API. Maybe glutin supports this now though?

Some other background:

@MarijnS95
Copy link
Contributor

MarijnS95 commented Sep 5, 2024

@BartBM it doesn't look like that relies on DMA-BUF (otherwise they should take file descriptors and DRM modifiers?), but instead uses the given texture as a render target when Slint renders its output using OpenGL?

Note that its safety constraints - which describe soundness when other safe-to-call functions are called, or callbacks are invoked - are exactly the ones that @ErichDonGubler shot down in #6152 (comment) / #6211 🤭


@grovesNL Thanks for that backstory, yeah it seems quite common to create a dummy window or surface to get this done. New glutin design however shows that you can typically get away without it (though I'm incredibly glad APIs like Vulkan exist that don't require a "current context" with render target well before you're actually rendering anything).

Specifically on EGL, this might be tricky if the select EGLConfig is incompatible with a window/surface down the line, requiring you to create a new context. This context could be shared with the dummy context though, allowing them to share resources while their "current-ness" is independent.
(Except that on at least wglShareLists() there are some extra constraints when calling).

EDIT: Being all honest, I'm still completely oblivious to wgpu's "async" design and would have to do quite some preliminary reading to understand how the GL context API limitations fit into this design. I.e. what needs to be queried/created up-front before a surface is known and a render pass can start?

@BartBM
Copy link

BartBM commented Sep 5, 2024

@MarijnS95 The slint::BorrowedOpenGLTextureBuilder can only display a texture when it is created in the same OpenGL context. That's why I needed access to the raw texture in WGPU, expose it using DMA-BUF and then import it in Slint. The context is different. The example code is in the link above.

@grovesNL
Copy link
Collaborator

grovesNL commented Sep 5, 2024

@MarijnS95

understand how the GL context API limitations fit into this design. I.e. what needs to be queried/created up-front before a surface is known and a render pass can start?

It's similar to the kind of information you'd get during Vulkan initialization so you can select which device to use, e.g., driver information, features, limits (https://docs.rs/wgpu-hal/22.0.0/wgpu_hal/struct.ExposedAdapter.html)

@MarijnS95
Copy link
Contributor

@BartBM thanks for making the repository public so that we can have a look. Going through DMA-BUF is quite extensive and may only be necessary for interop between APIs and/or when transferring resources across process boundaries.

If all you need is sharing resources between two GL contexts inside the same process (especially Textures) OpenGL context sharing should be exactly what you need. For example GStreamer uses these extensively to share textures and buffers across the pipeline while still being able to (un)current contexts on different threads independently of each other.

This is currently not at all possible with wgpu (and even resulted in some strange internal hacks), and is why @ErichDonGubler opened #6211 to see if we can rectify that. Even though that issue is specific to the current "impossible" soundness requirements on some unsafe functions inside the GLES backend, I don't think we can fix it without properly reimplementing context initialization with support for shared contexts and basing things off of existing RawDisplayHandles for compatibility (etc).


@grovesNL thanks for the link - preliminary reading shows that there's no RawDisplayHandle in there at the moment, so we'd have to find a way to add it.

@MarijnS95
Copy link
Contributor

@cwfitzgerald replying to #6289 (comment) here, since that issue is closed. You gave me the go-ahead to strip out the current GL backend entirely (as proposed above) and replace it with glutin. Note that I don't currently use wgpu nor GL and would need to free up a significant amount of time to get this done. While I've been "active" in maintaining (and learning about GL intricacies via) glutin and know the API design and caveats on that part, I'm not very familiar with the wgpu design yet.

One thing I wanted to get into first and foremost, as outlined in the last paragraph of #6211 (comment), is getting a raw_window_handle::DisplayHandle into Adapter so that we can create a Context for the adequate windowing system. Is that something you think is feasible, or otherwise achievable by creating the context in a different place? I'd hate to have to persist hacks that overwrite the context/surface whenever it finally becomes known to wgpu.

@cwfitzgerald
Copy link
Member

cwfitzgerald commented Jan 14, 2025

One thing I wanted to get into first and foremost, as outlined in the last paragraph of #6211 (comment), is getting a raw_window_handle::DisplayHandle into Adapter so that we can create a Context for the adequate windowing system. Is that something you think is feasible, or otherwise achievable by creating the context in a different place

So we already do this on WebGL. As all contexts come from canvases, we require that you call create_surface before you call enumerate adapters, or else you simply won't get any adapters. We could have something like this requirement for native too, though the big missing piece would be how to deal with headless adapters. The advantage of how the situation is currently set up is that you can make an adapter without regard for the surface you will use it on, or share between surfaces etc. This seems to be plausibly defined, at least in my understanding of WGL and EGL and it would be disappointing to lose that ability.

We do currently have the notion of surface compatibility, so if we could expose one adapter per surface, then one headless adapter, we could have the surface compatibility logic give you the correct surface.

If we do want to take on this limitation, we'd need to think about user education and if there is a better way to expose this causality to get people to fall into the pit of success more easily.

Additionally we need to think about surfaces being unconfigured for a given adapter. This is a not-particularly-well-tested part of the api and probably is horribly broken in a lot of ways anyway, but the api does exist and we should at least put some passing thought to it.

@alextechcc
Copy link

alextechcc commented Feb 4, 2025

Is there any experimental branch someone has that has a "ugly" fixed BadDisplay version I can experiment with? I'd like to contribute here and I've done my share of playing with test_wayland_display but it's a bit difficult to piece together how things still crash even with the hack here

#[cfg(not(Emscripten))]
(Rwh::Wayland(_), raw_window_handle::RawDisplayHandle::Wayland(display_handle)) => {
if inner
.wl_display
.map(|ptr| ptr != display_handle.display.as_ptr())
.unwrap_or(true)
{
/* Wayland displays are not sharable between surfaces so if the
* surface we receive from this handle is from a different
* display, we must re-initialize the context.
*
* See gfx-rs/gfx#3545
*/
log::warn!("Re-initializing Gles context due to Wayland window");
use std::ops::DerefMut;
let display_attributes = [khronos_egl::ATTRIB_NONE];
let display = unsafe {
inner
.egl
.instance
.upcast::<khronos_egl::EGL1_5>()
.unwrap()
.get_platform_display(
EGL_PLATFORM_WAYLAND_KHR,
display_handle.display.as_ptr(),
&display_attributes,
)
}
.unwrap();
let new_inner = Inner::create(
self.flags,
Arc::clone(&inner.egl.instance),
display,
inner.force_gles_minor_version,
)?;
let old_inner = std::mem::replace(inner.deref_mut(), new_inner);
inner.wl_display = Some(display_handle.display.as_ptr());
drop(old_inner);
}
}
.

I'm getting the same issue with Vivante and PowerVR drivers (NXP/TI embedded processors on GL)

@cwfitzgerald cwfitzgerald added help required We need community help to make this happen. platform: linux Linux Specific Issues labels Feb 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: gles Issues with GLES or WebGL help required We need community help to make this happen. platform: linux Linux Specific Issues platform: wayland Issues with integration with linux/wayland type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests