Brave story - graphics glitches (debugged) #6754
The first problem might be a bit of a pain to fix. We would possibly have to decode the vertices differently to apply flat shading (GLES 2.0 doesn't support flat.) I'm not really sure, since OpenGL is not my strong suit. @hrydgard?

The second problem is somewhat hard. We try to detect when a game uses a render surface as a texture, because downloading the render to memory is expensive. How different is the height? Are the addresses identical? Oh, did you mean it actually renders the clut palette?

The third problem: yes, there are several games the vertex cache causes problems for. And optimizations have been made that have left the vertex cache not much faster in some games. I would like to disable it by default, personally, but there are a lot of games that work fine and for which it is still much faster. People end up complaining that the new release is slower and will just use the older version (with its other bugs.) Another idea might be to find a way to make the vertex cache less aggressive in cases where it's wrong, somehow, or to learn from its failures better. -[Unknown] |
Actually, 1) is even easier: if the game is rendering just RECT primitives, we can just copy the color from the first to the second vertex (or the other way around?) before expanding to a RECT. Yeah, the vertex cache is a big, tricky tradeoff between glitches and speed that could probably use more work. |
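For illustration, a minimal sketch of that expansion; the struct and function names are made up, and which input vertex carries the flat color is exactly the open question above, so the choice below is an assumption:

```cpp
#include <cstdint>

// Hypothetical transformed-vertex layout, just for this sketch.
struct TransformedVertex {
	float x, y, z;
	float u, v;
	uint32_t color;  // RGBA8888
};

// Expands one RECT prim (v0, v1 = opposite corners) into four corners,
// broadcasting one vertex's color to all of them to emulate flat shading.
static void ExpandRectFlat(const TransformedVertex &v0, const TransformedVertex &v1,
                           TransformedVertex out[4]) {
	out[0] = v0;                                    // top-left
	out[1] = v0; out[1].x = v1.x; out[1].u = v1.u;  // top-right
	out[2] = v0; out[2].y = v1.y; out[2].v = v1.v;  // bottom-left
	out[3] = v1;                                    // bottom-right
	// Assumption: the second vertex provides the flat color (may be the first).
	for (int i = 0; i < 4; i++)
		out[i].color = v1.color;
}
```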
Maybe we could bail all flat prims to software transform? I wonder how common they are. -[Unknown] |
For ES 2.0, where flat is not available, we would have to add some extra code to "de-index" triangles too, similarly to how we expand rectangles, in order to set vertex colors separately. Although that would be pretty easy. I don't think enabling flat shading is very common; maybe we should just add some reporting and see. It could be that games enable it for 2D stuff for no good reason, then leave it enabled when drawing non-lit 3D stuff by accident, and then we could lose performance... |
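A hedged sketch of the de-indexing idea, assuming hypothetical types and that the last vertex of each triangle provides the flat color:

```cpp
#include <cstdint>
#include <vector>

struct Vtx {
	float pos[3];
	float uv[2];
	uint32_t color;
};

// Turns an indexed triangle list into a flat array so each triangle owns its
// three vertices, then broadcasts one color per triangle for flat shading.
static std::vector<Vtx> DeindexFlat(const std::vector<Vtx> &verts,
                                    const std::vector<uint16_t> &inds) {
	std::vector<Vtx> out;
	out.reserve(inds.size());
	for (size_t i = 0; i + 2 < inds.size(); i += 3) {
		Vtx a = verts[inds[i]], b = verts[inds[i + 1]], c = verts[inds[i + 2]];
		// Assumption: the last vertex is the provoking one for flat shading.
		a.color = c.color;
		b.color = c.color;
		out.push_back(a);
		out.push_back(b);
		out.push_back(c);
	}
	return out;
}
```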
Can definitely detect it, but those checks have a cost. If this is the only game that renders directly to CLUT, maybe we could do something specific like we had to do for Dangan Ronpa... |
Does the CLUT load address match the render address exactly? 0x041ff800 and 0x041ffc00, I mean. It might not be too bad to call PerformMemoryDownload for the CLUT size in that case. -[Unknown] |
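A minimal sketch of that check, assuming a PerformMemoryDownload like the one mentioned above and a hypothetical framebuffer struct:

```cpp
#include <cstdint>

// Assumed to exist elsewhere: reads a GPU render target back into PSP RAM.
void PerformMemoryDownload(uint32_t addr, uint32_t sizeBytes);

struct FramebufferInfo {
	uint32_t addr;       // e.g. 0x041ff800
	uint32_t sizeBytes;  // estimated extent of the render target
};

// On CLUT load: if the palette source lies inside the current render target,
// flush the rendered pixels to memory so the CLUT sees the drawn colors.
void CheckClutAgainstRenderTarget(uint32_t clutAddr, uint32_t clutBytes,
                                  const FramebufferInfo &fb) {
	if (clutAddr >= fb.addr && clutAddr < fb.addr + fb.sizeBytes) {
		PerformMemoryDownload(clutAddr, clutBytes);
	}
}
```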
Oh, and about the height - we have no other choice but to estimate it. Games do not need to set any form of height to render. They just set a start address, a stride, and send coordinates. This is actually why "read framebuffers to ram" crashes, since it simply writes all renders to RAM always, and does so based on estimates which aren't always right. We actually have a bug about removing or at least hiding that option. -[Unknown] |
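A toy illustration of why the height is a guess: games supply only a base address and a stride, so the estimate has to be grown from what's actually drawn (the names and the 512-line cap here are assumptions):

```cpp
#include <algorithm>

struct EstimatedFramebuffer {
	unsigned int addr;  // base address set by the game
	int stride;         // in pixels, set by the game
	int heightEst;      // our guess, refined as draws come in

	// Called per draw with the lowest row it touched; grow the estimate,
	// capped at an assumed maximum render height.
	void NoteDraw(int maxYDrawn) {
		heightEst = std::min(std::max(heightEst, maxYDrawn + 1), 512);
	}
};
```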
Yes, the framebuffer ptr matches the clut load address exactly in Brave Story. Curious - are there any discussions about alternative graphics rendering approaches (i.e. somewhat slower, but still fast on desktop and high quality) that would give better compatibility? |
I think detecting a direct clut would not be so expensive (on clut load), will see. Do you have a savestate there with both enemies? I beat the game a while back on my psp... don't remember where such enemies were common. Hmm. Only fragmented ones. The best place may be the forum or a new bug if we have a specific direction in mind. The software renderer could use SIMD and definitely has optimization potential. There are still bugs and things we have not tested or fully understood yet in it. But on desktop I think it's feasible to be way faster. Something that gave us more direct access to unswizzled vram, if it existed, would of course be awesome. -[Unknown] |
I uploaded a savestate with the colorful rabbits: |
Looks like it also uses a block transfer for the same CLUT when rendering other enemies. So if we always download the rendered pixels, we end up wrong for the other rabbit (and the render is not updated because the single row of pixels in the top left uses the wrong stride.) Probably the best thing, in this specific case at least, would be to detect a render to an address that is used as a clut, and flush it right away. It might not be so expensive to check whenever switching render targets, will just need to keep track of (and decimate) a list of CLUT addresses. A bit hacky though. Ugh. -[Unknown] |
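A rough sketch of that tracking idea, with entirely hypothetical names; the Decimate step is what keeps the per-switch scan cheap:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct ClutUse {
	uint32_t addr;
	int lastFrameUsed;
};

class ClutTracker {
public:
	// Record every CLUT load so later render-target switches can be checked.
	void NoteClutLoad(uint32_t addr, int frame) {
		for (auto &c : cluts_) {
			if (c.addr == addr) { c.lastFrameUsed = frame; return; }
		}
		cluts_.push_back({addr, frame});
	}

	// Called when switching render targets: does the old target need a flush?
	bool AddressUsedAsClut(uint32_t addr) const {
		for (const auto &c : cluts_)
			if (c.addr == addr) return true;
		return false;
	}

	// Drop entries not seen for a while to keep the list (and scan) small.
	void Decimate(int frame, int maxAge = 30) {
		cluts_.erase(std::remove_if(cluts_.begin(), cluts_.end(),
			[&](const ClutUse &c) { return frame - c.lastFrameUsed > maxAge; }),
			cluts_.end());
	}

private:
	std::vector<ClutUse> cluts_;
};
```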
By the way, it's unclear in general from which of the two input vertices of a rectangle we should take the color for each of the four output vertices, flat mode or not. I think we'll need some testing there, just like the UVs as mentioned in #3902. |
@whyisthisfieldhere yes, a way faster software renderer could be made, but it would be a lot of work. Maybe we could even hook in JPCSP's new "external software renderer" although I'm not sure if they would be OK with that. |
Does #6855 help with the vertex cache flickering? -[Unknown] |
Still flickering, but not as strong. |
The smoke is fixed in #7115. |
Thanks for testing, that's good. It's expected that the enemy colors aren't working, since we have no checks for rendering to a palette yet... |
Even forcing the vertex cache to rehash every draw, the flicker is still there. In fact, deleting the vbo/ebo and regenerating it every time still has flicker. I also tried disabling draws by vert count (they're all either 4, 18, 138, 384, 3750, or 6528 in the area I was testing.) Different areas and types of flicker were caused by different counts of verts. All but the 4-vert draws used indexes, and disabling all draws using indexes did kill the flicker. After drawing the background, it uses a texture (0x041977f0 in my case) to texture the ground (you can see the texturing in the video.) It uses fixed blending to do this. The following completely eliminated the flicker:

```cpp
// Skip the vertex cache whenever texture coords come from UV generation.
if (gstate.getUVGenMode() != GE_TEXMAP_TEXTURE_COORDS)
	useCache = false;
```

This only affected ~4% of draws. So I think this means we have some bug in the vertex cache affecting UVgen. But it could just be a coincidence (something else affecting exactly the same verts.) -[Unknown] |
Hm, interesting. What kind of bug could that be that would not affect non-cached drawing? We do have a slightly different path when uvgen is active, but only when texture coord prescale is enabled... |
I'm not sure. I wasn't able to find anything - the shader id is the same, the vertex format is the same, etc. The issue only happens when using a vbo - using an ebo but no vbo doesn't do it. And prescale doesn't affect it. Strange... -[Unknown] |
This is pretty much only tested with Brave Story. See hrydgard#6754. There may be other cases which are not handled yet.
For reference, a version of render-to-clut that does work for this game: However, the performance hit is not good at all. 815% -> 420% in battle with the rabbits. -[Unknown] |
Cool! But yeah, that's surely the downloads killing performance. Will be even worse on mobile, most likely.. I fear we will have to "depal" on-GPU, either directly when rendering (so we have to do the filtering ourselves too) or in a prepass like we do with paletted texturing from framebuffers, as we discussed. |
Indeed, I just wanted to validate the uploads to the clut, etc. (since the rabbits are rendered differently - one is from a block transfer, the other from a render.) The upload to the clut (to the clut framebuffer, I mean) isn't free either, but it's not nearly as bad. Without the download it's about 780%. -[Unknown] |
Yeah, uploads are much cheaper because they can be queued by the driver and don't stall rendering for longer than it takes to copy the data into the driver's buffer queue. Downloads, on the other hand, require the GPU to catch up to a certain point and sync, which is terrifyingly expensive in comparison, as you know. |
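To make the asymmetry concrete, the two standard GL calls in simplified form (assuming the texture was already allocated): the upload returns once the data is in the driver's queue, while the download blocks until the GPU has caught up:

```cpp
#include <GLES2/gl2.h>

// Upload: can be queued by the driver; returns once the pixel data has been
// copied into the command stream, without waiting for the GPU.
void UploadClutRow(GLuint tex, const void *pixels, int width) {
	glBindTexture(GL_TEXTURE_2D, tex);
	glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, 1,
	                GL_RGBA, GL_UNSIGNED_BYTE, pixels);
}

// Download: forces the GPU to finish all work touching the framebuffer and
// stalls the CPU until the pixels are back - the expensive sync.
void DownloadRender(void *dest, int width, int height) {
	glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, dest);
}
```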
Also fixes #6162 |
Neat, thanks for testing. I should've realized it might be doing that. @hrydgard, would it be better to upload as single channel? It's tempting to try to upload the raw texture directly, but I think we'd need to translate everything (definitely at least 4-bit) to 8-bit first. Then, maybe upload as GL_ALPHA / GL_RED depending on GL version? With my other change (tex-split), we can potentially detect 1:1 texture draws, which might allow us to make different choices in where we decode the texture. But I suppose the simplest thing would be to process it first (like fb depal.) Not sure if it makes sense to try to combine this code into the existing depal. Maybe not... -[Unknown] |
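A minimal sketch of the 4-bit to 8-bit index expansion that path would need before uploading as GL_ALPHA (GLES2) or GL_RED (desktop/ES3); illustrative only, and it assumes low-nibble-first ordering and an even pixel count:

```cpp
#include <cstddef>
#include <cstdint>

// Unpacks two 4-bit CLUT indices per byte into one byte each, so the result
// can be uploaded as a single-channel texture.
static void Expand4To8(const uint8_t *src, uint8_t *dst, size_t numPixels) {
	for (size_t i = 0; i < numPixels; i += 2) {
		uint8_t b = src[i / 2];
		dst[i] = b & 0x0F;    // low nibble: first pixel's index (assumed order)
		dst[i + 1] = b >> 4;  // high nibble: second pixel's index
	}
}
```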
I haven't necessarily given up on #8246 yet, but I'm not really sure how much gain we'll see from it now... Anyway, this particular issue is fixed, I believe. The vertex cache flickering is still there a bit, but not as bad. I would personally like to disable it by default, but it's still a big speedup on some devices and in some games. Can reopen if I missed anything. -[Unknown] |
Reopening since it still seems to be a thing, and differs depending on backend:
Weirdly, the graphics sometimes display fine, or the problems are reduced by disabling and re-enabling slower effects; using savestates seems to also affect it in various ways. The problem doesn't show in frame dumps, so here's a post-game savedata which has access to pretty much every place on the world map with dragon travel (triangle -> other -> Vyne), as well as the ability to freely swap party members, since some special attacks might also be affected/buggy on their own. Edit:
Just reloading a savestate doesn't do anything, so this is all weird and might be some race issue:]. In OGL nothing helps other than disabling slower effects and sacrificing colors; it would be nice if someone could confirm on a non-AMD GPU, just in case... drivers heh;p. |
The game draws to render CLUTs for its characters - it sounds like this is what's going wrong on your card, maybe? Vulkan definitely has messed-up effects for me, but in OpenGL I messed around and didn't see any missing parts or invisible enemies. I only tried like 5 or 6 battles though - how common was it for you? I think the CLUT download isn't fully working on non-GL, though... that or flat interpolation isn't. -[Unknown] |
Heh, so another driver bug it is... As for me, every single fight in OGL has invisible enemies; unless I disable slow effects, only their special effects, like the fire on those birds, can be seen. Stupidly, OGL is the only backend where that fire actually looks ok:P. I tried to check whether it's a regression, but in older versions PPSSPP was just crashing when trying to boot this game in OGL for me. |
Hmm, crash on boot in this game was caused by (and I think has always been caused by) "read framebuffers to memory". -[Unknown] |
It's really time to rip out those modes now... I know, I know, some rare game still needs them, but they're clearly way more trouble than they're worth. |
I never use read framebuffers to memory, so I'm quite sure it's my AMD ogl drivers:C. But yeah, the option itself doesn't have much reason to be available. Many people might see going from skipping buffer effects, through buffered rendering, to read framebuffers to memory as a quality gradation and consider the last option the best, which is really bad. Edit:
Actually, it turns out I was wrong about the latest buildbot not crashing - it still crashes, I just confused it with my custom build. So the official version doesn't have missing graphics in OGL, it simply crashes. I also found out it's caused by the removal of the legacy BGRA path, which also caused missing graphics in Final Fantasy on AMD GPUs, as can be seen here. Except in this game it simply crashes:c. Edit: On a side note, the missing graphics I experienced in my custom build instead of the crash could just be me doing a poor job at trying to restore that legacy path.:|
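For context, a CPU-side fallback of the sort such a BGRA path avoids might look like this - not the actual removed code, just an illustration of the swizzle:

```cpp
#include <cstddef>
#include <cstdint>

// Swaps the red and blue channels of 32-bit pixels so BGRA8888 data can be
// uploaded as RGBA8888 when GL_BGRA isn't supported (or the driver is buggy).
static void SwizzleBGRAToRGBA(const uint32_t *src, uint32_t *dst, size_t count) {
	for (size_t i = 0; i < count; i++) {
		uint32_t c = src[i];
		dst[i] = (c & 0xFF00FF00u) |          // keep green and alpha
		         ((c & 0x00FF0000u) >> 16) |  // move blue/red down
		         ((c & 0x000000FFu) << 16);   // move red/blue up
	}
}
```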
On NVIDIA, all backends look good. Is this still an issue on AMD with the latest git? Note: make sure vertex cache is off. It's now off by default, except when OpenGL is the default. -[Unknown] |
How does this look now on AMD devices? Note: The BGRA path has probably been removed again with the latest rewrite, but hopefully it still works. -[Unknown] |
I think this is probably fixed, not sure about AMD devices but hopefully we're in a good place now. Probably should create new issues if there are new problems. -[Unknown] |
Hi, I found a few issues in brave story (hardware graphics) and debugged them to varying degrees.
In order of most to least simple to fix / likely to be fixed:
Smoke & fire are colored oddly with hardware graphics (also see "Dante's Inferno weird colors around the fire", #6332).
From debugging, it looks like the smoke primitives contain various color data and the GE's shademodel is set to GE_SHADE_FLAT.
The software renderer appears to check for the value of shademodel and ignore all but the first color in this case, resulting in the correct image.
However, it looks like no other renderers check shademodel, likely resulting in the colorful smoke.
This should be easy for a dev to fix, I think. (NOW FIXED)
Enemy colors don't always differ in hardware graphics (notice the color of the right bunny in the screenshot below, software vs hardware).
From debugging, it looks like brave story sets the framebuffer to the clut area (with inappropriate framebuffer height) and draws colors into it. It then uses the clut for drawing the texture.
In software, you can see that the clut area in memory gets updated between the first and second rabbit.
In hardware, you can see that the clut area in memory doesn't get updated in this case.
(I'm not sure how it's updated at all in hardware, to be honest)
With vertex cache set (which is the default), there is serious flickering in the worldmap. Perhaps vertex cache shouldn't be enabled by default until it's more reliable?
screens of bug 1 & 2:


Hardware:
Software: (no bug)