Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3.6.0: Perfect Tides crashes in 3.6.0 (Mac) #1973

Closed
messengerbag opened this issue Apr 8, 2023 · 32 comments
Closed

3.6.0: Perfect Tides crashes in 3.6.0 (Mac) #1973

messengerbag opened this issue Apr 8, 2023 · 32 comments
Labels
context: graphics context: script api type: bug unexpected/erroneous behavior in the existing functionality

Comments

@messengerbag
Copy link

messengerbag commented Apr 8, 2023

Describe the bug
Running the game Perfect Tides (released in an AGS 3.5.1 Mac version for Steam) with a v3.6.0 engine build on MacOS, it crashes at a certain point with the error message:

Assert failed at line 289 of <...>/ags_3.6.0.47_source/libsrc/allegro/src/graphics.c

The line in question is ASSERT(width >= 0); in create_bitmap_ex()

The point at which is crashes is a sort of cutscene where the main character goes on her computer and an in-game AIM chat window pops up. The game uses Khris's ChatDisplay module to display the chat window.

AGS Version
It crashes on Mac in both v3.6.0.44 and v3.6.0.47, built by me from source according to the instructions here. I've also tried running it both in OpenGL and with software driver, fullscreen and windowed, and it does not make a difference.

It does not crash in the official Mac build that Steam distributes (running ACI version 3.5.1.16, compiled with 3.5.1.11).

I have not been able to test on Windows to see whether the problem is limited to Mac or applies to 3.6.0 generally.

Game
Perfect Tides v1.0.1, official Mac version as shipped on Steam

Savegame before crash: agssave.029

My latest home-built AGS app bundle (into which I drop the game files): ags_3.6.0.47.app

To Reproduce
Steps to reproduce the behavior:

  1. Click Hand cursor on computer to begin cutscene
  2. Click through a few dialog boxes
  3. Game crashes at point where chat window is supposed to appear

Expected behavior
Game should not crash. An in-game chat window should pop up.

Additional context
The reason I want to run it in 3.6.0 is that because of the awkward resolution (640x480), the game cannot be cleanly scaled to full-screen, and ends up with uneven pixel scaling using the software driver. In 3.6.0 I can use OpenGL and supersampling to get even pixel scaling (with anti-aliased edges).

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Apr 8, 2023

The reason I want to run it in 3.6.0 is that because of the awkward resolution (640x480), the game cannot be cleanly scaled to full-screen, and ends up with uneven pixel scaling using the software driver. In 3.6.0 I can use OpenGL and supersampling to get even pixel scaling (with anti-aliased edges).

Unrelated to the bug, but does not OpenGL work in 3.5.1?

@ivan-mogilko ivan-mogilko added this to the 3.6.0 maintenance milestone Apr 8, 2023
@ivan-mogilko ivan-mogilko added type: bug unexpected/erroneous behavior in the existing functionality context: graphics labels Apr 8, 2023
@messengerbag
Copy link
Author

Unrelated to the bug, but does not OpenGL work in 3.5.1?

No:

Graphic settings: driver: OGL, windowed: no, screen def: scaling, screen size: 0 x 0, match device ratio: yes, game scale: proportional
Requested graphics driver 'OGL' not found, will try existing drivers instead

And I don't think it's expected to work. The 3.6.0 release notes mention:

OSX:

  • OpenGL renderer is now supported on MacOS.

@ivan-mogilko
Copy link
Contributor

Oh, I had bad memory, this is how OSX port used opengl prior to 3.6.0 (quoting a comment from a different ticket):

This may be important to mention that in 3.5.* mac port works somewhat similar to mobile ports: the engine only draws game in native size using software renderer, then this image is passed into the patched Allegro library, where its own graphic or system driver uses OpenGL to display a scaled texture.

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Apr 8, 2023

So, this is interesting. I did not have this crash on Windows, but then I found out that the ASSERT that you reference is disabled by default in our builds. I only get this when rebuilding the engine with Allegro ASSERTs enabled (by declaring DEBUGMODE compile flag). If it's disabled, then the execution simply continues, and creates an invalid bitmap.

This originates from the call to DynamicSprite.Create(/*width*/ 210, /*height*/ 0);. This function does not do any safety checks, so returns a sprite with invalid size (probably no pixels allocated, only header data).

EDIT: sadly, returning null from DynamicSprite.Create will cause a script error in the game, and probably in some other games that had similar mistake. So, supposedly, we may fix it by creating a bitmap of minimal size, only to ensure that it's valid...

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Apr 8, 2023

After more testing in AGS 3.2.1 and 3.5.1, apparently DynamicSprite.Create never really cared about the arguments.

Here's my test run:
DynamicSprite.Create(0, 0); // success
DynamicSprite.Create(100, 0); // success
DynamicSprite.Create(0, 100); // success
DynamicSprite.Create(-1, -1); // success?!
DynamicSprite.Create(-1, 10); // crashes, failed to allocate -40 bytes, lol

In all the successful cases DynamicSprite object reports exact Width and Height it was requested (including -1 x -1 case).

EDIT: well, the problem is clear, the question now is which behavior to implement while also keeping compatible with existing games...

@ericoporto
Copy link
Member

ericoporto commented Apr 8, 2023

I only get this when rebuilding the engine with Allegro ASSERTs enabled (by declaring DEBUGMODE compile flag).

set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} -DDEBUGMODE=1")

Uhm it looks like it's there on the CMake, so this flag is enabled in debug builds on macOS that are done through CMake (I need to update the stream api moved around in the Xcode project, so that may not be working for a different reason) and also and also in Android debug builds. I think we never officially build Emscripten for debug, so web builds are not affected.

built by me from source according to the instructions here

Why not following the instructions from our README? If there's something wrong it may be required to fix those. Btw, I am not sure those instructions on notarization are correct, as Apple changed things there recently.

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Apr 8, 2023

Pushed 82194b2, which will enforce minimal size of 1 x 1 and proceed (with a warning to a game log). Not the perfect solution, but this is just to keep old games working.

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Apr 8, 2023

Also, while testing, I noticed that the computer chat is made horribly slow. It drops game fps from 40 to 15-16.
Upon further debugging, it fully recreates dynamic sprite on each game frame(!), and probably redraws it over and over. Maybe there's more.

I wonder if it's a game's or original module's problem.

@messengerbag
Copy link
Author

I only get this when rebuilding the engine with Allegro ASSERTs enabled (by declaring DEBUGMODE compile flag).

set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} -DDEBUGMODE=1")

Uhm it looks like it's there on the CMake, so this flag is enabled in debug builds on macOS that are done through CMake

It makes sense to me that debug mode is enabled in debug builds. I did notice that building it in XCode placed the binaries in /Debug, but I don't know XCode well enough (/at all) to figure out any other way to build it, and it never caused problems up till now.

Why not following the instructions from our README?

Because I never knew there was a readme with instructions for building on Mac. It's a pretty natural progression: the link has good explanation of how to get Windows AGS games to run on Mac, with a pre-built app bundle for two engine versions. Then if you want some other engine version, it also has step-by-step instructions on how to build it yourself right below.

Now looking at the readme, yeah, I think it could be improved. I mean, it starts out talking about icons. That's just about the LAST think you need to worry about if you want to get an AGS game to run on Mac. And the way it presents alternative processes is just confusing.

(And couldn't we distribute an official pre-built app bundle for each version of the engine, rather than ask every Mac user to build their own?)

@ivan-mogilko
Copy link
Contributor

And couldn't we distribute an official pre-built app bundle for each version of the engine, rather than ask every Mac user to build their own?

There have been an issue open for this: #1333.
I don't know what the current state is.

@messengerbag
Copy link
Author

well, the problem is clear, the question now is which behavior to implement while also keeping compatible with existing games...

My knee-jerk answer is that either throwing a script exception or returning NULL would be the right behavior if the bitmap dimensions are illegal. Then you'd need a backwards-compatible script option for running older games, of course.

The assertion failure in Debug mode is probably OK? There just needs to be a well-documented way to make non-debug builds for Mac.

I wonder if it's a game's or original module's problem.

We could ask in the module thread. (Or first examine the module code to see how it's meant to run, I suppose.)

@ivan-mogilko
Copy link
Contributor

@messengerbag could you check if building from commit 82194b2 fixes the issue? It's a compatible hack, will create a bitmap of minimal valid size instead.

We may change it to return null in case of bad parameters in the future major versions (like ags4).

@ericoporto
Copy link
Member

ericoporto commented Apr 9, 2023

Just a thing, using cmake by telling it is a release, there shouldn't be that assert.

cmake -G Xcode -DCMAKE_BUILD_TYPE=Release ..

If this doesn't work though, then there is a bug in the CMakeLists.txt, which is a possibility.

Edit: ah, in that website the CMAKE_BUILD_TYPE is not specified, this makes it default to Debug build - which btw should be notably slower.

@messengerbag
Copy link
Author

@messengerbag could you check if building from commit 82194b2 fixes the issue? It's a compatible hack, will create a bitmap of minimal valid size instead.

Hmm...
First, I didn't want to download the whole repo, so I just applied the change in my local copy of the 3.6.0.47 code.

And yes, it no longer crashes, but the scene is now extremely laggy. As you said:

Also, while testing, I noticed that the computer chat is made horribly slow. It drops game fps from 40 to 15-16.

This is not the case for me with 3.5.1. The difference is very noticeable in this build.

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Apr 10, 2023

I tested once more with both original game (made in 3.5.1.11 apparently) and 3.6.0.

3.5.1.11 gives around 24 fps for me with 1 chat window, and 14 with 2 chat windows.
3.6.0 gives around 15 fps with 1 chat window, and 8 with 2 chat windows.

So, obviously there's a curious regression in 3.6.0. On another hand, it's been already quite slow in 3.5.1, which is an issue on its own (maybe the module is working inefficiently, or used inefficiently).

I will try to figure out what is making it slower in 3.6.0 though.

UPDATE: my first guess was that it's related to the sprite update / deletion that checks all possible objects to ensure they are cleared of a sprite's reference (because that was expanded in 3.6.0), but disabling these checks does not fix anything (it stops chat windows from updating visually on screen, but game is still as slow).

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Apr 10, 2023

Found the commit that caused the slowdown: c34f0c9
This was an experiment, which was partially disabled later: e0edb28

So, what that commit did (besides the later disabled part), was the periodic system events poll inside the long scripts. But it only supposed to be running when the script does not return to the engine for a long time.
Rising the poll time from 10 ms to some higher number rises fps, but still not to the amount it had been before this commit (~24 fps for me, but with less frequent poll it does not rise higher than 20 fps).
There are couple of questions to investigate here...

UPDATE: hmm yes, apparently the script operations under which the "hanging" test is done are performed so frequent in this script, that testing for a passed time alone costs around 5 fps. Perhaps I could disable this check, the downside would be that you won't be able to close a game window again if script is hanging for real. This problem may only be properly resolved by moving game logic (along with the script execution) to a separate thread, while keep polling events on a main thread.
In theory we could have a thread for just checking the timer, but I don't know if that's worth it (besides the mutexes may also slow things down if checked too often).

@ericoporto
Copy link
Member

sys_evt_process_pending() should not be that costly, if there are no events we go there, the event list is empty, it backs out. We already call it once per frame normally, how many times is it being called per frame to cause this slowdown?

@messengerbag
Copy link
Author

3.5.1.11 gives around 24 fps for me with 1 chat window, and 14 with 2 chat windows.
3.6.0 gives around 15 fps with 1 chat window, and 8 with 2 chat windows.

I haven't been able to test it properly on my computer. I've tried to use Quartz Debug to measure the framerate, and it reports:

3.5.1: runs at 60 fps at all times, even with 2 chat windows.
3.6.0: runs at 44-50 fps normally, drops to 18-26 fps with either 1 or 2 chat windows (makes no difference)

However, this does not seem to reflect the actual refresh speed. The scrolling animation when a new line appears in the chat window completes in <1 second on 3.5.1, but takes >25 seconds on 3.6.0.

How are you getting your numbers?

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Apr 11, 2023

sys_evt_process_pending() should not be that costly, if there are no events we go there, the event list is empty, it backs out. We already call it once per frame normally, how many times is it being called per frame to cause this slowdown?

It's not really sys_evt_process_pending that is causing this, but the time check code.
For instance, AGS_Clock::now() alone drops fps by ~4 on my pc (in this particular scene).

auto now = AGS_Clock::now();

As for sys_evt_process_pending, it was set to be polled each 10ms while not leaving a script:

ccSetScriptAliveTimer(10u, 1000u, 150000u);

In retrospect I should have set it to 1/GetGameSpeed, or just 1/60. But apparently, this is not the most important thing here.


How are you getting your numbers?

Running with --fps command and looking at fps numbers at the bottom of the window.
Also, when running with --fps you may press Ctrl+E to enable "infinite fps" mode, which runs as fast as possible. This may be easier way to measure the speed in faster scenes, as with big fps numbers you will see difference easier.

BTW, the game is set to run at 40 fps normally.

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Apr 11, 2023

Okay, I removed the whole check added by c34f0c9.
It's sad to do this, but on the second thought, it's totally not correct to add extras like this inside a script execution, which is quite time sensitive, and may be already slower than wanted...

Here's the commit made on release-3.6.0 branch: 5c0fdb2
With this change the game runs at least as fast / slow as the 3.5.1 for me.

@ericoporto
Copy link
Member

ericoporto commented Apr 11, 2023

I read a bit about now() and apparently there are faster ways to get time. There's even a single magical assembly instruction - perhaps there's some abstraction to select the correct assembly per processor. Are we using now() multiple times per frame in different places?

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Apr 11, 2023

Apparently the event pending was also done incorrectly, because it did not reset the _lastAliveTs when running pending. Which means that pending would keep running every turn after reaching a time limit. Fixing that would improve things a little bit (but still not fully).

Are we using now multiple times per frame in different places?

It's in several places around the code, in frame delay calculation, input handling. It's easy to see if you search for AGSClock::now().

I made a simple test, counting how many times the time test is done sequentially while the script is running, and it shows numbers in the range of 16000 to 30000 in that game scene in question.
EDIT: that's not necessarily "per frame", that's rather per one script callback run (before returning execution to the engine). So it easily may be x2 or x3 times that per game frame.

EDIT2:
Well, for the reference, out of a pure curiosity I did a counter for SCMD_JMP and a script time test per frame.

The results for that scene are:
with 1 chat window: 47400
with 2 chat windows: 87670

@ericoporto
Copy link
Member

There is some interesting discussion here: microsoft/STL#624

Apparently if we know the duration range (do we need to go up to seconds or at most milliseconds?) and precision, there are ways to do it a bit faster.

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Apr 11, 2023

Hmm, I replaced the AGSClock declaration with using AGS_Clock = std::chrono::system_clock;
and can confirm that it runs at least almost as fast as in 3.5.1 for me.

EDIT: it still slightly slower, like 0.5-1.5 fps slower , randomly.
I'm still not sure if that's worth it.

@ericoporto
Copy link
Member

ericoporto commented Apr 11, 2023

There is also __rdtsc(), I used it at some point in a benchmark variant AGS engine I had, where I needed to profile something - around this time here. I think it wasn't able to go much longer than 1 second without overflowing.

Looking on GitHub there isn't a readily available cross-platform library with this so my guess is using __rdtsc requires some care or may not be readily available in other platforms or something - apparently in old CPUs it wasn't reliable for multicore timings.

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Apr 11, 2023

Hm, alright, I got persuaded to try and rethink this instead.

In the end I did even more slight optimizations, like doing time check only once per 1000 JMPs. The reason I did not do this originally was that I was concerned about some script actions that may take long time but don't take many loops.
But seeing as the fix meant for rare cases hinders a common case, I think it's worth it for now.

Please see the latest commits at release-3.6.0 branch.
Reverted timeout removal: 547ead9
Added fixes: 5884aee (see commit description for a list of changes)

Compared to the last 3.6.0 release this "won" around 9-10 fps on my system in that scene, which is massive...

EDIT: also pushed to master branch now.
EDIT: tested that it still allows to interact with the system window, and send "app close" event while script is hanging.

@ericoporto
Copy link
Member

ericoporto commented Apr 11, 2023

@ivan-mogilko unrelated, but does running in unlimited frame mode (fps on + ctrl+e) gives you even more fps gains there?

I think I found the commit that introduced that AGS_Clock f2ab428

@messengerbag
Copy link
Author

Running with --fps command and looking at fps numbers at the bottom of the window.

Thanks!

Though I can't seem to get it to work with the original 3.5.1 Steam app bundle. Calling it with that command line option displays framerate info in the 3.6.0 builds, but running it the exact same way (and the same cfg options) with the officially distributed version shows nothing. It doesn't require a debug build of the engine, does it?

In any case, in my 3.6.0 build the framerate drops from 40 fps (60 fps when set to unlimited) to 1 fps with one chat window, and to 0.5 fps with two.

@ericoporto
Copy link
Member

ericoporto commented Apr 11, 2023

@messengerbag , I am sorry because it was some messages above, but make sure you are using the -DCMAKE_BUILD_TYPE=Release as in cmake -DCMAKE_BUILD_TYPE=Release -G Xcode .. when you generate the Xcode project - assuming you are using the Xcode generator. I think you are building for debug.

Btw you can also just copy the binary ags from inside the bundle in your /usr/local/bin and then run any ags game by typing ags <game_path>

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Apr 11, 2023

to 1 fps with one chat window, and to 0.5 fps with two.

This clearly indicates you are building a debug build of the engine, it contains too much extra check and assertion code.

It doesn't require a debug build of the engine, does it?

No, it fps command should work always, regardless of the engine build and game settings.

@ivan-mogilko
Copy link
Contributor

ivan-mogilko commented Apr 11, 2023

@ericoporto

@ivan-mogilko unrelated, but does running in unlimited frame mode (fps on + ctrl+e) gives you even more fps gains there?

Yes; on the title screen it rised fps from ~3000 to 3300.

Just to clarify, I used system_clock only for the timeout check, introduced new typedef AGS_FastClock for this purpose. Other places still use AGS_Clock = std::steady_clock.

@ivan-mogilko
Copy link
Contributor

I believe this should be closed, as the original reported bug has been fixed.

In regards to the slowdown, @messengerbag, I propose to test with the latest release-3.6.0 branch (and use the "release" engine build), but if the problem persists I think it's best to open a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
context: graphics context: script api type: bug unexpected/erroneous behavior in the existing functionality
Projects
None yet
Development

No branches or pull requests

3 participants