AGS 4: Implement script RTTI #1922

Merged
merged 12 commits into from
Mar 9, 2023


ivan-mogilko
Contributor

@ivan-mogilko ivan-mogilko commented Feb 15, 2023

Resolves #1259.

CC @fernewelten, because he is working on a new script compiler, and I added a few functions to it.

The purpose is described in detail in the task ticket #1259; here I will give a brief overview.
Several potential operations within the script interpreter are impossible to achieve without an indication and description of an object's type and its inner contents:

  • Having managed pointers in managed structs and being able to dereference them at runtime when disposing the parent object (thus avoiding memory leaks).
  • Dynamic pointer casting (at runtime), parent-to-child type casting specifically.
  • Being able to identify the script data in game saves (see Version-independent save games #1371).
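
To make the casting case concrete, here is a minimal sketch of a runtime check that walks the chain of parent IDs. It assumes each type record stores its parent's ID, with 0 meaning "no parent" (as in the type description further down); `TypeInfo`, `IsSameOrDerivedFrom` and `MakeExampleTypes` are hypothetical names for illustration, not the engine's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified type record; only what the check needs.
struct TypeInfo {
    std::string name;
    uint32_t parent_id; // 0 = no parent (assumes ID 0 is a reserved "No Type")
};

// Returns true if 'type_id' is 'base_id' or inherits from it,
// found by walking the chain of parent IDs.
bool IsSameOrDerivedFrom(const std::vector<TypeInfo> &types,
                         uint32_t type_id, uint32_t base_id) {
    // The step limit guards against a malformed table with a parent cycle.
    for (size_t steps = 0;
         type_id != 0 && type_id < types.size() && steps < types.size();
         ++steps) {
        if (type_id == base_id)
            return true;
        type_id = types[type_id].parent_id;
    }
    return false;
}

// Example table: 0 = "No Type", 1 = Base, 2 = Child (extends Base), 3 = Other.
std::vector<TypeInfo> MakeExampleTypes() {
    return { {"", 0}, {"Base", 0}, {"Child", 1}, {"Other", 0} };
}
```

Under these assumptions, casting a Base pointer down to Child would be allowed only when the object's actual type passes `IsSameOrDerivedFrom(types, actual_type, child_type)`.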

The following would also potentially be easier to implement if we had type indication and content description:

  • Virtual methods (would need a runtime type reference mostly, full RTTI tables are probably not needed for this).
  • Runtime debugging, such as watching a struct's fields in real time.

This is where the RTTI feature comes from. The idea is to have a table of types and their contents, generated by the script compiler as an extra step, and written either as an extra (optional) part of the script data, or as a separate file along with the game project.

This PR does the following:

  1. Implements RTTI generation in both supported script compilers (the classic ags3 one and the new ags4 one).
  2. Serializes the generated RTTI per script, as an extension to the script data.
  3. Makes the runtime interpreter create a joint RTTI collection, updating it after each loaded script.
  4. Adds a "--print-rtti" command-line argument, which makes the engine print the resulting joint collection to the log on each room load (this seemed the most logical place for it, as the room script is the last script loaded).

The --print-rtti argument is useful for testing the resulting RTTI table.

Additionally, the engine creates "quick references" within the gathered RTTI, which are basically pointers between type and field structs. This allows for easier traversal of types and their fields under the debugger.
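
A rough sketch of what such quick references could look like, assuming types and fields live in two flat arrays linked by indices. The struct layout and the name `CreateQuickRefs` are hypothetical and only illustrate the idea of resolving indices into pointers once, after loading.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Type;

struct Field {
    uint32_t type_id = 0;        // index of this field's type in the types table
    const Type *type = nullptr;  // quick ref: the field's type
    const Type *owner = nullptr; // quick ref: the type that declares this field
};

struct Type {
    uint32_t parent_id = 0;   // 0 = no parent
    uint32_t field_index = 0; // index of the first field in the fields table
    uint32_t field_num = 0;   // number of fields
    const Type *parent = nullptr;       // quick ref: parent type
    const Field *first_field = nullptr; // quick ref: first of 'field_num' consecutive fields
};

// Resolves index references into pointers. Must be re-run whenever either
// vector is resized, since resizing may invalidate stored pointers.
void CreateQuickRefs(std::vector<Type> &types, std::vector<Field> &fields) {
    for (Type &t : types) {
        t.parent = (t.parent_id != 0 && t.parent_id < types.size())
            ? &types[t.parent_id] : nullptr;
        const bool has_fields = t.field_num > 0
            && t.field_index < fields.size()
            && t.field_num <= fields.size() - t.field_index;
        t.first_field = has_fields ? &fields[t.field_index] : nullptr;
        if (!has_fields)
            continue;
        for (uint32_t i = 0; i < t.field_num; ++i) {
            Field &f = fields[t.field_index + i];
            f.owner = &t;
            f.type = (f.type_id < types.size()) ? &types[f.type_id] : nullptr;
        }
    }
}
```

With the pointers in place, a debugger watch can expand `type.parent` or `field.type` directly instead of looking up indices by hand.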


Generation

Currently the type information consists of an ID, a name, a location (where the type was declared), a parent's ID (if it has a parent), a set of flags which define the type's kind, and a collection of fields.
A type's fully qualified name may be generated by combining the location's name (usually a header or script name) and the type's name. This is required to distinguish unrelated types of the same name declared in different scripts.
A field's info contains a relative offset, a name, a typeid, and a set of flags which define the field's kind and qualifiers.
The gathered data may be easily expanded in the future (also see the notes on serialization).

Compilers generate RTTI for each compiled script unit separately. Because of that, a type's ID is a local ID, valid only within that compiled unit. The engine has to use fully qualified names to map a script's local IDs to global IDs in the joint table.
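
The local-to-global mapping can be sketched as follows. This is an illustration under assumptions, not the engine's code: `LocalType`, `JointTypeTable` and the "::" separator in the qualified name are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// One type as declared in a single compiled script unit (simplified).
struct LocalType {
    std::string location; // header or script name where the type was declared
    std::string name;     // type name
};

class JointTypeTable {
public:
    // Merges one script's types into the joint table and returns a map
    // from the script's local type IDs (vector index) to global IDs.
    std::vector<uint32_t> JoinScript(const std::vector<LocalType> &local_types) {
        std::vector<uint32_t> local_to_global;
        local_to_global.reserve(local_types.size());
        for (const LocalType &t : local_types) {
            // The separator is an assumption made for this sketch.
            const std::string fqn = t.location + "::" + t.name;
            auto it = ids_.find(fqn);
            if (it == ids_.end()) // first time this type is seen: assign next global ID
                it = ids_.emplace(fqn, static_cast<uint32_t>(ids_.size())).first;
            local_to_global.push_back(it->second);
        }
        return local_to_global;
    }

    size_t Count() const { return ids_.size(); }

private:
    std::unordered_map<std::string, uint32_t> ids_; // qualified name -> global ID
};
```

Two scripts that include the same header get the same global ID for its types, while same-named types declared in different scripts stay distinct.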


Serialization

The RTTI serialization format is designed with ease of expansion in mind. For that purpose it is not a single table of types with nested field items inside, but a number of separate tables, where each table's entries have a fixed size. The RTTI header describes the tables' offsets and the fixed sizes of their items. The tables are connected using index references (meaning that an item in table 1 may refer to an item in table 2 by its index).

The advantages of such structure are:

  • faster reading;
  • faster and easier parsing, if one is e.g. writing a tool that does not read full data, but only wants to find particular item in a file stream;
  • much easier extension: if you want to expand an existing item, you increase the item's fixed size in the header, and even an older program will still be able to read the new data (although it will only understand the old parts of it); if you want to add a completely new kind of information, you add a new table, which a parser may easily skip if it does not need it or does not know about it.
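
The extension point in the last bullet can be sketched like this: the reader steps through a table using the entry size taken from the header, reads the fields it knows, and ignores any trailing bytes a newer writer may have added. Little-endian byte order and all names here (`LocationEntry`, `ReadLocations`, `AppendU32LE`) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a little-endian uint32 (byte order is an assumption of this sketch).
uint32_t ReadU32LE(const uint8_t *p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Test helper: appends a little-endian uint32 to a byte buffer.
void AppendU32LE(std::vector<uint8_t> &out, uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<uint8_t>((v >> shift) & 0xFF));
}

// The part of a location entry that this reader understands.
struct LocationEntry {
    uint32_t id;
    uint32_t name_offset;
};
constexpr uint32_t kKnownLocationSize = 8;

// Reads 'count' entries of 'entry_size' bytes each, starting at 'table_offset'.
// A larger entry size than the known one is fine: extra bytes are skipped.
bool ReadLocations(const std::vector<uint8_t> &data, uint32_t table_offset,
                   uint32_t entry_size, uint32_t count,
                   std::vector<LocationEntry> &out) {
    if (entry_size < kKnownLocationSize)
        return false; // older or corrupt format that lacks fields we need
    // 64-bit arithmetic avoids overflow in the bounds check.
    const uint64_t end = uint64_t(table_offset) + uint64_t(entry_size) * count;
    if (end > data.size())
        return false;
    out.clear();
    for (uint32_t i = 0; i < count; ++i) {
        const uint8_t *p = data.data() + table_offset + size_t(entry_size) * i;
        out.push_back({ReadU32LE(p), ReadU32LE(p + 4)});
    }
    return true;
}
```

A table of an unknown kind is handled the same way in spirit: since its offset and size are listed in the header, a reader that does not know it simply never seeks there.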

The format description follows.

RTTI header

| field | type / size | comment |
|---|---|---|
| format | uint32 | for expanding the rtti format |
| header size | uint32 | size in bytes of a header (counting from "format" field, until header ends) |
| full rtti size | uint32 | size in bytes of a rtti data (counting from "format" field, until data ends) |
| location entry size | uint32 | fixed size of a location's description in bytes (may depend on format) |
| locations table offset | uint32 | a relative position of a locations table (in file) |
| num locations | uint32 | total number of location entries |
| type entry size | uint32 | fixed size of a type's description in bytes (may depend on format) |
| types table offset | uint32 | a relative position of a types table (in file) |
| num types | uint32 | total number of type entries |
| field entry size | uint32 | fixed size of a type field's description in bytes (may depend on format) |
| fields table offset | uint32 | a relative position of a type fields table (in file) |
| num type fields | uint32 | total number of all fields in all types |
| string table offset | uint32 | a relative position of a strings table (in file) |
| string table size | uint32 | total size of a string table, in bytes |
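
As an illustration of the header layout above, here is a sketch of a parser with basic sanity checks. It assumes little-endian byte order and that table offsets are relative to the start of the RTTI data (the "format" field); the post does not spell out either, and `RTTIHeader`, `ParseHeader` and the helpers are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// In-memory mirror of the 14 uint32 header fields, in file order.
struct RTTIHeader {
    uint32_t format, header_size, full_size;
    uint32_t loc_entry_size, loc_table_offset, loc_count;
    uint32_t type_entry_size, type_table_offset, type_count;
    uint32_t field_entry_size, field_table_offset, field_count;
    uint32_t str_table_offset, str_table_size;
};

constexpr size_t kHeaderFieldCount = 14;
constexpr size_t kKnownHeaderSize = kHeaderFieldCount * sizeof(uint32_t); // 56 bytes

uint32_t ReadU32LE(const uint8_t *p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Test helper: appends a little-endian uint32 to a byte buffer.
void AppendU32LE(std::vector<uint8_t> &out, uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<uint8_t>((v >> shift) & 0xFF));
}

bool TableFits(uint32_t offset, uint64_t bytes, uint32_t full_size) {
    return uint64_t(offset) + bytes <= full_size;
}

// 'data' is assumed to begin at the "format" field.
bool ParseHeader(const std::vector<uint8_t> &data, RTTIHeader &h) {
    if (data.size() < kKnownHeaderSize)
        return false;
    uint32_t v[kHeaderFieldCount];
    for (size_t i = 0; i < kHeaderFieldCount; ++i)
        v[i] = ReadU32LE(data.data() + i * sizeof(uint32_t));
    const RTTIHeader p = { v[0], v[1], v[2], v[3], v[4], v[5], v[6],
                           v[7], v[8], v[9], v[10], v[11], v[12], v[13] };
    // A newer format may have a longer header; a shorter one is unreadable here.
    if (p.header_size < kKnownHeaderSize || p.header_size > p.full_size ||
        p.full_size > data.size())
        return false;
    const bool tables_ok =
        TableFits(p.loc_table_offset, uint64_t(p.loc_entry_size) * p.loc_count, p.full_size) &&
        TableFits(p.type_table_offset, uint64_t(p.type_entry_size) * p.type_count, p.full_size) &&
        TableFits(p.field_table_offset, uint64_t(p.field_entry_size) * p.field_count, p.full_size) &&
        TableFits(p.str_table_offset, p.str_table_size, p.full_size);
    if (!tables_ok)
        return false;
    h = p;
    return true;
}
```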

RTTI tables

| field | type / size | comment |
|---|---|---|
| location table | num locs * loc size | see "Location description" below |
| type table | num types * type size | see "Type description" below |
| fields table | num type fields * field entry size | see "Field description" below |
| string table | string table size | all RTTI null-terminated strings packed in a single array (separated by 0s) |
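
Names are stored as offsets into the packed string table, so a lookup is just a bounded search for the terminating 0. A minimal sketch follows; `GetString` and `AddString` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Appends a null-terminated string to the table and returns its offset.
uint32_t AddString(std::vector<char> &table, const std::string &s) {
    const uint32_t offset = static_cast<uint32_t>(table.size());
    table.insert(table.end(), s.begin(), s.end());
    table.push_back('\0');
    return offset;
}

// Returns the string starting at 'offset', or an empty string if the
// offset is out of range. Never reads past the end of the table, even
// if the last string lacks its terminator.
std::string GetString(const std::vector<char> &table, uint32_t offset) {
    if (offset >= table.size())
        return std::string();
    const char *begin = table.data() + offset;
    const size_t max_len = table.size() - offset;
    const void *end = std::memchr(begin, 0, max_len);
    const size_t len = end ? static_cast<size_t>(static_cast<const char *>(end) - begin)
                           : max_len;
    return std::string(begin, len);
}
```

Storing each distinct string once and referring to it by offset keeps the type and field entries at a fixed size, which is what the table layout relies on.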

Location description

| field | type / size | comment |
|---|---|---|
| local id | uint32 | local ID of this location |
| name | uint32 | an offset of a name in a string table |

Type description

| field | type / size | comment |
|---|---|---|
| local id | uint32 | local ID of this type |
| name | uint32 | an offset of a name in a string table |
| location id | uint32 | ID of the location this type was declared at |
| parent type | uint32 | local type ID; 0 if no parent |
| type flags | uint32 | may contain helper flags which simplify analyzing this type |
| size | uint32 | in bytes |
| num fields | uint32 | number of member fields this type has, 0 if none |
| field table index | uint32 | index of the first field in the fields table |

Field description

| field | type / size | comment |
|---|---|---|
| offset | uint32 | relative offset of this field, in bytes |
| name | uint32 | an offset of a name in a string table |
| type | uint32 | this field's local type ID |
| type flags | uint32 | may contain helper flags which simplify analyzing this member |
| num elements | uint32 | number of (array's) elements |
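
To show how the type and field entries work together, here is a sketch that collects the byte offsets of managed pointers inside an object, which is the kind of lookup needed when disposing a parent object. The flag value `kFieldFlag_ManagedPtr`, the 4-byte handle size, and all names are assumptions made for illustration; the actual flag values are not given in this post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors of the serialized entries described above.
struct FieldDesc {
    uint32_t offset, name, type_id, flags, num_elems;
};
struct TypeDesc {
    uint32_t id, name, loc_id, parent_id, flags, size, field_num, field_index;
};

constexpr uint32_t kFieldFlag_ManagedPtr = 0x1; // hypothetical flag value
constexpr uint32_t kHandleSize = 4;             // assumed size of a managed handle

// Returns the offsets (in bytes, relative to the object's start) of every
// managed pointer declared directly in 'type', expanding array fields.
std::vector<uint32_t> CollectManagedOffsets(const TypeDesc &type,
                                            const std::vector<FieldDesc> &fields) {
    std::vector<uint32_t> offsets;
    // Reject a field range that does not fit into the fields table.
    if (type.field_index > fields.size() ||
        type.field_num > fields.size() - type.field_index)
        return offsets;
    for (uint32_t i = 0; i < type.field_num; ++i) {
        const FieldDesc &f = fields[type.field_index + i];
        if ((f.flags & kFieldFlag_ManagedPtr) == 0)
            continue;
        // Non-array fields are saved with "num elements" = 1; treat 0 the same way.
        const uint32_t count = std::max<uint32_t>(f.num_elems, 1u);
        for (uint32_t e = 0; e < count; ++e)
            offsets.push_back(f.offset + e * kHandleSize);
    }
    return offsets;
}
```

Whether a type's field range also covers inherited fields is not stated here; if it does not, the same walk would have to be repeated for each parent type.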

TODO

  • Make sure each compiler generates a type with typeid 0 meaning "No Type" (I think the old compiler does not do this at the moment).
  • There's a nasty problem with fully qualified names taking extra space in the file and in memory because of the repeated location (script) name. I'm investigating options, but possibly the type description may contain a "location id" instead, and then either 1) location names are stored separately, 2) location names are taken from the section names in the basic script data, or 3) a new table of "locations" is created inside RTTI (which may have some extra uses in the future).
  • The compiler should only put the member name for fields; currently one or both of the compilers put Type::Member there instead, which is redundant.
  • Perhaps format the --print-rtti output better.
  • Might add an explicit compiler option for generating and saving RTTI.
  • I made fields that are not arrays save "num elems" as 1, but maybe that's redundant (need to think this over).

@ivan-mogilko ivan-mogilko added this to the 4.0.0 (preliminary) milestone Feb 15, 2023
@ivan-mogilko ivan-mogilko force-pushed the ags4--rtti branch 2 times, most recently from d076b7a to 41fde01 Compare February 15, 2023 08:30
@ivan-mogilko ivan-mogilko changed the title AGS 4.0: Implement script RTTI AGS 4: Implement script RTTI Feb 15, 2023
@ivan-mogilko ivan-mogilko marked this pull request as draft February 20, 2023 00:12
@ivan-mogilko
Contributor Author

ivan-mogilko commented Feb 20, 2023

I decided to do some refactoring on this PR, because in the current variant the RTTI struct exposes too many internal "mechanics", so it's too easy to produce an invalid state (and there's code duplication too).

@ivan-mogilko ivan-mogilko force-pushed the ags4--rtti branch 3 times, most recently from 030cfdd to 840fe39 Compare February 23, 2023 20:21
@ivan-mogilko
Contributor Author

Added a separate "locations" table. Now a type's fully qualified name is generated at runtime by combining the location's name and the type's name.

@ivan-mogilko ivan-mogilko marked this pull request as ready for review February 23, 2023 21:31
@AlanDrake
Contributor

Does this PR have any impact on script performance?

@ivan-mogilko
Contributor Author

ivan-mogilko commented Mar 8, 2023

> Does this PR have any impact on script performance?

Script saving and loading will take a little longer.
Script execution will be affected by another PR (#1923), which actually puts this to use in the new "new" command, which saves the type. I don't think the impact will be large (it may be tested though).

To be honest, I was planning to make one last refactor in the coming days (I got distracted by the latest 3.6.0 bugs), because I put everything in script.h/cpp, and one class could be split into two for more consistent behavior; but this may also be done later.

Script writing was reimplemented in managed code (C++/CLI) in c5c7c87, but this is quite inconvenient: there is script serialization code in the native part which is still used by the compiled room serialization code, so we would have to duplicate any additions in native and managed code in parallel.

This change removes the managed reimplementation and makes CompiledScript::Write use native serialization instead.

Because the scripts are written in the middle of the compiled game format, which is also serialized by managed code, we cannot use the same stream object directly.
There are two options:
1) Write to a temp file using a native stream, then copy the file contents to the provided managed stream.
2) Write to a native membuf, then copy the buffer's contents to the managed stream (might require conversion, or marshaling).

I implemented this using a temp file for now. If this proves to be slow with a large number of scripts, we may switch to using a memory buffer (and memory stream) instead.
@ivan-mogilko ivan-mogilko force-pushed the ags4--rtti branch 2 times, most recently from fd7fd67 to 8a6cf80 Compare March 9, 2023 19:02
@ivan-mogilko
Contributor Author

ivan-mogilko commented Mar 9, 2023

Okay, I think this is done. I had doubts about how the structs are organized, but maybe I am overthinking again, and it should be possible to refactor them into something more convenient (if the need arises). The serialization format is something I am certain about though.

As has been mentioned, it's possible to completely disable RTTI generation by toggling a compiler option. The engine can load scripts both with and without RTTI.

@ivan-mogilko ivan-mogilko merged commit f13fd18 into adventuregamestudio:ags4 Mar 9, 2023
@ivan-mogilko ivan-mogilko deleted the ags4--rtti branch March 9, 2023 21:08
2 participants