AGS 4: RTTI for AGS script #1259
Comments
indeed, if you refer to the same thing as i think you're talking about, this is the fixup mechanism used by compiler and engine. it's basically what traditionally a linker does, to create a final executable from a bunch of object files (in toolchain lingua this is called relocation, not fixup). from your description it sounds likely this could be used for RTTI table too.
sounds like the cleaner way. since ags opcodes are 32bit, and only ~70 used so far, we have still about 4 billion possible opcodes available...
Yes, if we create dedicated code to manage, consolidate and distribute the function, variable and type information that is collected by the individual compiler runs, then we create a "linker" (in the modern meaning of the word). It's the concept of a linker, even if it doesn't run in one go or as a separate phase.
That's what the new compiler does internally. It has atomic types. Then, the individual variables are assigned a type. All the characteristics are stored with the type and not with the variable. For instance, size: a variable has a size only indirectly. Instead, it has a type and that type has a size.
The number of elements of a static array is kept with the array type, too, not with the array name. For instance, the variable "arr" has the type, e.g. "int[20]", which has 20 elements. So from the vantage point of the compiler, the static array is completely subsumed by the general type concept. There's no such thing as a special array variable. It's just one of the cases of a variable.
However, with dynamic arrays the number of elements can change at runtime. We can't make the number of elements a part of a dynamic array type because it can change many times. So in my opinion we need an opcode to specify just how many elements we want to request.
True, I guess if the new opcode includes array creation, then we need at least 2 args: type and number of elements (we may also add type size for a tiny bit of extra speed, but size is also a part of the type, so that's optional).
This is a very very dirty draft, but technically... it's working! (the type table is written by the old compiler for now)
Tested by repeatedly allocating and disposing very big managed structs containing pointers to more big managed structs.
I think some cases may still not be working, like when inside a managed struct you've got a regular struct which contains a managed pointer. But it's a matter of completing this properly.
The proposed RTTI format, based on my older experimental branch and the comments on commits in that branch (branch link).
The table is designed with multiple potential uses and possible expansion in mind, not only the issue of nested managed pointers.
As per rofl0r's suggestion, I tried to design the table in such a way that general entries have a fixed size. This supposedly allows simpler parsing and a faster jump to a particular index in the table, without having to read through the whole data. But as a consequence this requires storing variable-length data, such as type names (strings) and member lists, in separate tables, while the entries only store indexes to reference these. (This is similar to how script strings are stored, for instance.)
Such an approach may seem complicated at first, but, in my opinion, is quite viable once you learn how it works. It also allows for much easier expansion, especially if we need to add something completely new.
RTTI header
RTTI tables
Type description
Member description
So, there is a "Types" table, a "Fields/Members" table and a "Strings" table. Types and Members may reference each other using indexes and type IDs, and may reference the Strings table using offsets. How is this data read? Suppose you read a Type entry, and it has NameOffset, MemberIndex and MemberNum fields.
After you read the above, you know that you need to go to MemberIndex in the member table. Or jump to the (MemberIndex * member entry size) offset from the member table start, if you're still parsing from a stream. Then read MemberNum consecutive entries from there. Similarly, to query the actual name strings, you use NameOffset from the string table start, and read a null-terminated string from there.
The engine (and any other tools that work with this data) will have options on how to organize the final RTTI storage in memory. It may keep the data broken into 3 tables and keep using cross-reference indexes, or it may have a nested structure, with type members and strings inside the type descriptions; whatever seems more convenient.
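The lookup described above can be sketched roughly as follows. This is only an illustration: NameOffset, MemberIndex and MemberNum are the fields named in the comment, while the struct names, the TypeId member and the helper functions are assumptions and not the actual RTTI format.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical fixed-size entries. Only NameOffset, MemberIndex and MemberNum
// are named in the discussion; everything else here is assumed.
struct TypeEntry {
    uint32_t NameOffset;   // offset of the type name in the string table
    uint32_t MemberIndex;  // index of the first member in the member table
    uint32_t MemberNum;    // number of consecutive member entries
};

struct MemberEntry {
    uint32_t NameOffset;   // offset of the member name in the string table
    uint32_t TypeId;       // assumed: index of the member's type in the type table
};

struct RttiTables {
    std::vector<TypeEntry> Types;
    std::vector<MemberEntry> Members;
    std::vector<char> Strings;  // null-terminated strings packed back to back
};

// Appends a string to the string table and returns its offset.
uint32_t AddString(RttiTables &rtti, const std::string &s) {
    uint32_t offset = static_cast<uint32_t>(rtti.Strings.size());
    rtti.Strings.insert(rtti.Strings.end(), s.begin(), s.end());
    rtti.Strings.push_back('\0');
    return offset;
}

// Reads the null-terminated string that starts at the given offset.
std::string ReadString(const RttiTables &rtti, uint32_t offset) {
    assert(offset < rtti.Strings.size());
    return std::string(&rtti.Strings[offset]);
}

// Walks MemberNum entries starting at MemberIndex and collects their names.
std::vector<std::string> MemberNames(const RttiTables &rtti, uint32_t type_index) {
    const TypeEntry &type = rtti.Types.at(type_index);
    std::vector<std::string> names;
    for (uint32_t i = 0; i < type.MemberNum; ++i) {
        const MemberEntry &member = rtti.Members.at(type.MemberIndex + i);
        names.push_back(ReadString(rtti, member.NameOffset));
    }
    return names;
}
```

Because every entry has a fixed size, jumping to type N or member M is a plain index operation, and only the string table needs scanning for a terminator.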
Preliminary branch that gathers RTTI in both the old and new compilers: There are a few things that could probably be optimized, but I'll deal with that later, after opening a proper PR draft. Regarding the scanning for managed pointers, currently done:
missing:
RTTI generation was implemented by #1922. From now on it's a matter of extending and using this information as necessary in the engine.
NOTE: RTTI stands for "RunTime Type Information".
Overview
When compiled, AGS script currently loses all information about types. This is tolerable with regular structs, because their handling is fully controlled by the compiler and the interpreter may rely on its instructions. But when it comes to managed structs, the lack of knowledge on the engine side restricts their functionality and use.
The related problems are, but may not be limited to:
While each of the above problems could be individually solved by its own workaround, my belief is that RTTI would provide a more consistent and complete solution.
RTTI may come in the form of a table containing information about types. It's generated and written by the compiler and/or an accompanying tool. This information may at first be limited to our basic needs but later expanded (thus when serialized it also has to have a format number).
Each type comes with a convenient ID, and managed objects would store this type ID when created. Thus when the engine needs to know something about their type, it would use this ID to retrieve the type from the table.
Several years ago we had a conversation about this with other developers, and the following are the points we came to as a result.
Building the RTTI table.
Before we know how to use it or what information to store there, the first question is how to build it.
When the compiler compiles a script unit it gathers all the types visible to that script, and information on struct contents which lets it calculate memory offsets for all operations. So we already have this info from the perspective of a single compiled unit. Let's say we append this information as an extra segment to the compiled script format. Let's say each table entry would have a numeric type ID which we could pass into the "new" operator for user objects.
This sounds fine at first, but having just local tables per script is not going to be sufficient:
This problem on its own could be worked around by keeping type tables separate from script data in the engine, so that when a script is unloaded the type data remains in memory.
One could speculate that, because the order in which script headers are currently included in scripts is always fixed, the order of types (and their numeric indexes) will also always be fixed. So it seems like we could treat equal local type indexes found in multiple scripts as referring to the same type.
In truth that is not correct, as scripts may have types declared inside their bodies, and then their indexes would conflict with types declared in script headers following them in the script list.
But even if the above did work, in the long term this would not be a good solution, because it would impose limitations on how scripts are used. Consider what happens if we change the way script headers are included, especially with stand-alone compiler tools: that will break the whole system.
This means that we cannot rely on local type tables alone and need a global type table.
But how may it be constructed if the compiler knows only about the one script it compiles at a time?
If I remember correctly, we already have a similar mechanism for function binding. Function binding works this way: every script has a list of functions it knows / uses, and when writing call ops it puts their local index in the byte code. The engine uses this local function index to look up the function name in the local table, and that name is the key under which the function is registered in the global table.
Perhaps we could use the same solution for types. Each compiled script module will have a local table of types where it maps a local type index to something that defines its global key. So we would use the local numeric ID to map to a global (e.g. string) ID from the script's table of types, then this global ID is used further to find a type description in the global table.
What could be the unique global type ID? For instance it could be a string constructed as a pair of a "scope name" (header/script name?) and a type name.
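A minimal sketch of this local-to-global mapping, under the assumption that each script carries its own list of type descriptions and the engine merges them on load. The names GlobalTypeTable, LinkScriptTypes and MakeGlobalTypeName, as well as the "::" separator in the global ID, are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical type description; the real contents depend on the RTTI format.
struct TypeInfo {
    std::string GlobalName;  // global string ID, e.g. "scope::Type"
    uint32_t Size;           // size of an instance in bytes
};

// Builds the global ID from a scope name and a type name (separator is assumed).
std::string MakeGlobalTypeName(const std::string &scope, const std::string &type) {
    return scope + "::" + type;
}

// Global table owned by the engine, keyed by the global string ID.
class GlobalTypeTable {
public:
    // Registers a type if it is not known yet; returns its global index either way.
    uint32_t Register(const TypeInfo &info) {
        auto it = index_by_name_.find(info.GlobalName);
        if (it != index_by_name_.end())
            return it->second;
        uint32_t index = static_cast<uint32_t>(types_.size());
        types_.push_back(info);
        index_by_name_[info.GlobalName] = index;
        return index;
    }

    const TypeInfo &Get(uint32_t global_index) const { return types_.at(global_index); }

private:
    std::vector<TypeInfo> types_;
    std::unordered_map<std::string, uint32_t> index_by_name_;
};

// Resolves a script's local type IDs once, when the script is loaded.
// The result maps local type ID (vector position) to global index.
std::vector<uint32_t> LinkScriptTypes(GlobalTypeTable &global,
                                      const std::vector<TypeInfo> &local_types) {
    std::vector<uint32_t> local_to_global;
    for (const TypeInfo &type : local_types)
        local_to_global.push_back(global.Register(type));
    return local_to_global;
}
```

With such a mapping the bytecode keeps small numeric local IDs, and the string comparison cost is paid only once per script load rather than on every allocation.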
Now, how is a global table built? I think there may be two approaches here.
The downside is that each compiled unit will contain full info of all the included types. I don't think it will be large in practice, but still worth noting.
script.rtti. This would provide a full type table for the whole game, ready to be read at once. In such a case individual compiled units will only have the ID mapping in their tables, and it's the global table file that would contain type descriptions.
The upside of this is that we have a type table for the whole project at once, and no data duplication.
The downside though is that we would need extra handling for this file inside a project, need to know when and how to update it, and so forth.
So even if method 2 may be beneficial as an option, I guess we'd rather try method 1 first.
It may also be possible to set up a compiler switch to toggle between saving full type info and ID mapping only, if we'd like to support method 2 in the future.
Using Type IDs in script (bytecode)
Assuming we have a local type table where entries are identified by numeric IDs, we may pass these IDs as an argument when creating a user object. The current SCMD_NEWUSEROBJECT command accepts 1 arg meaning object size. Because object size may also be a part of type info, there are various ways we may go from here:
- Keep SCMD_NEWUSEROBJECT but interpret its argument as either size or type ID depending on a "script format" or something.
- Add SCMD_NEWUSEROBJECT2 which accepts type ID. This will work faster (no if switch), but has an extra opcode.
Reading old discussion on this topic I also found a curious proposal to try to merge SCMD_NEWUSEROBJECT and SCMD_NEWARRAY into one new opcode. Because "dynamic array of T" may also be considered a type on its own, and written into the type table, the type info could distinguish "plain" types and "array" types. Or the array may be indicated by a flag in the object itself.
If either of these seems feasible and convenient, then we'll only need one command for allocating anything managed in script.
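The second option (a separate opcode taking a type ID) could look roughly like this on the interpreter side. This is a sketch only: the enum values are placeholders rather than the engine's real opcode numbers, and AllocRequest and DecodeAlloc are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Placeholder opcodes for illustration; these are NOT the engine's real values.
enum class AllocOp : uint32_t {
    NewUserObject,   // argument is the object size in bytes (current behaviour)
    NewUserObject2,  // hypothetical: argument is a local type ID
};

// What the interpreter needs to know before allocating the object.
struct AllocRequest {
    uint32_t TypeId;  // valid only if HasType is true
    uint32_t Size;    // instance size in bytes
    bool HasType;     // false for objects created by the old opcode
};

// Decodes the opcode argument. type_sizes is an assumed per-script table that
// gives the instance size for each local type ID.
AllocRequest DecodeAlloc(AllocOp op, uint32_t arg,
                         const std::vector<uint32_t> &type_sizes) {
    switch (op) {
    case AllocOp::NewUserObject2:
        // New style: the size comes from the type table, not from the bytecode.
        return AllocRequest{arg, type_sizes.at(arg), true};
    case AllocOp::NewUserObject:
    default:
        // Old style: the argument is the size and no type is known.
        return AllocRequest{0, arg, false};
    }
}
```

Keeping the old opcode untouched means already compiled scripts continue to work, at the price of objects that carry no type information.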
Type Info in the engine
As described above, the engine would construct the global type table either by adding entries as it loads various scripts, or by reading a single file.
Supposing SCMD_NEWUSEROBJECT or an additional command would have the local type ID as an argument, the engine creates a user object (currently the ScriptUserObject struct) by storing three members: global type info ID / index (or pointer to type info), size, and a buffer for instance data.
The necessary contents of type info are dictated by what we want to use it for. For example, if we'd like to be able to release managed pointers we'll need at least a list of offsets of those pointers (handles).
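As a rough sketch of the above, here is a simplified stand-in for the user object together with a type info that lists handle offsets. It assumes that managed handles are stored as 32-bit integers inside the instance data and that 0 means "null"; ManagedTypeInfo, UserObject and both functions are hypothetical and do not reproduce the engine's actual ScriptUserObject.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical type info: instance size plus the offsets of managed handles.
struct ManagedTypeInfo {
    uint32_t Size;
    std::vector<uint32_t> HandleOffsets;
};

// Simplified stand-in for the user object: type ID, size and instance data.
struct UserObject {
    uint32_t TypeId;
    uint32_t Size;
    std::vector<uint8_t> Data;
};

// Creates a zero-filled object of the given global type.
UserObject CreateUserObject(const std::vector<ManagedTypeInfo> &types, uint32_t type_id) {
    const ManagedTypeInfo &type = types.at(type_id);
    return UserObject{type_id, type.Size, std::vector<uint8_t>(type.Size)};
}

// Collects all non-null handles stored in the object, so that the caller can
// release the references when the object is disposed.
// Assumption: handles are int32 values and 0 means "no object".
std::vector<int32_t> CollectHandles(const std::vector<ManagedTypeInfo> &types,
                                    const UserObject &obj) {
    std::vector<int32_t> handles;
    for (uint32_t offset : types.at(obj.TypeId).HandleOffsets) {
        assert(offset + sizeof(int32_t) <= obj.Data.size());
        int32_t handle = 0;
        std::memcpy(&handle, obj.Data.data() + offset, sizeof(handle));
        if (handle != 0)
            handles.push_back(handle);
    }
    return handles;
}
```

This only gathers the handles of one object; what to do with nested or circular references is a separate question.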
How the engine will deal with recursive object release, or with circular dependencies - I believe this is another topic entirely and may be discussed separately.
If I missed or forgot any potential problems - these may be added to this ticket as we realize them.