notes.txt

2023-11-01
--------------------------------------------------------------------------------
Trying to figure out how to have renaming of imported symbols. Codegen needs to
generate the name for the original symbol name, not the rename. 

Another issue this exposed, is that a package can import two symbols with the
same name from different packages. Currently a global reachable_syms_map is
used, which is keyed by symbol name. If two symbols have the same name, only the
most recently encountered symbol will exist in reachable_syms_map since each
insertion will overwrite the previous. Need a way to retrieve symbols by name
from codgen, but also make sure that we get the "right" symbol.

The solution is simple but it evaded me for a whole day, we just need to key the
reachable_syms_table by the "local name" for symbols, the same way that package
symbol tables works.


2023-10-26
--------------------------------------------------------------------------------
Importing Packages

1. First parsing the package into a bunch of declarations
2. Add all package declarations to symbol table, (No resolution occurs here,
  just a name -> symbol association
3. Process packages own imports and add name associations for any imported
  symbols

Note: No type checking, no order resolution, no codegen

Resolving packages
- When resolving, there is a new way to associate ast nodes with symbols. There
  are set_resolved_sym() and get_resolved_sym() functions which take a void
  pointer and symbol as arguments, and can associate any pointer with a symbol
  by adding an entry to a hash table


Code Readthrough
- in parse package we are setting gen_all_symbols to true for any packages that
  have foreign sources. is this the right approach? it seems that per may have
  changed his approach for how to incorporate c library code into ion projects.
  need to look into this.

- sym_global_put is a now a misnomer. it only adds symbols to package symbol
  table

- check if var declaration but not definition does zero initialization or not,
  something like x: int;

2023-10-25
--------------------------------------------------------------------------------
It has been a few months since I worked on the compiler (I moved to North
Carolina and bought a house). I need to re-familiarize with the codebase and I
am thinking that this would be a great time to takes some notes and document
anything that I find confusing or unintuative as I am regaining my bearings.
After I am back up to speed, I want to continue working on packages and get a
package system working well.

- read through once to get a high level view of the system
- read through again to understand more details


2023-07-15
--------------------------------------------------------------------------------
Packages

Any package that contains a #foreign(source= ...) directive, requires all of its
symbols to be resolved and code to be generated for them. This is because the
foreign c file could potentially use any symbol declared in the ion package.

I am starting to think that maybe the default should be resolving and generating
code for all symbols, and then having a compiler flag to only resolve and
generate code for reachable symbols. Keep in mind though, that regardless of the
default, packages with foreign sources will always need all symbols resolved and
gen'd.


Designated Initializers
When parsing compound expressions, the compound expression will contain a list of
args. for regular initializer these are simple expressions, but for designated
initializers we need to parse two expressions for each arg, the field name and
the value.

1. change parsing of compound literals arguments from a list of expressions, to
   a list of expression pairs, where the field name expression is optional

2. when resolving, for a compound literal arg that has both a field name
   expression and field value expression need to check that the field 
   name expression is EXPR_NAME, and that the field exists in the type

3. in codegen, if the compound literal expression has named fields, output the
   corresponding c designated initializer


2023-06-28
--------------------------------------------------------------------------------
Packages: resolution and code gen

current problem:
compiling solitaire, no code is generated for imported symbols that are not used
by the importing package. however, when a package has foreign sources (in this
case noir), the foreign c file is not a part of the resolution phase, so symbols
that it uses from the associated ion package will not be put into reachable
symbols and no code will be generated for them.

In this case the constants like KEY_RETURN are defined in noir.ion and
referenced in noir_impl.c, code is not generated for them.

I'm not yet sure if there are any other cases where reachable_syms, the way they
are currently gathered does not account for all code that needs to be generated.
It seems at this point that the only case is when there are foreign sources
included. So could have any package with a foreign source directive have code
generated for all its symbols 

!Another use case would be writing c libraries in ion. You want code for all
symbols to be generated in the c file.


2023-06-08
--------------------------------------------------------------------------------
Packages


Part One: compiling all .ion files in a "package" directory
1. pass a package (directory name) to the compiler on the command line 
2. walk the directory whose name matches the package name, and collect all the
   .ion files in that directory
3. parse all the files and add all the declarations to the symbol table
4. resolve all the symbols 
5. generate code 

Part Two: scoped packages
1. Create a package structure to contain package symbols and other package
   specific data
2. When parsing a package, add symbols to the symbol table within the package
3. resolve all the symbols in the package
4. generate code for the package

Part Three: imports
1. create a new declaration for imports
[ same as 2 and 3 above ]
2. need to walk the directory whose name matches the imported package name, and
   collect all the .ion files in that directory
3. parse all the files and add all the declarations to a package scoped symbol
   table


Package *import_package(char *dot_path) : allocates a new package (or returns
cached one), parses the package

---

Search path order:
1. system-packages for the %IONHOME% directory
2. the current working directory
3. the semicolon separated list of directories in %IONPATH%

Bitwise Day 20 Extra approx. 1:09:00 working on starting packages in proper


2023-06-04
--------------------------------------------------------------------------------
break scope and continue scope can be different. I haven't encountered this yet,
but in switch statements, break refers to the switch statement but continue
refers to the innermost loop that it can break out of. right now for defers,
there is no separation and this is causing a bug.

2023-05-29
--------------------------------------------------------------------------------
Implicit Sized Arrays

Today I implemented implicit sized arrays in var declarationss and init
statements. After trying a couple of approaches, I settled on the following:
- in parse_typespec(), if you encounter [], allow the size expression to be NULL
- in resolve_typespec() for TYPESPEC_ARRAY, if the size expresssion is NULL,
  then construct an array type with size zero
- in resolve_expr_expected() for EXPR_COMPOUND, check if the typespec or
  expected type have an array size of 0, and if so create a new array type,
  whose size is determined by the number of args in the compound expression

In the other approaches I was trying, I was handling implicit sized arrays in
resolve_decl_var() and resolve_stmt() for STMT_INIT, and not resolving typespecs
with no array size. This had issues when trying to resolve the compound
expression (array initializer) and potentially pass an expected type. The reason
I ultimately went with the approach I outlined above is so that implicit sized
arrays can be used in any typespec context using the same codepath, for example: 
- var arr: int[] = { 1, 2, 3 };
- var arr := (:int[]){ 1, 2, 3 };


One thing I don't like about this, is that creating an array type with size 0
caches that type that will ultimately not be used and will take up space in the
cache. I was thinking I could do some kind of incomplete type for arrays or
maybe a separate type constructor that doesn't cache it and acts like a dummy
type just to allow compound expressions to have an expected type to use,
essentially just to get the array element type. I am not sure. However, the
extra cached array types should only potentially be a problem if many implicitly
sized arrays of different types are used in the program. Note that at the
moment cached types are just using dynamic arrays, so the impact of useless
cached types is worse, since a cache lookup is a linear search. However, this is
intended to be replaced by a hashmap in the future anyway so I am not
necessarily worried about that.


For posterity, one of the previous implementation attempts was like this:

Type *resolve_decl_var(Decl *decl) {
	assert(decl->kind == DECL_VAR);
	Typespec *typespec = decl->var.typespec;
	Expr *init_expr = decl->var.expr;
	Type *type = NULL;
	if (typespec) {
		// allow implicit sized array typespec
		if (typespec->kind == TYPESPEC_ARRAY && typespec->array.num_items == NULL) {
			if (!init_expr || init_expr->kind != EXPR_COMPOUND) {
				semantic_error(decl->pos, "implicit sized array must have an initializer expression");
			} else {
				Type *elem_type = resolve_typespec(typespec->array.base);
				type = type_array(elem_type, init_expr->compound.num_args);
			}
		} else {
			type = resolve_typespec(typespec);
		}
	}

	if (init_expr) {
		ResolvedExpr resolved = resolve_expr_expected(init_expr, type);
		if (resolved.type->kind == TYPE_ARRAY && init_expr->kind != EXPR_COMPOUND) {
			pointer_decay(&resolved);
		}

		if (!type) {
			type = resolved.type;
		} else if (resolved.type != type) {
			if (!convert_operand(&resolved, type)) {
				semantic_error(decl->pos, 
					"type mismatch in var declaration: specified type is %s, expression type is %s",
					type_to_str(type), type_to_str(resolved.type));
			}
		}
		init_expr->type = type;
	}

	return type;
}


2023-05-14
--------------------------------------------------------------------------------

When resolving, we want to pass type information on to the code generator. Right
now the type info only lives in symbols. In Bitwise video day 10 and the extra
stream, Per adds type info to expressions (a Type *) and also adds to Decls a
symbol pointer. He says he wants to do this so that the generator has type
information, but it is not yet clear to me when this will be used. i.e when do
you need type information directly from an expr or symbol directly from a decl.

Okay after trying to write the c code generation myself I now see that type info
needs to be stored in expressions to account for inferred types. For example
init statments like count := 0; this statement has no typespec field, so the
type needs to be inferred from the rhs. The type is found during resolution, but
needs to be passed on to code generation, so storing it in the expression is how
we will do this.


- Some symbols, like built ins, don't have associated declarations, so we need to
  skip over these when generating forward decls in c code generator for example.


2023-03-15
--------------------------------------------------------------------------------
Resolution / Binding

When resolving a declaration (e.g. a struct), there are two possible scenarios:
1. we encountered the definition
  - resolve the symbol AND complete the type
2. we encountered some other usage
  - resolve the symbol, but only complete types if we actually need the 
    complete type at this moment


2023-03-08
--------------------------------------------------------------------------------
Symbol Resolution


What does it mean to "resolve" a symbol?

there are two kinds of "resolution" happening
1 discovering dependencies
2 type completion

a dependency on a particular symbol does not necessarily mean it must be ordered
before you, only if it trys to use the symbol in a way that it needs to know the
full definition of the thing is the type completed and the symbol put into the
ordered symbol list


concrete example, symbol of kind SYM_VAR
- discover the type of the var sym and either create a new type or associate it
  with the already created canonical version of that type
- discover what symbols that the var sym depends on
  (resolve the rhs expression)
- make sure that the rhs expression (if it exists) matches the type of the lhs
- if the rhs is a constant, evaluate it and store that somewhere
- detect cycles
  i.e. var x: int = x;


struct x { v: Vec3; }

SYM_TYPE
struct, union
- create a new type and associate it with the symbol
- iterate over the fields and discover dependencies
  (resolve the type of each field)
- make sure fields have unique names
- ensure there are some fields at all
- ensure no duplicate definitions


structs and unions, the moment a symbol is created for them, they are resolved
in other words they begin life resolved (SYM_RESOLVED)
however they are not put into the ordered symbols list, until they are COMPLETED


2023-03-08
--------------------------------------------------------------------------------
Order Independent Declarations

The simple, but shallow approach of just ordering declarations based on walking
their dependencies failed. The reason was the dependecies that were only on a pointer
to a thing and not its value. There are cases like the following where it falls
apart:

const x = sizeof(*p);
var p: T*;
struct T { ... }

Here p only depends on T through a pointer so it does not have to be ordered
before it. However x depends on p and deferences it so actually depends on T as
a value. This sort of "transitivity" issue can not be solved with the simple
shallow walking of dependencies.


The new approach is the following:

- Parse a bunch of declarations
- create a new symbol for each decl, and add it to a symbol table
  (a symbol is basically a package level binding)
- iterate through all the symbols and recursively resolve everything
  adding them to a list of ordered symbols
- 


resolving a typespec is turning the syntactic typespec into a semantic type

structs and union types will always start out as TYPE_INCOMPLETE, then only when
in a situation where the complete type is explicitly needed do we finish
"completing" the type


2023-03-06
--------------------------------------------------------------------------------
Order Independent Declarations

1. Create a symbol table which maps names to declarations

2. iterate over the symbol table and order the declarations

3. Order:
   if ordered, return the ordered decl
   if unordered, set to ordering and recursively order it
   if ordering, there is a cycle, print error


2023-02-07
--------------------------------------------------------------------------------
Current Video: Bitwise Day 8

Lexing
- transform a stream of characters into a stream of tokens

Parsing
- transform a stream of tokens into an AST
- An ast node corresponds to a grammar production rule

Resolving Symbols/Entities
- order independent declarations
- cyclic dependency detection
- constant expression evaluation
- type checking/inference


--------------------------------------------------------------------------------

How It Works


Ion Compiler High Level Algorithm
1. Parse a bunch of declarations corresponding to a package
2. Create a new symbol for each decl, and add it to a symbol table
   (a symbol is basically a package level binding)
3. Iterate through all symbols and resolve each one recursively, adding them to
   a list of ordered symbols 


What does it mean to "resolve" a symbol?
There are two kinds of "resolution" happening:
1. Discovering dependencies and ordering
2. Type completion


A dependency on a particular symbol does not necessarily mean that it must
precede in the ordering, only if the symbol is used in a way that its full
definition must be known is the type completed and the symbol put into the
ordered symbol list.  For example, a dependency on a pointer to a type does not
require the type to precede in the ordering, but a dependency on a type as a
value does require it to precede.


incomplete types - aggregate types begin their life as incomplete types. as
resolution proceeds, types are sort of lazily resolved where fields are only
resolved at the time of first use, 

there are two types of resolving going on: declaration level ordering and stuff,
and then type level dependency resolution


difference in Type and Typespec?

resolving a typespec is turning the syntactic typespec into a semantic type

A "Type" is an interned, canonical version of a type that can be compared
directly with pointers 

A "Typespec" is sort of like a reference to a type. 

One illustrative example is an array type:
The following is a typespec, where the size of the array is an expression:
var arr: int[const1 + const2];

Once this is resolved into a type, the size of the array must be resolved into a
constant value. So if the values of const1 and const2 are 10 and 6, the type
would be:
var arr: int[16];