-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathnotes.txt
463 lines (331 loc) · 17 KB
/
notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
2023-11-01
--------------------------------------------------------------------------------
Trying to figure out how to have renaming of imported symbols. Codegen needs to
generate the name for the original symbol name, not the rename.
Another issue this exposed, is that a package can import two symbols with the
same name from different packages. Currently a global reachable_syms_map is
used, which is keyed by symbol name. If two symbols have the same name, only the
most recently encountered symbol will exist in reachable_syms_map since each
insertion will overwrite the previous. Need a way to retrieve symbols by name
from codgen, but also make sure that we get the "right" symbol.
The solution is simple but it evaded me for a whole day, we just need to key the
reachable_syms_table by the "local name" for symbols, the same way that package
symbol tables works.
2023-10-26
--------------------------------------------------------------------------------
Importing Packages
1. First parsing the package into a bunch of declarations
2. Add all package declarations to symbol table, (No resolution occurs here,
just a name -> symbol association
3. Process packages own imports and add name associations for any imported
symbols
Note: No type checking, no order resolution, no codegen
Resolving packages
- When resolving, there is a new way to associate ast nodes with symbols. There
are set_resolved_sym() and get_resolved_sym() functions which take a void
pointer and symbol as arguments, and can associate any pointer with a symbol
by adding an entry to a hash table
Code Readthrough
- in parse package we are setting gen_all_symbols to true for any packages that
have foreign sources. is this the right approach? it seems that per may have
changed his approach for how to incorporate c library code into ion projects.
need to look into this.
- sym_global_put is a now a misnomer. it only adds symbols to package symbol
table
- check if var declaration but not definition does zero initialization or not,
something like x: int;
2023-10-25
--------------------------------------------------------------------------------
It has been a few months since I worked on the compiler (I moved to North
Carolina and bought a house). I need to re-familiarize with the codebase and I
am thinking that this would be a great time to takes some notes and document
anything that I find confusing or unintuative as I am regaining my bearings.
After I am back up to speed, I want to continue working on packages and get a
package system working well.
- read through once to get a high level view of the system
- read through again to understand more details
2023-07-15
--------------------------------------------------------------------------------
Packages
Any package that contains a #foreign(source= ...) directive, requires all of its
symbols to be resolved and code to be generated for them. This is because the
foreign c file could potentially use any symbol declared in the ion package.
I am starting to think that maybe the default should be resolving and generating
code for all symbols, and then having a compiler flag to only resolve and
generate code for reachable symbols. Keep in mind though, that regardless of the
default, packages with foreign sources will always need all symbols resolved and
gen'd.
Designated Initializers
When parsing compound expressions, the compound expression will contain a list of
args. for regular initializer these are simple expressions, but for designated
initializers we need to parse two expressions for each arg, the field name and
the value.
1. change parsing of compound literals arguments from a list of expressions, to
a list of expression pairs, where the field name expression is optional
2. when resolving, for a compound literal arg that has both a field name
expression and field value expression need to check that the field
name expression is EXPR_NAME, and that the field exists in the type
3. in codegen, if the compound literal expression has named fields, output the
corresponding c designated initializer
2023-06-28
--------------------------------------------------------------------------------
Packages: resolution and code gen
current problem:
compiling solitaire, no code is generated for imported symbols that are not used
by the importing package. however, when a package has foreign sources (in this
case noir), the foreign c file is not a part of the resolution phase, so symbols
that it uses from the associated ion package will not be put into reachable
symbols and no code will be generated for them.
In this case the constants like KEY_RETURN are defined in noir.ion and
referenced in noir_impl.c, code is not generated for them.
I'm not yet sure if there are any other cases where reachable_syms, the way they
are currently gathered does not account for all code that needs to be generated.
It seems at this point that the only case is when there are foreign sources
included. So could have any package with a foreign source directive have code
generated for all its symbols
!Another use case would be writing c libraries in ion. You want code for all
symbols to be generated in the c file.
2023-06-08
--------------------------------------------------------------------------------
Packages
Part One: compiling all .ion files in a "package" directory
1. pass a package (directory name) to the compiler on the command line
2. walk the directory whose name matches the package name, and collect all the
.ion files in that directory
3. parse all the files and add all the declarations to the symbol table
4. resolve all the symbols
5. generate code
Part Two: scoped packages
1. Create a package structure to contain package symbols and other package
specific data
2. When parsing a package, add symbols to the symbol table within the package
3. resolve all the symbols in the package
4. generate code for the package
Part Three: imports
1. create a new declaration for imports
[ same as 2 and 3 above ]
2. need to walk the directory whose name matches the imported package name, and
collect all the .ion files in that directory
3. parse all the files and add all the declarations to a package scoped symbol
table
Package *import_package(char *dot_path) : allocates a new package (or returns
cached one), parses the package
---
Search path order:
1. system-packages for the %IONHOME% directory
2. the current working directory
3. the semicolon separated list of directories in %IONPATH%
Bitwise Day 20 Extra approx. 1:09:00 working on starting packages in proper
2023-06-04
--------------------------------------------------------------------------------
break scope and continue scope can be different. I haven't encountered this yet,
but in switch statements, break refers to the switch statement but continue
refers to the innermost loop that it can break out of. right now for defers,
there is no separation and this is causing a bug.
2023-05-29
--------------------------------------------------------------------------------
Implicit Sized Arrays
Today I implemented implicit sized arrays in var declarationss and init
statements. After trying a couple of approaches, I settled on the following:
- in parse_typespec(), if you encounter [], allow the size expression to be NULL
- in resolve_typespec() for TYPESPEC_ARRAY, if the size expresssion is NULL,
then construct an array type with size zero
- in resolve_expr_expected() for EXPR_COMPOUND, check if the typespec or
expected type have an array size of 0, and if so create a new array type,
whose size is determined by the number of args in the compound expression
In the other approaches I was trying, I was handling implicit sized arrays in
resolve_decl_var() and resolve_stmt() for STMT_INIT, and not resolving typespecs
with no array size. This had issues when trying to resolve the compound
expression (array initializer) and potentially pass an expected type. The reason
I ultimately went with the approach I outlined above is so that implicit sized
arrays can be used in any typespec context using the same codepath, for example:
- var arr: int[] = { 1, 2, 3 };
- var arr := (:int[]){ 1, 2, 3 };
One thing I don't like about this, is that creating an array type with size 0
caches that type that will ultimately not be used and will take up space in the
cache. I was thinking I could do some kind of incomplete type for arrays or
maybe a separate type constructor that doesn't cache it and acts like a dummy
type just to allow compound expressions to have an expected type to use,
essentially just to get the array element type. I am not sure. However, the
extra cached array types should only potentially be a problem if many implicitly
sized arrays of different types are used in the program. Note that at the
moment cached types are just using dynamic arrays, so the impact of useless
cached types is worse, since a cache lookup is a linear search. However, this is
intended to be replaced by a hashmap in the future anyway so I am not
necessarily worried about that.
For posterity, one of the previous implementation attempts was like this:
Type *resolve_decl_var(Decl *decl) {
assert(decl->kind == DECL_VAR);
Typespec *typespec = decl->var.typespec;
Expr *init_expr = decl->var.expr;
Type *type = NULL;
if (typespec) {
// allow implicit sized array typespec
if (typespec->kind == TYPESPEC_ARRAY && typespec->array.num_items == NULL) {
if (!init_expr || init_expr->kind != EXPR_COMPOUND) {
semantic_error(decl->pos, "implicit sized array must have an initializer expression");
} else {
Type *elem_type = resolve_typespec(typespec->array.base);
type = type_array(elem_type, init_expr->compound.num_args);
}
} else {
type = resolve_typespec(typespec);
}
}
if (init_expr) {
ResolvedExpr resolved = resolve_expr_expected(init_expr, type);
if (resolved.type->kind == TYPE_ARRAY && init_expr->kind != EXPR_COMPOUND) {
pointer_decay(&resolved);
}
if (!type) {
type = resolved.type;
} else if (resolved.type != type) {
if (!convert_operand(&resolved, type)) {
semantic_error(decl->pos,
"type mismatch in var declaration: specified type is %s, expression type is %s",
type_to_str(type), type_to_str(resolved.type));
}
}
init_expr->type = type;
}
return type;
}
2023-05-14
--------------------------------------------------------------------------------
When resolving, we want to pass type information on to the code generator. Right
now the type info only lives in symbols. In Bitwise video day 10 and the extra
stream, Per adds type info to expressions (a Type *) and also adds to Decls a
symbol pointer. He says he wants to do this so that the generator has type
information, but it is not yet clear to me when this will be used. i.e when do
you need type information directly from an expr or symbol directly from a decl.
Okay after trying to write the c code generation myself I now see that type info
needs to be stored in expressions to account for inferred types. For example
init statments like count := 0; this statement has no typespec field, so the
type needs to be inferred from the rhs. The type is found during resolution, but
needs to be passed on to code generation, so storing it in the expression is how
we will do this.
- Some symbols, like built ins, don't have associated declarations, so we need to
skip over these when generating forward decls in c code generator for example.
2023-03-15
--------------------------------------------------------------------------------
Resolution / Binding
When resolving a declaration (e.g. a struct), there are two possible scenarios:
1. we encountered the definition
- resolve the symbol AND complete the type
2. we encountered some other usage
- resolve the symbol, but only complete types if we actually need the
complete type at this moment
2023-03-08
--------------------------------------------------------------------------------
Symbol Resolution
What does it mean to "resolve" a symbol?
there are two kinds of "resolution" happening
1 discovering dependencies
2 type completion
a dependency on a particular symbol does not necessarily mean it must be ordered
before you, only if it trys to use the symbol in a way that it needs to know the
full definition of the thing is the type completed and the symbol put into the
ordered symbol list
concrete example, symbol of kind SYM_VAR
- discover the type of the var sym and either create a new type or associate it
with the already created canonical version of that type
- discover what symbols that the var sym depends on
(resolve the rhs expression)
- make sure that the rhs expression (if it exists) matches the type of the lhs
- if the rhs is a constant, evaluate it and store that somewhere
- detect cycles
i.e. var x: int = x;
struct x { v: Vec3; }
SYM_TYPE
struct, union
- create a new type and associate it with the symbol
- iterate over the fields and discover dependencies
(resolve the type of each field)
- make sure fields have unique names
- ensure there are some fields at all
- ensure no duplicate definitions
structs and unions, the moment a symbol is created for them, they are resolved
in other words they begin life resolved (SYM_RESOLVED)
however they are not put into the ordered symbols list, until they are COMPLETED
2023-03-08
--------------------------------------------------------------------------------
Order Independent Declarations
The simple, but shallow approach of just ordering declarations based on walking
their dependencies failed. The reason was the dependecies that were only on a pointer
to a thing and not its value. There are cases like the following where it falls
apart:
const x = sizeof(*p);
var p: T*;
struct T { ... }
Here p only depends on T through a pointer so it does not have to be ordered
before it. However x depends on p and deferences it so actually depends on T as
a value. This sort of "transitivity" issue can not be solved with the simple
shallow walking of dependencies.
The new approach is the following:
- Parse a bunch of declarations
- create a new symbol for each decl, and add it to a symbol table
(a symbol is basically a package level binding)
- iterate through all the symbols and recursively resolve everything
adding them to a list of ordered symbols
-
resolving a typespec is turning the syntactic typespec into a semantic type
structs and union types will always start out as TYPE_INCOMPLETE, then only when
in a situation where the complete type is explicitly needed do we finish
"completing" the type
2023-03-06
--------------------------------------------------------------------------------
Order Independent Declarations
1. Create a symbol table which maps names to declarations
2. iterate over the symbol table and order the declarations
3. Order:
if ordered, return the ordered decl
if unordered, set to ordering and recursively order it
if ordering, there is a cycle, print error
2023-02-07
--------------------------------------------------------------------------------
Current Video: Bitwise Day 8
Lexing
- transform a stream of characters into a stream of tokens
Parsing
- transform a stream of tokens into an AST
- An ast node corresponds to a grammar production rule
Resolving Symbols/Entities
- order independent declarations
- cyclic dependency detection
- constant expression evaluation
- type checking/inference
--------------------------------------------------------------------------------
How It Works
Ion Compiler High Level Algorithm
1. Parse a bunch of declarations corresponding to a package
2. Create a new symbol for each decl, and add it to a symbol table
(a symbol is basically a package level binding)
3. Iterate through all symbols and resolve each one recursively, adding them to
a list of ordered symbols
What does it mean to "resolve" a symbol?
There are two kinds of "resolution" happening:
1. Discovering dependencies and ordering
2. Type completion
A dependency on a particular symbol does not necessarily mean that it must
precede in the ordering, only if the symbol is used in a way that its full
definition must be known is the type completed and the symbol put into the
ordered symbol list. For example, a dependency on a pointer to a type does not
require the type to precede in the ordering, but a dependency on a type as a
value does require it to precede.
incomplete types - aggregate types begin their life as incomplete types. as
resolution proceeds, types are sort of lazily resolved where fields are only
resolved at the time of first use,
there are two types of resolving going on: declaration level ordering and stuff,
and then type level dependency resolution
difference in Type and Typespec?
resolving a typespec is turning the syntactic typespec into a semantic type
A "Type" is an interned, canonical version of a type that can be compared
directly with pointers
A "Typespec" is sort of like a reference to a type.
One illustrative example is an array type:
The following is a typespec, where the size of the array is an expression:
var arr: int[const1 + const2];
Once this is resolved into a type, the size of the array must be resolved into a
constant value. So if the values of const1 and const2 are 10 and 6, the type
would be:
var arr: int[16];