-
Notifications
You must be signed in to change notification settings - Fork 3
Optimization techniques
Whenever you think you want to use macros just for inlining purpose, you should use a regular function with (declaim (inline ...))
. Use compiler-macros if it doesn't work, and only in the case even compiler-macros has failed should you use macros.
On SBCL, inlining is equivalent to treating functions as a macro. The Python compiler performs all of constant propagation, type propagation and dead-code elimination on the code inside the inlined function.
Inlining a function also helps allocating objects on stack with (declare (dynamic-extent ...))
.
If you declare a tail-recursive local function inline
, you get a poor-man's loop unrolling.
You should carefully modify a special variable SB-EXT:*INLINE-EXPANSION-LIMIT*
, and you still cannot control the detail.
For example, following code unrolls four recursions into one. Note that the code does not eliminate the bounds checking.
However, if you further inline the entire function, and if the size of the array is known to be below *inline-expansion-limit*
at compile time, then you can sometimes skip the bounds checking (due to dead-code elimination after certain recursions).
(setf sb-ext:*inline-expansion-limit* 4)
(defun fn (array)
(declare (optimize (speed 3) (safety 0)))
(let ((result 0))
(declare (fixnum result)
((simple-array fixnum) array))
(labels ((rec (index)
(when (array-in-bounds-p array index)
(incf result (aref array index))
(rec (1+ index)))))
(declare (inline rec))
(rec 0)
result)))
; disassembly for FN
; Size: 122 bytes. Origin: #x22BE44F9
; 4F9: 31C9 XOR ECX, ECX ; no-arg-parsing entry point
; 4FB: 488B47F9 MOV RAX, [RDI-7]
; 4FF: 4885C0 TEST RAX, RAX
; 502: 7F09 JNLE L1
; 504: L0: 488BD1 MOV RDX, RCX
; 507: 488BE5 MOV RSP, RBP
; 50A: F8 CLC
; 50B: 5D POP RBP
; 50C: C3 RET
; 50D: L1: 488B4F01 MOV RCX, [RDI+1]
; 511: BB02000000 MOV EBX, 2
; 516: EB59 JMP L3
; 518: 0F1F840000000000 NOP
; 520: L2: 48851D89FFFFFF TEST RBX, [RIP-119] ; [#x22BE44B0] = #x8000000000000001
; 527: 75DB JNE L0
; 529: 488B77F9 MOV RSI, [RDI-7]
; 52D: 4839F3 CMP RBX, RSI
; 530: 7DD2 JNL L0
; 532: 488B749F01 MOV RSI, [RDI+RBX*4+1]
; 537: 4801CE ADD RSI, RCX
; 53A: 488BCE MOV RCX, RSI
; 53D: 4883C302 ADD RBX, 2
; 541: 488B77F9 MOV RSI, [RDI-7]
; 545: 4839F3 CMP RBX, RSI
; 548: 7DBA JNL L0
; 54A: 488B749F01 MOV RSI, [RDI+RBX*4+1]
; 54F: 4801CE ADD RSI, RCX
; 552: 488BCE MOV RCX, RSI
; 555: 4883C302 ADD RBX, 2
; 559: 488B77F9 MOV RSI, [RDI-7]
; 55D: 4839F3 CMP RBX, RSI
; 560: 7DA2 JNL L0
; 562: 488B749F01 MOV RSI, [RDI+RBX*4+1]
; 567: 4801CE ADD RSI, RCX
; 56A: 488BCE MOV RCX, RSI
; 56D: 4883C302 ADD RBX, 2
; 571: L3: EBAD JMP L2