The implementation and technology behind Self is quite old in comparison to modern compiler implementations but at the time it was state of the art. I hoped it would hold up reasonably well. The test I wrote in Self was:
(|
  doSomething = ( ^self ).
  test = ( |n <- 0|
    [ n < 100000000 ] whileTrue: [ doSomething. n: n + 1 ]
  )
|)
Running this in the Self shell shows:
"Self 1" _AddSlots: ...code snippet from above... shell "Self 2" [ test ] time. 2587
About 2.6 seconds seems a bit slow to me, so I ran the same test in Pharo to confirm and to see how it compares. The Pharo code looks almost exactly like the Self code:
doSomething = ^self

test = |count|
  count := 0.
  [ count < 100000000 ] whileTrue: [ count := count + 1. self doSomething. ].

[ MyObject new test ] timeToRun
=> 0:00:00:00.239
That's 239ms vs 2,587ms, a factor of over 10x. Further investigation revealed that calling 'time' in Self seems to cause the code to run slower. If I call the 'test' method first, and then call 'time' then it's much faster:
"Self 2" [ test ] time. 2587 "Self 3" [ test ] time. 2579 "Self 4" test. nil "Self 5" [ test ] time. 650 "Self 6" [ test ] time. 628
At 650ms it is about 2.7x slower than Pharo, an improvement over 10x. More investigation is needed to see if there is room for other improvements.
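The ratios quoted above can be checked with a few lines of arithmetic. This is only a sketch in Python rather than Self; the timings are the ones reported above, while the names `cold` and `warm` are my own labels for the runs before and after calling 'test' directly.

```python
# Timings in milliseconds, copied from the runs above.
PHARO_MS = 239
SELF_COLD_MS = 2587  # "[ test ] time" before calling "test" directly
SELF_WARM_MS = 650   # "[ test ] time" after calling "test" directly


def slowdown(self_ms, pharo_ms):
    """Return how many times slower the Self run is than the Pharo run."""
    return self_ms / pharo_ms


print(f"cold: {slowdown(SELF_COLD_MS, PHARO_MS):.1f}x")  # cold: 10.8x
print(f"warm: {slowdown(SELF_WARM_MS, PHARO_MS):.1f}x")  # warm: 2.7x
```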
The Self implementation has some primitives that can be changed to show debugging information from the JIT. All primitives can be listed with:
primitives primitiveList do: [ | :e | e printLine ].
Looking through this shows some interesting ones prefixed with _Print that can be set to output debug data. One is _PrintCompiledCode. Setting this to true allows viewing the generated assembler code on the Self console.
"Self 16" _PrintCompiledCode: true. false "Self 17" 40 + 2. ... // loadOop2 movl $0xa0 (40), #-16(%ebp) // loadOop2 movl $0x8 (2), #-20(%ebp) // loadArg movl #-20(%ebp), %ebx movl %ebx, #4(%esp) // selfCall movl #-16(%ebp), %ebx movl %ebx, (%esp) nop nop nop call 0x8186597 <SendMessage_stub> (bp) // begin SendDesc jmp L7f .data 3 jmp L9f .data 0 .data 0 .data 0x4578341 ('+') .data 4 L7: L8: // end SendDesc movl %eax, #-16(%ebp) // epilogue movl #-16(%ebp), %eax // restore_frame_and_return leave ret
_PrintInlining displays debug information related to inlining code.
"Self 18" _PrintInlining: true fales "Self 19" test. *inlining size, cost 0/size 0 (0x8b7e864) *PIC-type-predicting - (1 maps) *type-casing - *inlining - (smallInt.self:153), cost 1/size 0 (0x8b7ee38)* *inlining asSmallInteger (number.self:108), cost 1/size 0 (0x8b7fa94)* *inlining raiseError, cost 0/size 0 (0x8b80530)* *inlining asSmallIntegerIfFail: (smallInt.self:302), cost 0/size 0 (0x8b808fc)* *inlining TSubCC: *cannot inline value:With:, cost = 10 (rejected) *marking value:With: send ReceiverStatic *sending value:With: *sending - *inlining size:, cost 0/size 0 (0x8b8434c) *inlining rep, cost 0/size 0 (0x8b846a8) *PIC-type-predicting removeFirstLink (1 maps) *type-casing removeFirstLink *inlining removeFirstLink (list.self:300), cost 2/size 0 (0x8b84b48)* *inlining next, cost 0/size 0 (0x8b85628) *PIC-type-predicting remove (1 maps) *type-casing remove *cannot inline remove, cost = 9 (rejected) *sending remove *sending removeFirstLink *PIC-type-predicting value (1 maps) *type-casing value *inlining value, cost 0/size 0 (0x8b86570)* *sending value *inlining asSmallInteger (number.self:108), cost 1/size 0 (0x8b7e5b0)* *inlining raiseError, cost 0/size 0 (0x8b7f074)* *inlining asSmallIntegerIfFail: (smallInt.self:302), cost 0/size 0 (0x8b7f440)* *inlining TSubCC: *cannot inline value:With:, cost = 10 (rejected) *marking value:With: send ReceiverStatic *sending value:With: nil
For more involved benchmarks there is some code shipped with the Self source. It can be loaded with:
"Self 28" bootstrap read: 'allTests' From: 'tests'. reading ./tests/allTests.self... reading ./tests/tests.self... reading ./tests/programmingTests.self... reading ./tests/debugTests.self... reading ./tests/lowLevelTests.self... reading ./tests/numberTests.self... reading ./tests/deltablue.self... reading ./tests/sicTests.self... reading ./tests/branchTests.self... reading ./tests/nicTests.self... reading ./tests/testSuite.self... reading ./tests/languageTests.self... reading ./tests/cons.self... reading ./tests/benchmarks.self... reading ./tests/richards.self... reading ./tests/parser.self... reading ./tests/parseNodes.self... modules allTests
There are methods on the bootstrap object for running the tests and printing results. For example:
"Self 32" benchmarks measurePerformance compile mean C mean/C % recur: 5 0 sumTo: 2 7 sumFromTo: 2 7 fastSumTo: 2 6 nestedLoop: 2 10 ...
There are also measurePerformance3 methods. The code comments for the measure3 methods explain the differences.
Self 2 was well known for generating very fast code that compared favourably with C. The implementation of this was described in Craig Chambers' thesis. Compilation was slow, however, so in Self 3 and 4 two new compilers were created: 'nic' and 'sic'. I believe this is covered in Urs Hölzle's thesis. The 'nic' compiler is the 'Non Inlining Compiler' and is simpler to implement. It's the compiler you write to get Self bootstrapped and running on new platforms fairly quickly. There is no inlining and no type feedback, so performance is slower, as shown by the benchmarking when changing the compiler used, as described below. The 'sic', or 'Simple Inlining Compiler', generates better code through more optimisations. While neither generates code as fast as the Self 2 compiler, both compile code faster, which makes for a better interactive system. You can read more about this in the Merlintec Self FAQ.
There is a defaultCompiler slot in the benchmark object that can be set to sic to compare the different JIT compilers that Self implements. Comparing the 'nic' compiler vs the 'sic' compiler shows a speedup of about 6x in the 'richards' benchmark when using 'sic'.
There's probably a fair bit of low-hanging fruit to improve run times. I don't think the x86 backend has had as much work on it as the SPARC or PPC backends. The downside is that much of the compiler code is written in C++, so for people interested in 'Self the language' it's not as fun to hack on. Klein was an attempt to write a Self VM in Self and includes a compiler and assembler, which might make a more interesting project for those who want to use Self itself to implement compiler code.