Stalin
stalin brutally optimizing Scheme compiler, with Debianization patches
Install / Use
/learn @barak/StalinREADME
Stalin - a STAtic Language ImplementatioN
Finally, a Lisp compiler that does what it should...
Jeffrey Mark Siskind
School of Electrical and Computer Engineering
Purdue University
Electrical Engineering Building, Room 330
465 Northwestern Avenue
West Lafayette IN 47907-2035 USA
voice: 765/496-3197
fax: 765/494-6440
qobi@purdue.edu
http://www.ece.purdue.edu/~qobi
Monday 2 October 2006
INTRODUCTION
This is release 0.11 of Stalin, a global optimizing compiler for Scheme. It is an alpha release and contains many known bugs. Not everything is fully implemented. This is the first self-hosting release. Stalin can now compile itself. Unlike previous releases, this release does not require you to have Scheme->C. And it does not require you to install Infrastructure or QobiScheme.
Stalin is an extremely efficient compiler for Scheme. It is designed to be used not as a development tool but rather as a means to generate efficient executable images either for application delivery or for production research runs. In contrast to traditional Scheme implementations, Stalin is a batch-mode compiler. There is no interactive READ-EVAL-PRINT loop. Stalin compiles a single Scheme source file into an executable image (indirectly via C). Running that image has equivalent semantics to loading the Scheme source file into a virgin Scheme interpreter and then terminating its execution. The chief limitation is that it is not possible to LOAD or EVAL new expressions or procedure definitions into a running program after compilation. In return for this limitation, Stalin does substantial global compile-time analysis of the source program under this closed-world assumption and produces executable images that are small, stand-alone, and fast.
Stalin incorporates numerous strategies for generating efficient code. Among them, Stalin does global static type analysis using a soft type system that supports recursive union types. Stalin can determine a narrow or even monomorphic type for each source code expression in arbitrary Scheme programs with no type declarations. This allows Stalin to reduce, or often eliminate, run-time type checking and dispatching. Stalin also does low-level representation selection on a per-expression basis. This allows the use of unboxed base machine data representations for all monomorphic types resulting in extremely high-performance numeric code. Stalin also does global static life-time analysis for all allocated data. This allows much temporary allocated storage to be reclaimed without garbage collection. Finally, Stalin has very efficient strategies for compiling closures. Together, these compilation techniques synergistically yield efficient object code. Furthermore, the executable images created by Stalin do not contain (user-defined or library) procedures that aren't called, variables and parameters that aren't used, and expressions that cannot be reached. This encourages a programming style whereby one creates and uses very general library procedures without fear that executable images will suffer from code bloat.
IMPORTANT NOTE
Contrary to any other indication in the documentation, this release does not run on DEC/Alpha under Linux due to limitations of gcc (it is not able to compile the file stalin-Alpha.c because it has too many global variables). It is possible to run Stalin under Linux on DEC/Alpha when it is compiled with Scheme->C (rather than self compiled) but I am not currently distributing that version. Contact me if you need to run Stalin under Alpha Linux and I will send you the Scheme->C version.
It might be possible to compile stalin-Alpha.c with the DEC C compiler under Digital Unix. I do not have access to this environment to try this. If anybody else does, I would appreciate any feedback as to whether this works.
DEVIATIONS
Stalin nominally accepts the language defined by The Revised^4 Report on the Algorithmic Language Scheme (R4RS). The language accepted by Stalin is known to deviate from R4RS in a number of ways as described below. (If you discover additional differences, please send mail to Big-Stalin@AI.MIT.EDU.) Many of these simply are features that are not (yet) implemented though I make no commitment to fully comply with R4RS and reserve the right to deviate from R4RS for any reason whatsoever, particularly performance.
-
Unimplemented syntax: 4.2.6 Nested quasiquotation is not supported. appendix Macros
-
Unimplemented procedures: 6.5.5 NUMERATOR DENOMINATOR RATIONALIZE MAKE-RECTANGULAR MAKE-POLAR REAL-PART IMAG-PART MAGNITUDE ANGLE 6.10.4 LOAD TRANSCRIPT-ON TRANSCRIPT-OFF
-
(1.1) Tail recursion optimization is done only on self calls.
-
(6.5) The following numeric data types are not supported: a. bignums: exact integers of arbitrary magnitude Furthermore, exact integer arithmetic can overflow without signaling an error and without yielding an inexact floating point number. b. ratios: exact non-integer rationals c. polar format complex numbers d. exact rectangular format complex numbers
-
It is not possible to access all of the underlying C scalar types. (This is not an incompatibility with R4RS but nonetheless important for other reasons.) a. No independent control over the float/double distinction. b. No long/short chars/ints/floats/doubles c. No unsigned chars/ints
-
(1.3.1) No R4RS compliance mode is provided.
-
Limitations of (6.5.6) STRING->NUMBER, (6.10.2) READ, and the source program: a. Can't parse largest negative number. b. Can't parse polar numbers with @. c. Can't parse rectangular numbers with i. d. Can't parse ratios with /. e. Can't parse numbers with embedded #. f. Can't parse exactness with #E and #I. g. Can't parse inexact numbers with mantissas where the digit string before the decimal point would overflow if read as an exact number, h. On Intel, SPARC, and SGI, the source program can't contain exact integer constants <-536870912 or >536870911. On Intel and SPARC, the compiler will generate incorrect C code without warning. On SGI, the compiler might signal an exception or might generate incorrect C code without warning. On Alpha, the source program can't contain exact integer constants <-2305843009213693952 or >2305843009213693951. The compiler will generate incorrect C code without warning.
-
(6.5.6) STRING->NUMBER doesn't accept the optional radix argument.
-
APPLY and CALL-WITH-CURRENT-CONTINUATION don't accept continuations or foreign procedures as their first argument.
-
(6.2) EQV? might return #T when given two procedures that can never be called. (BEGIN (DEFINE (F X) (LAMBDA () X)) (EQV? (F 1) (F 2))) ==> #T EQV? might return #T when given two fictitious pairs or degenerate vectors. (EQV? (CONS #T #T) (CONS #T #T)) ==> #T (EQV? (VECTOR #T) (VECTOR #T)) ==> #T
-
(6.5.5) =, <, >, <=, >= might not be transitive.
-
(6.5.6) STRING->NUMBER, READ, DISPLAY, and WRITE don't obey write/read invariance for inexact numbers.
-
(APPLY (LAMBDA X X) y) ==> y without checking that y is a list and without copying y. Furthermore, because of the way LIST is defined, (APPLY LIST X) doesn't copy X.
-
(6.10.1) Closing a port that has already been closed yields undefined behaviour rather than having no effect.
EXTENSIONS
Stalin extends R4RS in a number of ways:
- New syntax: PRIMITIVE-PROCEDURE and FOREIGN-PROCEDURE.
- New procedures: LIST-LENGTH, SUBLIST, SUB, LIST-APPEND, LIST-REVERSE, REF, LIST-SET!, REF!, LIST-FILL!, FILL!, LIST-COPY, STRING->UNINTERNED-SYMBOL, STRING-REVERSE, <<, >>, BITWISE-NOT, BITWISE-AND, BITWISE-OR, MAKE-DISPLACED-VECTOR, SUBVECTOR, VECTOR-APPEND, VECTOR-REVERSE, VECTOR-COPY, PANIC, POINTER?, INTEGER->STRING, INTEGER->INPUT-PORT, INTEGER->OUTPUT-PORT, and INTEGER->POINTER.
- New data type: pointer. ZERO? can be used to check for null pointers (as well as null strings and null ports). INTEGER->POINTER can be used to convert an integer address to a pointer. INTEGER->STRING, INTEGER->INPUT-PORT, and INTEGER->OUTPUT-PORT can be used to convert an integer address to a string, input port, or output-port, respectively.
- New variable: ARGV is bound to a vector of strings containing the command line arguments.
- If the last expression executed by the program evaluates to an integer, then its value is returned as the status code of the program. Otherwise, the status code zero is returned.
- Vectors are self evaluating.
- DO iterator list can be empty.
- Bodies can be empty.
- The commands of DO and the expressions of COND, CASE, BEGIN, and DO are considered bodies.
- Bodies can have definitions intermixed with expressions.
- Definitions are executed in order and can reference variables defined by previous definitions.
- Any body with just definitions treated like BEGIN.
- Binding can be just a variable in LET, LET*, and LETREC.
- DEFINE can take a single argument.
- (EQV? "" "") ==> #T (EQV? #() #()) ==> #T
- The procedures LENGTH, APPEND, REVERSE, and COPY are generic.
- EOF objects, input ports, and output ports are disjoint from each other and all other data types.
- All EOF objects are EQ?.
A future version of this document will describe all of the above extensions in greater detail.
INSTALLATION
The Stalin compiler is written in Scheme and can compile itself. (While I have run Stalin under other Scheme implementations such as Scheme->C, SCM, and Gambit-C, this requires some modification of the Stalin source code.) To compile and use Stali
