2011-03-31
A Quick Look at the Rust Programming Language
The Rust Programming Language is a systems programming language being developed by Mozilla. It was announced last year and has seen quite a bit of development since then.
I've only been lightly following the development over the past year but recently decided to spend a bit more time looking at it. The following is a look at the language and implementation from someone who isn't heavily involved in the development so don't take anything I write as gospel. Hopefully I don't get too many things wrong.
The Rust Language FAQ lists the following summary of features:
- Memory safe. No null pointers, wild pointers, etc. Automatic storage management.
- Mutability control. Immutable by default. No shared mutable state across tasks.
- Dynamic execution safety: task failure / unwinding, trapping, logging. RAII / dtors.
- Typestate system: ability to define complex invariants that hold over data structures.
- Explicit memory control. Layout and allocation control. Interior / value types.
- Very lightweight tasks (coroutines). Cheap to spawn thousands-to-millions.
- Stack iterators (effectively lambda-blocks w/o heap allocation).
- Static, native compilation. Emits ELF / PE / Mach-o files.
- Direct and simple interface to C code (switch stacks and call, ~8 insns).
- Multi-paradigm. pure-functional, concurrent-actor, imperative-procedural, OO.
- First class functions with bindings.
- Structurally-typed objects (no nominal types or type hierarchy).
- Multi-platform. Developed on Windows, Linux, OSX.
- UTF8 strings, assortment of machine-level types.
- Works with existing native toolchains. GDB / Valgrind / Shark / etc.
- Practical rule-breaking: can break safety rules, if explicit about where and how.
The Rust implementation is still in the 'only useful for Rust developers' state since the implementation and libraries are still being developed. There are two Rust compilers in various states of development. They are:
- rustboot - written in O'Caml with its own x86 code generation backend. This is being used to bootstrap the 'rustc' implementation.
- rustc - self hosting compiler written in Rust and bootstrapped using 'rustboot'. Code generation is done using LLVM.
rustboot can be used to write Rust programs and try out the language. rustc is still in heavy development and only very basic stuff works from what I can see.
Building
Please note, as of 2011-10-24, the instructions for building Rust below are out of date. For updated instructions read my recent post on building Rust.
The instructions to build Rust are in the wiki. One important point is you need an SVN version of LLVM. I built this from source using a git mirror of LLVM (on 64 bit x86 Linux):
$ git clone git://github.com/earl/llvm-mirror.git
$ cd llvm-mirror
~/llvm-mirror $ CXX='g++ -m32' CC='gcc -m32' CFLAGS=-m32 CXXFLAGS=-m32 \
LDFLAGS=-m32 ./configure --enable-shared --disable-bindings \
--{build,host,target}=i686-unknown-linux-gnu \
--enable-targets=x86,x86_64,cbe
~/llvm-mirror $ make && make install
Once you have LLVM and the other pre-requisites installed:
$ git clone git://github.com/graydon/rust.git
$ cd rust
~/rust $ mkdir build
~/rust $ cd build
~/rust/build $ ../configure
~/rust/build $ make check
This builds the rustboot compiler, uses that to compile rustc, then runs a series of tests using both builds. If you have valgrind installed this will be slow as the tests are run under valgrind.
Bootstrap Compiler
The bootstrap compiler executable is boot/rustboot. The command to execute it, assuming the build occurred in the directory ~/rust/build, is:
~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o hello hello.rs
This will compile a Rust program in hello.rs and leave an executable hello. The -L argument references the boot subdirectory containing the Rust standard library in libstd.so. A 'hello world' program for testing is:
use std;
fn main() {
log "Hello World";
}
To run the executable you'll need to adjust the LD_LIBRARY_PATH to include the rt subdirectory to pick up the Rust runtime shared library:
~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o hello hello.rs
~/rust/build $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/rust/build/rt
~/rust/build $ ./hello
rt: ---
rt: ca09:main:main: rust: Hello World
rustc Compiler
The rustc compiler lives in stage0/rustc. The output of this compiler is LLVM bytecode which must then be compiled using LLVM tools. To compile the hello.rs program mentioned in the previous section:
~/rust/build $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/rust/build/rt:~/rust/build/rustllvm
~/rust/build $ stage0/rustc -nowarn -L stage0 -o hello.bc hello.rs
~/rust/build $ llc -march=x86 -relocation-model=pic -o hello.s hello.bc
~/rust/build $ gcc -fPIC -march=i686 -m32 -fno-rtti -fno-exceptions -g \
-o hello.o -c hello.s
~/rust/build $ gcc -fPIC -march=i686 -m32 -fno-rtti -fno-exceptions -g \
stage0/glue.o -o hello hello.o -Lstage0 -Lrt -lrustrt
Note the need to add the rustllvm directory to LD_LIBRARY_PATH to pick up a shared library. In that sequence of commands, rustc compiles the hello.rs file to LLVM bytecode, llc compiles the bytecode to x86 assembly, gcc assembles this into an object file, and a final gcc invocation links it. And to run:
~/rust/build $ ./hello
rt: ---
rt: ee0b:main:main: rust: Hello World
Rust Language Details
For examples of the Rust language there are multiple tests and the source code to rustc itself in the github repository. The wiki has a link to the PDF documentation, currently a snapshot from 2011-02-25.
The following are some quick examples of Rust features that work with the bootstrap compiler.
Foreign Function Interface
This program uses the C FFI to call the 'puts' function from the C shared library:
use std;
import std._str.rustrt.sbuf;
import std._str;
native mod libc = "libc.so.6" {
fn puts(sbuf s) -> int;
}
unsafe fn main() {
libc.puts(_str.buf("hello from C\n"));
}
Compile and run with:
~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o cffi cffi.rs
~/rust/build $ ./cffi
hello from C
Command Line Arguments
use std;
fn main(vec[str] args) {
for(str s in args) {
log s;
}
}
The main function takes a vector of strings as the argument. This holds the command line arguments passed to the program. The example iterates over the vector printing out each element.
~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o args args.rs
~/rust/build $ ./args a b c
rt: ---
rt: ccca:main:main: rust: ./args
rt: ccca:main:main: rust: a
rt: ccca:main:main: rust: b
rt: ccca:main:main: rust: c
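For comparison, here is a minimal sketch of the same program in present-day Rust, where main no longer receives the arguments directly and they are read from the standard library instead. The syntax in this post predates Rust 1.0; the helper name collect_args is made up for illustration.

```rust
use std::env;

// Hypothetical helper: gathers every argument into a vector, with the
// program name first, just like the vector passed to `main` above.
fn collect_args() -> Vec<String> {
    env::args().collect()
}

fn main() {
    // Print one argument per line, in the order they were given.
    for s in collect_args() {
        println!("{}", s);
    }
}
```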
Factorial
use std;
fn fac(uint x) -> uint {
if (x <= 1u) {
ret 1u;
}
else {
ret x * fac(x-1u);
}
}
fn main() {
log fac(5u);
}
No language is complete without showing how factorial can be computed.
~/rust/build $ ./fac
rt: ---
rt: d158:main:main: rust: 120 (0x78)
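As a point of comparison, the same function might look like this in present-day Rust. This is only a sketch: the choice of u64 is an assumption, and the ret keyword and the u literal suffix used above were dropped before Rust 1.0.

```rust
// Recursive factorial. The last expression in each branch is the
// return value, so no explicit `return` is needed.
fn fac(x: u64) -> u64 {
    if x <= 1 {
        1
    } else {
        x * fac(x - 1)
    }
}

fn main() {
    println!("{}", fac(5)); // prints 120
}
```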
Spawning Tasks
use std;
impure fn logger(port[str] logs) {
let int i = 0;
while (i < 2) {
auto msg <- logs;
log msg;
i = i + 1;
}
log "logger exited";
}
impure fn main() {
let port[str] logs = port();
let task p = spawn logger(logs);
auto out = chan(logs);
out <| "Hello";
out <| "World";
join p;
}
A port is the receiving end of a typed inter task communication mechanism. A channel is the sending end of the communication mechanism. <| sends to the channel and <- receives from the port.
~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o spawn spawn.rs
~/rust/build $ ./spawn
rt: ---
rt: 9a73:main:spawn.rs:1: rust: Hello
rt: 9a73:main:spawn.rs:1: rust: World
rt: 9a73:main:spawn.rs:1: rust: logger exited
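The port and channel pair above has a rough counterpart in present-day Rust's std::sync::mpsc module. The sketch below is an approximation, not a translation: it uses an OS thread rather than a lightweight task, and the function run and the count parameter are made up for illustration.

```rust
use std::sync::mpsc;
use std::thread;

// Receives exactly `count` messages, mirroring the loop in `logger`
// above, and returns them so the caller can see what arrived.
fn logger(logs: mpsc::Receiver<String>, count: usize) -> Vec<String> {
    let mut seen = Vec::new();
    for _ in 0..count {
        // `recv` blocks until a message arrives, like `<-` on a port.
        let msg = logs.recv().expect("sender hung up early");
        println!("{}", msg);
        seen.push(msg);
    }
    println!("logger exited");
    seen
}

fn run() -> Vec<String> {
    // `rx` plays the role of the port, `tx` the role of the channel.
    let (tx, rx) = mpsc::channel::<String>();
    let handle = thread::spawn(move || logger(rx, 2));
    tx.send("Hello".to_string()).expect("receiver is gone");
    tx.send("World".to_string()).expect("receiver is gone");
    // Wait for the thread to finish, like `join p` above.
    handle.join().expect("logger thread panicked")
}

fn main() {
    run();
}
```

Messages from a single sender arrive in the order they were sent, so the logger sees "Hello" before "World".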
Types
use std;
tag list {
nil;
cons(int, @list);
}
fn list_len(@list l) -> uint {
fn len(@list l, &uint n) -> uint {
alt (*l) {
case (nil) {
ret n;
}
case (cons(_, ?xs)) {
ret len(xs, n+1u);
}
}
}
ret len(l, 0u);
}
fn main() {
let @list l = @cons(1, @cons(2, @cons(3, @nil)));
log list_len(l);
}
This example creates a list type. It has constructors for an empty list, nil, and a cons containing an integer and the rest of the list. The '@' prefixed to the list type in places means that the variable holds a boxed object. That is, it's a reference-counted heap allocation.
The list_len function shows the use of a local function which pattern matches over the list (using the alt keyword) and keeps a running total of the list length. The main function creates a list and prints the length.
~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o types types.rs
~/rust/build $ ./types
rt: ---
rt: 8126:main:main: rust: 3 (0x3)
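A sketch of the same list in present-day Rust follows, where tag became enum and alt became match. One difference is an assumption worth flagging: Box is a uniquely owned heap allocation, not a reference-counted one like the '@' box, and was picked here only because it keeps the example short.

```rust
// A singly linked list of integers.
enum List {
    Nil,
    Cons(i32, Box<List>),
}

fn list_len(l: &List) -> usize {
    // Local helper that carries the running total, as in the original.
    fn len(l: &List, n: usize) -> usize {
        match l {
            List::Nil => n,
            List::Cons(_, xs) => len(xs, n + 1),
        }
    }
    len(l, 0)
}

fn main() {
    let l = List::Cons(
        1,
        Box::new(List::Cons(2, Box::new(List::Cons(3, Box::new(List::Nil))))),
    );
    println!("{}", list_len(&l)); // prints 3
}
```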
Typestate
The typestate system is one of the things that most interests me about Rust. If you've been reading my ATS posts, in particular the ones involving type safe checking of array bounds, you'll know I'm interested in languages that can help with detecting programming problems at compile time. It seems to me that the typestate system can help here.
Here's a contrived example demonstrating one use of typestate. main takes a vector of strings containing the command line arguments. I want to call a function, dosomething, that will use these arguments, but for some reason there must never be more than two arguments. Imagine I'm calling some C routine that'll die if there are.
I could check at runtime that the number of arguments is less than three. Here's an example that does this:
use std;
import std._vec;
fn dosomething(&vec[str] args) {
log "vector length is less than 3";
}
fn main(vec[str] args) {
log _vec.len[str](args);
check (_vec.len[str](args) < 3u);
dosomething(args);
}
Some example runs:
~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o ts1 ts1.rs
~/rust/build $ ./ts1 a
rt: ---
rt: b3b1:main:main: rust: 2 (0x2)
rt: b3b1:main:main: rust: vector length is less than 3
~/rust/build $ ./ts1 a b
rt: ---
rt: d884:main:main: rust: 3 (0x3)
rt: d884:main:main: upcall fail '.t0 < 3u', ts2.rs:5
Notice the last invocation has failed since three arguments are used (the program name is counted as an argument).
It would be good to be able to check at compile time that somewhere the assertion holds that the number of arguments is less than three. In our example, if we left the check call out, the program would still compile, but dosomething might do something disastrous. We can tell the typestate system that the precondition must hold for callers of dosomething, and to fail at compile time otherwise, by adding a 'prove statement' to the function:
use std;
import std._vec;
fn prove_length(&vec[str] args, uint n) -> bool {
ret _vec.len[str](args) < n;
}
fn dosomething(&vec[str] args) : prove_length(args, 3u) {
log "vector length is less than 3";
}
fn main(vec[str] args) {
log _vec.len[str](args);
dosomething(args);
}
The addition of : prove_length(args, 3u) to the function tells the typestate system that this boolean function must evaluate to true. It examines what it knows of the constraints made via check statements and the like to see if it can prove that this is the case. If it cannot, a compile error occurs. The above program will fail to compile:
~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o ts2 ts2.rs
ts2.rs:14:13:14:19: error: Unsatisfied precondition constraint prove_length(args, 3u)
If we add a check statement we are adding an assertion check that this precondition holds. Typestate makes note of this and will now allow that call to dosomething to compile:
use std;
import std._vec;
fn prove_length(&vec[str] args, uint n) -> bool {
ret _vec.len[str](args) < n;
}
fn dosomething(&vec[str] args) : prove_length(args, 3u) {
log "vector length is less than 3";
}
fn main(vec[str] args) {
log _vec.len[str](args);
check prove_length(args, 3u);
dosomething(args);
}
~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o ts3 ts3.rs
~/rust/build $ ./ts3
rt: ---
rt: d9e6:main:main: rust: 1 (0x1)
rt: d9e6:main:main: rust: vector length is less than 3
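Typestate predicates like this were removed from Rust before 1.0, so the examples above have no direct equivalent today. A loosely similar effect can be sketched with a wrapper type whose only constructor performs the check: passing an unchecked vector to dosomething then fails to compile. The names checked, ShortArgs and count below are made up for illustration, and this is an approximation of the idea, not the typestate system itself.

```rust
mod checked {
    // Hypothetical wrapper type. The field is private to this module, so
    // code outside it can only obtain a `ShortArgs` through `new`.
    pub struct ShortArgs(Vec<String>);

    impl ShortArgs {
        // Stands in for `check prove_length(args, 3u)`: a runtime test
        // that yields `None` for three or more elements.
        pub fn new(args: Vec<String>) -> Option<ShortArgs> {
            if args.len() < 3 {
                Some(ShortArgs(args))
            } else {
                None
            }
        }

        pub fn count(&self) -> usize {
            self.0.len()
        }
    }
}

use checked::ShortArgs;

// Accepts only the checked wrapper, so a caller that skips the check
// has nothing of the right type to pass in.
fn dosomething(args: &ShortArgs) -> usize {
    println!("vector length is less than 3");
    args.count()
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    println!("{}", args.len());
    match ShortArgs::new(args) {
        Some(short) => {
            dosomething(&short);
        }
        None => eprintln!("too many arguments"),
    }
}
```

Unlike check, which fails the task when the predicate is false, this version hands the failure back to the caller as None.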
It'll be interesting to see how typestate is used in the standard library and third party libraries to help with compile time checking of code.
Conclusion
Rust is an interesting language and there's quite a bit more to it than I've covered here. I just picked random features to try. I'm looking forward to rustc being more complete and trying out some of the language features in more 'real world' examples to get a feel for it.