Bluish Coder

Programming Languages, Martial Arts and Computers. The Weblog of Chris Double.


2011-03-31

A Quick Look at the Rust Programming Language

The Rust Programming Language is a systems programming language being developed by Mozilla. It was announced last year and has seen quite a bit of development since then.

I've only been lightly following the development over the past year but recently decided to spend a bit more time looking at it. The following is a look at the language and implementation from someone who isn't heavily involved in the development so don't take anything I write as gospel. Hopefully I don't get too many things wrong.

The Rust Language FAQ lists the following summary of features:

  • Memory safe. No null pointers, wild pointers, etc. Automatic storage management.
  • Mutability control. Immutable by default. No shared mutable state across tasks.
  • Dynamic execution safety: task failure / unwinding, trapping, logging. RAII / dtors.
  • Typestate system: ability to define complex invariants that hold over data structures.
  • Explicit memory control. Layout and allocation control. Interior / value types.
  • Very lightweight tasks (coroutines). Cheap to spawn thousands-to-millions.
  • Stack iterators (effectively lambda-blocks w/o heap allocation).
  • Static, native compilation. Emits ELF / PE / Mach-o files.
  • Direct and simple interface to C code (switch stacks and call, ~8 insns).
  • Multi-paradigm. pure-functional, concurrent-actor, imperative-procedural, OO.
  • First class functions with bindings.
  • Structurally-typed objects (no nominal types or type hierarchy).
  • Multi-platform. Developed on Windows, Linux, OSX.
  • UTF8 strings, assortment of machine-level types.
  • Works with existing native toolchains. GDB / Valgrind / Shark / etc.
  • Practical rule-breaking: can break safety rules, if explicit about where and how.

The Rust implementation is still in the 'only useful for Rust developers' state since the implementation and libraries are still being developed. There are two Rust compilers in various states of development. They are:

  • rustboot - written in OCaml with its own x86 code generation backend. This is being used to bootstrap the 'rustc' implementation.
  • rustc - self-hosting compiler written in Rust and bootstrapped using 'rustboot'. Code generation is done using LLVM.

rustboot can be used to write Rust programs and try out the language. rustc is still in heavy development and only very basic stuff works from what I can see.

Building

Please note, as of 2011-10-24, the instructions for building Rust below are out of date. For updated instructions read my recent post on building Rust.

The instructions to build Rust are in the wiki. One important point is you need an SVN version of LLVM. I built this from source using a git mirror of LLVM (on 64 bit x86 Linux):

$ git clone git://github.com/earl/llvm-mirror.git
$ cd llvm-mirror
~/llvm-mirror $ CXX='g++ -m32' CC='gcc -m32' CFLAGS=-m32 CXXFLAGS=-m32 \
                LDFLAGS=-m32 ./configure --enable-shared --disable-bindings \
                 --{build,host,target}=i686-unknown-linux-gnu \
                 --enable-targets=x86,x86_64,cbe
~/llvm-mirror $ make && make install

Once you have LLVM and the other pre-requisites installed:

$ git clone git://github.com/graydon/rust.git
$ cd rust
~/rust $ mkdir build
~/rust $ cd build
~/rust/build $ ../configure
~/rust/build $ make check

This builds the rustboot compiler, uses that to compile rustc, then runs a series of tests using both builds. If you have valgrind installed this will be slow as the tests are run under valgrind.

Bootstrap Compiler

The bootstrap compiler executable is boot/rustboot. The command to execute it, assuming the build occurred in the directory ~/rust/build is:

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o hello hello.rs

This will compile a Rust program in hello.rs and leave an executable hello. The -L argument references the boot subdirectory containing the Rust standard library in libstd.so. A 'hello world' program for testing is:

use std;

fn main() {
  log "Hello World";
}

To run the executable you'll need to adjust the LD_LIBRARY_PATH to include the rt subdirectory to pick up the Rust runtime shared library:

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o hello hello.rs
~/rust/build $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/rust/build/rt
~/rust/build $ ./hello
rt: ---
rt: ca09:main:main: rust: Hello World

rustc Compiler

The rustc compiler lives in stage0/rustc. The output of this compiler is LLVM bitcode, which must then be compiled using the LLVM tools. To compile the hello.rs program mentioned in the previous section:

~/rust/build $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/rust/build/rt:~/rust/build/rustllvm
~/rust/build $ stage0/rustc -nowarn -L stage0 -o hello.bc hello.rs
~/rust/build $ llc -march=x86 -relocation-model=pic -o hello.s hello.bc
~/rust/build $ gcc -fPIC -march=i686 -m32 -fno-rtti -fno-exceptions -g \
               -o hello.o -c hello.s
~/rust/build $ gcc -fPIC -march=i686 -m32 -fno-rtti -fno-exceptions -g \
               stage0/glue.o -o hello hello.o -Lstage0 -Lrt -lrustrt

Note the need to add the rustllvm directory to LD_LIBRARY_PATH to pick up a shared library. In that sequence of commands, rustc compiles the hello.rs file to LLVM bitcode, llc compiles the bitcode to x86 assembly, gcc assembles this into an object file, and a final gcc invocation links it. And to run:

~/rust/build $ ./hello
rt: ---
rt: ee0b:main:main: rust: Hello World

Rust Language Details

For examples of the Rust language there are multiple tests and the source code to rustc itself in the github repository. The wiki has a link to the PDF documentation, currently a snapshot from 2011-02-25.

The following are some quick examples of Rust features that work with the bootstrap compiler.

Foreign Function Interface

This program uses the C FFI to call the 'puts' function from the C shared library:

use std;
import std._str.rustrt.sbuf;
import std._str;

native mod libc = "libc.so.6" {
    fn puts(sbuf s) -> int;
}

unsafe fn main() {
  libc.puts(_str.buf("hello from C\n"));
}

Compile and run with:

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o cffi cffi.rs
~/rust/build $ ./cffi
hello from C

Command Line Arguments

use std;

fn main(vec[str] args) {
  for(str s in args) {
    log s;
  }
}

The main function takes a vector of strings as the argument. This holds the command line arguments passed to the program. The example iterates over the vector printing out each element.

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o args args.rs
~/rust/build $ ./args a b c
rt: ---
rt: ccca:main:main: rust: ./args
rt: ccca:main:main: rust: a
rt: ccca:main:main: rust: b
rt: ccca:main:main: rust: c

Factorial

use std;

fn fac(uint x) -> uint {
  if (x <= 1u) {
    ret 1u;
  }
  else {
    ret x * fac(x-1u);
  }
}

fn main() {
  log fac(5u);
}

No language is complete without showing how factorial can be computed.

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o fac fac.rs
~/rust/build $ ./fac
rt: ---
rt: d158:main:main: rust: 120 (0x78)

Spawning Tasks

use std;

impure fn logger(port[str] logs) {
  let int i = 0;
  while (i < 2) {
    auto msg <- logs;
    log msg;
    i = i + 1;
  }
  log "logger exited";
}

impure fn main() {
  let port[str] logs = port();
  let task p = spawn logger(logs);
  auto out = chan(logs);
  out <| "Hello";
  out <| "World";
  join p;
}

A port is the receiving end of a typed inter-task communication mechanism. A channel is the sending end. <| sends on the channel and <- receives from the port.

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o spawn spawn.rs
~/rust/build $ ./spawn
rt: ---
rt: 9a73:main:spawn.rs:1: rust: Hello
rt: 9a73:main:spawn.rs:1: rust: World
rt: 9a73:main:spawn.rs:1: rust: logger exited
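
For comparison, here is a rough sketch of the same port and channel idea in present-day Rust, whose syntax has changed a great deal from what is shown above. It assumes the standard library's std::sync::mpsc channel and an OS thread in place of a lightweight task, so it is an analogy rather than a translation. The helper name run and the count parameter are my own additions for illustration.

```rust
use std::sync::mpsc;
use std::thread;

// Receives `count` messages from the receiving end (the "port") and
// returns them in the order they arrived.
fn logger(logs: mpsc::Receiver<String>, count: usize) -> Vec<String> {
    let mut seen = Vec::new();
    for _ in 0..count {
        // recv() blocks until a message arrives, much like `<-` on a port.
        let msg = logs.recv().expect("sender hung up early");
        println!("{}", msg);
        seen.push(msg);
    }
    println!("logger exited");
    seen
}

// Hypothetical helper: spawns the logger, sends two messages, then joins.
fn run() -> Vec<String> {
    // `out` is the sending end (the "channel"), `logs` the receiving end.
    let (out, logs) = mpsc::channel();
    let p = thread::spawn(move || logger(logs, 2));
    out.send(String::from("Hello")).unwrap();
    out.send(String::from("World")).unwrap();
    // join() waits for the thread to finish, like `join p` above.
    p.join().expect("logger panicked")
}

fn main() {
    run();
}
```

The receiving end is moved into the spawned thread, so only the logger can read from it, which is one way of getting "no shared mutable state across tasks".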

Types

use std;

tag list {
  nil;
  cons(int, @list);
}

fn list_len(@list l) -> uint {
  fn len(@list l, &uint n) -> uint {
    alt (*l) {
      case (nil) {
        ret n;
      }
      case (cons(_, ?xs)) {
        ret len(xs, n+1u);
      }
    }
  }
  ret len(l, 0u);
}

fn main() {
  let @list l = @cons(1, @cons(2, @cons(3, @nil)));
  log list_len(l);
}

This example creates a list type. It has constructors for an empty list, nil, and a cons containing an integer and the rest of the list. The '@' prefixed to the list type in places means that that variable holds a boxed object. That is, it's a reference-counted heap allocation.

The list_len function shows the use of a local function which pattern matches over the list (using the alt keyword) and keeps a running total of the list length. The main function creates a list and prints the length.

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o types types.rs
~/rust/build $ ./types
rt: ---
rt: 8126:main:main: rust: 3 (0x3)
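
As a point of comparison, the same list can be sketched in present-day Rust, where tag has become enum and alt has become match. I've assumed std::rc::Rc as the nearest equivalent of the reference-counted '@' box; that mapping is my own guess at the closest fit, not something taken from the Rust documentation of the time.

```rust
use std::rc::Rc;

// A list of integers: either empty, or a value plus the rest of the list.
enum List {
    Nil,
    // Rc is a reference-counted heap allocation, standing in for `@list`.
    Cons(i32, Rc<List>),
}

fn list_len(l: &List) -> usize {
    // Local function that pattern matches over the list and carries a
    // running total of the length.
    fn len(l: &List, n: usize) -> usize {
        match l {
            List::Nil => n,
            List::Cons(_, xs) => len(xs, n + 1),
        }
    }
    len(l, 0)
}

fn main() {
    let l = List::Cons(1, Rc::new(List::Cons(2, Rc::new(List::Cons(3, Rc::new(List::Nil))))));
    println!("{}", list_len(&l));
}
```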

Typestate

The typestate system is one of the things that most interests me about Rust. If you've been reading my ATS posts, in particular the ones involving type safe checking of array bounds, you'll know I'm interested in languages that can help with detecting programming problems at compile time. It seems to me that the typestate system can help here.

Here's a contrived example demonstrating one use of typestate. main takes a vector of strings containing the command line arguments. I want to call a function, dosomething, that will use these arguments, but for some reason there must never be more than 2 arguments. Imagine I'm calling some C routine that'll die if there are.

I could check at runtime that the number of arguments is less than three. Here's an example that does this:

use std;
import std._vec;

fn dosomething(&vec[str] args) {
  log "vector length is less than 3";
}

fn main(vec[str] args) {
  log _vec.len[str](args);
  check (_vec.len[str](args) < 3u);
  dosomething(args);
}

Some example runs:

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o ts1 ts1.rs
~/rust/build $ ./ts1 a
rt: ---
rt: b3b1:main:main: rust: 2 (0x2)
rt: b3b1:main:main: rust: vector length is less than 3
~/rust/build $ ./ts1 a b
rt: ---
rt: d884:main:main: rust: 3 (0x3)
rt: d884:main:main: upcall fail '.t0 < 3u', ts2.rs:5

Notice the last invocation has failed since three arguments are used (the program name is counted as an argument).

It would be good to be able to check at compile time that the assertion that the number of arguments is less than three holds somewhere before the call. In our example, if we left the check call out, the program would still compile, but dosomething might do something disastrous. We can tell the typestate system that the precondition must hold for callers of dosomething, and to fail at compile time if it doesn't, by adding a 'prove statement' to the function:

use std;
import std._vec;

fn prove_length(&vec[str] args, uint n) -> bool {
  ret _vec.len[str](args) < n;
}

fn dosomething(&vec[str] args) : prove_length(args, 3u) {
  log "vector length is less than 3";
}

fn main(vec[str] args) {
  log _vec.len[str](args);
  dosomething(args);
}

The addition of : prove_length(args, 3u) to the function tells the typestate system that this boolean function must evaluate to true. It examines what it knows of the constraints made via check statements and the like to see if it can prove that this is the case. If it cannot, a compile error occurs. The above program will fail to compile:

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o ts2 ts2.rs
ts2.rs:14:13:14:19: error: Unsatisfied precondition constraint prove_length(args, 3u) 

If we add a check statement we are adding an assertion check that this precondition holds. Typestate makes note of this and will now allow that call to dosomething to compile:

use std;
import std._vec;

fn prove_length(&vec[str] args, uint n) -> bool {
  ret _vec.len[str](args) < n;
}

fn dosomething(&vec[str] args) : prove_length(args, 3u) {
  log "vector length is less than 3";
}

fn main(vec[str] args) {
  log _vec.len[str](args);
  check prove_length(args, 3u);
  dosomething(args);
}

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o ts3 ts3.rs
~/rust/build $ ./ts3
rt: ---
rt: d9e6:main:main:                       rust: 1 (0x1)
rt: d9e6:main:main:                       rust: vector length is less than 3
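
Present-day Rust has no typestate constraints, so the example above can't be carried over directly. A related but different technique is a wrapper type with a checking constructor: the only way to obtain the wrapper is to pass the check, so a function that takes the wrapper can't be called without it. The sketch below is a minimal version of that idea in present-day Rust syntax; ShortArgs and its check function are hypothetical names of my own, and the return value of dosomething is added only so the behaviour can be observed.

```rust
// Hypothetical wrapper type: holding a ShortArgs is evidence that the
// length check has already passed.
struct ShortArgs(Vec<String>);

impl ShortArgs {
    // Plays the role of `check prove_length(args, 3u)`. In a real program
    // this type would live in its own module so that this function is the
    // only way to build one.
    fn check(args: Vec<String>) -> Option<ShortArgs> {
        if args.len() < 3 {
            Some(ShortArgs(args))
        } else {
            None
        }
    }
}

// Callers cannot pass a plain Vec<String>, so skipping the check is a
// compile-time type error.
fn dosomething(args: &ShortArgs) -> usize {
    println!("vector length is less than 3");
    args.0.len()
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    match ShortArgs::check(args) {
        Some(short) => {
            dosomething(&short);
        }
        None => eprintln!("too many arguments"),
    }
}
```

Unlike typestate, the compiler here isn't reasoning about predicates at all. It only checks that the types line up, and the runtime check inside the constructor does the real work.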

It'll be interesting to see how typestate is used in the standard library and third party libraries to help with compile time checking of code.

Conclusion

Rust is an interesting language and there's quite a bit more to it than I've covered here. I just picked random features to try. I'm looking forward to rustc being more complete and trying out some of the language features in more 'real world' examples to get a feel for it.


This site is accessible over Tor as hidden service 6vp5u25g4izec5c37wv52skvecikld6kysvsivnl6sdg6q7wy25lixad.onion, or Freenet using key:
USK@1ORdIvjL2H1bZblJcP8hu2LjjKtVB-rVzp8mLty~5N4,8hL85otZBbq0geDsSKkBK4sKESL2SrNVecFZz9NxGVQ,AQACAAE/bluishcoder/-61/

