Bluish Coder

Programming Languages, Martials Arts and Computers. The Weblog of Chris Double.


2011-04-16

My Git Workflow for Mozilla Development

Since my post on converting mozilla-central to git I've had a few requests about what my git workflow is for Mozilla Development.

When working on bugs I'll create a git branch for development. The workflow in this branch looks like:

git fetch origin
git checkout -b bug/123456 origin/master
...make changes...
git commit
...make more changes...
git commit

Occasionally I'll want to merge with the latest code from mozilla-central. I do this by rebasing my changes on top of the latest trunk code:

git fetch origin
git rebase origin/master

I'll often have multiple temporary branches as I work on different ideas during the fix. I find being able to diff between the branches, cherry pick patches, etc useful.

When I've finished with the fix and want to generate a patch for review I rebase on top of trunk and squash all my commits down into one patch. I do this use git's interactive rebase:

git fetch origin
git rebase -i origin/master
...squash commits into one commit and set commit message to my checkin commit message...
git hgp >~/mypatch.patch

The last command, git hgp, uses an alias I got from Rafael Espindola's Blog. You can install this alias by adding the following to your ~/.gitconfig file:

[alias]
hgp = show --format=\"From: %an <%ae>%n%s%n%b\" -U8

The adds an 'hgp' git command that does the same as git show but includes a header for the committers name. This allows an hg import of the patch to include the correct patch authors details which is useful if you use the checkin-needed keyword for others to commit the patch.

I attach the patch file to the bug for review. If I get review comments I then go back to the git branch and make the necessary changes. Regenerating the patch involves repeating the steps above.

When the patch needs to be committed to mozilla-central it can be imported and pushed directly using mercurial:

hg clone https://hg.mozilla.org/mozilla-central
cd mozilla-central
hg import ~/mypatch.patch
hg push
Tags: mozilla  git 

2011-03-31

A Quick Look at the Rust Programming Language

The Rust Programming Language is a systems programming language being developed by Mozilla. It was announced last year and has seen quite a bit of development since then.

I've only been lightly following the development over the past year but recently decided to spend a bit more time looking at it. The following is a look at the language and implementation from someone who isn't heavily involved in the development so don't take anything I write as gospel. Hopefully I don't get too many things wrong.

The Rust Language FAQ lists the following summary of features:

  • Memory safe. No null pointers, wild pointers, etc. Automatic storage management.
  • Mutability control. Immutable by default. No shared mutable state across tasks.
  • Dynamic execution safety: task failure / unwinding, trapping, logging. RAII / dtors.
  • Typestate system: ability to define complex invariants that hold over data structures.
  • Explicit memory control. Layout and allocation control. Interior / value types.
  • Very lightweight tasks (coroutines). Cheap to spawn thousands-to-millions.
  • Stack iterators (effectively lambda-blocks w/o heap allocation).
  • Static, native compilation. Emits ELF / PE / Mach-o files.
  • Direct and simple interface to C code (switch stacks and call, ~8 insns).
  • Multi-paradigm. pure-functional, concurrent-actor, imperative-procedural, OO.
  • First class functions with bindings.
  • Structurally-typed objects (no nominal types or type hierarchy).
  • Multi-platform. Developed on Windows, Linux, OSX.
  • UTF8 strings, assortment of machine-level types.
  • Works with existing native toolchains. GDB / Valgrind / Shark / etc.
  • Practical rule-breaking: can break safety rules, if explicit about where and how.

The Rust implementation is still in the 'only useful for Rust developers' state since the implementation and libraries are still being developed. There are two Rust compilers in various states of development. They are:

  • rustboot - written in O'Caml with it's own x86 code generation backend. This is being used to bootstrap the 'rustc' implementation.
  • rustc - self hosting compiler written in Rust and bootstrapped using 'rustbost'. Code generation is done using LLVM.

rustboot can be used to write Rust programs and try out the language. rustc is still in heavy development and only very basic stuff works from what I can see.

Building

Please note, as of 2011-10-24, the instructions for building Rust below are out of date. For updated instructions read my recent post on building Rust.

The instructions to build Rust are in the wiki. One important point is you need an SVN version of LLVM. I built this from source using a git mirror of LLVM (on 64 bit x86 Linux):

$ git clone git://github.com/earl/llvm-mirror.git
$ cd llvm-miror
~/llvm-mirror $ CXX='g++ -m32' CC='gcc -m32' CFLAGS=-m32 CXXFLAGS=-m32 \
                LDFLAGS=-m32 ./configure --enable-shared --disable-bindings \
                 --{build,host,target}=i686-unknown-linux-gnu \
                 --enable-targets=x86,x86_64,cbe
~/llvm-mirror $ make && make install

Once you have LLVM and the other pre-requisites installed:

$ git clone git://github.com/graydon/rust.git
$ cd rust
~/rust $ mkdir build
~/rust $ cd build
~/rust/build $ ../configure
~/rust/build $ make check

This builds the rustboot compiler, uses that to compile rustc, then runs a series of tests using both builds. If you have valgrind installed this will be slow as the tests are run under valgrind.

Bootstrap Compiler

The bootstrap compiler executable is boot/rustboot. The command to execute it, assuming the build occurred in the directory ~/rust/build is:

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o hello hello.rs

This will compile a Rust program in hello.rs and leave an executable hello. The -L argument references the boot subdirectory containing the Rust standard library in libstd.so. A 'hello world' program for testing is:

use std;

fn main() {
  log "Hello World";
}

To run the executable you'll need to adjust the LD_LIBRARY_PATH to include the rt subdirectory to pick up the Rust runtime shared library:

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o test test.rs
~/rust/build $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/rust/build/rt
~/rust/build $ ./hello
rt: ---
rt: ca09:main:main: rust: Hello World

rustc Compiler

The rustc compiler lives in stage0/rustc. The output of this compiler is LLVM bytecode which must then be compiled using LLVM tools. To compile the hello.rs program mentioned in the previous section:

~/rust/build $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/rust/build/rt:~/rust/build/rustllvm
~/rust/build $ stage0/rustc -nowarn -L stage0 -o hello.bc hello.rs
~/rust/build $ llc -march=x86 -relocation-model=pic -o hello.s hello.bc
~/rust/build $ gcc -fPIC -march=i686 -m32 -fno-rtti -fno-exceptions -g \
               -o hello.o -c hello.s
~/rust/build $ gcc -fPIC -march=i686 -m32 -fno-rtti -fno-exceptions -g \
               stage0/glue.o -o hello hello.o -Lstage0 -Lrt -lrustrt

Note the need to add the rustllvm directory to LD_LIBRARY_PATH to pick up a shared library. That sequence of commands will compile the hello.rs file to LLVM bytecode. llc compiles the bytecode to x86 assembly. gcc compiles this to an object file, followed by a final gcc invocation to link it. And to run:

~/rust/build $ ./hello
rt: ---
rt: ee0b:main:main: rust: Hello World

Rust Language Details

For examples of the Rust language there are multiple tests and the source code to rustc itself in the github repository. The wiki has a link to the PDF documentation, currently a snapshot from 2011-02-25.

The following are some quick examples of Rust features that work with the bootstrap compiler.

Foreign Function Interface

This program uses the C FFI to call the 'puts' function from the C shared library:

use std;
import std._str.rustrt.sbuf;
import std._str;

native mod libc = "libc.so.6" {
    fn puts(sbuf s) -> int;
}

unsafe fn main() {
  libc.puts(_str.buf("hello from C\n"));
}

Compile and run with:

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o cffi cffi.rs
~/rust/build $ ./cffi
hello from C

Command Line Arguments

use std;

fn main(vec[str] args) {
  for(str s in args) {
    log s;
  }
}

The main function takes a vector of strings as the argument. This holds the command line arguments passed to the program. The example iterates over the vector printing out each element.

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o args args.rs
~/rust/build $ ./args a b c
rt: ---
rt: ccca:main:main: rust: ./args
rt: ccca:main:main: rust: a
rt: ccca:main:main: rust: b
rt: ccca:main:main: rust: c

Factorial

use std;

fn fac(uint x) -> uint {
  if (x <= 1u) {
    ret 1u;
  }
  else {
    ret x * fac(x-1u);
  }
}

fn main() {
  log fac(5u);
}

No language is complete without showing how factorial can be computed.

~/rust/build $ ./fac
rt: ---
rt: d158:main:main: rust: 120 (0x78)

Spawning Tasks

use std;

impure fn logger(port[str] logs) {
  let int i = 0;
  while (i < 2) {
    auto msg <- logs;
    log msg;
    i = i + 1;
  }
  log "logger exited";
}

impure fn main() {
  let port[str] logs = port();
  let task p = spawn logger(logs);
  auto out = chan(logs);
  out <| "Hello";
  out <| "World";
  join p;
}

A port is the receiving end of a typed inter task communication mechanism. A channel is the sending end of the communication mechanism. <| sends to the port and <- receives from the channel.

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o spawn spawn.rs
~/rust/build $ ./spawn
rt: ---
rt: 9a73:main:spawn.rs:1: rust: Hello
rt: 9a73:main:spawn.rs:1: rust: World
rt: 9a73:main:spawn.rs:1: rust: logger exited

Types

use std;

tag list {
  nil;
  cons(int, @list);
}

fn list_len(@list l) -> uint {
   fn len(@list l, &uint n) -> uint {
    alt (*l) {
      case (nil) {
        ret n;
      }
      case (cons(_, ?xs)) {
        ret len(xs, n+1u);
      }
    }
  }
  ret len(l, 0u);
}

fn main() {
  let @list l = @cons(1, @cons(2, @cons(3, @nil)));
  log list_len(l);
}

This example creates a list type. It has constructors for an empty list, nil, and a cons containing an integer and the rest of the list. The '@' prefixed to the list type in places means that that variable holds a boxed object. That is, it's a reference-counted heap allocation.

The list_len function shows the use of a local function which pattern matches over the list (using the alt keyword) and keeps a running total of the list length. The main function creates a list and prints the length.

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o types types.rs
~/rust/build $ ./types
rt: ---
rt: 8126:main:main: rust: 3 (0x3)

Typestate

The typestate system is one of the things that most interests me about Rust. If you've been reading my ATS posts, in particular the ones involving type safe checking of array bounds, you'll know I'm interested in languages that can help with detecting programming problems at compile time. It seems to me that the typesafe system can help here.

Here's a contrived example demonstrating one use of typestate. main takes a vector of strings containing the command line arguments. I want to call a function, dosomething, that will use these arguments but for some reason there must be never more than 2 arguments. Imagine I'm calling some C routine that'll die if I do.

I could check at runtime that the number of arguments is less than three. Here's an example that does this:

use std;
import std._vec;

fn dosomething(&vec[str] args) {
  log "vector length is less than 3";
}

fn main(vec[str] args) {
  log _vec.len[str](args);
  check (_vec.len[str](args) < 3u);
  dosomething(args);
}

Some example runs:

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o ts1 ts1.rs
~/rust/build $ ./ts1 a
rt: ---
rt: b3b1:main:main: rust: 2 (0x2)
rt: b3b1:main:main: rust: vector length is less than 3
~/rust/build $ ./ts1 a b
rt: ---
rt: d884:main:main: rust: 3 (0x3)
rt: d884:main:main: upcall fail '.t0 < 3u', ts2.rs:5

Notice the last invocation has failed since three arguments are used (the program name is counted as an argument).

It would be good to be able to check at compile time that somewhere the assertion holds that the number of arguments is less than three. In our example if we left the check call out, the program would still compile, but dosomething might do something disasterous. We can tell the typestate system that the precondition must hold for callers of dosomething and to fail at compile time by adding a 'prove statement' to the function:

use std;
import std._vec;

fn prove_length(&vec[str] args, uint n) -> bool {
  ret _vec.len[str](args) < n;
}

fn dosomething(&vec[str] args) : prove_length(args, 3u) {
  log "vector length is less than 3";
}

fn main(vec[str] args) {
  log _vec.len[str](args);
  dosomething(args);
}

The addition of : prove_length(args, 3u) to the function tells the typestate system that this boolean function must evaluate to true. It examines what it knows of the constraints made via check statements and the like to see if it can prove that this is the case. If it is not then a compile error occurs. The above program will fail to compile:

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o ts2 ts2.rs
ts2.rs:14:13:14:19: error: Unsatisfied precondition constraint prove_length(args, 3u) 

If we add a check statement we are adding an assertion check that this precondition holds. Typestate makes note of this and will now allow that call to dosomething to compile:

use std;
import std._vec;

fn prove_length(&vec[str] args, uint n) -> bool {
  ret _vec.len[str](args) < n;
}

fn dosomething(&vec[str] args) : prove_length(args, 3u) {
  log "vector length is less than 3";
}

fn main(vec[str] args) {
  log _vec.len[str](args);
  check prove_length(arg, 3u);
  dosomething(args);
}

~/rust/build $ OCAMLRUNPARAM="b1" boot/rustboot -L boot -o ts3 ts3.rs
~/rust/build $ ./ts3
rt: ---
rt: d9e6:main:main:                       rust: 1 (0x1)
rt: d9e6:main:main:                       rust: vector length is less than 3

It'll be interesting to see how typestate is used in the standard library and third party libraries to help with compile time checking of code.

Conclusion

Rust is an interesting language and there's quite a bit more to it than I've covered here. I just picked random features to try. I'm looking forward to rustc being more complete and trying out some of the language features in more 'real world' examples to get a feel for it.

Tags: mozilla  rust 

2011-03-31

New Wave Audio Backend Landed

Yesterday I landed bug 635649 on mozilla-central. This is a refactoring of the wave backend for the HTML audio element. The initial work on this bug was done by Matthew Gregan with me completing the patch.

Prior to this landing the wave backend implemented the HTML audio API itself, sharing very little implementation details with the other audio backends. This meant the backend had to implement the intricate details of what events get raised and when they get raised. When we wrote the WebM backend we refactored the code so we could share this sort of thing amongst backends. This bug refactored the wave backend to share this code.

The advantages of this are:

  • Less likelihood of 'wave backend only' bugs.
  • Less code to maintain.
  • The Audio Data API can now use wave files as this was implemented in the shared backend code.
Tags: mozilla  firefox  audio 

2011-03-26

Bitcoin continues to grow

A few months ago I posted a short introduction to bitcoin. Since that post a number of new services using bitcoin have appeared and it has become a little easier to trade bitcoins for other currencies.

If you bought bitcoins around the time of my last post you would have paid about $0.06 USD per bitcoin. The current exchange rate from Mt. Gox at the time of writing this article is about $0.87 USD per bitcoin. That's a big increase if you've been sitting on bitcoins since then.

The We Use Coins website contains a great introduction to bitcoins along with an animated video about how it works:

Some high profile websites have start accepted bitcoins as donations:

  • The Electronic Frontier Foundation (EFF) allows donating by sending to a bitcoin address listed on their Ways You Can Help EFF page.
  • The Freenet Project lists a bitcoin address for donations on their donate page.
  • The Singularity Institute has a bitcoin address on their donations page.
  • The I2P Anonymous Network accepts bitcoins.

Some new services have come online providing ways of buying and seeing bitcoins:

  • CoinPal allows purchasing of bitcoins using PayPal.
  • CoinCard allows using bitcoins to buy gift cards. Currently you can buy Domino's Pizza gift cards and PayPal payments. The latter effectively allows you to trade your bitcoins for US dollars.
  • Mt. Gox is still going strong and is probably the leading bitcoin exchange.
  • BitcoinUSA is an exchange that focuses on complying with all US Government regulations regarding financial exchanges (including 'Know Your Customer' requirements).
  • bitcoin-otc is an 'over the counter' marketplace that operates from the #bitcoin-otc IRC channel on irc.freenode.net. Trades occur between parties in the channel and includes a web of trust database where you can rate people based on the trade. The web of trusts uses GPG authentication to identify users.

GPG is becoming a popular way for bitcoin traders to associate other trust data (like EBay feedback) with their registered identity in the web of trust. CoinPal also provides a way to link with the web of trust to get higher purchase limits.

Information on the bitcoin network and economy is available from sites like:

  • Bitcoin Block Explorer. This is a webview of the bitcoin block database. Each bitcoin block and transaction can be viewed. Indvidual addresses can be examined.
  • Bitcoin Charts shows detailed information on all the bitcoin markets.

Bitcoins are generated by running mining software that performs a computationally expensive task. Miners are effectively participating in a lottery and when they solve the task they 'win'. The prize is 50 bitcoins and this is known as generating a 'block'. The block is published to the P2P network and it contains the transactions that are accepted by the network.

As the network power of the bitcoin P2P network has grown so has the chance of winning the lottery decreased. CPU based miners can now take years to generate a block. GPU based miners are now the preferred way of mining. Even with GPU based miners generating blocks can take a long time (days and weeks). Mining pools have started to appear where miners group together to solve the task and share a proportion of the generated 50 bitcoins. So instead of earning 50 bitcoins a month you might earn 1.5 bitcoins a day.

Some pools that are operating are:

Some good places to keep up to date with the happenings in the bitcoin world are:

And finally, some services that use bitcoins:

  • WitCoin. A reddit-like system that uses bitcoin for voting. Note, may sometimes contain NSFW content. Submitters and commenters can earn bitcoins based on upvotes.
  • AutoVPS. Provides VPS hosting that can be paid using bitcoins.
  • YouTipIt. Give and receive tips using bitcoins for online content.
  • BiddingPond. Internet auctions using bitcoins.
  • Ubitious. File sharing site where downloaders pay bitcoins to get content and the submitter gets a portion.
  • BitcoinService. Another file sharing site.
  • Others. Many more listed on the bitcoin wiki.
Tags: bitcoin 

2011-02-27

Linear Datatypes in ATS

A datatype in ATS is similar to types defined in languages like ML and Haskell. They are objects allocated in the heap controlled by the garbage collector. The programmer does not need to explicitly free the memory or manage the lifetime of the object.

A dataviewtype is similar to a datatype in that it is an object allocated on the heap. A dataviewtype is a linear type. The programmer must explicitly free the memory allocated and control the lifetime of the object. The type checker tracks usage of the linear type and ensures at the type level that it is consumed and that dangling references to it are not kept around.

To compare the datatype and dataviewtype I'll start with a simple definition of a list type using datatype:

datatype list (a:t@ype) =
  | nil (a)
  | cons (a) of (a, list a)

fun {seed:t@ype}{a:t@ype} fold(f: (seed,a) -<> seed, s: seed, xs: list a): seed =
  case+ xs of
  | nil () => s
  | cons (x, xs) => fold(f, f(s, x), xs)

implement main() = let
  val a = cons(1, cons(2, cons(3, cons(4, nil))))
  val b = fold(add_int_int, 0, a)
  val c = fold(mul_int_int, 1, a)
  val () = printf("b: %d c: %d\n", @(b, c))
in
  ()
end

This program defines a list type that can hold instances of a particular type. A function fold is defined that performs a fold on a list. The main routine creates a list, a, and performs two folds. One to get a sum of the list and the other to get the product. The results of these are printed.

The fold function is similar to function templates in C++. The syntax fun {seed:t@type}{a:t@ype} defines a function template with two arguments. The type of the seed for the fold and the type of the list elements. When fold is called the function is instantiated with the actual types used.

add_int_int and mul_int_int are two ATS prelude functions that are defined to perform addition and multiplication respectively on two integers.

Notice there is no manual memory management. The list held in a is on the garbage collected heap. This ATS program must be built with -D_ATS_GCATS to link in the garbage collector to reclaim memory used by the list type:

atscc -D_ATS_GCATS test.dats

The above example shows that programming with datatypes is very similar to any other garbage collected functional programming language.

The following is the same program converted to use dataviewtype:

dataviewtype list (a:t@ype) =
  | nil (a)
  | cons (a) of (a, list a)

fun {a:t@ype} free(xs: list a): void =
  case+ xs of
  | ~nil () => ()
  | ~cons (x, xs) => free(xs)

fun {seed:t@ype}{a:t@ype} fold(f: (seed,a) -<> seed, s: seed, xs: !list a): seed =
  case+ xs of
  | nil () => (fold@ xs; s)
  | cons (x, !xs1) => let val s = fold(f, f(s, x), !xs1) in fold@ xs; s end

implement main() = let
  val a = cons(1, cons(2, cons(3, cons(4, nil))))
  val b = fold(add_int_int, 0, a)
  val c = fold(mul_int_int, 1, a)
  val () = printf("b: %d c: %d\n", @(b, c))
in
  free(a)
end

The definition of the list looks the same but uses the keyword dataviewtype. This makes list a linear type. Functions that accept linear types will show in the argument list whether they consume or leave unchanged the linear type. This is denoted by an exclamation mark prefixed to the type. For example:

// Consumes (free's) the linear type
fun consume_me(xs: list a) = ...

// Does not consume the linear type
fun use_me(xs: !list a) = ...

implement main() = let
  val a = nil
  val () = use_me(a)
  // Can use 'a' here because 'use_me' did not consume the linear type
  val () = consume_me(a)
  // The following line is a compile error because
  // 'consume_me' consumed the type
  val () = use_me(a)
in () end

Notice in the new definition of fold that the xs argument type is prefixed with an exclamation mark meaning it does not consume the linear type. This allows calling fold multiple times on the same list.

The implementation of fold had to be changed slightly to use the linear type without consuming it. The body of the function is:

case+ xs of
| nil () => (fold@ xs; s)
| cons (x, !xs1) => let val s = fold(f, f(s, x), !xs1) in fold@ xs; s end

Notice the fold@ call and the use of !. xs is a linear value (ie. our linear list). Because the linear value matches a pattern in a case statement that has a ! then the type of the linear value xs changes to be cons(int, an_address) where an_address is a memory address. xs1 is assigned the type ptr(an_address) where the value of xs is stored at address an_address. This is why the !xs1 is needed later. Since xs1 is now a pointer it must be dereferenced before calling fold again to pass the actual list. This pattern matching process is known as 'unfolding a linear value' in the documentation.

As the type of xs has changed during the pattern matching we need to change it back in the body of the pattern. The fold@ call returns the type of xs from cons(int,an_address) back to cons(int,list int). This is referred to as 'folding a linear value' in the documentation.

After using the list a in main we need to free the memory. This is done with the free function which iterates over the list consuming each element. Notice that free does not have an exclamation mark prefixed to the argument type.

Each line in the case statement in free has a ~ prefixed to the constructor to be pattern matched against. This is known as a destruction pattern. If a linear value matches the pattern then the pattern variables are bound and the linear value is free'd. The result of this is free walks the list freeing each element.

case+ xs of
  | ~nil () => ()
  | ~cons (x, xs) => free(xs)

There's a lot of detail going on under the hood but the code between the two versions is very similar. ATS makes it relatively easy to deal with the complexities of manual memory management and fails at compile time if it's not managed correctly. Much of my interest in ATS is how to write low level code that manages memory safely.

Tags: ats 


This site is accessable over tor as hidden service 6vp5u25g4izec5c37wv52skvecikld6kysvsivnl6sdg6q7wy25lixad.onion, or Freenet using key:
USK@1ORdIvjL2H1bZblJcP8hu2LjjKtVB-rVzp8mLty~5N4,8hL85otZBbq0geDsSKkBK4sKESL2SrNVecFZz9NxGVQ,AQACAAE/bluishcoder/-61/


Tags

Archives
Links