A quick look at the Pony Programming Language

2015-11-04

Pony is a new programming language described on their site as "an open-source, object-oriented, actor-model, capabilities-secure, high performance programming language."

It has some interesting features and is different enough from existing popular programming languages to make it a nice diversion to experiment with. Some features include:

lightweight actor based concurrency with M:N threading, mapping multiple language level threads to operating system threads.
strong static typing with generics
data-race free. The type system ensures at compile time that a concurrent program can never have data races.
deadlock free. There are no locking mechanisms exposed to the user so there are no deadlocks.
capabilities exposed to the type system to allow compile time enforcement of properties such as objects that have no other references to them, immutable values, reference values, etc.
lightweight C FFI

This post is an outline of my initial experiments with the language, including pitfalls to be aware of.

Installing

Pony can be installed from git and run from the build directory:

$ git clone https://github.com/CausalityLtd/ponyc
$ cd ponyc
$ make config=release
$ export PATH=`pwd`/build/release:$PATH
$ ponyc --help

Run tests with:

$ make config=release test

Some of the Pony standard packages dynamically load shared libraries. If they're not installed this will be reflected in build failures during the tests. The required libraries on a Linux based machine are openssl and pcre2-8. To build Pony itself llvm version 3.6 needs to be installed. There is an llvm37 branch on github that works on Linux but is awaiting some llvm37 fixes before it is merged into master.

Pony can be installed in a default location, or using prefix to install it somewhere else:

$ make config=release prefix=/home/user/pony install

One catch is that running ponyc requires it to find the Pony runtime library libponyrt.a for linking purposes. This might not be found if installed somewhere that it doesn't expect. This can be resolved by setting the environment variable LIBRARY_PATH to the directory where libponyrt.a resides. I had to do this for the Nix Pony package.

Compiling Pony programs

A basic "Hello World" application looks like:

actor Main
  new create(env: Env) =>
    env.out.print("hello world")

Place this in a main.pony file in a directory and compile:

$ mkdir hello
$ cat >hello/main.pony
  actor Main
   new create(env: Env) =>
     env.out.print("hello world")
$ ponyc hello
$ ./hello1
hello world

ponyc requires a directory as an argument and it compiles the *.pony files in that directory. It generates an executable based on the directory name, with a number appended if needed to prevent a name clash with the directory. The program starts executing by creating a Main actor and passing it an Env object allowing access to command line arguments, standard input/output, etc. The Main actor can then create other actors or do whatever is required for program execution.
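
As a small illustration of the Env object, here is a minimal sketch that prints each command line argument on its own line. It assumes env.args is an array of strings that can be iterated with values(); details may differ between Pony versions.

```pony
actor Main
  new create(env: Env) =>
    // env.args holds the command line arguments as strings.
    for arg in env.args.values() do
      env.out.print(arg)
    end
```

Placed in a directory and compiled with ponyc in the same way as the hello world program, running the executable with a few arguments should echo them back, one per line.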

Actors

Actors are the method of concurrency in Pony. An actor is like a normal object in that it can have state and methods. It can also have behaviours. A behaviour is a method that is executed asynchronously when called. The call returns immediately and the behaviour is queued to be run on an actor local queue. When the actor has nothing to do (not running an existing method or behaviour) it will pop the oldest queued behaviour and run that. An actor can only run one behaviour at a time, so no locking is needed within a behaviour since access to actor local state is serialized. For this reason it's useful to think of an actor as a unit of sequential execution. Parallelism is achieved by utilising multiple actors.
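
To make the serialized access concrete, here is a minimal sketch of an actor that updates its own state from behaviours without any locking. The Accumulator actor and its add and report behaviours are names I've made up for illustration.

```pony
actor Accumulator
  // Actor local state: only this actor's behaviours touch it.
  var _total: U64 = 0

  be add(n: U64) =>
    _total = _total + n

  be report(env: Env) =>
    env.out.print("total: " + _total.string())

actor Main
  new create(env: Env) =>
    let acc = Accumulator.create()
    // Each call returns immediately and queues a message on acc.
    acc.add(1)
    acc.add(2)
    acc.add(3)
    acc.report(env)
```

All four messages come from the same sender, so the actor processes them in the order they were sent and report runs after the three add calls.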

To compare the difference between a standard object and an actor I'll use the following program:

class Logger
  let _env: Env
  let _prefix: String

  new create(env: Env, prefix: String) =>
    _env = env
    _prefix = prefix

  fun log(msg: String, delay: U32) =>
    @sleep[I32](delay)
    _env.out.print(_prefix + ": " + msg)

actor Main
  new create(env: Env) =>
    let l1 = Logger.create(env, "logger 1")
    let l2 = Logger.create(env, "logger 2")

    l1.log("one", 3)
    l2.log("two", 1)
    l1.log("three", 3)
    l2.log("four", 1)

This creates a class called Logger that on construction takes an Env to use to output log messages and a string prefix to prepend to a message. It has a log method that will log a message to standard output after sleeping for a number of seconds given by delay. The unusual syntax for the sleep call is the syntax for calling the sleep C function using the Pony FFI. I'll cover this later.

The Main actor creates two loggers and logs twice to each one with a different delay. Logger is a standard class, so its methods are not asynchronous. Running this results in a delay of three seconds before the first log line is output, a delay of one second before the second line, a delay of three seconds before the third line and finally a delay of one second before the final line. Everything happens on the single Pony thread that runs the Main actor's create constructor. Pony runs this on a single operating system thread. Total elapsed time is the sum of the delays.

Compile and run with:

$ mkdir clogger
$ cat >clogger/main.pony
  ..contents of program above...
$ ponyc clogger
$ time ./clogger1
  logger 1: one
  logger 2: two
  logger 1: three
  logger 2: four

  real  0m8.093s
  user  0m0.116s
  sys   0m0.132s

Changing the Logger class to an actor and making the log method a behaviour will result in the logging happening asynchronously. The changes are:

actor Logger
  let _env: Env
  let _prefix: String

  new create(env: Env, prefix: String) =>
    _env = env
    _prefix = prefix

  be log(msg: String, delay: U32) =>
    @sleep[I32](delay)
    _env.out.print(_prefix + ": " + msg)

Nothing else in the program changes. I've just changed class to actor and fun to be. Now when the Main actor calls log it will add the behaviour call to the actor's queue and immediately return. Each Logger instance is running in its own Pony thread and will be mapped to an operating system thread if possible. On a multiple core machine this should mean each actor's behaviour is running on a different core.

Compiling and running gives:

$ mkdir alogger
$ cat >alogger/main.pony
  ..contents of program above...
$ ponyc alogger
$ time ./alogger1
  logger 2: two
  logger 2: four
  logger 1: one
  logger 1: three

  real  0m6.113s
  user  0m0.164s
  sys   0m0.084s

Notice that the total elapsed time is now six seconds. This is the sum of the delays in the calls to log in the first Logger instance. The second instance is running on another OS thread so executes in parallel. Each log call immediately returns and is queued to run. The delays on the second Logger instance are shorter so its output appears first. The two log calls on the second Logger run sequentially, as behaviours on a single actor instance are executed in order. The log calls for the first Logger instance run after their delay, again sequentially for the calls within that actor.

Capabilities

Pony uses reference capabilities to allow safe concurrent access to objects. In practice this means annotating types with a tag to indicate how 'sharable' an object is. For data to be passed to another actor it must be safe for that actor to use without data races. Reference capabilities allow enforcing this at compile time. There are defaults for most types so you don't need to annotate everything. Notice that none of the examples I've done so far use any capability annotations. I'll go through a few examples here but won't be exhaustive. The Pony tutorial has coverage of the combinations and defaults.

val and ref

A val capability is for value types. They are immutable and therefore anyone can read from them at any time. val objects can be passed to actors and used concurrently. Primitives like U32 are val by default. This is why none of the primitive arguments to behaviours in the previous examples needed annotation.

A ref capability is for references to mutable data structures. They can be read from and written to, and can have multiple aliases. You can't share these with other actors as that would potentially cause data races. Classes are ref by default.

This is an example of passing a val to another actor:

actor Doer
  be do1(n: U32) =>
    None

actor Main
  new create(env: Env) =>
    let a = Doer.create()
    let n: U32 = 5
    a.do1(n)

As U32 is a primitive it defaults to a val reference capability. It is immutable and can be read by anyone at any time so this compiles without problem. This example fails to compile however:

class Foo
  let n: U32 = 5

actor Doer
  be do1(n: Foo) =>
    None

actor Main
  new create(env: Env) =>
    let a = Doer.create()
    let b = Foo.create()
    a.do1(b)

The error is:

main.pony:5:13: this parameter must be sendable (iso, val or tag)
  be do1(n: Foo) =>
            ^

class defaults to the ref capability which can be read, written and aliased. It can't be sent to another actor as there's no guarantee that it won't be modified by any other object holding a reference to it. The iso and tag capabilities mentioned in the error message are other capability types.

iso is for single references to data structures that can be read and written to. The type system guarantees that only one reference exists to the object. It is short for 'isolated'.

tag is for identification only. Objects of capability tag cannot be read from or written to. They can only be used for object identity or, if they are actors, for calling behaviours on them. Actors default to tag capabilities. Calling behaviours is safe as behaviour running is serialized for the actor instance and they don't return data.
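
As a sketch of what the tag default means in practice, an actor reference can be handed to another actor without any annotation, and the receiver can call behaviours on it. The Printer and Worker actors here are hypothetical names for illustration.

```pony
actor Printer
  let _env: Env

  new create(env: Env) =>
    _env = env

  be show(msg: String) =>
    _env.out.print(msg)

actor Worker
  // `Printer` with no annotation is `Printer tag`, which is sendable.
  be work(p: Printer) =>
    // Worker can't read or write Printer's fields through a tag,
    // but it can call behaviours on it.
    p.show("hello from Worker")

actor Main
  new create(env: Env) =>
    let p = Printer.create(env)
    let w = Worker.create()
    w.work(p)
```

Main keeps its own reference to p after passing it to Worker. That aliasing is fine for a tag because neither holder can touch the actor's state directly.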

To get the previous example to work we can force the Foo object to be of type val if it can be immutable:

class Foo 
  let n: U32 = 5

actor Doer
  be do1(n: Foo val) =>
    None

actor Main
  new create(env: Env) =>
    let a = Doer.create()
    let b: Foo val = Foo.create()
    a.do1(b)

ref and iso

Let's modify the example so we can change the value of the Foo object to demonstrate moving a mutable reference from one actor to another:

class Foo
  var n: U32 = 5

  fun ref set(m: U32) =>
    n = m

  fun print(env: Env) =>
    env.out.print(n.string())

actor Doer
  be do1(env:Env, n: Foo iso) =>
    n.print(env)

actor Main
  new create(env: Env) =>
    let a = Doer.create()
    let b = Foo.create()
    a.do1(env, b)

In this example the do1 behaviour now requires an iso reference capability. As mentioned previously, iso means only one reference to the object exists therefore it is safe to read and write. But where we create the instance of Foo we have a reference to it in the variable b. Passing it as an argument to do1 effectively aliases it. The compile time error is:

main.pony:18:16: argument not a subtype of parameter
    a.do1(env, b)
               ^
main.pony:11:19: parameter type: Foo iso
  be do1(env:Env, n: Foo iso) =>
                  ^
main.pony:18:16: argument type: Foo iso!
    a.do1(env, b)
               ^

This error states that do1 requires a Foo iso parameter whereas it is being passed a Foo iso!. The ! at the end means that it is an alias to another variable. Even though class objects are ref by default, Pony has inferred the capability for b as iso as we didn't declare a type for b and we are passing it to a function that wants an iso. However as it has an alias it can't be used as an iso therefore it's an error.

One way of avoiding the aliasing is to pass the result of the create call directly:

actor Main
  new create(env: Env) =>
    let a = Doer.create()
    a.do1(env, Foo.create())

There is no alias here so it compiles fine.

If we do want to have an initial reference to it, say to set a value first, we can tell the type system that we are consuming the existing reference and will no longer use it. This is what the consume keyword is for:

actor Main
  new create(env: Env) =>
    let a = Doer.create()
    let b = Foo.create()
    b.set(42)
    a.do1(env, consume b)
    // b.set(0)

This now compiles. Uncommenting the use of b after the do1 call will be a compile error as we've consumed b and it no longer exists. In this case the error would be:

main.pony:20:5: can't use a consumed local in an expression
    b.set(0)
    ^
main.pony:20:6: invalid left hand side
    b.set(0)

consume is more often used for passing iso objects around. To pass it to another object you need to consume the existing reference to it. This becomes problematic if you are consuming a field of an object. Modifying the example so that the Foo is stored as a field of Main shows the problem:

actor Main
  var b: Foo iso = Foo.create()

  new create(env: Env) =>
    let a = Doer.create()
    b.set(42)
    a.do1(env, consume b)

The error is:

main.pony:20:16: consume must take 'this', a local, or a parameter
    a.do1(env, consume b)
               ^

b can't be consumed as it's a field of Main. It can't be left consumed: it must always have a valid Foo iso object stored in it. In Pony, assignment returns the old value of the variable being assigned to. This allows assigning a new value to the field and returning the old value in one operation, which avoids leaving the field in an invalid state:

new create(env: Env) =>
  let a = Doer.create()
  b.set(42)
  a.do1(env, b = Foo.create())

b gets a new value of a new instance of Foo and do1 gets passed the old value.
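
The same property of assignment can be seen with plain local variables. This is a minimal sketch using two U32 locals rather than iso fields. Because assignment evaluates to the old value of its left hand side, two variables can be swapped without a temporary.

```pony
actor Main
  new create(env: Env) =>
    var a: U32 = 1
    var b: U32 = 2
    // `b = a` stores 1 in b and evaluates to b's old value, 2,
    // which is then assigned to a.
    a = b = a
    env.out.print(a.string() + " " + b.string())
```

This should print "2 1".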

There's a lot more to capabilities and the capabilities section of the tutorial covers a lot. Although there are sane defaults it feels like 'capability tutorials' will be the Pony equivalent of 'Monad tutorials' in other languages for a while. When I was first learning ATS I spent a lot of time floundering with function annotations to get things to compile, trying random changes, until I learnt how it worked. I'm probably at that stage with capabilities at the moment and I hope it becomes clearer as I write more Pony programs.

Pattern Matching

Pony has many of the concepts of most modern functional programming languages. Matching on values is allowed:

let x: U32 = 2
match x
| 1 => "one"
| 2 => "two"
else
  "3"
end

Union types with capturing:

type Data is (U32 | String | None)
....
match x
| None => "None"
| 1 => "one"
| let u: U32 => "A number that is not one: " + u.string()
| let s: String => "A string: " + s
end

Enumerations are a bit verbose in that you have to use primitive to define each variant of the enumeration first:

primitive Red
primitive Blue
primitive Green

type Colour is (Red | Blue | Green)
...
let x: Colour = Red
match x
| Red => "Red"
| Blue => "Blue"
| Green => "Green"
end
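
For something that can be compiled as is, here is a minimal sketch that wraps the enumeration in a Main actor. To avoid depending on what type the match expression evaluates to, which I haven't checked across Pony versions, each case prints directly rather than producing a string.

```pony
primitive Red
primitive Blue
primitive Green

type Colour is (Red | Blue | Green)

actor Main
  new create(env: Env) =>
    let x: Colour = Blue
    // Each case compares x against one of the primitive values.
    match x
    | Red => env.out.print("Red")
    | Blue => env.out.print("Blue")
    | Green => env.out.print("Green")
    end
```

With x set to Blue this should print "Blue".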

C FFI

Pony has an easy to use C FFI. I showed an example of this previously:

@sleep[I32](delay)

The @ signifies that this is a C FFI function call. The type in the brackets is the return type of the C function call. The types of the arguments must match what the actual C function expects. Errors here will crash the program. Pony allows specifying the type of an FFI function in advance so argument types are checked. For sleep it would be:

use @sleep[I32](n: U32)
...
@sleep(10)

Note that it's no longer necessary to specify the return type at the call point as it's already been defined in the declaration.

If the C function is part of a library already linked into the Pony executable then there is no need for a use statement to define the library file to link against. sleep is part of libc so it isn't needed. In the cases where you need to link against a specific library, the use statement is used in this manner:

use "lib:foo"

The addressof keyword is used to pass pointers to C code. It can be used for passing out parameters of primitive types:

var n: U32 = 0
@dosomething[None](addressof n)
env.out.print("Result: " + n.string())

Callbacks

The FFI allows passing Pony functions to C for the C code to later call back. The syntax for this looks like:

let foo = Foo.create()
@callmeback[None](addressof foo.method, foo)

Calling C code example

A working example for the following C function in a cbffi.c file:

void do_callback(void (*func)(void* this, char* s), void* this) {
    func(this, "hello world");
}

The Pony code to use this is:

use "lib:cbffi"

class Foo
  let prefix: String
  let env: Env

  new create(e: Env, p: String) =>
    prefix = p
    env = e

  fun display(msg: Pointer[U8]) =>
    env.out.print(prefix + ":" + String.copy_cstring(msg))

actor Main
  new create(env: Env) =>
    let foo = Foo.create(env, "From Pony")
    @do_callback[None](addressof foo.display, foo)

Note that the display function takes a Pointer[U8] as an argument. Pointer[U8] is a generic type with U8 being the parameter. In this case it is the C string that the C function passes. Pony String types are an object with fields so C doesn't pass it directly. The String type has a couple of constructor functions that take Pointer[U8] as input and return a Pony String - the one used here, copy_cstring, makes a copy of the C string passed in.

Compile with:

$ mkdir cb
$ cat >cb/main.pony
  ...Pony code...
$ cat >cb/cbffi.c
  ...C code...
$ gcc -fPIC -shared -o libcbffi.so cb/cbffi.c
$ LIBRARY_PATH=. ponyc cb
$ LD_LIBRARY_PATH=. ./cb1
  From Pony:hello world

Here LIBRARY_PATH is set to find the shared library during compiling and linking. To run the generated executable LD_LIBRARY_PATH is used to find the shared library at runtime.

It's also possible to link against static C libraries:

$ rm libcbffi.so
$ gcc -c -o libcbffi.o cb/cbffi.c
$ ar -q libcbffi.a libcbffi.o
$ LIBRARY_PATH=. ponyc cb
$ ./cb1
  From Pony:hello world

Things to look out for

While writing Pony code I came across a couple of things to be aware of. Each actor has its own garbage collector but it runs only between behaviour calls. If a behaviour runs for a long time, never calling another actor behaviour, then it can be a while before garbage is collected. An example of where this can happen is a simple Main actor where everything is done in the default constructor and it never calls another actor. Benchmarks can be an example here. No GC will occur and you can get an OOM (Out of Memory) situation.

Another is that there is no backpressure handling for behaviour calls on an actor. The message queues are unbounded so if a producer sends messages to an actor at a faster rate than it processes them then it will eventually OOM. This can occur if you have the message sender tied to an external process. An example is a TCP listener that uses sockets and translates the data to a message to an actor. If the external users of the TCP interface (of a webserver, for example) are sending data faster than the actor can handle the messages then OOM will occur. Slides from the Pony developers indicate that backpressure is on their radar to look at.

As usual with a new programming language there is a lack of libraries and library documentation. Expect to look through the Pony source code to find examples of how to do things. The tutorial is great though - even though parts are incomplete - and is on github.

There is a --docs command line argument that can be used to parse docstrings in Pony libraries and produce documentation in markdown format. For example:

$ cd packages
$ ponyc --docs collections
$ ls collections-docs/

Conclusion

This has only been a quick overview of some features of Pony. There's more to it. Some places to get more Pony information:

Pony website
Tutorial
/r/ponylang
#ponylang on irc.freenode.net
Mailing List
Online Sandbox to try Pony in a browser

Bluish Coder