2015-11-04
A quick look at the Pony Programming Language
Pony is a new programming language described on their site as "an open-source, object-oriented, actor-model, capabilities-secure, high performance programming language."
It has some interesting features and is different enough from existing popular programming languages to make it a nice diversion to experiment with. Some features include:
- lightweight actor based concurrency with M:N threading, mapping multiple language level threads to operating system threads.
- strong static typing with generics
- data-race free. The type system ensures at compile time that a concurrent program can never have data races.
- deadlock free. There are no locking mechanisms exposed to the user so there are no deadlocks.
- capabilities exposed to the type system to allow compile time enforcing of such things as objects that have no other references to them, immutable values, reference values, etc.
- lightweight C FFI
This post is an outline of my initial experiments with the language, including pitfalls to be aware of.
Installing
Pony can be installed from git and run from the build directory:
$ git clone https://github.com/CausalityLtd/ponyc
$ cd ponyc
$ make config=release
$ export PATH=`pwd`/build/release:$PATH
$ ponyc --help
Run tests with:
$ make config=release test
Some of the Pony standard packages dynamically load shared libraries. If they're not installed this will be reflected in build failures during the tests. The required libraries on a Linux based machine are openssl and pcre2-8. To build Pony itself llvm version 3.6 needs to be installed. There is an llvm37 branch on github that works on Linux but is awaiting some llvm37 fixes before it is merged into master.
Pony can be installed in a default location, or using prefix to install it somewhere else:
$ make config=release prefix=/home/user/pony install
One catch is that running ponyc requires it to find the Pony runtime library libponyrt.a for linking purposes. This might not be found if installed somewhere that it doesn't expect. This can be resolved by setting the environment variable LIBRARY_PATH to the directory where libponyrt.a resides. I had to do this for the Nix Pony package.
Compiling Pony programs
A basic "Hello World" application looks like:
actor Main
  new create(env: Env) =>
    env.out.print("hello world")
Place this in a main.pony file in a directory and compile:
$ mkdir hello
$ cat >hello/main.pony
actor Main
  new create(env: Env) =>
    env.out.print("hello world")
$ ponyc hello
$ ./hello1
hello world
ponyc requires a directory as an argument and it compiles the *.pony files in that directory. It generates an executable based on the directory name, with a number appended if needed to prevent a name clash with the directory. The program starts executing by creating a Main actor and passing it an Env object allowing access to command line arguments, standard input/output, etc. The Main actor can then create other actors or do whatever is required for program execution.
Actors
Actors are the method of concurrency in Pony. An actor is like a normal object in that it can have state and methods. It can also have behaviours. A behaviour is a method that when called is executed asynchronously. It returns immediately and is queued to be run on an actor local queue. When the actor has nothing to do (not running an existing method or behaviour) it will pop the oldest queued behaviour and run that. An actor can only run one behaviour at a time - this means no locking is needed within the behaviour since access to actor local state is serialized. For this reason it's useful to think of an actor as a unit of sequential execution. Parallelism is achieved by utilising multiple actors.
To compare the difference between a standard object and an actor I'll use the following program:
class Logger
  let _env: Env
  let _prefix: String

  new create(env: Env, prefix: String) =>
    _env = env
    _prefix = prefix

  fun log(msg: String, delay: U32) =>
    @sleep[I32](delay)
    _env.out.print(_prefix + ": " + msg)

actor Main
  new create(env: Env) =>
    let l1 = Logger.create(env, "logger 1")
    let l2 = Logger.create(env, "logger 2")
    l1.log("one", 3)
    l2.log("two", 1)
    l1.log("three", 3)
    l2.log("four", 1)
This creates a class called Logger that on construction takes an Env to use to output log messages and a string prefix to prepend to a message. It has a log method that will log a message to standard output after sleeping for a number of seconds given by delay. The unusual syntax for the sleep call is the syntax for calling the sleep C function using the Pony FFI. I'll cover this later.
The Main actor creates two loggers and logs twice to each one with a different delay. As a standard object using class is not asynchronous, running this will result in a delay of three seconds, outputting the first log line, a delay of one second, outputting the second line, a delay of three seconds, outputting the third line and finally a delay of one second, outputting the final line. Everything happens on the single Pony thread that runs the Main actor's create constructor. Pony runs this on a single operating system thread. Total elapsed time is the sum of the delays: eight seconds.
Compile and run with:
$ mkdir clogger
$ cat >clogger/main.pony
..contents of program above...
$ ponyc clogger
$ time ./clogger1
logger 1: one
logger 2: two
logger 1: three
logger 2: four
real 0m8.093s
user 0m0.116s
sys 0m0.132s
Changing the Logger class to an actor and making the log method a behaviour will result in the logging happening asynchronously. The changes are:
actor Logger
  let _env: Env
  let _prefix: String

  new create(env: Env, prefix: String) =>
    _env = env
    _prefix = prefix

  be log(msg: String, delay: U32) =>
    @sleep[I32](delay)
    _env.out.print(_prefix + ": " + msg)
Nothing else in the program changes. I've just changed class to actor and fun to be. Now when the Main actor calls log it will add the behaviour call to the actor's queue and immediately return. Each Logger instance is running in its own Pony thread and will be mapped to an operating system thread if possible. On a multiple core machine this should mean each actor's behaviour is running on a different core.
Compiling and running gives:
$ mkdir alogger
$ cat >alogger/main.pony
..contents of program above...
$ ponyc alogger
$ time ./alogger1
logger 2: two
logger 2: four
logger 1: one
logger 1: three
real 0m6.113s
user 0m0.164s
sys 0m0.084s
Notice that the total elapsed time is now six seconds. This is the sum of the delays in the calls to log in the first Logger instance. The second instance is running on another OS thread so executes in parallel. Each log call immediately returns and is queued to run. The delays on the second Logger instance are shorter so they appear first. The two log calls on the second Logger run sequentially as behaviours on a single actor instance are executed in order. The log calls for the first Logger instance run after their delay, again sequentially for the calls within that actor.
Capabilities
Pony uses reference capabilities to allow safe concurrent access to objects. In practice this means annotating types with a tag to indicate how 'sharable' an object is. For data to be passed to another actor it must be safe for that actor to use without data races. Reference capabilities allow enforcing this at compile time. There are defaults for most types so you don't need to annotate everything. Notice that none of the examples I've done so far use any capability annotations. I'll go through a few examples here but won't be exhaustive. The Pony tutorial has coverage of the combinations and defaults.
val and ref
A val capability is for value types. They are immutable and therefore anyone can read from them at any time. val objects can be passed to actors and used concurrently. Primitives like U32 are val by default. This is why none of the primitive arguments to behaviours in the previous examples needed annotation.
A ref capability is for references to mutable data structures. They can be read from and written to, and can have multiple aliases. You can't share these with other actors as that would potentially cause data races. Classes are ref by default.
This is an example of passing a val to another actor:
actor Doer
  be do1(n: U32) =>
    None

actor Main
  new create(env: Env) =>
    let a = Doer.create()
    let n: U32 = 5
    a.do1(n)
As U32 is a primitive it defaults to a val reference capability. It is immutable and can be read by anyone at any time so this compiles without problem. This example fails to compile however:
class Foo
  let n: U32 = 5

actor Doer
  be do1(n: Foo) =>
    None

actor Main
  new create(env: Env) =>
    let a = Doer.create()
    let b = Foo.create()
    a.do1(b)
The error is:
main.pony:5:13: this parameter must be sendable (iso, val or tag)
be do1(n: Foo) =>
^
class defaults to the ref capability which can be read, written and aliased. It can't be used to send to another actor as there's no guarantee that it won't be modified by any other object holding a reference to it. The iso and tag capabilities mentioned in the error message are other capability types.
iso is for single references to data structures that can be read and written to. The type system guarantees that only one reference exists to the object. It is short for 'isolated'.
tag is for identification only. Objects of capability tag cannot be read from or written to. They can only be used for object identity or, if they are an actor, calling behaviours on them. Actors default to tag capabilities. Calling behaviours is safe as behaviour running is serialized for the actor instance and they don't return data.
To get the previous example to work we can force the Foo object to be of type val if it can be immutable:
class Foo
  let n: U32 = 5

actor Doer
  be do1(n: Foo val) =>
    None

actor Main
  new create(env: Env) =>
    let a = Doer.create()
    let b: Foo val = Foo.create()
    a.do1(b)
ref and iso
Let's modify the example so we can change the value of the Foo object to demonstrate moving a mutable reference from one actor to another:
class Foo
  var n: U32 = 5

  fun ref set(m: U32) =>
    n = m

  fun print(env: Env) =>
    env.out.print(n.string())

actor Doer
  be do1(env:Env, n: Foo iso) =>
    n.print(env)

actor Main
  new create(env: Env) =>
    let a = Doer.create()
    let b = Foo.create()
    a.do1(env, b)
In this example the do1 behaviour now requires an iso reference capability. As mentioned previously, iso means only one reference to the object exists, therefore it is safe to read and write. But where we create the instance of Foo we have a reference to it in the variable b. Passing it as an argument to do1 effectively aliases it. The compile time error is:
main.pony:18:16: argument not a subtype of parameter
a.do1(env, b)
^
main.pony:11:19: parameter type: Foo iso
be do1(env:Env, n: Foo iso) =>
main.pony:18:16: argument type: Foo iso!
a.do1(env, b)
^
This error states that do1 requires a Foo iso parameter whereas it is being passed a Foo iso! argument. The ! at the end means that it is an alias to another variable. Even though class objects are ref by default, Pony has inferred the capability for b as iso as we didn't declare a type for b and we are passing it to a function that wants an iso. However, as it has an alias it can't be used as an iso, so it's an error.
One way of avoiding the aliasing is to pass the result of the create call directly:
actor Main
  new create(env: Env) =>
    let a = Doer.create()
    a.do1(env, Foo.create())
There is no alias here so it compiles fine.
If we do want to have an initial reference to it, say to set a value first, we can tell the type system that we are consuming the existing reference and will no longer use it. This is what the consume keyword is for:
actor Main
  new create(env: Env) =>
    let a = Doer.create()
    let b = Foo.create()
    b.set(42)
    a.do1(env, consume b)
    // b.set(0)
This now compiles. Uncommenting the use of b after the do1 call will be a compile error as we've consumed b and it no longer exists. In this case the error would be:
main.pony:20:5: can't use a consumed local in an expression
b.set(0)
^
main.pony:20:6: invalid left hand side
b.set(0)
consume is more often used for passing iso objects around. To pass it to another object you need to consume the existing reference to it. This becomes problematic if you are consuming a field of an object. Modifying the example so that the Foo is stored as a field of Main shows the problem:
actor Main
  var b: Foo iso = Foo.create()

  new create(env: Env) =>
    let a = Doer.create()
    b.set(42)
    a.do1(env, consume b)
The error is:
main.pony:20:16: consume must take 'this', a local, or a parameter
a.do1(env, consume b)
^
b can't be consumed as it's a field of Main. It can't be left consumed - it must have a valid Foo iso object stored in it. In Pony assignment returns the old value of the variable being assigned to. This allows assigning a new value to the field and returning the old value in one operation, which avoids leaving the field in an invalid state:
  new create(env: Env) =>
    let a = Doer.create()
    b.set(42)
    a.do1(env, b = Foo.create())
b gets a new value of a new instance of Foo and do1 gets passed the old value.
There's a lot more to capabilities and the capabilities section of the tutorial covers a lot. Although there are sane defaults it feels like 'capability tutorials' will be the Pony equivalent of 'Monad tutorials' in other languages for a while. When I was first learning ATS I spent a lot of time floundering with function annotations to get things to compile, trying random changes, until I learnt how it worked. I'm probably at that stage with capabilities at the moment and I hope it becomes clearer as I write more Pony programs.
Pattern Matching
Pony has many of the concepts of most modern functional programming languages. Matching on values is allowed:
let x: U32 = 2
match x
| 1 => "one"
| 2 => "two"
else
"3"
end
Union types with capturing:
type Data is (U32 | String | None)
....
match x
| None => "None"
| 1 => "one"
| let u: U32 => "A number that is not one: " + u.string()
| let s: String => "A string: " + s
end
Enumerations are a bit verbose in that you have to use primitive to define each variant of the enumeration first:
primitive Red
primitive Blue
primitive Green
type Colour is (Red | Blue | Green)
...
let x: Colour = Red
match x
| Red => "Red"
| Blue => "Blue"
| Green => "Green"
end
C FFI
Pony has an easy to use C FFI. I showed an example of this previously:
@sleep[I32](delay)
The @ signifies that this is a C FFI function call. The type in the brackets is the return type of the C function call. The types of the arguments must match what the actual C function expects. Errors here will crash the program. Pony allows specifying the type of an FFI function in advance so argument types are checked. For sleep it would be:
use @sleep[I32](n: U32)
...
@sleep(10)
Note that it's no longer necessary to specify the return type at the call point as it's already been defined in the declaration.
If the C function is part of a library already linked into the Pony executable then there is no need for a use statement to define the library file to link against. sleep is part of libc so it isn't needed. In the cases where you need to link against a specific library then the use statement is used in this manner:
use "lib:foo"
The addressof keyword is used to pass pointers to C code. It can be used for passing out parameters of primitive types:
var n: U32 = 0
@dosomething[None](addressof n)
env.out.print("Result: " + n.string())
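The C side of this call isn't shown above. As a rough sketch, assuming a hypothetical dosomething that simply writes a result through the pointer it is given, it might look like the following. The function name comes from the snippet above; the body and the value 42 are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C counterpart of the Pony call above (not from any real
   library). Pony's U32 is a 32-bit unsigned integer, so `addressof n`
   is received here as a pointer to uint32_t. */
void dosomething(uint32_t* out) {
  if (out != NULL) {
    *out = 42; /* illustrative value written back to the caller's variable */
  }
}
```

With a function like this linked in, the Pony variable n would hold whatever the C code stored through the pointer once the call returns.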
Callbacks
The FFI allows passing Pony functions to C for the C code to later call back. The syntax for this looks like:
let foo = Foo.create()
@callmeback[None](addressof foo.method, foo)
Calling C code example
A working example for the following C function in a cbffi.c file:
void do_callback(void (*func)(void* this, char* s), void* this) {
  func(this, "hello world");
}
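Before wiring this into Pony it can be useful to exercise the callback contract from plain C. The following is a sketch only: do_callback is restated (with the parameter renamed to self) so the block builds on its own, and the recorder struct and record function are hypothetical stand-ins for the Pony object and its method.

```c
#include <stdio.h>
#include <string.h>

/* Same function as in cbffi.c, restated so this block is self-contained.
   The parameter is called `self` here purely as a naming choice. */
void do_callback(void (*func)(void* self, char* s), void* self) {
  func(self, "hello world");
}

/* Hypothetical receiver playing the role of the Pony object. */
struct recorder {
  const char* prefix;
  char line[64];
};

/* Callback with the signature do_callback expects: the receiver comes
   first, followed by the C string supplied by the library. */
void record(void* self, char* s) {
  struct recorder* r = self;
  snprintf(r->line, sizeof r->line, "%s:%s", r->prefix, s);
}
```

Calling do_callback(record, &r) on a recorder r fills r.line with the prefix followed by the message. The receiver pointer travels alongside the function pointer in the same way the Pony object is passed alongside its method.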
The Pony code to use this is:
use "lib:cbffi"
class Foo
let prefix: String
let env: Env
new create(e: Env, p: String) =>
prefix = p
env = e
fun display(msg: Pointer[U8]) =>
env.out.print(prefix + ":" + String.copy_cstring(msg))
actor Main
new create(env: Env) =>
let foo = Foo.create(env, "From Pony")
@do_callback[None](addressof foo.display, foo)
Note that the display function takes a Pointer[U8] as an argument. Pointer[U8] is a generic type with U8 being the parameter. In this case it is the C string that the C function passes. Pony String types are objects with fields, so C can't pass one directly. The String type has a couple of constructor functions that take Pointer[U8] as input and return a Pony String - the one used here, copy_cstring, makes a copy of the C string passed in.
Compile with:
$ mkdir cb
$ cat >cb/main.pony
...Pony code...
$ cat >cb/cbffi.c
...C code...
$ gcc -fPIC -shared -o libcbffi.so cb/cbffi.c
$ LIBRARY_PATH=. ponyc cb
$ LD_LIBRARY_PATH=. ./cb1
From Pony:hello world
Here LIBRARY_PATH is set to find the shared library during compiling and linking. To run the generated executable LD_LIBRARY_PATH is used to find the shared library at runtime.
It's also possible to link against static C libraries:
$ rm libcbffi.so
$ gcc -c -o libcbffi.o cb/cbffi.c
$ ar -q libcbffi.a libcbffi.o
$ LIBRARY_PATH=. ponyc cb
$ ./cb1
From Pony:hello world
Things to look out for
While writing Pony code I came across a couple of things to be aware of. Each actor has its own garbage collector but it runs only between behaviour calls. If a behaviour runs for a long time, never calling another actor behaviour, then it can be a while before garbage is collected. An example of where this can happen is a simple Main actor where everything is done in the default constructor and never calls another actor. Benchmarks can be an example here. No GC will occur and you can get an OOM (Out of Memory) situation.
Another is that there is no backpressure handling for behaviour calls on an actor. The message queues are unbounded so if a producer sends messages to an actor at a faster rate than it processes them then it will eventually OOM. This can occur if you have the message sender tied to an external process. For example a TCP listener that uses sockets and translates the data to a message to an actor. If the external users of the TCP interface (a webserver for example) are sending data faster than the actor can handle the messages then OOM will occur. Slides from the Pony developers indicate that backpressure is on their radar to look at.
As usual with a new programming language there is a lack of libraries and library documentation. Expect to look through the Pony source code to find examples of how to do things. The tutorial is great though - even though parts are incomplete - and is on github.
There is a --docs command line argument that can be used to parse docstrings in Pony libraries and produce documentation in markdown format. For example:
$ cd packages
$ ponyc --docs collections
$ ls collections-docs/
Conclusion
This has only been a quick overview of some features of Pony. There's more to it. Some places to get more Pony information:
- Pony website
- Tutorial
- /r/ponylang
- #ponylang on irc.freenode.net
- Mailing List
- Online Sandbox to try Pony in a browser