Bluish Coder

Programming Languages, Martial Arts and Computers. The Weblog of Chris Double.


2017-07-31

Reference Capabilities, Consume and Recover in Pony

I've written about reference capabilities in the Pony programming language spread across some of my other posts but haven't written about them directly. This post is my attempt to provide an intuitive understanding of reference capabilities and when to use consume and recover. Hopefully this reduces the confusion when faced with reference capability compilation errors and the need to memorize capability tables.

Reference capabilities are used to control aliasing. Controlling aliasing is important when sharing data to avoid data races in the presence of multiple threads. An example of aliasing is:

let a: Object = create_an_object()
let b: Object = a

In that snippet, 'b' is an alias to the object referenced by 'a'. If I pass 'b' to a parallel thread then the same object can be mutated and viewed by two different threads at the same time resulting in data races.

Reference capabilities allow annotating types to say how they can be aliased. Only objects with a particular set of reference capabilities can be shared across actors.

tag - opaque reference

A tag reference capability is an opaque reference to an object. You cannot read or write fields of the object referenced through a tag alias. You can store tag objects, compare object identity and share them with other actors. The sharing is safe since no reading or writing of state is allowed other than object identity.

let a: Object tag = create_a_tag()
let b: Object tag = a
if a is b then ...we're the same object... end

There is also the digestof operator that returns a unique unsigned 64 bit integer value of an object. This is safe to use on tag:

let a: Object tag = create_a_tag()
env.out.print("Id is: " + (digestof a).string())

The tag reference capability is most often used for actors and passing references to actors around. Actor behaviours (asynchronous method calls) can be made via tag references.
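As a rough sketch of that last point, here is a small complete program. The Counter actor and its behaviours are names I've made up for illustration. Main only ever holds a Counter tag, yet it can still call behaviours on it:

```pony
// Hypothetical actor used only for this illustration.
actor Counter
  var _count: U32 = 0

  // Behaviours run asynchronously inside the Counter actor.
  be increment() =>
    _count = _count + 1

  be report(out: OutStream) =>
    out.print("count: " + _count.string())

actor Main
  new create(env: Env) =>
    // Actor constructors return a tag reference.
    let c: Counter tag = Counter
    // No fields of c can be read or written here, but
    // behaviours can be called through the tag.
    c.increment()
    c.increment()
    c.report(env.out)
```

Messages sent from one actor to another are processed in the order they were sent, so this should print "count: 2".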

val - immutable and sharable

A val reference capability on a variable means that the object is immutable. It cannot be written via that variable. Only read-only fields and methods can be used. Aliases can be shared with other actors because the object cannot be changed - there is no issue of data races when accessed from multiple threads. There can be any number of aliases to the object but they all must be val.

let a: Object val = create_an_object()
let b: Object val = a
let c: Object val = a
call_some_function(b)
send_to_an_actor(c)

All the above are valid uses of val. Multiple aliases can exist within the same actor or shared with other actors.
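Here is a minimal runnable sketch of sharing a val, assuming a made-up Printer actor. The same immutable array is sent to two actors while Main keeps its own alias:

```pony
// Hypothetical actor used only for this illustration.
actor Printer
  let _out: OutStream
  let _name: String

  new create(out: OutStream, name: String) =>
    _out = out
    _name = name

  // A val parameter is sendable, so it can appear in a behaviour.
  be show(data: Array[U32] val) =>
    _out.print(_name + " sees " + data.size().string() + " items")

actor Main
  new create(env: Env) =>
    // Build the array inside recover so it can be lifted to val.
    let data: Array[U32] val = recover val [as U32: 1; 2; 3] end
    // Both actors receive an alias to the same immutable array.
    Printer(env.out, "first").show(data)
    Printer(env.out, "second").show(data)
    // Main still holds a readable alias too.
    env.out.print("main sees " + data.size().string() + " items")
```

The three lines of output can appear in any order since the actors run independently.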

ref - readable, writable but not sharable

The ref reference capability means the object is readable and writable but only within a single actor. There can exist multiple aliases to it within an actor but it cannot be shared with other actors. If it were sharable with other actors then this would allow data races as multiple threads of control read or write to it in a non-deterministic manner.

let a: Object ref = create_a_ref_object()
let b: Object ref = a
call_some_function(b)

The above are valid uses of ref if they are occurring within a single actor. The following are invalid uses - they will result in a compile error:

let a: Object ref = create_a_ref_object()
send_to_an_actor(a)
let b: Object val = a

The send_to_an_actor call would result in an alias of a being accessible from another thread. This would cause data races so is disallowed and results in a compilation error. The assignment to an Object val is also a compilation error. The reasoning for this is access via b would assume that the object is immutable but it could be changed through the underlying alias a. If b were passed to another actor then changes made via a will cause data races.

iso - readable, writable uniquely

The iso reference capability means the object is readable and writable but only within a single actor - much like ref. But unlike ref it cannot be aliased. There can only be one variable holding a reference to an iso object at a time. It is said to be 'unique' because it can only be written or read via that single unique reference.

let a: Object iso = create_an_iso_object()
let b: Object iso = a
call_some_function(a)

The first line above creates an iso object. The other two lines are compilation errors. The assignment to b attempts to alias a. This would enable reading and writing via a and b which breaks the uniqueness rule.

The third line calls a function passing a. This is an implicit alias of a in that the parameter to call_some_function has aliased it. The object would be readable and writable via both a and the parameter in call_some_function.

When it comes to fields of objects things get a bit more complicated. Reference capabilities are 'deep'. This means that the capability of an enclosing object affects the capability of the fields as seen by an external user of the object. Here's an example that won't work:

class Foo
  var bar: Object ref = ...

let f: Foo iso = create_a_foo()
let b: Object ref = create_an_object()
f.bar = b

If this were to compile we would have a ref alias alive in b and another alias to the same object alive in the bar field of f. We could then pass our f iso object to another actor and that actor would have a data race when trying to use bar since the original actor also has an alias to it via b.

The uniqueness restriction would seem to make iso not very useful. What makes it useful is the ability to mark aliases as no longer used via the consume keyword.

consume - I don't want this alias

The consume keyword tells Pony that an alias should be destroyed. Not the object itself but the variable holding a reference to it. By removing an alias we can pass iso objects around.

let a: Object iso = create_an_iso_object()
let b: Object iso = consume a
call_some_function(consume b)

This snippet creates an iso object referenced in variable a. The consume in the second line tells Pony that a should no longer alias that object. It's floating around with nothing pointing to it now. Pony refers to this state as being 'ephemeral'. At this point the variable a doesn't exist and it is a compile error to use it further. The object has no aliases and can now be assigned to another variable, in this case b. This meets the requirements of iso because there is still only one reference to the object, via b.

The function call works in the same manner. The consume b makes the object ephemeral so it can then be assigned to the parameter for the function and still meet the uniqueness requirements of iso.

iso objects can be sent to other actors. This is safe because there is only a single alias. Once it has been sent to another actor, the original actor can no longer read or write the object because the alias it had was consumed:

let a: Object iso = create_an_iso_object()
send_to_an_actor(consume a)
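To make that concrete, here is a small complete sketch. Message and Receiver are hypothetical names I've picked for the example. Main mutates the iso, then gives it up with consume when sending it:

```pony
// Hypothetical class and actor used only for this illustration.
class Message
  var text: String = "hello"

actor Receiver
  let _out: OutStream

  new create(out: OutStream) =>
    _out = out

  be receive(m: Message iso) =>
    // The receiver now holds the only alias so it can mutate freely.
    m.text = "received: " + m.text
    _out.print(m.text)

actor Main
  new create(env: Env) =>
    let r = Receiver(env.out)
    let m: Message iso = recover Message end
    m.text = "hi"
    // After the consume, m no longer exists in Main.
    r.receive(consume m)
```

This should print "received: hi". Adding any use of m after the last line is a compile error.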

Converting capabilities

Given an iso reference that is consumed, can that ephemeral object be assigned to reference capabilities other than iso? The answer to that is yes.

Intuitively this makes sense. If you have no aliases to an object then when you alias that object you can make it whatever capability you want - it is like having created a new object, nothing else references it until you assign it. From that point on what you can do with it is restricted by the reference capability of the variable you assigned it to.

let a: Object iso = create_an_iso()
let b: Object val = consume a

let c: Object iso = create_an_iso()
let d: Object ref = consume c

The above are examples of valid conversions. You can have an iso, make changes to the object, then consume the alias to assign to some other reference capability. Once you've done that you are restricted by that new alias:

let c: Object iso = create_an_iso()
let d: Object ref = consume c
send_to_an_actor(d)

That snippet is an error as ref cannot be sent to another actor as explained earlier. This is also invalid:

let c: Object iso = create_an_iso()
let d: Object ref = consume c
send_to_an_actor(c)

Here we are trying to use c after it is consumed. The c alias no longer exists so it is a compile error.

What if you want to go the other way and convert a val to an iso:

let a: Object val = create_a_val()
let b: Object iso = consume a

This is an error. Consuming the a alias does not allow assigning to another reference capability. Because val allows multiple aliases to exist the Pony compiler doesn't know if a is the only alias to the object. There could be other aliases elsewhere in the program. iso requires uniqueness and the compiler can't guarantee it because of this. The same reasoning is why the following is an error:

let a: Object val = create_a_val()
let b: Object ref = consume a

Intuitively we can reason why this fails. ref allows reading and writing within an actor. val requires immutability and can have multiple aliases. Even though we consume a there may be other aliases around, like the iso example before. Writing to the object via the b alias would break the immutability guarantee that any other val aliases to the object rely on.

Given this, how do you do this type of conversion? This is what the recover expression is used for.

recover - restricted alias conversion

A recover expression provides a block scope where the variables that can enter the scope are restricted based on their reference capability. The restriction is that only objects that you could send to an actor are allowed to enter the scope of the recover expression. That is iso, val and tag.

Within the recover expression you can create objects and return them from the expression as a different reference capability than what you created them as. This is safe because the compiler knows what entered the block, knows what was created within the block, and can track the aliases such that it knows it's safe to perform a particular conversion.

let a: Object val = recover val
                      let b: Object ref = create_a_ref()
                      ...do something...
                      b
                    end

In this snippet we create a ref object within the recover block. This can be returned as a val because the compiler knows that all aliases to that ref object exist within the recover block. When the block scope exits those aliases don't exist - there are no more aliases to the object so it can be returned as the reference capability of the recover block.
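A runnable version of the same idea, using the standard library String class in place of the placeholder Object, might look like this:

```pony
actor Main
  new create(env: Env) =>
    let greeting: String val = recover val
      // s is a mutable ref that only exists inside the recover block.
      let s: String ref = String
      s.append("Hello")
      s.append(", world")
      // Returning s lifts it to the capability of the recover block.
      s
    end
    // greeting is now immutable and could be sent to other actors.
    env.out.print(greeting)
```

This prints "Hello, world". The mutable alias s is gone once the block ends so only the val alias remains.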

How does the compiler know that the b object wasn't stored elsewhere within the recover block? There are no global variables in Pony so it can't be stored globally. It could be passed to another object but the only objects accessible inside the block are the restricted ones mentioned before (iso, val and tag). Here's an attempt to store it that fails:

var x: Object ref = create_a_ref()
let a: Object val = recover val
                      let b: Object ref = create_a_ref()
                      x = b
                      b
                    end 

This snippet has a ref object created in the enclosing lexical scope of the recover expression. Inside the recover an attempt is made to assign the object b to that variable x. Intuitively this should not work - allowing it would mean that we have a readable and writeable alias to the object held in x, and an immutable alias in a allowing data races. The compiler prevents this by not allowing a ref object from the enclosing scope to enter a recover expression.

Can we go the other way and convert a val to a ref using recover? Unfortunately the answer here is no.

let a: Object ref = recover ref
                      let b: Object val = create_a_val()
                      ...do something...
                      b
                    end

This results in an error. The reason is a val can be stored in another val variable in the enclosing scope because val objects are safely shareable. This would make it unsafe to return a writeable alias to the val if it is stored as an immutable alias elsewhere. This code snippet shows how it could be aliased in this way:

var x: Object val = create_a_val()
let a: Object val = recover val
                      let b: Object val = create_a_val()
                      x = b
                      b
                    end

We are able to assign b to a variable in the enclosing scope as the x variable is a val which is one of the valid reference capabilities that can be accessed from within the recover block. If we were able to recover to a ref then we'd have a writeable and an immutable alias alive at the same time so that particular conversion path is an error.

A common use for recover is to create objects with a reference capability different to that defined by the constructor of the object:

class Foo
  new ref create() => ...

let a: Foo val = recover val Foo end
let b: Foo iso = recover iso Foo end

The reference capability of the recover expression can be left out and then it is inferred by the capability of the variable being assigned to:

let a: Foo val = recover Foo end
let b: Foo iso = recover Foo end

Two more reference capabilities to go. They are box and trn.

box - allows use of val or ref

The box reference capability provides the ability to write code that works for val or ref objects. A box alias only allows readonly operations on the object but can be used on either val or ref:

let a: Object ref = create_a_ref()
let b: Object val = create_a_val()
let c: Object box = a
let d: Object box = b

This is particularly useful when writing methods on a class that should work for a receiver type of val and ref.

class Bar
  var count: U32 = 0

  fun val display(out:OutStream) =>
    out.print(count.string())

actor Main
  new create(env:Env) =>
    let b: Bar val = recover Bar end
    b.display(env.out)

This example creates a val object and calls a method display that expects to be called on a val object (the "fun val" syntax). The this from within the display method is of reference capability val. This compiles and works. The following does not:

let b: Bar ref = recover Bar end
b.display(env.out)

Here the object is a ref but display expects it to be val. We can change display to be ref and it would work:

fun ref display(out:OutStream) =>
  out.print(count.string())

But now we can't call it with a val object as in our first example. This is where box comes in. It allows a ref or a val object to be assigned to it and it only allows read only access. This is safe for val as that is immutable and it is safe for ref as an immutable view to the ref:

fun box display(out:OutStream) =>
  out.print(count.string())

Methods are box by default so can be written as:

fun display(out:OutStream) =>
  out.print(count.string())

As an aside, the default of box is the cause for a common "new to Pony" error message where an attempt to mutate a field in an object fails with an "expected box got ref" error:

fun increment() => count = count + 1

This needs to be the following as the implicit box makes the this immutable within the method:

fun ref increment() => count = count + 1

trn - writeable uniquely, consumable to immutable

A trn reference capability is writeable but can be consumed to an immutable reference capability, val. This is useful for cases where you want to create an object, perform mutable operations on it and then make it immutable to send to an actor.

let a: Array[U32] trn = recover Array[U32] end
a.push(1)
a.push(2)
let b: Array[U32] val = consume a
send_to_actor(b)

box and ref methods can be called on trn objects:

class Bar
  var count: U32 = 0

  fun box display(out:OutStream) =>
    out.print(count.string())

  fun ref increment() => count = count + 1

actor Main
  new create(env:Env) =>
    let a: Bar trn = recover Bar end
    a.increment()
    a.display(env.out)

This provides an alternative to the "How do I convert a ref to a val?" question. Instead of starting with a ref inside a recover expression you can use trn and consume to a val later.

You can use iso in place of trn in these examples. Where trn is useful is passing it to box methods to perform readonly operations on it. This is difficult with iso as you have to consume the alias every time you pass it around, and the methods you pass it to have to return it again if you want to perform further operations on it. With trn you can pass it directly.

actor Main
  let out: OutStream

  fun display(b: Bar box) =>
    b.display(out)

  new create(env:Env) =>
    out = env.out

    let a: Bar trn = recover Bar end
    display(a)
    let b : Bar val = consume a
    send_to_actor(b)

The equivalent with iso is more verbose and requires knowledge of ephemeral types (the hat, ^, symbol):

actor Main
  let out: OutStream

  fun display(b: Bar iso): Bar iso^ =>
    b.display(out)
    consume b

  new create(env:Env) =>
    out = env.out

    let a: Bar iso = recover Bar end
    let b: Bar iso = display(consume a)
    let c: Bar val = consume b
    send_to_actor(c)

Capability Subtyping

I've tried to use a lot of examples to help gain an intuitive understanding of the capability rules. The Pony Tutorial has a Capability Subtyping page that gives the specific rules. Although they seem technical, the rules there encode our intuitive understanding. This section is a bit more complex and isn't necessary for basic Pony programming if you have a reasonable intuitive grasp of the rules. It is however useful for working out tricky capability errors and usage.

The way to read those rules is that "<:" means "is a subtype of" or "can be substituted for". So "ref <: box" means that a ref object can be assigned to a box variable:

let a: Object ref = create_a_ref()
let b: Object box = a

The effects are transitive. So if "iso^ <: iso" and "iso <: trn" and "trn <: ref" then "iso^ <: ref":

let a: Object iso = create_an_iso()
let b: Object ref = consume a

Notice we start with iso^ which is an ephemeral reference capability. We get ephemeral types with consume. So consuming the iso gives an iso^ which can be assigned to a ref due to the transitive subtyping path above.

Why couldn't we assign the iso directly without the consume? This is explained previously using intuition but following the rules on the subtyping page we see that "iso! <: tag". The ! is for an aliased reference capability. When we do "something = a" we are aliasing the iso and the type of that a in that expression is iso!. This can only be assigned to a tag according to that rule:

let a: Object iso = create_an_iso()
let b: Object tag = a

Notice there is no "iso! <: iso" which tells us that an alias to an iso cannot be assigned to an iso which basically states the rule that iso can't be aliased.

In a previous section I used an ephemeral type in a method return type:

fun display(b: Bar iso): Bar iso^ =>
    b.display(out)
    consume b

This was needed because the result of display was assigned to an iso:

let b: Bar iso = display(consume a)

If we used Bar iso as the return type then the compiler expects us to be aliasing the object being returned. This alias is of type iso!. The error message states that iso! is not a subtype of iso which is correct as there is no "iso! <: iso" rule. Thankfully the error message tells us that "this would be possible if the subcap were more ephemeral" which is the clue that we need the return type to be ephemeral.

Viewpoint Adaption

I briefly mentioned in a previous section that reference capabilities are 'deep' and this is important when accessing fields of objects. It is also important when writing generic classes and methods and using collections.

Viewpoint adaption is described in the combining capabilities part of the tutorial. I also have a post, Borrowing in Pony which works through some examples.

The thing to remember is that a containing object's reference capability affects the reference capabilities of fields and methods when accessed via a user of the container.

let a: Array[Bar ref] iso = recover Array[Bar ref] end
a.push(Bar)
try
  let b: Bar ref = a(0)?
end

Here is an iso array of Bar ref objects. The line that retrieves an element of the array fails to compile, stating that tag is not a subtype of ref. Where does the tag come from? Intuitively we can reason that we shouldn't be able to get a ref alias of an item in an iso array as that would give us two ref aliases of an item in an iso that could be shared across actors. This can give data races.

Viewpoint adaption encodes this in the type rules. We have a receiver, a, of type iso, attempting to call the apply method of the array. This method is declared as:

fun apply(i: USize): this->A ?

The this->A syntax is the viewpoint adaption. It states that the result of the call is the reference capability of A as seen by this. In our case, this is an iso and A is a ref. The viewpoint adaption for iso->ref is tag and that's where the tag in the error message comes from.

We could get an immutable alias to an item in the array if the array was trn:

let a: Array[Bar ref] trn = recover Array[Bar ref] end
a.push(Bar)
try
  let b: Bar box = a(0)?
end

The viewpoint adaption table shows that trn->ref gives box. To get a ref item we'd need the array to be ref:

let a: Array[Bar ref] ref = recover Array[Bar ref] end
a.push(Bar)
try
  let b: Bar ref = a(0)?
end
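The same rules apply to plain fields, which makes for a smaller runnable sketch. Holder is a hypothetical class I've made up here, and Bar is cut down to a single field. The comments give the viewpoint adaption used on each field access:

```pony
class Bar
  var count: U32 = 0

// Hypothetical container class used only for this illustration.
class Holder
  var bar: Bar ref = Bar

actor Main
  new create(env: Env) =>
    let h_ref: Holder ref = Holder
    let b1: Bar ref = h_ref.bar   // ref->ref gives ref
    b1.count = 1

    let h_box: Holder box = h_ref
    let b2: Bar box = h_box.bar   // box->ref gives box
    env.out.print(b2.count.string())

    let h_iso: Holder iso = recover Holder end
    let b3: Bar tag = h_iso.bar   // iso->ref gives tag
    // A tag only allows identity comparison.
    env.out.print((b3 is b1).string())
```

This should print "1" then "false". Changing the declared type of b3 to Bar ref gives the same "tag is not a subtype of ref" style of error as the array example.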

For more on viewpoint adaption I recommend my Borrowing in Pony and Bang, Hat and Arrow posts.

Miscellaneous things

In my examples I've used explicit reference capabilities in types to make it clearer what is happening. They can be left off in places to get reasonable defaults.

When declaring types the default capability is the capability defined in the class:

class ref Foo

let a: Foo = ...

class val Bar

let b: Bar = ...

The type of a is Foo ref and the type of b is Bar val due to the annotation in the class definition. By default classes are ref if they aren't annotated.

The type of an object returned from a constructor is based on the type defined in the constructor:

class Foo
  new ref create() => ...
  new val create_as_val() => ...
  new iso create_as_iso() => ...

let a: Foo val = Foo.create_as_val()

Here the Foo class is the default type ref, but there are constructors that explicitly return ref, val and iso objects. As shown previously you can use recover to change the capability returned by a constructor in some instances.

let a: Foo val = recover Foo.create() end // Ok - ref to val
let b: Foo ref = recover Foo.create_as_val() end // Not Ok - val to ref

As shown in the previous examples, recover cannot be used to convert a val to a ref.

Conclusion

Reference capabilities are complex for people new to Pony. An intuitive grasp can be gained without needing to memorize tables of conversions. This intuitive understanding requires thinking about how objects are being used and whether such use might cause data races or prevent sharing across actors. Based on that understanding you pick the capabilities to match what you want to use. For those times that errors don't match your understanding use the viewpoint adaption and capability subtyping tables to work out where things are going wrong. Over time your intuitive understanding improves for these additional edge cases.

Tags: pony 

2017-07-15

Runtime typing and eval in Alice ML

I originally wrote this post three years ago but I wasn't happy with how it read so never finished it. It's been sitting around in draft making me feel guilty for too long so I've cleaned it up and published it.

I like the style of prototyping and programming that dynamic languages like Self promote. When building systems inside the language environment it feels like you are living in a soup of objects. Programming becomes creating and connecting objects to perform tasks while debugging and refactoring as you go. The animated image below shows a use of the Self environment to instantiate and run a VNC object I wrote for example. Other examples can be seen in screencasts in my Self language posts.

Recently I've been using more statically typed languages to explore the world of type safety and how it can improve correctness of programs. My series of ATS posts go through a lot of the features that this approach provides. Most of these languages promote an edit/compile/link/run style of development and I miss the live development and debugging in the dynamic environments.

Some of the statically typed functional programming languages provide ways of doing dynamic types. Alice ML, developed in the mid-2000s, was an extension of Standard ML which provided support for concurrent, distributed and constraint programming. It was an attempt to see what a statically typed functional version of the Mozart/Oz language would be like. Development stopped in 2007 with the release of version 1.4 of Alice ML but the system remains very useable. I had been following Alice ML since the early days of its development and the concurrency and distribution features of it were the inspiration for some of my explorations with using futures and promises in JavaScript and concurrency in Factor.

As part of the support for distributed programming it required the ability to serialize and deserialize values along with their types. This form of dynamic behaviour would seem to be useful for developing a live coding environment. In fact Alice ML includes a GUI editor and REPL written in Alice ML that makes use of the library to evaluate, compile and produce components and executables.

I've imported the source of Alice ML into a github repository with minor bitrot changes and a couple of bug fixes so that it builds on recent Linux and Mac OS X systems. The code there is from the original Alice ML source with many changes and fixes made by Gareth Smith in his bitbucket repository. The original Alice developers and Gareth have kindly allowed me to host this on github.

Packages

Alice ML does dynamic runtime typing through packages. A package encapsulates a module and its signature. The package is an opaque type and accessing the module stored within is only possible via an unpack operation which requires giving the explicit signature of the module stored. If the signature doesn't match the type of the module stored then a runtime exception occurs. Packages can be passed around as first class values, stored in files, sent to other processes, etc.

Packages are created using the pack expression. This follows the form:

pack structure_expression : signature_expression

Where structure_expression and signature_expression are expressions that evaluate to Standard ML structures and signatures. The following would create a package for the value 42 stored in a module (as typed in the Alice ML REPL):

> val p = pack struct val x:int = 42 end : sig val x:int end;
val p : package = package{|...|}

In the pack expression a structure is created with an x member of type int and the value of that is 42. This structure is the value that is stored in the package. The type of this is given by the signature expression and when later unpacked only this signature can be used to get at the value. For simple examples like this the struct and sig syntax is quite verbose but Alice ML allows shortening this to just the contents of the structure and signature. The following is the equivalent shortened code:

> val p = pack (val x:int = 42) : (val x:int);
val p : package = package{|...|}

Getting the value back from a package is done using unpack. The general form of the unpack expression is:

unpack expression : signature_expression

If the signature_expression does not match the signature of the value stored in the package then an exception is raised. The type of the unpack expression is signature_expression so if it successfully unpacks then use of the resulting value is type safe. Unpacking our example above looks like:

> structure S = unpack p : sig val x:int end;
structure S : sig val x : int end

Or using the shorter syntax:

> structure S = unpack p : (val x:int);
structure S : sig val x : int end

The resulting module S can be used as any other SML module to access the fields within:

> print (Int.toString S.x);
42

Eval

To create an environment that allows evaluating code and manipulating results requires an eval facility. Alice ML provides this through the Compiler module. This module provides, amongst other functions, the following variants of eval:

val eval :     string -> package
val evalWith : env * string -> env * package

The first function, eval, takes a string of Alice ML code, evaluates it, and returns the result as a package. The second, evalWith, takes an additional parameter which is the environment in which the code is evaluated. It also returns the modified environment after evaluating the code. This allows keeping a persistent state of changes made by the evaluated code.

The result is returned as a package because the type of the evaluated code is unknown. It could be anything. If the caller of eval needs to manipulate or display the result in some manner it needs to unpack it with a known type that it expects it to contain and handle any exception that might occur if the type is incorrect at runtime. An example of doing this is:

> val x = Compiler.eval("1+2");
val x : package = package{|...|}
> structure X = unpack x : (val it:int);
structure X : sig val it : int end
> X.it;
val it : int = 3

In this case the result of our evaluation is an int so this is what's used in the signature for the unpack expression.

An example using evalWith to track the changes to the environment is:

> val x = Compiler.evalWith(Compiler.initialEnv,
                            "fun fac(n:int) = if n <= 1 then 1 else n * fac(n - 1)");
val x : Compiler.env * package = (_val, package{|...|})
> val y = Compiler.evalWith(#1 x, "fac(10)");
val y : Compiler.env * package = (_val, package{|...|})
> structure Y = unpack (#2 y) : (val it:int);
structure Y : sig val it : int end
> Y.it;
val it : int = 3628800

The function evalWith returns a tuple where the first element is the resulting environment after the evaluation and the second element is the package containing the result. For the second call to evalWith the environment resulting from the first call is passed to it so the function fac can be found.

Pretty Printing

One thing to note in the previous example is that the call to unpack required knowing the type of what we were unpacking. This is usually the case but when writing a REPL we need to print the result of evaluating what is entered at the top level - and this could be any type depending on what the user entered to evaluate.

There are some internal Alice ML modules that make it possible to do this. An example follows:

> import structure Reflect     from "x-alice:/lib/system/Reflect";
> import structure PPComponent from "x-alice:/lib/system/PPComponent";
> import structure PrettyPrint from "x-alice:/lib/utility/PrettyPrint";
> val a = Compiler.prepareWith (Compiler.initialEnv, "1+2");
val a : Compiler.env * (unit -> package) * t = (_val, _fn, _val)
> val b = (#2 a) ();
val b : package = package{|...|}
> val c = Reflect.reflectPackage b;
val c : Reflect.module * t = (_val, _lazy)
> val d = PPComponent.ppComp(#1 c, #2 c);
val d : doc = _val
> PrettyPrint.toString(d,40);
val it : string = "val it : int = 3"

The Compiler.prepareWith function does not evaluate the string passed to it but performs only part of the evaluation process. It returns a tuple containing the environment which will result from evaluation, a function that when called will perform the evaluation, and a value representing the type of the result of the evaluation.

In step (b) the evaluation function is called which returns the package containing the result. Reflect.reflectPackage returns a tuple describing the package. These are passed to PPComponent.ppComp to return a PrettyPrint document. The pretty printer is based on A prettier printer by Phil Wadler. PrettyPrint.toString converts this to a string which could then be displayed by a REPL.

Conclusion

As mentioned previously the Alice ML tools are written in Alice ML. The toplevel code uses the modules and functions outlined previously to implement the REPL and IDE. Unfortunately it's mostly undocumented but the source is available to show how it is implemented and used.

There's much more to Alice ML's run time use of types, including pickling, components, sandboxing, and distribution.

An interesting exercise would be to write a web based client to provide a "Try Alice ML" in a similar manner to other languages' online playgrounds, to allow trying Alice ML code snippets without needing to install it. I'd also like to explore how close to a Self-like environment could be done in an Alice ML system.

Tags: aliceml 

2017-05-16

Distributed Wikipedia Mirrors in Freenet

There was a recent post about uncensorable Wikipedia mirrors on IPFS. The IPFS project put a snapshot of the Turkish version of Wikipedia on IPFS. This is a great idea and something I've wanted to try on Freenet.

Freenet is an anonymous, secure, distributed datastore that I've written a few posts about. It wasn't too difficult to convert the IPFS process to something that worked on Freenet. For the Freenet keys linked in this post I'm using a proxy that retrieves data directly from Freenet. This uses the SCGIPublisher plugin on a local Freenet node. The list of usable whitelisted keys is at freenet.cd.pn. There is also a gateway available at d6.gnutella2.info. The keys can also be used directly from a Freenet node, which is likely to be more performant than going through my underpowered proxy. Keep in mind that the "distributed, can't be taken down" aspect of the sites on Freenet only applies when they are accessed directly through Freenet. It's quite likely my clearnet proxy won't be able to handle large amounts of traffic.

I started with the Pitkern/Norfuk Wikipedia Snapshot as that was relatively small. Once I got the scripts for that working I converted the Māori Wikipedia Snapshot. The latest test I did was the Simple English Wikipedia Snapshot. This was much bigger so I did the version without images first. Later I plan to try the version with images when I've resolved some issues with the current process.

The Freenet keys for these mirrors are:

  • USK@m79AuzYDr-PLZ9kVaRhrgza45joVCrQmU9Er7ikdeRI,1mtRcpsTNBiIHOtPRLiJKDb1Al4sJn4ulKcZC5qHrFQ,AQACAAE/simple-wikipedia/0/
  • USK@jYBa5KmwybC9mQ2QJEuuQhCx9VMr9bb3ul7w1TnyVwE,OMqNMLprCO6ostkdK6oIuL1CxaI3PFNpnHxDZClGCGU,AQACAAE/maori-wikipedia/5/
  • USK@HdWqD7afIfjYuqqE74kJDwhYa2eetoPL7cX4TRHtZwc,CeRayXsCZR6qYq5tDmG6r24LrEgaZT9L2iirqa9tIgc,AQACAAE/pitkern-wikipedia/2/

The keys are 'USK' keys. These keys can be updated and have an edition number at the end of them. This number will increase as newer versions of the mirrors are pushed out. The Freenet node will often find the latest edition it knows about, or the latest edition can be searched for using '-1' as the edition number.
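
As a small illustration of the '-1' edition trick, the sketch below rewrites the edition at the end of a USK key. The function name latest_edition_key is hypothetical (it is not part of Freenet or of the freenet-wikipedia scripts), and it assumes the key ends in /site-name/edition/ like the keys listed above.

```shell
# Hypothetical helper: replace the trailing edition number of a USK key
# with -1 so that the node searches for the latest edition.
# Assumes the key ends in "/<site-name>/<edition>/" as in the list above.
latest_edition_key() {
  printf '%s\n' "$1" | sed -E 's#/-?[0-9]+/?$#/-1/#'
}

# Prints the Pitkern key with "/2/" replaced by "/-1/".
latest_edition_key "USK@HdWqD7afIfjYuqqE74kJDwhYa2eetoPL7cX4TRHtZwc,CeRayXsCZR6qYq5tDmG6r24LrEgaZT9L2iirqa9tIgc,AQACAAE/pitkern-wikipedia/2/"
```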

The approach I took for the mirroring follows the approach IPFS took. I used the ZIM archives provided by Kiwix and a ZIM extractor written in Rust. The archive was extracted with:

$ extract_zim wikipedia_en_simple_all_nopic.zim

This places the content in an out directory. All HTML files are stored in a single directory, out/A. In the Simple English case that's over 170,000 files. This is too many files in a directory for Freenet to insert. I wrote a script in bash to split the directory so that files are stored in '000/filename.html' where '000' is the first three hex digits of a SHA256 hash of the base filename, computed with:

$ echo "filename.html"|sha256sum|awk '{ print $1 }'|cut -c "1,2,3"
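
The directory split can be sketched roughly as follows. This is a minimal illustration and not the actual convert.sh: the function names bucket_for and split_dir are made up for this example, and it only moves files, without rewriting any links.

```shell
# Hypothetical helper: print the three-character bucket for a file name,
# using the same pipeline as the command above (echo adds a newline,
# which is part of what gets hashed).
bucket_for() {
  echo "$1" | sha256sum | awk '{ print $1 }' | cut -c "1,2,3"
}

# Hypothetical helper: move every .html file in a flat directory into
# <dir>/<bucket>/<file>, creating the bucket directories as needed.
split_dir() {
  dir="$1"
  for f in "$dir"/*.html; do
    [ -e "$f" ] || continue          # nothing matched the glob
    name=$(basename "$f")
    bucket=$(bucket_for "$name")
    mkdir -p "$dir/$bucket"
    mv "$f" "$dir/$bucket/$name"
  done
}
```

With three hex characters there are at most 4096 buckets, so 170,000 files average out to roughly 40 files per directory.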

The script then went through and adjusted the article and image links on each page to point to the new location. The script does some other things to remove HTML tags that the Freenet HTML filter doesn't like and to add a footer about the origin of the mirror.

Another issue I faced was that filenames with non-ASCII characters would get handled differently by Freenet if the file was inserted as a single file vs being inserted as part of a directory. In the latter case the file could not be retrieved later. I worked around this by translating filenames into ASCII. A more robust solution would be needed here if I can't track down where the issue is occurring.

The script to do the conversion is in my freenet-wikipedia GitHub repository. To convert a ZIM archive the steps are:

$ wget http://download.kiwix.org/zim/wikipedia_pih_all.zim
$ extract_zim wikipedia_pih_all.zim
$ ./convert.sh
$ ./putdir.sh result my-mirror index.html

At completion of the insert this will output a list of keys. The uri key is the one that can be shared for others to retrieve the insert. The uskinsert key can be used to insert an updated version of the site:

$ ./putdir.sh result my-mirror index.html <uskinsert key>

The convert.sh script was a quick 'proof of concept' hack and could be improved in many ways. It is also very slow. It took about 24 hours to do the Simple English conversion. I welcome patches and better ways of doing things.

The repository includes a bash script, putdir.sh, which will insert the site using the Freenet ClientPutDiskDir API message. This is a useful way to get a directory online quickly but is not an optimal way of inserting something the size of the mirror. The initial request for the site downloads a manifest containing a list of all the files in the site. This can be quite large. It's 12MB for the Simple English mirror with no images. For the Māori mirror it's almost 50MB due to the images. The layout of the files doesn't take into account likely retrieval patterns. So images and scripts that are included in a page are not downloaded as part of the initial page request, but may result in pulling in larger amounts of data depending on how that file was inserted. A good optimisation project would be to analyse the directory to be inserted and create an optimal Freenet insert for faster retrieval. pyFreenet has a utility, freesitemgr, that can do some of this and there are other insertion tools like jSite that may also do a better job.

My goal was to do a proof of concept to see if a Wikipedia mirror on Freenet was viable. This seems to be the case and the Simple English mirror is very usable. Discussion on the FMS forum when I announced the site has been positive. I hope to improve the process over time and welcome any suggestions or enhancements to do that.

What are the differences between this and the IPFS mirror? It's mostly down to how IPFS and Freenet work.

In Freenet content is distributed across all nodes in the network. The node that has inserted the data can turn their node off and the content remains in the network. No single node has all the content. There is redundancy built in so if nodes go offline the content can still be fully retrieved. Node space is limited so as data is inserted into Freenet, data that is not requested often is lost to make room. This means that content that is not popular disappears over time. I suspect this means that some of the Wikipedia pages will become inaccessible. This can be fixed by periodically reinserting the content, healing the specific missing content, or using the KeepAlive plugin to keep content around. Freenet is encrypted and anonymous. You can browse Wikipedia pages without an attacker knowing that you are doing so. Your node doesn't share the Wikipedia data, except possibly small encrypted chunks of parts of it in your datastore, and it's difficult for the attacker to identify you as a sharer of that data. The tradeoff of this security is that retrievals are slower.

In IPFS a node inserting the content cannot be turned off until that content is pinned by another node on the network and fully retrieved. Nodes that pin the content keep the entire content on their node. If all pinning nodes go offline then the content is lost. All nodes sharing the content advertise that fact. It's easy to obtain the IP address of all nodes that are sharing Wikipedia files. On the positive side IPFS is potentially quite a bit faster to retrieve data.

Both IPFS and Freenet have interesting use cases and tradeoffs. The intent of this experiment is not to present one or the other as a better choice, but to highlight what Freenet can do and make the content available within the Freenet network.

Tags: freenet 

2017-04-27

Installing GNAT and SPARK GPL Editions

GNAT is an implementation of the Ada programming language. SPARK is a restricted subset of Ada for formally verifying programs. It provides features comparable to languages like Rust and ATS. A recent article comparing SPARK to Rust caught my eye and I decided to spend some time learning Ada and SPARK. This post just outlines installing an implementation of both, a quick test to see if the installation worked, and some things to read to learn. I hope to post more later as I learn more.

Installation

Download GNAT GPL from libre.adacore.com. Choose "Free Software or Academic Development" and click "Build Your Download Package". Select the platform and click the checkboxes next to the required components. For my case I chose them all but "GNAT Ada 2016" and "Spark 2016" are the main ones I needed.

To install Ada and SPARK from the downloaded tar file:

$ tar xvf AdaCore-Download-2017-04-27_0537.tar
$ cd x86_64-linux/adagpl-2016/gnatgpl
$ mkdir ~/ada
$ tar -xf gnat-gpl-2016-x86_64-linux-bin.tar.gz
$ cd gnat-gpl-2016-x86_64-linux-bin
$ ./doinstall
...answer prompts about where to install...
...for this example I used /home/username/gnat...
$ export PATH=/home/username/gnat/bin:$PATH

$ cd ../sparkgpl
$ tar -xf spark-gpl-2016-x86_64-linux-bin.tar.gz
$ cd spark-gpl-2016-x86_64-linux-bin
$ ./doinstall
...answer prompts about where to install...
...it should pick up the location used above...

Be aware that the install comes with its own gcc and other utilities. By putting it first in the PATH they are used over the system's versions.
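
To see which toolchain a shell will actually pick up, something like the following sketch can help. The function name tool_from_prefix is hypothetical, and /home/username/gnat is just the example install location used above.

```shell
# Hypothetical helper: succeed if the named tool resolves, via PATH,
# to a file somewhere under the given install prefix.
tool_from_prefix() {
  tool_path=$(command -v "$1") || return 1
  case "$tool_path" in
    "$2"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use, assuming the install location from this post:
if tool_from_prefix gcc /home/username/gnat; then
  echo "gcc comes from the GNAT install"
else
  echo "gcc comes from somewhere else (or is not installed)"
fi
```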

Testing GNAT

The following is a "Hello World" application in Ada:

with Ada.Text_IO; use Ada.Text_IO;
procedure Hello is
begin
  Put_Line ("Hello World!");
end Hello;

It imports a package, Ada.Text_IO, and uses it so the package contents can be used without prefixing them with the package name. A procedure called Hello is created that outputs a line of text. If put in a file hello.adb it can be compiled with:

$ gnatmake hello.adb
gnatbind -x hello.ali
gnatlink hello.ali

$ ./hello
Hello World!

Completely static executables can also be created:

$ gnatmake hello.adb -bargs -static -largs -static
$ ldd hello
not a dynamic executable
$ ./hello
Hello World!

Testing SPARK

I used an example taken from Generating Counterexamples for failed Proofs. The SPARK checker, gnatprove, requires a project file. This is the contents of saturate.gpr:

project Saturate is
   for Source_Dirs use (".");

   package Compiler is
      for Default_Switches ("Ada") use ("-gnatwa");
   end Compiler;
end Saturate;

It gives the project name, Saturate, the location to search for source files (the current directory), and any compiler switches. The function to be implemented is a saturation function. It ensures a value given to it is in a specific range. In this case, a non-negative value less than or equal to 255. In file saturate.ads we put the interface definition:

with Interfaces;
use Interfaces;

function Saturate (Val : Unsigned_16) return Unsigned_16 with
  SPARK_Mode,
  Post => Saturate'Result <= 255 and then
         (if Val <= 255 then Saturate'Result = Val);

The code first pulls the Interfaces package into the current namespace. This provides unprefixed access to Unsigned_16. It declares a function, Saturate, that takes an Unsigned_16 as an argument and returns the same type. The SPARK_Mode is an annotation that identifies code to be checked by SPARK. The Post portion is a postcondition that the implementation of the function must adhere to. In this case the result must be less than or equal to 255 and if the given value is less than or equal to 255 then the result will be equal to the value.

The implementation of the function is in a file saturate.adb:

function Saturate (Val : Unsigned_16) return Unsigned_16 with
  SPARK_Mode
is
begin
  return Unsigned_16'Max (Val, 255);
end Saturate;

This calls the Max function for Unsigned_16 types to return the maximum between the given value and 255. The code compiles with the Ada compiler:

$ gnatmake saturate.adb
gcc -c saturate.adb

It fails however when running the SPARK checker:

$ gnatprove -Psaturate 
Phase 1 of 2: generation of Global contracts ...
Phase 2 of 2: flow analysis and proof ...
saturate.ads:6:11: medium: postcondition might fail (e.g. when Saturate'Result = 255 and Val = 0)
Summary logged in gnatprove/gnatprove.out

This tells us that the postcondition might fail if the given value to the function is 0 and the result is 255. This is because we are using Max: given the value 0 to Saturate, the Max of 0 and 255 is 255, so the function result will be 255. The postcondition however states that the result should be equal to Val, so it should be 0. Changing the function call to Min fixes it:

$ gnatprove -Psaturate 
Phase 1 of 2: generation of Global contracts ...
Phase 2 of 2: flow analysis and proof ...
Summary logged in gnatprove/gnatprove.out
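
The Max versus Min reasoning can also be checked by brute force, since Unsigned_16 has only 65536 values. The shell sketch below is not how gnatprove works (it proves the property rather than enumerating inputs). The function names are made up for illustration, and the arithmetic only mimics Unsigned_16'Max and Unsigned_16'Min against 255.

```shell
# Stand-ins for the two Ada bodies; each stores its answer in $result.
saturate_max() { result=$(( $1 > 255 ? $1 : 255 )); }
saturate_min() { result=$(( $1 < 255 ? $1 : 255 )); }

# post_holds VAL RESULT: the postcondition from saturate.ads,
#   Result <= 255 and then (if Val <= 255 then Result = Val)
post_holds() {
  [ "$2" -le 255 ] && { [ "$1" -gt 255 ] || [ "$2" -eq "$1" ]; }
}

# first_counterexample FN: try every Unsigned_16 value and print the
# first one for which FN breaks the postcondition, or "none".
first_counterexample() {
  v=0
  while [ "$v" -le 65535 ]; do
    "$1" "$v"
    if ! post_holds "$v" "$result"; then
      echo "Val = $v, Result = $result"
      return 0
    fi
    v=$((v + 1))
  done
  echo "none"
}

first_counterexample saturate_max   # prints: Val = 0, Result = 255
first_counterexample saturate_min   # prints: none
```

The Max version fails on the very first input, matching the counterexample gnatprove reported, while the Min version satisfies the postcondition for every input.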

Having a postcondition that states exactly what the result should be is unlikely in a lot of code. If the signature was the following, would SPARK still find the error?

function Saturate (Val : Unsigned_16) return Unsigned_16 with
  SPARK_Mode,
  Post => Saturate'Result <= 255;

$ gnatprove -Psaturate 
Phase 1 of 2: generation of Global contracts ...
Phase 2 of 2: flow analysis and proof ...
saturate.ads:6:11: medium: postcondition might fail,
         cannot prove Saturate'Result <= 255 (e.g. when Saturate'Result = 256)
Summary logged in gnatprove/gnatprove.out

Apparently so. Now it identifies that the result can be 256. Other examples following different contracts on the function are in the original article.

Documentation

The GNAT User's Guide for Native Platforms and Spark 2014 User's Guide contain the instructions for the main tools. GNAT can interface with C and C++. There is a full list of documentation here. Two useful books covering Ada and Spark:

Some technical papers that give a quick overview of Ada:

I used the command line tools here but there is a gps command which is a full graphical IDE which may be more approachable. I'm looking forward to using Ada and SPARK and seeing how they compare to tools like Rust and ATS.

Tags: ada  spark 

2017-04-24

Shen Language Port for Wasp Lisp

This post intersects two of my favourite lispy languages. Shen is a functional programming language with a number of interesting features. These include:

  • Optional static type checking
  • Pattern matching
  • Integrated Prolog system
  • Parsing libraries

I've written about Shen Prolog before which gives a bit of a feel for the language.

Wasp Lisp is a small Scheme-like lisp with lightweight concurrency and the ability to send bytecode across the network. It's used in the MOSREF secure remote injection framework. I've written a number of posts about it.

A feature of Shen is that it is designed to run on top of a lighter weight lisp called KLambda. KLambda has only about 46 primitives, many of which already exist in lisp systems, making it possible to write compilers to other languages without too much work. There exist a few Shen ports already. I wanted to port Shen to Wasp Lisp so I can experiment with using the pattern matching, Prolog and types in some of the distributed Wasp code I use.

Wasp Lisp is not actively developed but the author Scott Dunlop monitors the github repository and processes pull requests. Shen requires features that Wasp Lisp doesn't currently support, like real numbers. I maintain a fork on github that implements the features that Shen needs and any features that apply back to core Wasp Lisp I'll upstream.

This port is heavily based on the Shen Scheme implementation. Much of the code is ported from Scheme to Wasp Lisp and the structure is kept the same. The license for code I wrote is the same as the Shen Scheme License, BSD3-Clause.

The Shen Source is written in the Shen language. Using an existing Shen implementation this source is compiled to KLambda:

$ shen-chibi
(0-) (load "make.shen")
(1-) (make)
compiling ...

To port to another language then becomes writing a KLambda interpreter or compiler. In this case it's a compiler from KLambda to Wasp Lisp. Implementing the primitives is also required but there aren't many of them. Some of the characters that KLambda uses in symbols aren't compatible with the Wasp reader so I used an S-expression parser to read the KLambda code and then walked the tree converting expressions as it went. This is written in Wasp code, converted from the original Scheme. In hindsight it probably would have been easier to write this part in Shen and bootstrap it in another Shen instance to make use of Shen's parsing and pattern matching libraries.

Shen makes heavy use of tail calls in code meaning some form of tail call optimisation is needed to be efficient. In a previous post I mentioned some places where Wasp doesn't identify tail calls. These are cases Shen hit a lot, causing performance issues. I made some changes to the optimizer to identify these cases and it improved the Shen on Wasp runtime performance quite a bit.

Current Port State

This is a very early version. I've only just got it working. The Shen tests pass with the exception of the Proof Assistant test which hangs when loading.

Note 2017-04-26: The bug with the proof assistant test not passing is now fixed. It was caused by an integer overflow when computing complexities within the Shen prolog code. Wasp integers are smaller than other Shen implementations which is why none of them hit the issue. The binaries have been updated with this fix.

The port is slower than I'd like - about half the speed of the Shen C interpreter and significantly slower than Shen Scheme and Shen on SBCL. I've done some work on optimizing tail calls in the fork of the Wasp VM for Shen but there's much more work on the entire port that could improve things.

Binaries

The following compiled binaries are available:

shen_static.bz2. This is a static 64-bit Linux binary with no dependencies. It should run on any 64-bit Linux system. Decompress with:

$ bunzip2 shen_static.bz2
$ chmod +x shen_static
$ ./shen_static

shen_macos.bz2. 64-bit binary for Mac OS. Decompress with bunzip2 as above.

shen.zip. The zip file contains a Windows 64-bit binary, shen.exe. It should run on any modern 64-bit Windows system.

Building

First step, build the fork of Wasp Lisp needed to run:

$ git clone --branch shen https://github.com/doublec/WaspVM wasp-shen
$ cd wasp-shen
$ make install

Follow the prompts for the location to install the Wasp Lisp binaries and add the bin directory of that location to your path:

$ export PATH=$PATH:/path/to/install/bin

Shen is provided in source code format from the Shen Sources github repository. The code is written in Shen. It needs a working Shen system to compile that code to KLambda, a small Lisp subset that Shen uses as a virtual machine.

This KLambda code can be found in the kl directory in the shen-wasp repository. These KLambda files are compiled to Wasp Lisp and stored as compiled code in the compiled directory. The shen-wasp repository includes a recent version of these files. To generate, or re-generate, run the following commands:

$ git clone https://github.com/doublec/shen-wasp
$ cd shen-wasp
$ rlwrap wasp
>> (import "driver")
>> (compile-all)
Compiling toplevel.kl
Compiling core.kl
Compiling sys.kl
Compiling sequent.kl
Compiling yacc.kl
Compiling reader.kl
Compiling prolog.kl
Compiling track.kl
Compiling load.kl
Compiling writer.kl
Compiling macros.kl
Compiling declarations.kl
Compiling types.kl
Compiling t-star.kl

This will create files with the Wasp Lisp code in the compiled/*.ms files, and the compiled bytecode in compiled/*.mo files.

Creating a Shen executable can be done with:

$ waspc -exe shen shen.ms
$ chmod +x shen
$ rlwrap ./shen
Shen, copyright (C) 2010-2015 Mark Tarver
www.shenlanguage.org, Shen 20.0
running under Wasp Lisp, implementation: WaspVM
port 0.3 ported by Chris Double


(0-) 

Note that it takes a while to start up as it runs through the Shen and KLambda initialization.

Running from the Wasp REPL

Shen can be run and debugged from the Wasp REPL. To load the compiled code and run Shen:

$ rlwrap wasp
>> (import "driver")
>> (load-all)
>> (kl:shen.shen)
Shen, copyright (C) 2010-2015 Mark Tarver
www.shenlanguage.org, Shen 20.0
running under Wasp Lisp, implementation: WaspVM
port 0.3 ported by Chris Double


(0-)

When developing on the compiler it's useful to use eval-all instead of load-all. This will load the KLambda files, compile them to Wasp Lisp and eval them:

>> (eval-all)
>> (kl:shen.shen)
...

A single input line of Shen can be entered and run, returning to the Wasp REPL with:

>> (kl:shen.read-evaluate-print) 
(+ 1 2)
3:: 3

KLambda functions can be called from Wasp by prefixing them with kl:. For example:

>> (kl:shen.read-evaluate-print)
(define factorial
  1 -> 1
  X -> (* X (factorial (- X 1))))
factorial:: factorial
>> (kl:factorial 10)
:: 3628800

Shen allows introspecting compiled Shen functions and examining the KLambda code. From the Wasp REPL this is useful for viewing the KLambda and comparing with the generated Wasp Lisp:

>> (kl:ps 'factorial)
:: (defun factorial (V1172) (cond (...) (...)))
>> (pretty (kl:ps 'factorial))
(defun factorial (V1172 ) (cond ((= 1 V1172 ) 1 ) (#t (* V1172 (factorial (- V1172 1 ) ) ) ) ) ) :: null
>> (pretty (kl->wasp (kl:ps 'factorial)))
(begin (register-function-arity (quote factorial ) 1 )
       (define (kl:factorial V1172)
         (cond
           ((kl:= 1 V1172) 1)
           (#t (* V1172 (kl:factorial (- V1172 1))))))
       (quote factorial ) ) :: null

Cross Compilation

Wasp binaries are a small Wasp VM stub plus the compiled Lisp code appended to it. This makes building for other platforms easy as long as you have the stub for that platform. Wasp can be built for Android and static binaries via musl are possible.

I've made the following stubs available for building binaries for other systems:

Decompress them and copy into the lib/waspvm-stubs directory where Wasp Lisp was installed. Shen can then be built on any host platform for 64-bit Linux, 64-bit Linux static binaries, 64-bit Windows or 64-bit Mac OS with:

$ waspc -exe shen -platform linux-x86_64 shen.ms
$ waspc -exe shen_static -platform static-linux-x86_64 shen.ms
$ waspc -exe shen.exe -platform win-x86_64 shen.ms
$ waspc -exe shen_macos -platform Darwin-x86_64 shen.ms

Learning Shen

Some places to go to learn Shen:

Other Ports

Tags: shen  waspvm 


This site is accessible over Tor as hidden service mh7mkfvezts5j6yu.onion, or Freenet using key:
USK@1ORdIvjL2H1bZblJcP8hu2LjjKtVB-rVzp8mLty~5N4,8hL85otZBbq0geDsSKkBK4sKESL2SrNVecFZz9NxGVQ,AQACAAE/bluishcoder/-44/

