Borrowing Internal Pointers in ATS

Recently Robert O'Callahan wrote a post on what Rust can do that other languages can't. It involves getting a pointer to an internal field of a stack allocated object and having the type system ensure that the lifetime of that pointer doesn't outlive the owning object.

As soon as I saw this I thought "Aha, ATS can do that!" but I suspected it would be quite a bit more verbose. It turns out it's possible but tedious to do and while ATS 1 can do it in safe code, ATS 2 requires some help. This isn't a "Robert's post is wrong" post or a tear down of Rust, it's a thought experiment in how to do things in ATS that are easier in Rust. Plus ATS is in no way a "remotely mainstream language".

The following is a representation of the scenario that works in ATS 1. First we declare the type of an object that contains two fields:

viewtypedef X = @{
  a= int,
  b= int
}

A function that safely returns a pointer to one of the fields looks like this:

fun get_b_ref {l:addr} (pf: X @ l | x: ptr l):
              [l2:addr] (int @ l2, int @ l2 -<lin> X @ l | ptr l2) =
let
  val p = &(x->b)
  prval pf = view@ (x->b)
in
  #[.. | (pf, llam pf => let
          prval () = view@ (x->b) := pf
        in
          view@ (!x)
        end
    | p)]
end

This can look imposing if you're not familiar with ATS. What does this all mean? Once you're familiar with ATS you can read this as get_b_ref takes a pointer to an X and a proof that you can use it. It consumes this proof meaning we can't access the X object after calling. It returns a pointer to an int - one of the internal fields of X - and provides a proof to use it as an int. It also returns a proof function that consumes that proof of an int and returns the original proof that there is an X at l - allowing you to use the original x as an X again and preventing using the returned pointer to an int due to the proof allowing you to use that getting consumed.

In more detail, it defines a function, get_b_ref, that takes a pointer to an X object as an argument: (pf: X @ l | x: ptr l). The items to the left of the | are proof arguments and only exist at type checking time. This says that pf contains a proof that an object of type X exists at address l. The l is defined in the {l:addr} section of the function declaration meaning l is a memory address. The items to the right of the | are normal values. In this case, x is a pointer with address l. Because we have a pointer at address l and a proof that an object of type X exists at l we can treat the pointer safely as being a pointer to an X. The function consumes the proof when called. That means that the proof cannot be used again so we can no longer treat x as a pointer to an X. It's a pointer to something we can't do anything with after calling.

The result of the function comes after the : symbol. The (int @ l2, int @ l2 -<lin> X @ l | ptr l2) is the meat of it. Again, proofs are to the left of the | and normal values to the right. In this case the right is ptr l2 which means a pointer of some address, l2, is returned. The intent of the function is to return an internal pointer to one of the int fields so we'd expect to have it return a proof that l2 points to an int. The proof returned is a tuple: (int @ l2, int @ l2 -<lin> X @ l).

The first field of that tuple is int @ l2 which is the proof we expected. So a caller of this function will have a pointer to an int and the proof to use it as such. The second field of the tuple is int @ l2 -<lin> X @ l. This is the syntax for declaring an anonymous function type. The items to the left of the -<...> are the arguments to the function and the items to the right are the return value. The second field of the tuple is therefore a function that takes a proof that an int exists at address l2 and returns a proof that an X exists at l. The lin means this function is linear - it can only be used once. Because this tuple appears in the proof section of the result, this anonymous function is a proof function - it is called at type checking time.

The implementation of the function gets a pointer to the b field of the object and a proof that we can use it:

val p = &(x->b)
prval pf = view@ (x->b)

The prval line is proof code that runs at type checking time. view@ is a ATS keyword that gets the proof for an object if we can access it. Once we get a proof using view@ we need to return it later else the proof checking fails. This is a form of 'borrowing' - we are borrowing the proof to access the internal object and we have to give it back later otherwise we can't access that field again and the compiler complains if we don't give it back.

The main part of the return value is:

(pf, llam pf => let
              prval () = view@ (x->b) := pf
            in
              view@ (!x)
            end
        | p)]

It is a tuple, with two proof arguments and a value. The first proof argument is the proof of int @ someaddress to access the internal field we got earlier using view@. The second is a linear lambda function. That function is what returns the proof back to the view@ for the internal field in this line:

prval () = view@ (x->b) := pf

This is the returning of the proof we borrowed earlier. The linear lambda function returns the view@ of the original X object. The value being returned p is the pointer to the internal field.

The [.. | ] syntax is a way of telling the compiler that the return value is of a particular existential type and that it should compute it from the context of the expression such that it matches the [l2:addr] existential result type we declared in the function declaration. Ideally the compiler would infer this without our help but unfortunately it does not. It will tell us if we get it wrong though.

Usage of the function looks like:

implement main() = let
  var x: X = @{ a=1, b=2 }
  val (pf, pff | p) = get_b_ref(view@ x | &x)
  val _ = !p := 3
  prval () = view@ x := pff(pf)
in
  println!("x.b = ", x.b)
end

A variable x is declared on the stack. We get a pointer to the b field using the function above and set that field via the pointer. We're allowed to do this because we have a proof that p is a pointer to an int via the pf proof variable. Calling get_b_ref makes x no longer usable as get_b_ref consumed the proof to access it. The proof function, pff is used to return the proof to access X so we can then use x again. After that we can't use the internal pointer to b anymore. The compiler prevents the escape of the proof to access the internal pointer from the lifetime of the stack variable x.

Unfortunately this code doesn't work in ATS 2. The equivalent code, which fails to compile, is:

fun get_b_ref {l:addr} .<>. (pf: X @ l | x: ptr l):<>
                            [l2:addr] (int @ l2, int @ l2 -<lin> X @ l | ptr l2) =
let
  val p = addr@(x->b)
  prval pf = view@ (x->b)
in
  (pf, llam (pf) => let
          prval () = view@ (x->b) := pf
        in
          view@ (!x)
        end
    | p)
end

ATS 2 does not allow the use of view@ (!x) inside the lambda if the x comes from outside the lambda as it does in this case. Maybe this will change in future iterations of ATS 2 but currently it does not. One workaround is to resort to embedded C code:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

viewtypedef X = @{
  a= int,
  b= int
}
extern viewtypedef "X" = X

%{
int* get_b_ref(X* x) {
  return &x->atslab__b;
}
%}

extern fun get_b_ref {l:addr} (pf: X @ l | x: ptr l):<>
         [l2:addr] (int @ l2, int @ l2 -<lin> X @ l | ptr l2) = "mac#get_b_ref"

implement main0() = let
  var x: X = @{ a=1, b=2 }
  val (pf, pff | p) = get_b_ref(view@ x | addr@ x)
  val _ = !p := 3
  prval () = view@ x := pff(pf)
in
  println!("x.b = ", x.b)
end

Here the X object is given a name "X" that can be accessed from C using the line:

extern viewtypedef "X" = X

The get_b_ref function is implemented in C and given an extern definition that matches the previous one. Although the C implementation is effectively 'unsafe' code, any calls of get_b_ref from ATS is safe as the type defines exactly what it does. This is an example of implementing a typesafe kernel around potentially unsafe code. In this case there is no runtime overhead as the typesafe kernel is proof code.

Which approach to use depends on the project. Often the overhead of implementing safe pointer code in ATS in terms of satisfying the proofs of the implementation is too much at the time. Embedded C with a safe external definition is a faster approach to get things done. Later the C code could be replaced with proof-correct ATS as desired.

Hopefully this gives some insight in what ATS can do, and gives some appreciation of the simplicity of the approach Rust has taken. ATS tends towards providing the low level tools to build the level of access you want but this has cognitive and syntactic overhead. Once you're famliar with the system it becomes understandable but it's probably not "new user friendly" like Rust is.