# Bluish Coder

Programming Languages, Martial Arts and Computers. The Weblog of Chris Double.

2018-10-16

## Generalized Algebraic Data Types in ATS

The ATS programming language supports defining Generalized Algebraic Data Types (GADTs). These allow defining datatypes where the programmer explicitly specifies the type of each constructor. This has a number of uses and I'll go through some examples in this post. GADTs are sometimes referred to as Guarded Recursive Datatypes.

Some useful resources for reading up on GADTs that I used to write this post are:

The examples here were tested with ATS2-0.3.11.

## Arithmetic Expressions

This is probably the most common demonstration of GADT usage and it's useful to see how to do it in ATS. The example is taken from the Haskell/GADT Wikibook.

First we'll create a datatype to represent a simple expression language, and write an evaluation function for it, without using GADTs:

``````#include "share/atspre_define.hats"

datatype Expr  =
| I of int
| Add of (Expr, Expr)
| Mul of (Expr, Expr)

fun eval (x:Expr): int =
case+ x of
| I i => i
| Add (t1, t2) => eval(t1) + eval(t2)
| Mul (t1, t2) => eval(t1) * eval(t2)

implement main0() = let
val term = Mul(I(2), I(4))
val res = eval(term)
in
println!("res=", res)
end
``````

`Expr` is a datatype with three constructors. `I` represents an integer, `Add` adds two expressions together and `Mul` multiplies two expressions. The `eval` function pattern matches on these and evaluates them. The example can be compiled with the following if placed in a file `arith1.dats`:

``````\$ patscc -DATS_MEMALLOC_LIBC -o arith1 arith1.dats
\$ ./arith1
res=8
``````

Now we extend the expression language with another type, booleans, and add an `Expr` constructor to compare for equality:

``````datatype Expr  =
| I of int
| B of bool
| Mul of (Expr, Expr)
| Eq of (Expr, Expr)

(* Does not typecheck *)
fun eval (x:Expr): int =
case+ x of
| I i => i
| B b => b
| Add (t1, t2) => eval(t1) + eval(t2)
| Mul (t1, t2) => eval(t1) * eval(t2)
| Eq (t1, t2) => eval(t1) = eval(t2)
``````

This code fails to typecheck - the `eval` function is defined to return an `int` but the `B b` pattern match returns a boolean. We could work around this by having `eval` return a datatype that can represent either an `int` or a `bool`, but then we'd have to add code throughout `eval` to detect the invalid addition or multiplication of a boolean and raise a runtime error. Ideally we'd like to make it impossible to construct invalid expressions, so that they are rejected at compile time.
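As a rough illustration of that workaround, here is a sketch in Python (not ATS - the names and tagged-tuple encoding are my own) of an `eval` that returns a tagged value and only catches a misplaced boolean at run time:

```python
# Hypothetical sketch: without GADTs, eval returns a (tag, value) pair and
# invalid expressions like Add(I 1, B true) fail only when evaluated.

def eval_expr(e):
    tag = e[0]
    if tag == "I":                       # integer literal
        return ("int", e[1])
    if tag == "B":                       # boolean literal
        return ("bool", e[1])
    if tag in ("Add", "Mul"):
        t1, v1 = eval_expr(e[1])
        t2, v2 = eval_expr(e[2])
        if t1 != "int" or t2 != "int":   # a runtime check, not a compile-time one
            raise TypeError("arithmetic on non-integer expression")
        return ("int", v1 + v2 if tag == "Add" else v1 * v2)
    if tag == "Eq":
        _, v1 = eval_expr(e[1])
        _, v2 = eval_expr(e[2])
        return ("bool", v1 == v2)
    raise ValueError("unknown constructor: " + tag)

print(eval_expr(("Eq", ("I", 5), ("Add", ("I", 1), ("I", 4)))))  # ('bool', True)
```

The point of the GADT version below is that the `TypeError` branch becomes unnecessary: ill-typed expressions can't be constructed at all.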

We can imagine the type constructors for `Expr` as if they were functions with a type signature. The type signature of the constructors would be:

fun I(n:int): Expr
fun B(b:bool): Expr
fun Add(t1:Expr, t2:Expr): Expr
fun Mul(t1:Expr, t2:Expr): Expr
fun Eq(t1:Expr, t2:Expr): Expr
``````

The problem here is that `Add` and `Mul` both take `Expr` as arguments but we want to restrict them to integer expressions only - we need to differentiate between integer and boolean expressions. We'd like to add a type index to the `Expr` datatype that indicates if it is a boolean or an integer expression and change the constructors so they have types like:

``````fun I(n:int): Expr int
fun B(b:bool): Expr bool
fun Add(t1:Expr int, t2:Expr int): Expr int
fun Mul(t1:Expr int, t2:Expr int): Expr int
fun Eq(t1:Expr int, t2:Expr int): Expr bool
``````

Using these constructors would make it impossible to create an expression that would give a runtime error in an `eval` function. Creating an `Expr` with a type index looks like:

``````datatype Expr(a:t@ype)  =
| I(a) of int
| B(a) of bool
| Add(a) of (Expr int, Expr int)
| Mul(a) of (Expr int, Expr int)
| Eq(a) of (Expr int, Expr int)
``````

This adds a type index of sort `t@ype`. A 'sort' in ATS is the type of type indexes. The `t@ype` is for types that have a flat memory structure of an unknown size. The '@' sigil looks odd inside the name, but '@' is used elsewhere in ATS when defining flat records and tuples which is a useful mnemonic for remembering what the '@' within the sort name means.

This doesn't help our case though as the type signatures for the constructors would be:

``````fun I(n:int): Expr a
fun B(b:bool): Expr a
fun Add(t1:Expr int, t2:Expr int): Expr a
fun Mul(t1:Expr int, t2:Expr int): Expr a
fun Eq(t1:Expr int, t2:Expr int): Expr a
``````

There is still the problem that an `Expr a` is not an `Expr int` for `Add`, `Mul` and `Eq`, so pattern matching would still require runtime checks. This is where GADTs come in. GADTs enable defining a datatype where the generic `a` of the type index can be constrained to a specific type, different for each constructor. The `Expr` datatype becomes:

``````datatype Expr(t@ype)  =
| I(int) of int
| B(bool) of bool
| Add(int) of (Expr int, Expr int)
| Mul(int) of (Expr int, Expr int)
| Eq(bool) of (Expr int, Expr int)
``````

The constructor type signatures now match those we wanted earlier and it becomes possible to check for invalid expressions at compile time. The `eval` function looks similar, but is parameterized over the type index:

``````fun{a:t@ype} eval(x:Expr a): a =
case+ x of
| I i => i
| B b => b
| Add (t1, t2) => eval(t1) + eval(t2)
| Mul (t1, t2) => eval(t1) * eval(t2)
| Eq (t1, t2) => eval(t1) = eval(t2)
``````

The "`{a:t@ype}`" syntax following `fun` is ATS' way of defining a function template where `a` is the template argument. ATS will generate a new function specific for each `a` type used in the program.

An example `main` function to demonstrate the new GADT enabled `Expr` and `eval`:

``````implement main0() = let
val term1 = Eq(I(5), Add(I(1), I(4)))
val term2 = Mul(I(2), I(4))
val res1 = eval(term1)
val res2 = eval(term2)
in
println!("res1=", res1, " and res2=", res2)
end
``````

``````\$ patscc -DATS_MEMALLOC_LIBC -o arith2 arith2.dats
\$ ./arith2
res1=true and res2=8
``````

You may have noticed that our `Expr` type can't represent equality between booleans, or between booleans and integers. It's defined to only allow equality between integers. It's possible to make equality polymorphic but I'll go through this later as it brings up some more complicated edge cases in types with ATS beyond just using GADTs.

## Typesafe sprintf

This example of GADT usage in ATS tries to implement a `sprintf` function that allows providing a format specification to `sprintf` which then processes it and accepts arguments of types as defined by the format specification.

The format specification is defined as a GADT:

``````datatype Format (a:type) =
| I(int -<cloref1> a) of (Format a)
| S_(a) of (string, Format a)
| S0(string) of string
``````

Each constructor represents part of a format string. The type index is the type that `sprintf` will return given a format string for that type. `sprintf` is declared as:

``````fun sprintf {a:type} (fmt: Format a): a
``````

Here are some example invocations and the types that should result:

``````// Returns a string
sprintf (S0 "hello")

// Returns a string
sprintf (S_ ("hello", S0 "world"))

// Returns a function that takes an int and returns a string
sprintf (I (S0 "<- is a number"))
``````

The last example shows how `sprintf` emulates variable arguments using closures. By returning a function, any immediately following tokens are treated as arguments to that function. For example:

``````sprintf (I (S0 "<- is a number")) 42
(int -> string) 42
``````

With some syntactic sugar using currying we can make this look nicer. Currying isn't often used in ATS as it requires the garbage collector (to clean up partially applied functions) and costs closure allocations, but it's useful for this example.

With the `Format` datatype defined, here's the implementation of `sprintf`:

``````fun sprintf {a:type} (fmt: Format a): a = let
fun aux {a:type}(pre: string, fmt': Format a): a =
case fmt' of
| I fmt => (lam (i) => aux(append(pre, int2string(i)), fmt))
| S_ (s, fmt) => aux(append(pre, s), fmt)
| S0 s => append(pre, s)
in
aux("", fmt)
end
``````

It uses an inner function, `aux`, that does the work of deconstructing the format argument, returning a closure when needed, and appending strings as the result string is computed. For this example I'm cheating with memory management to keep the focus on GADTs and away from the intricacies of linear strings. The example should be run with the garbage collector to free the allocated strings.
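For comparison, the same closure-returning trick can be sketched in Python (hypothetical names mirroring the `I`/`S_`/`S0` constructors as nested tuples, not the ATS API): each step of `aux` either produces the finished string or a function waiting for the next argument:

```python
# Hypothetical Python sketch of closure-based sprintf.
# A format is a nested tuple mimicking the I / S_ / S0 constructors.

def sprintf(fmt):
    def aux(pre, f):
        tag = f[0]
        if tag == "I":                  # consume one int argument via a closure
            return lambda i: aux(pre + str(i), f[1])
        if tag == "S_":                 # literal string followed by more format
            return aux(pre + f[1], f[2])
        if tag == "S0":                 # final literal string
            return pre + f[1]
    return aux("", fmt)

fmt = ("S_", "X: ", ("I", ("S_", " Y: ", ("I", ("S0", "")))))
print(sprintf(fmt)(5)(10))   # X: 5 Y: 10
```

What Python can't express is the static part: in ATS the type index of `Format` tells the compiler exactly how many arguments, of which types, the result of `sprintf` expects.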

Here's an example of executing what we have so far:

``````implement main0() = let
val fmt = S_ ("X: ", (I (S_ (" Y: ", (I (S0 ""))))))
val s = sprintf fmt 5 10
in
println!(s)
end
``````

``````\$ ./format
X: 5 Y: 10
``````

The format string is pretty ugly though. Some helper functions make it a bit more friendly:

``````fun Str {a:type} (s: string) (fmt: Format a) = S_ (s, fmt)
fun Int {a:type} (fmt: Format a) = I (fmt)
fun End (s:string) = S0 (s)

infixr 0 >:
macdef >: (f, x) = ,(f) ,(x)
``````

Now the format definition is:

``````val fmt = Str "X: " >: Int >: Str " Y: " >: Int >: End ""
``````

The string handling functions that I threw together to append strings and convert an integer to a string without dealing with linear strings follow.

``````fun append(s0: string, s1: string): string = let
extern castfn from(s:string):<> Strptr1
extern castfn to(s:Strptr0):<> string
extern prfun drop(s:Strptr1):<> void
val s0' = from(s0)
val s1' = from(s1)
val r = strptr_append(s0', s1')
prval () = drop(s0')
prval () = drop(s1')
in
to(r)
end

fun int2string(x: int): string =
tostring(g0int2string x) where {
extern castfn tostring(Strptr0):<> string
}
``````

## Issues

These examples of GADT usage were pulled from the papers and articles mentioned at the beginning of the post. They were written for Haskell or Dependent ML (a predecessor to ATS). There may be better ways of approaching the problems in ATS, and some constraints on the way ATS generates C code limit how GADTs can be used. One example of the limitation was mentioned in the first example of the expression evaluator.

If the `Expr` datatype is changed so that `Eq` can work on any type, not just `int`, then a compile error results:

``````datatype Expr(a:t@ype)  =
| I(a) of int
| B(a) of bool
| Add(a) of (Expr int, Expr int)
| Mul(a) of (Expr int, Expr int)
| Eq(a) of (Expr a, Expr a)
``````

The error occurs due to the `Eq` branch in the case statement in `eval`:

``````fun{a:t@ype} eval(x:Expr a): a =
case+ x of
| I i => i
| B b => b
| Add (t1, t2) => eval(t1) + eval(t2)
| Mul (t1, t2) => eval(t1) * eval(t2)
| Eq (t1, t2) => eval(t1) = eval(t2)
``````

The compiler is unable to find a match for the overloaded `=` call for the general case of any `a` (vs the specific case of `int` as it was before). This is something I've written about before and the workaround in that post gets us past that error:

``````extern fun{a:t@ype} equals(t1:a, t2:a): bool
implement equals<int>(t1,t2) = g0int_eq(t1,t2)
implement equals<bool>(t1,t2) = eq_bool0_bool0(t1,t2)

fun{a:t@ype} eval(x:Expr a): a =
case+ x of
| I i => i
| B b => b
| Add (t1, t2) => eval(t1) + eval(t2)
| Mul (t1, t2) => eval(t1) * eval(t2)
| Eq (t1, t2) => equals(eval(t1), eval (t2))
``````

Now the program typechecks but fails to compile the generated C code. This can usually be resolved by explicitly stating the template parameters in template function calls:

``````fun{a:t@ype} eval(x:Expr a): a =
case+ x of
| I i => i
| B b => b
| Add (t1, t2) => eval<int>(t1) + eval<int>(t2)
| Mul (t1, t2) => eval<int>(t1) * eval<int>(t2)
| Eq (t1, t2) => equals<?>(eval<?>(t1), eval<?>(t2))
``````

The problem here is what types to put in place of the '`?`'s in this snippet. The `Expr` type allows any `a`, but we want to constrain it to the index used by `t1` and `t2`, and I don't think ATS allows us to declare that. This mailing list thread discusses the issue with some workarounds.

Tags: ats

2018-09-24

## Concurrent and Distributed Programming in Web Prolog

SWI Prolog is an open source Prolog implementation with good documentation and many interesting libraries. One library that was published recently is Web Prolog. While branded as a 'Web Logic Programming Language', the implementation in the github repository runs under SWI Prolog and adds Erlang style distributed concurrency. There is a PDF book in the repository that goes into detail about the system. In this post I explore some of its features.

## Install SWI Prolog

The first step is to install and run the SWI Prolog development version. There is documentation on this but the following are the basic steps to download and build from the SWI Prolog github repository:

``````\$ git clone https://github.com/SWI-Prolog/swipl-devel
\$ cd swipl-devel
\$ ./prepare
... answer yes to the various prompts ...
\$ cp build.tmpl build
\$ vim build
...change PREFIX to where you want to install...
\$ make install
\$ export PATH=<install location>/bin:\$PATH
``````

There may be errors about building optional libraries. These can be ignored, or see the dependencies documentation for how to install them.

The newly installed SWI Prolog build can be run with:

``````\$ swipl
Welcome to SWI-Prolog (threaded, 64 bits, version 7.7.19-47-g4c3d70a09)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.

For built-in help, use ?- help(Topic). or ?- apropos(Word).

?-
``````

I recommend reading Basic Concepts in The Power of Prolog to get a grasp of Prolog syntax if you're not familiar with it.

## Starting a Web Prolog node

Installing the Web Prolog libraries involves cloning the github repository and loading the prolog libraries held within. The system includes a web based tutorial and IDE where each connected web page is a web prolog node that can send and receive messages to other web prolog nodes. I'll briefly mention this later but won't cover it in detail in this post - for now some simple examples will just use the SWI Prolog REPL. We'll need two nodes running, which can be done on different machines, or on the same machine using different ports. On each machine run:

``````\$ git clone https://github.com/Web-Prolog/swi-web-prolog/
\$ cd swi-web-prolog
\$ swipl
...
?- [web_prolog].
...
``````

The `[web_prolog].` command at the prolog prompt loads the web prolog libraries. Once loaded we need to instantiate a node, which starts the socket server and machinery that handles messaging:

``````?- node.
% Started server at http://localhost:3060/
true.
``````

This starts a node on the default port `3060`. If you're running multiple nodes on the same machine you can change the port by passing an argument to `node`:

``````?- node(localhost:3061).
% Started server at http://localhost:3061/
true.
``````

In the rest of this post I refer to node `A` and node `B` for the two nodes started above.

## Basic example

Each process has an id that can be used to reference it. It is obtained via `self`:

``````?- self(Self).
``````

We can send a message using '`!`'. Like Erlang, it takes a process to send to and the data to send:

``````?- self(Self),
Self ! 'Hello World'.
``````

This posts 'Hello World' to the current process mailbox. Messages can be received using `receive`. This takes a series of patterns to match against and code to run if an item in the process mailbox matches that pattern. Like Erlang it is a selective receive - it will look through messages in the mailbox that match the pattern, not just the topmost message:

``````?- receive({ M -> writeln(M) }).
Hello World
M = 'Hello World'.
``````

Notice the `receive` rules are enclosed in '`{ ... }`' and are comprised of a pattern to match against and the code to run separated by `->`. Variables in the pattern are assigned the value of what they match. In this case `M` is set to `Hello World` as that's the message we sent earlier. More complex patterns are possible:

``````?- \$Self ! foo(1, 2, bar('hi', [ 'a', 'b', 'c' ])).

?- receive({ foo(A, B, bar(C, [D | E])) -> true }).
A = 1,
B = 2,
C = hi,
D = a,
E = [b, c].
``````

Here I make use of a feature of the SWI Prolog REPL - prefixing a variable with `\$` will use the value that was assigned to that variable in a previous REPL command. This also shows matching on the head and tail of a list with the `[Head | Tail]` syntax.
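The semantics of selective receive can be sketched in Python (an illustration of the idea, not Web Prolog's implementation): scan the mailbox for the first message a pattern predicate accepts, removing it and leaving everything else queued:

```python
# Hypothetical sketch of Erlang-style selective receive over a mailbox list.

def selective_receive(mailbox, matches):
    """Remove and return the first message that `matches` accepts."""
    for i, msg in enumerate(mailbox):
        if matches(msg):
            return mailbox.pop(i)
    return None   # a real implementation would block waiting for new messages

mailbox = ["other", ("ping", "pid1"), "noise"]
msg = selective_receive(mailbox, lambda m: isinstance(m, tuple) and m[0] == "ping")
print(msg)        # ('ping', 'pid1')
print(mailbox)    # ['other', 'noise'] - unmatched messages stay queued
```

This is why the `receive` above can find `'Hello World'` even if other messages arrived first: matching is by pattern, not by queue position.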

To send to another process we just need to use the process identifier. In this case I can send a message from node A to node B:

``````# In Node A
?- self(Self).

# This call blocks
?- receive({ M -> format('I got: ~s~n', [M])}).

# In Node B (substitute node A's Self value for the pid)
?- <node A's Self> ! 'Hi World'.
true.

# Node A unblocks
I got: Hi World
M = 'Hi World'.
``````

## Spawning processes

Use `spawn` to create a new process that runs concurrently on a node. It will have its own process id:

``````?- spawn((self(Self),
format('My process id is ~w~n', [Self]))).
true.
My process id is 21421552@http://localhost:3060
``````

The code to run in the process is passed as the first argument to `spawn` and must be between brackets. An optional second argument will provide the process id to the caller:

``````?- spawn((self(Self),
format('My process id is ~w~n', [Self])),
Pid).
My process id is 33869438@http://localhost:3060
Pid = '33869438'.
``````

A third argument can be provided to pass options to control spawning. These include the ability to link processes, monitor them or register a name to identify it in a registry.

Using the REPL we can define new rules and run them in processes, either by consulting a file, or entering them via `[user]`. The following code will define the code for a 'ping' server process:

``````server :-
    receive({
        ping(Pid) -> Pid ! pong, server
    }).
``````

This can be added to a file and loaded in the REPL with `consult`, or it can be entered directly using `[user]` (note the use of Ctrl+D to exit the user input):

``````# on Node A
?- [user].
|: server :-
|:     receive({
|:         ping(Pid) -> Pid ! pong, server
|:     }).
|: ^D
% user://1 compiled 0.01 sec, 1 clauses
true.
``````

`server` blocks receiving messages looking for a `ping(Pid)` message. It sends a `pong` message back to the process id it extracted from the message and calls itself recursively using a tail call. Run on the node with:

``````# on node A
?- spawn((server), Pid).
Pid = '18268992'.
``````

Send a message from another node by referencing the Pid along with the node's address:

``````# on node B
?- self(Self), 18268992@'http://localhost:3060' ! ping(Self).

?- flush.
% Got pong
true.
``````

The `flush` call removes all messages from the process' queue and displays them. In this case, the `pong` reply from the other node's server.
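The ping/pong exchange can be sketched with Python threads and queues standing in for processes and mailboxes (purely illustrative; Web Prolog's nodes actually talk over HTTP and websockets):

```python
import queue
import threading

# Hypothetical sketch: each "process" is a thread, each mailbox a queue.Queue.

def server(mailbox):
    while True:
        msg = mailbox.get()                # blocking receive
        if msg == "stop":
            break
        tag, reply_to = msg
        if tag == "ping":
            reply_to.put("pong")           # Pid ! pong

server_box = queue.Queue()
threading.Thread(target=server, args=(server_box,), daemon=True).start()

self_box = queue.Queue()                   # our own mailbox, as in self(Self)
server_box.put(("ping", self_box))         # Pid ! ping(Self)
print(self_box.get(timeout=1))             # prints: pong
server_box.put("stop")
```

The structure mirrors the Prolog version: the server loops on a blocking receive, and the caller's "pid" is just its own mailbox passed along inside the message.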

## Spawning on remote nodes

Web Prolog provides the ability to spawn processes to run on remote nodes. This is done by passing a `node` option to `spawn`. To demonstrate, enter the following to create an `add` rule in node A:

``````# on node A
?- [user].
|: add(X,Y,Z) :- Z is X + Y.
|: ^D
% user://1 compiled 0.01 sec, 1 clauses
true.
``````

This creates an 'add' predicate that works as follows:

``````# on node A
?- add(1,2,Z).
Z = 3.
``````

We can call this predicate from node B by spawning a remote process that executes code in node A. In node B, it looks like this:

``````# on node B
?- self(Self),
spawn((
    call(add, 10, 20, Z),
    Self ! Z
), Pid, [
    node('http://localhost:3060'),
    monitor(true)
]),
receive({ M -> format('Result is ~w~n', [M])}).
Result is 30
Pid = '24931627'@'http://localhost:3060',
M = 30.
``````

Passing `node` as an option to `spawn/3` results in a process being spawned on that referenced node and the code runs there. In the code we call the `add` predicate that only exists on node A and return it by sending it to node B's identifier. The `monitor` option provides status information about the remote process - including getting a message when the process exits. Calling `flush` on node B shows that the process on node A exits:

``````# on Node B
?- flush.
% Got down('24931627'@'http://localhost:3060',exit)
true.
``````

Note the indirection in the `spawn` call where `add` isn't called directly, instead `call` is used to call it. For some reason Web Prolog requires `add` to exist on node B even though it is not called. The following gives an error on node B for example:

``````# on node B
?- self(Self),
spawn((
    call(add, 10, 20, Z),
    Self ! Z
), Pid, [
    node('http://localhost:3060'),
    monitor(true)
]),
receive({ M -> format('Result is ~w~n', [M])}).
ERROR: Undefined procedure: add/3 (DWIM could not correct goal)
``````

I'm unsure if this is an error in web prolog or something I'm doing.

## RPC calls

The previous example can be simplified using the RPC functions provided by Web Prolog. A synchronous RPC call from node B can be done that is approximately equivalent to the above:

``````?- rpc('http://localhost:3060', add(1,2,Z)).
Z = 3.
``````

Or asynchronously using promises:

``````?- promise('http://localhost:3060', add(1,2,Z), Ref).
Ref = '119435902516515459454946972679206500389'.
``````

The following example, based on one I did for Wasp Lisp, is a process that maintains a count that can be incremented, decremented or retrieved:

``````server(Count) :-
    receive({
        inc -> Count2 is Count + 1, server(Count2);
        dec -> Count2 is Count - 1, server(Count2);
        get(Pid) -> Pid ! value(Count), server(Count)
    }).
``````

Once defined, it can be spawned in node A with:

``````?- spawn((server(0)), Pid).
Pid = '33403644'.
``````

And called from Node B with code like:

``````?- self(Self), 33403644@'http://localhost:3060' ! get(Self).

?- flush.
% Got value(0)
true.

?- 33403644@'http://localhost:3060' ! inc.
true.

?- self(Self), 33403644@'http://localhost:3060' ! get(Self).

?- flush.
% Got value(1)
true.
``````

We can modify the server so it accepts an `upgrade` message that contains the new code that the server will execute:

``````server(Count) :-
    receive({
        inc -> Count2 is Count + 1, server(Count2);
        dec -> Count2 is Count - 1, server(Count2);
        get(Pid) -> Pid ! value(Count), server(Count);
        upgrade(Pred) -> call(Pred, Count)
    }).
``````

Run the server and send it some messages. Then define a `new_server` on node A's REPL:

new_server(Count) :-
    receive({
        inc -> Count2 is Count + 1, new_server(Count2);
        dec -> Count2 is Count - 1, new_server(Count2);
        mul(X) -> Count2 is Count * X, new_server(Count2);
        get(Pid) -> Pid ! value(Count), new_server(Count);
        upgrade(Pred) -> writeln(Pred), writeln(Count), call(Pred, Count)
    }).
``````

And send an `upgrade` message to the existing server:

``````# on node A
?- \$Pid ! upgrade(new_server).
true.
``````

Now the server understands the `mul` message without having to be shut down and restarted.
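The upgrade trick - passing the replacement behaviour inside a message so the loop tail-calls into it while keeping its state - can be sketched in Python (hypothetical names; a single-threaded stand-in for the Prolog process):

```python
import queue

# Hypothetical sketch of hot code upgrade: the server loop swaps in a new
# handler carried by an "upgrade" message, keeping the current count.

def counter(state, msg, replies):
    if msg == "inc":
        return state + 1
    if msg == "get":
        replies.append(state)     # Pid ! value(Count)
    return state

def new_counter(state, msg, replies):
    if isinstance(msg, tuple) and msg[0] == "mul":   # new capability
        return state * msg[1]
    return counter(state, msg, replies)              # old behaviour kept

def serve(mailbox, handler, state, replies):
    while not mailbox.empty():
        msg = mailbox.get_nowait()
        if isinstance(msg, tuple) and msg[0] == "upgrade":
            handler = msg[1]      # swap behaviour, state carries over
        else:
            state = handler(state, msg, replies)
    return state

box, out = queue.Queue(), []
for m in ["inc", "inc", ("upgrade", new_counter), ("mul", 10), "get"]:
    box.put(m)
print(serve(box, counter, 0, out), out)   # 20 [20]
```

As in the Prolog version, the state (`Count`) survives the upgrade because it is an argument threaded through the loop rather than something stored in the old code.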

## Web IDE and tutorial

There's a web based IDE and tutorial based on SWISH, the online SWI-Prolog system. To start this on a local node, run the following script from the web prolog source:

``````\$ cd web-client
\$ swipl run.pl
``````

Now visit `http://localhost:3060/apps/swish/index.html` in a web browser. This displays a tutorial on the left and a REPL on the right. Each loaded web page has its own namespace and local environment. You can send messages to and from individual web pages and the processes running in the local node. It uses websockets to send data to and from instances.

## Conclusion

There's a lot to the Web Prolog system and the implementation for SWI Prolog has some rough edges, but it's usable enough to experiment with and get a feel for the system. I hope to write more about some of its features, and SWI Prolog, as I get more familiar with them.

Tags: prolog

2018-01-10

## Capturing program invariants in ATS

I've been reading the book Building High Integrity Applications with Spark to learn more about the SPARK/Ada language for formal verification. It's a great book that goes through lots of examples of how to do proofs in Spark. I wrote a post on Spark/Ada earlier this year and tried some of the examples in the GNAT edition.

While working through the book I often compared how I'd implement similar examples in ATS. I'll go through one small example in this post and show how the dependent types of ATS allow capturing the invariants of a function and checking them at type checking time to prove the absence of a particular class of runtime errors.

The example I'm looking at from the book is from the `Loop Invariants` section, which aims to prove things about iterative loops in Ada. The procedure `Copy_Into` copies characters from a `String` into a `Buffer` object that has a maximum capacity. If the source string is too short then the buffer is padded with spaces. The Spark/Ada code from the book is:

``````procedure Copy_Into(Buffer: out Buffer_Type; Source: in String) is
Characters_To_Copy : Buffer_Count_type := Maximum_Buffer_Size;
begin
Buffer := (others => ' ');
if Source'Length < Characters_To_Copy then
Characters_To_Copy := Source'Length;
end if;
for Index in Buffer_Count_Type range 1 .. Characters_To_Copy loop
pragma Loop_Invariant (Characters_To_Copy <= Source'Length and
Characters_To_Copy = Characters_To_Copy'Loop_Entry);
Buffer(Index) := Source(Source'First + (Index - 1));
end loop;
end Copy_Into;
``````

This is quite readable for anyone familiar with imperative languages. The number of characters to copy is initially set to the maximum buffer size, then changed to the length of the string if that is smaller. The buffer is first filled with space characters. The buffer is 1-indexed and during the loop the characters from the string are copied into it.

The `Loop_Invariant` pragma is a Spark statement that tells the checker that certain invariants should be true. The invariant used here is that the number of characters to copy is always less than or equal to the length of the string and does not change within the loop. Given this loop invariant Spark can assert that the indexing into the `Source` string cannot exceed the bounds of the string. Spark is able to reason by itself that `Buffer` does not get indexed out of bounds as the loop ranges up to `Characters_To_Copy` which is capped at `Maximum_Buffer_Size`. I'm not sure why this doesn't need a loop invariant but the book notes that the Spark tools improve over time at what invariants they can compute automatically. As an example, the second invariant check above for `Characters_To_Copy'Loop_Entry` isn't needed for newer versions of Spark but is kept to enable checking on older toolchains.

For ATS I modeled the `Buffer` type as an array of characters:

``````stadef s_max_buffer_size: int = 128
typedef Buffer0 = @[char?][s_max_buffer_size]
typedef Buffer1 = @[char][s_max_buffer_size]
val max_buffer_size: size_t s_max_buffer_size = i2sz(128)
``````

In ATS, `@[char][128]` represents an array of 128 contiguous `char` values in memory, where those chars have been initialized. The `@[char?][128]` represents an array of uninitialized chars. It would be a type error to use an element of that array before initializing it. The '0' and '1' suffixes on the 'Buffer' name are an ATS idiom for uninitialized and initialized data respectively.

The `stadef` keyword creates a constant in the 'statics' system of ATS. That is a constant in the language of types, vs a constant in the language of values where runtime level programming is done. For this post I have prefixed variables in the 'statics' system with `s_` to make it clearer where they can be used.

The `max_buffer_size` creates a constant in the 'dynamics' part of ATS, in the language of values, of type "`size_t max_buffer_size`". `size_t` is the type for indexes into arrays and `size_t 128` is a dependent type representing a type where only a single value matches the type - `128` in this instance. The `i2sz` function converts an integer to a `size_t`. So in this case we've created a `max_buffer_size` constant that can only ever be the same value as the `s_max_buffer_size` type level constant.

The ATS equivalent of `copy_into` looks like:

``````fun copy_into {n:nat}
(buffer: &Buffer0 >> Buffer1, source: string n): void = let
val () = array_initize_elt(buffer, max_buffer_size, ' ')
val len = string1_length(source)
stadef s_tocopy = min(s_max_buffer_size, n)
val tocopy: size_t s_tocopy = min(max_buffer_size, len)
fun loop {m:nat | m < s_tocopy } .<s_tocopy - m>.
(buffer: &Buffer1, m: size_t m): void = let
val () = buffer[m] := source[m]
in
if m < tocopy - 1 then loop(buffer, m+1)
end
in
if len > 0 then
loop(buffer, i2sz(0))
end
``````

Starting with the function declaration:

``````fun copy_into {n:nat}
(buffer: &Buffer0 >> Buffer1, source: string n): void
``````

The items between the curly brackets, `{...}`, are universal variables used for setting constraints on the function arguments. Between the round brackets, `(...)`, are the function arguments. The return type is `void`. The function takes two arguments, `buffer` and `source`.

The `buffer` is a reference type, represented by the `&` in the type. This is similar to a reference argument in C++ or a pointer argument in C. It allows modification of the argument by the function. The `Buffer0 >> Buffer1` means on entry to the function the type is `Buffer0`, the uninitialized `char` array described earlier, and on exit it will be a `Buffer1`, an initialized char array. If the body of the function fails to initialize the elements of the array it will be a type error.

The `source` argument is a `string n`, where `n` is a type index for the length of the string. By using a dependently typed string here we can use it later in the function body to set some invariants.

In the body of the function the call to `array_initize_elt` sets each element of the array to the space character. This also has the effect of changing the type stored in the array from `char?` to `char`, and the array is now a valid `Buffer1`. This matches the type change specified in the function declaration for the `buffer` argument.

The next three lines compute the number of characters to copy:

val len = string1_length(source)
stadef s_tocopy = min(s_max_buffer_size, n)
val tocopy: size_t s_tocopy = min(max_buffer_size, len)
``````

The length of the string is obtained first. The `string1_` prefix on function names is an idiom showing that they operate on dependently typed strings. `string1_length` returns a dependently typed `size_t x`, where `x` is the length of the string. This means that calling `string1_length` not only returns the length of the string at runtime but also constrains the type index of the result to equal the type index of the string passed in. Here is the declaration of `string1_length` from the prelude (with some additional annotations removed):

``````fun string1_length {n:int} (x: string n): size_t n
``````

The reuse of the `n` in the argument and result type is what tells the compiler to set the constraint when it is called. This is important for the next two lines, which perform the same operation once in the statics and once in the dynamics - that is, once at the type level and once at the runtime level. The `stadef` sets a type level constant called `s_tocopy` constrained to be the minimum of the max buffer size and the type level string length, `n`. The `val` line performs the same calculation at runtime using the runtime length. These two lengths must match as a result of the call to `string1_length` described earlier. The type of `tocopy` ensures this by saying that it is the singleton type that can only be fulfilled by the value represented by `s_tocopy`.

ATS has looping constructs but it's idiomatic to use tail recursion instead of loops. It is also easier to create types for tail recursive functions than to type iterative loops in ATS. For this reason I've converted the loop to a tail recursive function. When ATS compiles to C it converts tail recursive functions to a C loop so there is no danger of stack overflow. The looping function without dependent type annotations would be:

``````fun loop (buffer: &Buffer1, m: size_t): void = let
val () = buffer[m] := source[m]
in
if m < tocopy - 1 then loop(buffer, m+1)
end
``````

It takes a `buffer` and an index, `m`, as arguments and assigns to `buffer` at that index the equivalent entry from the `source` string. If it hasn't yet copied the number of characters needed to copy it tail recursively calls itself, increasing the index. The initial call is:

``````loop(buffer, i2sz(0))
``````

`i2sz` is needed to convert the integer number zero from an `int` type to a `size_t` type.

The declaration for the `loop` function contains the dependent type definitions that give invariants similar to the Spark example:

``````fun loop {m:nat | m < s_tocopy } .<s_tocopy - m>.
(buffer: &Buffer1, m: size_t m): void = let
``````

Starting with the arguments, the `buffer` is a `Buffer1`, meaning it must be an initialized array. The index, `m`, is a dependently typed `size_t m` where `m` at the type level is constrained to be less than the number of characters to copy. This ensures that the indexing into the buffer and source string is never out of bounds, due to the previous statements that set `s_tocopy` to be the lesser of the maximum buffer size and the string size. `s_tocopy` is needed here instead of `tocopy` because universal arguments are written in the language of the type system, not the runtime system.

The body of the loop does the copying of the indexed character from the source string to the buffer and recursively calls itself with an incremented index if it hasn't copied the required number of characters.

``````  val () = buffer[m] := source[m]
in
if m < tocopy - 1 then loop(buffer, m+1)
``````

Due to the constraints set in the function declaration the buffer and source string indexes are checked at compile time to ensure they aren't out of bounds, and the recursive call to loop will fail typechecking if an out of bounds index is passed as an argument.

One issue with recursive functions is that they could loop forever if the termination condition check is incorrect, or the variables being used in that check don't explicitly increase or decrease. In this case the index must increase to eventually reach the value for the termination check.

ATS resolves this by allowing termination metrics to be added to the function. This is the part of the function declaration that is bracketed by `.< ... >.` markers. The expression inside these markers is expected to be in the 'statics' constraint language and evaluate to a tuple of natural numbers that the compiler needs to prove are lexicographically decreasing in value. The `loop` index counts up so this needs to be converted to a decreasing value:

``````.<s_tocopy - m>.
``````

This is done by taking the static constant of the number of characters to copy and subtracting the index value. The compiler proves that each recursive call in the body of the function results in something strictly less than the previous call. With the termination metric added it statically proves that the recursive function terminates.
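For example, copying five characters (`s_tocopy` being 5) gives this sequence of calls, with the metric strictly decreasing at each recursive call:

``````loop(buffer, 0)   (* metric: 5 - 0 = 5 *)
loop(buffer, 1)   (* metric: 5 - 1 = 4 *)
loop(buffer, 2)   (* metric: 5 - 2 = 3 *)
loop(buffer, 3)   (* metric: 5 - 3 = 2 *)
loop(buffer, 4)   (* metric: 5 - 4 = 1; m < tocopy - 1 is now false, so the loop stops *)
``````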

The full program is:

``````#include "share/atspre_define.hats"

stadef s_max_buffer_size = 128
typedef Buffer0 = @[char?][s_max_buffer_size]
typedef Buffer1 = @[char][s_max_buffer_size]
val max_buffer_size: size_t s_max_buffer_size = i2sz(128)

fun copy_into {n:nat}
    (buffer: &Buffer0 >> Buffer1, source: string n): void = let
  val () = array_initize_elt(buffer, max_buffer_size, ' ')
  val len = string1_length(source)
  stadef s_tocopy = min(s_max_buffer_size, n)
  val tocopy: size_t s_tocopy = min(max_buffer_size, len)
  fun loop {m:nat | m < s_tocopy} .<s_tocopy - m>.
      (buffer: &Buffer1, m: size_t m): void = let
    val () = buffer[m] := source[m]
  in
    if m < tocopy - 1 then loop(buffer, m+1)
  end
in
  if len > 0 then
    loop(buffer, i2sz(0))
end

implement main0() = let
  var b = @[char?][max_buffer_size]()
  val () = copy_into(b, "hello world")
  fun print_buffer {n:nat | n < s_max_buffer_size}
      .<s_max_buffer_size - n>.
      (buffer: &Buffer1, n: size_t n): void = let
    val () = if n = 0 then print!("[")
    val () = print!(buffer[n])
    val () = if n = max_buffer_size - 1 then print!("]")
  in
    if n < 127 then print_buffer(buffer, n + 1)
  end
  val () = print_buffer(b, i2sz(0))
in
  ()
end
``````

This example creates the array as a stack allocated object in the line:

``````var b = @[char?][max_buffer_size]()
``````

Being stack allocated the memory will be deallocated on exit of the scope it was defined in. It's also possible to create a heap allocated buffer and pass that to `copy_into`:

``````implement main0() = let
  val (pf_b, pf_gc | b) = array_ptr_alloc<char>(max_buffer_size)
  val () = copy_into(!b, "hello world")
  fun print_buffer {n:nat | n < 128} .<128 - n>.
      (buffer: &(@[char][128]), n: int n): void = let
    val () = if n = 0 then print!("[")
    val () = print!(buffer[n])
    val () = if n = 127 then print!("]")
  in
    if n < 127 then print_buffer(buffer, n + 1)
  end
  val () = print_buffer(!b, 0)
  val () = array_ptr_free(pf_b, pf_gc | b)
in
  ()
end
``````

This explicitly allocates the memory for the array using `array_ptr_alloc` and dereferences it in the `copy_into` call using the '`!b`' syntax. Note that this array is later free'd using `array_ptr_free`. Not calling that to free the memory would be a compile error.

Although this is a simple function it demonstrates a number of safety features:

• The destination buffer cannot be indexed beyond its bounds.
• The source string cannot be indexed beyond its bounds.
• There can be no buffer overflow on writing to the destination buffer.
• There can be no uninitialized data in the buffer on function exit.
• The internal recursive call must terminate.

Building High Integrity Applications with Spark is a great book for learning Spark and also for learning about the sorts of proofs used in production applications using Spark/Ada. Apart from learning a bit of Spark, it's a good exercise to think about how similar safety checks can be implemented in your language of choice.

Tags: ats

2018-01-03

## Writing basic proofs in ATS

This post covers using the latest version of ATS, ATS 2, and goes through proving some basic algorithms. I've written a couple of posts before on using proofs in ATS 1:

Writing proofs in ATS is complicated by the fact that the dependent types and proof component of the language does not use the full ATS language. It uses a restricted subset of the language. When implementing a proof for something like the factorial function you can't write factorial in the type system in the same way as you write it in the runtime system. Factorial in the latter is written using a function but in the former it needs to be encoded as relations in dataprop or dataview definitions.

Recent additions to ATS 2 have simplified the task of writing some proofs by enabling the constraint checking of ATS to be done by an external SMT solver rather than the built-in solver. External solvers like Z3 have more features than the built-in solver and can solve constraints that the ATS solver cannot. For example, non-linear constraints are not solvable by ATS directly but are when using Z3 as the solver.

In this post I'll start by describing how to write proofs using quantified constraints. This is the easiest proof writing method in ATS but requires Z3 as the solver. Next I'll continue with Z3 as the solver but describe how to write proofs without quantified constraints. Finally I'll go through writing the proofs by encoding the algorithm as relations in a dataprop. The approaches progress from an easy, less code-intrusive method to the more difficult one requiring proofs to be threaded throughout the code.

## Installing Z3

Z3 is the external solver used in the examples in this post. To install from the Z3 github on Linux:

``````\$ git clone https://github.com/Z3Prover/z3
\$ cd z3
\$ mkdir build
\$ cd build
\$ cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release ../
\$ make && sudo make install
``````

## Installing ATS

I used ATS2-0.3.8 for the examples in this post. There are various scripts for installing it, but to install manually from the git repository on an Ubuntu based system:

``````\$ sudo apt-get install build-essential libgmp-dev libgc-dev libjson-c-dev
\$ git clone git://git.code.sf.net/p/ats2-lang/code ATS2
\$ git clone https://github.com/githwxi/ATS-Postiats-contrib.git ATS2-contrib
\$ export PATSHOME=`pwd`/ATS2
\$ export PATSCONTRIB=`pwd`/ATS2-contrib
\$ export PATH=\$PATSHOME/bin:\$PATH
\$ (cd ATS2 && ./configure && make all)
\$ (cd ATS2/contrib/ATS-extsolve && make DATS_C)
\$ (cd ATS2/contrib/ATS-extsolve-z3 && make all && mv -f patsolve_z3 \$PATSHOME/bin)
\$ (cd ATS2/contrib/ATS-extsolve-smt2 && make all && mv -f patsolve_smt2 \$PATSHOME/bin)
``````

Optionally you can install the various ATS backends for generating code in other languages:

``````\$ (cd ATS2/contrib/CATS-parsemit && make all)
\$ (cd ATS2/contrib/CATS-atscc2js && make all && mv -f bin/atscc2js \$PATSHOME/bin)
\$ (cd ATS2/contrib/CATS-atscc2php && make all && mv -f bin/atscc2php \$PATSHOME/bin)
\$ (cd ATS2/contrib/CATS-atscc2py3 && make all && mv -f bin/atscc2py3 \$PATSHOME/bin)
``````

Add `PATSHOME`, `PATSCONTRIB` and the change to `PATH` to `.bashrc`, `.profile`, or some other shell startup file to have these environment variables set when starting a new shell.

## Dependent Types

A function to add numbers in ATS can be proven correct using dependent types by specifying the expected result using dependently typed integers:

``````fun add_int {m,n:int} (a: int m, b: int n): int (m + n) = a + b
``````

This won't compile if the implementation does anything but result in the addition of the two integers. The constraint language used in dependent types is a restricted subset of the ATS language. I wrote a bit about this in my post on dependent types in ATS.
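For instance, a version that returns anything other than the sum is rejected at compile time (a hypothetical `add_int_bad` for illustration):

``````(* Fails to typecheck: the solver cannot show m + n + 1 == m + n *)
fun add_int_bad {m,n:int} (a: int m, b: int n): int (m + n) = a + b + 1
``````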

The following is an implementation of the factorial function without proofs:

``````#include "share/atspre_define.hats"

fun fact (n: int): int =
  if n > 0 then n * fact(n - 1) else 1

implement main0() = let
  val r = fact(5)
in
  println!("5! = ", r)
end
``````

Compile and run with something like:

``````\$ patscc -o f0 f0.dats
\$ ./f0
5! = 120
``````

To prove that the implementation of factorial is correct we need to specify what factorial is in the constraint language of ATS. Ideally we'd like to write something like the following:

``````fun fact {n: nat} (n: int n): int (fact(n)) = ...
``````

This would check that the body of the function implements something that matches the result of `fact`. Unfortunately the restricted constraint language of ATS doesn't allow implementing or using arbitrary functions in the type definition.

## Using Z3 as an external solver

By default ATS solves constraints using its built in constraint solver. It has a mechanism for using an external solver, in this case Z3. To type check the previous factorial example using Z3 use the following commands:

``````\$ patsopt -tc --constraint-export -d f0.dats |patsolve_z3 -i
Hello from [patsolve_z3]!
typechecking is finished successfully!
``````

Note the change to use `patsopt` instead of `patscc`. The `-tc` flag performs the type checking phase only. `--constraint-export` causes the constraints to be exported to `stdout`, which is piped to `patsolve_z3`. That command invokes Z3 and checks the constraint results.

Since the resulting program may contain code that ATS can't typecheck itself, to actually build the final executable we invoke `patscc` with a command line option to disable type checking. It's important that the checking has succeeded through the external solver before doing this!

``````\$ patscc --constraint-ignore -o f0 f0.dats
``````

## Z3 and quantified constraints

Using Z3 with quantified constraints is a new feature of ATS and quite experimental. Hongwei notes some issues due to different versions of Z3 that can cause problems. It is however an interesting approach to proofs with ATS so I include the process of using it here and hope it becomes more stable as it progresses.

ATS provides the `stacst` keyword to introduce a constant into the 'statics' part of the type system. The 'statics' is the restricted constraint language used when specifying types. There are some examples of `stacst` usage in the prelude file basics_pre.sats.

Using `stacst` we can introduce a function in the statics system for factorial:

``````stacst fact: int -> int
``````

Now the following code is valid:

``````fun fact {n:nat} (n: int n): int (fact(n)) = ...
``````

ATS doesn't know anything more about `fact` in the statics system; it only knows it's a function that takes an `int` and returns an `int`. With Z3 as the external solver we can extend ATS' constraint knowledge by declaring axioms about `fact`:

``````praxi fact_base(): [fact(0) == 1] void
praxi fact_ind {n:int | n >= 1} (): [fact(n) == n * fact(n-1)] void
``````

The keyword `praxi` is used for defining axioms, whereas `prfun` is used for proof functions that need an implementation. This is currently not checked by the compiler but may be at some future time. From a documentation perspective using `praxi` shows no plan to actually prove the axiom.

The two axioms here will add to the proof store the facts about the factorial function. The first, `fact_base`, asserts that the factorial of zero is one. The second asserts that for all `n`, where `n` is greater than or equal to `1`, that the factorial of `n` is equivalent to `n * fact (n - 1)`.
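Together the two axioms pin down `fact` for every natural number. Unfolding them by hand for `fact(3)`:

``````fact(3) = 3 * fact(2)            (fact_ind, n = 3)
        = 3 * 2 * fact(1)        (fact_ind, n = 2)
        = 3 * 2 * 1 * fact(0)    (fact_ind, n = 1)
        = 3 * 2 * 1 * 1 = 6      (fact_base)
``````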

To add these facts to Z3's knowledge, use `\$solver_assert`:

``````fun fact {n:int | n >= 0} (n: int n): int (fact(n)) = let
prval () = \$solver_assert(fact_base)
prval () = \$solver_assert(fact_ind)
in
if n = 0 then 1 else n * fact(n - 1)
end
``````

This typechecks successfully. Changing the implementation to be incorrect results in a failed typecheck. Unfortunately, as is often the case with experimental code, sometimes an incorrect implementation will hang Z3 causing it to consume large amounts of memory as noted earlier.

The implementation of `fact` here closely mirrors the specification. The following is a tail recursive implementation of `fact` that is also proved correct to the specification:

``````#include "share/atspre_define.hats"

stacst fact: int -> int

extern praxi fact_base(): [fact(0) == 1] unit_p
extern praxi fact_ind{n:pos}(): [fact(n)==n*fact(n-1)] unit_p

fun fact {n:nat} (n: int n): int (fact(n)) = let
  prval () = \$solver_assert(fact_base)
  prval () = \$solver_assert(fact_ind)

  fun loop {n,r:nat} (n: int n, r: int r): int (fact(n) * r) =
    if n > 0 then loop (n - 1, n * r) else r
in
  loop (n, 1)
end

implement main0() = let
  val r = fact(5)
in
  println!("5! = ", r)
end
``````

This was tested and built with:

``````\$ patsopt -tc --constraint-export -d f4.dats |patsolve_z3 -i
Hello from [patsolve_z3]!
typechecking is finished successfully!
\$ patscc --constraint-ignore -o f4 f4.dats
\$ ./f4
5! = 120
``````

Another approach to using the external solver is not to add constraints to the Z3 store via `\$solver_assert` but instead to call the axioms explicitly as proof functions threaded through the body of the function. Here is `fact` implemented that way:

``````stacst fact: int -> int

extern praxi fact_base(): [fact(0) == 1] void
extern praxi fact_ind {n:int | n >= 1} (): [fact(n) == n * fact(n-1)] void

fun fact {n:nat} (n: int n): int (fact(n)) =
  if n > 0 then let
    prval () = fact_ind {n} ()
  in
    n * fact(n - 1)
  end
  else let
    prval () = fact_base()
  in
    1
  end
``````

The code is more verbose due to needing to embed the `prval` statements in a `let` block but it doesn't suffer the Z3 incompatibility that the quantified constraint example did. An incorrect implementation will give an error from Z3.

The equivalent tail recursive version is:

``````fun fact {n:nat} (n: int n): int (fact(n)) = let
  fun loop {n,r:nat} (n: int n, r: int r): int (fact(n) * r) =
    if n > 0 then let
      prval () = fact_ind {n} ()
    in
      loop (n - 1, n * r)
    end
    else let
      prval () = fact_base()
    in
      r
    end
in
  loop (n, 1)
end
``````

## Dataprops and Datatypes

It's still possible to write a verified version of factorial without using an external solver but the syntactic overhead is higher. The specification of `fact` needs to be implemented as a relation using `dataprop`. A `dataprop` introduces a type for the proof system in much the same way as declaring a datatype in the runtime system of ATS. Proof objects constructed from this type exist only at proof checking time and are erased at runtime. They can be taken as proof arguments in functions or included in return values. Proof functions can also be written to use them. In the words of Hongwei Xi:

> A prop is like a type; a value classified by a prop is often called a proof, which is dynamic but erased by the compiler. So while proofs are dynamic, there is no proof construction at run-time.

For a comparison of the syntax of `dataprop` and `datatype`, here is a type for a list of integers implemented as both:

``````dataprop prop_list =
| prop_nil
| prop_cons of (int, prop_list)

datatype list =
| list_nil
| list_cons of (int, list)
``````

To specify `fact` as a relation it is useful to see how it is implemented in a logic based programming language like Prolog:

``````fact(0, 1).
fact(N, R) :-
  N > 0,
  N1 is N - 1,
  fact(N1, R1),
  R is R1 * N.
``````

The equivalent as a `dataprop` is:

``````dataprop FACT (int, int) =
| FACTzero (0, 1)
| {n,r1:int | n > 0} FACTsucc (n, r1 * n) of (FACT (n-1, r1))
``````

`FACT(n,r)` encodes the relation that `fact(n) = r` where fact is defined as:

• fact(0) = 1, and
• fact(n) where n > 0 = n * fact(n - 1)

The `dataprop` creates a `FACT(int, int)` prop with two constructors:

• `FACTzero()` which encodes the relation that `fact(0) = 1`, and
• `FACTsucc(FACT(n-1, r1))` which encodes the relation that `fact(n) = n * fact(n-1)`

The declaration of the `fact` function uses this prop as a proof return value to enforce that the result must match this relation:

``````fun fact {n:nat} (n: int n): [r:int] (FACT (n, r) | int r) = ...
``````

The return value there is a tuple, where the first element is a proof value (the left of the pipe symbol) and the second element is the factorial result. The existential variable `[r:int]` is used to associate the returned value with the result of the `FACT` relation and the universal variable, `{n:nat}` is used to provide the first argument of the fact prop. Through unification the compiler checks that the relationships between the variables match.

The implementation of the function is:

``````fun fact {n:nat} (n: int n): [r:int] (FACT (n, r) | int r) =
  if n > 0 then let
    val (pf1 | r1) = fact (n - 1)
    val r = n * r1
  in
    (FACTsucc (pf1) | r)
  end else begin
    (FACTzero () | 1)
  end
``````

Note that the result of the recursive call to `fact` deconstructs the proof result into `pf1` and the value into `r1` and that proof is used in the result of the `FACTsucc` constructor. Proofs are constructed like datatypes. For example:

• `FACTsucc(FACTzero())` is `fact(1)` (the successor, or next factorial, from `fact(0)`).
• `FACTsucc(FACTsucc(FACTzero()))` is `fact(2)`, etc.

In this way it can be seen that a call to `fact(1)` will hit the first branch of the `if` condition which recursively calls `fact(0)`. That hits the second branch of the conditional to return the prop `FACTzero()`. On return back to the caller this is wrapped as `FACTsucc(FACTzero())`, giving our prop return type as `FACT(1, 1)`. Plugging these values into the `n` and `r` of the return type means the `int r` value the function returns must be `1` for the function to pass type checking.

Although these proofs have syntactic overhead, the runtime overhead is nil. Proofs are erased after typechecking and only the factorial computation code remains. The full compilable program is:

``````#include "share/atspre_define.hats"

dataprop FACT (int, int) =
| FACTzero (0, 1)
| {n,r1:int | n > 0} FACTsucc (n, r1 * n) of (FACT (n-1, r1))

fun fact {n:nat} (n: int n): [r:int] (FACT (n, r) | int r) =
  if n > 0 then let
    val (pf1 | r1) = fact (n - 1)
    val r = n * r1
  in
    (FACTsucc (pf1) | r)
  end else begin
    (FACTzero () | 1)
  end

implement main0() = let
  val (pf | r) = fact(5)
in
  println!("5! = ", r)
end
``````

To compile and run:

``````\$ patscc -o f8 f8.dats
\$ ./f8
5! = 120
``````

A tail recursive implementation of `fact` using dataprop that type checks is:

``````fun fact {n:nat} (n: int n): [r:int] (FACT (n, r) | int r) = let
  fun loop {r:int}{i:nat | i <= n}
      (pf: FACT(i, r) | n: int n, r: int r, i: int i):
      [r1:int] (FACT(n, r1) | int (r1)) =
    if n - i > 0 then
      loop (FACTsucc(pf) | n, (i + 1) * r, i + 1)
    else (pf | r)
in
  loop (FACTzero() | n, 1, 0)
end
``````

Because our dataprop is defined recursively based on successors to a previous factorial, and we can't get a predecessor, the tail recursive loop is structured to count up. This means we have to pass in the previous factorial prop as a proof argument, in `pf`, and pass an index, `i`, to be the current factorial being computed.
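Tracing the loop for `fact(3)` shows the proof being built up alongside the accumulator (the annotations on the right are mine, following the `FACTsucc` rule):

``````loop(FACTzero()                     | 3, 1, 0)   (* pf : FACT(0, 1) *)
loop(FACTsucc(FACTzero())           | 3, 1, 1)   (* pf : FACT(1, 1) *)
loop(FACTsucc(FACTsucc(FACTzero())) | 3, 2, 2)   (* pf : FACT(2, 2) *)
loop(FACTsucc(...)                  | 3, 6, 3)   (* pf : FACT(3, 6) *)
  => (pf | 6)
``````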

## Fibonacci

Fibonacci is similar to factorial in the way the proofs are constructed. The quantified constraints version is trivial:

``````stacst fib: int -> int

extern praxi fib0(): [fib(0) == 0] unit_p
extern praxi fib1(): [fib(1) == 1] unit_p
extern praxi fib2 {n:nat|n >= 2} (): [fib(n) == fib(n-1) + fib(n-2)] unit_p

fun fib {n:int | n >= 0} (n: int n): int (fib(n)) = let
prval () = \$solver_assert(fib0)
prval () = \$solver_assert(fib1)
prval () = \$solver_assert(fib2)
in
if n = 0 then 0
else if n = 1 then 1
else fib(n-1) + fib(n-2)
end
``````

The threaded version using the external solver isn't much more verbose:

``````stacst fib: int -> int

extern praxi fib0(): [fib(0) == 0] void
extern praxi fib1(): [fib(1) == 1] void
extern praxi fib2 {n:nat|n >= 2} (): [fib(n) == fib(n-1) + fib(n-2)] void

fun fib {n:int | n >= 0} (n: int n): int (fib(n)) =
  if n = 0 then let
    prval () = fib0()
  in
    0
  end
  else if n = 1 then let
    prval () = fib1()
  in
    1
  end
  else let
    prval () = fib2 {n} ()
  in
    fib(n-1) + fib(n-2)
  end
``````

The dataprop version is quite readable too and has the advantage of not needing an external solver:

``````dataprop FIB (int, int) =
| FIB0 (0, 0)
| FIB1 (1, 1)
| {n:nat}{r0,r1:int} FIB2(n, r1+r0) of (FIB(n-1, r0), FIB(n-2, r1))

fun fib {n:nat} (n: int n): [r:int] (FIB (n, r) | int r) =
  if n = 0 then (FIB0() | 0)
  else if n = 1 then (FIB1() | 1)
  else let
    val (pf0 | r0) = fib(n-1)
    val (pf1 | r1) = fib(n-2)
  in
    (FIB2(pf0, pf1) | r0 + r1)
  end
``````

The dataprop maps nicely to the Prolog implementation of fibonacci:

``````fib(0,0).
fib(1,1).
fib(N,R) :- N1 is N-1, N2 is N-2, fib(N1,R1),fib(N2,R2),R is R1+R2.
``````

I leave proving different implementations of `fib` to the reader.

## Conclusion

There are more algorithms that could be demonstrated but this post is getting long. Some pointers to other interesting examples and articles on ATS and proofs:

In general proof code adds syntactic overhead. With more work on external solvers it looks like automated solving will make things easier still. ATS allows the programmer to write normal code and add proofs as needed, so the barrier to entry for applying proofs is quite low, given suitable documentation and examples.

Tags: ats  prolog

2018-01-02

## Casting in ATS

The ATS programming language has a powerful type and proof system to enable safe construction of programs. Sometimes there is a need to cast a value from one type to another. ATS provides a feature that enables the programmer to write their own rules specifying what can be cast. This becomes important when converting non-dependently typed values to dependently typed values and dealing with conversion of proofs. This post goes through some common examples.

A casting function is introduced using the keyword `castfn`. It only has a declaration - there is no implementation or body of the casting function. From a value perspective there is no change to the actual value being cast, only to the type. It is effectively the identity function except for the type change. This means the cast is only allowed for types that are the same size, typically a machine word. There is no runtime overhead for using a casting function.

## Standard Types

The following example ATS program attempts to add an integer to an unsigned integer value:

``````#include "share/atspre_define.hats"

implement main0() = let
val x: uint = 5u
val y: int = x + 5
in
println!("y=", y)
end
``````

This results in a type error:

``````\$ patscc -o c2 c2.dats
c2.dats: 133(line=6, offs=16) -- 138(line=6, offs=21):
error(3): the symbol [+] cannot be resolved as no match is found.
c2.dats: 124(line=6, offs=7) -- 130(line=6, offs=13):
error(3): the pattern cannot be given the ascribed type.
c2.dats: 124(line=6, offs=7) -- 130(line=6, offs=13):
error(3): mismatch of static terms (tyleq):
The actual term is: S2Eerrexp()
The needed term is: S2Eapp(S2Ecst(g0int_t0ype); S2Eextkind(atstype_int))
``````

The error is saying that the operator `+` has no matching implementation for the types provided. It is expecting an `int` but got something else.

The addition operator requires the two arguments to be of the same type. A casting function can be added to cast the unsigned integer to an integer:

``````#include "share/atspre_define.hats"

extern castfn int_of_uint(x: uint): int

implement main0() = let
val x: uint = 5u
val y: int = int_of_uint(x) + 5
in
println!("y=", y)
end
``````

This compiles and runs:

``````\$ patscc -o c3 c3.dats
\$ ./c3
y=10
``````

A `castfn` looks very much like a `fun` declaration. It requires the name of the casting function, the argument it expects and a return type. There is no implementation as mentioned earlier. The compiler will convert the type of the argument to that of the return type. Another example follows showing conversion to a floating point value:

``````#include "share/atspre_define.hats"

extern castfn double_of_uint(x: uint): double

implement main0() = let
val x: uint = 5u
val y: double = double_of_uint(x) + 5.0
in
println!("y=", y)
end
``````

The `castfn` can have any name but most of the ATS prelude follows a format of including the 'from' and 'to' types in the name.

## Dependent Types

Conversion between dependently typed values and non-dependently typed values are a common use of casting functions. For example:

``````val x: [n:int] size_t n = string1_length("hello")
val y: int = x + 1 // Error here, mismatch between dependent and non-dependent int
``````

The type of `x` is dependently typed in that it has a type index of sort `int` for the length of the string. It is also of type `size_t` but the snippet expects an `int`. A `castfn` allows casting away the dependent type index and converting the type at the same time:

``````#include "share/atspre_define.hats"

extern castfn int0_of_size1 {n:int} (x: size_t n): int

implement main0() = let
val x: [n:int] size_t n = string1_length("hello")
val y: int = int0_of_size1(x) + 1
in
println!("y=", y)
end
``````

Note the use of the '0' and '1' suffixes used for the type names in the `castfn`. This is an idiom in ATS naming where a '0' means not dependently typed and '1' means it has a dependent type.

It's possible to go in the other direction, casting from a non-dependently typed value to a dependently typed value. This is an example of code that won't compile and needs a cast:

``````val x: int = 5
val y: [n:int] size_t n = x
``````

This fails due to `x` and `y` being different types. Adding a cast function to convert the `int` to a `size_t` and including a type index will allow it to compile:

``````#include "share/atspre_define.hats"

extern castfn size1_of_int0 (x: int): [n:int] size_t n

implement main0() = let
val x: int = 5
val y: [n:int] size_t n = size1_of_int0(x)
in
println!("y=", y)
end
``````

Here the `castfn` adds an existential variable to the return type (the '`[n:int]`' part) to attach a type index of 'unspecified integer value' to the type.
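When the integer argument is itself dependently typed, a variant of the cast can carry the index across instead of existentially quantifying it (a sketch; the name is mine, following the naming idiom above):

``````(* The type index n is preserved: the result is statically known to equal x *)
extern castfn size1_of_int1 {n:nat} (x: int n): size_t n
``````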

Another common use of casting functions is to convert between different string types. A `string` is a non dependently typed string and a `string1` is a dependently typed string with the length of the string as a type index. For example, arguments on the command line are `string`, but if a `string1` function is called on one then there is a type error:

``````implement main0(argc, argv) =
if argc > 1 then let
val s: string = argv[1]
val n = string1_length(s)
in
()
end
else
()
``````

Using a `castfn` can convert the `string` to a `string1`:

``````#include "share/atspre_define.hats"

extern castfn string1_of_string (s: string): [n:int] string n

implement main0(argc, argv) =
if argc > 1 then let
val s: string = argv[1]
val n = string1_length(string1_of_string(s))
in
()
end
else
()
``````

## Casting linear types

Types can be cast to and from linear types. When programming with strings in ATS you often deal with `string` which does not need to be manually free'd and `strptr` which is a linear type and needs to be manually free'd. Sometimes you want to treat a `strptr` as a `string` to call a `string` based function. An example is:

``````#include "share/atspre_define.hats"

extern fun strdup(s: string): [l:addr] strptr l = "ext#"
extern castfn strptr_to_string {l:addr} (s: !strptr l): string

implement main0() = let
val s: [l:addr] strptr l = strdup("hello world")
val n = string0_length(strptr_to_string(s))
val _ = strptr_free(s)
in
()
end
``````

This example needs to be compiled passing '`-DATS_MEMALLOC_LIBC`' to tell ATS to use the standard C malloc/free calls.

``````\$ patscc -DATS_MEMALLOC_LIBC -o c8 c8.dats
``````

For the purposes of this example I use a FFI function to call the C `strdup` function to copy a static string to a linear string that needs to be free'd. Note that the `castfn` does not consume this linear type (evidenced by the '`!`' prefixing the type name in the argument). This is dangerous because we can store the return value of the cast and use it after the free:

``````val oops = strptr_to_string(s)
val _ = strptr_free(s)
val _ = println!(oops)
``````

The danger can be restricted by using locally scoped cast functions:

``````val n = string0_length(strptr_to_string(s)) where {
extern castfn strptr_to_string {l:addr} (s: !strptr l): string
}
``````

Now `strptr_to_string` can only be used within that local block of code. Cast functions can be dangerous so need some care. There are various prelude casting functions for performing these types of conversions already.

## Prelude unsafe casts

The prelude has an 'unsafe' module with a polymorphic casting function which can be used instead of defining casting functions for trivial things. There are also functions for converting `strptr` to `string`, etc. The equivalent of the previous example would be:

``````#include "share/atspre_define.hats"

extern fun strdup(s: string): [l:agz] strptr l = "ext#"

implement main0() = let
val s: [l:agz] strptr l = strdup("hello world")
val n = string0_length(\$UN.strptr2string(s))
val _ = strptr_free(s)
in
()
end
``````

Basic type casting can be done using the `cast` function:

``````#include "share/atspre_define.hats"

implement main0() = let
val x: uint = 5u
val y: int = \$UN.cast{int}(x) + 5
in
println!("y=", y)
end
``````

The swiss army knives of unsafe casting functions are `castvwtp0` and `castvwtp1`. They can be used for casting linear types. The earlier string example can use this:

``````val s: [l:agz] strptr l = strdup("hello world")
val n = string0_length(\$UN.castvwtp1{string}(s))
val _ = strptr_free(s)
``````

`castvwtp1` will not consume the input argument whereas `castvwtp0` will. Looking through the `prelude/SATS/unsafe.sats` file will give an idea of the range of available functions.

The examples in this post are somewhat contrived in that there are standard prelude functions for doing almost everything here already. The need to cast `string` to `string1` and `strptr` is reduced due to specific functions being available for those types. However it is useful to learn to use the `castfn` functionality, and it is handy for one-off local casts or for defining protocols using proofs to ensure safe casting to and from types.

Tags: ats

This site is accessible over tor as hidden service mh7mkfvezts5j6yu.onion, or Freenet using key:
USK@1ORdIvjL2H1bZblJcP8hu2LjjKtVB-rVzp8mLty~5N4,8hL85otZBbq0geDsSKkBK4sKESL2SrNVecFZz9NxGVQ,AQACAAE/bluishcoder/-44/
