Bluish Coder

Programming Languages, Martial Arts and Computers. The Weblog of Chris Double.


2018-01-10

Capturing program invariants in ATS

I've been reading the book Building High Integrity Applications with Spark to learn more about the SPARK/Ada language for formal verification. It's a great book that goes through lots of examples of how to do proofs in Spark. I wrote a post on Spark/Ada earlier this year and tried some of the examples in the GNAT edition.

While working through the book I often compared how I'd implement similar examples in ATS. In this post I'll go through one small example and show how the dependent types of ATS allow capturing the invariants of a function and checking them at type checking time, proving the absence of a particular class of runtime errors.

The example I'm looking at from the book is from the Loop Invariants section, which aims to prove things about iterative loops in Ada. The procedure Copy_Into copies characters from a String into a Buffer object that has a maximum capacity. If the source string is too short then the buffer is padded with spaces. The Spark/Ada code from the book is:

procedure Copy_Into(Buffer: out Buffer_Type; Source: in String) is
  Characters_To_Copy : Buffer_Count_type := Maximum_Buffer_Size;
begin
  Buffer := (others => ' ');
  if Source'Length < Characters_To_Copy then
    Characters_To_Copy := Source'Length;
  end if;
  for Index in Buffer_Count_Type range 1 .. Characters_To_Copy loop
    pragma Loop_Invariant (Characters_To_Copy <= Source'Length and
                           Characters_To_Copy = Characters_To_Copy'Loop_Entry);
    Buffer(Index) := Source(Source'First + (Index - 1));
  end loop;
end Copy_Into;

This is quite readable for anyone familiar with imperative languages. The buffer is first filled with space characters. The number of characters to copy is initially set to the maximum buffer size, then reduced to the length of the source string if that is shorter. The buffer is 1-indexed and during the loop the characters from the string are copied into it.

The Loop_Invariant pragma is a Spark statement that tells the checker that certain invariants should hold. The invariant used here is that the number of characters to copy is always less than or equal to the length of the string, and that it does not change within the loop. Given this loop invariant Spark can assert that the indexing into the Source string cannot exceed the bounds of the string. Spark is able to reason on its own that Buffer does not get indexed out of bounds, as the loop ranges up to Characters_To_Copy, which is capped at Maximum_Buffer_Size. I'm not sure why this doesn't need a loop invariant, but the book notes that the Spark tools improve over time in what invariants they can compute automatically. As an example, the second invariant check above, for Characters_To_Copy'Loop_Entry, isn't needed in newer versions of Spark but is kept to enable checking on older toolchains.
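Before moving to ATS, the behaviour of Copy_Into and its invariant can be sketched in plain Python (hypothetical names, not part of either toolchain), with asserts playing the role of the Loop_Invariant pragma, checked while running rather than proved statically:

```python
MAX_BUFFER_SIZE = 128  # mirrors Maximum_Buffer_Size from the Spark example

def copy_into(source: str) -> list:
    buffer = [' '] * MAX_BUFFER_SIZE               # pad with spaces up front
    characters_to_copy = min(MAX_BUFFER_SIZE, len(source))
    for index in range(characters_to_copy):
        # loop invariant: the copy count never exceeds the source length,
        # so indexing into source below is always in bounds
        assert characters_to_copy <= len(source)
        buffer[index] = source[index]
    return buffer

buf = copy_into("hello")
print(''.join(buf[:8]))  # → 'hello   '
```

The difference, of course, is that Spark (and later, ATS) discharges these checks at compile time, so they never execute.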

For ATS I modeled the Buffer type as an array of characters:

stadef s_max_buffer_size: int = 128
typedef Buffer0 = @[char?][s_max_buffer_size]
typedef Buffer1 = @[char][s_max_buffer_size]
val max_buffer_size: size_t s_max_buffer_size = i2sz(128)

In ATS, @[char][128] represents an array of contiguous char values in memory, where those chars are initialized. @[char?][128] represents an array of uninitialized chars. It would be a type error to use an element of that array before initializing it. The '0' and '1' suffixes on the 'Buffer' name are an ATS idiom for uninitialized and initialized data respectively.

The stadef keyword creates a constant in the 'statics' system of ATS. That is, a constant in the language of types, as opposed to a constant in the language of values, where runtime-level programming is done. For this post I have prefixed variables in the 'statics' system with s_ to make it clearer where they can be used.

The max_buffer_size line creates a constant in the 'dynamics' part of ATS, in the language of values, of type "size_t s_max_buffer_size". size_t is the type for indexes into arrays, and size_t 128 is a dependent type matched by only a single value - 128 in this instance. The i2sz function converts an integer to a size_t. So in this case we've created a max_buffer_size constant that can only ever be the same value as the s_max_buffer_size type-level constant.

The ATS equivalent of copy_into looks like:

fun copy_into {n:nat}
       (buffer: &Buffer0 >> Buffer1, source: string n): void = let
  val () = array_initize_elt(buffer, max_buffer_size, ' ')
  val len = string1_length(source)
  stadef s_tocopy = min(s_max_buffer_size, n)
  val tocopy: size_t s_tocopy = min(max_buffer_size, len)
  fun loop {m:nat | m < s_tocopy } .<s_tocopy - m>.
        (buffer: &Buffer1, m: size_t m): void = let
    val () = buffer[m] := source[m]
  in
    if m < tocopy - 1 then loop(buffer, m+1)
  end
in
  if len > 0 then
    loop(buffer, i2sz(0))
end 

Starting with the function declaration:

fun copy_into {n:nat}
       (buffer: &Buffer0 >> Buffer1, source: string n): void

The items between the curly brackets, {...}, are universal variables used for setting constraints on the function arguments. Between the round brackets, (...), are the function arguments. The return type is void. The function takes two arguments, buffer and source.

The buffer is a reference type, represented by the & in the type. This is similar to a reference argument in C++ or a pointer argument in C. It allows modification of the argument by the function. The Buffer0 >> Buffer1 means that on entry to the function the type is Buffer0, the uninitialized char array described earlier, and on exit it will be a Buffer1, an initialized char array. If the body of the function fails to initialize the elements of the array it will be a type error.

The source argument is a string n, where n is a type index for the length of the string. By using a dependently typed string here we can use it later in the function body to set some invariants.

In the body of the function the call to array_initize_elt sets each element of the array to the space character. This also has the effect of changing the type stored in the array from char? to char, and the array is now a valid Buffer1. This matches the type change specified in the function declaration for the buffer argument.

The next three lines compute the number of characters to copy:

val len = string1_length(source)
stadef s_tocopy = min(s_max_buffer_size, n)
val tocopy: size_t s_tocopy = min(max_buffer_size, len)

The length of the string is obtained first. The string1_ prefix on function names is an idiom showing that they operate on dependently typed strings. string1_length returns a dependently typed size_t x, where x is the length of the string. This means that calling string1_length not only returns the length of the string but also sets a constraint that the type index of the result is equal to the type index of the string passed in. Here is the declaration of string1_length from the prelude (with some additional annotations removed):

fun string1_length {n:int} (x: string n): size_t n

The reuse of the n in the argument and result type is what tells the compiler to set the constraint when it is called. This is important for the next two lines. These perform the same operation, once in the statics and once in the dynamics. That is, once at the type level and once at the runtime level. The stadef line sets a type-level constant called s_tocopy, constrained to be the minimum of the maximum buffer size and the type-level string length, n. The val line sets the runtime variable using the same calculation but with the runtime length. These two lengths must match as a result of the call to string1_length described earlier. The type of tocopy ensures this by saying that it is the singleton type that can only be fulfilled by the value represented by s_tocopy.
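As a rough analogy (a Python sketch with hypothetical names, not ATS), the statics/dynamics split is like computing the same quantity once as a specification and once as a value, with the singleton type standing in for the assert that forces them to agree:

```python
S_MAX_BUFFER_SIZE = 128   # plays the role of stadef s_max_buffer_size
max_buffer_size = 128     # plays the role of val max_buffer_size

def characters_to_copy(source: str) -> int:
    s_tocopy = min(S_MAX_BUFFER_SIZE, len(source))   # "type-level" min
    tocopy = min(max_buffer_size, len(source))       # "value-level" min
    # the singleton type size_t s_tocopy guarantees this equality in ATS;
    # here it is only checked at runtime
    assert tocopy == s_tocopy
    return tocopy

print(characters_to_copy("hello"))     # → 5
print(characters_to_copy("x" * 200))   # → 128
```

In ATS the two computations really live in different languages, and the agreement is established by the type checker rather than by executing anything.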

ATS has looping constructs but it's idiomatic to use tail recursion instead of loops. It is also easier to create types for tail recursive functions than to type iterative loops in ATS. For this reason I've converted the loop to a tail recursive function. When ATS compiles to C it converts tail recursive functions to a C loop so there is no danger of stack overflow. The looping function without dependent type annotations would be:

fun loop (buffer: &Buffer1, m: size_t): void = let
  val () = buffer[m] := source[m]
in
  if m < tocopy - 1 then loop(buffer, m+1)
end

It takes a buffer and an index, m, as arguments and assigns to buffer at that index the equivalent entry from the source string. If it hasn't yet copied the number of characters needed to copy it tail recursively calls itself, increasing the index. The initial call is:

loop(buffer, i2sz(0))

i2sz is needed to convert the integer number zero from an int type to a size_t type.

The declaration for the loop function contains the dependent type definitions that give invariants similar to the Spark example:

fun loop {m:nat | m < s_tocopy } .<s_tocopy - m>.
      (buffer: &Buffer1, m: size_t m): void = let

Starting with the arguments, the buffer is a Buffer1, meaning it must be an initialized array. The index m is a dependently typed size_t m, where m at the type level is constrained to be less than the number of characters to copy. This ensures that the indexing into the buffer and the source string is never out of bounds, due to the previous statements that set s_tocopy to be the lesser of the maximum buffer size and the string length. s_tocopy is needed here instead of tocopy because universal arguments are written in the language of the type system, not the runtime system.

The body of the loop does the copying of the indexed character from the source string to the buffer and recursively calls itself with an incremented index if it hasn't copied the required number of characters.

  val () = buffer[m] := source[m]
in
  if m < tocopy - 1 then loop(buffer, m+1)

Due to the constraints set in the function declaration the buffer and source string indexes are checked at compile time to ensure they aren't out of bounds, and the recursive call to loop will fail typechecking if an out of bounds index is passed as an argument.

One issue with recursive functions is that they could loop forever if the termination condition check is incorrect, or if the variables used in that check don't monotonically increase or decrease. In this case the index must increase to eventually reach the value in the termination check.

ATS resolves this by allowing termination metrics to be added to the function. This is the part of the function declaration that is bracketed by .< ... >. markers. The expression inside these markers is expected to be in the 'statics' constraint language and evaluate to a tuple of natural numbers that the compiler needs to prove are lexicographically decreasing in value. The loop index counts up so this needs to be converted to a decreasing value:

.<s_tocopy - m>.

This is done by taking the static constant for the number of characters to copy and subtracting the index value. The compiler proves that each recursive call in the body of the function results in a value strictly less than the previous one. With the termination metric added it is statically proven that the recursive function terminates.
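A loose runtime analogue (hypothetical Python, not ATS) is to carry the measure through the recursion and assert that it stays non-negative and strictly decreases on every call, which is exactly what the termination checker proves statically:

```python
def loop(buffer, source, tocopy, m, prev_measure=None):
    measure = tocopy - m
    assert measure >= 0                  # the measure is a natural number
    if prev_measure is not None:
        assert measure < prev_measure    # strictly decreasing on each call
    buffer[m] = source[m]
    if m < tocopy - 1:
        loop(buffer, source, tocopy, m + 1, measure)

buffer = [' '] * 128
loop(buffer, "hi", 2, 0)
print(''.join(buffer[:4]))  # → 'hi  '
```

A bounded, strictly decreasing natural number can only take finitely many values, so the recursion must stop; ATS applies the same argument at compile time and then erases it.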

The program can be run and tested with:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

stadef s_max_buffer_size: int = 128
typedef Buffer0 = @[char?][s_max_buffer_size]
typedef Buffer1 = @[char][s_max_buffer_size]
val max_buffer_size: size_t s_max_buffer_size = i2sz(128)

fun copy_into {n:nat}
    (buffer: &Buffer0 >> Buffer1, source: string n): void = let
  val () = array_initize_elt(buffer, max_buffer_size, ' ')
  val len = string1_length(source)
  stadef s_tocopy = min(s_max_buffer_size, n)
  val tocopy: size_t s_tocopy = min(max_buffer_size, len)
  fun loop {m:nat | m < s_tocopy } .<s_tocopy - m>.
        (buffer: &Buffer1, m: size_t m): void = let
    val () = buffer[m] := source[m]
  in
    if m < tocopy - 1  then loop(buffer, m+1)
  end
in
  if len > 0 then
    loop(buffer, i2sz(0))
end 

implement main0() = let
  var  b = @[char?][max_buffer_size]()
  val () = copy_into(b, "hello world")
  fun print_buffer {n:nat | n < s_max_buffer_size}
                   .<s_max_buffer_size - n>.
       (buffer: &Buffer1, n: size_t n):void = let
    val () = if n = 0 then print!("[")
    val () = print!(buffer[n])
    val () = if n = max_buffer_size - 1 then print!("]")
  in
    if n < 127 then print_buffer(buffer, n + 1)
  end
  val () = print_buffer(b, i2sz(0))
in
  ()
end

This example creates the array as a stack allocated object in the line:

var  b = @[char?][max_buffer_size]()

Being stack allocated the memory will be deallocated on exit of the scope it was defined in. It's also possible to create a heap allocated buffer and pass that to copy_into:

implement main0() = let
  val (pf_b , pf_gc| b) = array_ptr_alloc<char>(max_buffer_size)
  val () = copy_into(!b, "hello world")
  fun print_buffer {n:nat | n < 128} .<128 - n>. (buffer: &(@[char][128]), n: int n):void = let
    val () = if n = 0 then print!("[")
    val () = print!(buffer[n])
    val () = if n = 127 then print!("]")
  in
    if n < 127 then print_buffer(buffer, n + 1)
  end
  val () = print_buffer(!b, 0)
  val () = array_ptr_free(pf_b, pf_gc | b)
in
  ()
end

This explicitly allocates the memory for the array using array_ptr_alloc and dereferences it in the copy_into call using the '!b' syntax. Note that this array is later freed using array_ptr_free. Not calling that to free the memory would be a compile error.

Although this is a simple function it demonstrates a number of safety features:

  • The destination buffer cannot be indexed beyond its bounds.
  • The source string cannot be indexed beyond its bounds.
  • There can be no buffer overflow on writing to the destination buffer.
  • There can be no uninitialized data in the buffer on function exit.
  • The internal recursive call must terminate.

Building High Integrity Applications with Spark is a great book for learning Spark and the sorts of proofs used in production Spark/Ada applications. Apart from learning a bit about Spark, it's also a good exercise to think about how similar safety checks can be implemented in your language of choice.

Tags: ats 

2018-01-03

Writing basic proofs in ATS

This post covers using the latest version of ATS, ATS 2, and goes through proving some basic algorithms. I've written a couple of posts before on using proofs in ATS 1.

Writing proofs in ATS is complicated by the fact that the dependent types and proof component of the language does not use the full ATS language. It uses a restricted subset of the language. When implementing a proof for something like the factorial function you can't write factorial in the type system in the same way as you write it in the runtime system. Factorial in the latter is written using a function but in the former it needs to be encoded as relations in dataprop or dataview definitions.

Recent additions to ATS 2 have simplified the task of writing some proofs by enabling the constraint checking of ATS to be done by an external SMT solver rather than the built-in solver. External solvers like Z3 have more features than the built-in solver and can solve constraints that the ATS solver cannot. For example, non-linear constraints are not solvable by ATS directly but are when using Z3 as the solver.

In this post I'll start by describing how to write proofs using quantified constraints. This is the easiest proof writing method in ATS but requires Z3 as the solver. Next I'll continue with Z3 as the solver but describe how to write proofs without quantified constraints. Finally I'll go through writing the proofs by encoding the algorithm as relations in a dataprop. These approaches progress from an easy, less code-intrusive method to a more difficult one that requires threading proofs throughout the code.

Installing Z3

Z3 is the external solver used in the examples in this post. To install from the Z3 github on Linux:

$ git clone https://github.com/Z3Prover/z3
$ cd z3
$ mkdir build
$ cd build
$ cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release ../
$ make && sudo make install

Installing ATS

I used ATS2-0.3.8 for the examples in this post. There are various scripts for installing it, but to install manually from the git repository on an Ubuntu-based system:

$ sudo apt-get install build-essential libgmp-dev libgc-dev libjson-c-dev
$ git clone git://git.code.sf.net/p/ats2-lang/code ATS2
$ git clone https://github.com/githwxi/ATS-Postiats-contrib.git ATS2-contrib
$ export PATSHOME=`pwd`/ATS2
$ export PATSCONTRIB=`pwd`/ATS2-contrib
$ export PATH=$PATSHOME/bin:$PATH
$ (cd ATS2 && ./configure && make all)
$ (cd ATS2/contrib/ATS-extsolve && make DATS_C)
$ (cd ATS2/contrib/ATS-extsolve-z3 && make all && mv -f patsolve_z3 $PATSHOME/bin)
$ (cd ATS2/contrib/ATS-extsolve-smt2 && make all && mv -f patsolve_smt2 $PATSHOME/bin)

Optionally you can install the various ATS backends for generating code in other languages:

$ (cd ATS2/contrib/CATS-parsemit && make all)
$ (cd ATS2/contrib/CATS-atscc2js && make all && mv -f bin/atscc2js $PATSHOME/bin)
$ (cd ATS2/contrib/CATS-atscc2php && make all && mv -f bin/atscc2php $PATSHOME/bin)
$ (cd ATS2/contrib/CATS-atscc2py3 && make all && mv -f bin/atscc2py3 $PATSHOME/bin)

Add PATSHOME, PATSCONTRIB and the change to PATH to .bashrc, .profile, or some other startup file to have these environment variables populated when starting a new shell.

Dependent Types

A function to add numbers in ATS can be proven correct using dependent types by specifying the expected result using dependently typed integers:

fun add_int {m,n:int} (a: int m, b: int n): int (m + n) = a + b

This won't compile if the implementation does anything but result in the addition of the two integers. The constraint language used in dependent types is a restricted subset of the ATS language. I wrote a bit about this in my post on dependent types in ATS.

The following is an implementation of the factorial function without proofs:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

fun fact (n: int): int =
  if n > 0 then n * fact(n - 1) else 1

implement main0() = let
  val r = fact(5)
in
  println!("5! = ", r)
end

Compile and run with something like:

$ patscc -o f0 f0.dats
$ ./f0
5! = 120

To prove that the implementation of factorial is correct we need to specify what factorial is in the constraint language of ATS. Ideally we'd like to write something like the following:

fun fact {n: nat} (n: int n): int (fact(n)) = ...

This would check that the body of the function implements something that matches the result of fact. Unfortunately the restricted constraint language of ATS doesn't allow implementing or using arbitrary functions in the type definition.

Using Z3 as an external solver

By default ATS solves constraints using its built in constraint solver. It has a mechanism for using an external solver, in this case Z3. To type check the previous factorial example using Z3 use the following commands:

$ patsopt -tc --constraint-export -d f0.dats |patsolve_z3 -i
Hello from [patsolve_z3]!
typechecking is finished successfully!

Note the change to use patsopt instead of patscc. The -tc flag does the type checking phase only. --constraint-export results in the constraints being exported to stdout, which is piped to patsolve_z3. That command invokes Z3 and checks the constraint results.

Since the resulting program may contain code that ATS can't typecheck itself, to actually build the final executable we invoke patscc with a command line option to disable type checking. It's important that the checking has succeeded through the external solver before doing this!

$ patscc --constraint-ignore -o f0 f0.dats

Z3 and quantified constraints

Using Z3 with quantified constraints is a new feature of ATS and quite experimental. Hongwei notes some issues due to different versions of Z3 that can cause problems. It is however an interesting approach to proofs with ATS so I include the process of using it here and hope it becomes more stable as it progresses.

ATS provides the stacst keyword to introduce a constant into the 'statics' part of the type system. The 'statics' is the restricted constraint language used when specifying types. There are some examples of stacst usage in the prelude file basics_pre.sats.

Using stacst we can introduce a function in the statics system for factorial:

stacst fact: int -> int

Now the following code is valid:

fun fact {n:nat} (n: int n): int (fact(n)) = ...

ATS doesn't know about fact in the statics system; it only knows it's a function that takes an int and returns an int. With Z3 as the external solver we can extend ATS' constraint knowledge by declaring axioms that state facts about fact:

praxi fact_base(): [fact(0) == 1] void
praxi fact_ind {n:int | n >= 1} (): [fact(n) == n * fact(n-1)] void

The keyword praxi is used for defining axioms, whereas prfun is used for proof functions that need an implementation. This distinction is currently not checked by the compiler but may be at some future time. From a documentation perspective, using praxi signals that there is no plan to actually prove the axiom.

The two axioms here will add to the proof store the facts about the factorial function. The first, fact_base, asserts that the factorial of zero is one. The second asserts that for all n, where n is greater than or equal to 1, that the factorial of n is equivalent to n * fact (n - 1).

To add these facts to Z3's knowledge, use $solver_assert:

fun fact {n:int | n >= 0} (n: int n): int (fact(n)) = let
  prval () = $solver_assert(fact_base)
  prval () = $solver_assert(fact_ind)
in
  if n = 0 then 1 else n * fact(n - 1)
end

This typechecks successfully. Changing the implementation to be incorrect results in a failed typecheck. Unfortunately, as is often the case with experimental code, sometimes an incorrect implementation will hang Z3 causing it to consume large amounts of memory as noted earlier.
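As a sanity check outside ATS, the two axioms can be read as rewrite rules: fact(0) rewrites to 1 and, for n >= 1, fact(n) rewrites to n * fact(n - 1). This hypothetical Python sketch evaluates purely by those rewrites and so reproduces the factorial function the solver reasons about:

```python
def fact_by_axioms(n: int) -> int:
    assert n >= 0                          # fact is only specified on naturals
    if n == 0:
        return 1                           # fact_base: fact(0) == 1
    return n * fact_by_axioms(n - 1)       # fact_ind: fact(n) == n * fact(n-1)

print(fact_by_axioms(5))  # → 120
```

Z3 never evaluates the function like this; it reasons symbolically from the asserted equalities, which is why an over-general axiom set can also send it down an unbounded search, as noted above.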

The implementation of fact here closely mirrors the specification. The following is a tail recursive implementation of fact that is also proved correct with respect to the specification:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

stacst fact: int -> int

extern praxi fact_base(): [fact(0) == 1] unit_p
extern praxi fact_ind{n:pos}(): [fact(n)==n*fact(n-1)] unit_p

fun fact {n:nat} (n: int n): int (fact(n)) = let
  prval () = $solver_assert(fact_base)
  prval () = $solver_assert(fact_ind)

  fun loop {n,r:nat} (n: int n, r: int r): int (fact(n) * r) =
    if n > 0 then loop (n - 1, n * r) else r
in
  loop (n, 1)
end

implement main0() = let
  val r = fact(5)  
in
  println!("5! = ", r)
end

This was tested and built with:

$ patsopt -tc --constraint-export -d f4.dats |patsolve_z3 -i
Hello from [patsolve_z3]!
typechecking is finished successfully!
$ patscc --constraint-ignore -o f4 f4.dats
$ ./f4
5! = 120
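The reason the inner loop typechecks is that the quantity fact(n) * r never changes across iterations: when n decreases by one, r absorbs the factor of n. A hypothetical Python sketch (not ATS) that checks the same invariant at runtime:

```python
def fact_spec(n):
    # reference specification of factorial
    return 1 if n == 0 else n * fact_spec(n - 1)

def fact(n0: int) -> int:
    # the loop maintains fact_spec(n) * r == fact_spec(n0), which is exactly
    # the type int (fact(n) * r) that ATS assigns to the inner loop
    n, r = n0, 1
    while True:
        assert fact_spec(n) * r == fact_spec(n0)  # the invariant
        if n > 0:
            n, r = n - 1, n * r
        else:
            return r

print(fact(5))  # → 120
```

When n reaches 0 the invariant collapses to r == fact_spec(n0), so returning r is correct; that is the reasoning the ATS type checker performs symbolically.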

Z3 with threaded proofs

Another approach to using the external solver is not to add constraints to the Z3 store via $solver_assert but instead call the axioms explicitly as proof functions threaded in the body of the function. This is fact implemented in such a way:

stacst fact: int -> int

extern praxi fact_base(): [fact(0) == 1] void
extern praxi fact_ind {n:int | n >= 1} (): [fact(n) == n * fact(n-1)] void

fun fact {n:nat} (n: int n): int (fact(n)) =
  if n > 0 then let
      prval () = fact_ind {n} ()
    in
      n * fact(n - 1)
    end
  else let
      prval () = fact_base()
     in
       1
     end

The code is more verbose due to needing to embed the prval statements in a let block, but it doesn't suffer the Z3 incompatibility that the quantified constraint example did. An incorrect implementation will give an error from Z3.

The equivalent tail recursive version is:

fun fact {n:nat} (n: int n): int (fact(n)) = let
  fun loop {n,r:nat} (n: int n, r: int r): int (fact(n) * r) =
    if n > 0 then let
        prval () = fact_ind {n} ()
      in
        loop (n - 1, n * r)
      end
    else let
        prval () = fact_base()
      in
        r
      end
in
  loop (n, 1)
end

Dataprops and Datatypes

It's still possible to write a verified version of factorial without using an external solver but the syntactic overhead is higher. The specification of fact needs to be implemented as a relation using dataprop. A dataprop introduces a type for the proof system in much the same way as declaring a datatype in the runtime system of ATS. Proof objects constructed from this type exist only at proof checking time and are erased at runtime. They can be taken as proof arguments in functions or included in return values. Proof functions can also be written to use them. In the words of Hongwei Xi:

A prop is like a type; a value classified by a prop is often called a proof, which is dynamic but erased by the compiler. So while proofs are dynamic, there is no proof construction at run-time.

For a comparison of the syntax of dataprop and datatype, here is a type for "list of integers" implemented as both:

dataprop prop_list =
 | prop_nil
 | prop_cons of (int, prop_list)

datatype list =
 | list_nil
 | list_cons of (int, list)

To specify fact as a relation it is useful to see how it would be implemented in a logic programming language like Prolog:

fact(0, 1).
fact(N, R) :-
    N > 0,
    N1 is N - 1,
    fact(N1, R1),
    R is R1 * N.

The equivalent as a dataprop is:

dataprop FACT (int, int) =
  | FACTzero (0, 1)
  | {n,r1:int | n > 0} FACTsucc (n, r1 * n) of (FACT (n-1, r1))

FACT(n,r) encodes the relation that fact(n) = r where fact is defined as:

  • fact(0) = 1, and
  • fact(n) where n > 0 = n * fact(n - 1)

The dataprop creates a FACT(int, int) prop with two constructors:

  • FACTzero() which encodes the relation that fact(0) = 1, and
  • FACTsucc(FACT(n-1, r1)) which encodes the relation that fact(n) = n * fact(n-1)

The declaration of the fact function uses this prop as a proof return value to enforce that the result must match this relation:

fun fact {n:nat} (n: int n): [r:int] (FACT (n, r) | int r) = ...

The return value there is a tuple, where the first element is a proof value (to the left of the pipe symbol) and the second element is the factorial result. The existential variable [r:int] is used to associate the returned value with the result of the FACT relation, and the universal variable {n:nat} is used to provide the first argument of the FACT prop. Through unification the compiler checks that the relationships between the variables match.

The implementation of the function is:

fun fact {n:nat} (n: int n): [r:int] (FACT (n, r) | int r) =
  if n > 0 then let
    val (pf1 | r1) = fact (n - 1)
    val r = n * r1
  in
    (FACTsucc (pf1) | r)
  end else begin
    (FACTzero () | 1)
end

Note that the recursive call to fact deconstructs the proof result into pf1 and the value into r1, and that proof is used in the FACTsucc constructor of the result. Proofs are constructed like datatypes. For example:

  • FACTsucc(FACTzero()) is fact(1) (the successor, or next factorial, from fact(0)).
  • FACTsucc(FACTsucc(FACTzero())) is fact(2), etc.

In this way it can be seen that a call to fact(1) will hit the first branch of the if condition, which recursively calls fact(0). That hits the second branch of the conditional and returns the prop FACTzero(). Back in the caller this is wrapped as FACTsucc(FACTzero()), giving the prop return type FACT(1, 1). Plugging these values into the n and r of the return type means the int r value the function returns must be 1 for the function to pass type checking.
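The proofs-as-data idea can be mimicked at runtime in other languages. This hypothetical Python sketch represents FACT proof terms as nested tuples and validates them with a separate checker; in ATS the checking happens at compile time and the terms are then erased:

```python
def fact(n):
    # returns (proof, result), mirroring the ATS (FACT(n, r) | int r) tuple
    if n > 0:
        pf1, r1 = fact(n - 1)
        return ('FACTsucc', pf1), n * r1
    return ('FACTzero',), 1

def check_fact(pf):
    # recompute the (n, r) pair a proof term witnesses
    if pf[0] == 'FACTzero':
        return (0, 1)                     # FACTzero encodes fact(0) = 1
    n1, r1 = check_fact(pf[1])            # inner proof is FACT(n-1, r1)
    return (n1 + 1, (n1 + 1) * r1)        # FACTsucc encodes fact(n) = n * fact(n-1)

pf, r = fact(5)
print(check_fact(pf), r)  # → (5, 120) 120
```

The checker plays the role of the type checker: a result is accepted only if some proof term witnesses the relation between input and output.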

Although these proofs add syntactic overhead, the runtime overhead is nil. Proofs are erased after typechecking and only the factorial computation code remains. The full compilable program is:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

dataprop FACT (int, int) =
  | FACTzero (0, 1)
  | {n,r1:int | n > 0} FACTsucc (n, r1 * n) of (FACT (n-1, r1))

fun fact {n:nat} (n: int n): [r:int] (FACT (n, r) | int r) =
  if n > 0 then let
    val (pf1 | r1) = fact (n - 1)
    val r = n * r1
  in
    (FACTsucc (pf1) | r)
  end else begin
    (FACTzero () | 1)
end

implement main0() = let
  val (pf | r) = fact(5)
in
  println!("5! = ", r)
end

To compile and run:

$ patscc -o f8 f8.dats
$ ./f8
5! = 120

A tail recursive implementation of fact using dataprop that type checks is:

fun fact {n:nat} (n: int n): [r:int] (FACT (n, r) | int r) = let
  fun loop {r:int}{i:nat | i <= n}
           (pf: FACT(i, r) | n: int n, r: int r, i: int i):
           [r1:int] (FACT(n, r1) | int (r1)) =
    if n - i > 0 then
        loop (FACTsucc(pf) | n, (i + 1) * r, i + 1)
    else (pf | r)
in
  loop (FACTzero() | n, 1, 0)
end

Because our dataprop is defined recursively in terms of successors to a previous factorial, and we can't get a predecessor, the tail recursive loop is structured to count up. This means we have to pass the previous factorial prop as a proof argument, pf, and pass an index, i, for the current factorial being computed.

Fibonacci

Fibonacci is similar to factorial in the way the proofs are constructed. The quantified constraints version is trivial:

stacst fib: int -> int

extern praxi fib0(): [fib(0) == 0] unit_p
extern praxi fib1(): [fib(1) == 1] unit_p
extern praxi fib2 {n:nat|n >= 2} (): [fib(n) == fib(n-1) + fib(n-2)] unit_p

fun fib {n:int | n >= 0} (n: int n): int (fib(n)) = let
  prval () = $solver_assert(fib0)
  prval () = $solver_assert(fib1)
  prval () = $solver_assert(fib2)
in
  if n = 0 then 0
  else if n = 1 then 1
  else fib(n-1) + fib(n-2)
end

The threaded version using the external solver isn't much more verbose:

stacst fib: int -> int

extern praxi fib0(): [fib(0) == 0] void
extern praxi fib1(): [fib(1) == 1] void
extern praxi fib2 {n:nat|n >= 2} (): [fib(n) == fib(n-1) + fib(n-2)] void

fun fib {n:int | n >= 0} (n: int n): int (fib(n)) =
  if n = 0 then let
      prval () = fib0()
    in
      0
    end
  else if n = 1 then let
      prval () = fib1()
    in
      1
    end
  else let
      prval () = fib2 {n} ()
    in
      fib(n-1) + fib(n-2)
    end

The dataprop version is quite readable too and has the advantage of not needing an external solver:

dataprop FIB (int, int) =
  | FIB0 (0, 0)
  | FIB1 (1, 1)
  | {n:nat}{r0,r1:int} FIB2(n, r1+r0) of (FIB(n-1, r0), FIB(n-2, r1))

fun fib {n:nat} (n: int n): [r:int] (FIB (n, r) | int r) =
  if n = 0 then (FIB0() | 0)
  else if n = 1 then (FIB1() | 1)
  else let
      val (pf0 | r0) = fib(n-1)
      val (pf1 | r1) = fib(n-2)
    in
      (FIB2(pf0, pf1) | r0 + r1)
    end

The dataprop maps nicely to the Prolog implementation of fibonacci:

fib(0,0).
fib(1,1).
fib(N,R) :- N1 is N-1, N2 is N-2, fib(N1,R1),fib(N2,R2),R is R1+R2.
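The same proof-terms-as-data idea sketched for FACT works for FIB too. This hypothetical Python version (not ATS) records the sub-proofs each constructor was built from, and its checker verifies that the two sub-proofs line up the way the FIB2 constructor requires:

```python
def fib(n):
    # returns (proof, result), mirroring the ATS (FIB(n, r) | int r) tuple
    if n == 0:
        return ('FIB0',), 0
    if n == 1:
        return ('FIB1',), 1
    pf0, r0 = fib(n - 1)
    pf1, r1 = fib(n - 2)
    return ('FIB2', pf0, pf1), r0 + r1

def check_fib(pf):
    # recompute the (n, r) pair a proof term witnesses
    if pf[0] == 'FIB0':
        return (0, 0)
    if pf[0] == 'FIB1':
        return (1, 1)
    n0, r0 = check_fib(pf[1])     # FIB(n-1, r0)
    n1, r1 = check_fib(pf[2])     # FIB(n-2, r1)
    assert n0 == n1 + 1           # the two sub-proofs must be adjacent
    return (n0 + 1, r0 + r1)

pf, r = fib(10)
print(check_fib(pf), r)  # → (10, 55) 55
```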

I leave proving different implementations of fib to the reader.

Conclusion

There are more algorithms that could be demonstrated but this post is getting long. There are plenty of other interesting examples of and articles on ATS and proofs worth exploring.

In general, proof code adds syntactic overhead. With more work on using external solvers, automated solving should make things easier. ATS allows the programmer to write normal code and add proofs as needed, so the barrier to entry for applying proofs is quite low, given suitable documentation and examples.

Tags: ats  prolog 

2018-01-02

Casting in ATS

The ATS programming language has a powerful type and proof system enabling the safe construction of programs. Sometimes there is a need to cast a value from one type to another. ATS provides a feature that enables the programmer to write their own rules specifying what can be cast. This becomes important when converting non-dependently typed values to dependently typed values and when dealing with conversion of proofs. This post goes through some common examples.

A casting function is introduced using the keyword castfn. It has only a declaration - there is no implementation or body. The value being cast is unchanged; only its type changes, making the cast effectively the identity function at runtime. For this reason a cast is only allowed between types of the same size, typically a machine word, and there is no runtime overhead for using a casting function.

Standard Types

The following example ATS program attempts to add an integer to an unsigned integer value:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

implement main0() = let
  val x: uint = 5u
  val y: int = x + 5
in
  println!("y=", y)
end 

This results in a type error:

$ patscc -o c2 c2.dats
c2.dats: 133(line=6, offs=16) -- 138(line=6, offs=21):
         error(3): the symbol [+] cannot be resolved as no match is found.
c2.dats: 124(line=6, offs=7) -- 130(line=6, offs=13):
         error(3): the pattern cannot be given the ascribed type.
c2.dats: 124(line=6, offs=7) -- 130(line=6, offs=13):
         error(3): mismatch of static terms (tyleq):
The actual term is: S2Eerrexp()
The needed term is: S2Eapp(S2Ecst(g0int_t0ype); S2Eextkind(atstype_int))

The error is saying that the operator + has no matching implementation for the types provided. It is expecting an int but got something else.

The addition operator requires the two arguments to be of the same type. A casting function can be added to cast the unsigned integer to an integer:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

extern castfn int_of_uint(x: uint): int

implement main0() = let
  val x: uint = 5u
  val y: int = int_of_uint(x) + 5
in
  println!("y=", y)
end

This compiles and runs:

$ patscc -o c3 c3.dats
$ ./c3
y=10

A castfn looks very much like a fun declaration. It requires the name of the casting function, the argument it expects and a return type. There is no implementation as mentioned earlier. The compiler will convert the type of the argument to that of the return type. Another example follows showing conversion to a floating point value:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

extern castfn double_of_uint(x: uint): double

implement main0() = let
  val x: uint = 5u
  val y: double = double_of_uint(x) + 5.0
in
  println!("y=", y)
end

The castfn can have any name but most of the ATS prelude follows a format of including the 'from' and 'to' types in the name.

Dependent Types

Conversion between dependently typed values and non-dependently typed values is a common use of casting functions. For example:

val x: [n:int] size_t n = string1_length("hello")
val y: int = x + 1 // Error here, mismatch between dependent and non-dependent int

The type of x is dependently typed in that it has a type index of sort int for the length of the string. It is also of type size_t, but the snippet expects an int. A castfn allows casting away the dependent type index and converting the type at the same time:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

extern castfn int0_of_size1 {n:int} (x: size_t n): int

implement main0() = let
  val x: [n:int] size_t n = string1_length("hello")
  val y: int = int0_of_size1(x) + 1
in
  println!("y=", y)
end

Note the '0' and '1' suffixes in the type names used by the castfn. This is an ATS naming idiom where '0' means not dependently typed and '1' means dependently typed.

It's possible to go in the other direction, casting from a non-dependently typed value to a dependently typed value. This is an example of code that won't compile without a cast:

val x: int = 5
val y: [n:int] size_t n = x

This fails due to x and y being different types. Adding a cast function to convert the int to a size_t and including a type index will allow it to compile:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

extern castfn size1_of_int0 (x: int): [n:int] size_t n

implement main0() = let
  val x: int = 5
  val y: [n:int] size_t n = size1_of_int0(x)
in
  println!("y=", y)
end

Here the castfn adds an existential variable to the return type (the '[n:int]' part) to attach a type index of 'some unspecified integer value' to the type.

Another common use of casting functions is converting between different string types. A string is a non-dependently typed string, while a string1 is a dependently typed string with its length as a type index. For example, command line arguments are of type string, but if a string1 function is called on one then there is a type error:

implement main0(argc, argv) =
  if argc > 1 then let
      val s: string = argv[1]
      val n = string1_length(s)
    in
      ()
    end
  else
    ()

Using a castfn can convert the string to a string1:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

extern castfn string1_of_string (s: string): [n:int] string n

implement main0(argc, argv) =
  if argc > 1 then let
      val s: string = argv[1]
      val n = string1_length(string1_of_string(s)) 
    in
      ()
    end
  else
    ()

Casting linear types

Types can be cast to and from linear types. When programming with strings in ATS you often deal with string, which does not need to be manually freed, and strptr, which is a linear type and does need to be manually freed. Sometimes you want to treat a strptr as a string to call a string-based function. An example is:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

extern fun strdup(s: string): [l:addr] strptr l = "ext#"
extern castfn strptr_to_string {l:addr} (s: !strptr l): string

implement main0() = let
  val s: [l:addr] strptr l = strdup("hello world")
  val n = string0_length(strptr_to_string(s))
  val _ = strptr_free(s)
in
  ()
end

This example needs to be compiled passing '-DATS_MEMALLOC_LIBC' to tell ATS to use the standard C malloc/free calls.

$ patscc -DATS_MEMALLOC_LIBC -o c8 c8.dats

For the purposes of this example I use an FFI declaration to call the C strdup function, copying a static string to a linear string that needs to be freed. Note that the castfn does not consume the linear type (evidenced by the '!' prefixing the type name in the argument). This is dangerous because we can store the return value of the cast and use it after the free:

val oops = strptr_to_string(s)
val _ = strptr_free(s)
val _ = println!(oops)

The danger can be restricted by using locally scoped cast functions:

val n = string0_length(strptr_to_string(s)) where {
          extern castfn strptr_to_string {l:addr} (s: !strptr l): string
        }

Now strptr_to_string can only be used within that local block of code. Cast functions can be dangerous, so they need some care. The prelude already provides various casting functions for performing these kinds of conversions.

Prelude unsafe casts

The prelude has an 'unsafe' module with a polymorphic casting function that can be used instead of defining casting functions for trivial conversions. There are also functions for converting strptr to string, etc. The equivalent of the previous example is:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

staload UN = $UNSAFE

extern fun strdup(s: string): [l:agz] strptr l = "ext#"

implement main0() = let
  val s: [l:agz] strptr l = strdup("hello world")
  val n = string0_length($UN.strptr2string(s))
  val _ = strptr_free(s)
in
  ()
end

Basic type casting can be done using the cast function:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

staload UN = $UNSAFE

implement main0() = let
  val x: uint = 5u
  val y: int = $UN.cast{int}(x) + 5
in
  println!("y=", y)
end 

The swiss army knives of unsafe casting are castvwtp0 and castvwtp1. They can be used for casting linear types. The earlier string example can use castvwtp1:

val s: [l:agz] strptr l = strdup("hello world")
val n = string0_length($UN.castvwtp1{string}(s))
val _ = strptr_free(s)

castvwtp1 will not consume the input argument whereas castvwtp0 will. Looking through the prelude/SATS/unsafe.sats file will give an idea of the range of available functions.
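To illustrate the difference, here is an untested sketch using castvwtp0 with the same strdup declaration as above. Because castvwtp0 consumes the linear strptr, the value must not be used or freed after the cast:

```ats
val s: [l:agz] strptr l = strdup("hello world")
(* castvwtp0 consumes s: it can no longer be used, and must not be freed *)
val str: string = $UN.castvwtp0{string}(s)
val n = string0_length(str)
```

The memory is leaked here unless something else takes responsibility for it, which is why castvwtp1 is usually the safer choice for read-only access.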

The examples in this post are somewhat contrived in that there are standard prelude functions for most of these conversions already. The need to cast string to string1 or strptr is reduced by the specific functions available for those types. However, it is useful to learn the castfn functionality: it is handy for one-off local casts, or for defining protocols using proofs to ensure safe casting to and from types.

Tags: ats 

2017-12-02

Cross Compiling ATS Programs

A few years ago I wrote a mailing list post on cross compiling ATS to Android. With the release of ATS2, cross compiling has become easier. In this post I'll show how to produce static linux binaries using musl libc, and how to build Android and Windows ATS programs by cross compiling on linux.

For these examples I'm using ATS2-0.3.8. I'll use the simplest of 'Hello World' programs in ATS to cross compile. The following is the contents of hello.dats:

implement main0() = print!("Hello World\n")

To compile:

$ cat hello.dats
implement main0() = print!("Hello World\n")

$ patscc -o hello hello.dats
$ ./hello
Hello World

Static musl libc binaries

musl libc is a lightweight standard library for C programs. I use it instead of glibc because statically linked glibc binaries have issues with networking functionality. There are no such problems with musl libc: it is smaller, lighter, and works well statically linked into executables. To build musl libc I used these steps:

$ git clone git://git.musl-libc.org/musl
$ cd musl
$ ./configure
$ make
$ sudo make install
$ export PATH=$PATH:/usr/local/musl/bin/

The ATS compiler, patscc, compiles ATS code to C. It defaults to using gcc but takes an -atsccomp command line argument to specify an alternative C compiler for compiling the generated C code. Unknown command line arguments are passed directly to that C compiler. Given musl libc installed in /usr/local/musl as above, a static binary can be built with musl-gcc like so:

$ patscc -o hello -atsccomp "/usr/local/musl/bin/musl-gcc" -static \
         -I $PATSHOME/ccomp/runtime -I $PATSHOME \
         hello.dats

I pass the -static flag to produce a statically linked binary and two -I include paths to find the ATS runtime. These appear to be required if -atsccomp is used. In this case I use the environment variable PATSHOME to find the installation directory of ATS. Hopefully that was set at ATS installation time. If this command succeeded then we have a static binary:

$ ./hello
Hello World
$ ldd hello
not a dynamic executable
$ strip hello && ls -l hello
-rwxr-xr-x 1 myuser myuser 18224 Dec  2 23:21 hello
$ file hello
hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped

Cross compiling Windows binaries

To build Windows compatible binaries on Linux I used mingw-w64. On Ubuntu this is available in the gcc-mingw-w64 package. With that installed a Windows hello binary can be built with:

$ patscc -o hello.exe -atsccomp "i686-w64-mingw32-gcc" \
         -I $PATSHOME/ccomp/runtime -I $PATSHOME \
         hello.dats
$ file hello.exe
hello.exe: PE32 executable (console) Intel 80386, for MS Windows

Copying to a Windows machine:

C:\Users\Myuser> .\hello
Hello World

Cross compiling Android binaries

To build on Android I generated a standalone toolchain using the Android NDK. This produces standard command line compilers that generate Android compatible binaries. Install the Android NDK - I used android-ndk-r16 - and run:

$ANDROID_NDK/build/tools/make-standalone-toolchain.sh \
  --arch=arm --install-dir=$STK_ROOT

Replace ANDROID_NDK with the path to the Android NDK and STK_ROOT with the destination where you want the standalone toolchain installed. Use arm64 instead of arm to build an ARM 64-bit binary.

Now the "Hello World" program can be built using the C compiler from the standalone toolchain. To build on 32-bit using an arm standalone toolchain, use arm-linux-androideabi-clang as the C compiler:

$ patscc -o hello -atsccomp $STK_ROOT/bin/arm-linux-androideabi-clang \
         -I $PATSHOME/ccomp/runtime -I $PATSHOME \
         hello.dats
$ file hello
hello: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV),
       dynamically linked, interpreter /system/bin/linker, not stripped

To build on 64-bit ARM using an arm64 standalone toolchain, use aarch64-linux-android-clang as the compiler:

$ patscc -o hello -atsccomp $STK_ROOT/bin/aarch64-linux-android-clang \
         -I $PATSHOME/ccomp/runtime -I $PATSHOME \
         hello.dats
$ file hello
hello: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV),
       dynamically linked, interpreter /system/bin/linker64, not stripped

Notice the use of clang instead of gcc. I believe gcc support for building native Android executables with the NDK is deprecated - I got link errors when I tried.

Copy the hello executable to an Android phone. It needs to be somewhere writeable on the phone. On a recent non-rooted phone you should be able to use /data/local/tmp. The following adb commands work for me when the device is connected:

$ adb push hello /data/local/tmp
$ adb shell
$ cd /data/local/tmp
$ chmod 0755 hello
$ ./hello
Hello World

Producing an actual Android application would require using the ATS FFI to interface with the Android runtime. Given the low level nature of ATS, if it can be done in C, it should be possible in ATS too.

Tags: ats 

2017-10-12

ZeroMe - Decentralized Microblogging on ZeroNet

I wrote about ZeroNet a few years ago when it was released, when it was mostly about decentralized websites. Since then it has gained a lot of features and regular use. It still does distributed websites well but now adds features such as support for sharing large files, merger sites, and distributed pseudo-anonymous identity. This post is about an application hosted on ZeroNet called ZeroMe. It's a microblogging platform (i.e. a Twitter-like system) that takes advantage of ZeroNet's identity system and merger sites.

Starting ZeroNet

To start with ZeroMe you first need to install ZeroNet. Given all the right dependencies are installed, you can clone the GitHub repository and run the Python script to get online:

$ git clone https://github.com/HelloZeroNet/ZeroNet
$ cd ZeroNet
$ ./zeronet.py

This will start the ZeroNet node and run a local webserver on port 43110 to access it. This port is available on the local machine only. It also opens port 15441 to communicate with other nodes. That port is open to the internet at large, and ZeroNet will attempt to punch a hole through firewalls using UPnP.

If you have the Tor daemon running then ZeroNet will connect to it and bridge between clearnet nodes and nodes running on Tor. You can force the node to run only on Tor by providing a command line argument:

$ ./zeronet.py --tor always

In this configuration ZeroNet will not expose port 15441 over the internet. It will spawn Tor onion services and expose the port through those for communication with other ZeroNet nodes running as onion services. Access to the local node is still via port 43110, but you should use a browser configured to use Tor if anonymity is a concern, since sites can run arbitrary JavaScript. The main ZeroNet 'Hello' page warns you if you are not using Tor Browser in this configuration.

Gaining an identity

ZeroMe requires an identity to be created. Identities are created through identity ZeroNet sites. The main one, operated by the creator of ZeroNet, is ZeroId. When you first use a site that requires an identity you will be prompted to create one from any identity providers you have access to. For this example I'll use ZeroId, accessible at zeroid.bit - http://localhost:43110/zeroid.bit.

Click Get auth cert and choose a username. There are two ways to request the identity. The first is an HTTP request over the internet; if you're not running the browser over Tor this will expose your IP address to the identity service. The other uses BitMessage, an anonymous messaging system: ZeroId will request that you send a confirmation message to a BitMessage address.

This may seem very centralized - it requires the identity service to be running, and if it goes down you can't create a new identity. There are other identity services on ZeroNet that have different requirements for creating an identity, and there's an API to create your own. Identities created with these work on any service using the ZeroNet identity API in the same way as ZeroId identities.

The ZeroId site will show your identity name once you've signed up:

From ZeroId you can search for existing users and choose to Mute or Unmute. By muting a particular identity you won't see any posts from that identity. This is useful for dealing with spam or users posting on topics you don't want to see. This will prevent you seeing their content on any ZeroNet site.

You can manage muted users from the Hello page left sidebar menu:

Joining a ZeroMe Hub

Access ZeroMe by activating the ZeroMe icon on the main user interface or going directly to Me.ZeroNetwork.bit - http://localhost:43110/Me.ZeroNetwork.bit/

A prompt on the top right asks you to accept a merger site. Merger sites are a way to split large sites into a number of smaller ones. You should approve this request.

Click on "Select user to post new content" to create a ZeroMe user. It will prompt for an identity to choose and you can select the one already created in ZeroId above.

Once an identity is chosen you'll be asked to select a 'hub'. ZeroMe works by having the user interface operate from the main ZeroMe site, with data on users and posts stored in 'hub' sites. You only see posts from users that belong to hubs you have requested. There is a hub operated by the ZeroMe creator, shown as Sun hub on this page along with any other hub sites you may already have requested. Click Download on a hub to download its existing posts and content, and Join to join it.

Downloading a hub can take some time. On the Hello ZeroNet page you can see the progress of any sites currently being downloaded.

There's a useful site called 0hub.bit that lists known hubs and lets you download them all. It's at http://localhost:43110/0hub.bit/ and clicking "Click me to unlock all hub post in ZeroNet" will start downloading them. This takes some time but is worth it since you can then see all posts by everyone on ZeroMe. You can delete hubs if you decide you don't want to see content from a particular hub.

Microblogging with ZeroMe

Now that you've got an identity, signed into ZeroNet and joined a hub, you can start reading content and posting. The Everyone tab in the user interface shows posts from all users, even those you haven't followed. This is useful at the beginning for finding users and content.

Posts are in Markdown format and can include images. There is no limit on the size of a post, and posts can have comments. You can visit users' profile pages and choose to optionally seed their images or mute them, and you can edit parts of your own profile page to add an avatar and description:

An example of a post with comments (selected randomly, identities blacked out):

Random stuff

The ZeroNet node running on your machine enforces size limits for sites. This prevents the sites you have accessed from consuming all your disk space. When a site approaches its size limit it appears in a special section of the main Hello page's left sidebar:

When the site goes past the limit it will no longer download updates. Visiting the site will give a popup asking to increase the limit. Approving that popup will resume downloads for the site until it hits the next limit.

There's a sidebar on the right of pages that can be accessed by dragging the top right 0 icon to the left:

This sidebar gives statistics on the number of nodes seeding the site, the size limit, and various other things.

When you visit a ZeroNet site you start seeding that site's content and receive automatic updates when the content changes. This can be disabled per site on the main Hello page. Sites can have "optional files" which you don't automatically seed. They are downloaded on demand and are often used for user generated content or larger files. You can choose to seed a site's optional files in that site's right hand sidebar. There are also "big files", which are treated specially. These are large files like videos and are also optionally seeded. The Files tab of the Hello page lists the optional and big files that you are seeding:

Sites can be set to allow cloning by end users. This creates a copy of the site with no content. An example of a site that does this is ZeroBlog, which you can clone to create your own blog. This extends to ZeroMe itself: you can clone the ZeroMe user interface site and modify it to get a customized look while still using the data from the existing hubs.

There's a bunch of other interesting sites on ZeroNet: the ZeroTalk forums, various blogs, ZeroMail for an encrypted email-like service, etc. Be aware that the pseudo-anonymous use of identities can make for content you might not agree with, and much spam. 'Mute' is useful here.

Also be aware that it is 'pseudo-anonymous', not 'anonymous'. You create an identity that is not tied to you in the real world, but people can tie what you do to that identity. Content is added to sites and distributed to other nodes. If you like a post or add content to a site, then anyone who digs into that site's data can see that your identity liked or posted it. It is possible to have multiple identities if you want to keep aspects of your ZeroNet usage separate, but that's a topic for another post.

Overall ZeroMe is a nice microblogging system. It's user friendly, has a nice design, and has tools for muting and for "non real name" identities. Along with ZeroNet itself, it is actively developed and supported by the ZeroNet developer.

Tags: zeronet 


This site is accessible over tor as hidden service mh7mkfvezts5j6yu.onion, or Freenet using key:
USK@1ORdIvjL2H1bZblJcP8hu2LjjKtVB-rVzp8mLty~5N4,8hL85otZBbq0geDsSKkBK4sKESL2SrNVecFZz9NxGVQ,AQACAAE/bluishcoder/-44/

