Capturing program invariants in ATS

2018-01-10

Capturing program invariants in ATS

I've been reading the book Building High Integrity Applications with Spark to learn more about the SPARK/Ada language for formal verification. It's a great book that goes through lots of examples of how to do proofs in Spark. I wrote a post on Spark/Ada earlier this year and tried some of the examples in the GNAT edition.

While working through the book I oftened compared how I'd implement similar examples in ATS. I'll go through one small example in this post and show how the dependent types of ATS allow capturing the invariants of a function and check them at type checking time to prove the absence of a particular class of runtime errors.

The example I'm looking at from the book is from the Loop Invariants section, which aims to prove things about iterative loops in Ada. The procedure Copy_Into copies characters from a String into a Buffer object that has a maximum capacity. If the source string is too short then the buffer is padded with spaces. The Spark/Ada code from the book is:

procedure Copy_Into(Buffer: out Buffer_Type; Source: in String) is
  Characters_To_Copy : Buffer_Count_type := Maximum_Buffer_Size;
begin
  Buffer := (others => ' ');
  if Source'Length < Characters_To_Copy then
    Characters_To_Copy := Source`Length;
  end if;
  for Index in Buffer_Count_Type range 1 .. Characters_To_Copy loop
    pragma Loop_Invariant (Characters_To_Copy <= Source'Length and
                           Characters_To_Copy = Characters_To_Copy'Loop_Entry);
    Buffer(Index) := Source(Source'First + (Index - 1));
  end loop;
end Copy_Into;

This is quite readable for anyone familiar with imperative languages. The number of characters to copy is initially set to the maximum buffer size, then changed to the length of the string if it is less than that. The buffer is set to contain the initial characters ' '. The buffer is 1-indexed and during the loop the characters from the string are copied to it.

The Loop_Invariant pragma is a Spark statement that tells the checker that certain invariants should be true. The invariant used here is that the number of characters to copy is always less than or equal to the length of the string and does not change within the loop. Given this loop invariant Spark can assert that the indexing into the Source string cannot exceed the bounds of the string. Spark is able to reason itself that Buffer does not get indexed out of bounds as the loop ranges up to Characters_To_Copy which is capped at Maximum_Buffer_Size. I'm not sure why this doesn't need a loop invariant but the book notes that the Spark tools improve over time at what invariants they can compute automatically. As an example, the second invariant check above for Characters_To_Copy'Loop_Entry isn't needed for newer versions of Spark but is kept to enable checking on older toolchains.

For ATS I modeled the Buffer type as an array of characters:

stadef s_max_buffer_size: int = 128
typedef Buffer0 = @[char?][s_max_buffer_size]
typedef Buffer1 = @[char][s_max_buffer_size]
val max_buffer_size: size_t s_max_buffer_size = i2sz(128)

In ATS, @[char][128] represents an array of contiguous 'char' types in memory, where those char's are initialized to value. The @[char?][128] represents an array of unitialized char's. It would be a type error to use an element of that array before initializing it. The '0' and '1' suffixes to the 'Buffer' name is an ATS idiom for uninitialized and initialized data respectively.

The stadef keyword creates a constant in the 'statics' system of ATS. That is a constant in the language of types, vs a constant in the language of values where runtime level programming is done. For this post I have prefixed variables in the 'statics' system with s_ to make it clearer where they can be used.

The max_buffer_size creates a constant in the 'dynamics' part of ATS, in the language of values, of type "size_t max_buffer_size". size_t is the type for indexes into arrays and size_t 128 is a dependent type representing a type where only a single value matches the type - 128 in this instance. The i2sz function converts an integer to a size_t. So in this case we've created a max_buffer_size constant that can only ever be the same value as the s_max_buffer_size type level constant.

The ATS equivalent of copy_into looks like:

fun copy_into {n:nat}
       (buffer: &Buffer0 >> Buffer1, source: string n): void = let
  val () = array_initize_elt(buffer, max_buffer_size, ' ')
  val len = string1_length(source)
  stadef s_tocopy = min(s_max_buffer_size, n)
  val tocopy: size_t s_tocopy = min(max_buffer_size, len)
  fun loop {m:nat | m < s_tocopy } .<s_tocopy - m>.
        (buffer: &Buffer1, m: size_t m): void = let
    val () = buffer[m] := source[m]
  in
    if m < tocopy - 1 then loop(buffer, m+1)
  end
in
  if len > 0 then
    loop(buffer, i2sz(0))
end

Starting with the function declaration:

fun copy_into {n:nat}
       (buffer: &Buffer0 >> Buffer1, source: string n): void

The items between the curly brackets, {...}, are universal variables used for setting constraints on the function arguments. Between the round brackets, (...), are the function arguments. The return type is void. The function takes two arguments, buffer and source.

The buffer is a reference type, represented by the & in the type. This is similar to a reference argument in C++ or a pointer argument in C. It allows modification of the argument by the function. The Buffer0 >> Buffer1 means on entry to the function the type is Buffer0, the uninitialized char array described earlier, and on exit it will be a Buffer1, an initialized char array. if the body of the function fails to initialize the elements of the array it will be a type error.

The source argument is a string n, where n is a type index for the length of the string. By using a dependently typed string here we can use it later in the function body to set some invariants.

In the body of the function the call to array_initize_elt sets each element of the array to the space character. This also has the effect of changing the type stored in the array from char? to char, and the array is now a valid Buffer1. This matches the type change specified in the function declaration for the buffer argument.

The next three lines compute the number of characters to copy:

val len = string1_length(source)
stadef s_tocopy = min(s_max_buffer_size, n)
val tocopy: size_t s_tocopy = min(max_buffer_size, len)

The length of the string is obtained. The string1_ prefix to function names is an idiom to show that they operate on dependently typed strings. string1_length will return the string length. It returns a dependently typed size_t x, where x is the length of the string. This means that calling string_length not only returns the length of the string but it sets a constraint such that the type index of the result is equal to the type index of the string passed in. Here is the declaration of string1_length from the prelude (with some additional annotations removed):

fun string1_length {n:int} (x: string n): size_t n

The reuse of the n in the argument and result type is what tells the compiler to set the constraint when it is called. This is important for the next two lines. These perform the same operation, once in the statics and once in the dynamics. That is, once at the type level and once at the runtime level. The stadef sets a type level constant called s_tocopy to be constrained to be the same as the minimum of the max buffer size and the type level string length, n. The val line sets the runtime variable to the same calculation but using the runtime length. These two lengths must much as a result of the call to string1_length described earlier. The type of tocopy ensures this by saying that it is the singleton type that can only be fulfilled by the value represented by s_tocopy.

ATS has looping constructs but it's idiomatic to use tail recursion instead of loops. It is also easier to create types for tail recursive functions than to type iterative loops in ATS. For this reason I've converted the loop to a tail recursive function. When ATS compiles to C it converts tail recursive functions to a C loop so there is no danger of stack overflow. The looping function without dependent type annotations would be:

fun loop (buffer: &Buffer1, m: size_t): void = let
  val () = buffer[m] := source[m]
in
  if m < tocopy - 1 then loop(buffer, m+1)
end

It takes a buffer and an index, m, as arguments and assigns to buffer at that index the equivalent entry from the source string. If it hasn't yet copied the number of characters needed to copy it tail recursively calls itself, increasing the index. The initial call is:

loop(buffer, i2sz(0))

i2sz is needed to convert the integer number zero from an int type to a size_t type.

The declaration for the loop function contains the dependent type definitions that give invariants similar to the Spark example:

fun loop {m:nat | m < s_tocopy } .<s_tocopy - m>.
      (buffer: &Buffer1, m: size_t m): void = let

Starting with the arguments, the buffer is a Buffer meaning it must be an initialized array. The index, m is a dependently typed size_t m where m at the type level is constrained to be less than the number of characters to copy. This ensures that the indexing into the buffer and source string is never out of bounds due to the previous statements that set s_tocopy to be the lesser of the maximum buffer size and the string size. s_tocopy is needed here instead of tocopy due to universal arguments being written in the language of the typesystem, not the runtime system.

The body of the loop does the copying of the indexed character from the source string to the buffer and recursively calls itself with an incremented index if it hasn't copied the required number of characters.

  val () = buffer[m] := source[m]
in
  if m < tocopy - 1 then loop(buffer, m+1)

Due to the constraints set in the function declaration the buffer and source string indexes are checked at compile time to ensure they aren't out of bounds, and the recursive call to loop will fail typechecking if an out of bounds index is passed as an argument.

One issue with recursive functions is that they could loop forever if the termination condition check is incorrect, or the variables being used in that check don't explicitly increase or decrease. In this case the index must increase to eventually reach the value for the termination check.

ATS resolves this by allowing termination metrics to be added to the function. This is the part of the function declaration that is bracketed by .< ... >. markers. The expression inside these markers is expected to be in the 'statics' constraint language and evaluate to a tuple of natural numbers that the compiler needs to prove are lexicographically decreasing in value. The loop index counts up so this needs to be converted to a decreasing value:

.<s_tocopy - m>.

This is done by taking the static constant of the number of characters to copy and subtracting the index value. The compiler proves that each recursive call in the body of the function results in something strictly less than the previous call. With the termination metric added it statically proves that the recursive function terminates.

The program can be run and tested with:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

stadef s_max_buffer_size: int = 128
typedef Buffer0 = @[char?][s_max_buffer_size]
typedef Buffer1 = @[char][s_max_buffer_size]
val max_buffer_size: size_t s_max_buffer_size = i2sz(128)

fun copy_into {n:nat}
    (buffer: &Buffer0 >> Buffer1, source: string n): void = let
  val () = array_initize_elt(buffer, max_buffer_size, ' ')
  val len = string1_length(source)
  stadef s_tocopy = min(s_max_buffer_size, n)
  val tocopy: size_t s_tocopy = min(max_buffer_size, len)
  fun loop {m:nat | m < s_tocopy } .<s_tocopy - m>.
        (buffer: &Buffer1, m: size_t m): void = let
    val () = buffer[m] := source[m]
  in
    if m < tocopy - 1  then loop(buffer, m+1)
  end
in
  if len > 0 then
    loop(buffer, i2sz(0))
end 

implement main0() = let
  var  b = @[char?][max_buffer_size]()
  val () = copy_into(b, "hello world")
  fun print_buffer {n:nat | n < s_max_buffer_size}
                   .<s_max_buffer_size - n>.
       (buffer: &Buffer1, n: size_t n):void = let
    val () = if n = 0 then print!("[")
    val () = print!(buffer[n])
    val () = if n = max_buffer_size - 1 then print!("]")
  in
    if n < 127 then print_buffer(buffer, n + 1)
  end
  val () = print_buffer(b, i2sz(0))
in
  ()
end

This example creates the array as a stack allocated object in the line:

var  b = @[char?][max_buffer_size]()

Being stack allocated the memory will be deallocated on exit of the scope it was defined in. It's also possible to create a heap allocated buffer and pass that to copy_into:

implement main0() = let
  val (pf_b , pf_gc| b) = array_ptr_alloc<char>(max_buffer_size)
  val () = copy_into(!b, "hello world")
  fun print_buffer {n:nat | n < 128} .<128 - n>. (buffer: &(@[char][128]), n: int n):void = let
    val () = if n = 0 then print!("[")
    val () = print!(buffer[n])
    val () = if n = 127 then print!("]")
  in
    if n < 127 then print_buffer(buffer, n + 1)
  end
  val () = print_buffer(!b, 0)
  val () = array_ptr_free(pf_b, pf_gc | b)
in
  ()
end

This explicitly allocates the memory for the array using array_ptr_alloc and dereferences it in the copy_into call using the '!b' syntax. Note that this array is later free'd using array_ptr_free. Not calling that to free the memory would be a compile error.

Although this is a simple function it demonstrates a number of safety features:

The destination buffer cannot indexed beyond its bounds.
The source string cannot be indexed beyond its bounds.
There can be no buffer overflow on writing to the destination buffer.
There can be no uninitialized data in the buffer on function exit.
The internal recursive call must terminate.

The book Building High Integrity Applications with Spark is a great book for learning Spark and also for learning about the sorts of proofs used in production applications using Spark/Ada. Apart from learning a bit about Spark it's also a good exercise to think about how similar safety checks can be implemented in your language of choice.

Bluish Coder

Capturing program invariants in ATS

Tags