Generalized Algebraic Data Types in ATS

2018-10-16

Generalized Algebraic Data Types in ATS

The ATS programming language supports defining Generalized Algebraic Data Types (GADTS). They allow defining datatypes where the constructors for the datatype are explicitly defined by the programmer. This has a number of uses and I'll go through some examples in this post. GADTs are sometimes referred to as Guarded Recursive Datatypes.

Some useful resources for reading up on GADTs that I used to write this post are:

The examples here were tested with ATS2-0.3.11.

Arithmetic Expressions

This is probably the most common demonstration of GADT usage and it's useful to see how to do it in ATS. The example is taken from the Haskell/GADT Wikibook.

First we'll create datatype to represent a simple expression language, and write an evaluation function for it, without using GADTs:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

datatype Expr  =
  | I of int
  | Add of (Expr, Expr)
  | Mul of (Expr, Expr)

fun eval (x:Expr): int =
  case+ x of
  | I i => i
  | Add (t1, t2) => eval(t1) + eval(t2)
  | Mul (t1, t2) => eval(t1) * eval(t2)


implement main0() = let
  val term = Mul(I(2), I(4))
  val res = eval(term)
in
  println!("res=", res)
end

Expr is a datatype with three constructors. I represents an integer, Add adds two expressions together and Mul multiples two expressions. The eval function pattern matches on these and evaluates them. The example can be compiled with the following if placed in a file arith1.dats:

$ patscc -DATS_MEMALLOC_LIBC -o arith1 arith1.dats
$ ./arith1
res=8

Now we extend the expression language with another type, booleans, and add an Expr constructor to compare for equality:

datatype Expr  =
  | I of int
  | B of bool
  | Add of (Expr, Expr)
  | Mul of (Expr, Expr)
  | Eq of (Expr, Expr)

(* Does not typecheck *)
fun eval (x:Expr): int =
  case+ x of
  | I i => i
  | B b => b
  | Add (t1, t2) => eval(t1) + eval(t2)
  | Mul (t1, t2) => eval(t1) * eval(t2)
  | Eq (t1, t2) => eval(t1) = eval(t2)

This code fails to typecheck - the eval function is defined to return an int but the B b pattern match returns a boolean. We can work around this by making the result value of eval return a datatype that can represent either an int or a bool but then we have to add code throughout eval to detect the invalid addition or multiplication of a boolean and raise a runtime error. Ideally we'd like to have make it impossible to construct invalid expressions such that they error out at compile time.

We can imagine the type constructors for Expr as if they were functions with a type signature. The type signature of the constructors would be:

fun I(n:int): Expr
fun B(b:bool): Expr
fun Add(t1:Expr, t2:Expr): Expr
fun Mul(t1:Expr, t1:Expr): Expr
fun Eq(t1:Expr, t2:Expr): Expr

The problem here is that Add and Mul both take Expr as arguments but we want to restrict them to integer expressions only - we need to differentiate between integer and boolean expressions. We'd like to add a type index to the Expr datatype that indicates if it is a boolean or an integer expression and change the constructors so they have types like:

fun I(n:int): Expr int
fun B(b:bool): Expr bool
fun Add(t1:Expr int, t2:Expr int): Expr int
fun Mul(t1:Expr int, t1:Expr int): Expr int
fun Eq(t1:Expr int, t2:Expr int): Expr bool

Using these constructors would make it impossible to create an expression that would give a runtime error in an eval function. Creating an Expr with a type index looks like:

datatype Expr(a:t@ype)  =
  | I(a) of int
  | B(a) of bool
  | Add(a) of (Expr int, Expr int)
  | Mul(a) of (Expr int, Expr int)
  | Eq(a) of (Expr int, Expr int)

This adds a type index of sort t@ype. A 'sort' in ATS is the type of type indexes. The t@ype is for types that have a flat memory structure of an unknown size. The '@' sigil looks odd inside the name, but '@' is used elsewhere in ATS when defining flat records and tuples which is a useful mnemonic for remembering what the '@' within the sort name means.

This doesn't help our case though as the type signatures for the constructors would be:

fun I(n:int): Expr a
fun B(b:bool): Expr a
fun Add(t1:Expr int, t2:Expr int): Expr a
fun Mul(t1:Expr int, t1:Expr int): Expr a
fun Eq(t1:Expr int, t2:Expr int): Expr a

There is still the problem that an Expr a is not an Expr int for Add, Mul and Eq, and requiring runtime errors when pattern matching. This is where GADTs come in. GADTs enable defining a datatype that provides constructors where the generic a of the type index can be constrained to a specific type, different for each constructor. The Expr datatype becomes:

datatype Expr(t@ype)  =
  | I(int) of int
  | B(bool) of bool
  | Add(int) of (Expr int, Expr int)
  | Mul(int) of (Expr int, Expr int)
  | Eq(bool) of (Expr int, Expr int)

The constructor type signature now match those that we wanted earlier and it becomes possible to check at compile time for invalid expressions. The eval functions looks similar, but is parameterized over the type index:

fun{a:t@ype} eval(x:Expr a): a =
  case+ x of
  | I i => i
  | B b => b
  | Add (t1, t2) => eval(t1) + eval(t2)
  | Mul (t1, t2) => eval(t1) * eval(t2)
  | Eq (t1, t2) => eval(t1) = eval(t2)

The "{a:t@ype}" syntax following fun is ATS' way of defining a function template where a is the template argument. ATS will generate a new function specific for each a type used in the program.

An example main function to demonstrate the new GADT enabled Expr and eval:

implement main0() = let
  val term1 = Eq(I(5), Add(I(1), I(4)))
  val term2 = Mul(I(2), I(4))
  val res1 = eval(term1)
  val res2 = eval(term2)
in
  println!("res1=", res1, " and res2=", res2)
end

$ patscc -DATS_MEMALLOC_LIBC -o arith2 arith2.dats
$ ./arith2
res1=true and res2=8

You may have noticed that our Expr type can't represent equality between booleans, or between booleans and integers. It's defined to only allow equality between integers. It's possible to make equality polymorphic but I'll go through this later as it brings up some more complicated edge cases in types with ATS beyond just using GADTs.

Typesafe sprintf

This example of GADT usage in ATS tries to implement a sprintf function that allows providing a format specification to sprintf which then processes it and accepts arguments of types as defined by the format specification.

The format specification is defined as a GADT:

datatype Format (a:type) =
  | I(int -<cloref1> a) of (Format a)
  | S_(a) of (string, Format a)
  | S0(string) of string

Each constructor represents part of a format string. The type index is the type that sprintf will return given a format string for that type. sprintf is declared as:

fun sprintf {a:type} (fmt: Format a): a

Here are some example invocations and the types that should result:

// Returns a string
sprintf (S0 "hello")

// Returns a string
sprintf (S_ ("hello", S0 "world"))

// Returns a function that takes an int and returns a string
sprintf (I (S0 "<- is a number"))

The last example shows how sprintf emulates variable arguments using closures. By returning a function any immediately following tokens are treated as arguments to that function. For example:

sprintf (I (S0 "<- is a number")) 42
(int -> string) 42

With some syntactic sugar using currying we can make this look nicer. Curried functions is a feature that isn't often used in ATS as it requires the garbage collector (to clean up partially curried functions) and performance is diminished due to the allocation of closures but it's useful for this example.

With the Format datatype defined, here's the implementation of sprintf:

fun sprintf {a:type} (fmt: Format a): a = let
  fun aux {a:type}(pre: string, fmt': Format a): a =
    case fmt' of
    | I fmt => (lam (i) => aux(append(pre, int2string(i)), fmt))
    | S_ (s, fmt) => aux(append(pre, s), fmt)
    | S0 s => append(pre, s)
in
  aux("", fmt)
end

It uses an inner function, aux, that does the work of deconstructing the format argument and returning a closure when needed, and appending strings as the result string is computed. For this example I'm cheating with memory management to keep the focus on GADT, and off dealing with the intricacies of linear strings. The example should be run with the garbage collector to free the allocated strings.

Here's an example of executing what we have so far:

implement main0() = let
  val fmt = S_ ("X: ", (I (S_ (" Y: ", (I (S0 ""))))))
  val s = sprintf fmt 5 10
in
  println!(s)
end

$ ./format
X: 5 Y: 10

The format string is pretty ugly though. Some helper functions make it a bit more friendly:

fun Str {a:type} (s: string) (fmt: Format a) = S_ (s, fmt)
fun Int {a:type} (fmt: Format a) = I (fmt)
fun End (s:string) = S0 (s)

infixr 0 >:
macdef >: (f, x) = ,(f) ,(x)

Now the format definition is:

val fmt = Str "X: " >: Int >: Str " Y: " >: Int >: End ""

The string handling functions that I threw together to append strings and convert an integer to a string without dealing with linear strings follow.

fun append(s0: string, s1: string): string = let
  extern castfn from(s:string):<> Strptr1
  extern castfn to(s:Strptr0):<> string
  extern prfun drop(s:Strptr1):<> void
  val s0' = from(s0)
  val s1' = from(s1)
  val r = strptr_append(s0', s1')
  prval () = drop(s0')
  prval () = drop(s1')
in
  to(r)
end

fun int2string(x: int): string =
  tostring(g0int2string x) where {
    extern castfn tostring(Strptr0):<> string
  }

Issues

These example of GADT usage were pulled from the papers and articles mentioned at the beginning of the post. They were written for Haskell or Dependent ML (a predecessor to ATS). There may be better ways of approaching the problem in ATS and some constraints on the way ATS generates C code limits how GADTs can be used. One example of the limitation was mentioned in the first example of the expression evaluator.

If the Expr datatype is changed so that Eq can work on any type, not just int, then a compile error results:

datatype Expr(a:t@ype)  =
  | I(a) of int
  | B(a) of bool
  | Add(a) of (Expr int, Expr int)
  | Mul(a) of (Expr int, Expr int)
  | Eq(a) of (Expr a, Expr a)

The error occurs due to the Eq branch in the case statement in eval:

fun{a:t@ype} eval(x:Expr a): a =
  case+ x of
  | I i => i
  | B b => b
  | Add (t1, t2) => eval(t1) + eval(t2)
  | Mul (t1, t2) => eval(t1) * eval(t2)
  | Eq (t1, t2) => eval(t1) = eval(t2)

The compiler is unable to find a match for the overloaded = call for the general case of any a (vs the specific case of int as it was before). This is something I've written about before and the workaround in that post gets us past that error:

extern fun{a:t@ype} equals(t1:a, t2:a): bool
implement equals<int>(t1,t2) = g0int_eq(t1,t2)
implement equals<bool>(t1,t2) = eq_bool0_bool0(t1,t2)

fun{a:t@ype} eval(x:Expr a): a =
  case+ x of
  | I i => i
  | B b => b
  | Add (t1, t2) => eval(t1) + eval(t2)
  | Mul (t1, t2) => eval(t1) * eval(t2)
  | Eq (t1, t2) => equals(eval(t1), eval (t2))

Now the program typechecks but fails to compile the generated C code. This can usually be resolved by explicitly stating the template parameters in template function calls:

fun{a:t@ype} eval(x:Expr a): a =
  case+ x of
  | I i => i
  | B b => b
  | Add (t1, t2) => eval<int>(t1) + eval<int>(t2)
  | Mul (t1, t2) => eval<int>(t1) * eval<int>(t2)
  | Eq (t1, t2) => equals<?>(eval<?>(t1), eval<?>(t2))

The problem here is what types to put in the '?' in this snippet? The Expr type defines this as being any a but we want to constrain it to the index used by t1 and t2 but I don't think ATS allows us to declare that. This mailing list thread discusses the issue with some workarounds.

Bluish Coder