2018-10-16

## Generalized Algebraic Data Types in ATS

The ATS programming language supports defining Generalized Algebraic Data Types (GADTS). They allow defining datatypes where the constructors for the datatype are explicitly defined by the programmer. This has a number of uses and I'll go through some examples in this post. GADTs are sometimes referred to as Guarded Recursive Datatypes.

Some useful resources for reading up on GADTs that I used to write this post are:

- Wikipedia page on GADT
- Haskell/GADT Wikibook
- GADTs for Dummies
- Guarded Recursive Datatype Constructors

The examples here were tested with ATS2-0.3.11.

## Arithmetic Expressions

This is probably the most common demonstration of GADT usage and it's useful to see how to do it in ATS. The example is taken from the Haskell/GADT Wikibook.

First we'll create datatype to represent a simple expression language, and write an evaluation function for it, without using GADTs:

```
#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"
datatype Expr =
| I of int
| Add of (Expr, Expr)
| Mul of (Expr, Expr)
fun eval (x:Expr): int =
case+ x of
| I i => i
| Add (t1, t2) => eval(t1) + eval(t2)
| Mul (t1, t2) => eval(t1) * eval(t2)
implement main0() = let
val term = Mul(I(2), I(4))
val res = eval(term)
in
println!("res=", res)
end
```

`Expr`

is a datatype with three constructors. `I`

represents an
integer, `Add`

adds two expressions together and `Mul`

multiples two
expressions. The `eval`

function pattern matches on these and
evaluates them. The example can be compiled with the following if
placed in a file `arith1.dats`

:

```
$ patscc -DATS_MEMALLOC_LIBC -o arith1 arith1.dats
$ ./arith1
res=8
```

Now we extend the expression language with another type, booleans, and
add an `Expr`

constructor to compare for equality:

```
datatype Expr =
| I of int
| B of bool
| Add of (Expr, Expr)
| Mul of (Expr, Expr)
| Eq of (Expr, Expr)
(* Does not typecheck *)
fun eval (x:Expr): int =
case+ x of
| I i => i
| B b => b
| Add (t1, t2) => eval(t1) + eval(t2)
| Mul (t1, t2) => eval(t1) * eval(t2)
| Eq (t1, t2) => eval(t1) = eval(t2)
```

This code fails to typecheck - the `eval`

function is defined to
return an `int`

but the `B b`

pattern match returns a boolean. We can
work around this by making the result value of `eval`

return a
datatype that can represent either an `int`

or a `bool`

but then we
have to add code throughout `eval`

to detect the invalid addition or
multiplication of a boolean and raise a runtime error. Ideally we'd
like to have make it impossible to construct invalid expressions such
that they error out at compile time.

We can imagine the type constructors for `Expr`

as if they were
functions with a type signature. The type signature of the
constructors would be:

```
fun I(n:int): Expr
fun B(b:bool): Expr
fun Add(t1:Expr, t2:Expr): Expr
fun Mul(t1:Expr, t1:Expr): Expr
fun Eq(t1:Expr, t2:Expr): Expr
```

The problem here is that `Add`

and `Mul`

both take `Expr`

as arguments
but we want to restrict them to integer expressions only - we need to
differentiate between integer and boolean expressions. We'd like to
add a type index to the `Expr`

datatype that indicates if it is a
boolean or an integer expression and change the constructors so they
have types like:

```
fun I(n:int): Expr int
fun B(b:bool): Expr bool
fun Add(t1:Expr int, t2:Expr int): Expr int
fun Mul(t1:Expr int, t1:Expr int): Expr int
fun Eq(t1:Expr int, t2:Expr int): Expr bool
```

Using these constructors would make it impossible to create an
expression that would give a runtime error in an `eval`

function. Creating an `Expr`

with a type index looks like:

```
datatype Expr(a:t@ype) =
| I(a) of int
| B(a) of bool
| Add(a) of (Expr int, Expr int)
| Mul(a) of (Expr int, Expr int)
| Eq(a) of (Expr int, Expr int)
```

This adds a type index of sort `t@ype`

. A 'sort' in ATS is the type of
type indexes. The `t@ype`

is for types that have a flat memory
structure of an unknown size. The '@' sigil looks odd inside the name,
but '@' is used elsewhere in ATS when defining flat records and tuples
which is a useful mnemonic for remembering what the '@' within the
sort name means.

This doesn't help our case though as the type signatures for the constructors would be:

```
fun I(n:int): Expr a
fun B(b:bool): Expr a
fun Add(t1:Expr int, t2:Expr int): Expr a
fun Mul(t1:Expr int, t1:Expr int): Expr a
fun Eq(t1:Expr int, t2:Expr int): Expr a
```

There is still the problem that an `Expr a`

is not an `Expr int`

for
`Add`

, `Mul`

and `Eq`

, and requiring runtime errors when pattern
matching. This is where GADTs come in. GADTs enable defining a
datatype that provides constructors where the generic `a`

of the type
index can be constrained to a specific type, different for each
constructor. The `Expr`

datatype becomes:

```
datatype Expr(t@ype) =
| I(int) of int
| B(bool) of bool
| Add(int) of (Expr int, Expr int)
| Mul(int) of (Expr int, Expr int)
| Eq(bool) of (Expr int, Expr int)
```

The constructor type signature now match those that we wanted earlier
and it becomes possible to check at compile time for invalid
expressions. The `eval`

functions looks similar, but is parameterized
over the type index:

```
fun{a:t@ype} eval(x:Expr a): a =
case+ x of
| I i => i
| B b => b
| Add (t1, t2) => eval(t1) + eval(t2)
| Mul (t1, t2) => eval(t1) * eval(t2)
| Eq (t1, t2) => eval(t1) = eval(t2)
```

The "`{a:t@ype}`

" syntax following `fun`

is ATS' way of defining a
function template where `a`

is the template argument. ATS will
generate a new function specific for each `a`

type used in the
program.

An example `main`

function to demonstrate the new GADT enabled `Expr`

and `eval`

:

```
implement main0() = let
val term1 = Eq(I(5), Add(I(1), I(4)))
val term2 = Mul(I(2), I(4))
val res1 = eval(term1)
val res2 = eval(term2)
in
println!("res1=", res1, " and res2=", res2)
end
$ patscc -DATS_MEMALLOC_LIBC -o arith2 arith2.dats
$ ./arith2
res1=true and res2=8
```

You may have noticed that our `Expr`

type can't represent equality
between booleans, or between booleans and integers. It's defined to
only allow equality between integers. It's possible to make equality
polymorphic but I'll go through this later as it brings up some more
complicated edge cases in types with ATS beyond just using GADTs.

## Typesafe sprintf

This example of GADT usage in ATS tries to implement a `sprintf`

function that allows providing a format specification to `sprintf`

which then processes it and accepts arguments of types as defined by
the format specification.

The format specification is defined as a GADT:

```
datatype Format (a:type) =
| I(int -<cloref1> a) of (Format a)
| S_(a) of (string, Format a)
| S0(string) of string
```

Each constructor represents part of a format string. The type index is the type that `sprintf`

will return given a format string for that type. `sprintf`

is declared as:

```
fun sprintf {a:type} (fmt: Format a): a
```

Here are some example invocations and the types that should result:

```
// Returns a string
sprintf (S0 "hello")
// Returns a string
sprintf (S_ ("hello", S0 "world"))
// Returns a function that takes an int and returns a string
sprintf (I (S0 "<- is a number"))
```

The last example shows how `sprintf`

emulates variable arguments using closures. By returning a function any immediately following tokens are treated as arguments to that function. For example:

```
sprintf (I (S0 "<- is a number")) 42
(int -> string) 42
```

With some syntactic sugar using currying we can make this look nicer. Curried functions is a feature that isn't often used in ATS as it requires the garbage collector (to clean up partially curried functions) and performance is diminished due to the allocation of closures but it's useful for this example.

With the `Format`

datatype defined, here's the implementation of `sprintf`

:

```
fun sprintf {a:type} (fmt: Format a): a = let
fun aux {a:type}(pre: string, fmt': Format a): a =
case fmt' of
| I fmt => (lam (i) => aux(append(pre, int2string(i)), fmt))
| S_ (s, fmt) => aux(append(pre, s), fmt)
| S0 s => append(pre, s)
in
aux("", fmt)
end
```

It uses an inner function, `aux`

, that does the work of deconstructing the format argument and returning a closure when needed, and appending strings as the result string is computed. For this example I'm cheating with memory management to keep the focus on GADT, and off dealing with the intricacies of linear strings. The example should be run with the garbage collector to free the allocated strings.

Here's an example of executing what we have so far:

```
implement main0() = let
val fmt = S_ ("X: ", (I (S_ (" Y: ", (I (S0 ""))))))
val s = sprintf fmt 5 10
in
println!(s)
end
$ ./format
X: 5 Y: 10
```

The format string is pretty ugly though. Some helper functions make it a bit more friendly:

```
fun Str {a:type} (s: string) (fmt: Format a) = S_ (s, fmt)
fun Int {a:type} (fmt: Format a) = I (fmt)
fun End (s:string) = S0 (s)
infixr 0 >:
macdef >: (f, x) = ,(f) ,(x)
```

Now the format definition is:

```
val fmt = Str "X: " >: Int >: Str " Y: " >: Int >: End ""
```

The string handling functions that I threw together to append strings and convert an integer to a string without dealing with linear strings follow.

```
fun append(s0: string, s1: string): string = let
extern castfn from(s:string):<> Strptr1
extern castfn to(s:Strptr0):<> string
extern prfun drop(s:Strptr1):<> void
val s0' = from(s0)
val s1' = from(s1)
val r = strptr_append(s0', s1')
prval () = drop(s0')
prval () = drop(s1')
in
to(r)
end
fun int2string(x: int): string =
tostring(g0int2string x) where {
extern castfn tostring(Strptr0):<> string
}
```

## Issues

These example of GADT usage were pulled from the papers and articles mentioned at the beginning of the post. They were written for Haskell or Dependent ML (a predecessor to ATS). There may be better ways of approaching the problem in ATS and some constraints on the way ATS generates C code limits how GADTs can be used. One example of the limitation was mentioned in the first example of the expression evaluator.

If the `Expr`

datatype is changed so that `Eq`

can work on any type, not just `int`

, then a compile error results:

```
datatype Expr(a:t@ype) =
| I(a) of int
| B(a) of bool
| Add(a) of (Expr int, Expr int)
| Mul(a) of (Expr int, Expr int)
| Eq(a) of (Expr a, Expr a)
```

The error occurs due to the `Eq`

branch in the case statement in `eval`

:

```
fun{a:t@ype} eval(x:Expr a): a =
case+ x of
| I i => i
| B b => b
| Add (t1, t2) => eval(t1) + eval(t2)
| Mul (t1, t2) => eval(t1) * eval(t2)
| Eq (t1, t2) => eval(t1) = eval(t2)
```

The compiler is unable to find a match for the overloaded `=`

call for the general case of any `a`

(vs the specific case of `int`

as it was before). This is something I've written about before and the workaround in that post gets us past that error:

```
extern fun{a:t@ype} equals(t1:a, t2:a): bool
implement equals<int>(t1,t2) = g0int_eq(t1,t2)
implement equals<bool>(t1,t2) = eq_bool0_bool0(t1,t2)
fun{a:t@ype} eval(x:Expr a): a =
case+ x of
| I i => i
| B b => b
| Add (t1, t2) => eval(t1) + eval(t2)
| Mul (t1, t2) => eval(t1) * eval(t2)
| Eq (t1, t2) => equals(eval(t1), eval (t2))
```

Now the program typechecks but fails to compile the generated C code. This can usually be resolved by explicitly stating the template parameters in template function calls:

```
fun{a:t@ype} eval(x:Expr a): a =
case+ x of
| I i => i
| B b => b
| Add (t1, t2) => eval<int>(t1) + eval<int>(t2)
| Mul (t1, t2) => eval<int>(t1) * eval<int>(t2)
| Eq (t1, t2) => equals<?>(eval<?>(t1), eval<?>(t2))
```

The problem here is what types to put in the '`?`

' in this snippet? The `Expr`

type defines this as being any `a`

but we want to constrain it to the index used by `t1`

and `t2`

but I don't think ATS allows us to declare that. This mailing list thread discusses the issue with some workarounds.