2018-10-16
Generalized Algebraic Data Types in ATS
The ATS programming language supports defining Generalized Algebraic Data Types (GADTS). They allow defining datatypes where the constructors for the datatype are explicitly defined by the programmer. This has a number of uses and I'll go through some examples in this post. GADTs are sometimes referred to as Guarded Recursive Datatypes.
Some useful resources for reading up on GADTs that I used to write this post are:
- Wikipedia page on GADT
- Haskell/GADT Wikibook
- GADTs for Dummies
- Guarded Recursive Datatype Constructors
The examples here were tested with ATS2-0.3.11.
Arithmetic Expressions
This is probably the most common demonstration of GADT usage and it's useful to see how to do it in ATS. The example is taken from the Haskell/GADT Wikibook.
First we'll create datatype to represent a simple expression language, and write an evaluation function for it, without using GADTs:
#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"
datatype Expr =
| I of int
| Add of (Expr, Expr)
| Mul of (Expr, Expr)
fun eval (x:Expr): int =
case+ x of
| I i => i
| Add (t1, t2) => eval(t1) + eval(t2)
| Mul (t1, t2) => eval(t1) * eval(t2)
implement main0() = let
val term = Mul(I(2), I(4))
val res = eval(term)
in
println!("res=", res)
end
Expr
is a datatype with three constructors. I
represents an
integer, Add
adds two expressions together and Mul
multiples two
expressions. The eval
function pattern matches on these and
evaluates them. The example can be compiled with the following if
placed in a file arith1.dats
:
$ patscc -DATS_MEMALLOC_LIBC -o arith1 arith1.dats
$ ./arith1
res=8
Now we extend the expression language with another type, booleans, and
add an Expr
constructor to compare for equality:
datatype Expr =
| I of int
| B of bool
| Add of (Expr, Expr)
| Mul of (Expr, Expr)
| Eq of (Expr, Expr)
(* Does not typecheck *)
fun eval (x:Expr): int =
case+ x of
| I i => i
| B b => b
| Add (t1, t2) => eval(t1) + eval(t2)
| Mul (t1, t2) => eval(t1) * eval(t2)
| Eq (t1, t2) => eval(t1) = eval(t2)
This code fails to typecheck - the eval
function is defined to
return an int
but the B b
pattern match returns a boolean. We can
work around this by making the result value of eval
return a
datatype that can represent either an int
or a bool
but then we
have to add code throughout eval
to detect the invalid addition or
multiplication of a boolean and raise a runtime error. Ideally we'd
like to have make it impossible to construct invalid expressions such
that they error out at compile time.
We can imagine the type constructors for Expr
as if they were
functions with a type signature. The type signature of the
constructors would be:
fun I(n:int): Expr
fun B(b:bool): Expr
fun Add(t1:Expr, t2:Expr): Expr
fun Mul(t1:Expr, t1:Expr): Expr
fun Eq(t1:Expr, t2:Expr): Expr
The problem here is that Add
and Mul
both take Expr
as arguments
but we want to restrict them to integer expressions only - we need to
differentiate between integer and boolean expressions. We'd like to
add a type index to the Expr
datatype that indicates if it is a
boolean or an integer expression and change the constructors so they
have types like:
fun I(n:int): Expr int
fun B(b:bool): Expr bool
fun Add(t1:Expr int, t2:Expr int): Expr int
fun Mul(t1:Expr int, t1:Expr int): Expr int
fun Eq(t1:Expr int, t2:Expr int): Expr bool
Using these constructors would make it impossible to create an
expression that would give a runtime error in an eval
function. Creating an Expr
with a type index looks like:
datatype Expr(a:t@ype) =
| I(a) of int
| B(a) of bool
| Add(a) of (Expr int, Expr int)
| Mul(a) of (Expr int, Expr int)
| Eq(a) of (Expr int, Expr int)
This adds a type index of sort t@ype
. A 'sort' in ATS is the type of
type indexes. The t@ype
is for types that have a flat memory
structure of an unknown size. The '@' sigil looks odd inside the name,
but '@' is used elsewhere in ATS when defining flat records and tuples
which is a useful mnemonic for remembering what the '@' within the
sort name means.
This doesn't help our case though as the type signatures for the constructors would be:
fun I(n:int): Expr a
fun B(b:bool): Expr a
fun Add(t1:Expr int, t2:Expr int): Expr a
fun Mul(t1:Expr int, t1:Expr int): Expr a
fun Eq(t1:Expr int, t2:Expr int): Expr a
There is still the problem that an Expr a
is not an Expr int
for
Add
, Mul
and Eq
, and requiring runtime errors when pattern
matching. This is where GADTs come in. GADTs enable defining a
datatype that provides constructors where the generic a
of the type
index can be constrained to a specific type, different for each
constructor. The Expr
datatype becomes:
datatype Expr(t@ype) =
| I(int) of int
| B(bool) of bool
| Add(int) of (Expr int, Expr int)
| Mul(int) of (Expr int, Expr int)
| Eq(bool) of (Expr int, Expr int)
The constructor type signature now match those that we wanted earlier
and it becomes possible to check at compile time for invalid
expressions. The eval
functions looks similar, but is parameterized
over the type index:
fun{a:t@ype} eval(x:Expr a): a =
case+ x of
| I i => i
| B b => b
| Add (t1, t2) => eval(t1) + eval(t2)
| Mul (t1, t2) => eval(t1) * eval(t2)
| Eq (t1, t2) => eval(t1) = eval(t2)
The "{a:t@ype}
" syntax following fun
is ATS' way of defining a
function template where a
is the template argument. ATS will
generate a new function specific for each a
type used in the
program.
An example main
function to demonstrate the new GADT enabled Expr
and eval
:
implement main0() = let
val term1 = Eq(I(5), Add(I(1), I(4)))
val term2 = Mul(I(2), I(4))
val res1 = eval(term1)
val res2 = eval(term2)
in
println!("res1=", res1, " and res2=", res2)
end
$ patscc -DATS_MEMALLOC_LIBC -o arith2 arith2.dats
$ ./arith2
res1=true and res2=8
You may have noticed that our Expr
type can't represent equality
between booleans, or between booleans and integers. It's defined to
only allow equality between integers. It's possible to make equality
polymorphic but I'll go through this later as it brings up some more
complicated edge cases in types with ATS beyond just using GADTs.
Typesafe sprintf
This example of GADT usage in ATS tries to implement a sprintf
function that allows providing a format specification to sprintf
which then processes it and accepts arguments of types as defined by
the format specification.
The format specification is defined as a GADT:
datatype Format (a:type) =
| I(int -<cloref1> a) of (Format a)
| S_(a) of (string, Format a)
| S0(string) of string
Each constructor represents part of a format string. The type index is the type that sprintf
will return given a format string for that type. sprintf
is declared as:
fun sprintf {a:type} (fmt: Format a): a
Here are some example invocations and the types that should result:
// Returns a string
sprintf (S0 "hello")
// Returns a string
sprintf (S_ ("hello", S0 "world"))
// Returns a function that takes an int and returns a string
sprintf (I (S0 "<- is a number"))
The last example shows how sprintf
emulates variable arguments using closures. By returning a function any immediately following tokens are treated as arguments to that function. For example:
sprintf (I (S0 "<- is a number")) 42
(int -> string) 42
With some syntactic sugar using currying we can make this look nicer. Curried functions is a feature that isn't often used in ATS as it requires the garbage collector (to clean up partially curried functions) and performance is diminished due to the allocation of closures but it's useful for this example.
With the Format
datatype defined, here's the implementation of sprintf
:
fun sprintf {a:type} (fmt: Format a): a = let
fun aux {a:type}(pre: string, fmt': Format a): a =
case fmt' of
| I fmt => (lam (i) => aux(append(pre, int2string(i)), fmt))
| S_ (s, fmt) => aux(append(pre, s), fmt)
| S0 s => append(pre, s)
in
aux("", fmt)
end
It uses an inner function, aux
, that does the work of deconstructing the format argument and returning a closure when needed, and appending strings as the result string is computed. For this example I'm cheating with memory management to keep the focus on GADT, and off dealing with the intricacies of linear strings. The example should be run with the garbage collector to free the allocated strings.
Here's an example of executing what we have so far:
implement main0() = let
val fmt = S_ ("X: ", (I (S_ (" Y: ", (I (S0 ""))))))
val s = sprintf fmt 5 10
in
println!(s)
end
$ ./format
X: 5 Y: 10
The format string is pretty ugly though. Some helper functions make it a bit more friendly:
fun Str {a:type} (s: string) (fmt: Format a) = S_ (s, fmt)
fun Int {a:type} (fmt: Format a) = I (fmt)
fun End (s:string) = S0 (s)
infixr 0 >:
macdef >: (f, x) = ,(f) ,(x)
Now the format definition is:
val fmt = Str "X: " >: Int >: Str " Y: " >: Int >: End ""
The string handling functions that I threw together to append strings and convert an integer to a string without dealing with linear strings follow.
fun append(s0: string, s1: string): string = let
extern castfn from(s:string):<> Strptr1
extern castfn to(s:Strptr0):<> string
extern prfun drop(s:Strptr1):<> void
val s0' = from(s0)
val s1' = from(s1)
val r = strptr_append(s0', s1')
prval () = drop(s0')
prval () = drop(s1')
in
to(r)
end
fun int2string(x: int): string =
tostring(g0int2string x) where {
extern castfn tostring(Strptr0):<> string
}
Issues
These example of GADT usage were pulled from the papers and articles mentioned at the beginning of the post. They were written for Haskell or Dependent ML (a predecessor to ATS). There may be better ways of approaching the problem in ATS and some constraints on the way ATS generates C code limits how GADTs can be used. One example of the limitation was mentioned in the first example of the expression evaluator.
If the Expr
datatype is changed so that Eq
can work on any type, not just int
, then a compile error results:
datatype Expr(a:t@ype) =
| I(a) of int
| B(a) of bool
| Add(a) of (Expr int, Expr int)
| Mul(a) of (Expr int, Expr int)
| Eq(a) of (Expr a, Expr a)
The error occurs due to the Eq
branch in the case statement in eval
:
fun{a:t@ype} eval(x:Expr a): a =
case+ x of
| I i => i
| B b => b
| Add (t1, t2) => eval(t1) + eval(t2)
| Mul (t1, t2) => eval(t1) * eval(t2)
| Eq (t1, t2) => eval(t1) = eval(t2)
The compiler is unable to find a match for the overloaded =
call for the general case of any a
(vs the specific case of int
as it was before). This is something I've written about before and the workaround in that post gets us past that error:
extern fun{a:t@ype} equals(t1:a, t2:a): bool
implement equals<int>(t1,t2) = g0int_eq(t1,t2)
implement equals<bool>(t1,t2) = eq_bool0_bool0(t1,t2)
fun{a:t@ype} eval(x:Expr a): a =
case+ x of
| I i => i
| B b => b
| Add (t1, t2) => eval(t1) + eval(t2)
| Mul (t1, t2) => eval(t1) * eval(t2)
| Eq (t1, t2) => equals(eval(t1), eval (t2))
Now the program typechecks but fails to compile the generated C code. This can usually be resolved by explicitly stating the template parameters in template function calls:
fun{a:t@ype} eval(x:Expr a): a =
case+ x of
| I i => i
| B b => b
| Add (t1, t2) => eval<int>(t1) + eval<int>(t2)
| Mul (t1, t2) => eval<int>(t1) * eval<int>(t2)
| Eq (t1, t2) => equals<?>(eval<?>(t1), eval<?>(t2))
The problem here is what types to put in the '?
' in this snippet? The Expr
type defines this as being any a
but we want to constrain it to the index used by t1
and t2
but I don't think ATS allows us to declare that. This mailing list thread discusses the issue with some workarounds.