Bluish Coder

Programming Languages, Martial Arts and Computers. The Weblog of Chris Double.


2010-09-01

Dependent Types in ATS

Dependent types are types that depend on the values of expressions. ATS uses dependent types and in this post I hope to go through some basic usage that I've learnt as I worked my way through the documentation, examples and various papers.

While learning about dependent types in ATS I used the following resources:

Most of the examples that follow are based on examples in those resources.

Sorts and Types

Some dependently typed languages allow types to depend on values expressed in the language itself. ATS provides a restricted form of dependent types: the expressions that types can depend on are written in a restricted constraint language rather than the full language of ATS. This constraint language is itself typed and, to prevent confusion, the types in that language are called 'sorts'. References to 'sort' in this post refer to types in that constraint language. References to 'type' refer to types in the core ATS language itself.

Here is an example of the difference. The following function takes two numbers and returns their sum. It is written in a hypothetical dependently typed language where types can depend on values in the language itself:

fun add(a: int, b: int): int (a+b) = a + b

Here the result type is the exact integer type of the two numbers added together. A mistake in the body of the code that resulted in anything but the sum of the two numbers would be a type error. In ATS this function would look like:

fun add {m,n:int} (a: int m, b: int n): int (m+n) = a + b

Notice here the introduction of {m,n:int}. This is the 'constraint language' used for values that the types in ATS can depend on. Here we declare two values, m and n, of sort int. The two arguments to add are a and b, of type int m and int n respectively: the types of the exact integers represented by m and n. The result type is the integer that is the sum of these two values. Note that the dependent type indices (the m, n and m+n) are variables and computations expressed in the constraint language, not variables in ATS itself (the a, b and a+b).

Having a restricted constraint language for type values simplifies the type checking process. All computations in this language are pure and have no effects. Sorts and functions in the language must terminate. This avoids infinite loops and exceptions during typechecking.

For more information on the reasoning behind restricted dependent types see Dependent Types in Practical Programming and other papers at the ATS website.

In the ATS documentation the restricted constraint language is called the 'statics' of ATS and is documented in chapter 5 of the ATS user guide.

Simple Lists

Before I get into datatypes that use dependent types, I'll do a quick overview of non-dependent types for those not familiar with ATS syntax. A basic 'list' type that can contain integers can be defined in ATS as:

datatype list =
  | nil
  | cons of (int, list)

With this type defined a list of integers can be created using the following syntax:

val a = cons(1, cons(2, cons(3, cons(4, nil))))

Functions that operate over lists can use pattern matching to deconstruct the list:

fun list_length(xs: list): int = 
  case+ xs of
  | nil () => 0
  | cons (_, xs) => 1 + list_length(xs)

fun list_append(xs: list, ys:list): list =
  case+ xs of
  | nil () => ys
  | cons (x, xs) => cons(x, list_append(xs, ys))

A complete example program is in dt1.dats (html). This can be built with the command:

atscc -o dt1 -D_ATS_GCATS dt1.dats

Note the -D_ATS_GCATS. This tells the ATS compiler to link against the garbage collector. This is needed because values of types defined with datatype are allocated on the heap and require the garbage collector to reclaim them.

Polymorphic Lists

Instead of a list of integers we might want lists of different types. A list that is polymorphic can be defined using:

datatype list (a:t@ype) =
  | nil (a)
  | cons (a) of (a, list a)

Here the list can hold elements of a type of any size; this is what the t@ype sort indicates. The functions that operate on these polymorphic lists are similar to the non-polymorphic versions. The difference is that they are 'template' functions, parameterized by the element type:

fun{a:t@ype} list_length (xs: list a): int = 
  case+ xs of
  | nil () => 0
  | cons (_, xs) => 1 + list_length(xs)

fun{a:t@ype} list_append(xs: list a, ys: list a): list a =
  case+ xs of
  | nil () => ys
  | cons (x, xs) => cons(x, list_append(xs, ys))

fun{a,b:t@ype} list_zip(xs: list a, ys: list b): list ('(a,b)) =
  case+ (xs, ys) of
  | (nil (), nil ()) => nil ()
  | (cons (x, xs), cons (y, ys)) => cons('(x,y), list_zip(xs, ys))
  | (_, _) => nil ()

The {a:t@ype} immediately after the fun keyword identifies the function as a template function. These are very similar to C++ style templates. See the ATS Parametric Polymorphism and Templates tutorial for more details.

This example adds a definition for list_zip now that lists of things other than integers can be created. In this example we return a list of tuples. Each tuple contains the elements from the original source lists.

The complete example program is in dt2.dats (html). The example program has the following code:

val a = cons(1, cons(2, cons(3, cons(4, nil))))
val b = cons(5, cons(6, cons(7, cons(8, nil))))
val lena = list_length(a)
val lenb = list_length(b)
val c = list_append(a, b)
val d = list_zip(a, c) // <== different lengths!
val lend = list_length(d)

Note that the length of list a is 4, whereas the length of list c is 8. Calling list_zip with these two lists of different lengths results in a list of length 4 being returned.

We can encode the length of a list as part of the type to get a compile error if an attempt is made to zip two lists with different lengths. The length that is part of the type would be a dependent type (as the length is the value of an expression - the integer length of the list).

Dependently Typed Lists

The following datatype definition defines a polymorphic list of length n, where n is an integer:

datatype list (a:t@ype, int) =
  | nil (a, 0)
  | {n:nat} cons (a, n+1) of (a, list (a, n))

It is very similar to the previous polymorphic list definition except for the additional int index. The type constructor for nil has this set to 0: a nil list is a list of length 0. The cons type constructor shows that consing onto a list of length n produces a list of length n+1. The {n:nat} constrains n to be a natural number (a non-negative integer).

The implementations of the functions shown previously are again function templates for this new list type. They also take an additional parameter for the length value:

fun{a:t@ype} list_length {n:nat} (xs: list (a, n)): int n = 
  case+ xs of
  | nil () => 0
  | cons (_, xs) => 1 + list_length(xs)

fun{a:t@ype} list_append {n1,n2:nat} (xs: list (a, n1), ys: list (a, n2)): list (a, n1+n2) =
  case+ xs of
  | nil () => ys
  | cons (x, xs) => cons(x, list_append(xs, ys))

fun{a,b:t@ype} list_zip {n:nat} (xs: list (a, n), ys: list (b, n)): list ('(a,b), n) =
  case+ (xs, ys) of
  | (nil (), nil ()) => nil ()
  | (cons (x, xs), cons (y, ys)) => cons('(x,y), list_zip(xs, ys))

Notice the type of the return value of list_length is the type for the specific integer value of n - which is the length of the list. This means that any error in the implementation that would result in a different value being returned is a compile time error. For example, this won't typecheck:

fun{a:t@ype} list_length {n:nat} (xs: list (a, n)): int n = 
  case+ xs of
  | nil () => 1
  | cons (_, xs) => 1 + list_length(xs)

Similarly the type of the return value of list_append is defined as being a list of length n1+n2. That is, the sum of the length of the two input lists. The following will fail with a compile error:

fun{a:t@ype} list_append {n1,n2:nat} (xs: list (a, n1), ys: list (a, n2)): list (a, n1+n2) =
  case+ xs of
  | nil () =>  nil
  | cons (x, xs) => cons(x, list_append(xs, ys))

It is now a compile error to pass lists of different lengths to list_zip. This is because both input arguments are defined to be of the same length n. This list_zip usage from the previous polymorphic list example is now a compile error.

The complete example program is in dt3.dats (html).

Filter

It is not always possible to know the exact length of a result list, which can make encoding the length in the type problematic. Consider a filter function that takes a list and returns a list containing only those elements for which a predicate function returns true. Here the typechecker would need to be able to call the predicate function to determine the length of the result list. The following does not typecheck:

fun{a:t@ype} list_filter {m:nat} (
  xs: list (a, m),
  f: a -<> bool
  ): list (a, m) =
  case+ xs of
  | nil () => nil ()
  | cons (x, xs) => if f(x) then cons(x, list_filter(xs, f)) else list_filter(xs, f)

In this erroneous example the result is defined as a list of length m. But it won't be of length m if the result of the predicate function means elements are skipped. A definition that typechecks uses an existential quantifier (the [n:nat]) to say that the result has some length n:

fun{a:t@ype} list_filter {m:nat} (
  xs: list (a, m),
  f: a -<> bool
  ): [n:nat] list (a, n) =
  case+ xs of
  | nil () => nil ()
  | cons (x, xs) => if f(x) then cons(x, list_filter(xs, f)) else list_filter(xs, f)

This unfortunately means that the typechecker won't detect some kinds of erroneous code. We'd like the following to fail to compile, since it produces a result list larger than the original list, which should be impossible:

fun{a:t@ype} list_filter {m:nat} (
  xs: list (a, m),
  f: a -<> bool
  ): [n:nat] list (a, n) =
  case+ xs of
  | nil () => nil ()
  | cons (x, xs) => if f(x) then cons(x, cons(x, list_filter(xs, f))) else list_filter(xs, f)

The solution to this is to limit the existential type in the result to be all natural numbers less than or equal to the length of the input list:

fun{a:t@ype} list_filter {m:nat} (
  xs: list (a, m),
  f: a -<> bool
  ): [n:nat | n <= m] list (a, n) =
  case+ xs of
  | nil () => nil ()
  | cons (x, xs) => if f(x) then cons(x, list_filter(xs, f)) else list_filter(xs, f)

Note the [n:nat | n <= m] which defines the limit. This will now fail to compile the erroneous code but successfully compile the correct code. It won't catch all possible errors, but it catches more than the previous version.

The complete example program is in dt4.dats (html).

Drop

The implementation of a list_drop function, which removes the first n items from a list, also has a similar problem to that of list_filter. This definition won't typecheck:

fun{a:t@ype} list_drop {m:nat} (
  xs: list (a, m),
  count: int
  ): list (a, m) =
  case+ xs of
  | nil () => nil ()
  | cons (x, xs2) => if count > 0 then list_drop(xs2, count - 1) else xs

Like list_filter this is due to the wrong size being used in the result list. We could use the same solution as list_filter and give the result size via an existential type, but in this case we actually know the result size: it is determined by the count that is passed as an argument. The result list size should be the size of the input list less the count. Here's the new definition:

fun{a:t@ype} list_drop {m,n:nat | n <= m} (
  xs: list (a, m),
  count: int n
  ): list (a, m - n) =
  case+ xs of
  | nil () => nil ()
  | cons (x, xs2) => if count > 0 then list_drop(xs2, count - 1) else xs

This version will typecheck correctly. It will give a compile time error if the implementation incorrectly produces a list that is not exactly the expected size. It will also be a compile time error if the given count of items to drop is greater than the size of the list.

This is done by making count a singleton integer: its type is an integer of a specific value, called n. When n is declared we state that it must be less than or equal to the size of the list, m, as seen in the definition {m,n:nat | n <= m}. The result list is required to be of size m-n.

The complete example program is in dt5.dats (html).

Lists that depend on their values

The previous list examples were datatypes that depended on the size of the list. It's also possible to depend on the values stored in the list itself. The following is the definition of a list that can only hold even numbers:

datatype list =
  | nil of ()
  | {x:int | (x mod 2) == 0 } cons of (int x, list)

The cons constructor takes an int and a list. It depends on the value of the int, with the constraint that the value must be divisible by 2. It is now a compile error to put an odd number into the list. This won't typecheck due to the attempt to pass the odd number 3 to cons:

val a = cons(2, cons(3, cons(10, cons(6, nil))))

The complete example program is in dt6.dats (html).

Another example of a list that depends on its values might be a list where all elements must be no greater than 100 and in order. It should be a type error to construct an unordered list.

datatype olist (int) =
  | nil (100) of ()
  | {x,y:int | x <= y} cons (x) of (int x, olist (y))

An olist is indexed by an int. This int is a value that every element consed onto the list must be less than or equal to. The nil constructor uses 100 for this value, so all items must be at most 100. Like the previous example, the cons constructor depends on its first argument. It uses this in the constraint to ensure the new element is less than or equal to the index of the tail list (the y).

Out of order construction like the following will be a type error:

val a = cons(1, cons(12, cons(10, cons(12, nil))))

Whereas this is fine:

val a = cons(1, cons(2, cons(10, cons(12, nil))))

The complete example program is in dt7.dats (html).

Red-Black Trees

The paper Dependently Typed Datastructures has an example of using dependently typed datatypes to implement red-black trees. The paper uses the language DML. I've translated this to ATS in the example that follows.

A red-black tree is a balanced binary tree which satisfies the following conditions:

  • All nodes are marked with a colour, red or black.
  • All leaves are marked black and all other nodes are either red or black.
  • For every node there are the same number of black nodes on every path connecting the node to a leaf. This is called the black height of the node.
  • The two children of every red node must be black.

These constraints can be defined in the red-black tree datatype, ensuring that the tree is always balanced and correct as per the conditions above. It becomes impossible to implement functions that produce an incorrect red-black tree, since such code won't typecheck. The datatype can be defined as:

sortdef color = {c:nat | 0 <= c; c <= 1}
#define red 1
#define black 0

datatype rbtree(int, int, int) =
  | E(black, 0, 0)
  | {cl,cr:color} {bh:nat}
     B(black, bh+1, 0)
       of (rbtree(cl, bh, 0), int, rbtree(cr, bh, 0))
  | {cl,cr:color} {bh:nat}
     R(red, bh, cl+cr)
       of (rbtree(cl, bh, 0), int, rbtree(cr, bh, 0))

The type rbtree is indexed by (int, int, int). These are the color of the node, the black height of the tree and the number of color violations respectively. The latter is a count of the number of times a red node is followed by another red node. From the conditions given earlier it can be seen that a correct rbtree always has a color violation count of zero.

The constructor, E, is a leaf node. This node is black, has a height of zero and no color violations. It is a valid rbtree.

The constructor B is a black node. It takes 3 arguments. The left child node, the integer value stored as the key, and the right child node. The type for a node constructed by B is black, has a height one greater than the child nodes and no color violations. Note that the black height of the two child nodes must be equal.

The constructor R is a red node. It takes 3 arguments, the same as the B constructor. As this is a red node it doesn't increase the black height so uses the same value of the child nodes. The color violations value is calculated by adding the color values of the two child nodes. If either of those are red then the color violations will be non-zero.

The type for a function that inserts a key into the tree can now be defined as:

fun insert {c:color} {bh:nat} ( 
  key: int, 
  rbt: rbtree(c ,bh, 0)
  ): [c:color] [bh:nat] rbtree(c, bh, 0)

This means our implementation of insert must return a correct rbtree. It cannot have a color violations value that is not zero so it must conform to the conditions we outlined earlier. If it doesn't, it won't compile.

When inserting a node into the tree we can end up with a tree where the red-black tree conditions are violated. A function restore is defined below that pattern matches the combinations of invalid nodes and performs the required transformations to return an rbtree with no violations:

fun restore {cl,cr:color} {bh:nat} {vl,vr:nat | vl+vr <= 1} (
  left: rbtree(cl, bh, vl),
  key: int,
  right: rbtree(cr, bh, vr)
  ): [c:color] rbtree(c, bh + 1, 0) =
  case+ (left, key, right) of
    | (R(R(a,x,b),y,c),z,d) => R(B(a,x,b),y,B(c,z,d))
    | (R(a,x,R(b,y,c)),z,d) => R(B(a,x,b),y,B(c,z,d))
    | (a,x,R(R(b,y,c),z,d)) => R(B(a,x,b),y,B(c,z,d))
    | (a,x,R(b,y,R(c,z,d))) => R(B(a,x,b),y,B(c,z,d))
    | (a,x,b) =>> B(a,x,b)

The type of the restore function states that it takes a left and right node, one of which may have a color violation, and returns a correctly formed red-black tree node. It's time consuming and error prone to check by hand that the code covers all the required cases to return a correctly formed tree. However the typechecker will do this for us, thanks to the constraints that have been defined on the function and the rbtree type. It won't compile if any of the pattern match lines are removed, for example.

The use of =>> in the last pattern match line is explained in the tutorial on pattern matching. The ATS typechecker will typecheck each pattern line independently of the others. This can cause a typecheck failure in the last match since it doesn't take into account the previous patterns and can't determine that the color violation value of a in the result will be zero. By using =>> we tell ATS to typecheck the clause under the assumption that the other clauses have not been taken. Since they all handle the non-zero color violation case this line will then typecheck.

The insert function itself is implemented as follows:

fun insert {c:color} {bh:nat} (
  x: int,
  t: rbtree(c ,bh, 0)
  ): [c:color] [bh2:nat] rbtree(c, bh2, 0) = let
  fun ins {c:color} {bh:nat} (
    t2: rbtree(c, bh, 0)
  ):<cloptr1> [c2:color] [v:nat | v <= c] rbtree(c2, bh, v) =
    case+ t2 of
    | E () => R(E (), x, E ())
    | B(a, y, b) => if x < y then restore(ins(a), y, b)
                    else if y < x then restore (a, y, ins(b))
                    else B(a, y, b)
    | R(a, y, b) => if x < y then R(ins(a), y, b)
                    else if y < x then R(a, y, ins(b))
                    else R(a, y, b)
  val t = ins(t)
in
  case+ t of
  | R(a, y, b) => B(a, y, b)
  | _ =>> t
end

The complete example program is in dt8.dats (html). For another example of red-black trees in ATS see the funrbtree example from the ATS website.

Linear constraints

The constraints generated by the dependent types must be 'linear' constraints. An example of a linear constraint is the definition of 'list_append' earlier:

fun{a:t@ype} list_append {n1,n2:nat} (
  xs: list (a, n1), ys: list (a, n2)
  ): list (a, n1+n2)

The result type containing n1+n2 will typecheck fine. However an example that won't typecheck is the following definition of a list_concat:

fun{a:t@ype} list_concat {n1,n2:nat} (
  xss: list (list (a, n1), n2)
  ): list (a, n1*n2)

The n1*n2 results in a non-linear constraint being generated and the ATS typechecker cannot resolve this. The solution is to use theorem-proving. See chapter 6 in the ATS user guide for details and an example using concat.

Closing notes

Although I create my own list types in these examples, ATS has lists, vectors and many other data structures in the standard library.

There's a lot more that can be done with ATS and dependent types. For more examples see the papers mentioned at the beginning and throughout this post. The paper Why Dependent Types Matter is also useful reading for more on the topic.

Tags: ats 

2010-08-24

Experimental Playback Statistics for HTML Video and Audio

Now that HTML video is getting more usage there have been requests for statistics during playback so performance can be measured from JavaScript. This has come up a few times on the WHATWG mailing list.

I raised bug 580531 to add some additional data to media and video DOM elements to provide this information. The patch in that bug adds two attributes to the DOM HTMLVideoElement object, both prefixed by 'moz' to prevent name clashes as they are experimental:

interface nsIDOMHTMLVideoElement : nsIDOMHTMLMediaElement
{
   ...
   // A count of the number of frames that have been decoded and ready
   // to be displayed. This can be used for computing the decoding framerate
   readonly attribute unsigned long mozDecodedFrames;

   // A count of the number of frames that have been dropped for performance
   // reasons during playback.
   readonly attribute unsigned long mozDroppedFrames;
};

mozDecodedFrames is a count that is incremented each time a frame is decoded and ready to be displayed. mozDroppedFrames is incremented each time a frame is dropped to keep playback going at the correct frame rate. By combining the two counts, client JavaScript code can compute the framerate the video is actually being decoded at and the framerate it should be decoding at, giving an indication of whether the hardware is able to play the video.
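As a sketch of how a page might use these counters (the helper name and sampling scheme here are my own, not part of the patch), you can sample the two attributes twice and derive a decode framerate and a drop percentage:

```javascript
// Compute playback statistics from two samples of a video element's
// frame counters taken intervalMs apart. Each sample is an object with
// the experimental mozDecodedFrames and mozDroppedFrames attributes.
function frameStats(before, after, intervalMs) {
  const decoded = after.mozDecodedFrames - before.mozDecodedFrames;
  const dropped = after.mozDroppedFrames - before.mozDroppedFrames;
  return {
    // frames decoded per second over the sampling interval
    decodeFps: decoded * 1000 / intervalMs,
    // share of frames dropped, as a percentage of all frames handled
    dropPercent: decoded + dropped > 0
      ? 100 * dropped / (decoded + dropped)
      : 0
  };
}

// Hypothetical usage against a <video> element v:
//   const before = { mozDecodedFrames: v.mozDecodedFrames,
//                    mozDroppedFrames: v.mozDroppedFrames };
//   setTimeout(() => {
//     const after = { mozDecodedFrames: v.mozDecodedFrames,
//                     mozDroppedFrames: v.mozDroppedFrames };
//     console.log(frameStats(before, after, 1000));
//   }, 1000);
```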

Another requested feature is information on the download rate. Currently the Firefox implementation of HTML video and audio uses a non-standard extension to 'progress' events. We provide information attached to the event that gives the current byte position in the media file as it downloads, and the total bytes available. This was actually required by the WHATWG spec at one point but it changed a while back and I don't think any other browser implements it.

This has been used in experimental pages to compute things like download rate and we use it in our own controls implementation to display a progress bar as we didn't implement the 'buffered' attribute. Support for 'buffered' has now landed so we want to remove the non-standard 'progress' information to be spec compliant.

To address the needs of those using the progress data for bandwidth calculation I've added two attributes to HTMLMediaElement:

interface nsIDOMHTMLMediaElement : nsIDOMHTMLElement
{
   ...
   // The approximate rate at which the media resource is being downloaded in
   // bytes per second. If the rate is not available (possibly due
   // to not having downloaded enough data to compute a consistent value)
   // this will be NaN.
   readonly attribute float mozDownloadRate;

   // The approximate rate at which the media resource is being decoded in
   // bytes per second. If the rate is not available this will be
   // NaN.
   readonly attribute float mozDecodeRate;
};

mozDownloadRate is an estimate, in bytes per second, of the current download rate of the media. If not enough information has been downloaded for a reliable estimate this will be NaN. mozDecodeRate provides the rate at which decoding is occurring. In the patch this is the byte length of the media divided by its duration, and remains constant.
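For example (again just a sketch, with a helper name of my own), a page could compare the two rates to judge whether the download is keeping ahead of playback:

```javascript
// Compare the experimental rate attributes (both in bytes per second;
// NaN when no estimate is available yet). Returns true when the data is
// arriving at least as fast as it is being consumed by the decoder.
function downloadKeepingUp(downloadRate, decodeRate) {
  if (Number.isNaN(downloadRate) || Number.isNaN(decodeRate)) {
    return false; // no reliable estimate yet, so be conservative
  }
  return downloadRate >= decodeRate;
}

// Hypothetical usage against a media element v:
//   if (downloadKeepingUp(v.mozDownloadRate, v.mozDecodeRate)) { ... }
```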

Whether this gets landed is yet to be determined but I think the information is useful and I'd be interested in any feedback on the data I've decided to include. There is some feedback in the bug from the patch review already and there's plenty of time between now and when the Firefox 4 rush is over to look over ideas of what could be included, removed or changed. I posted about the proposed additions in the WHATWG mailing list and there are some responses in that thread.

Tags: mozilla  firefox  video 

2010-08-11

Concurrency in ATS

ATS uses native operating system threads for concurrency. It provides a wrapper around pthreads and a higher level API, called 'parworkshop', for creating worker threads and using them for parallel computation.

'parworkshop' is described in a mailing list announcement as:

...a package named 'parworkshop' that allows the programmer to build a 'workshop', hire 'workers', submit 'works', etc. Hopefully, you get the idea :) Right now, each 'worker' is built on top of a pthread. In the long run, I expect that more sophisticated workshops can be built to better take advantage of the underlying hardware/architecture.

In this post I've done two examples using 'parworkshop' to learn how to use it. The first is simple with no actual computation done. The second is based on a programming example posted on reddit recently.

Simple example

To use 'parworkshop' include two files:

staload "libats/SATS/parworkshop.sats"
staload "libats/DATS/parworkshop.dats"

The SATS file contains the declarations for the functions and datatypes. The DATS file is needed for implementations of some of the template functions.

The basic steps to using 'parworkshop' involve:

  1. Create a workshop using 'workshop_make'.
  2. Add workers to the workshop using 'workshop_add_worker'. Each worker is a native thread.
  3. Insert one or more units of work to be processed by a worker using 'workshop_insert_work'.
  4. Wait for all units of work to be completed using 'workshop_wait_blocked_all'.
  5. Tell the workers to quit. This can be done by using 'workshop_insert_work' with some sentinel to tell the worker to quit.
  6. Wait for all workers to quit using 'workshop_wait_quit_all'.
  7. Free the workshop using 'workshop_free' or 'workshop_free_vt_exn'.

Each worker created in step 2 above is a native thread. This thread blocks on a queue after being created. 'workshop_insert_work' (step 3 above) adds items onto the queue. One of the worker threads will then wake up, call a function that gets the unit of work passed as an argument and then resume blocking on the queue. The function that gets called by the worker thread is one of the parameters in 'workshop_make' created in step 1.

The definition of 'workshop_make' is:

fun{a:viewt@ype} workshop_make {n:pos} (
  qsz: size_t n
, fwork: {l:agz} (!WORKSHOPptr (a, l), &a >> a?) -<fun1> int
) : WORKSHOPptr a

From this you can see that 'workshop_make' is a template function that takes two arguments. The template function is parameterized over the type of the work item (the 'a' in the definition). The first of the two arguments is a number for the initial size of the queue that is used for adding work items. The second is a pointer to the function that gets called in a worker thread when a work item is added to the queue. That function must have the definition:

{l:agz} (!WORKSHOPptr (a, l), &a >> a?) -<fun1> int

This means the worker thread function receives as an argument the created workshop (ie. the result from 'workshop_make') and a reference to the work item. It returns an 'int' value. The return value of this worker function tells the worker thread whether it should quit or continue processing queue items. The return values are:

  • return status > 0 : the worker is to continue
  • return status = 0 : the worker is to quit
  • return status = ~1 : the worker is to pause

For example, a worker function that takes an 'int' as the unit of work and prints the value of it could be:

fun fwork {l:addr} (
  ws: !WORKSHOPptr(work,l),
  x: &int >> int?
  ): int = let
  val () = printf("x = %d\n", @(x))
in 
 if x < 0 then 0 else 1
end

In this example if the value of the 'int' is less than zero then the worker thread will quit. To create a workshop and some workers using this function:

val ws = workshop_make<int>(1, fwork)
val _ = workshop_add_worker(ws)
val _ = workshop_add_worker(ws)

Notice that the type of the work unit (the 'int') is passed as a template parameter to 'workshop_make', very similar to C++ syntax for explicit use of template functions. To add work items:

val () = workshop_insert_work(ws, 1)
val () = workshop_insert_work(ws, 2)
val () = workshop_insert_work(ws, 3)

This will queue three items which will cause the threads to wake up and start processing them. We can then use 'workshop_wait_blocked_all' to wait for them to finish and post work items telling the worker threads to quit:

val () = workshop_wait_blocked_all(ws)
val nworker = workshop_get_nworker(ws)
var i: Nat = 0
val () = while(i < nworker) let
           val () = workshop_insert_work(ws, ~1)
         in i := i + 1 end  

This uses 'workshop_get_nworker' to get a count of the number of worker threads that exist. It then loops over this count, inserting a work item less than zero for each worker so that all worker threads exit. The final steps are to wait until they quit and free the workshop:

val () = workshop_wait_quit_all(ws)
val () = workshop_free(ws)

The complete program is:

staload "libats/SATS/parworkshop.sats"
staload "libats/DATS/parworkshop.dats"

typedef work = int

fun fwork {l:addr} (
  ws: !WORKSHOPptr(work,l),
  x: &work >> work?
  ): int = let
  val () = printf("x = %d\n", @(x))
in 
  if x < 0 then 0 else 1
end

implement main() = let
  val ws = workshop_make<work>(1, fwork)
  val status = workshop_add_worker(ws)
  val status = workshop_add_worker(ws)

  val () = workshop_insert_work(ws, 1)
  val () = workshop_insert_work(ws, 2)
  val () = workshop_insert_work(ws, 3)

  val () = workshop_wait_blocked_all(ws)
  val nworker = workshop_get_nworker(ws)
  var i: Nat = 0
  val () = while(i < nworker) let
             val () = workshop_insert_work(ws, ~1)
           in i := i + 1 end  

  val () = workshop_wait_quit_all(ws)
  val () = workshop_free(ws)
in
  ()
end

This can be compiled with (assuming it's in a file called 'eg1.dats'):

atscc -o eg1 eg1.dats -D_ATS_MULTITHREAD -lpthread

Running it gives:

$ ./eg1
x = 1
x = 2
x = 3
x = -1
x = -1

Note that 'parworkshop' makes use of the type system to ensure that resources are used correctly. It is a compile time error for example not to call 'workshop_free'.

More complex example

While the previous example is simple it does show the basic steps that need to be done in most 'parworkshop' usage. But in most parallel programming tasks you want the worker thread to process some data and return a result somehow. This result is then used for further processing. There are a few examples in the ATS distribution that show how to do this. I'll cover one approach below.

A thread on programming.reddit.com recently resulted in a number of implementations of a simple programming task in various programming languages. Some of these were implemented in concurrent languages and I tried my hand at a version in ATS using 'parworkshop'.

The task involves approximating the value of the mathematical constant 'e' by taking the following approach (description from here):

Select a random number between 0 and 1. Now select another and add it to the first. Keep doing this, piling on random numbers. How many random numbers, on average, do you need to make the total greater than 1? The answer is e.
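Why the average is e can be seen with a short calculation. If N is the number of uniform draws needed for the running total to exceed 1, then the probability that more than n draws are needed is the volume of the n-simplex:

```latex
P(N > n) = P(U_1 + \cdots + U_n \le 1) = \frac{1}{n!},
\qquad
E[N] = \sum_{n=0}^{\infty} P(N > n) = \sum_{n=0}^{\infty} \frac{1}{n!} = e.
```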

The code to do this is fairly simple in ATS and you can see it in the completed result later. I ended up with a function called 'n_attempts' that takes the number of iterations to try and returns the total count of attempts. This can then be divided by the number of iterations to get the average, and therefore the estimate of 'e':

fun n_attempts (n:int): int
...
var e = n_attempts(10000000) / double_of_int(10000000)
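For readers unfamiliar with ATS syntax, here is the same sampling procedure sketched in JavaScript (the function names are mine, not from the ATS version):

```javascript
// Count how many uniform [0,1) draws are needed for their sum to
// exceed 1. rng is a function returning the next random number.
function attemptsToExceedOne(rng) {
  let sum = 0, count = 0;
  while (sum <= 1) {
    sum += rng();
    count += 1;
  }
  return count;
}

// Average that count over many iterations; the average approaches e.
function estimateE(iterations, rng = Math.random) {
  let total = 0;
  for (let i = 0; i < iterations; i++) {
    total += attemptsToExceedOne(rng);
  }
  return total / iterations;
}
```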

To make use of multiple cores I decided on creating a worker thread for each core. Each of these worker threads runs a portion of the total number of desired iterations. The unit of work is a data type 'command' that can be one of two things:

  1. 'Compute' which tells the worker thread to perform a number of iterations. The 'Compute' contains a pointer to an integer that is used to hold the result produced by the worker thread.
  2. 'Quit' which tells the worker thread to exit.

This data type is defined as:

dataviewtype command = 
  | {l:agz} Compute of (int @ l | ptr l, int)
  | Quit 

viewtypedef work = command

Notice the use of 'dataviewtype' instead of 'datatype'. The latter would require linking against the garbage collector to manage reclamation of the allocated data. The former results in a 'linear data type' which needs to be explicitly freed and therefore doesn't need the garbage collector. My goal was to make this example not need the garbage collector.

The worker thread function looks like:

fun fwork {l:addr}
  (ws: !WORKSHOPptr(work,l), x: &work >> work?): int = 
  case+ x of
  | ~Compute (pf_p | p, iterations)  => let 
       val () = !p := n_attempts(iterations)
       extern prfun __unref {l:addr} (pf: int @ l):void
       prval () = __unref(pf_p)
     in 1 end
  | ~Quit () => 0

It switches on the value of the work unit using a 'case' statement. In the case of 'Quit' it returns '0' to exit the worker thread. In the case of 'Compute' it calls 'n_attempts' with the number of iterations and stores the result at the integer pointer. It then returns '1' so the worker thread waits for more work units. The '~' before each constructor in the 'case' patterns tells ATS to release the memory for the linear value ('Compute' and 'Quit') so we don't need to free it manually.
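Ignoring the proofs for a moment, the runtime shape of this is just a heap-allocated tagged union that the worker frees after handling each item. A rough C analogue (hand-written for illustration, not anything ATS generates; 'n_attempts' is stubbed out):

```c
#include <stdlib.h>

enum tag { COMPUTE, QUIT };

/* The C shape of the 'command' dataviewtype: a tag plus constructor arguments. */
struct command {
  enum tag tag;
  int *result;     /* Compute: where to store the result */
  int iterations;  /* Compute: how many iterations to run */
};

/* Stand-in for the real computation. */
static int n_attempts(int n) { return n; }

/* Returns 1 to keep waiting for more work, 0 to quit. Frees the command in
   both cases, which is what the '~' patterns in 'fwork' do implicitly. */
static int fwork(struct command *cmd) {
  int keep_going;
  if (cmd->tag == COMPUTE) {
    *cmd->result = n_attempts(cmd->iterations);
    keep_going = 1;
  } else {
    keep_going = 0;
  }
  free(cmd);
  return keep_going;
}
```

The difference is that in C nothing stops this worker from forgetting the 'free', freeing twice, or writing through a dangling 'result' pointer; in the ATS version those are compile time errors.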

When adding the work items to the workshop I create an array of 'int' to hold the results. I use a linear array and manually manage the memory:

#define NCPU 2
...
val (pf_gc_arr, pf_arr | arr) = array_ptr_alloc<int>(NCPU)
val () = array_ptr_initialize_elt<int>(!arr, NCPU, 0)
...
val () = array_ptr_free {int} (pf_gc_arr, pf_arr | arr)

When inserting the work items I need to create a 'Compute' and pass it a pointer to the 'int' where the result is stored. This pointer points to a single element of this array. The code to do this is:

fun loop {l,l2:agz} {n:nat} .< n >. (
      pf: !array_v(int, n, l2)
    | ws: !WORKSHOPptr(work, l), 
      p: ptr l2, 
      n: int n, 
      iterations: int)
    : void =
  ...
  prval (pf1, pf2) = array_v_uncons{int}(pf)
  extern prfun __ref {l:addr} (pf: ! int @ l): int @ l
  prval xf = __ref(pf1)
  val () = workshop_insert_work(ws, Compute (xf | p, iterations))
  val () = loop(pf2 | ws, p + sizeof<int>, n - 1, iterations)
  prval () = pf := array_v_cons{int}(pf1, pf2)
  ...

This code does some proof manipulation to take the proof that provides access to the entire array and return a proof for accessing a particular element. This ensures at compile time that the worker thread function can only ever access the memory for that single element. The magic that does this is 'array_v_uncons':

prval (pf1, pf2) = array_v_uncons{int}(pf)

'pf' is the proof obtained for the entire array (the 'pf_arr' returned from 'array_ptr_alloc'). 'array_v_uncons' consumes this and returns two proofs. The first ('pf1' here) is an 'int @ l' for access to the first item in the array. The second ('pf2' here) is for the rest of the array.

We can now pass 'pf1' to 'Compute' along with the pointer to that integer. This is repeated until there are no more array items to process. Notice the recursive call to 'loop' increments the pointer by the size of an 'int'. ATS is allowing us to efficiently work directly with C memory, but safely by requiring proofs that we can access the memory and that it contains valid data.

The following line occurs after the 'loop' recursive call:

prval () = pf := array_v_cons{int}(pf1, pf2)

This is needed because 'array_v_uncons' consumed our proof held in 'pf'. But the 'loop' definition states that 'pf' must not be consumed (that's the '!' in "pf: !array_v(int, n, l2)"). This last line sets the proof of 'pf' back to the original by consing together the 'pf1' and 'pf2' we created when consuming 'pf'.

The other bit of trickery here is:

extern prfun __ref {l:addr} (pf: ! int @ l): int @ l
prval xf = __ref(pf1)

The 'Compute' constructor consumes the proof it is passed. If we passed 'pf1' it would be consumed and couldn't be used in the 'array_v_cons' call later. This code creates a new proof 'xf' that is the same as the existing 'pf1' proof and passes it to the 'Compute' constructor. We later consume this proof manually inside the 'fwork' function. You probably noticed this code in the 'fwork' example earlier:

extern prfun __unref {l:addr} (pf: int @ l):void
prval () = __unref(pf_p)

This thread on the ATS mailing list discusses this and a better way of handling the issue using a '__borrow' proof function which I haven't yet implemented.

The entire program is provided below. Hopefully it's easy to follow given the previous simple example and the above explanation of the more complicated bits:

staload "libc/SATS/random.sats"
staload "libc/SATS/unistd.sats"
staload "libats/SATS/parworkshop.sats"
staload "libats/DATS/parworkshop.dats"
staload "prelude/DATS/array.dats"

#define ITER 100000000
#define NCPU 2

fn random_double (buf: &drand48_data): double = let
  var r: double
  val _ = drand48_r(buf, r)
in
  r
end

fn attempts (buf: &drand48_data): int = let 
  fun loop (buf: &drand48_data, sum: double, count: int): int = 
    if sum <= 1.0 then loop(buf, sum + random_double(buf), count + 1) else count
in
  loop(buf, 0.0, 0)
end

fun n_attempts (n:int): int = let
  var buf: drand48_data
  val _ = srand48_r(0L, buf)
  fun loop (n:int, count: int, buf: &drand48_data):int =
    if n = 0 then count else loop(n-1, count + attempts(buf), buf)
in
  loop(n, 0, buf)
end

dataviewtype command = 
  | {l:agz} Compute of (int @ l | ptr l, int)
  | Quit 

viewtypedef work = command

fun fwork {l:addr}
  (ws: !WORKSHOPptr(work,l), x: &work >> work?): int = 
  case+ x of
  | ~Compute (pf_p | p, iterations)  => let 
       val () = !p := n_attempts(iterations)
       extern prfun __unref {l:addr} (pf: int @ l):void
       prval () = __unref(pf_p)
     in 1 end
  | ~Quit () => 0

fun insert_all {l,l2:agz}
               {n:nat | n > 0}
    (pf_arr: !array_v(int, n, l2) | ws: !WORKSHOPptr(work, l),
     arr: ptr l2, n: int n, iterations: int):void = let
  fun loop {l,l2:agz} {n:nat} .< n >. (
      pf: !array_v(int, n, l2)
    | ws: !WORKSHOPptr(work, l), 
      p: ptr l2, 
      n: int n, 
      iterations: int)
    : void =
    if n = 0 then () else let
      prval (pf1, pf2) = array_v_uncons{int}(pf)
      extern prfun __ref {l:addr} (pf: ! int @ l): int @ l
      prval xf = __ref(pf1)
      val () = workshop_insert_work(ws, Compute (xf | p, iterations))
      val () = loop(pf2 | ws, p + sizeof<int>, n - 1, iterations)
      prval () = pf := array_v_cons{int}(pf1, pf2)
    in
      // nothing
    end
in
  loop(pf_arr | ws, arr, n, iterations / NCPU)
end

implement main() = let 
 val ws = workshop_make<work>(NCPU, fwork)

  var ncpu: int
  val () = for(ncpu := 0; ncpu < NCPU; ncpu := ncpu + 1) let 
            val _ = workshop_add_worker(ws) in () end

  val nworker = workshop_get_nworker(ws)

  val (pf_gc_arr, pf_arr | arr) = array_ptr_alloc<int>(NCPU)
  val () = array_ptr_initialize_elt<int>(!arr, NCPU, 0)
  prval pf_arr  = pf_arr

  val () = insert_all(pf_arr | ws, arr, NCPU, ITER)

  val () = workshop_wait_blocked_all(ws)

  var j: Nat = 0;
  val () = while(j < nworker) let
             val () = workshop_insert_work(ws, Quit ())
           in 
             j := j + 1
           end  

  val () = workshop_wait_quit_all(ws)
  val () = workshop_free_vt_exn(ws)

  var k: Nat = 0;
  var total: int = 0;
  val () = for(k := 0; k < NCPU; k := k + 1) total := total + arr[k]
  val avg = total / double_of_int(ITER)
  val () = printf("total: %d\n", @(total))
  val () = print(avg)
in
  array_ptr_free {int} (pf_gc_arr, pf_arr | arr)
end

Hongwei Xi has cleaned this up a bit by replacing 'ref' and 'unref' with '__borrow' and using a stack allocated linear array for the results. This is included in the ATS distribution as randcomp_mt.dats. Looking at that code will hopefully help clarify how these things work when compared to this code I wrote.

Compiling a single threaded version of the code (I posted it in the reddit thread here) runs in about 30 seconds on my dual core machine. The 'parworkshop' version runs in about 16 seconds. The single threaded version uses 'drand48' whereas the 'parworkshop' version uses 'drand48_r'. The latter is needed because otherwise the threads are serialized over a lock protecting the random number state in 'drand48', and the runtime becomes even slower than the single threaded version. Passing '-O3' to 'atscc' speeds things up quite a bit too.

My first attempt at using 'parworkshop' for this task is in this gist paste. Instead of a linear data type to hold the unit of work I used a linear closure and called it in the 'fwork' function to get the result. I took this approach originally based on the existing MULTICORE examples in ATS. In the end I opted for the linear datatype approach as I felt it was clearer.

Tags: ats 

2010-08-02

lin and llam with closures

While using the ATS wrapper for pthreads I noticed the use of a couple of things related to closures that I hadn't seen before and that therefore didn't get mentioned in my post on ATS closures. The signature for the function to spawn a closure in a thread is:

fun pthread_create_detached_cloptr
  (f: () -<lin,cloptr1> void): void
// end of [pthread_create_detached_cloptr]

Notice the use of 'lin' alongside 'cloptr1'. When calling this function you can't use 'lam' to create the closure due to the following compile error:

(* Using *)
val () = pthread_create_detached_cloptr(lam () => print_string("hello thread 1\n"));

(* Gives *)
test.dats: ...: error(3): nonlinear function is given a linear type.

The correct thing to do is to use 'llam':

val () = pthread_create_detached_cloptr(llam () => print_string("hello thread 1\n"));

The 'Tips for using ATS' paper available from the ATSfloat site mentions that 'llam' is equivalent to the following 'lam' usage:

val () = pthread_create_detached_cloptr(lam () =<lin,cloptr1> print_string("hello thread 1\n"));

There wasn't much usage of lin,cloptr to look at, and little mention of it anywhere, so I asked on the mailing list about it. Hongwei's reply explains the difference between using 'lin' and not using 'lin':

In ATS, it is allowed to create a closure containing some resources in its environment. In the above example, 'lin' is used to indicate that [f] is a closure whose environment may contain resources.

In the reply Hongwei shows that a version of pthread_create_detached_cloptr without the 'lin' is more restrictive than with it as that would not allow containing resources in the environment of the closure.

A (somewhat contrived) example of usage below defines a function 'test' that takes as an argument a closure tagged as lin,cloptr1. It calls the closure and then frees it. In the 'main' function I use the cURL library to call 'curl_global_init'. This returns a 'pf_gerr' resource that is used to track if the correct cURL cleanup routine is called. The following example compiles and runs fine:

staload "contrib/cURL/SATS/curl.sats"

fun test(f: () -<lin,cloptr1> void) = (f(); cloptr_free(f))

implement main() = let
  val (pf_gerr | gerr) = curl_global_init (CURL_GLOBAL_ALL)
  val () = assert_errmsg (gerr = CURLE_OK, #LOCATION)

  val () = test(llam () => curl_global_cleanup (pf_gerr | (*none*)))
in
  ()
end;

Compile with something like:

atscc -o example example.dats -lcurl

Notice that in the closure passed to 'test' the 'pf_gerr' resource is used. This means the resource is captured by the closure. A version without using 'lin' and 'llam' is:

staload "contrib/cURL/SATS/curl.sats"

fun test(f: () -<cloptr1> void) = (f(); cloptr_free(f))

implement main() = let
  val (pf_gerr | gerr) = curl_global_init (CURL_GLOBAL_ALL)
  val () = assert_errmsg (gerr = CURLE_OK, #LOCATION)

  val () = test(lam () => curl_global_cleanup (pf_gerr | (*none*)))
in
  ()
end;

This produces the compile error showing that without the 'lin' tag a closure cannot capture resources:

example.dats: ...: the linear dynamic variable [pf_gerr] is expected to be local but it is not.

A version that works without using 'lin':

staload "contrib/cURL/SATS/curl.sats"

fun test {v:view}
  (pf: v | f: (v | (*none*)) -<cloptr1> void) = (f(pf | (*none*)); cloptr_free(f))

implement main() = let
  val (pf_gerr | gerr) = curl_global_init (CURL_GLOBAL_ALL)
  val () = assert_errmsg (gerr = CURLE_OK, #LOCATION)

  val () = test(pf_gerr | lam (pf | (*none*)) => curl_global_cleanup (pf | (*none*)))
in
  ()
end; 

Here we have to explicitly pass the resource around in a number of places. The use of 'lin' and 'llam' makes the code shorter and easier to understand. In a later reply Hongwei goes on to say:

While linear closures containing resources may not be common, linear proof functions are common and very useful for manipulating and tracking resources.

Tags: ats 

2010-06-20

Closures in ATS

My last post on ATS was about C style functions. This post covers what I've learnt about closures in ATS.

As I mentioned in my last post, closures are functions combined with an environment mapping names to values. These are like closures in dynamic languages that enable you to capture variables in the enclosing scope. To create a closure you must explicitly mark it as being a closure. A closure requires memory to be allocated to store the environment. This means you will either need to allocate the closure on the stack (so it is freed automatically when the scope exits), free the memory for the closure manually, or link against the garbage collector.

Persistent Closures

A persistent closure is allocated on the heap and is freed by the garbage collector when it is no longer used. To make this type of closure you use the tag 'cloref':

fun make_adder(n1:int): int -<cloref> int =
  lam (n2) =<cloref> n1 + n2;

implement main() = let
  val add5 = make_adder(5)
  val a = add5(10)
  val b = add5(20)
  val () = printf("a: %d\nb: %d\n", @(a,b))
in
  ()
end

As this example needs to use the garbage collector it must be built with the garbage collector included (using the -D_ATS_GCATS switch):

atscc -o eg1 eg1.dats -D_ATS_GCATS

In this example the 'make_adder' function takes an integer, n1, and returns a closure. This closure takes an integer, 'n2' and returns the addition of 'n1' and 'n2'. The 'main' function creates an adder with the initial function of '5' and uses the returned closure a couple of times.

Linear Closures

A linear closure is not freed by the garbage collector. It is the responsibility of the programmer to free the memory manually. It is tracked by the type system and it is a compile time error not to call the function that frees the linear closure. Linear closures have the tag 'cloptr'. The same example as before, but using linear closures, is:

fun make_adder(n1:int): int -<cloptr> int =
  lam (n2) =<cloptr> n1 + n2;

implement main() = let
  val add5 = make_adder(5)
  val a = add5(10)
  val b = add5(20)
  val () = printf("a: %d\nb: %d\n", @(a,b))
  val () = cloptr_free(add5)
in
  ()
end

Note the change from 'cloref' to 'cloptr' in the function tags, and the addition of a 'cloptr_free' call at the end of 'main'. If you leave out the 'cloptr_free' call then a compile time error results:

$ atscc -o closure2 closure2.dats
...
error(3): the linear dynamic variable [add5] needs to be consumed but it is preserved...
...

Because we are manually managing the memory the usage of linear closures does not require the garbage collector.

Stack Allocated Closures

It is possible to allocate linear closures on the stack rather than the heap. When this is done there is no need to free the memory manually as it is automatically deallocated when the stack scope exits:

implement main() = let
  var !add5 = @lam(n2:int) => 5 + n2
  val a = !add5(10)
  val b = !add5(20)
  val () = printf("a: %d\nb: %d\n", @(a,b))
in
  ()
end

A closure is allocated on the stack using '@lam' instead of 'lam'. This is assigned to a variable 'add5'. The variable is introduced using 'var' instead of 'val' which the examples have been using so far. 'val' is for 'value declarations' and 'var' is for 'variable declarations'. The latter are basically variables allocated on the stack that can be changed during the course of evaluation. See the val and var tutorial for details.

Variables allocated on the stack have an implicit proof variable that is used to allow access to their address. The use of '!' in the example appears to be a shorthand provided by ATS that can be used to avoid explicitly managing the proofs in some circumstances. I don't entirely understand why this is needed or how it works. If anyone has any helpful tips please let me know. The pointer tutorial seems to have some coverage of this.

For this example I don't create the closure in a 'make_adder' function. If the closure was allocated on the stack and returned from 'make_adder' it would be an error, as we'd be returning a stack allocated object from a stack frame that no longer exists. The ATS type system guarantees that a closure allocated on the stack frame of a function cannot be used once the call to the function returns. A compile time error occurs if this happens:

error(3): a flat closure is needed.

Standard Prelude Higher Order Functions

ATS has implementations of the usual higher order functions over lists and arrays. Things like 'map', 'filter', etc. are available. The following example shows using a higher order function to compute the product of a list:

staload "prelude/DATS/list.dats"

implement main() = () where {
  val a = '[1,2,3]
  val x = list_fold_left_cloref<int,int> (lam (a,b) =<cloref> a*b, 1, a)
  val () = printf("result: %d\n", @(x));
}

The 'list_fold_left_cloref' is defined in 'prelude/DATS/list.dats'. It's a function template (similar to C++ templates) and needs the template types to be passed to it to be instantiated. In this case both types are 'int'. This version of the code requires the garbage collector due to the use of 'cloref'.

Variants of list_fold_left exist for use with linear closures, functions, etc. For example, with a linear closure:

staload "prelude/DATS/list.dats"

implement main() = () where {
  val a = '[1,2,3]
  val clo = lam (pf: !unit_v | a:int,b:int):int =<cloptr> a*b;
  prval pfu = unit_v ()
  val x = list_fold_left_cloptr<int,int> (pfu | clo, 1, a)
  prval unit_v () = pfu
  val () = cloptr_free(clo)
  val () = printf("result: %d\n", @(x));

}

You'll notice the use of 'prval' in this version. The definition of 'list_fold_left_cloptr' is:

fun{init,a:t@ype} list_fold_left_cloptr {v:view} {f:eff}
  (pf: !v | f: !(!v | init, a) -<cloptr,f> init, ini: init, xs: List a):<f> init

This shows that both 'list_fold_left_cloptr' and the argument 'f' need a proof argument (anything to the left of the pipe '|' character, '!v' in this case). The example doesn't make use of the proof argument so we create one (unit_v ()) and release it later (the second 'prval' usage). Like I mentioned with proofs before, I don't quite understand why this takes a proof argument or why it passes it to the closure. Enlightenment in the comments would be welcome!

Note that even though this example manually manages memory for the closure, it still needs the garbage collector for the allocation of the list stored in 'a'. Even with the garbage collector included you can still manage memory manually for efficiency reasons if needed.

If using proof variables, manually managing closure memory, and related functionality proves too much of a burden then ATS provides alternative functions. The 'smlbas' routines provide much of the ML basis standard library implemented to use 'cloref1' function types. For example:

staload "libats/smlbas/SATS/list.sats"
staload "libats/smlbas/DATS/list.dats"
dynload "libats/smlbas/DATS/list.dats"

implement main() = () where {
  val a = list0_of_list1('[1,2,3])
  val x = foldl<int,int>(lam(a, b) => a + b, 1, a)
  val () = printf("result: %d\n", @(x));
}

This needs to be linked against the 'ats_smlbas' library to build:

atscc -o closure8 closure8.dats -D_ATS_GCATS -lats_smlbas

I had to pass the types to the 'foldl' template to get things to compile. Type inference is unfortunately not as simple as in some other functional languages. An alternative approach is to specify that you want to use the integer add function:

  val x = foldl(lam(a, b) => add_int_int(a, b), 1, a)

Unfortunately it wasn't enough to declare the types for the closure arguments:

  val x = foldl(lam(a:int, b:int) => a + b, 1, a)

This gives the error:

  error(3): the symbol [+] cannot be resolved: there are too many matches

Conclusion

This concludes my explorations of using closures and functions in ATS. I find using higher order functions somewhat difficult due to dealing with proofs and having to declare types everywhere.

This would probably be a bit easier if there was reference documentation on the ATS standard library with usage examples. Unfortunately I'm not aware of any apart from the library and example source code.

At least it's possible to use the 'smlbas' library and the simplified 'list0' and 'array0' types to avoid needing to understand the low level details of proofs and types while learning.

Tags: ats 


