Bluish Coder

Programming Languages, Martial Arts and Computers. The Weblog of Chris Double.


2017-03-28

Introduction to the Freenet API

Freenet is an anonymous, secure, distributed datastore. I've written about using Freenet before, including using it as the backend for static websites. In this post I'll demonstrate how to use the Freenet API to push data into the Freenet network and retrieve data from it.

Unfortunately the Freenet protocol documentation is in a state of flux as it moves from a previous wiki to a GitHub-based wiki. This means some of the protocol information may be incomplete. The old wiki data is stored in Freenet under the following keys:

  • CHK@San0hXZSyCEvvb7enNUIWrkiv8MDChn8peLJllnWt4s,MCNn4eUbl5NW9quOm4JTU~0rsWu6QlIdek9VtpFpXe4,AAMC--8/freenetproject-oldwiki.tar
  • CHK@4sff2MgvexbsfgSuqNOwqVDkP~GaZPsZ1rJVKUg87g8,unU3TZ93pYGYPCoH7LC53dlc5~Rmar8SKF9fsZnQX-8,AAMC--8/freenetproject-wiki.tar

The API for external programs to communicate with a running Freenet daemon is FCPv2. It's a text based protocol accessed using a TCP socket connection on port 9481. The FCP protocol can be enabled or disabled from the Freenet configuration settings but it is enabled by default so the examples here should work in a default installation.

The basic unit of the FCP protocol is the 'message'. Messages are sent over the socket starting with a line naming the message, followed by a series of 'Key=Value' lines, and ending with 'EndMessage'. Some messages contain binary data and end differently; I'll discuss this after some basic examples.
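
A minimal sketch of this framing in bash (the fcp_message helper name is my own, not part of any API):

```shell
#!/bin/bash
# Sketch: build an FCP message from a message name and Key=Value pairs.
# The framing follows the description above; the helper name is made up.
fcp_message() {
  local name=$1
  shift
  printf '%s\n' "$name"     # message name line
  local kv
  for kv in "$@"; do
    printf '%s\n' "$kv"     # one Key=Value per line
  done
  printf 'EndMessage\n'     # terminator for non-binary messages
}

fcp_message ClientHello Name=Example ExpectedVersion=2.0
```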

For the examples that follow I'll be using bash. I debated picking from my usual toolkit of obscure languages but decided to use something that doesn't require installation on Linux and Mac OS X and may also run on Windows 10. The examples should be readable enough for non-bash users to pick up and translate to their favourite language, especially given the simplicity of the protocol. I've found the ability to throw together a quick bash script to do inserts and retrievals to be useful.

Hello

The FCPv2 documentation lists the messages that can be sent from the client to the Freenet node, and what can be expected to be received from the node to the client. On first connecting to the node the client must send a ClientHello message. This looks like:

ClientHello
Name=My Client Name
ExpectedVersion=2.0
EndMessage

The Name field uniquely identifies the client to the node. Disconnecting and reconnecting with the same Name retains access to a persistent queue of data being inserted and retrieved. An error is returned if an attempt to connect is made when a client with the same Name is already connected.

The node replies with a NodeHello message. This looks like:

NodeHello
Build=1477
ConnectionIdentifier=...
...
EndMessage

The various fields are described in the NodeHello documentation. This interaction can be tested using netcat or telnet:

$ nc localhost 9481
ClientHello
Name=My Client Name
ExpectedVersion=2.0
EndMessage

NodeHello
CompressionCodecs=4 - GZIP(0), BZIP2(1), LZMA(2), LZMA_NEW(3)
Revision=build01477
Testnet=false
...
ExtRevision=v29
EndMessage

You can connect to a socket from bash using 'exec' and file redirection to a pseudo-path describing the TCP socket. See HACKTUX notes from the trenches for details. The above netcat interaction looks like this from bash:

    #!/bin/bash
    function wait_for {
      local line
      local str=$1
      while read -r line
      do
        >&2 echo "$line"
        if [ "$line" == "$str" ]; then
          break
        fi
      done
    }
       
    exec 3<>/dev/tcp/127.0.0.1/9481
       
    cat >&3 <<HERE
    ClientHello
    Name=My Client Name
    ExpectedVersion=2.0
    EndMessage
    HERE
       
    wait_for "NodeHello" <&3
    wait_for "EndMessage" <&3
       
    exec 3<&-
    exec 3>&-

The exec line opens a socket on port 9481, the FCP port, and assigns it to file descriptor '3'. Then we use cat to write the ClientHello message to that file descriptor. wait_for reads lines from the socket, displaying them on standard error (file descriptor '2'), until it reaches a specific line passed as an argument. Here we wait for the NodeHello line and then the EndMessage line to cover the NodeHello response from the server. The remaining two exec lines close the socket.

The full bash script is available in hello.sh.

Retrieving data inline

The FCP message ClientGet is used to retrieve data stored at a specific key. The data can be returned inline within a message or written to a file accessible by the node. An example message for retrieving a known key is:

ClientGet
URI=CHK@otFYYKhLKFzkAKhEHWPzVAbzK9F3BRxLwuoLwkzefqA,AKn6KQE7c~8G5dLa4TuyfG16XIUwycWuFurNJYjbXu0,AAMC--8/example.txt
Identifier=1234
Verbosity=0
ReturnType=direct
EndMessage

This retrieves the contents of a particular CHK key where I stored example.txt. Verbosity is set to 0 so no progress messages are returned, only the messages sent when the entire contents have been retrieved. A ReturnType of direct means the data is returned within the AllData message, which is received when the retrieval is complete. The result messages are:

DataFound
Identifier=1234
CompletionTime=1490614072644
StartupTime=1490614072634
DataLength=21
Global=false
Metadata.ContentType=text/plain
EndMessage

AllData
Identifier=1234
CompletionTime=1490614072644
StartupTime=1490614072634
DataLength=21
Global=false
Metadata.ContentType=text/plain
Data
Hello Freenet World!

The first message received is DataFound, giving information about the completed request. The following message, AllData, returns the actual data. Note that it does not end with an EndMessage. Instead it has a Data terminator followed by the data as a sequence of bytes of length DataLength.
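
The Data framing can be exercised without a node by piping a canned reply through the same read pattern (this sketch and its sample input are mine, not real node output):

```shell
#!/bin/bash
# Sketch: read a Data-terminated message from stdin.
# Header lines are read until the 'Data' line, then exactly
# DataLength bytes of payload follow.
read_alldata() {
  local line len=0
  while read -r line; do
    case "$line" in
      DataLength=*) len=${line#DataLength=} ;;
      Data) break ;;
    esac
  done
  head -c "$len"    # emit the payload bytes
}

printf 'AllData\nIdentifier=1234\nDataLength=5\nData\nhello' | read_alldata
```

This relies on bash's read consuming a pipe one byte at a time, so head sees the stream positioned exactly at the start of the payload.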

To process AllData from bash I use a function to extract the DataLength when it finds it:

    function get_data_length {
      local line
      while read -r line
      do
       if [[ "$line" =~ ^DataLength=.* ]]; then
         echo "${line##DataLength=}"
         break
       fi
      done
    }

This is called from the script after the ClientHello and NodeHello exchange:

    cat >&3 <<HERE
    ClientGet
    URI=CHK@otFYYKhLKFzkAKhEHWPzVAbzK9F3BRxLwuoLwkzefqA,AKn6KQE7c~8G5dLa4TuyfG16XIUwycWuFurNJYjbXu0,AAMC--8/example.txt
    Identifier=1234
    Verbosity=0
    ReturnType=direct
    EndMessage
    HERE
        
    wait_for "AllData" <&3
    len=$(get_data_length <&3)
    wait_for "Data" <&3
    dd status=none bs="$len" count=1 <&3 >&2

The dd command reads the specified number of bytes from the socket and writes them to standard output. This is the contents of the key we requested:

$ ./getinline
Hello Freenet World!

The full bash script is available in getinline.sh.

The main downside of using inline data requests is that large files can exhaust the memory of the node.

Request Direct Disk Access

A variant of ClientGet requests the node to write the result to a file on disk instead of sending it as part of the AllData message. This is useful for large files that don't fit in memory. The data is written to the filesystem that the node has access to so it's most useful when the FCP client and the freenet node are on the same system.

Being able to tell the server to write directly to the filesystem is a security issue, so Freenet requires a negotiation to confirm that the client has access to the directory it is asking the server to write to. This negotiation requirement, known as TestDDA, can be disabled in the node's configuration settings but that's not recommended.

First the client must send a TestDDARequest message listing the directory it wants access to and whether read or write access is being requested.

TestDDARequest
Directory=/tmp/
WantWriteDirectory=true
WantReadDirectory=true
EndMessage

The server replies with a TestDDAReply:

TestDDAReply
Directory=/tmp/
ReadFilename=/tmp/testr.tmp
WriteFilename=/tmp/testw.tmp
ContentToWrite=RANDOM
EndMessage

The script should now write the data contained in the ContentToWrite key into the file referenced by the WriteFilename key. It should read the data from the file referenced in the ReadFilename key and send that data in a TestDDAResponse:

TestDDAResponse
Directory=/tmp/
ReadContent=...content from TestDDAReply...
EndMessage

The server then responds with a TestDDAComplete:

TestDDAComplete
Directory=/tmp/
ReadDirectoryAllowed=true
WriteDirectoryAllowed=true
EndMessage

Once that dance is complete, put and get requests can be made against that specific directory. The bash code for doing this is:

    cat >&3 <<HERE
    TestDDARequest
    Directory=/tmp/
    WantWriteDirectory=true
    WantReadDirectory=true
    EndMessage
    HERE
    
    wait_for "TestDDAReply" <&3
    content=$(process_dda_reply <&3)
    
    cat >&3 <<HERE
    TestDDAResponse
    Directory=/tmp/
    ReadContent=$content
    EndMessage
    HERE
    
    wait_for "TestDDAComplete" <&3
    process_dda_complete <&3

It uses a helper function process_dda_reply to handle the TestDDAReply message from the server:

    function process_dda_reply {
      local readfile=""
      local writefile=""
      local content=""
    
      while read -r line
      do
       if [[ "$line" =~ ^ReadFilename=.* ]]; then
         readfile="${line##ReadFilename=}"
       fi
       if [[ "$line" =~ ^WriteFilename=.* ]]; then
         writefile="${line##WriteFilename=}"
       fi
       if [[ "$line" =~ ^ContentToWrite=.* ]]; then
         content="${line##ContentToWrite=}"
       fi
       if [[ "$line" == "EndMessage" ]]; then
         echo -n "$content" >"$writefile"
         cat "$readfile"
         break
       fi
      done
    }

This function reads the fields of the TestDDAReply and writes the required content to the write file and returns the data contained in the read file. This returned data is sent in the TestDDAResponse. The process_dda_complete function checks the TestDDAComplete fields to ensure that access was granted. The full bash script is available in testdda.sh.

Retrieving data to disk

The ReturnType field of the ClientGet message can be set to disk to write directly to a disk file once the TestDDA process is complete. The message looks like this:

    cat >&3 <<HERE
    ClientGet
    URI=CHK@HH-OJMEBuwYC048-Ljph0fh11oOprLFbtB7QDi~4MWw,B~~NJn~XrJIYEOMPLw69Lc5Bv6BcGWoqJbEXrfX~VCo,AAMC--8/pitcairn_justice.jpg
    Identifier=1234
    Verbosity=1
    ReturnType=disk
    Filename=/tmp/pitcairn_justice.png
    EndMessage
    HERE

In this case we're retrieving a file I've inserted previously. The Verbosity key is set to 1 to enable SimpleProgress messages to be received. These messages contain fields showing the total number of blocks that can be fetched for the file, the number of blocks required to reconstruct it, how many we've successfully retrieved so far, and a few other fields. The following bash function processes this and prints the result:

    function handle_progress {
      local total=0
      local succeeded=0
      local required=0
      local final=""
    
      while read -r line
      do
       if [[ "$line" =~ ^Total=.* ]]; then
         total="${line##Total=}"
       fi
       if [[ "$line" =~ ^Required=.* ]]; then
         required="${line##Required=}"
       fi
       if [[ "$line" == "FinalizedTotal=true" ]]; then
         final="final"
       fi
       if [[ "$line" =~ ^Succeeded=.* ]]; then
         succeeded="${line##Succeeded=}"
       fi
       if [[ "$line" == "EndMessage" ]]; then
         echo "Progress: retrieved $succeeded out of $required required and $total total ($final)"
         break
       fi
      done
    }

The FinalizedTotal field indicates whether the Total field is accurate and will not change; otherwise Total can increase as more knowledge about the file is gained. The Required field is the number of blocks that need to be received to reconstruct the file. It is less than Total due to redundancy in the way Freenet stores data, which accounts for nodes going away and data being lost.
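
For display purposes the fields can be combined into a percentage. A trivial hedged helper (my own, not part of any script above):

```shell
#!/bin/bash
# Sketch: percentage of required blocks retrieved so far,
# from the Succeeded and Required fields of a SimpleProgress message.
progress_percent() {
  local succeeded=$1 required=$2
  if [ "$required" -eq 0 ]; then
    echo 0    # avoid division by zero before Required is known
  else
    echo $(( succeeded * 100 / required ))
  fi
}

progress_percent 3 5    # prints 60
```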

The handle_progress function is called from within wait_with_progress, which waits for a particular message (usually the one indicating the end of the transfer), displays progress, and ignores everything else.

    function wait_with_progress {
      while read -r line
      do
        if [ "$line" == "SimpleProgress" ]; then
          handle_progress
        fi
        if [ "$line" == "$1" ]; then
          break
        fi
      done
    }

These are called as follows:

    cat >&3 <<HERE
    ClientGet
    URI=CHK@HH-OJMEBuwYC048-Ljph0fh11oOprLFbtB7QDi~4MWw,B~~NJn~XrJIYEOMPLw69Lc5Bv6BcGWoqJbEXrfX~VCo,AAMC--8/pitcairn_justice.jpg
    Identifier=1234
    Verbosity=1
    ReturnType=disk
    Filename=/tmp/pitcairn_justice.png
    EndMessage
    HERE
    
    wait_with_progress "DataFound" <&3
    wait_for "EndMessage" <&3

The DataFound message is sent by the server when the file has been successfully retrieved. It can be found at the location specified in the Filename field of the ClientGet.

The full bash script is available in getdisk.sh.

$ bash getdisk.sh
Progress: retrieved 0 out of 1 required and 1 total ()
Progress: retrieved 1 out of 1 required and 1 total ()
Progress: retrieved 1 out of 5 required and 10 total ()
Progress: retrieved 1 out of 5 required and 10 total (final)
Progress: retrieved 5 out of 5 required and 10 total (final)

Inserting Data Inline

When storing data using FCP you can provide the data directly in the message or reference a file on disk that the node will read and store. Both are done using the ClientPut message. Sending this message looks like:

    file="$1"
    size=$(stat -c%s "$file")
    mime=$(file --mime-type "$file" |awk '{print $2}')
    
    cat >&3 <<HERE
    ClientPut
    URI=CHK@
    Metadata.ContentType=$mime
    Identifier=1234
    Verbosity=1
    GetCHKOnly=false
    TargetFilename=$(basename "$file")
    DataLength=$size
    UploadFrom=direct
    Data
    HERE
    
    dd status=none if="$file" bs="$size" count=1 |pv -L 500k >&3
    
    wait_with_progress "PutSuccessful" <&3
    uri=$(get_uri <&3)
    wait_for "EndMessage" <&3

ClientPut requires the mime type of the file, obtained here using file, and the size of the file, retrieved with stat. These are placed in the ClientPut message directly. The binary data for the file is sent after a Data terminator, similar to how the data was received when doing an inline get. dd is again used for this, but it's piped through pv to limit the transfer rate, otherwise the network gets swamped due to bufferbloat.
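
Note that stat -c%s is GNU-specific; BSD and Mac OS X stat spell it -f%z. A hedged, more portable way to gather the two values:

```shell
#!/bin/bash
# Sketch: portable helpers for the file size and mime type
# needed by ClientPut. stat -c%s is GNU coreutils; the BSD/Mac
# OS X equivalent is stat -f%z.
file_size() {
  stat -c%s "$1" 2>/dev/null || stat -f%z "$1"
}

file_mime() {
  # --brief drops the "name:" prefix, so no awk step is needed
  file --brief --mime-type "$1"
}
```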

The URI for inserting is generated as a CHK key. This is a key based on a hash of the file content. Inserting the same file will result in the same key. get_uri reads the PutSuccessful message to find the full key of the insert. This is displayed to the user later in the script.

The full bash script is available in putinline.sh.

$ bash putinline.sh /tmp/example.txt 
Progress: inserted 0 out of 1 ()
Progress: inserted 0 out of 2 ()
Progress: inserted 0 out of 2 (final)

Inserting Data from Disk

Inserting data directly from a disk file works very similarly to retrieving to a disk file. It requires the TestDDA process followed by a ClientPut with UploadFrom set to disk:

    cat >&3 <<HERE
    ClientPut
    URI=CHK@
    Metadata.ContentType=$mime
    Identifier=1234
    Verbosity=1
    GetCHKOnly=false
    TargetFilename=$(basename "$file")
    Filename=$file
    UploadFrom=disk
    EndMessage
    HERE
    
    wait_with_progress "PutSuccessful" <&3
    uri=$(get_uri <&3)
    wait_for "EndMessage" <&3

The full bash script is available in putdisk.sh.

Other Messages

There are other useful messages. With ClientPut, if you set the GetCHKOnly field to true the file isn't inserted but the CHK key is generated. Since a CHK is based on the file contents, it will be the same key if the file is later inserted with the same mime type, filename and other parameters. This allows generating a key and sending it to a third party so they can start retrieving the file before the insert completes. There are security issues with this: if an attacker knows the key while it is being inserted they may be able to narrow down the location of the inserting node.
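
A key-only ClientPut can be sketched as a small generator (the helper name and argument handling are mine; the field layout matches the earlier putinline example):

```shell
#!/bin/bash
# Sketch: build the header of a ClientPut that only computes the
# CHK (GetCHKOnly=true). The file's bytes would still follow the
# Data line so the node can hash them.
chk_only_put() {
  local file=$1 mime=$2 size=$3
  cat <<HERE
ClientPut
URI=CHK@
Metadata.ContentType=$mime
Identifier=1234
Verbosity=0
GetCHKOnly=true
TargetFilename=$(basename "$file")
DataLength=$size
UploadFrom=direct
Data
HERE
}

chk_only_put /tmp/example.txt text/plain 21
```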

Another useful message is GenerateSSK:

GenerateSSK
Identifier=1234
EndMessage

This results in a SSKKeypair reply containing a randomly generated SSK insert and request key:

SSKKeypair
InsertURI=SSK@AK.../
RequestURI=SSK@Bn.../
Identifier=1234
EndMessage

These can be used to insert data with ClientPut by setting the URI to the InsertURI, and retrieved by a third party using the RequestURI as the URI in ClientGet.
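
Extracting the two URIs from the SSKKeypair reply follows the same line-parsing pattern as the earlier scripts (this helper is my own sketch; the keys in the example are fakes):

```shell
#!/bin/bash
# Sketch: pull InsertURI and RequestURI out of an SSKKeypair reply.
parse_keypair() {
  local line
  while read -r line; do
    case "$line" in
      InsertURI=*)  echo "insert ${line#InsertURI=}" ;;
      RequestURI=*) echo "request ${line#RequestURI=}" ;;
      EndMessage)   break ;;
    esac
  done
}

printf 'SSKKeypair\nInsertURI=SSK@AK/\nRequestURI=SSK@Bn/\nIdentifier=1234\nEndMessage\n' | parse_keypair
```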

ClientPutDiskDir inserts an entire directory. This is the basis of inserting 'Freesites' - Freenet websites. I wrote a mirror.sh utility that mirrors an online website or page and inserts it into Freenet. This is useful for linking to news articles in Freenet without having to leave the Freenet environment. It uses a putdir.sh script that inserts a directory.

Conclusion

The Freenet API has a lot of functionality beyond what I've shown here. I used bash for the examples because it has few dependencies, but more robust scripts would be easier to write in a full-featured programming language. I'm not a bash expert and welcome corrections and additions. I've put the code in an fcp-examples github repository.

There are some client libraries, pyFreenet being an example. I recommend the following articles for a deeper dive into Freenet programming:

Tags: freenet 

2017-03-20

Relational Programming in Mozart/Oz

A video I watched recently on logic programming, A Vision for Relational Programming in miniKanren, by William Byrd gives some interesting examples of relational programming using miniKanren.

miniKanren is an embedded constraint logic programming language designed for writing programs as relations, rather than as functions or procedures. Unlike a function, a miniKanren relation makes no distinction between its inputs and outputs, leading to a variety of fascinating behaviors. For example, an interpreter written as a relation can also perform code synthesis, symbolic execution, and many other tasks.

The video demonstrates how a relational interpreter can be used not just to evaluate expressions but to also generate valid expressions given a result. Other examples of relational programming would be the Prolog examples in my Shen Prolog post where relational functions like member can be used to enumerate all members of a list, as well as the traditional finding a member in a list.

I like to use the Mozart Programming System for exploring logic programming and this post goes through converting some examples in Mozart/Oz. Mozart is an implementation of the Oz programming language. The book "Concepts, Techniques, and Models of Computer Programming" is a good textbook for learning the different programming models that Mozart enables, including the relational model. For the examples here I'm using Mozart 1.3.x. It's an old language and implementation but I like to experiment with languages that are a bit different to mainstream languages.

The Prolog examples following can be tested in the online SWI Prolog implementation, Swish.

Basic example

The following function in Prolog has multiple values that can be used as an argument:

foo(one).
foo(two).
foo(three).

Passing either one, two or three succeeds:

foo(one).
true

foo(two).
true

foo(four).
false

Using backtracking it's possible to enumerate all valid arguments:

foo(X).
X = one
X = two
X = three

The function findall can be used to return all values in a list:

findall(X, foo(X), Y).
Y = [one, two, three]

Backtracking in Mozart/Oz is not the default. Mozart provides the choice statement to enable backtracking. A choice statement contains a sequence of clauses separated by [], where a 'choice point' is created for each group of clauses. Each clause is tried in turn - if a particular clause fails then execution backtracks to a previous choice point and resumes from there until one clause succeeds or all fail. The equivalent implementation of foo in Mozart is:

fun {Foo}
   choice
      one
   []
      two
   []
      three
   end
end

Prolog uses an automatic depth first search to find solutions. Mozart doesn't do automatic search - programs involving choice points must be run in a search engine that can implement any form of search required. A default search engine is provided that does depth first search. Here we create a search object and enumerate all the valid results manually:

Y = {New Search.object script(Foo)}
{Browse {Y next($)}}
[one]

{Browse {Y next($)}}
[two]

{Browse {Y next($)}}
[three]

{Browse {Y next($)}}
nil

The Mozart syntax can look a little strange at first but it's not too difficult to understand once it's learnt. New instantiates an object. Search.object is the class of the object being created. script(Foo) is a record that is being passed to the constructor of Search.object. The Foo is the function we created previously. The remaining statements call the next method on the object. next takes as an argument a variable that receives the result of the call. The use of $ tells Mozart to pass a temporary variable to receive the result, and return that result as the result of the call. This result is passed to Browse which displays the result in the GUI browser. Results are returned in single element lists and when there are no more results, nil is returned.

There is a library function to return all possible solutions in a list, Search.all:

{Browse {Search.all Foo 1 _ $}}
[one two three]

The ability to interactively drive the search enables writing programs that control the search process. The Explorer is a Mozart tool that uses this ability to show an interactive graph of the search space of a program. It can be run to show a graph of all solutions in Foo with:

{Explorer.all Foo}

Membership testing

In Prolog a function to test membership of a list can be written as:

mem(X, [X|_]).
mem(X, [_|Y]) :- mem(X, Y).

This states that X is a member of the list if X is the head of the list (the first element), or if it is a member of the tail of the list.

mem(2, [1,2,3]).
true.

Thanks to Prolog's backtracking you can also use this function to enumerate all values in the list:

mem(X, [1,2,3]).
1
2
3

In Swish you need to click 'Next' to get each result. You can return all results as a list with findall:

findall(X, mem(X, [1,2,3]), Y).
Y = [1, 2, 3].

This is what the non-backtrackable member function looks like in Mozart:

fun {Mem X Ys}
   case Ys
   of nil then false
   [] Y|Yr then X==Y orelse {Mem X Yr}
   end
end

{Browse {Mem 1 [1 2 3]}}
true

A backtrackable version of this code uses choice instead of case to create choice points in each clause:

proc {Mem X Ys}
   choice
      Ys = X|_
   []
      Yr
   in
      Ys = _|Yr
      {Mem X Yr}
   end
end

Here proc is used instead of fun. A procedure doesn't return a value. It is expected to bind values to arguments to return a result. In this case either X or Ys will be bound depending on what is passed as an argument. Given the call Mem 1 [1 2 3] then the first clause of choice succeeds - the head of the list is equal to X. With the call Mem X [1 2 3] then X will be successively bound to each element of the list depending on what order the search engine used to evaluate it uses. Mem 1 Y will enumerate all possible lists containing 1:

{Browse {Search.all proc {$ L} {Mem 1 [1 2 3]} end 1 _ $}}
[_]

{Browse {Search.all proc {$ L} {Mem L [1 2 3]} end 1 _ $}}
[1 2 3]

Y={New Search.object script(proc {$ L} {Mem 1 L} end)}
{Browse {Y next($)}}
[1|_]

{Browse {Y next($)}}
[_|1|_]

{Browse {Y next($)}}
[_|_|1|_]

The difference here compared to our search over Foo is that an anonymous procedure is passed to the search engine. Foo was already a single argument procedure so it didn't require a wrapper. The anonymous procedure takes a single argument which is expected to be bound to the result of a single iteration. In the first example no result is bound, just the fact that the Mem call succeeds is enough. In the second, L is bound to the first argument to Mem, resulting in a list of all valid first arguments. In the third, L is bound to the second argument to Mem, resulting in a list of all valid lists that contain the element 1. This is infinite so we only iterate the first few solutions.

Syntax checker

The miniKanren video referenced earlier, A Vision for Relational Programming in miniKanren, has an example of a simple language syntax checker implemented in the relational style. The equivalent Mozart implementation is:

proc {LCSyn Term}
   choice
      {IsAtom Term true}
   []
      X T
   in
      Term = lambda(X T)
      {IsAtom X true}
      {LCSyn T}
   []
      E1 E2
   in
      Term = [E1 E2]
      {LCSyn E1}
      {LCSyn E2}
   end
end

A Term is either an atom, a lambda record containing an argument and body, or application of two expressions (here represented as a list). A Term can be tested to see if it is valid with:

{Browse {Search.one.depth
         proc {$ L}
            {LCSyn lambda(foo bar)}
         end
         1 _ $}}

A result of [_] indicates that it succeeded (nil would be a failure). Thanks to the magic of relational programming it's possible to enumerate all valid terms:

Y={New Search.object script(LCSyn)}
{Browse {Y next($)}}
[lambda(_ _)]

{Browse {Y next($)}}
[[_ _]]

{Browse {Y next($)}}
[[lambda(_ _) _]]

{Browse {Y next($)}}
[[[_ _] _]]

Reversible Factorial

As a final example, the following program implements a factorial function that can compute the factorial of a number, or the number given its factorial:

proc {Fact ?N ?F}
   proc {Fact1 ?N ?F}
      choice
         N = 0
         F = 1
      []
         N1 F1
      in
         N1::0#FD.sup
         F1::0#FD.sup
         N >: 0
         N1 =: N - 1
         F =: N * F1
         {Fact1 N1 F1}
      end
   end
in
   N::0#FD.sup
   F::0#FD.sup      
   {Fact1 N F}
   {FD.distribute naive [N F]}
end

% Factorial of 5
{Browse {Search.all proc {$ L} {Fact 5 L} end 1 _ $}}
[120]

% What factorial gives the answer 24
{Browse {Search.all proc {$ L} {Fact L 24} end 1 _ $}}
[4]

This uses a few more Mozart features but is essentially a search through the choice points in a choice statement. In this case it also uses Finite Domain Constraints. These allow telling Mozart what the range of a particular integer can be and imposing constraints on that value; the search process then attempts to find a solution that narrows the integer down to a single value. The syntax X::0#5 constrains the variable X to be from zero to five. FD.sup is a constant for an implementation defined maximum upper bound. Operators ending in : impose constraints on values that can be backtracked during search. There's a limitation to this approach: the finite domain constraint has an implementation defined maximum upper value and the search will fail if it hits that value, which unfortunately limits the practicality of a reverse factorial.

Mozart/Oz summary

Mozart/Oz has a lot of other interesting functionality. Distributed objects, GUI programming, Concurrency, amongst others. It's a fun tool to experiment with and Concepts, Techniques, and Models of Computer Programming is an excellent read.

Unfortunately development of Mozart has slowed from its heyday. A reimplementation, called Mozart 2, is being worked on that uses LLVM for the code generator and replaces the internal constraint programming system with the Gecode toolkit. Development seems to have stalled recently and it lacks many of the features that already exist in older Mozart versions. Hopefully it'll pick up steam again.

For this reason I continue to use Mozart 1.3.x. There is a 1.4.x version but it has some issues that make me avoid using it. The distributed layer was replaced with an implementation written in C++ which has some bugs that I've been unable to work around in projects where I used it. The distribution panel is broken due to the hooks it requires not being implemented by the new C++ layer. At some point Mozart 2 may be complete enough for me to move to that version. I make the occasional fixes to 1.3.x and 1.4.x to keep it building and running on Linux.

Even though Mozart seems to be in a holding pattern it's a great tool for exploring ideas, and the list of papers is a good resource for learning about distributed programming, constraints and concurrency. Some interesting projects implemented in Mozart include:

Tags: mozartoz 

2017-02-22

Borrowing Internal Pointers in ATS

Recently Robert O'Callahan wrote a post on what Rust can do that other languages can't. It involves getting a pointer to an internal field of a stack allocated object and having the type system ensure that the lifetime of that pointer doesn't outlive the owning object.

As soon as I saw this I thought "Aha, ATS can do that!" but I suspected it would be quite a bit more verbose. It turns out it's possible but tedious to do and while ATS 1 can do it in safe code, ATS 2 requires some help. This isn't a "Robert's post is wrong" post or a tear down of Rust, it's a thought experiment in how to do things in ATS that are easier in Rust. Plus ATS is in no way a "remotely mainstream language".

The following is a representation of the scenario that works in ATS 1. First we declare the type of an object that contains two fields:

viewtypedef X = @{
  a= int,
  b= int
}

A function that safely returns a pointer to one of the fields looks like this:

fun get_b_ref {l:addr} (pf: X @ l | x: ptr l):
              [l2:addr] (int @ l2, int @ l2 -<lin> X @ l | ptr l2) =
let
  val p = &(x->b)
  prval pf = view@ (x->b)
in
  #[.. | (pf, llam pf => let
          prval () = view@ (x->b) := pf
        in
          view@ (!x)
        end
    | p)]
end

This can look imposing if you're not familiar with ATS. What does this all mean? Once you're familiar with ATS you can read this as: get_b_ref takes a pointer to an X and a proof that you can use it. It consumes this proof, meaning we can't access the X object after the call. It returns a pointer to an int - one of the internal fields of X - and a proof to use it as an int. It also returns a proof function that consumes that proof of an int and returns the original proof that there is an X at l - allowing you to use the original x as an X again, and preventing further use of the returned int pointer, since the proof that allowed its use has been consumed.

In more detail, it defines a function, get_b_ref, that takes a pointer to an X object as an argument: (pf: X @ l | x: ptr l). The items to the left of the | are proof arguments and only exist at type checking time. This says that pf contains a proof that an object of type X exists at address l. The l is defined in the {l:addr} section of the function declaration meaning l is a memory address. The items to the right of the | are normal values. In this case, x is a pointer with address l. Because we have a pointer at address l and a proof that an object of type X exists at l we can treat the pointer safely as being a pointer to an X. The function consumes the proof when called. That means that the proof cannot be used again so we can no longer treat x as a pointer to an X. It's a pointer to something we can't do anything with after calling.

The result of the function comes after the : symbol. The (int @ l2, int @ l2 -<lin> X @ l | ptr l2) is the meat of it. Again, proofs are to the left of the | and normal values to the right. In this case the right is ptr l2 which means a pointer of some address, l2, is returned. The intent of the function is to return an internal pointer to one of the int fields so we'd expect to have it return a proof that l2 points to an int. The proof returned is a tuple: (int @ l2, int @ l2 -<lin> X @ l).

The first field of that tuple is int @ l2 which is the proof we expected. So a caller of this function will have a pointer to an int and the proof to use it as such. The second field of the tuple is int @ l2 -<lin> X @ l. This is the syntax for declaring an anonymous function type. The items to the left of the -<...> are the arguments to the function and the items to the right are the return value. The second field of the tuple is therefore a function that takes a proof that an int exists at address l2 and returns a proof that an X exists at l. The lin means this function is linear - it can only be used once. Because this tuple appears in the proof section of the result, this anonymous function is a proof function - it is called at type checking time.

The implementation of the function gets a pointer to the b field of the object and a proof that we can use it:

val p = &(x->b)
prval pf = view@ (x->b)

The prval line is proof code that runs at type checking time. view@ is an ATS keyword that gets the proof for an object if we can access it. Once we get a proof using view@ we need to return it later or the proof checking fails. This is a form of 'borrowing' - we borrow the proof to access the internal object and have to give it back later; otherwise we can't access that field again, and the compiler complains if we don't return it.

The main part of the return value is:

#[.. | (pf, llam pf => let
          prval () = view@ (x->b) := pf
        in
          view@ (!x)
        end
    | p)]

It is a tuple with two proof components and a value. The first proof component is the proof of int @ l2, obtained earlier using view@, that allows access to the internal field. The second is a linear lambda function. That function is what returns the proof back to the view@ for the internal field on this line:

prval () = view@ (x->b) := pf

This returns the proof we borrowed earlier. The linear lambda function then returns the view@ of the original X object. The value being returned, p, is the pointer to the internal field.

The [.. | ] syntax is a way of telling the compiler that the return value is of a particular existential type and that it should compute it from the context of the expression such that it matches the [l2:addr] existential result type we declared in the function declaration. Ideally the compiler would infer this without our help but unfortunately it does not. It will tell us if we get it wrong though.

Usage of the function looks like:

implement main() = let
  var x: X = @{ a=1, b=2 }
  val (pf, pff | p) = get_b_ref(view@ x | &x)
  val _ = !p := 3
  prval () = view@ x := pff(pf)
in
  println!("x.b = ", x.b)
end

A variable x is declared on the stack. We get a pointer to the b field using the function above and set that field via the pointer. We're allowed to do this because we have a proof that p is a pointer to an int via the pf proof variable. Calling get_b_ref makes x no longer usable as get_b_ref consumed the proof to access it. The proof function, pff is used to return the proof to access X so we can then use x again. After that we can't use the internal pointer to b anymore. The compiler prevents the escape of the proof to access the internal pointer from the lifetime of the stack variable x.
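As an aside (my own sketch, not from the original article), this discipline is exactly what Rust's borrow checker enforces implicitly: a mutable borrow of an interior field freezes the whole object until the borrow ends, with no explicit proof terms needed.

```rust
struct X {
    a: i32,
    b: i32,
}

// Borrowing x mutably plays the role of consuming the X @ l proof:
// x is frozen while the returned reference is live.
fn get_b_ref(x: &mut X) -> &mut i32 {
    &mut x.b
}

fn main() {
    let mut x = X { a: 1, b: 2 };
    let p = get_b_ref(&mut x);
    *p = 3;
    // The borrow ends after the last use of p, which corresponds to
    // calling pff(pf) in the ATS version; x is usable again here.
    println!("x.a = {} x.b = {}", x.a, x.b);
}
```

The lifetime of the returned `&mut i32` stands in for the `int @ l2` proof, and the end of the borrow stands in for the linear proof function handing the `X @ l` view back.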

Unfortunately this code doesn't work in ATS 2. The equivalent code, which fails to compile, is:

fun get_b_ref {l:addr} .<>. (pf: X @ l | x: ptr l):<>
                            [l2:addr] (int @ l2, int @ l2 -<lin> X @ l | ptr l2) =
let
  val p = addr@(x->b)
  prval pf = view@ (x->b)
in
  (pf, llam (pf) => let
          prval () = view@ (x->b) := pf
        in
          view@ (!x)
        end
    | p)
end

ATS 2 does not allow the use of view@ (!x) inside the lambda if the x comes from outside the lambda as it does in this case. Maybe this will change in future iterations of ATS 2 but currently it does not. One workaround is to resort to embedded C code:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

viewtypedef X = @{
  a= int,
  b= int
}
extern viewtypedef "X" = X

%{
int* get_b_ref(X* x) {
  return &x->atslab__b;
}
%}

extern fun get_b_ref {l:addr} (pf: X @ l | x: ptr l):<>
         [l2:addr] (int @ l2, int @ l2 -<lin> X @ l | ptr l2) = "mac#get_b_ref"

implement main0() = let
  var x: X = @{ a=1, b=2 }
  val (pf, pff | p) = get_b_ref(view@ x | addr@ x)
  val _ = !p := 3
  prval () = view@ x := pff(pf)
in
  println!("x.b = ", x.b)
end

Here the X object is given a name "X" that can be accessed from C using the line:

extern viewtypedef "X" = X

The get_b_ref function is implemented in C and given an extern definition that matches the previous one. Although the C implementation is effectively 'unsafe' code, any call of get_b_ref from ATS is safe as the type defines exactly what it does. This is an example of implementing a typesafe kernel around potentially unsafe code. In this case there is no runtime overhead as the typesafe kernel is proof code.
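The same pattern exists in Rust (again, my own sketch rather than code from the article): an `unsafe` body behind a safe signature, where the type of the function is the safety boundary and the compiler checks every call site against it.

```rust
struct X {
    a: i32,
    b: i32,
}

// The body uses unsafe raw-pointer code, but the signature ties the
// returned reference's lifetime to the borrow of x, so every call
// site is checked - analogous to the ATS extern declaration placed
// over the embedded C implementation.
fn get_b_ref(x: &mut X) -> &mut i32 {
    unsafe { &mut *(&mut x.b as *mut i32) }
}

fn main() {
    let mut x = X { a: 1, b: 2 };
    *get_b_ref(&mut x) = 3;
    println!("x.a = {} x.b = {}", x.a, x.b);
}
```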

Which approach to use depends on the project. Often the overhead of implementing safe pointer code in ATS, in terms of satisfying the proofs of the implementation, is too great at the time. Embedded C with a safe external definition is a faster approach to get things done. Later the C code could be replaced with proof-correct ATS as desired.

Hopefully this gives some insight into what ATS can do, and gives some appreciation of the simplicity of the approach Rust has taken. ATS tends towards providing the low level tools to build the level of access you want, but this has cognitive and syntactic overhead. Once you're familiar with the system it becomes understandable, but it's probably not "new user friendly" like Rust is.

Tags: ats 

2017-01-19

Transitioning from ATS1 to ATS2

ATS is a programming language I've used a lot in the past. A new version, ATS2, was released a while ago. It introduced various library and language changes and I never got around to transitioning my code to it. Many of my existing posts on ATS are still relevant, but some syntax has changed and many of my examples won't compile in the new system. I'm playing around with ATS2 now, and in this post I'll note some of the major differences I've encountered while getting some of my existing ATS1 examples to build.

The first thing to note is that there's some extensive documentation for ATS2 compared to when I first started learning. This was very helpful in learning the differences.

Starting with a simple 'hello world' program:

implement main0() = print("Hello World\n")

Put in a 'hello.dats' file this can be compiled with:

$ patscc -o hello hello.dats
$ ./hello
Hello World

Notice that the main entry function is called main0. In ATS1 this was main. In ATS2, main mirrors the C main function in that it can take arguments and return a value:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

implement main(argc, argv) =
  if argc = 2 then
    (print("Hello "); print(argv[1]); print_newline(); 0)
  else
    (print("This program requires an argument.\n"); 1)

main takes the traditional argc and argv arguments that the C language main function takes. The function must return an integer value - in this example 0 for success and 1 for failure. Note the two #include statements. They are boilerplate that gives the ATS compiler, atsopt, access to certain external library packages and the definitions of various library functions.

$ patscc -o hello1 hello1.dats
$ ./hello1
This program requires an argument.
$ ./hello1 World
Hello World

In my Linear Datatypes in ATS article I showed an example of declaring a datatype and folding over it:

datatype list (a:t@ype) =
  | nil (a)
  | cons (a) of (a, list a)

fun {seed:t@ype}{a:t@ype} fold(f: (seed,a) -<> seed, s: seed, xs: list a): seed =
  case+ xs of
  | nil () => s
  | cons (x, xs) => fold(f, f(s, x), xs)

implement main() = let
  val a = cons(1, cons(2, cons(3, cons(4, nil))))
  val b = fold(add_int_int, 0, a)
  val c = fold(mul_int_int, 1, a)
  val () = printf("b: %d c: %d\n", @(b, c))
in
  ()
end

This won't compile in ATS2 due to:

  • add_int_int and mul_int_int don't exist in ATS2. They are replaced with g0int_add_int and g0int_mul_int.
  • printf doesn't exist in ATS2. I don't know of a replacement for this other than multiple print statements.
  • main needs to be main0.

The working ATS2 version becomes:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

datatype list (a:t@ype) =
  | nil (a)
  | cons (a) of (a, list a)

fun {seed:t@ype}{a:t@ype} fold(f: (seed,a) -<> seed, s: seed, xs: list a): seed =
  case+ xs of
  | nil () => s
  | cons (x, xs) => fold(f, f(s, x), xs)

implement main0() = let
  val a = cons(1, cons(2, cons(3, cons(4, nil))))
  val b = fold(g0int_add_int, 0, a)
  val c = fold(g0int_mul_int, 1, a)
  val _ = print("b: ")
  val _ = print_int (b)
  val _ = print (" c: ")
  val _ = print_int (c)
  val _ = print_newline()
in
  ()
end

Compiling this fails at the link stage however:

$ patscc -o test test.dats
/tmp/ccfgUdP6.o: In function `mainats_void_0':
t2_dats.c:(.text+0x138): undefined reference to `atsruntime_malloc_undef'
t2_dats.c:(.text+0x15c): undefined reference to `atsruntime_malloc_undef'
t2_dats.c:(.text+0x180): undefined reference to `atsruntime_malloc_undef'
t2_dats.c:(.text+0x1a4): undefined reference to `atsruntime_malloc_undef'
collect2: error: ld returned 1 exit status

The reason for this is the example uses dynamic memory allocation. ATS2 requires a command line argument to tell it to use the libc functions for allocating and freeing memory:

$ patscc -DATS_MEMALLOC_LIBC -o test test.dats
$ ./test
b: 10 c: 24

The same article shows a linear version of the datatype using dataviewtype:

dataviewtype list (a:t@ype) =
  | nil (a)
  | cons (a) of (a, list a)

fun {a:t@ype} free(xs: list a): void =
  case+ xs of
  | ~nil () => ()
  | ~cons (x, xs) => free(xs)

fun {seed:t@ype}{a:t@ype} fold(f: (seed,a) -<> seed, s: seed, xs: !list a): seed =
  case+ xs of
  | nil () => (fold@ xs; s)
  | cons (x, !xs1) => let val s = fold(f, f(s, x), !xs1) in fold@ xs; s end

implement main() = let
  val a = cons(1, cons(2, cons(3, cons(4, nil))))
  val b = fold(add_int_int, 0, a)
  val c = fold(mul_int_int, 1, a)
  val () = printf("b: %d c: %d\n", @(b, c))
in
  free(a)
end

Here the list type is linear. References to a list value can't be aliased and they must be explicitly consumed. In ATS2 this code now looks like:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

datavtype list (a:t@ype) =
  | nil (a)
  | cons (a) of (a, list a)

fun {a:t@ype} free(xs: list a): void =
  case+ xs of
  | ~nil () => ()
  | ~cons (x, xs) => free(xs)

fun {seed:t@ype}{a:t@ype} fold(f: (seed,a) -<> seed, s: seed, xs: !list a): seed =
  case+ xs of
  | nil () => (s)
  | cons (x, xs1) => let val s = fold(f, f(s, x), xs1) in s end


implement main0() = let
  val a = cons(1, cons(2, cons(3, cons(4, nil))))
  val b = fold(g0int_add_int, 0, a)
  val c = fold(g0int_mul_int, 1, a)
  val _ = print("b: ")
  val _ = print_int (b)
  val _ = print (" c: ")
  val _ = print_int (c)
  val _ = print_newline()
in
  free a
end

This is mostly the same but the implementation of fold is simplified. There is no longer a need to use the "!" prefix when declaring the xs1 variable in the pattern match, and the fold@ is not needed. In cases where it is needed, an @ prefix is used in the pattern match in ATS2:

fun {seed:t@ype}{a:t@ype} fold(f: (seed,a) -<> seed, s: seed, xs: !list a): seed =
  case+ xs of
  | nil () => (s)
  | @cons (x, xs1) => let val s = fold(f, f(s, x), xs1) in fold@ xs; s end

There is some discussion about this in the ATS tutorial but I need to look into it in detail to work out where it is needed now.

Another difference in this example is using datavtype instead of dataviewtype. The ATS2 examples I've seen consistently use the shorter word but the longer one still seems to work.
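For comparison (my own sketch, not from the article), Rust expresses the same linearity through ownership: an owned list cannot be aliased, a fold borrows it without consuming it, and the explicit `free` becomes an automatic, exactly-once drop.

```rust
// An owned list: each cell is uniquely owned, so values cannot be
// aliased and are consumed exactly once, as with ATS's datavtype.
enum List {
    Nil,
    Cons(i32, Box<List>),
}

// Borrowing the list (like the !list a argument) leaves it usable
// by the caller afterwards.
fn fold(f: fn(i32, i32) -> i32, s: i32, xs: &List) -> i32 {
    match xs {
        List::Nil => s,
        List::Cons(x, rest) => fold(f, f(s, *x), rest),
    }
}

fn main() {
    let a = List::Cons(1, Box::new(List::Cons(2, Box::new(
            List::Cons(3, Box::new(List::Cons(4, Box::new(List::Nil))))))));
    let b = fold(|s, x| s + x, 0, &a);
    let c = fold(|s, x| s * x, 1, &a);
    println!("b: {} c: {}", b, c);
    // Dropping `a` at the end of scope plays the role of the
    // explicit `free` in the ATS version.
}
```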

ATS2 makes more extensive use of templates. One thing that I found a pain point in ATS1 was dealing with higher order functions. ATS has multiple function types. There are functions that have no external environment (eg. plain C functions), closures which contain a pointer to an environment, linear closures where the closure environment must be explicitly freed, etc. (see closures in ATS for some examples). Writing higher order functions like map and filter would often require implementing a version for each type of function with duplicated boilerplate code.

The ATS2 library uses function templates specialized within the implementation of higher order functions to deal with this boilerplate. Here is an example of the technique adapted from Programming in ATS:

#include "share/atspre_define.hats"
#include "share/atspre_staload.hats"

datatype list (a:t@ype) =
  | nil (a)
  | cons (a) of (a, list a)

extern fun{a:t@ype}{b:t@ype} map$impl(x: a): b

fun{a:t@ype}{b:t@ype} map(xs: list a): list b = let
  fun loop(xs: list a): list b =
    case+ xs of
    | nil () => nil ()
    | cons(x, xs) => cons(map$impl<a><b>(x), loop(xs))
in
  loop(xs)
end

This implements a 'map' template function that takes a list of type a and returns a list of type b. It iterates over the list calling another template function called map$impl on each element and conses up the result. The map$impl function takes an a and returns a b. It is left unimplemented so far. The following is an implementation of a map that applies a function over the list:

fun{a:t@ype}{b:t@ype} map_fun(xs: list a, f: a -<fun> b) = let
  implement map$impl<a><b> (x) = f(x)
in
  map<a><b>(xs)
end

Here the map$impl function is implemented internally to map_fun to call the passed in function. The version that uses a closure does the same:

fun{a:t@ype}{b:t@ype} map_clo(xs: list a, f: a -<cloref> b) = let
  implement map$impl<a><b> (x) = f(x)
in
  map<a><b>(xs)
end

In this way the code that does the looping and mapping is implemented once in the map function and shared amongst the implementations that take the different function types. The caller doesn't need to know about templates to use it:

implement main0() = let
  val n = 5
  val a: list int = cons(1, cons(2, cons(3, cons(4, nil))))
  val b = map_fun(a, lam(x) => x * 2)
  val c = map_clo(a, lam(x) => x * n)

in
  ()
end

A lot of the ATS library implementation seems to take this approach, with template specialisation being used to extend it.
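Rust solves the same boilerplate problem with generics over the `Fn` traits (again, my sketch as an analogy, not from the article): one implementation serves plain functions and capturing closures alike, and monomorphization specializes it per call site much as ATS2 specializes map$impl.

```rust
// One generic implementation handles both plain functions and
// closures; the compiler specializes it at each call site.
fn map<A, B, F: Fn(&A) -> B>(xs: &[A], f: F) -> Vec<B> {
    xs.iter().map(f).collect()
}

fn main() {
    let n = 5;
    let a = [1, 2, 3, 4];
    let doubled = map(&a, |x| x * 2); // non-capturing closure
    let scaled = map(&a, |x| x * n);  // capturing closure
    println!("{:?} {:?}", doubled, scaled);
}
```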

There are other additions to ATS2 that I haven't looked at yet. The constraint solver can be replaced with an external solver like Z3. There are backends for other languages, including JavaScript. There's a Try ATS online service for trying out the language and support for a C++ FFI.

I probably won't rewrite my existing ATS1 posts to be ATS2 compatible but will write about any changes needed to get things to compile if they're non-obvious.

Tags: ats 

2017-01-18

Running X11 apps in an rkt container

rkt is a container runtime I've been using on a few projects recently. I was creating a container for Mozart, which uses emacs as an IDE. This requires running an X11 application within the container and having it displayed on the host display.

To get this working I needed to mount my host's X11 unix domain socket inside the container and provide an Xauthority file that gives the container the rights to connect to the host X server.

The following shell commands use acbuild to create a container that runs xclock as an example of the process:

acbuild begin docker://ubuntu:16.04
acbuild set-name bluishcoder.co.nz/xclock
acbuild run -- apt install --no-install-recommends --yes x11-apps
acbuild run -- rm -rf /var/lib/apt/lists/*
acbuild environment add DISPLAY unix$DISPLAY
acbuild environment add XAUTHORITY /root/.Xauthority
acbuild mount add x11socket /tmp/.X11-unix
acbuild mount add x11auth /root/.Xauthority
acbuild set-exec xclock
acbuild write --overwrite xclock.aci
acbuild end

It uses an Ubuntu Linux image from the Docker hub as a base and installs x11-apps. To reduce disk space it removes cached package files afterwards. A DISPLAY environment variable is set to use the same DISPLAY as the host. The XAUTHORITY environment variable is set to point to a file in the home directory of the root user in the container.

The mount subcommands declare the x11socket and x11auth mount points at the locations where the X11 unix domain socket and the Xauthority file are expected to be. These will be supplied by the rkt invocation, which mounts host resources at those locations.

The final part of the script sets the executable to be xclock and writes the aci file.

On the host side we need to create an Xauthority file that provides the container access to our X11 server. The file needs to be set up so that any hostname can connect to the X11 server, as the hostname for the container can change between invocations. To do this the authentication family in the file needs to be set to FamilyWild. I got the steps to do this from a stack overflow post:

xauth nlist :0 | sed -e 's/^..../ffff/' | xauth -f myauthority nmerge -

This will retrieve the Xauthority information for display :0 and modify the first four bytes to be ffff. This sets the authority family to FamilyWild. A new file called myauthority is created with this data. This file will be mapped to the x11auth mount point in the container.

The container can be executed with rkt:

rkt run --insecure-options=image xclock.aci \
        --volume x11socket,kind=host,source=/tmp/.X11-unix \
        --volume x11auth,kind=host,source=./myauthority

The --volume command line arguments map the mount points we defined in the acbuild commands to locations on the host. The running xclock application should now appear on the host X11 display.

Tags: rkt 


This site is accessible over tor as hidden service mh7mkfvezts5j6yu.onion, or Freenet using key:
USK@1ORdIvjL2H1bZblJcP8hu2LjjKtVB-rVzp8mLty~5N4,8hL85otZBbq0geDsSKkBK4sKESL2SrNVecFZz9NxGVQ,AQACAAE/bluishcoder/-30/

