Bluish Coder

Programming Languages, Martial Arts and Computers. The Weblog of Chris Double.


2016-08-18

Using Freenet over Tor

This post outlines a method of using Freenet over Tor based on posts I wrote on my Freenet hosted blog and subsequent discussions about it. If you read my Freenet hosted blog there's little new here; I'm just making it available on my non-Freenet blog.

One issue I've had with Freenet is that it exposes your IP address to peers. Recent law enforcement efforts to monitor Freenet have shown that they have been able to obtain search warrants based on logging requests for blocks of known data and associating them with IP addresses. If law enforcement can do this, so can random bad people.

You can avoid exposing your IP address to random strangers on opennet by using darknet but even then you have to trust your friends aren't monitoring your requests. If it was possible to run Freenet over Tor hidden services then only the hidden service address would be exposed using this logging method. A problem is that Freenet uses UDP which Tor does not support.

A recent post on the Freenet development mailing list pointed out that onioncat provides a virtual network over Tor and tunnels UDP. Using the steps they provided, and some tweaks, it's possible to set up a darknet node that doesn't expose its IP address. It uses the onioncat generated IPv6 address for communicating with peers - and this address is backed by a Tor hidden service.

The steps below outline how to set this up. Note that this is quite experimental and requires care to not expose your IP address. There are some Freenet issues that make things difficult, so be aware that you do this at your own risk and that it may still expose your identity if things go wrong.

I'm assuming a Debian/Ubuntu like system for the steps.

Install Tor

Install Tor:

$ sudo apt-get install tor

Edit the /etc/tor/torrc file to enable a Hidden Service with an entry like:

HiddenServiceDir /var/lib/tor/freenet/
HiddenServicePort 8060 127.0.0.1:8060

Restart Tor and find your hidden service hostname:

$ sudo systemctl restart tor
$ sudo cat /var/lib/tor/freenet/hostname

Install onioncat

Install onioncat:

$ sudo apt-get install onioncat

Edit /etc/default/onioncat and change the lines matching the following:

ENABLED=yes
DAEMON_OPTS="-d 0 hiddenservicename.onion -U"

Restart onioncat:

$ sudo systemctl stop onioncat
$ sudo systemctl start onioncat

Find your onioncat IP address with:

$ ocat -i hiddenservicename.onion
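As a cross-check on what ocat -i prints, the address can also be derived by hand. The sketch below assumes the 16-character onion names that Tor produced when this was written, and assumes onioncat's fd87:d87e:eb43::/48 prefix. The function name onion_to_ipv6 is made up for illustration. Treat the output of ocat -i as authoritative if the two ever disagree.

```python
import base64
import ipaddress

# Assumption: onioncat maps hidden services into this /48 prefix.
ONIONCAT_PREFIX = bytes.fromhex("fd87d87eeb43")


def onion_to_ipv6(hostname):
    """Derive an onioncat-style IPv6 address from a 16-character onion name."""
    label = hostname.strip().lower()
    if label.endswith(".onion"):
        label = label[: -len(".onion")]
    if len(label) != 16:
        raise ValueError("expected a 16-character onion name")
    # 16 base32 characters decode to 80 bits, which fill the rest of the address.
    host_bits = base64.b32decode(label.upper())
    return str(ipaddress.IPv6Address(ONIONCAT_PREFIX + host_bits))


if __name__ == "__main__":
    # Placeholder name, not a real hidden service.
    print(onion_to_ipv6("aaaaaaaaaaaaaaaa.onion"))
```

The result should begin with "fd", matching the addresses shown in the Friends list later on.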

Install Freenet

Install Freenet in the usual way and go through the browser based setup wizard. Choose "Details settings: (custom)" for the security option. On the subsequent pages of the wizard:

  • Disable the UPnP plugin.
  • Choose "Only connect to your friends"
  • Choose "High" for "Protection against a stranger attacking you over the internet"
  • Click the "I trust at least one person already using Freenet" checkbox.
  • For "Protection of your downloads..." pick any option you want.
  • Pick a node name that your darknet friends will see.
  • Pick a datastore size that you want.
  • Choose the bandwidth limit.

The node will now be started but have no connections. There will be warnings about this.

Configure Freenet over Tor

The following settings need to be changed in "Configuration/Core Settings" - make sure you have clicked "Switch to advanced mode".

  • Change "IP address override" to your onioncat IP address retrieved in the previous section.
  • Apply the changes.

Shut down Freenet and edit the wrapper.conf file in the Freenet installation directory. Change the line that contains java.net.preferIPv4Stack=true to java.net.preferIPv4Stack=false. In my wrapper.conf this is:

wrapper.java.additional.3=-Djava.net.preferIPv4Stack=false

Edit the freenet.ini file in the Freenet installation directory. Change or add the following (replace "onioncat IP address" with the IP address obtained when installing onioncat):

node.opennet.bindTo=onioncat IP address
node.bindTo=onioncat IP address
node.load.subMaxPingTime=2500
node.load.maxPingTime=5k

Save the file and restart Freenet. There might be a warning about "Unknown external address". Ignore this as you've explicitly set one. I provide a patch later in this post if you want to get rid of the warning.
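If you set up several nodes, the freenet.ini edits above can be scripted. This is only a sketch: apply_settings is a hypothetical helper, the address is a placeholder, and it assumes the file is a list of key=value lines that may finish with an "End" marker line. Run it only while Freenet is shut down and keep a backup of the original file.

```python
def apply_settings(ini_text, settings):
    """Return ini_text with each key in settings changed or added."""
    remaining = dict(settings)
    lines = []
    for line in ini_text.splitlines():
        key, sep, _ = line.partition("=")
        if sep and key in remaining:
            # Replace the existing value for this key.
            lines.append(f"{key}={remaining.pop(key)}")
        else:
            lines.append(line)
    new_lines = [f"{key}={value}" for key, value in remaining.items()]
    if lines and lines[-1].strip() == "End":
        # Assumption: a trailing "End" marker must stay the last line.
        lines[-1:-1] = new_lines
    else:
        lines.extend(new_lines)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    onioncat_ip = "fd87:d87e:eb43::1"  # placeholder, use your own address
    wanted = {
        "node.opennet.bindTo": onioncat_ip,
        "node.bindTo": onioncat_ip,
        "node.load.subMaxPingTime": "2500",
        "node.load.maxPingTime": "5k",
    }
    with open("freenet.ini") as handle:
        print(apply_settings(handle.read(), wanted), end="")
```

The script prints the result rather than overwriting the file so the changes can be reviewed first.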

Add a friend

Now is the time to add a Darknet friend who is also using Tor/Onioncat. Go to "Friends/Add a friend". Choose your trust and ability to see other friends settings and enter a description of the friend. Paste their noderef in the "Enter node reference directly" box.

Give your noderef to your friend and have them add it. Once both connections have been added you should see "Connected" in the Friends list for that connection. The IP address should show the onioncat IPv6 address, beginning with "fd".

Optional Freenet patch

When running a Tor based node Freenet thinks the onioncat IP address is a local address. Some places in the Freenet code base check for this and reject it as a valid global routable address. In the FProxy user interface a large warning appears on each page that it couldn't find the external IP address of the node. The other issue is that local addresses aren't counted for bandwidth statistic reporting. The bandwidth box on the statistics page is empty as a result.

I use a patch, onioncat.txt, that provides a workaround for these two issues. The patch is optional as the node works without it but it's a useful improvement if you plan to run a Tor based node long term. You should review the patch rather than applying it blindly, to ensure that it's not doing anything nefarious.

Hybrid nodes

If you run a Tor based darknet node then at least one hybrid node must be in the darknet to bridge to the non-Tor nodes. These hybrid nodes will have a public clearnet IP address exposed. I outline how to set up a hybrid node below. For those that trust me, if you send a darknet Tor noderef to me at the freemail address on the bottom of this page, or via normal email, I'll connect and send you a noderef of a hybrid node set up in this manner.

Install Tor and Onioncat as described previously. Install Freenet in the usual way and go through the browser based setup wizard. Choose "Details settings: (custom)" for the security option. On the subsequent pages of the wizard:

  • Enable or Disable the UPnP plugin as necessary depending on what you need for your clearnet connection to work.
  • Choose "Connect to strangers"
  • Choose "Low" or "Normal" security as desired.
  • For "Protection of your downloads..." pick any option you want.
  • Pick a datastore size that you want.
  • Choose the bandwidth limit.

The node will start and connect to opennet.

Shut down Freenet and edit the wrapper.conf file in the Freenet installation directory. Change the line that contains java.net.preferIPv4Stack=true to java.net.preferIPv4Stack=false. In my wrapper.conf this is:

wrapper.java.additional.3=-Djava.net.preferIPv4Stack=false

Edit the freenet.ini file in the Freenet installation directory. Change or add the following:

node.load.subMaxPingTime=2500
node.load.maxPingTime=5k

Save the file and restart Freenet. If you base64 decode the "physical.udp" section of the noderef for the node you should see that it now contains the onioncat IP address as well as the public clearnet IP address.

Adding friends to this node will give those friends access to the wider Freenet datastore when they reciprocate.

Don't forget to check your noderefs to ensure that the ARK and the public IP address contain data you are willing to reveal. Check both the darknet noderef and the opennet noderef. You can decode the base64 of the "physical.udp" line with the GNU base64 command:

$ echo "physical.udp base64 here" |base64 -d

Final steps and caveats

Try visiting a Freenet index site and see if it loads. If it does then the Freenet over Tor setup is working. It will be slower than normal Freenet usage due to Tor latency. If you connect to more darknet nodes it will get faster.

When adding a friend's noderef you can check what IP addresses it will connect to by looking at the "physical.udp" line. This is a base64 encoded list of IP addresses. You might want to check this to ensure that there are no clearnet addresses in there. If there is a clearnet address then it could deanonymize your node when it tries to connect to that in preference to the onioncat address.
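Once you have the addresses from a decoded "physical.udp" line, checking them can be automated. The sketch below takes plain address strings and flags any outside onioncat's range. It assumes the fd87:d87e:eb43::/48 prefix, and non_onioncat is a hypothetical helper name. Splitting the decoded data into addresses is left out because the exact layout isn't covered here.

```python
import ipaddress

# Assumption: onioncat addresses all fall inside this prefix.
ONIONCAT_NET = ipaddress.ip_network("fd87:d87e:eb43::/48")


def non_onioncat(addresses):
    """Return the addresses that are not inside the onioncat range."""
    flagged = []
    for text in addresses:
        address = ipaddress.ip_address(text)
        # IPv4 addresses are never members of an IPv6 network, so they are flagged too.
        if address not in ONIONCAT_NET:
            flagged.append(text)
    return flagged


if __name__ == "__main__":
    # Placeholder addresses for illustration only.
    sample = ["fd87:d87e:eb43::1", "203.0.113.5", "fe80::1"]
    for leaked in non_onioncat(sample):
        print("not an onioncat address:", leaked)
```

Anything it flags is worth investigating before you add the noderef.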

The "ark.pubURI" portion of the noderef is an SSK that points to updated IP address information. A node can subscribe to the USK version of this and learn about IP address changes. Your friend's node could change its IP address to a clearnet address, resulting in you connecting to that.

To avoid the above two issues it's worthwhile running Freenet in a VM or container that does not have clearnet network access and only has access to the onioncat network setup. Alternatively you can use iptables to only allow onioncat traffic for the Freenet process or user running it.

The IP addresses exposed in the noderef include all local link addresses and their scopes. This is Freenet bug 6879. This may leak information you don't want leaked. It pays to check the "physical.udp" and "ark.pubURI" to see what you are exposing. Remember that any IP addresses exposed over the ARK are discoverable by looking at previous editions of the USK.

The traffic footprint of Freenet may make it easier to track down your IP address from your Tor ID. The volume of data and the nature of the traffic may make certain types of Tor de-anonymization techniques more effective.

Ideally it would be possible to have an opennet of Tor nodes so the exchange of darknet noderefs wouldn't be needed. I haven't been able to get this working yet but I'll continue to investigate it.

I've been running a Tor darknet node for the past week to test how well it works. With three darknet connections it runs well enough for browsing freesites. Sone and the Web of Trust took quite a while to bootstrap due to the lower speed but once they were running they worked well. FMS and Flip are also usable. I'd expect performance to be even better with more connections.

Tags: freenet 

2016-08-12

Bundling Inferno Applications

Inferno can run as a standalone OS or hosted in an existing OS. In the latter case an emu executable runs and executes as a virtual operating system. An application written in Limbo runs inside this virtual OS. When distributing such an application you can install Inferno on the target machine with the application compiled code inside the filesystem of Inferno. This will include a bunch of stuff that the application may not need since it includes all the utilities for the operating system. To distribute the minimal amount of dependencies to run the Limbo application you can create a cut down Inferno distribution that only includes the application dependencies and run that. Another option is to bundle the root filesystem into the executable leaving only that executable to be distributed.

Application specific Inferno

To create an application specific Inferno install we need to find out what the dependencies are for the application. Most of the following comes from powerman's post on the topic in Russian.

When running emu it runs a program /dis/emuinit.dis to initialize the system and start the shell or other program. An application specific Inferno distribution would require:

  • emu
  • /dis/emuinit.dis
  • /dis/lib/emuinit.dis
  • Your application program files inside the Inferno directory structure somewhere.

Powerman's post goes into detail on how to do this with the following "Hello World" Limbo application:

implement HelloWorld;
include "sys.m";
include "draw.m";

HelloWorld: module
{
    init: fn(nil: ref Draw->Context, nil: list of string);
};

init(nil: ref Draw->Context, nil: list of string)
{
    sys := load Sys Sys->PATH;
    sys->print("Hello World!\n");
}

With this in a helloworld.b file, compiling it produces helloworld.dis. This contains the compiled bytecode for the Dis virtual machine. disdep will list the dependencies that it requires:

; limbo helloworld.b
; ./helloworld
Hello World!
; disdep helloworld.dis
; disdep /dis/emuinit.dis
/dis/lib/arg.dis
/dis/sh.dis
/dis/lib/bufio.dis
/dis/lib/env.dis
/dis/lib/readdir.dis
/dis/lib/filepat.dis
/dis/lib/string.dis

This shows that helloworld has no dependencies outside of what's already built into emu, while emuinit.dis has a few. Creating a directory layout with just these files and emu should be enough to run helloworld. In the following example foo is the directory containing the full Inferno distribution where helloworld.b was compiled.

$ mkdir -p hw/dis/lib
$ cd hw
$ cp ../foo/Linux/386/bin/emu .
$ cp ../foo/dis/emuinit.dis dis/
$ cp ../foo/dis/lib/arg.dis dis/lib/
$ cp ../foo/dis/lib/bufio.dis dis/lib/
$ cp ../foo/dis/lib/env.dis dis/lib/
$ cp ../foo/dis/lib/readdir.dis dis/lib/
$ cp ../foo/dis/lib/filepat.dis dis/lib/
$ cp ../foo/dis/lib/string.dis dis/lib/
$ cp ../foo/dis/sh.dis dis/
$ cp ../foo/usr/inferno/helloworld.dis .
$ ./emu -r. helloworld
Hello World!

Some of these dependencies aren't needed, for example sh.dis as we don't run the shell. See Powerman's post for a more extensive example that has dependencies.
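The manual cp steps above become tedious once an application has more dependencies. As a rough sketch, a host-side script could take the paths printed by disdep and copy them into the cut-down tree. copy_deps is a hypothetical helper, and the foo and hw directory names simply follow the example above.

```python
import shutil
from pathlib import Path


def copy_deps(inferno_root, target_root, deps):
    """Copy Inferno paths such as /dis/lib/arg.dis from the full tree to the minimal one."""
    copied = []
    for dep in deps:
        relative = dep.lstrip("/")
        source = Path(inferno_root) / relative
        destination = Path(target_root) / relative
        # Recreate the directory structure expected inside the Inferno root.
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, destination)
        copied.append(destination)
    return copied


if __name__ == "__main__":
    # Dependency list as printed by disdep /dis/emuinit.dis in the example above.
    deps = [
        "/dis/emuinit.dis",
        "/dis/lib/arg.dis",
        "/dis/sh.dis",
        "/dis/lib/bufio.dis",
        "/dis/lib/env.dis",
        "/dis/lib/readdir.dis",
        "/dis/lib/filepat.dis",
        "/dis/lib/string.dis",
    ]
    for path in copy_deps("foo", "hw", deps):
        print("copied", path)
```

The emu binary and the application's own .dis file still need to be copied separately, as in the manual steps.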

Bundling application into emu

Instead of distributing a directory of files as in the previous example it's possible to bundle a root filesystem inside the emu program. This allows distributing just a single executable that runs the Limbo application. The steps to do this involve finding the dependencies of the program as above and creating a kernel configuration file that lists them.

The file /emu/Linux/emu is the kernel configuration file for the Linux Inferno VM. It has a root section which defines the root filesystem:

root
        /dev    /
        /fd     /
        /prog   /
        /prof   /
        /net    /
        /net.alt        /
        /chan   /
        /nvfs   /
        /env    /
#       /dis
#       /n
#       /icons
#       /osinit.dis
#       /dis/emuinit.dis
#       /dis/lib/auth.dis
#       /dis/lib/ssl.dis
#       /n/local /

The actual filesystem as located by the -r command line argument to emu is overlaid on top of this. Copying this file and adding our dependencies will bundle them into a custom-built executable. This is what the root section of our helloworld looks like:

root
        /dev    /
        /fd     /
        /prog   /
        /prof   /
        /net    /
        /net.alt        /
        /chan   /
        /nvfs   /
        /env    /
        /dis
        /dis/emuinit.dis /usr/chris/helloworld.dis

Here we make helloworld.dis appear in the root filesystem as /dis/emuinit.dis, which is the first program run as the system comes up. Note that this file must use tabs, not spaces, for the entries. An executable can be compiled with this bundled root filesystem with:

$ cd emu/Linux
$ ...create helloworld configuration file...
$ mk CONF=helloworld

This produces an executable o.helloworld which when run will execute the helloworld.dis embedded inside it:

$ ./o.helloworld
Hello World!

When stripped the executable is about 1.5MB. This executable links to the X11 libraries. For a headless system you can remove the dependency by using the emu-g kernel configuration file as a base. This removes the drivers that use X11 and prevents linking against the X11 libraries. The resulting executable when stripped is now 700K.

The executable produced by this process has some dynamic dependencies - libc, etc - that most glibc applications have. It should be possible to use musl-libc to produce a static binary and I'll cover this in another post.
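One practical detail from the steps above is that the kernel configuration file must use tabs, not spaces, which is easy to get wrong when pasting from a web page. A small host-side check, sketched here with the hypothetical helper space_indented_lines, reports any line that starts with a space so it can be fixed before running mk.

```python
def space_indented_lines(config_text):
    """Return the 1-based numbers of lines that begin with a space instead of a tab."""
    bad = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if line.startswith(" "):
            bad.append(number)
    return bad


if __name__ == "__main__":
    import sys

    # Usage: python checktabs.py helloworld
    with open(sys.argv[1]) as handle:
        for number in space_indented_lines(handle.read()):
            print(f"line {number} is indented with spaces")
```

This checks indentation only. It does not validate anything else about the configuration file.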

Tags: inferno 

2016-07-18

Borrowing in Pony

The 'TL;DR' of this post on how to borrow internal fields of iso objects in Pony is:

To borrow fields internal to an iso object, recover the object to a ref (or other valid capability), perform the operations using the field, then consume the object back to an iso.

Read on to find out why.

In this post I use the term borrowing to describe the process of taking a pointer or reference internal to some object, using it, then returning it. An example from C would be something like:

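typedef struct foo foo;
typedef struct bar bar;

foo* new_foo(void);
bar* get_bar(foo* f);    /* returns a pointer to data owned by f */
void delete_foo(foo* f);

...
foo* f = new_foo();
bar* b = get_bar(f);
...
delete_foo(f);           /* b is a dangling pointer from here on */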

Here a new foo is created and a pointer to a bar object is obtained from it. This pointer is to data internal to foo. It's important not to use it after foo is deleted as it will be a dangling pointer. While holding the bar pointer you have an alias to something internal to foo. This makes it difficult to share foo with other threads or reason about data races. The foo object could change the bar data without the holder of the borrowed pointer knowing, turning it into a dangling pointer, or a pointer to invalid data, at any time. I go through a real world case of this in my article on using C in the ATS programming language.

Pony has the concept of a reference to an object where only one pointer to that object exists. It can't be aliased and nothing else can read or write to that object but the current reference to it. This is the iso reference capability. Capabilities are 'deep' in Pony, rather than 'shallow'. This means that the reference capability of an alias to an object affects the reference capabilities of fields of that object as seen by that alias. The description of this is in the viewpoint adaption section of the Pony tutorial.

The following is a Pony equivalent of the previous C example:

class Foo
  let bar: Bar ref
...
let f: Foo ref = Foo.create()
let b: Bar ref = f.bar

The reference capability of f determines the reference capability of bar as seen by f. In this case f is a ref (the default of class objects) which according to the viewpoint adaption table means that bar as seen by f is also a ref. Intuitively this makes sense - a ref signifies multiple read/write aliases can exist therefore getting a read/write alias to something internal to the object is no issue. A ref is not sendable so cannot be accessed from multiple threads.

If f is an iso then things change:

class Foo
  let bar: Bar ref
...
let f: Foo iso = recover iso Foo.create() end
let b: Bar tag = f.bar

Now bar as seen by f is a tag. A tag can be aliased but cannot be used to read from or write to the object. Only object identity comparison and calling behaviours are allowed. Again this is intuitive. If we have a non-aliasable reference to an object (f being iso here) then we can't alias internally to the object either. Doing so would mean that the object could be changed on one thread and the internals modified on another, giving a data race.

The viewpoint adaption table shows that given an iso f it's very difficult to get a bar that you can write to. The following read only access to bar is ok:

class Foo
  let bar: Bar val
...
let f: Foo iso = recover iso Foo.create() end
let b: Bar val = f.bar

Here bar is a val. This allows multiple aliases, sendable across threads, but only read access is provided. Nothing can write to it. According to viewpoint adaption, bar as seen by f is a val. It makes sense that given a non-aliasable reference to an object, anything within that object that is immutable is safe to borrow since it cannot be changed. What if bar is itself an iso?

class Foo
  let bar: Bar iso = recover iso Bar end
...
let f: Foo iso = recover iso Foo.create() end
let b: Bar iso = f.bar

This won't compile. Viewpoint adaption shows that bar as seen by f is an iso. The assignment to b doesn't typecheck because it's aliasing an iso and iso reference capabilities don't allow aliasing. The usual solution when a field isn't involved is to consume the original but it won't work here. The contents of an object's field can't be consumed because it would then be left in an undefined state. A Foo object that doesn't have a valid bar is not really a Foo. To get access to bar externally from Foo the destructive read syntax is required:

class Foo
  var bar: Bar iso = recover iso Bar end
...
let f: Foo iso = recover iso Foo.create() end
let b: Bar iso = f.bar = recover iso Bar end

This results in f.bar being set to a new instance of Bar so it's never in an undefined state. The old value of f.bar is then assigned to b. This is safe as there are no aliases to it anymore due to the first part of the assignment being done first.

What if the internal field is a ref and we really want to access it as a ref? This is possible using recover. As described in the tutorial, one of the uses for recover is:

"Extract" a mutable field from an iso and return it as an iso.

This looks like:

class Foo
  let bar: Bar ref
... 
let f: Foo iso = recover iso Foo end
let f' = recover iso
           let f'': Foo ref = consume f
           let b: Bar ref = f''.bar
           consume f''
         end

Inside the recover block f is consumed and returned as a ref. The f alias to the object no longer exists at this point and we have the same object but as a ref capability in f''. bar as seen by f'' is a ref according to viewpoint adaption and can now be used within the recover block as a ref. When the recover block ends the f'' alias is consumed and returned out of the block as an iso again in f'.

This works because inside the recover block only sendable values from the enclosing scope can be accessed (i.e. val, iso, or tag). When exiting the block all aliases except for the object being returned are destroyed. There can be many aliases to bar within the block but none of them can leak out. Multiple aliases to f'' can also be created and they are not going to be leaked either. At the end of the block only one can be returned and by consuming it the compiler knows that there are no more aliases to it so it is safe to make it an iso.

To show how the ref aliases created within the recover block can't escape, here's an example of an erroneous attempt to assign the f' alias to an object in the outer scope:

class Baz
  var a: (Foo ref | None) = None
  var b: (Foo ref | None) = None

  fun ref set(x: Foo ref) =>
    a = x
    b = x

class Bar

class Foo
  let bar: Bar ref = Bar

var baz: Baz iso = recover iso Baz end
var f: Foo iso = recover iso Foo end
f = recover iso
      let f': Foo ref = consume f
      baz.set(f')
      let b: Bar ref = f'.bar
      consume f'
    end

If this were to compile then baz would contain two references to the f' object which is then consumed as an iso. f would contain what it thinks is a non-aliasable reference but baz would actually hold two additional references to it. This fails to compile at this line:

main.pony:20:18: receiver type is not a subtype of target type
          baz.set(f')
                 ^
Info:
main.pony:20:11: receiver type: Baz iso!
              baz.set(f')
              ^
main.pony:5:3: target type: Baz ref
      fun ref set(x: Foo ref) =>
      ^
main.pony:20:18: this would be possible if the arguments and return value were all sendable
              baz.set(f')
                     ^

baz is an iso so is allowed to be accessed from within the recover block. But the set method on it expects a ref receiver. This doesn't work because the receiver of a method of an object is also an implicit argument to that method and therefore needs to be aliased. In this way it's not possible to store data created within the recover block in something passed into the recover block externally. No aliases can be leaked and the compiler can track things easily.

There is something called automatic receiver recovery that is alluded to in the error message ("this would be possible...") which states that if the arguments were sendable then it is possible for the compiler to work out that it's ok to call a ref method on an iso object. Our ref arguments are not sendable which is why this doesn't kick in.

A real world example of where all this comes up is using the Pony net/http package. A user on IRC posted the following code snippet:

use "net/http"
class MyRequestHandler is RequestHandler

  let env: Env

  new val create(env': Env) =>
    env = env'

  fun val apply(request: Payload iso): Any =>
    for (k, v) in request.headers().pairs() do
      env.out.print(k)
      env.out.print(v)
    end

    let r = Payload.response(200)
    r.add_chunk("Woot")
    (consume request).respond(consume r)

The code attempts to iterate over the HTTP request headers and print them out. It fails in the request.headers().pairs() call, complaining that tag is not a subtype of box in the result of headers() when calling pairs(). Looking at the Payload class definition shows:

class iso Payload
  let _headers: Map[String, String] = _headers.create()

  fun headers(): this->Map[String, String] =>
    _headers

In the example code request is an iso and the headers function is a box (the default for fun). The return value of headers uses an arrow type. It reads as "return a Map[String, String] with the reference capability of _headers as seen by this". In this example this is the request object which is iso. _headers is a ref according to the class definition. So it's returning a ref as seen by an iso which according to viewpoint adaption is a tag.

This makes sense as we're getting a reference to the internal field of an iso object. As explained previously this must be a tag to prevent data races. This means that pairs() can't be called on the result as tag doesn't allow function calls. pairs() is a box method which is why the error message refers to tag not being a subtype of box.
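The viewpoint adaption results used in this post can be collected into a small lookup table. It is written in Python only as an aid to reading and holds just the four combinations mentioned here, so see the Pony tutorial for the complete table. The names SEEN_AS and field_as_seen_by are made up for illustration.

```python
# (capability of the origin, capability of the field) -> capability the field is seen as.
# Only the combinations discussed in this post are listed.
SEEN_AS = {
    ("ref", "ref"): "ref",  # f is a ref, bar is a ref
    ("iso", "ref"): "tag",  # f is an iso, bar is a ref
    ("iso", "val"): "val",  # f is an iso, bar is a val
    ("iso", "iso"): "iso",  # f is an iso, bar is an iso
}


def field_as_seen_by(origin, field):
    """Look up how a field of capability `field` appears through an `origin` reference."""
    return SEEN_AS[(origin, field)]


if __name__ == "__main__":
    # request is an iso and _headers is a ref, so headers() yields a tag.
    print(field_as_seen_by("iso", "ref"))
    # After recovering request to a ref, the same field is seen as a ref.
    print(field_as_seen_by("ref", "ref"))
```

The first lookup is the failing headers() case. The second is why consuming the request to a ref inside a recover block makes pairs() callable.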

To borrow the headers correctly we can use the approach done earlier of using a recover block:

fun val apply(request: Payload iso): Any =>
  let request'' = recover iso
    let request': Payload ref = consume request
    for (k, v) in request'.headers().pairs() do
      env.out.print(k)
      env.out.print(v)
    end
    consume request'
  end
  let r = Payload.response(200)
  r.add_chunk("Woot")
  (consume request'').respond(consume r)

In short, to borrow fields internal to an iso object, recover the object to a ref (or other valid capability), perform the operations using the field, then consume the object back to an iso.

Tags: pony 

2016-07-14

Concurrency in Wasp Lisp

Wasp Lisp has a lightweight co-operative threading model that allows programming in an Actor style. It's possible to serialize Wasp values and send them to other processes and machines to be deserialized and run. MOSREF uses this to compile Lisp code on the console process and send the bytecode to drone processes to execute. This allows drones to operate without the Lisp compiler present.

Spawning threads

Threads are created using the spawn function. It takes the function to run as a thread as an argument:

(spawn (lambda () (print "Hello World\n")))

Communication between threads is done using queues. A queue is an unbounded channel that can have many senders but only one receiver. The function send adds data to the queue and wait receives data. If there is no data in the queue then wait blocks. Input/Output in Wasp Lisp is done using the same wait/send mechanism making it easy to pipeline data from console and file output to sockets.

Implementing Actors

A basic Actor can be implemented like the following:

(define (actor1)
  (define counter 0)
  (define chan (make-queue))

  (define (loop)
    (define msg (wait chan))
    (cond
      ((eq? msg 'inc)
        (set! counter (+ 1 counter)))
      ((eq? msg 'dec)
        (set! counter (- counter 1)))
      ((and (list? msg) (eq? (car msg) 'get))
       (send counter (cadr msg))))
    (loop))

  (spawn loop)
  chan)

actor1 is a function that contains a counter holding a numeric value. It creates chan, a queue for holding messages, spawns a thread to run loop and returns the chan so messages can be queued for loop to process.

loop waits for a message on chan. This is a blocking call and the thread will go idle until a message is queued. It processes the message, incrementing or decrementing the counter as requested. An additional message, get, can be used to get the value of the counter. That message also includes a channel object to place the result in. loop recursively calls itself to continue.

A sample interaction is:

>> (define a1 (actor1))
>> (define result (make-queue))
>> (send (list 'get result) a1)
>> (wait result)
:: 0
>> (send 'inc a1)
>> (send (list 'get result) a1)
>> (wait result)
:: 1

This creates an actor and a queue to receive results. It asks for the current value of the actor, increments it, then asks again.
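For readers who don't know Wasp Lisp, the same actor can be sketched in Python with queue.Queue standing in for a Wasp queue and a daemon thread standing in for spawn. This is an analogy rather than a translation: Python threads are preemptive instead of co-operative, and the tuple ("get", reply_queue) is just one way to represent the get message.

```python
import queue
import threading


def actor1():
    """Start a counter actor and return the queue used to send it messages."""
    chan = queue.Queue()

    def loop():
        counter = 0
        while True:
            msg = chan.get()  # blocks until a message is queued, like wait
            if msg == "inc":
                counter += 1
            elif msg == "dec":
                counter -= 1
            elif isinstance(msg, tuple) and msg[0] == "get":
                # The second element is the queue to place the result in.
                msg[1].put(counter)

    threading.Thread(target=loop, daemon=True).start()
    return chan


if __name__ == "__main__":
    a1 = actor1()
    result = queue.Queue()
    a1.put(("get", result))
    print(result.get())
    a1.put("inc")
    a1.put(("get", result))
    print(result.get())
```

As in the Lisp version, only the thread running loop ever touches counter, so no locking is needed around it.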

Updating an Actor

It's possible to update the code for an Actor without stopping the application. Running in a Lisp REPL means you can change functions on the fly but you can't change the internal implementation of a running loop from the REPL if that loop is internal to a function. A way around this is to provide the Actor with the means to receive a function as a message that performs the update. Here is an example of an updatable actor:

(define (actor3)
  (define counter 0)
  (define chan (make-queue))

  (define (loop chan)
    (define msg (wait chan))
    (cond
      ((eq? msg 'inc)
        (set! counter (+ 1 counter)))
      ((eq? msg 'dec)
        (set! counter (- counter 1)))
      ((and (list? msg) (eq? (car msg) 'get))
       (send counter (cadr msg)))
      ((function? msg)
       (return ((msg counter) chan))))
    (loop chan))

  (spawn loop chan)
  chan)

This code contains an additional branch in the cond to check if the message is a function. If it is then that function is called passing the current value of the counter. It is expected to return a function which will be the new loop to call. This can contain any code and effectively updates the entire actor with new functionality. An example update function to change the messages to increment/decrement by two is:

(define (update oldstate)
  (define counter (* oldstate 2))
  (define (loop chan)
    (define msg (wait chan))
    (cond
      ((eq? msg 'inc)
        (set! counter (+ 2 counter)))
      ((eq? msg 'dec)
        (set! counter (- counter 2)))
      ((and (list? msg) (eq? (car msg) 'get))
       (send counter (cadr msg)))
      ((function? msg)
       (return ((msg counter) chan))))
   (loop chan))
  loop)

An example interaction of the actor and upgrading it is:

>> (define a3 (actor3))
>> (define result (make-queue))
>> (send 'inc a3)
>> (send (list 'get result) a3)
>> (wait result)
:: 1
>> (send update a3)   ;; Updating the actor here
>> (send (list 'get result) a3)
>> (wait result)
:: 2                  ;; This shows the new counter value that 'update' changed
>> (send 'inc a3)
>> (send (list 'get result) a3)
>> (wait result)
:: 4                  ;; Amount is now incrementing by two
>> (send 'inc a3)
>> (send (list 'get result) a3)
>> (wait result)
:: 6

This is a variant of Joe Armstrong's Erlang Universal Server allowing a server to be updated to do anything.
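The update mechanism can be sketched in Python as well, again only as an analogy to the Wasp code. A message that is callable is treated as an update function. It is called with the current counter and must return the new loop, which then takes over the channel. The names counting_loop, actor3 and update are chosen for this example.

```python
import queue
import threading


def counting_loop(counter, step):
    """Build a loop function that counts in increments of `step`."""

    def loop(chan):
        nonlocal counter
        while True:
            msg = chan.get()
            if callable(msg):
                # Hand the current state to the update function and
                # continue with whatever loop it returns.
                return msg(counter)(chan)
            if msg == "inc":
                counter += step
            elif msg == "dec":
                counter -= step
            elif isinstance(msg, tuple) and msg[0] == "get":
                msg[1].put(counter)

    return loop


def actor3():
    chan = queue.Queue()
    threading.Thread(target=counting_loop(0, 1), args=(chan,), daemon=True).start()
    return chan


def update(old_state):
    """Example update: double the counter and count in twos from now on."""
    return counting_loop(old_state * 2, 2)


if __name__ == "__main__":
    a3 = actor3()
    result = queue.Queue()
    a3.put("inc")
    a3.put(update)
    a3.put("inc")
    a3.put(("get", result))
    print(result.get())
```

Each update nests one more call on the actor's thread, which is fine for a sketch but would need rethinking for an actor that is updated very often.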

Filters

An idiom when programming in an Actor or coroutine style is to write small processes that take an input, modify it in some way, and send it to another process to do something else. A program becomes a chain or pipeline of these individual processes. Wasp Lisp calls these small units of functionality filters. They are described in filter.ms as:

A process that waits for data from an input channel, and sends data to an output channel. Filters are constructed using a constructor function, then wired together using either the input-chain or output-chain functions.

This is an example of a line filter from the Wasp source code:

(define-filter (line-filter)
  (define buf (make-string 80)) 

  (define (parse)
    (forever
      (define next (string-read-line! buf))
      (if next (send next out)
               (return))))

  (define (line-loop)
    (forever
      (define next (wait-input in))
      (cond 
        ((string? next)
         (string-append! buf next)
         (parse))
        ((eq? next 'close)
         (return))
        (else
          (send-output next out)))))

  (line-loop)

  (send-output buf out)
  (send-output 'close out))

A line-filter receives strings of bytes on the input channel and outputs a complete line on the output channel when it has one. It does this by appending received bytes onto a string buffer and checking if that buffer contains a line. If it does it removes the line data from the buffer and sends it to the output channel. It then continues to wait for data on the input channel. An example of usage:

>> (import "lib/filter")
>> (import "lib/line-filter")
>> (define q (make-queue))
>> (define lines (input-chain q (line-filter)))
>> (spawn (lambda () (forever (print (wait lines)))))

>> (send "hello" q)
>> (send "world\n" q)
helloworld
>> (send "foo\nbar" q)
foo
>> (send "baz\n" q)
barbaz

This creates a queue, q, for input data. It then creates a chain containing only one filter, the line-filter; input-chain returns the output channel, which carries the filtered data. Data placed in q is retrieved by the line filter and, when a line is complete, it is sent to the output channel. A thread is spawned to loop forever printing any lines from the output channel. Notice in the manual sending of data to the channel q that output is only printed by the spawned thread when a line is completed.
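The buffering behaviour is easy to model in other languages. Here is a rough Python sketch of the same idea, using a generator in place of a Wasp filter process. The function name line_filter is reused only for readability, and flushing a non-empty buffer at the end of input is my reading of the final send-output calls above, so treat it as an assumption.

```python
def line_filter(chunks):
    """Yield complete lines from an iterable of arbitrary string chunks."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        # Emit every complete line currently held in the buffer.
        while "\n" in buf:
            line, buf = buf.split("\n", 1)
            yield line
    if buf:
        yield buf  # flush whatever is left when the input ends


for line in line_filter(["hello", "world\n", "foo\nbar", "baz\n"]):
    print(line)
# helloworld
# foo
# barbaz
```

Because the result is itself an iterable, it can be fed to another generator, which is the Python equivalent of wiring a second filter onto the chain.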

Wasp Lisp comes with some default filters for parsing s-expressions, encrypting and decrypting data and fuzzing data amongst other things. Scott Dunlop wrote about coroutines and filters on the Wasp blog.

Sending data to other OS processes

Some Wasp values can be serialized and deserialized. This provides a way to send values to other Wasp instances running in different OS processes or machines. Lisp objects are serialized using freeze and deserialized using thaw.

The following server function starts a TCP server on port 10000. Clients connected to it send Lisp objects, which it prints to the standard output of the server process.

(import "lib/tcp-server")

(define (server)
  (define server-output (current-output))

  (define (acceptor)
    (forever
      (define data (wait))
      (with-output server-output
        (print (format (thaw data)))
        (print "\n"))))

  (spawn-tcp-server 10000 acceptor))

The acceptor function is called with its current input and output bound to the TCP stream. For this reason we capture the value of current-output before it is bound so we can output to the server console rather than to the TCP stream. A sample test:

;; On server 
>> (server)

;; On client
>> (define s (tcp-connect "127.0.0.1" 10000))
>> (send (freeze "foo") s)

;; On Server
"foo"

;; On Client
>> (send (freeze 66) s)

;; On Server
66

;; On Client
>> (send (freeze '(one (two three))) s)

;; On Server
(one (two three))

Notice that all i/o is done using the 'send' and 'wait' channel operators. This means we can use a filter to do the freezing/thawing automatically, and Wasp has a freeze-filter and thaw-filter that do this. The server becomes:

(import "lib/tcp-server")
(import "lib/package-filter")
(import "lib/filter")
(import "lib/format-filter")

(define (server2)             
  (define server-output (current-output))

  (define (acceptor)
    (define chan (input-chain (current-input)
                              (thaw-filter)
                              (format-filter)))
    (forever
      (define data (wait chan))
      (print* data "\n")))

  (spawn-tcp-server 10000 acceptor))

Usage from a client is:

>> (import "lib/filter")
>> (import "lib/package-filter")

>> (define s (tcp-connect "127.0.0.1" 10000))
>> (define chan (output-chain s (freeze-filter)))
>> (send "hello" chan)
>> (send '(one (two three)) chan)

Through the use of the thaw/freeze filter there is no need to manually call freeze and thaw.
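For readers more familiar with Python, the same wrapping pattern can be sketched with pickle playing the role of freeze and thaw. This is only an analogy: the class names FreezeOut and ThawIn are invented, and a queue.Queue stands in for the TCP stream.

```python
import pickle
import queue


class FreezeOut:
    """Output side: serialize each value before it goes onto the raw channel."""

    def __init__(self, raw):
        self.raw = raw

    def send(self, value):
        self.raw.put(pickle.dumps(value))


class ThawIn:
    """Input side: deserialize each blob taken from the raw channel."""

    def __init__(self, raw):
        self.raw = raw

    def wait(self):
        # pickle.loads must only be used on data from a trusted peer.
        return pickle.loads(self.raw.get(timeout=5))


wire = queue.Queue()           # stands in for the TCP connection
out = FreezeOut(wire)
inp = ThawIn(wire)

out.send("hello")
out.send(["one", ["two", "three"]])
print(inp.wait())              # hello
print(inp.wait())              # ['one', ['two', 'three']]
```

Code on either side only ever sees ordinary values; the byte strings exist solely on the wire.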

Sending bytecode to other processes

Unfortunately it's not possible to freeze or thaw closures or functions. It is possible however to assemble Lisp to bytecode and send that. This enables sending new functions across OS processes and is how MOSREF is able to compile Lisp on the console and send it to the drone. This example will compile a function from source to bytecode and run it:

>> (define code '((print "Hello World\n")))
>> (define proc (assemble (optimize (compile code))))
>> (proc)
Hello World

The result of assemble can be frozen, sent somewhere and thawed:

>> (define x (freeze (assemble (optimize (compile '((print "Hello World\n")))))))
>> (define y (thaw x))
>> (y)
Hello World

Using this we can have an upgradable server process:

(define (server3)
  (define server-output (current-output))

  (define (acceptor)
    (define chan (input-chain (current-input)
                              (thaw-filter)))

    (define (loop chan)
      (define data (wait chan))
      (cond
        ((function? data)
          (return ((data) chan)))
        (else
          (print* "OLD: " (format data) "\n")
          (return (loop chan)))))
     (loop chan))

  (spawn-tcp-server 10000 acceptor))

This will display the data sent to the server prefixed by "OLD: " unless it is sent a function, in which case it calls that function to get the new server loop. An upgraded server loop to prefix with "NEW: " is:

(define (new-server3)
  (assemble
    (optimize
      (compile
        '((define (loop chan)
            (define data (wait chan))
            (cond
              ((function? data)
                (return ((data) chan)))
              (else
                (print* "NEW: " (format data) "\n")
                (return (loop chan))))))))))

We can't send a function directly so this compiles the new loop from source and returns the compiled procedure. The result can be frozen and sent to the server, which then executes it as the new loop. An example interaction:

;; On Server
>> (server3)

;; On Client
>> (define s (tcp-connect "127.0.0.1" 10000))
>> (define chan (output-chain s (freeze-filter)))
>> (send '(one (two three)) chan)

;; On Server
OLD: (one (two three))

;; On Client
>> (send (new-server3) chan)
>> (send '(one (two three)) chan)

;; On Server
NEW: (one (two three))

Why not send the source to the server process and have it eval it? The approach of sending the bytecode allows the server process to skip including the Lisp compiler. The Wasp VM includes an interpreter and deserializer - the compiler and other libraries are all in Lisp. A Wasp executable consists of the VM stub with bytecode appended to the end of it. On execution it looks for the bytecode, deserializes it and runs it. This provides a minimal program that can have functionality added by sending it bytecode as needed.
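A loose Python analogy may help here. In the sketch below the sender compiles source and ships the serialized code object, and the receiver rebuilds and runs it without ever seeing the source text. The names assemble, load, serve and old_handler are made up for this example. The analogy is imperfect: CPython always carries its compiler, and the marshal format is specific to the Python version, whereas Wasp bytecode is described above as cross platform.

```python
import marshal

NEW_SOURCE = '''
def loop(data):
    return "NEW: " + repr(data)
'''


def assemble(source):
    """Sender side: compile the source and serialize the code object."""
    code = compile(source, "<upgrade>", "exec")
    return marshal.dumps(code)


def load(blob, name):
    """Receiver side: rebuild the code object, run it, fetch one function."""
    namespace = {}
    exec(marshal.loads(blob), namespace)
    return namespace[name]


def old_handler(data):
    return "OLD: " + repr(data)


def serve(messages, handler):
    """Handle messages in order; a bytes message replaces the handler."""
    out = []
    for msg in messages:
        if isinstance(msg, bytes):
            handler = load(msg, "loop")
        else:
            out.append(handler(msg))
    return out


print(serve(["a", assemble(NEW_SOURCE), "a"], old_handler))
# ["OLD: 'a'", "NEW: 'a'"]
```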

An aside on tail call optimization

It's important that a process loop is tail recursive otherwise each call through the loop will increase stack size and eventually exhaust memory. The following is not tail recursive in Wasp Lisp, even though it looks like it should be:

(define (test1 chan)
  (define msg (wait chan))
  (cond
    ((eq msg 'foo)
      (test1 chan))
    ((eq msg 'bar)
      (test1 chan))
    (else
      (test1 chan))))

This is because the recursive call to 'test1' compiles down to bytecode that looks like:

(newf)
(ldg eq)
(arg)
(ldg msg)
(arg)
(ldc bar)
(arg)
(call)
(jf false-47) ;; If the msg is not 'bar then jump to false-47
...
false-47
(newf)
(ldg test1)
(arg)
(ldg chan)
(arg)
(call)        ;; recursively call 'test1'
done-46
done-44
(retn)        ;; return from function 'test1'

The stack frame for test1 is not exited (the retn instruction) until after the recursive call is done. Compare this to the obvious tail recursive case:

(newf)
(ldg wait)
(arg)
(ldg chan)
(arg)
(call)
(stg msg)
(newf)
(ldg test2)
(arg)
(ldg chan)
(arg)
(tail)

Note the tail instruction. This does an immediate jump rather than a call so a retn is not necessary. The call stack does not grow. The difference between the two cases is due to the way the Wasp Lisp compiler generates the instructions and optimizes looking for tail calls. The instructions generated can be viewed using:

(define x '(define (test2 chan)
             (define msg (wait chan))
             (test2 chan)))
(define code (compile x))
(for-each (lambda (x) (print* (format x) "\n")) code)

Using compile shows the first pass which does not look for tail calls:

(newf)
(ldg test2)
(arg)
(ldg chan)
(arg)
(call)
(retn)

Notice the call followed by retn. This is the sequence that optimize looks for to generate the tail instruction:

(define x '(define (test2 chan)
             (define msg (wait chan))
             (test2 chan)))
(define code (optimize (compile x)))
(for-each (lambda (x) (print* (format x) "\n")) code)
...
(newf)
(ldg test2)
(arg)
(ldg chan)
(arg)
(tail)

Looking back at the instructions for test1 the call is followed by a jump or a label before retn so the optimizer misses it. This can be worked around by doing an explicit return statement:

(define (test3 chan)
  (define msg (wait chan))
  (cond
    ((eq msg 'foo)
      (return (test1 chan)))
    ((eq msg 'bar)
      (return (test1 chan)))
    (else
      (return (test1 chan)))))

The code in the cond branches compiles to the following, which is now a tail call:

(jf false-93)
(newf)
(ldg test1)
(arg)
(ldg chan)
(arg)
(tail)
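The rule described above, a call immediately followed by retn becoming tail, can be modelled as a small peephole pass. This Python sketch is a model of the behaviour as described in this post, not a port of the Wasp optimizer. Instructions are represented as tuples and labels as plain strings, which is my own encoding.

```python
def optimize(code):
    """Rewrite ('call',) immediately followed by ('retn',) into ('tail',)."""
    out = []
    i = 0
    while i < len(code):
        is_call = code[i] == ("call",)
        if is_call and i + 1 < len(code) and code[i + 1] == ("retn",):
            out.append(("tail",))
            i += 2  # consume both the call and the retn
        else:
            out.append(code[i])
            i += 1
    return out


# The simple case: the call is directly followed by retn.
direct = [("newf",), ("ldg", "test2"), ("arg",), ("ldg", "chan"), ("arg",),
          ("call",), ("retn",)]

# The cond case: labels sit between the call and the retn.
branchy = [("newf",), ("ldg", "test1"), ("arg",), ("ldg", "chan"), ("arg",),
           ("call",), "done-46", "done-44", ("retn",)]

print(optimize(direct)[-1])          # ('tail',)
print(optimize(branchy) == branchy)  # True
```

The second case comes back unchanged, which mirrors why test1 keeps growing the stack until an explicit return places the retn directly after the call.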

Some things to note

  • The Wasp VM is single threaded and non-preemptive. Threads yield to the scheduler explicitly using yield or implicitly when doing i/o or waiting on a queue.
  • The bytecode is cross platform. Serialized objects on one architecture can be deserialized on another.
  • The Wasp VM history comes from Mosquito Lisp and MOSREF - a penetration testing platform.
  • It's written in C with some GNU extensions (nested functions are used in the VM).

This post came about from exploring the difference between Actor programming in the Pony programming language and in a dynamic language where the Actor model isn't explicit. The programming style is similar in that pipelines of calls to actors to transform data are a common idiom.

Wasp Lisp isn't actively developed anymore but the author, Scott Dunlop, still processes pull requests and monitors it. I like to use it for projects and tinker with it as it's an interesting little cross platform lisp. MOSREF is useful as a way to access and maintain servers of different architectures, aside from its use as a penetration testing tool.

Some other Wasp resources:

Tags: waspvm 

2016-06-05

Building Static Wasp Lisp Binaries on Linux

Wasp Lisp builds binaries that are linked dynamically to glibc. This ties the binary to specific versions of Linux. It's usually not possible to run on an OS with older glibc versions than what it was compiled against. I wanted to be able to run a single binary of Wasp Lisp and MOSREF drones on new Ubuntu versions and some machines with an older version of Ubuntu. To do this I needed to have the libc linked statically.

Changing Wasp Lisp to statically link glibc doesn't work though. Some networking routines in glibc require dynamic linking. If glibc is statically linked then networking doesn't work.

The solution I opted for is to use musl libc instead of glibc. This is a libc that was designed to be statically linked. Building Wasp Lisp binaries with musl required:

  • Building musl libc
  • Building libevent using musl libc headers
  • Building Wasp Lisp against musl and libevent

Building musl libc

Building musl libc requires using git to clone the repository and following the standard configure, make, make install invocations. The bin directory for the musl tools is added to the PATH:

$ git clone git://git.musl-libc.org/musl
$ cd musl
$ ./configure
$ make
$ sudo make install
$ export PATH=$PATH:/usr/local/musl/bin/

Building libevent

Building libevent with musl requires using the musl-gcc command which was installed by the previous step. This invokes GCC with the required options to use musl. The following steps perform the build:

$ wget https://github.com/libevent/libevent/releases/download/release-2.0.22-stable/libevent-2.0.22-stable.tar.gz
$ tar xvf libevent-2.0.22-stable.tar.gz
$ cd libevent-2.0.22-stable/
$ ./configure --prefix=/tmp/musl/usr CC=musl-gcc --enable-static --disable-shared
$ make
$ make install

Building Wasp Lisp

The Wasp VM source requires a change to the Makefile.cf to use static linking for all libraries. This changes:

EXEFLAGS += -Wl,-Bstatic $(STATICLIBS) -Wl,-Bdynamic $(DYNAMICLIBS)

to:

EXEFLAGS += -static $(STATICLIBS) $(DYNAMICLIBS)

I've made this change in the static branch of my github fork. This branch also includes some other changes from the official repository for real number support. Building with musl and libevent is done with:

$ git clone https://github.com/doublec/WaspVM --branch static
$ cd WaspVM
$ CC=musl-gcc CFLAGS="-I /tmp/musl/usr/include -L /tmp/musl/usr/lib" make repl

This runs directly into the Lisp REPL. The following confirms a static binary:

$ ldd wasp
not a dynamic executable
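Where ldd isn't available, a similar check can be sketched in Python. A dynamically linked executable carries a PT_INTERP program header naming the dynamic loader, and a fully static one does not. The function name has_interpreter is invented here, and the sketch assumes a 64-bit little-endian ELF file such as an x86_64 Linux binary.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def has_interpreter(data):
    """Return True if a 64-bit little-endian ELF image has a PT_INTERP entry."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF image")
    # ELF64 header: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        # p_type is the first 32-bit field of each program header.
        (p_type,) = struct.unpack_from("<I", data, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False


# Example use (assumes a wasp binary in the current directory):
# with open("wasp", "rb") as f:
#     print(has_interpreter(f.read()))
```

A result of False corresponds to the "not a dynamic executable" message from ldd.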

Building MOSREF

The stub generated is also static and can be used to build static drones:

$ cd mod
$ ../waspc -exe ../mosref bin/mosref
$ chmod +x ../mosref
$ ../mosref
console> set addr=xx.xx.xx.xx
console> set port=8000
console> drone mydrone foo linux-x86_64
Drone executable created.

The generated drone should run on a wider range of Linux versions than the non-static build, at the cost of a larger size. I rename the static waspvm-linux-x86_64 stub to waspvm-musl-x86_64 so I can generate dynamically linked or static drones as needed from the MOSREF console by using linux-x86_64 or musl-x86_64 respectively.

Tags: waspvm 


This site is accessible over Tor as hidden service mh7mkfvezts5j6yu.onion, or Freenet using key:
USK@1ORdIvjL2H1bZblJcP8hu2LjjKtVB-rVzp8mLty~5N4,8hL85otZBbq0geDsSKkBK4sKESL2SrNVecFZz9NxGVQ,AQACAAE/bluishcoder/-30/

