Bluish Coder

Programming Languages, Martial Arts and Computers. The Weblog of Chris Double.


2017-10-12

ZeroMe - Decentralized Microblogging on ZeroNet

I wrote about ZeroNet a few years ago when it was released, and that post was mostly about decentralized websites. In the time between then and now it has gained a lot of features and regular use. It still does distributed websites well but it adds features such as support for sharing large files, merging sites and distributed pseudo-anonymous identity. This post is about an application hosted on ZeroNet called ZeroMe. It's a microblogging platform (i.e. a Twitter-like system) that takes advantage of ZeroNet's identity system and merging of sites.

Starting ZeroNet

To start with ZeroMe you first need to install ZeroNet. Given all the right dependencies are installed you can clone the GitHub repository and run the Python script to get online:

$ git clone https://github.com/HelloZeroNet/ZeroNet
$ cd ZeroNet
$ ./zeronet.py

This will start the ZeroNet node and run a local webserver on port 43110 to access it. This port is available on the local machine only. It also opens port 15441 to communicate with other nodes. This port is open to the internet at large and ZeroNet will attempt to punch a hole through firewalls using UPnP.
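As a quick check that the node is up, the following Python sketch tries to connect to the local web interface port. It is only an illustration: the helper name is_port_open is made up for this example and is not part of ZeroNet, and it only tests that something accepts TCP connections on the port, not that the listener is ZeroNet.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper for illustration; it is not part of ZeroNet.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 43110 is the local web interface port mentioned above.
    print("ZeroNet UI reachable:", is_port_open("127.0.0.1", 43110))
```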

If you have the Tor daemon running then ZeroNet will connect to it and bridge between clearnet nodes and nodes running on Tor. You can force the node to run only on Tor by providing a command line argument:

$ ./zeronet.py --tor always

In this configuration ZeroNet will not expose port 15441 over the internet. It will spawn Tor onion services and expose the port to those for communication with other ZeroNet nodes running on onion services. Access to the local node is still done via port 43110 but you should use a browser configured to use Tor if anonymity is a concern since sites can run arbitrary JavaScript. The main ZeroNet 'Hello' page warns you if you are not using Tor Browser in this configuration.

Gaining an identity

ZeroMe requires an identity to be created. Identities are created through ZeroNet identity sites. The main one, operated by the creator of ZeroNet, is ZeroId. When you first use a site that requires an identity you will be prompted to create one from any identity providers you have access to. For this example I'll use ZeroId, accessible at zeroid.bit - http://localhost:43110/zeroid.bit.

Click Get auth cert and choose a username. There are two ways to make the request for the identity. The first is a request over the internet via HTTP. If you're not running the browser over Tor this will expose your IP address to the identity service. The other is using BitMessage, which is an anonymous messaging system. ZeroId will request that you send a confirmation message to a BitMessage address.

This may seem very centralized - it requires the identity service to be running, and if it goes down you can't create a new identity. There are other identity services on ZeroNet that have different requirements for creating an identity, and there's an API to create your own. Identities created using these work on any service using the ZeroNet identity API in the same way as ZeroId identities.

The ZeroId site will show your identity name once you've signed up:

From ZeroId you can search for existing users and choose to Mute or Unmute. By muting a particular identity you won't see any posts from that identity. This is useful for dealing with spam or users posting on topics you don't want to see. This will prevent you seeing their content on any ZeroNet site.

You can manage muted users from the Hello page left sidebar menu:

Joining a ZeroMe Hub

Access ZeroMe by activating the ZeroMe icon on the main user interface or going directly to Me.ZeroNetwork.bit - http://localhost:43110/Me.ZeroNetwork.bit/

A prompt on the top right asks you to accept a merger site. Merge sites are a way to split large sites into a number of smaller ones. You should approve this request.

Click on "Select user to post new content" to create a ZeroMe user. It will prompt for an identity to choose and you can select the one already created in ZeroId above.

Once an identity is chosen you'll be asked to select a 'hub'. ZeroMe works by having the user interface operate from the main ZeroMe site with data on users and posts stored in 'Hub' sites. You only see posts from users that belong to hubs that you have requested. There is a hub operated by the ZeroMe creator and it will be shown as Sun hub on this page along with any other Hub sites you may already have requested. Click Download on a hub to download existing posts and content for that hub and Join to join one.

Downloading a hub can take some time. On the Hello ZeroNet page you can see the progress of any sites currently being downloaded.

There's a useful site called 0hub.bit that lists known hubs and lets you download them all. It's at http://localhost:43110/0hub.bit/ and clicking the "Click me to unlock all hub post in ZeroNet" button will start downloading them. This takes some time but is worth it since you can then see all posts by everyone on ZeroMe. You can delete hubs if you decide you don't want to see content from a particular hub.

Microblogging with ZeroMe

Now that you've got an identity, signed into ZeroNet and joined a hub you can start reading content and posting. The Everyone tab in the user interface will show posts from all users, even those you haven't followed. This is useful at the beginning to find users and content.

Posts are in Markdown format and can include images. There is no limit to the size of a post and posts can have comments. You can visit profile pages of users and choose to optionally seed their images, mute users, and edit parts of your own profile page to have an avatar and description:

An example of a post with comments (selected randomly, identities blacked out):

Random stuff

The ZeroNet node running on your machine enforces size limits for sites. This prevents sites that you have accessed from consuming all your disk space. When a site starts reaching the size limit it appears in the main Hello page left sidebar in a special section:

When the site goes past the limit it will no longer download updates. Visiting the site will give a popup asking to increase the limit. Approving that popup will resume downloads for the site until it hits the next limit.

There's a sidebar on the right of pages that can be accessed by dragging the top right 0 icon to the left:

This sidebar gives statistics on the number of nodes seeding the site, the size limit, and various other things.

When you visit a ZeroNet site you start seeding that site's content and receive automatic updates when the content changes. This can be disabled on the main Hello page on a per site basis. Sites can have "optional files" which you don't automatically seed. They are downloaded on demand and are often used for user generated content or larger files. You can choose to seed a site's optional files in the right hand sidebar for that site. There are also "Big Files" which are treated specially. These are large files like videos and are also optionally seeded. The Files tab of the Hello page lists optional and big files that you are seeding:

Sites can be set to allow cloning by end users. This creates a copy of the site with no content. An example of a site that does this is ZeroBlog which you can clone to create your own blog. This extends to ZeroMe itself. You can clone the ZeroMe user interface site and modify that to get a customized look but it still uses the data from the existing hubs.

There's a bunch of other interesting sites on ZeroNet: the ZeroTalk forums, various blogs, ZeroMail for an encrypted email-like service, etc. Be aware that the pseudo-anonymous use of identities can make for content you might not agree with and much spam. Use of 'Mute' is useful here.

Also be aware that it is 'pseudo-anonymous' not 'anonymous'. You create an identity and that identity is not tied to you in the real world but people can track what you do under that identity. Content is added to sites and distributed to other nodes. If you like a post or add content to some site then anyone who decides to dig into the data of that site can see that your identity liked or posted that content. It is possible to have multiple identities if you want to keep aspects of your ZeroNet usage separate but that's a topic for another post.

Overall ZeroMe is a nice microblogging system. It's user friendly, has a nice design and has tools for muting and "Non Real Name" identities. It, along with ZeroNet, is actively developed and supported by the ZeroNet developer.

Tags: zeronet 

2017-09-21

Using the J Foreign Function Interface

The J Programming Language is an array oriented or vector based programming language. It is in the same family of programming languages as APL and K.

I won't be going into deep details of J syntax or a tutorial of how to use it in this post - for that I recommend something like J for C Programmers which is also available as a paperback book - but I'll provide a quick overview of syntax used here.

What I want to cover in this post is how to use the FFI to call C functions in J, and how to use callbacks from C back to J. The latter isn't well documented and I always forget how to do it.

Introduction to J

J provides a concise syntax for performing operations. Many things that would be expressed as English words in other languages use ASCII symbols in J. The J Dictionary and Vocabulary provides details on what these are.

J has a REPL similar to Lisp environments. For a taste of J without installing it there is an online J IDE that can be used from a browser and details on that are here. Otherwise the native J system is available from JSoftware downloads. Source is available on GitHub.

J can be used as a calculator to perform operations on numbers:

  10 + 2
12

There is no operator precedence; use brackets to define the order of operations. Order of evaluation is right to left:

  3 * 4 + 10
42

  (3 * 4) + 10
22

Sequences of numbers separated by white space are arrays of those numbers. Operations can be performed on arrays. To multiply each element of an array 1 2 3 4 by 2:

  2 * 1 2 3 4
2 4 6 8

Arrays can be multi-dimensional. One way of creating such arrays is via the '$', or Shape, operation. To create a 2x3 array:

  2 3 $ 1 2 3
1 2 3
1 2 3
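For readers more used to other languages, here is a rough Python sketch of what Shape does in this example: it fills the requested rows by cycling through the data. The reshape function is a name made up for this illustration and it only handles the two-dimensional case.

```python
from itertools import cycle, islice


def reshape(rows, cols, data):
    """Build a rows x cols nested list, reusing data cyclically (2-D only)."""
    source = cycle(data)  # repeat the data for as long as needed
    # Each row takes the next `cols` items from the shared cyclic iterator.
    return [list(islice(source, cols)) for _ in range(rows)]


print(reshape(2, 3, [1, 2, 3]))  # [[1, 2, 3], [1, 2, 3]]
```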

Functions are called verbs in J terminology. The * and + operators above are verbs. Functions that operate on verbs are called adverbs. An example is the adverb '/' which inserts a verb between elements of an array and evaluates it. An example to sum an array:

  +/ 1 2 3 4
10

The +/ forms an operation that will do the equivalent of inserting + between each element of the array like:

  1 + 2 + 3 + 4
10

Other verbs can have / applied to them:

  */ 1 2 3 4
24
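In Python terms the '/' adverb behaves like a fold that groups from the right, in line with J's right to left evaluation. A small sketch, where insert is a name chosen for this example and not a J or Python built-in:

```python
from functools import reduce
import operator


def insert(verb, items):
    """Right fold: items[0] verb (items[1] verb (... verb items[-1])).

    Requires at least one item.
    """
    items = list(items)
    # Start from the last item and work leftwards, as J evaluates.
    return reduce(lambda acc, x: verb(x, acc), reversed(items[:-1]), items[-1])


print(insert(operator.add, [1, 2, 3, 4]))  # 10
print(insert(operator.mul, [1, 2, 3, 4]))  # 24
print(insert(operator.sub, [1, 2, 3, 4]))  # -2, i.e. 1 - (2 - (3 - 4))
```

The subtraction case shows why the grouping direction matters: a left fold would give -8 instead.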

Verbs are monadic or dyadic. The term monadic is not related to the use of monads in languages like Haskell. It means the verb takes a single argument on the right hand side of the verb. A dyadic verb takes two arguments, one on the left and one on the right. Many verbs have both dyadic and monadic forms. Dyadic - is the subtraction operator. Monadic - negates its parameter:

  10 - 5
5

  -5
_5

The _ character signifies a negative number to differentiate from monadic -. _5 is negative 5 whereas -5 is the application of monadic - to the number 5.

A sequence of three verbs together is called a fork. '+/ % #' is a commonly demonstrated fork. It combines three verbs: the monadic # which counts the number of items in an array, the +/ verb which sums an array, and dyadic % which is division:

  # 1 2 3 4
4

  +/ 1 2 3 4
10

  10 % 4
2.5

  (+/ 1 2 3 4) % (# 1 2 3 4)
2.5

  (+/ % #) 1 2 3 4
2.5

A fork of '(A B C) X', where B is dyadic and A and C are monadic, does (A X) B (C X). The combination of verbs, adverbs, nouns and the right to left evaluation of J gives it a "Programming with functions" feel and concise syntax.
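The fork rule can be sketched as a higher-order function in Python. This is only an analogy to show the data flow; fork and avg are names chosen for this example.

```python
def fork(a, b, c):
    """Return a function computing b(a(x), c(x)), mirroring the J fork (A B C) X."""
    def apply(x):
        return b(a(x), c(x))
    return apply


# The '+/ % #' fork: sum, divided by, count.
avg = fork(sum, lambda total, count: total / count, len)
print(avg([1, 2, 3, 4]))  # 2.5
```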

J has variable assignment which can include assignment of verb sequences to named variables. =: does global variable assignment and =. does local variable assignment (local to the scope of verb or namespace):

  x   =: 1 2 3 4
  avg =: +/%#
  avg x
2.5

Values in J can be boxed or unboxed. The examples so far are unboxed. They can be boxed with the '<' monadic verb. In the REPL boxed values are drawn with a box around them. To unbox a boxed value, use the monadic verb '>'. The ';' dyadic verb will form an array of boxed elements. For example:

  a =: <1
  a
┌─┐
│1│
└─┘
  >a
1
  1;2;3;4
┌─┬─┬─┬─┐
│1│2│3│4│
└─┴─┴─┴─┘

All elements in an unboxed array must have the same type. Elements in boxed arrays can be different types.

Hopefully that gives a taste of J syntax so the rest of this post is understandable if you don't know much of the language. I recommend playing around in the J REPL to get a feel.

Foreign Conjunctions

Calling C functions from J requires using what's called a Foreign Conjunction. These are special system operations defined by the J implementation and are of the form 'x !: y' where 'x' and 'y' are numbers. They return verbs or adverbs themselves so the result of the conjunction is applied to more arguments. An example would be the Space category of foreign conjunctions that returns information about J memory usage:

  (7 !: 0) ''
10939904

Here 7 !: 0 returns a verb which takes a single argument (which is ignored) and returns the number of bytes currently in use by the J interpreter.

  (7!:2) '+/ 1 2 3 4'
1664

7!:2 returns a verb that when called with an argument returns the number of bytes required to execute that argument as a J sentence.

There's lots more described in the Foreign Conjunction documentation, which allows reflecting and introspecting on just about everything in the J system.

Foreign Function Interface

The foreign function interface is the set of foreign conjunctions where the left side of the conjunction verb is the number 15. There's a short list of these in the Dynamic Link Library part of the dictionary, and further explanation in the Dlls section of the user manual. There are more verbs than are listed in these pages and the J source file x15.c gives an idea of the scope. Most of the additional ones are used in implementation of the J library but we have to delve into undocumented features to handle C callbacks.

The J source file stdlib.ijs contains the definitions for names that make the foreign conjunctions easier to use. The ones covered in this post are:

cd    =: 15!:0
memr  =: 15!:1
memw  =: 15!:2
mema  =: 15!:3
memf  =: 15!:4
cdf   =: 15!:5
cder  =: 15!:10
cderx =: 15!:11
cdcb  =: 15!:13

To call a foreign function we use the '15!:0' verb, defined as cd in the standard library. This is described in the documentation as 'Call DLL Function' and is documented in the Calling DLLs section of the user manual. It is dyadic and has the form:

'filename procedure [>] [+] [%] declaration' cd parameters

The descriptions below look complicated but will become clearer with a few examples.

The first argument is a string that contains the function name and arguments it takes. It's the declaration of the function basically. The second parameter is an array of the arguments that the foreign function takes.

The filename is the name of a DLL or shared library. It can also be one of the following special filenames:

  • 0 = If filename is 0 then the 'procedure' is a signed integer representing a memory address for where the function is located.
  • 1 = If filename is 1 then the 'procedure' is a non-negative integer that is the index of a procedure in a vtable. The first element of the parameter array should be an address of the address of the vtable, the 'object' address. The declaration of the first parameter should be '*' or 'x'. This is used for calling COM objects or Java Native Interface objects.

The procedure is the name of the function in the DLL or shared library to call. It can also be a number as described in the filename portion above.

The '>', '+' and '%' parts are optional and define attributes of the function:

  • '>' = The procedure returns a scalar result. Without '>' the result is the boxed scalar result appended to the possibly modified list of boxed arguments.
  • '+' = Chooses the calling convention of the function. On Windows the standard calling convention is __stdcall and using '+' selects __cdecl. On Unix using '+' has no effect.
  • '%' = Does a floating point reset after the call. Some procedures leave floating point in an invalid state.

'declaration' is a string of blank delimited characters defining the types of the arguments passed to the function, and the type of the result. They can be:

  • c = J character (1 byte)
  • w = J literal2 (2 byte)
  • u = J literal4 (4 byte)
  • s = short integer (2 byte)
  • i = integer (4 byte)
  • l = long integer (8 byte)
  • x = J integer (4 or 8 byte depending on 32/64 bit architecture)
  • f = short float (4 byte)
  • d = J float (8 byte)
  • j = J complex (16 byte - 2 d values) (pointer only, not as result or scalar)
  • * = pointer. This can be followed by any of the above to indicate the type pointed to. A parameter passed in the place where a pointer is expected can be a J array of the right type, or a scalar boxed scalar integer that is a memory address.
  • n = no result (result is ignored and '0' is returned)

The first entry in the 'declaration' is the result type and the remaining ones describe each parameter provided in the second argument to 'cd'.

The 'parameters' argument to 'cd' should be a boxed array. If it is not boxed then it will be boxed for you.
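To make the layout of the first argument concrete, here is a small Python sketch that splits such a string into its parts. It is not how J parses the string; parse_cd_spec and CdSpec are hypothetical names, the sketch accepts the attributes in any order, and it does not handle filenames containing spaces.

```python
from dataclasses import dataclass
from typing import List, Set

# The optional attributes that may follow the procedure name.
FLAGS = {">", "+", "%"}


@dataclass
class CdSpec:
    filename: str
    procedure: str
    flags: Set[str]
    result: str        # first declaration entry: the result type
    params: List[str]  # remaining entries: one per parameter


def parse_cd_spec(spec: str) -> CdSpec:
    """Split 'filename procedure [>] [+] [%] declaration' into its parts."""
    tokens = spec.split()
    if len(tokens) < 3:
        raise ValueError("expected at least: filename procedure result-type")
    filename, procedure, rest = tokens[0], tokens[1], tokens[2:]
    flags = set()
    while rest and rest[0] in FLAGS:
        flags.add(rest.pop(0))
    if not rest:
        raise ValueError("missing result type")
    return CdSpec(filename, procedure, flags, rest[0], rest[1:])


print(parse_cd_spec("/lib/x86_64-linux-gnu/libc.so.6 puts > x *c"))
```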

Examples

The following example calls the C function puts from J on my Linux OS:

  '/lib/x86_64-linux-gnu/libc.so.6 puts x *c' cd <'hello'
┌─┬─────┐
│6│hello│
└─┴─────┘

The full path to the shared library is given with puts as the function name. The 'declaration' is set to return a J integer and marks the function as taking a pointer to characters as an argument. We pass the boxed array of characters with <'hello'.

The result is a boxed array where the first element is the value returned from the function. This is 6, the number of characters printed. The rest of the result is the arguments passed to the function. The output of puts can be seen on the console.

If we just want the result of the function, without the appended list of parameters, then we can use the '>' attribute in the first argument to 'cd':

  '/lib/x86_64-linux-gnu/libc.so.6 puts > x *c' cd <'hello'
6

Loading shared libraries using cd uses system resources. To unload shared libraries currently in use by J use the '15!:5' foreign conjunction, or cdf. It is monadic and ignores its argument:

  cdf ''

Errors

If the shared library can't be found, or there is some other problem with the call, then a 'domain error' will result:

  'doesnotexist.so puts > x *c' cd <'hello'
|domain error

To find the reason for the domain error, use the '15!:10' Foreign Function verb to identify the error. It is a monadic verb that ignores its argument and returns information about the domain error. In the standard library the name cder is assigned to '15!:10'. The meaning of what it returns is:

  • 0 0 = no error
  • 1 0 = file not found
  • 2 0 = procedure not found
  • 3 0 = too many DLLs loaded (max 20)
  • 4 0 = too many or too few parameters
  • 5 x = declaration x invalid
  • 6 x = parameter x type doesn't match declaration
  • 7 0 = system limit - linux64 max 8 float/double scalars

In the example above it indicates 'file not found':

  cder ''
1 0
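The table above is easy to turn into a lookup. The following Python sketch mirrors the list as given in this post; describe_cder is a hypothetical helper and not part of J.

```python
# Error descriptions copied from the list above; {x} is the second number.
CD_ERRORS = {
    0: "no error",
    1: "file not found",
    2: "procedure not found",
    3: "too many DLLs loaded (max 20)",
    4: "too many or too few parameters",
    5: "declaration {x} invalid",
    6: "parameter {x} type doesn't match declaration",
    7: "system limit - linux64 max 8 float/double scalars",
}


def describe_cder(code: int, x: int = 0) -> str:
    """Translate a 'code x' pair, as returned by cder, into text."""
    template = CD_ERRORS.get(code)
    if template is None:
        return f"unknown error {code} {x}"
    return template.format(x=x)


print(describe_cder(1, 0))  # file not found
```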

For a detailed description of the error use '15!:11', or cderx. It is monadic, ignores its arguments, and returns a boxed array with the error number as the first element and a string description as the second:

  'doesnotexist.so puts > x *c' cd <'hello'
|domain error: cd
|   'doesnotexist.so puts > x *c'    cd<'hello'

   cderx ''
┌─┬──────────────────────────────────────────────────────────────────────────┐
│0│doesnotexist.so: cannot open shared object file: No such file or directory│
└─┴──────────────────────────────────────────────────────────────────────────┘

Memory

To allocate, free, read and write C memory in J requires use of the memory management foreign verbs. An example of an FFI call that returns a string is getenv:

  '/lib/x86_64-linux-gnu/libc.so.6 getenv *c *c' cd <'HOME'
┌───────────────┬────┐
│140735847611736│HOME│
└───────────────┴────┘

Note here that the returned value is a number. This represents a pointer to the string that getenv returns. To see the string result from this pointer we need to use the J verb that deals with reading raw memory.

  memr 140735847611736 0 _1 2
/home/myuser

'15!:1', or memr as it is defined in the standard library, reads raw memory. It takes an array as an argument with the elements of the array being:

  • The numeric pointer to the memory
  • The offset into that memory
  • The number of elements to read. A _1 means read to the first NUL byte which is useful for C strings and is used in this getenv example.
  • An optional type argument for the data to be read. The default is 2 which is used explicitly above. It can be 2, 4, 8 or 16 for char, integer, float or complex respectively. The count parameter is number of elements of the given type - it takes the size into account.

'15!:3', or mema, allocates memory. It takes a length argument for number of bytes to allocate and returns 0 on allocation failure:

  mema 1024
23813424

The numeric result returned is a pointer address that can be passed to '*' parameters in the FFI calls.

'15!:4', or memf, frees memory allocated with mema. It takes the address as an argument and returns '0' on success or '1' on failure:

  memf 23813424
0

'15!:2', or memw, writes to memory given a pointer. It is the inverse of memr and is dyadic. The left hand argument is an array of data to write, and the right hand argument is similar to that defined by memr above:

  mema 32
22371984

  'hello' memw 22371984 0 6 2
  memr 22371984 0 _1 2
hello

  '/lib/x86_64-linux-gnu/libc.so.6 puts > x *c' cd <<22371984
6

  memf 22371984
0

Notice the double boxing of the pointer argument to the puts call. This is what enables J to recognise it as a pointer parameter rather than an integer argument. It is a boxed array containing a boxed integer.

The count argument passed to the memw call can be one greater than the length of the string in the left argument if the type of the memw call is 2 (the char type). In that case a NUL byte is appended which is what we use in the example above.
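For comparison, the same write-then-read of a NUL terminated string can be sketched with Python's ctypes module. This is an analogy only: write_c_string and read_c_string are made-up helper names, and the buffer here is owned by Python rather than allocated with mema.

```python
import ctypes


def write_c_string(address: int, text: bytes) -> None:
    """Copy text plus a terminating NUL byte to the given address."""
    data = text + b"\x00"
    ctypes.memmove(address, data, len(data))


def read_c_string(address: int) -> bytes:
    """Read bytes from address up to, but not including, the first NUL."""
    return ctypes.string_at(address)


buffer = ctypes.create_string_buffer(32)  # 32 zeroed bytes, kept alive by this name
address = ctypes.addressof(buffer)        # the buffer's address as a plain integer
write_c_string(address, b"hello")
print(read_c_string(address))             # b'hello'
```

As in the J example, the caller is responsible for making sure the buffer is large enough for the data plus the NUL byte.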

Here's another example using sprintf:

  mema 1024
24110208

  '/lib/x86_64-linux-gnu/libc.so.6 sprintf > x * *c *c x' cd (<24110208);'string is: %s %d\n';'foo';42
17

  memr 24110208 0 _1
string is: foo 42

  memf 24110208
0

Callbacks

The foreign function conjunction '15!:13', or cdcb, is used for defining a callback function that can be passed to a C function and have it call back to J. It is a monadic verb that takes a number as an argument for the number of arguments that the callback function uses. It returns a pointer to a compiled function that can be used as the callback.

When that function is called it will do argument conversion and call a J verb called cdcallback with the arguments passed as a boxed array. The following demonstrates how this works by defining some callback functions with a definition of cdcallback that prints its output to the REPL:

cdcallback =: monad define
  y (1!:2) 2
  0
)

f1 =: cdcb 1
f2 =: cdcb 2
f3 =: cdcb 3

  ('0 ',(": f1),' > n x') cd <100
100
0

   ('0 ',(": f2),' > n x x') cd 100;200
100 200
0

   ('0 ',(": f3),' > n x x x') cd 100;200;300
100 200 300
0

In this example cdcallback is implemented as a monadic function that passes its argument, 'y', to the foreign conjunction for writing to the screen, '1!:2'. It returns '0' as a result.

Three callback functions are created for one, two and three arguments respectively. These return C pointers to the equivalent of:

int f1(int);
int f2(int, int);
int f3(int, int, int);

Then we call the functions using cd. The declaration string is built dynamically by concatenating the function address into it. This uses the default format monadic verb, '":'. Note how the first parameter in the declaration string is '0', which as described above means a function pointer is expected instead of a function name. That's what we're concatenating in:

  ('0 ',(": f1),' > n x')
0 140277582398804 > n x

Calling the callback function prints to the screen the argument array. Knowing this we can start to test with a function that requires a callback. For that example I'll use the qsort C function:

void qsort(void *base, size_t nmemb, size_t size,
           int (*compar)(const void *, const void *));

We can test with our existing callback that just displays the argument output:

  '/lib/x86_64-linux-gnu/libc.so.6 qsort n *l x x *' cd (1 2 3 4);4;8;<<f2
30492672 30492676
30492680 30492684
30492672 30492680
30492676 30492680
0

For qsort I've defined it as taking an array of integers, an integer for the number of members, an integer for the size of each member and a pointer. For arguments I pass an array of integers (the J array is automatically converted to a C array), '4' for the number of members, '8' for the size of each member (eight bytes each) and a boxed number for the pointer to the callback function that takes two arguments.

When run it prints out the two arguments each time the callback is called. Because qsort expects the callback to take pointers as arguments they appear as integers. Now we need to write a callback that dereferences those arguments and returns the result of a comparison.
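The shape of such a callback can be sketched with Python's ctypes module, which may help if the J version is hard to follow: the callback receives two pointers, dereferences them, and returns a negative, zero or positive integer. This is an analogy under stated assumptions (64-bit integer elements; names such as COMPARE and compare_int64 are made up for the example) and not how J implements cdcb.

```python
import ctypes

# int compar(const void *, const void *), specialised to 64-bit integers.
COMPARE = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.POINTER(ctypes.c_int64),
    ctypes.POINTER(ctypes.c_int64),
)


def compare_int64(lhs, rhs):
    """Dereference both pointers and return -1, 0 or 1."""
    a = lhs[0]
    b = rhs[0]
    return (a > b) - (a < b)


# Wrap the Python function as a C function pointer. Keep a reference to it
# for as long as C code might call it.
compare_callback = COMPARE(compare_int64)

# Call through the C function pointer, passing pointers as qsort would.
three = ctypes.c_int64(3)
seven = ctypes.c_int64(7)
print(compare_callback(ctypes.pointer(three), ctypes.pointer(seven)))  # -1
```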

We can use memr to read the integer value from memory and the '{' dyadic verb to retrieve elements from the array ('x { y' returns the xth element of y).

  cdcallback =: monad define
    lhs =. 0 { y
    rhs =. 1 { y
    i1 =. memr lhs,(0 1 4)
    i2 =. memr rhs,(0 1 4)
    diff =. i1 - i2
    (i1,i2,diff) (1!:2) 2
    if. diff < 0 do. diff =. _1
    elseif. diff > 0 do. diff =. 1
    end.
    diff
  )

Running this with our qsort call gives:

  '/lib/x86_64-linux-gnu/libc.so.6 qsort > n *l x x *' cd (3 7 1 4);4;8;<<f2
3 7 _4
1 4 _3
3 1 2
3 4 _1
7 4 3
0

This looks good! But how do we get the result back? qsort modifies in place, and when converting array arguments J will copy the array to a C array and then copy the results back. To look at the results we need to remove the '>' attribute in the declaration to view the modified parameters:

  '/lib/x86_64-linux-gnu/libc.so.6 qsort n *l x x *' cd (3 7 1 4);4;8;<<f2
3 7 _4
1 4 _3
3 1 2
3 4 _1
7 4 3
┌─┬───────┬─┬─┬─────────────────┐
│0│1 3 4 7│4│8│┌───────────────┐│
│ │       │ │ ││140277582398867││
│ │       │ │ │└───────────────┘│
└─┴───────┴─┴─┴─────────────────┘

The second element of the boxed array is the sorted array. Some points to note:

  • Notice the use of the if., elseif., end. control structure in the callback. I couldn't get it to work returning the result of the subtraction of i1 - i2. This resulted in the callback always returning zero. I think there may be some number conversion issue happening with diff not being an integer. Assigning a specific integer resolved this.
  • The callback could be written more concisely using other J features but I wanted to keep it readable given only what was covered in the introduction of this post.
  • Dealing with integer sizes, pointers as integers, and all the conversions going on makes things crash if you get the types wrong.
  • I defined cdcallback as a global variable by using =: as the assignment verb. Using =. would make a local variable. This enables creating a local callback, within the scope of another verb or namespace, just before passing it to the function that requires it.
  • If you need multiple callbacks that do different things active at the same time you need to write the one cdcallback and differentiate inside that somehow, maybe based on the number or type of arguments, to decide what action needs to be performed.
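The idea in the last point, one entry point that picks an action based on the number of arguments, can be sketched in Python as follows. The handler names and the dispatch table are invented for this illustration; a real J version would make the same decision inside cdcallback.

```python
def log_value(args):
    """Handler for one-argument callbacks: print the value, return 0."""
    (value,) = args
    print("callback got", value)
    return 0


def compare_pair(args):
    """Handler for two-argument callbacks: return -1, 0 or 1."""
    lhs, rhs = args
    return (lhs > rhs) - (lhs < rhs)


# Map the number of arguments to the handler that should run.
HANDLERS = {1: log_value, 2: compare_pair}


def cdcallback(args):
    """Single entry point that dispatches on the argument count."""
    handler = HANDLERS.get(len(args))
    if handler is None:
        raise ValueError(f"no handler for {len(args)} arguments")
    return handler(args)


print(cdcallback([3, 7]))  # -1
```

Dispatching on argument count only works while each active callback has a distinct number of arguments; otherwise some other distinguishing value is needed.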

Conclusion

Using the FFI from J isn't difficult but requires care. Using callbacks is more complicated and not very well documented but it's possible. Searching the example J code for cdcb or 15!:13 will find example usage. There are other undocumented FFI routines but I haven't looked into them yet.

For more information on J, including tutorials, books, mailing lists, etc there is:

Tags: jlang 

2017-07-31

Reference Capabilities, Consume and Recover in Pony

I've written about reference capabilities in the Pony programming language spread across some of my other posts but haven't written about them directly. This post is my attempt to provide an intuitive understanding of reference capabilities and when to use consume and recover. Hopefully this reduces the confusion when faced with reference capability compilation errors and the need to memorize capability tables.

Reference capabilities are used to control aliasing. Controlling aliasing is important when sharing data to avoid data races in the presence of multiple threads. An example of aliasing is:

let a: Object = create_an_object()
let b: Object = a

In that snippet, 'b' is an alias to the object referenced by 'a'. If I pass 'b' to a parallel thread then the same object can be mutated and viewed by two different threads at the same time resulting in data races.
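The same kind of aliasing exists in most languages with object references. A minimal Python sketch, with a made-up Counter class, shows that a write through one name is visible through the other. Unlike Pony, Python does nothing to restrict this.

```python
class Counter:
    """Tiny mutable object used only to demonstrate aliasing."""

    def __init__(self):
        self.value = 0


a = Counter()
b = a            # b is an alias: a second name for the same object
b.value = 42     # a write through b ...
print(a.value)   # ... is visible through a: 42
print(a is b)    # True, both names refer to one object
```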

Reference capabilities allow annotating types to say how they can be aliased. Only objects with a particular set of reference capabilities can be shared across actors.

tag - opaque reference

A tag reference capability is an opaque reference to an object. You cannot read or write fields of the object referenced through a tag alias. You can store tag objects, compare object identity and share them with other actors. The sharing is safe since no reading or writing of state is allowed other than object identity.

let a: Object tag = create_a_tag()
let b: Object tag = a
if a is b then ...we're the same object... end

There is also the digestof operator that returns a unique unsigned 64 bit integer value of an object. This is safe to use on tag:

let a: Object tag = create_a_tag()
env.out.print("Id is: " + (digestof a).string())

The tag reference capability is most often used for actors and passing references to actors around. Actor behaviours (asynchronous method calls) can be made via tag references.

val - immutable and sharable

A val reference capability on a variable means that the object is immutable. It cannot be written via that variable. Only read-only fields and methods can be used. Aliases can be shared with other actors because it cannot be changed - there is no issue of data races when accessed from multiple threads. There can be any number of aliases to the object but they all must be val.

let a: Object val = create_an_object()
let b: Object val = a
let c: Object val = a
call_some_function(b)
send_to_an_actor(c)

All the above are valid uses of val. Multiple aliases can exist within the same actor or shared with other actors.

ref - readable, writable but not sharable

The ref reference capability means the object is readable and writable but only within a single actor. There can exist multiple aliases to it within an actor but it cannot be shared with other actors. If it were sharable with other actors then this would allow data races as multiple threads of control read or write to it in a non-deterministic manner.

let a: Object ref = create_a_ref_object()
let b: Object ref = a
call_some_function(b)

The above are valid uses of ref if they are occurring within a single actor. The following are invalid uses - they will result in a compile error:

let a: Object ref = create_a_ref_object()
send_to_an_actor(a)
let b: Object val = a

The send_to_an_actor call would result in an alias of a being accessible from another thread. This would cause data races so is disallowed and results in a compilation error. The assignment to an Object val is also a compilation error. The reasoning for this is access via b would assume that the object is immutable but it could be changed through the underlying alias a. If b were passed to another actor then changes made via a will cause data races.
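The problem with treating a ref as a val can be seen in languages that don't prevent it. In this Python sketch a read-only view stands in for the hoped-for val alias and a plain dict stands in for the ref alias. The view rejects writes, yet its contents still change underneath the reader. The variable names are illustrative only.

```python
from types import MappingProxyType

settings = {"mode": "fast"}        # plays the role of the ref alias 'a'
view = MappingProxyType(settings)  # read-only view, like the val alias 'b'

try:
    view["mode"] = "slow"          # writes through the view are rejected
except TypeError:
    print("view is read-only")

settings["mode"] = "slow"          # but the original alias can still write
print(view["mode"])                # slow
```

Pony rules this out at compile time: a val alias is only allowed when no writable alias to the same object exists.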

iso - readable, writable uniquely

The iso reference capability means the object is readable and writable but only within a single actor - much like ref. But unlike ref it cannot be aliased. There can only be one variable holding a reference to an iso object at a time. It is said to be 'unique' because it can only be written or read via that single unique reference.

let a: Object iso = create_an_iso_object()
let b: Object iso = a
call_some_function(a)

The first line above creates an iso object. The other two lines are compilation errors. The assignment to b attempts to alias a. This would enable reading and writing via a and b which breaks the uniqueness rule.

The third line calls a function passing a. This is an implicit alias of a in that the parameter to call_some_function has aliased it. The object would be readable and writable via both a and the parameter in call_some_function.

When it comes to fields of objects things get a bit more complicated. Reference capabilities are 'deep'. This means that the capability of an enclosing object affects the capability of the fields as seen by an external user of the object. Here's an example that won't work:

class Foo
  var bar: Object ref = ...

let f: Foo iso = create_a_foo()
let b: Object ref = create_an_object()
f.bar = b

If this were to compile we would have a ref alias alive in b and another alias to the same object alive in the bar field of f. We could then pass our f iso object to another actor and that actor would have a data race when trying to use bar since the original actor also has an alias to it via b.

The uniqueness restriction would seem to make iso not very useful. What makes it useful is the ability to mark aliases as no longer used via the consume keyword.

consume - I don't want this alias

The consume keyword tells Pony that an alias should be destroyed. Not the object itself but the variable holding a reference to it. By removing an alias we can pass iso objects around.

let a: Object iso = create_an_iso_object()
let b: Object iso = consume a
call_some_function(consume b)

This snippet creates an iso object referenced in variable a. The consume in the second line tells Pony that a should no longer alias that object. It's floating around with nothing pointing to it now. Pony refers to this state as being 'ephemeral'. At this point the variable a doesn't exist and it is a compile error to use it further. The object has no aliases and can now be assigned to another variable, in this case b. This meets the requirements of iso because there is still only one reference to the object, via b.

The function call works in the same manner. The consume b makes the object ephemeral, and it can then be assigned to the parameter for the function and still meet the uniqueness requirements of iso.

iso objects can be sent to other actors. This is safe because there is only a single alias. Once the object has been sent to another actor, the original actor can no longer read or write it because the alias it had was consumed:

let a: Object iso = create_an_iso_object()
send_to_an_actor(consume a)

Converting capabilities

Given an iso reference that is consumed, can that ephemeral object be assigned to reference capabilities other than iso? The answer to that is yes.

Intuitively this makes sense. If you have no aliases to an object then when you alias that object you can make it whatever capability you want - it is like having created a new object, nothing else references it until you assign it. From that point on what you can do with it is restricted by the reference capability of the variable you assigned it to.

let a: Object iso = create_an_iso()
let b: Object val = consume a

let c: Object iso = create_an_iso()
let d: Object ref = consume c

The above are examples of valid conversions. You can have an iso, make changes to the object, then consume the alias to assign to some other reference capability. Once you've done that you are restricted by that new alias:

let c: Object iso = create_an_iso()
let d: Object ref = consume c
send_to_an_actor(d)

That snippet is an error as ref cannot be sent to another actor as explained earlier. This is also invalid:

let c: Object iso = create_an_iso()
let d: Object ref = consume c
send_to_an_actor(c)

Here we are trying to use c after it is consumed. The c alias no longer exists so it is a compile error.

What if you want to go the other way and convert a val to an iso:

let a: Object val = create_a_val()
let b: Object iso = consume a

This is an error. Consuming the a alias does not allow assigning to another reference capability. Because val allows multiple aliases to exist the Pony compiler doesn't know if a is the only alias to the object. There could be other aliases elsewhere in the program. iso requires uniqueness and the compiler can't guarantee it because of this. The same reasoning is why the following is an error:

let a: Object val = create_a_val()
let b: Object ref = consume a

Intuitively we can reason why this fails. ref allows reading and writing within an actor. val requires immutability and can have multiple aliases. Even though we consume a there may be other aliases around, as in the iso example before. Writing to the object via the b alias would break the immutability guarantee that any other aliases to the object rely on.

Given this, how do you do this type of conversion? This is what the recover expression is used for.

recover - restricted alias conversion

A recover expression provides a block scope where the variables that can enter the scope are restricted based on their reference capability. The restriction is that only objects that you could send to an actor are allowed to enter the scope of the recover expression. That is iso, val and tag.

Within the recover expression you can create objects and return them from the expression as a different reference capability than what you created them as. This is safe because the compiler knows what entered the block, knows what was created within the block, and can track the aliases such that it knows it's safe to perform a particular conversion.

let a: Object val = recover val
                      let b: Object ref = create_a_ref()
                      ...do something...
                      b
                    end

In this snippet we create a ref object within the recover block. This can be returned as a val because the compiler knows that all aliases to that ref object exist within the recover block. When the block scope exits those aliases don't exist. There are no more aliases to the object, so it can be returned with the reference capability of the recover block.

How does the compiler know that the b object wasn't stored elsewhere within the recover block? There are no global variables in Pony so it can't be stored globally. It could be passed to another object but the only objects accessible inside the block are the restricted ones mentioned before (iso, val and tag). Here's an attempt to store it that fails:

var x: Object ref = create_a_ref()
let a: Object val = recover val
                      let b: Object ref = create_a_ref()
                      x = b
                      b
                    end 

This snippet has a ref object created in the enclosing lexical scope of the recover expression. Inside the recover an attempt is made to assign the object b to that variable x. Intuitively this should not work - allowing it would mean that we have a readable and writeable alias to the object held in x, and an immutable alias in a allowing data races. The compiler prevents this by not allowing a ref object from the enclosing scope to enter a recover expression.

Can we go the other way and convert a val to a ref using recover? Unfortunately the answer here is no.

let a: Object ref = recover ref
                      let b: Object val = create_a_val()
                      ...do something...
                      b
                    end

This results in an error. The reason is a val can be stored in another val variable in the enclosing scope because val objects are safely shareable. This would make it unsafe to return a writeable alias to the val if it is stored as an immutable alias elsewhere. This code snippet shows how it could be aliased in this way:

var x: Object val = create_a_val()
let a: Object val = recover val
                      let b: Object val = create_a_val()
                      x = b
                      b
                    end

We are able to assign b to a variable in the enclosing scope as the x variable is a val which is one of the valid reference capabilities that can be accessed from within the recover block. If we were able to recover to a ref then we'd have a writeable and an immutable alias alive at the same time so that particular conversion path is an error.

A common use for recover is to create objects with a reference capability different to that defined by the constructor of the object:

class Foo
  new ref create() => ...

let a: Foo val = recover val Foo end
let b: Foo iso = recover iso Foo end

The reference capability of the recover expression can be left out and then it is inferred by the capability of the variable being assigned to:

let a: Foo val = recover Foo end
let b: Foo iso = recover Foo end

Two more reference capabilities to go. They are box and trn.

box - allows use of val or ref

The box reference capability provides the ability to write code that works for val or ref objects. A box alias only allows read-only operations on the object but can be used on either val or ref:

let a: Object ref = create_a_ref()
let b: Object val = create_a_val()
let c: Object box = a
let d: Object box = b

This is particularly useful when writing methods on a class that should work for a receiver type of val and ref.

class Bar
  var count: U32 = 0

  fun val display(out:OutStream) =>
    out.print(count.string())

actor Main
  new create(env:Env) =>
    let b: Bar val = recover Bar end
    b.display(env.out)

This example creates a val object and calls a method display that expects to be called by a val object (the "fun val" syntax). The this from within the display method is of reference capability val. This compiles and works. The following does not:

let b: Bar ref = recover Bar end
b.display(env.out)

Here the object is a ref but display expects it to be val. We can change display to be ref and it would work:

fun ref display(out:OutStream) =>
  out.print(count.string())

But now we can't call it with a val object as in our first example. This is where box comes in. It allows a ref or a val object to be assigned to it and it only allows read-only access. This is safe for val as that is immutable, and it is safe for ref as box gives only an immutable view of the ref:

fun box display(out:OutStream) =>
  out.print(count.string())

Methods are box by default so can be written as:

fun display(out:OutStream) =>
  out.print(count.string())

As an aside, the default of box is the cause for a common "new to Pony" error message where an attempt to mutate a field in an object fails with an "expected box got ref" error:

fun increment() => count = count + 1

This needs to be the following as the implicit box makes the this immutable within the method:

fun ref increment() => count = count + 1

trn - writeable uniquely, consumable to immutable

A trn reference capability is writeable but can be consumed to an immutable reference capability, val. This is useful for cases where you want to create an object, perform mutable operations on it and then make it immutable to send to an actor.

let a: Array[U32] trn = recover Array[U32] end
a.push(1)
a.push(2)
let b: Array[U32] val = consume a
send_to_actor(b)

box and ref methods can be called on trn objects:

class Bar
  var count: U32 = 0

  fun box display(out:OutStream) =>
    out.print(count.string())

  fun ref increment() => count = count + 1

actor Main
  new create(env:Env) =>
    let a: Bar trn = recover Bar end
    a.increment()
    a.display(env.out)

This provides an alternative to the "How do I convert a ref to a val?" question. Instead of starting with a ref inside a recover expression you can use trn and consume to a val later.

You can use iso in place of trn in these examples. Where trn is useful is passing it to box methods to perform read-only operations on it. This is difficult with iso as you have to consume the alias every time you pass it around, and the methods you pass it to have to return it again if you want to perform further operations on it. With trn you can pass it directly.

actor Main
  let out: OutStream

  fun display(b: Bar box) =>
    b.display(out)

  new create(env:Env) =>
    out = env.out

    let a: Bar trn = recover Bar end
    display(a)
    let b : Bar val = consume a
    send_to_actor(b)

The equivalent with iso is more verbose and requires knowledge of ephemeral types (the hat, ^, symbol):

actor Main
  let out: OutStream

  fun display(b: Bar iso): Bar iso^ =>
    b.display(out)
    consume b

  new create(env:Env) =>
    out = env.out

    let a: Bar iso = recover Bar end
    let b: Bar iso = display(consume a)
    let c: Bar val = consume b
    send_to_actor(c)

Capability Subtyping

I've tried to use a lot of examples to help gain an intuitive understanding of the capability rules. The Pony Tutorial has a Capability Subtyping page that gives the specific rules. Although they seem technical, the rules there encode our intuitive understanding. This section is a bit more complex and isn't necessary for basic Pony programming if you have a reasonable intuitive grasp of the rules. It is however useful for working out tricky capability errors and usage.

The way to read those rules is that "<:" means "is a subtype of" or "can be substituted for". So "ref <: box" means that a ref object can be assigned to a box variable:

let a: Object ref = create_a_ref()
let b: Object box = a

The effects are transitive. So if "iso^ <: iso" and "iso <: trn" and "trn <: ref" then "iso^ <: ref":

let a: Object iso = create_an_iso()
let b: Object ref = consume a

Notice we start with iso^ which is an ephemeral reference capability. We get ephemeral types with consume. So consuming the iso gives an iso^ which can be assigned to a ref due to the transitive subtyping path above.

Why couldn't we assign the iso directly without the consume? This is explained previously using intuition but following the rules on the subtyping page we see that "iso! <: tag". The ! is for an aliased reference capability. When we do "something = a" we are aliasing the iso and the type of that a in that expression is iso!. This can only be assigned to a tag according to that rule:

let a: Object iso = create_an_iso()
let b: Object tag = a

Notice there is no "iso! <: iso" which tells us that an alias to an iso cannot be assigned to an iso which basically states the rule that iso can't be aliased.

In a previous section I used an ephemeral type in a method return type:

fun display(b: Bar iso): Bar iso^ =>
    b.display(out)
    consume b

This was needed because the result of display was assigned to an iso:

let b: Bar iso = display(consume a)

If we used Bar iso as the return type then the compiler expects us to be aliasing the object being returned. This alias is of type iso!. The error message states that iso! is not a subtype of iso which is correct as there is no "iso! <: iso" rule. Thankfully the error message tells us that "this would be possible if the subcap were more ephemeral" which is the clue that we need the return type to be ephemeral.

Viewpoint Adaption

I briefly mentioned in a previous section that reference capabilities are 'deep' and this is important when accessing fields of objects. It is also important when writing generic classes and methods and using collections.

Viewpoint adaption is described in the combining capabilities part of the tutorial. I also have a post, Borrowing in Pony, which works through some examples.

The thing to remember is that a containing object's reference capability affects the reference capabilities of fields and methods when accessed via a user of the container.

let a: Array[Bar ref] iso = recover Array[Bar ref] end
a.push(Bar)
try
  let b: Bar ref = a(0)?
end

Here is an iso array of Bar ref objects. The line that retrieves an element of the array fails to compile, stating that tag is not a subtype of ref. Where does the tag come from? Intuitively we can reason that we shouldn't be able to get a ref alias of an item in an iso array, as that would give us two ref aliases of an item in an iso that could be shared across actors. This can give data races.

Viewpoint adaption encodes this in the type rules. We have a receiver, a, of type iso, attempting to call the apply method of the array. This method is declared as:

fun apply(i: USize): this->A ?

The this->A syntax is the viewpoint adaption. It states that the result of the call is the reference capability of A as seen by this. In our case, this is an iso and A is a ref. The viewpoint adaption for iso->ref is tag and that's where the tag in the error message comes from.

We could get an immutable alias to an item in the array if the array was trn:

let a: Array[Bar ref] trn = recover Array[Bar ref] end
a.push(Bar)
try
  let b: Bar box = a(0)?
end

The viewpoint adaption table shows that trn->ref gives box. To get a ref item we'd need the array to be ref:

let a: Array[Bar ref] ref = recover Array[Bar ref] end
a.push(Bar)
try
  let b: Bar ref = a(0)?
end

For more on viewpoint adaption I recommend my Borrowing in Pony and Bang, Hat and Arrow posts.

Miscellaneous things

In my examples I've used explicit reference capabilities in types to make it a bit clearer what is happening. They can be left off in places to get reasonable defaults.

When declaring types the default capability is the capability defined in the class:

class ref Foo

let a: Foo = ...

class val Bar
let b: Bar = ...

The type of a is Foo ref and the type of b is Bar val due to the annotation in the class definition. By default classes are ref if they aren't annotated.

The type of an object returned from a constructor is based on the type defined in the constructor:

class Foo
  new ref create() => ...
  new val create_as_val() => ...
  new iso create_as_iso() => ...

let a: Foo val = Foo.create_as_val()

Here the Foo class is the default type ref, but there are constructors that explicitly return ref, val and iso objects. As shown previously you can use recover to change the capability returned by a constructor in some instances.

let a: Foo val = recover Foo.create() end // Ok - ref to val
let b: Foo ref = recover Foo.create_as_val() end // Not Ok - val to ref

As shown in previous examples, converting val to ref using recover is not allowed.

Conclusion

Reference capabilities are complex for people new to Pony. An intuitive grasp can be gained without needing to memorize tables of conversions. This intuitive understanding requires thinking about how objects are being used and whether such use might cause data races or prevent sharing across actors. Based on that understanding you pick the capabilities to match what you want to use. For those times that errors don't match your understanding use the viewpoint adaption and capability subtyping tables to work out where things are going wrong. Over time your intuitive understanding improves for these additional edge cases.

Tags: pony 

2017-07-15

Runtime typing and eval in Alice ML

I originally wrote this post three years ago but I wasn't happy with how it read so never finished it. It's been sitting around in draft making me feel guilty for too long so I've cleaned it up and published it.

I like the style of prototyping and programming that dynamic languages like Self promote. When building systems inside the language environment it feels like you are living in a soup of objects. Programming becomes creating and connecting objects to perform tasks while debugging and refactoring as you go. The animated image below shows, for example, the Self environment being used to instantiate and run a VNC object I wrote. Other examples can be seen in screencasts in my Self language posts.

Recently I've been using more statically typed languages to explore the world of type safety and how it can improve correctness of programs. My series of ATS posts go through a lot of the features that this approach provides. Most of these languages promote an edit/compile/link/run style of development and I miss the live development and debugging in the dynamic environments.

Some of the statically typed functional programming languages provide ways of doing dynamic types. Alice ML, developed in the mid-2000s, was an extension of Standard ML which provided support for concurrent, distributed and constraint programming. It was an attempt to see what a statically typed functional version of the Mozart/Oz language would be like. Development stopped in 2007 with the release of version 1.4 of Alice ML but the system remains very usable. I had been following Alice ML since the early days of its development and the concurrency and distribution features of it were the inspiration for some of my explorations with using futures and promises in JavaScript and concurrency in Factor.

As part of the support for distributed programming it required the ability to serialize and deserialize values along with their types. This form of dynamic behaviour would seem to be useful for developing a live coding environment. In fact Alice ML includes a GUI editor and REPL written in Alice ML that makes use of the library to evaluate, compile and produce components and executables.

I've imported the source of Alice ML into a github repository with minor bitrot changes and a couple of bug fixes so that it builds on recent Linux and Mac OS X systems. The code there is from the original Alice ML source with many changes and fixes made by Gareth Smith in his bitbucket repository. The original Alice developers and Gareth have kindly allowed me to host this on github.

Packages

Alice ML does dynamic runtime typing through packages. A package encapsulates a module and its signature. The package is an opaque type and accessing the module stored within is only possible via an unpack operation which requires giving the explicit signature of the module stored. If the signature doesn't match the type of the module stored then a runtime exception occurs. Packages can be passed around as first class values, stored in files, sent to other processes, etc.

Packages are created using the pack expression. This follows the form:

pack structure_expression : signature_expression

Where structure_expression and signature_expression are expressions that evaluate to Standard ML structures and signatures. The following would create a package for the value 42 stored in a module (as typed in the Alice ML REPL):

> val p = pack struct val x:int = 42 end : sig val x:int end;
val p : package = package{|...|}

In the pack expression a structure is created with an x member of type int and the value of that is 42. This structure is the value that is stored in the package. The type of this is given by the signature expression and when later unpacked only this signature can be used to get at the value. For simple examples like this the struct and sig syntax is quite verbose but Alice ML allows shortening this to just the contents of the structure and signature. The following is the equivalent shortened code:

> val p = pack (val x:int = 42) : (val x:int);
val p : package = package{|...|}

Getting the value back from a package is done using unpack. The general form of the unpack expression is:

unpack expression : signature_expression

If the signature_expression does not match the signature of the value stored in the package then an exception is raised. The type of the unpack expression is signature_expression so if it successfully unpacks then use of the resulting value is type safe. Unpacking our example above looks like:

> structure S = unpack p : sig val x:int end;
structure S : sig val x : int end

Or using the shorter syntax:

> structure S = unpack p : (val x:int);
structure S : sig val x : int end

The resulting module S can be used as any other SML module to access the fields within:

> print (Int.toString S.x);
42

Eval

To create an environment that allows evaluating code and manipulating results requires an eval facility. Alice ML provides this through the Compiler module. This module provides, amongst other functions, the following variants of eval:

val eval :     string -> package
val evalWith : env * string -> env * package

The first function, eval, takes a string of Alice ML code, evaluates it, and returns the result as a package. The second, evalWith, takes an additional parameter which is the environment in which the code is evaluated. It also returns the modified environment after evaluating the code. This allows keeping a persistent state of changes made by the evaluated code.

The result is returned as a package because the type of the evaluated code is unknown. It could be anything. If the caller of eval needs to manipulate or display the result in some manner it needs to unpack it with a known type that it expects it to contain and handle any exception that might occur if the type is incorrect at runtime. An example of doing this is:

> val x = Compiler.eval("1+2");
val x : package = package{|...|}
> structure X = unpack x : (val it:int);
structure X : sig val it : int end
> X.it;
val it : int = 3

In this case the result of our evaluation is an int so this is what's used in the signature for the unpack expression.

An example using evalWith to track the changes to the environment is:

> val x = Compiler.evalWith(Compiler.initialEnv,
                            "fun fac(n:int) = if n <= 1 then 1 else n * fac(n - 1)");
val x : Compiler.env * package = (_val, package{|...|})
> val y = Compiler.evalWith(#1 x, "fac(10)");
val y : Compiler.env * package = (_val, package{|...|})
> structure Y = unpack (#2 y) : (val it:int);
structure Y : sig val it : int end
> Y.it;
val it : int = 3628800

The function evalWith returns a tuple where the first element is the resulting environment after the evaluation and the second element is the package containing the result. For the second call to evalWith the environment resulting from the first call is passed to it so the function fac can be found.

Pretty Printing

One thing to note in the previous example is that the call to unpack required knowing the type of what we were unpacking. This is usually the case but when writing a REPL we need to print the result of evaluating what is entered at the top level - and this could be any type depending on what the user entered to evaluate.

There are some internal Alice ML modules that make it possible to do this. An example follows:

> import structure Reflect     from "x-alice:/lib/system/Reflect";
> import structure PPComponent from "x-alice:/lib/system/PPComponent";
> import structure PrettyPrint from "x-alice:/lib/utility/PrettyPrint";
> val a = Compiler.prepareWith (Compiler.initialEnv, "1+2");
val a : Compiler.env * (unit -> package) * t = (_val, _fn, _val)
> val b = (#2 a) ();
val b : package = package{|...|}
> val c = Reflect.reflectPackage b;
val c : Reflect.module * t = (_val, _lazy)
> val d = PPComponent.ppComp(#1 c, #2 c);
val d : doc = _val
> PrettyPrint.toString(d,40);
val it : string = "val it : int = 3"

The Compiler.prepareWith function does not evaluate the string passed to it but performs only part of the evaluation process. It returns a tuple containing the environment which will result from evaluation, a function that when called will perform the evaluation, and a value representing the type of the result of the evaluation.

In step (b) the evaluation function is called which returns the package containing the result. Reflect.reflectPackage returns a tuple describing the package. These are passed to PPComponent.ppComp to return a PrettyPrint document. The pretty printer is based on A prettier printer by Phil Wadler. PrettyPrint.toString converts this to a string which could then be displayed by a REPL.

Conclusion

As mentioned previously the Alice ML tools are written in Alice ML. The toplevel code uses the modules and functions outlined previously to implement the REPL and IDE. Unfortunately it's mostly undocumented but the source is available to show how it is implemented and used.

There's much more to Alice ML's run time use of types, including pickling, components, sandboxing, and distribution.

An interesting exercise would be to write a web based client to provide a "Try Alice ML" in a similar manner to other languages' online playgrounds to allow trying Alice ML code snippets without needing to install it. I'd also like to explore how close to a Self like environment one could get in an Alice ML system.

Tags: aliceml 

2017-05-16

Distributed Wikipedia Mirrors in Freenet

There was a recent post about uncensorable Wikipedia mirrors on IPFS. The IPFS project put a snapshot of the Turkish version of Wikipedia on IPFS. This is a great idea and something I've wanted to try on Freenet.

Freenet is an anonymous, secure, distributed datastore that I've written a few posts about. It wasn't too difficult to convert the IPFS process to something that worked on Freenet. For the Freenet keys linked in this post I'm using a proxy that retrieves data directly from Freenet. This uses the SCGIPublisher plugin on a local Freenet node. The list of whitelisted keys usable are at freenet.cd.pn. There is also a gateway available at d6.gnutella2.info. The keys can also be used directly from a Freenet node, which is likely to be more performant than going through my underpowered proxy. Keep in mind that the "distributed, can't be taken down" aspect of the sites on Freenet is only when accessed directly through Freenet. It's quite likely my clearnet proxy won't be able to handle large amounts of traffic.

I started with the Pitkern/Norfuk Wikipedia Snapshot as that was relatively small. Once I got the scripts for that working I converted the Māori Wikipedia Snapshot. The latest test I did was the Simple English Wikipedia Snapshot. This was much bigger so I did the version without images first. Later I plan to try the version with images when I've resolved some issues with the current process.

The Freenet keys for these mirrors are:

  • USK@m79AuzYDr-PLZ9kVaRhrgza45joVCrQmU9Er7ikdeRI,1mtRcpsTNBiIHOtPRLiJKDb1Al4sJn4ulKcZC5qHrFQ,AQACAAE/simple-wikipedia/0/
  • USK@jYBa5KmwybC9mQ2QJEuuQhCx9VMr9bb3ul7w1TnyVwE,OMqNMLprCO6ostkdK6oIuL1CxaI3PFNpnHxDZClGCGU,AQACAAE/maori-wikipedia/5/
  • USK@HdWqD7afIfjYuqqE74kJDwhYa2eetoPL7cX4TRHtZwc,CeRayXsCZR6qYq5tDmG6r24LrEgaZT9L2iirqa9tIgc,AQACAAE/pitkern-wikipedia/2/

The keys are 'USK' keys. These keys can be updated and have an edition number at the end of them. This number will increase as newer versions of the mirrors are pushed out. The Freenet node will often find the latest edition it knows about, or the latest edition can be searched for using '-1' as the edition number.
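As a small illustration of the edition number, the following shell sketch rewrites whatever edition a USK ends with to -1 so that the node searches for the latest one. The function name latest_edition_uri and the USK@EXAMPLE key are placeholders made up for this sketch, and it assumes the key ends in /site-name/edition/ like the keys listed above.

```shell
#!/bin/sh
# latest_edition_uri URI: replace the trailing edition number with -1.
# Assumes the URI ends in /<site-name>/<edition>/ (trailing slash optional).
latest_edition_uri() {
  printf '%s\n' "$1" | sed -E 's#/-?[0-9]+/?$#/-1/#'
}

# "USK@EXAMPLE" is a placeholder, not a real Freenet key.
latest_edition_uri "USK@EXAMPLE/my-mirror/5/"
# prints USK@EXAMPLE/my-mirror/-1/
```

Only the last path component is touched, so digits inside the key or the site name are left alone.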

The approach I took for the mirroring follows the approach IPFS took. I used the ZIM archives provided by Kiwix and a ZIM extractor written in Rust. The archive was extracted with:

$ extract_zim wikipedia_en_simple_all_nopic.zim

This places the content in an out directory. All HTML files are stored in a single directory, out/A. In the 'simple english' case that's over 170,000 files. This is too many files in a directory for Freenet to insert. I wrote a script in bash to split the directory so that files are stored in '000/filename.html' where '000' is the first three hex digits of a SHA256 hash of the base filename, computed with:

$ echo "filename.html"|sha256sum|awk '{ print $1 }'|cut -c "1,2,3"

The script then went through and adjusted the article and image links on each page to point to the new location. The script does some other things to remove HTML tags that the Freenet HTML filter doesn't like and to add a footer about the origin of the mirror.
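The splitting step can be sketched roughly as follows. This is not the actual convert.sh, just a minimal illustration of the bucketing idea. The function names bucket_for and shard_dir and their directory arguments are made up for this sketch, it assumes sha256sum from GNU coreutils is installed, and it only copies files without rewriting any links.

```shell
#!/bin/sh
# bucket_for NAME: print the three-hex-digit bucket for a base filename.
# The trailing newline is part of the hashed input, which matches what
# `echo "filename.html" | sha256sum` hashes in the one-liner above.
bucket_for() {
  printf '%s\n' "$1" | sha256sum | awk '{ print $1 }' | cut -c 1-3
}

# shard_dir SRC DEST: copy each regular file in SRC to DEST/<bucket>/<name>.
# The body runs in a subshell so its variables stay local to the function.
shard_dir() (
  src=$1
  dest=$2
  for f in "$src"/*; do
    [ -f "$f" ] || continue
    base=$(basename "$f")
    bucket=$(bucket_for "$base")
    mkdir -p "$dest/$bucket"
    cp "$f" "$dest/$bucket/$base"
  done
)
```

A call such as shard_dir out/A result/A would then populate the bucket directories. Three hex digits give 4096 possible directories, so 170,000 files works out to roughly 40 files per directory.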

Another issue I faced was that filenames with non-ASCII characters would get handled differently by Freenet if the file was inserted as a single file vs being inserted as part of a directory. In the latter case the file could not be retrieved later. I worked around this by translating filenames into ASCII. A more robust solution would be needed here if I can't track down where the issue is occurring.
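
One simple way to do such a translation, sketched here and not necessarily what convert.sh does, is to replace every non-ASCII byte in a filename with an underscore. The function names are hypothetical. The mapping is lossy, so two different names can collide; the sketch skips a rename when the target already exists. Any links to renamed files would need the same mapping applied.

```shell
# Hypothetical helper: replace every byte outside the ASCII range
# with an underscore. A multi-byte UTF-8 character becomes several
# underscores because tr works on bytes in the C locale.
ascii_name() {
  printf '%s' "$1" | LC_ALL=C tr -c '\000-\177' '_'
}

# Hypothetical helper: rename files in <dir> whose names contain
# non-ASCII bytes.
asciify_dir() {
  local dir="$1" path name new
  for path in "$dir"/*; do
    [ -f "$path" ] || continue
    name=$(basename "$path")
    new=$(ascii_name "$name")
    [ "$name" = "$new" ] && continue
    if [ -e "$dir/$new" ]; then
      echo "skipping $name: $new already exists" >&2
      continue
    fi
    mv "$path" "$dir/$new"
  done
}
```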

The script to do the conversion is in my freenet-wikipedia GitHub repository. To convert a ZIM archive the steps are:

$ wget http://download.kiwix.org/zim/wikipedia_pih_all.zim
$ extract_zim wikipedia_pih_all.zim
$ ./convert.sh
$ ./putdir.sh result my-mirror index.html

At completion of the insert this will output a list of keys. The uri key is the one that can be shared for others to retrieve the insert. The uskinsert key can be used to insert an updated version of the site:

$ ./putdir.sh result my-mirror index.html <uskinsert key>

The convert.sh script was a quick 'proof of concept' hack and could be improved in many ways. It is also very slow: the Simple English conversion took about 24 hours. I welcome patches and better ways of doing things.

The repository includes a bash script, putdir.sh, which inserts the site using the Freenet ClientPutDiskDir API message. This is a useful way to get a directory online quickly but is not an optimal way of inserting something the size of the mirror. The initial request for the site downloads a manifest containing a list of all the files in the site. This can be quite large: 12MB for the Simple English mirror with no images, and almost 50MB for the Māori mirror due to the images.

The layout of the files also doesn't take into account likely retrieval patterns. Images and scripts that are included in a page are not downloaded as part of the initial page request, and fetching them may pull in larger amounts of data depending on how each file was inserted. A good optimisation project would be to analyse the directory to be inserted and create an optimal Freenet insert for faster retrieval. pyFreenet has a utility, freesitemgr, that can do some of this, and there are other insertion tools like jSite that may also do a better job.
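
For readers unfamiliar with the Freenet Client Protocol (FCP), the sketch below prints the kind of text messages a client sends for a directory insert. It is not the content of putdir.sh. The field names follow my understanding of the FCP 2.0 documentation and should be checked against it, the function names are hypothetical, and a real client also has to read the node's replies and keep the connection open while the insert runs.

```shell
# Hypothetical helper: print the FCP handshake message.
client_hello() {
  printf '%s\n' \
    "ClientHello" \
    "Name=$1" \
    "ExpectedVersion=2.0" \
    "EndMessage"
}

# Hypothetical helper: print a ClientPutDiskDir message.
# Arguments: <identifier> <insert URI> <directory> <default file>
# The directory is read by the node itself, so it is assumed to be an
# absolute path that the node process can access.
client_put_disk_dir() {
  printf '%s\n' \
    "ClientPutDiskDir" \
    "Identifier=$1" \
    "URI=$2" \
    "Filename=$3" \
    "DefaultName=$4" \
    "EndMessage"
}

# Example: show the messages for inserting /tmp/result (illustrative
# values; the insert URI placeholder is not a real key).
client_hello "putdir-sketch"
client_put_disk_dir "my-mirror-insert" "USK@<insert key>/my-mirror/0/" \
  /tmp/result index.html
```

The messages would be written to the node's FCP port, which I am assuming is the default of 9481 on the local machine.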

My goal was to do a proof of concept to see if a Wikipedia mirror on Freenet was viable. This seems to be the case and the Simple English mirror is very usable. Discussion on the FMS forum when I announced the site has been positive. I hope to improve the process over time and welcome any suggestions or enhancements to do that.

What are the differences between this and the IPFS mirror? It's mostly down to how IPFS and Freenet work.

In Freenet content is distributed across all nodes in the network. The node that has inserted the data can turn their node off and the content remains in the network. No single node has all the content. There is redundancy built in so if nodes go offline the content can still be fully retrieved. Node space is limited so as data is inserted into Freenet, data that is not requested often is lost to make room. This means that content that is not popular disappears over time. I suspect this means that some of the Wikipedia pages will become inaccessible. This can be fixed by periodically reinserting the content, healing the specific missing content, or using the KeepAlive plugin to keep content around. Freenet is encrypted and anonymous. You can browse Wikipedia pages without an attacker knowing that you are doing so. Your node doesn't share the Wikipedia data, except possibly small encrypted chunks of parts of it in your datastore, and it's difficult for the attacker to identify you as a sharer of that data. The tradeoff for this security is that retrievals are slower.

In IPFS a node inserting the content cannot be turned off until that content is pinned by another node on the network and fully retrieved. Nodes that pin the content keep the entire content on their node. If all pinning nodes go offline then the content is lost. All nodes sharing the content advertise that fact, so it's easy to obtain the IP address of all nodes that are sharing Wikipedia files. On the positive side IPFS is potentially quite a bit faster to retrieve data.

Both IPFS and Freenet have interesting use cases and tradeoffs. The intent of this experiment is not to present one or the other as a better choice, but to highlight what Freenet can do and make the content available within the Freenet network.

Tags: freenet 


This site is accessible over Tor as hidden service mh7mkfvezts5j6yu.onion, or Freenet using key:
USK@1ORdIvjL2H1bZblJcP8hu2LjjKtVB-rVzp8mLty~5N4,8hL85otZBbq0geDsSKkBK4sKESL2SrNVecFZz9NxGVQ,AQACAAE/bluishcoder/-44/

