Bluish Coder

Programming Languages, Martial Arts and Computers. The Weblog of Chris Double.


2017-09-21

Using the J Foreign Function Interface

The J Programming Language is an array oriented or vector based programming language. It is in the same family of programming languages as APL and K.

I won't go into the details of J syntax or give a tutorial on the language in this post. For that I recommend something like J for C Programmers, which is also available as a paperback book. I will provide a quick overview of the syntax used here.

What I want to cover in this post is how to use the FFI to call C functions in J, and how to use callbacks from C back to J. The latter isn't well documented and I always forget how to do it.

Introduction to J

J provides a concise syntax for performing operations. Many things that would be expressed as English words in other languages use ASCII symbols in J. The J Dictionary and Vocabulary provide details on what these are.

J has a REPL similar to Lisp environments. For a taste of J without installing it, there is an online J IDE that can be used from a browser. Otherwise the native J system is available from JSoftware downloads. Source is available on GitHub.

J can be used as a calculator to perform operations on numbers:

  10 + 2
12

There is no operator precedence; use brackets to define the order of operations. Evaluation is right to left:

  3 * 4 + 10
42

  (3 * 4) + 10
22
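
Right to left evaluation is easiest to see with subtraction, where the grouping changes the answer. This is a small extra illustration using only the arithmetic verbs shown so far:

```j
  10 - 2 - 3      NB. evaluated as 10 - (2 - 3)
11

  (10 - 2) - 3    NB. brackets force the left subtraction first
5
```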

Sequences of numbers separated by white space are arrays of those numbers. Operations can be performed on arrays. To multiply each element of an array 1 2 3 4 by 2:

  2 * 1 2 3 4
2 4 6 8

Arrays can be multi-dimensional. One way of creating such arrays is via the '$', or Shape, operation. To create a 2x3 array:

  2 3 $ 1 2 3
1 2 3
1 2 3

Functions are called verbs in J terminology. The * and + operators above are verbs. Functions that operate on verbs are called adverbs. An example is the adverb '/' which inserts a verb between elements of an array and evaluates it. An example to sum an array:

  +/ 1 2 3 4
10

The +/ forms an operation that will do the equivalent of inserting + between each element of the array like:

  1 + 2 + 3 + 4
10

Other verbs can have / applied to them:

  */ 1 2 3 4
24

Verbs are monadic or dyadic. The term monadic is not related to the use of monads in languages like Haskell. It means the verb takes a single argument on the right hand side of the verb. A dyadic verb takes two arguments, one on the left and one on the right. Many verbs have both dyadic and monadic forms. Dyadic - is the subtraction operator. Monadic - negates its parameter:

  10 - 5
5

  -5
_5

The _ character signifies a negative number to differentiate from monadic -. _5 is negative 5 whereas -5 is the application of monadic - to the number 5.

A sequence of three verbs together is called a fork. '+/ % #' is a commonly demonstrated fork. Its three verbs are +/, which sums an array, dyadic %, which is division, and monadic #, which counts the number of items in an array:

  # 1 2 3 4
4

  +/ 1 2 3 4
10

  10 % 4
2.5

  (+/ 1 2 3 4) % (# 1 2 3 4)
2.5

  (+/ % #) 1 2 3 4
2.5

A fork of '(A B C) X', where B is dyadic and A and C are monadic, does (A X) B (C X). The combination of verbs, adverbs, nouns and the right to left evaluation of J gives it a "Programming with functions" feel and concise syntax.
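
As a second fork to compare against the rule, here is a sketch that computes the range of an array. It relies on the verbs >. (maximum) and <. (minimum), which are not covered elsewhere in this post, and the name range is just one I picked for illustration:

```j
  range =: >./ - <./              NB. (max of y) minus (min of y)
  range 3 7 1 4
6

  (>./ 3 7 1 4) - (<./ 3 7 1 4)   NB. the same thing written out by hand
6
```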

J has variable assignment which can include assignment of verb sequences to named variables. =: does global variable assignment and =. does local variable assignment (local to the scope of verb or namespace):

  x   =: 1 2 3 4
  avg =: +/%#
  avg x
2.5
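
To show the local form as well, here is a minimal sketch of a verb that uses =. for a temporary value. The names double_plus_one and tmp are made up for this example, and the 'monad define' form is the same one used for the callback definitions later in the post:

```j
double_plus_one =: monad define
  tmp =. y * 2      NB. tmp is local to this verb
  tmp + 1
)

  double_plus_one 5
11
```

The name tmp only exists while the verb is running, so it doesn't clash with anything defined at the top level.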

Values in J can be boxed or unboxed. The examples above are unboxed. They can be boxed with the '<' monadic verb. In the REPL boxed values are drawn with a box around them. To unbox a boxed value, use the monadic verb '>'. The ';' dyadic verb links its arguments into an array of boxed elements, and can be chained. For example:

  a =: <1
  a
┌─┐
│1│
└─┘
  >a
1
  1;2;3;4
┌─┬─┬─┬─┐
│1│2│3│4│
└─┴─┴─┴─┘

All elements in an unboxed array must have the same type. Elements in boxed arrays can be different types.
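
A quick sketch of a boxed array holding different types, which is the shape of value the FFI calls later in this post take as parameters. The name b is arbitrary, and '{' is the indexing verb that is described further down:

```j
  b =: 'abc';1 2 3;4.5    NB. a string, an integer array and a float
  # b
3

  > 1 { b                 NB. fetch the second box and open it
1 2 3
```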

Hopefully that gives a taste of J syntax so the rest of this post is understandable if you don't know much of the language. I recommend playing around in the J REPL to get a feel.

Foreign Conjunctions

Calling C functions from J requires using what's called a Foreign Conjunction. These are special system operations defined by the J implementation and are of the form 'x !: y' where 'x' and 'y' are numbers. They return verbs or adverbs, so the result of the conjunction is then applied to further arguments. An example is the Space category of foreign conjunctions, which returns information about J memory usage:

  (7 !: 0) ''
10939904

Here 7 !: 0 returns a verb which takes a single argument (which is ignored) and returns the number of bytes currently in use by the J interpreter.

  (7!:2) '+/ 1 2 3 4'
1664

7!:2 returns a verb that when called with an argument returns the number of bytes required to execute that argument as a J sentence.
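
Because a foreign conjunction returns an ordinary verb, it can be given a name like any other verb. A small sketch, where space_used and space_for are names I made up rather than anything from the standard library. The numbers returned will differ from system to system so I haven't shown them:

```j
  space_used =: 7!:0
  space_for  =: 7!:2

  space_used ''             NB. bytes currently in use
  space_for '+/ 1 2 3 4'    NB. bytes needed to execute the sentence
```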

There's lots more described in the Foreign Conjunction documentation, which allows reflecting and introspecting on just about everything in the J system.

Foreign Function Interface

The foreign function interface is the set of foreign conjunctions where the left side of the conjunction verb is the number 15. There's a short list of these in the Dynamic Link Library part of the dictionary, and further explanation in the Dlls section of the user manual. There are more verbs than are listed in these pages and the J source file x15.c gives an idea of the scope. Most of the additional ones are used in implementation of the J library but we have to delve into undocumented features to handle C callbacks.

The J source file stdlib.ijs contains the definitions for names that make the foreign conjunctions easier to use. The ones covered in this post are:

cd    =: 15!:0
memr  =: 15!:1
memw  =: 15!:2
mema  =: 15!:3
memf  =: 15!:4
cdf   =: 15!:5
cder  =: 15!:10
cderx =: 15!:11
cdcb  =: 15!:13

To call a foreign function we use the '15!:0' verb, defined as cd in the standard library. This is described in the documentation as 'Call DLL Function' and is documented in the Calling DLLs section of the user manual. It is dyadic and has the form:

'filename procedure [>] [+] [%] declaration' cd parameters

The descriptions below look complicated but will become clearer with a few examples.

The first argument is a string that contains the function name and arguments it takes. It's the declaration of the function basically. The second parameter is an array of the arguments that the foreign function takes.

The filename is the name of a DLL or shared library. It can also be one of the following special filenames:

  • 0 = If filename is 0 then the 'procedure' is a signed integer representing a memory address for where the function is located.
  • 1 = If filename is 1 then the 'procedure' is a non-negative integer that is the index of a procedure in a vtable. The first element of the parameter array should be an address of the address of the vtable, the 'object' address. The declaration of the first parameter should be '*' or 'x'. This is used for calling COM objects or Java Native Interface objects.

The procedure is the name of the function in the DLL or shared library to call. It can also be a number as described in the filename portion above.

The '>', '+' and '%' parts are optional and define attributes of the function:

  • '>' = The procedure returns a scalar result. Without '>' the result is the boxed scalar result appended to the possibly modified list of boxed arguments.
  • '+' = Chooses the calling convention of the function. On Windows the standard calling convention is __stdcall and using '+' selects __cdecl. On Unix using '+' has no effect.
  • '%' = Does a floating point reset after the call. Some procedures leave floating point in an invalid state.

'declaration' is a string of blank delimited characters defining the types of the arguments passed to the function, and the type of the result. They can be:

  • 'c' = J character (1 byte)
  • 'w' = J literal2 (2 byte)
  • 'u' = J literal4 (4 byte)
  • 's' = short integer (2 byte)
  • 'i' = integer (4 byte)
  • 'l' = long integer (8 byte)
  • 'x' = J integer (4 or 8 byte depending on 32/64 bit architecture)
  • 'f' = short float (4 byte)
  • 'd' = J float (8 byte)
  • 'j' = J complex (16 byte - 2 d values) (pointer only, not as result or scalar)
  • '*' = pointer. This can be followed by any of the above to indicate the type pointed to. A parameter passed in the place where a pointer is expected can be a J array of the right type, or a boxed scalar integer that is a memory address.
  • 'n' = no result (result is ignored and '0' is returned)

The first entry in the 'declaration' is the result type and the remaining ones describe each parameter provided in the second argument to 'cd'.

The 'parameters' argument to 'cd' should be a boxed array. If it is not boxed then it will be boxed for you.

Examples

The following example calls the C function puts from J on my Linux OS:

  '/lib/x86_64-linux-gnu/libc.so.6 puts x *c' cd <'hello'
┌─┬─────┐
│6│hello│
└─┴─────┘

The full path to the shared library is given with puts as the function name. The 'declaration' is set to return a J integer and marks the function as taking a pointer to characters as an argument. We pass the boxed array of characters with <'hello'.

The result is a boxed array where the first element is the value returned from the function. This is 6, the number of characters printed. The rest of the result is the arguments passed to the function. The output of puts can be seen on the console.

If we just want the result of the function, without the appended list of parameters, then we can use the '>' attribute in the first argument to 'cd':

  '/lib/x86_64-linux-gnu/libc.so.6 puts > x *c' cd <'hello'
6
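
Two more calls in the same style, as a sketch of how the declaration maps onto a C prototype. I'm assuming the same libc path as above, a 64-bit system, and the C functions abs and strlen. The name libc is just a convenience so the path isn't repeated:

```j
  libc =: '/lib/x86_64-linux-gnu/libc.so.6'

  NB. int abs(int): 'i' result, one 'i' parameter
  (libc,' abs > i i') cd <_5
5

  NB. size_t strlen(const char *): 'x' result, one pointer to char parameter
  (libc,' strlen > x *c') cd <'hello'
5
```

The first argument to cd is an ordinary string, so it can be built by concatenation with ',' as done here.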

Loading shared libraries using cd uses system resources. To unload shared libraries currently in use by J use the '15!:5' foreign conjunction, or cdf. It is monadic and ignores its argument:

  cdf ''

Errors

If the shared library can't be found, or there is some other problem with the call, then a 'domain error' will result:

  'doesnotexist.so puts > x *c' cd <'hello'
|domain error

To find the reason for the domain error, use the '15!:10' Foreign Function verb. It is a monadic verb that ignores its argument and returns information about the domain error. In the standard library the name cder is assigned to '15!:10'. The meaning of what it returns is:

  • 0 0 = no error
  • 1 0 = file not found
  • 2 0 = procedure not found
  • 3 0 = too many DLLs loaded (max 20)
  • 4 0 = too many or too few parameters
  • 5 x = declaration x invalid
  • 6 x = parameter x type doesn't match declaration
  • 7 0 = system limit - linux64 max 8 float/double scalars

In the example above it indicates 'file not found':

  cder ''
1 0

For a detailed description of the error use '15!:11', or cderx. It is monadic, ignores its arguments, and returns a boxed array with the error number as the first element and a string description as the second:

  'doesnotexist.so puts > x *c' cd <'hello'
|domain error: cd
|   'doesnotexist.so puts > x *c'    cd<'hello'

   cderx ''
┌─┬──────────────────────────────────────────────────────────────────────────┐
│0│doesnotexist.so: cannot open shared object file: No such file or directory│
└─┴──────────────────────────────────────────────────────────────────────────┘
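
If you'd rather not have the domain error stop execution, the call can be wrapped in J's try. catch. end. control structure. This is only a sketch: trycd is a hypothetical helper of my own, not something from the standard library, and it assumes cder still reports the failed call when run from the catch. block:

```j
NB. Hypothetical helper: x is the declaration string and y the
NB. parameters, exactly as for cd.
trycd =: dyad define
  try.
    x cd y
  catch.
    cder ''     NB. on failure return the two number error code instead
  end.
)

  'doesnotexist.so puts > x *c' trycd <'hello'
1 0
```

A caller would still need to tell a successful result apart from an error pair, so this suits quick experiments more than library code.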

Memory

Allocating, freeing, reading and writing C memory from J requires the memory management foreign verbs. An example of an FFI call that returns a string is getenv:

  '/lib/x86_64-linux-gnu/libc.so.6 getenv *c *c' cd <'HOME'
┌───────────────┬────┐
│140735847611736│HOME│
└───────────────┴────┘

Note here that the returned value is a number. This represents a pointer to the string that getenv returns. To see the string result from this pointer we need to use the J verb that deals with reading raw memory.

  memr 140735847611736 0 _1 2
/home/myuser

'15!:1', or memr as it is defined in the standard library, reads raw memory. It takes an array as an argument with the elements of the array being:

  • The numeric pointer to the memory
  • The offset into that memory
  • The number of elements to read. A _1 means read to the first NUL byte which is useful for C strings and is used in this getenv example.
  • An optional type argument for the data to be read. The default is 2 which is used explicitly above. It can be 2, 4, 8 or 16 for char, integer, float or complex respectively. The count parameter is number of elements of the given type - it takes the size into account.
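
Rather than copying the address by hand, the two steps can be chained. A sketch, assuming the same libc path as earlier and that HOME is set (getenv returns a NULL pointer otherwise, and reading from address 0 is not something to try). The names libc and home are just for this example:

```j
  libc =: '/lib/x86_64-linux-gnu/libc.so.6'

  NB. '>' makes cd return just the pointer rather than a boxed list
  home =: (libc,' getenv > *c *c') cd <'HOME'

  NB. address, offset 0, read to the NUL byte, as characters
  memr home,0 _1 2
```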

'15!:3', or mema, allocates memory. It takes a length argument for number of bytes to allocate and returns 0 on allocation failure:

  mema 1024
23813424

The numeric result returned is a pointer address that can be passed to '*' parameters in the FFI calls.

'15!:4', or memf, frees memory allocated with mema. It takes the address as an argument and returns '0' on success or '1' on failure:

  memf 23813424
0

'15!:2', or memw, writes to memory given a pointer. It is the inverse of memr and is dyadic. The left hand argument is an array of data to write, and the right hand argument is similar to that defined by memr above:

  mema 32
22371984

  'hello' memw 22371984 0 6 2
  memr 22371984 0 _1 2
hello

  '/lib/x86_64-linux-gnu/libc.so.6 puts > x *c' cd <<22371984
6

  memf 22371984
0

Notice the double boxing of the pointer argument to the puts call. This is what enables J to recognise it as a pointer parameter rather than an integer argument. It is a boxed array containing a boxed integer.

The count argument passed to the memw call can be one greater than the length of the string in the left argument if the type of the memw call is 2 (the char type). In that case a NUL byte is appended which is what we use in the example above.

Here's another example using sprintf:

  mema 1024
24110208

  '/lib/x86_64-linux-gnu/libc.so.6 sprintf > x * *c *c x' cd (<24110208);'string is: %s %d';'foo';42
17

  memr 24110208 0 _1
string is: foo 42

  memf 24110208
0

Callbacks

The foreign function conjunction '15!:13', or cdcb, is used for defining a callback function that can be passed to a C function and have it call back to J. It is a monadic verb that takes a number as an argument for the number of arguments that the callback function uses. It returns a pointer to a compiled function that can be used as the callback.

When that function is called it will do argument conversion and call a J verb called cdcallback with the arguments passed as a boxed array. The following demonstrates how this works by defining some callback functions with a definition of cdcallback that prints its output to the REPL:

cdcallback =: monad define
  y (1!:2) 2
  0
)

f1 =: cdcb 1
f2 =: cdcb 2
f3 =: cdcb 3

  ('0 ',(": f1),' > n x') cd <100
100
0

   ('0 ',(": f2),' > n x x') cd 100;200
100 200
0

   ('0 ',(": f3),' > n x x x') cd 100;200;300
100 200 300
0

In this example cdcallback is implemented as a monadic function that passes its argument, 'y', to the foreign conjunction for writing to the screen, '1!:2'. It returns '0' as a result.

Three callback functions are created for one, two and three arguments respectively. These return C pointers to the equivalent of:

int f1(int);
int f2(int, int);
int f3(int, int, int);

Then we call the functions using cd. The declaration string is built dynamically by concatenating the function address into it. This uses the default format monadic verb, '":'. Note how the first parameter in the declaration string is '0', which as described above means a function pointer is expected instead of a function name. That's what we're concatenating in:

  ('0 ',(": f1),' > n x')
0 140277582398804 > n x

Calling the callback function prints to the screen the argument array. Knowing this we can start to test with a function that requires a callback. For that example I'll use the qsort C function:

void qsort(void *base, size_t nmemb, size_t size,
           int (*compar)(const void *, const void *));

We can test with our existing callback that just displays the argument output:

  '/lib/x86_64-linux-gnu/libc.so.6 qsort > n *l x x *' cd (1 2 3 4);4;8;<<f2
30492672 30492676
30492680 30492684
30492672 30492680
30492676 30492680
0

For qsort I've defined it as taking an array of integers, an integer for the number of members, an integer for the size of each member and a pointer. For arguments I pass an array of integers (the J array is automatically converted to a C array), '4' for the number of members, '8' for the size of each member (eight bytes each) and a boxed number for the pointer to the callback function that takes two arguments.

When run it prints out the two arguments each time the callback is called. Because qsort expects the callback to take pointers as arguments they appear as integers. Now we need to write a callback that dereferences those arguments and returns the result of a comparison.

We can use memr to read the integer value from memory and the '{' dyadic verb to retrieve elements from the array ('x { y' returns the xth element of y).

  cdcallback =: monad define
    lhs =. 0 { y
    rhs =. 1 { y
    i1 =. memr lhs,(0 1 4)
    i2 =. memr rhs,(0 1 4)
    diff =. i1 - i2
    (i1,i2,diff) (1!:2) 2
    if. diff < 0 do. diff =. _1
    elseif. diff > 0 do. diff =. 1
    end.
    diff
  )

Running this with our qsort call gives:

  '/lib/x86_64-linux-gnu/libc.so.6 qsort > n *l x x *' cd (3 7 1 4);4;8;<<f2
3 7 _4
1 4 _3
3 1 2
3 4 _1
7 4 3
0

This looks good! But how do we get the result back? qsort modifies in place, and when converting array arguments J will copy the array to a C array and then copy the results back. To look at the results we need to remove the '>' attribute in the declaration to view the modified parameters:

  '/lib/x86_64-linux-gnu/libc.so.6 qsort n *l x x *' cd (3 7 1 4);4;8;<<f2
3 7 _4
1 4 _3
3 1 2
3 4 _1
7 4 3
┌─┬───────┬─┬─┬─────────────────┐
│0│1 3 4 7│4│8│┌───────────────┐│
│ │       │ │ ││140277582398867││
│ │       │ │ │└───────────────┘│
└─┴───────┴─┴─┴─────────────────┘

The second element of the boxed array is the sorted array. Some points to note:

  • Notice the use of the if., elseif., end. control structure in the callback. I couldn't get it to work returning the result of the subtraction of i1 - i2. This resulted in the callback always returning zero. I think there may be some number conversion issue happening with diff not being an integer. Assigning a specific integer resolved this.
  • The callback could be written more concisely using other J features but I wanted to keep it readable given only what was covered in the introduction of this post.
  • Dealing with integer sizes, pointers as integers, and all the conversions going on makes things crash if you get the types wrong.
  • I defined cdcallback as a global variable by using =: as the assignment verb. Using =. would make a local variable. This enables creating a local callback, within the scope of another verb or namespace, just before passing it to the function that requires it.
  • If you need multiple callbacks that do different things active at the same time you need to write the one cdcallback and differentiate inside that somehow, maybe based on the number or type of arguments, to decide what action needs to be performed.
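
As a rough sketch of that last point, a single cdcallback can dispatch on the number of arguments it receives using the select. control structure. The handler names on_one and on_two are hypothetical, and dispatching on argument count is just one possible scheme:

```j
NB. Hypothetical handlers; each prints its arguments and returns 0.
on_one =: monad define
  ('one argument: ', ": y) (1!:2) 2
  0
)

on_two =: monad define
  ('two arguments: ', ": y) (1!:2) 2
  0
)

NB. The single entry point picks a handler from the argument count.
cdcallback =: monad define
  select. # y
    case. 1 do. on_one y
    case. 2 do. on_two y
    case.   do. 0
  end.
)
```

With this in place, callbacks made with cdcb 1 and cdcb 2 end up in different handlers, and any other count falls through to the default case and returns 0. It can be tried without any C code by calling it directly, for example cdcallback 100 200.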

Conclusion

Using the FFI from J isn't difficult but requires care. Using callbacks is more complicated and not very well documented but it's possible. Searching the example J code for cdcb or 15!:13 will find example usage. There are other undocumented FFI routines but I haven't looked into them yet.

For more information on J, including tutorials, books, mailing lists, etc there is:


This site is accessible over tor as hidden service mh7mkfvezts5j6yu.onion, or Freenet using key:
USK@1ORdIvjL2H1bZblJcP8hu2LjjKtVB-rVzp8mLty~5N4,8hL85otZBbq0geDsSKkBK4sKESL2SrNVecFZz9NxGVQ,AQACAAE/bluishcoder/-44/

