Jackdmp for multi-processor machines


jackdmp is a C++ version of the Jack low-latency audio server for uniprocessor and multiprocessor machines. It is a new implementation of the jack server core features that aims to remove some limitations of the current design. The activation system has been changed to a data flow model, and lock-free programming techniques are used for graph access to obtain a more dynamic and robust system.

A Belarusian translation is available here: http://webhostinggeeks.com/science/jackdmp-be


JACK2

The future JACK2 will be based on the jackdmp codebase. Jack 1.9.8 is the "renaming" of jackdmp and the result of a lot of development started after LAC 2008. It is now distributed as a source package including documentation:

General publications on the design are available here. A technical paper on new "profiling tools" can be found here.


Old packages: the following packages contain the source code for Linux, OSX and Windows, and binary versions for OSX and Windows. More technical documentation is also included:


Getting the sources

Jackdmp code is now available on Git at: https://github.com/jackaudio/jack2

and for those with write access: git clone git://github.com/jackaudio/jack2.git


JACK2 "pipelining" experimental version

The current jackdmp implementation allows explicitly parallel clients in a graph to be processed on several available CPUs at the same time. A typical case is:

A and B depend on the same input (the "in" driver in this case) and can be activated on 2 processors; C waits for the outputs of A and B.
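The data flow activation described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the jackdmp implementation: the `Graph` type and the functions `activation_waves` and `example_graph` are hypothetical names, and the counters are plain integers updated sequentially, whereas a real server would decrement them from whichever thread finishes a client. Each client keeps a count of inputs that have not finished yet and becomes runnable when that count reaches zero.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical graph description: node i feeds every node in successors[i].
struct Graph {
    std::vector<std::string> names;
    std::vector<std::vector<int>> successors;
};

// Returns the "waves" of one cycle. All nodes in the same wave are runnable
// at the same time, so they could be processed on different CPUs.
std::vector<std::vector<std::string>> activation_waves(const Graph& g) {
    const std::size_t n = g.names.size();
    assert(g.successors.size() == n);

    // pending[i] = number of inputs of node i that have not finished yet.
    std::vector<int> pending(n, 0);
    for (const auto& outs : g.successors) {
        for (int s : outs) {
            assert(s >= 0 && static_cast<std::size_t>(s) < n);
            pending[s]++;
        }
    }

    // Nodes without inputs (the "in" driver here) start the cycle.
    std::vector<int> ready;
    for (std::size_t i = 0; i < n; ++i) {
        if (pending[i] == 0) ready.push_back(static_cast<int>(i));
    }

    std::vector<std::vector<std::string>> waves;
    while (!ready.empty()) {
        std::vector<std::string> wave;
        std::vector<int> next;
        for (int i : ready) {
            wave.push_back(g.names[i]);
            // A finished node signals each successor; the last signal activates it.
            for (int s : g.successors[i]) {
                if (--pending[s] == 0) next.push_back(s);
            }
        }
        waves.push_back(wave);
        ready = next;
    }
    return waves;
}

// The case from the text: in -> A, in -> B, A -> C, B -> C.
Graph example_graph() {
    return Graph{{"in", "A", "B", "C"},
                 {{1, 2}, {3}, {3}, {}}};
}
```

Calling `activation_waves(example_graph())` groups A and B in the same wave, which is the situation where two processors can be used, while C only appears once both have signalled it.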

We are experimenting with the "pipelining" idea to allow sequential graphs (like in ==> A ==> B ==> out) to be computed more efficiently on multi-core machines. The idea is to cut the driver buffer of size D into N sub-parts and run the entire graph with N buffers of D/N frames. For example, with a driver buffer size of 1024 frames and N = 4, the graph is processed with buffers of 1024/4 = 256 frames. With the previous graph, the activation sequence for an entire driver cycle is:

where X(n) denotes client X processing the 256-frame buffer of index n. Processing can now be done on several cores at the same time.
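To make the arithmetic and the overlap concrete, here is a minimal sketch of an idealized pipeline schedule for the clients A and B (the "in" and "out" drivers are left out). It is a textbook pipeline model and not necessarily the exact sequence jackdmp produces: the function names `sub_buffer_frames` and `pipeline_schedule`, the 1-based buffer indices, and the requirement that the divisor divides the driver buffer size evenly are all assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Size of each sub-buffer when a driver buffer of `driver_frames` frames
// is cut into `divisor` parts (assumed here to divide evenly).
unsigned sub_buffer_frames(unsigned driver_frames, unsigned divisor) {
    assert(divisor > 0 && driver_frames % divisor == 0);
    return driver_frames / divisor;
}

// Idealized schedule for a sequential chain of clients.
// Each step lists the activations "X(n)" that may run concurrently:
// at step t, the client at position s works on sub-buffer t - s.
std::vector<std::vector<std::string>> pipeline_schedule(
        const std::vector<std::string>& chain, unsigned divisor) {
    assert(divisor > 0);
    std::vector<std::vector<std::string>> steps;
    if (chain.empty()) return steps;

    const std::size_t total = chain.size() + divisor - 1;
    for (std::size_t t = 0; t < total; ++t) {
        std::vector<std::string> step;
        for (std::size_t s = 0; s < chain.size(); ++s) {
            if (t < s) continue;          // client s has no input yet
            const std::size_t n = t - s;  // 0-based sub-buffer index
            if (n >= divisor) continue;   // client s is done for this cycle
            step.push_back(chain[s] + "(" + std::to_string(n + 1) + ")");
        }
        steps.push_back(step);
    }
    return steps;
}
```

With the chain A, B and N = 4, this model yields five steps: A(1) alone, then A(2) with B(1), A(3) with B(2), A(4) with B(3), and finally B(4) alone. The middle steps are where two cores can work at the same time on a purely sequential graph.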

The activation model has been generalized a bit to handle this case. There is a new -D parameter on the jackdmp command line to specify the value of N. Two new functions, "jack_set_buffer_divisor" and "jack_get_buffer_divisor", have been added to the API to allow the divisor (= N) parameter to be changed dynamically in a running graph. A corresponding "jack_bufdivisor" client is now compiled.

Testing: the pipelining branch is available here: svn co http://subversion.jackaudio.org/jack/jack2/branches/pipelining

WARNING, WARNING !!

When used with output ports, the "jack_port_get_buffer" function used to allow the client to cache the returned buffer address. This is no longer the case in the pipelining branch: "jack_port_get_buffer" must now be called on every invocation of the process callback to retrieve the correct buffer address. The consequence of this change is that some very well-known jack applications (like Ardour or Hydrogen) are broken! (April 2009: Ardour 2.8 is now fixed.)
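The difference between the two usage patterns can be shown with a self-contained mock. Nothing below is the JACK API: `FakeOutputPort`, `CachingClient`, `FreshClient` and `filled_sub_buffers` are hypothetical names, and `get_buffer()` merely plays the role of "jack_port_get_buffer" under the assumption that each sub-cycle uses a different buffer address.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an output port whose buffer address changes
// between process calls (one distinct sub-buffer per sub-cycle).
class FakeOutputPort {
public:
    FakeOutputPort(std::size_t divisor, std::size_t frames)
        : buffers_(divisor, std::vector<float>(frames, 0.0f)) {}

    void begin_sub_cycle(std::size_t index) {
        assert(index < buffers_.size());
        current_ = index;
    }

    // Plays the role of jack_port_get_buffer() in this sketch.
    float* get_buffer() { return buffers_[current_].data(); }

    const std::vector<float>& sub_buffer(std::size_t index) const {
        return buffers_[index];
    }

private:
    std::vector<std::vector<float>> buffers_;
    std::size_t current_ = 0;
};

// Broken pattern: the address is fetched once and reused afterwards.
struct CachingClient {
    float* cached = nullptr;
    void process(FakeOutputPort& port, std::size_t frames) {
        if (!cached) cached = port.get_buffer();  // stale after the first call
        for (std::size_t i = 0; i < frames; ++i) cached[i] = 1.0f;
    }
};

// Correct pattern: ask for the buffer on every process call.
struct FreshClient {
    void process(FakeOutputPort& port, std::size_t frames) {
        float* out = port.get_buffer();
        for (std::size_t i = 0; i < frames; ++i) out[i] = 1.0f;
    }
};

// Runs `divisor` sub-cycles and counts how many sub-buffers were written.
template <typename Client>
std::size_t filled_sub_buffers(Client client, std::size_t divisor,
                               std::size_t frames) {
    assert(divisor > 0 && frames > 0);
    FakeOutputPort port(divisor, frames);
    for (std::size_t k = 0; k < divisor; ++k) {
        port.begin_sub_cycle(k);
        client.process(port, frames);
    }
    std::size_t filled = 0;
    for (std::size_t k = 0; k < divisor; ++k) {
        if (port.sub_buffer(k)[0] == 1.0f) ++filled;
    }
    return filled;
}
```

In this model, `filled_sub_buffers(CachingClient{}, 4, 256)` writes only the first sub-buffer, while `filled_sub_buffers(FreshClient{}, 4, 256)` writes all four. A client that caches the address would therefore leave most of its output silent or stale.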

The code has been tested on OSX with some heavy CoreAudio jackified applications, and the jack DSP load drops as expected as soon as the divisor is greater than 1. Typical tests were done with a driver buffer size of 512 or 1024 and N = 4 or 8.


Jackdmp "direct" experimental version

When several jack clients are opened in the same process (either the server or a separate process), the cost of context switches between clients can be reduced by replacing the more costly fifo-based client activation system with a more efficient direct call to the client's audio Process callback. This is especially interesting when clients are connected in sequence. With the following graph of clients in the same process: in => [C1 => C2 => C3 => C4] => out, C1 is first resumed by the jack server through the standard fifo-based system, but the Process callbacks of C2, C3 and C4 are called directly in C1's real-time thread. For an arbitrary graph with possible parallel sub-graphs, the system activates every sequential path in the graph with this direct call method and resumes the parallel parts using the fifo-based activation system.
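One way to picture the choice between the two activation methods is the sketch below. The rule it applies is an assumption made for illustration, not taken from the jackdmp sources: a client is called directly when its only in-process predecessor has it as its only successor, and every other client is resumed through the fifo. The names `Activation`, `choose_activation` and `count_fifo_wakeups` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Activation { Fifo, Direct };

// successors[i] lists the clients fed by client i. All clients are assumed
// to live in the same process; the "in" and "out" drivers are not modelled.
std::vector<Activation> choose_activation(
        const std::vector<std::vector<int>>& successors) {
    const std::size_t n = successors.size();

    // Count the in-process predecessors of each client.
    std::vector<int> pred_count(n, 0);
    for (const auto& outs : successors) {
        for (int s : outs) {
            assert(s >= 0 && static_cast<std::size_t>(s) < n);
            pred_count[s]++;
        }
    }

    // Default: every client is resumed through the fifo.
    std::vector<Activation> plan(n, Activation::Fifo);
    for (std::size_t i = 0; i < n; ++i) {
        // Assumed rule: i -> s is a sequential link when i has exactly
        // one successor and s has exactly one predecessor.
        if (successors[i].size() == 1) {
            const int s = successors[i][0];
            if (pred_count[s] == 1) plan[s] = Activation::Direct;
        }
    }
    return plan;
}

// Number of clients that still need a fifo wake-up (and a context switch).
std::size_t count_fifo_wakeups(const std::vector<Activation>& plan) {
    std::size_t count = 0;
    for (Activation a : plan) {
        if (a == Activation::Fifo) ++count;
    }
    return count;
}
```

For the chain C1 => C2 => C3 => C4, written as `{{1}, {2}, {3}, {}}`, only the first client keeps the fifo activation and the other three are called directly. For a diamond such as `{{1, 2}, {3}, {3}, {}}`, no link is sequential under this rule, so all four clients stay on the fifo.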

Results: on a 4-core Intel Mac Pro running jack at 64 frames with 10 in-process "metro-like" clients connected in sequence, the jack CPU load is 12% with the normal activation scheme and only 4% with the direct call activation scheme.

Testing: the direct branch is available here: svn co http://subversion.jackaudio.org/jack/jack2/branches/direct. A new -C parameter activates "direct call" mode. By default, "direct call" mode is off.