The Quick fix

Sadly from version 0.0.21 breaks userspace. Here os the quick fix:

# Version <0.0.21

class Pipeline(AbstractPipeline):
  ...

  attribute = ...

  def process(self, input_container):
    self.attribute
    ...

# Version 0.0.21 and beyond

from dicomnode.server.processor import AbstractProcessor

ATTRIBUTE = ...

class Pipeline(AbstractPipeline):
  class Processor(AbstractProcessor): # Very import to name it Processor!
    def process(self, input_container):
      ATTRIBUTE

Note that there’s some new edge-cases that might hit you, although I consider this unlikely

The in-depth explanation

The remain of this document will just explain what changed and why. It contains no additional “user” documentation, but if you want to know what happens under the hood, that is what the remaining of this document will devote itself to.

The root of the problem

A process has a “current working directory” which you can get with a: os.getcwd() call. This is the reference point for all non absolute files look up a program makes. In other words, if you do a nibabel.load("file.nii.gz") the operating system will look in the current working directory and look for a file names file.nii.gz in the current working directory and return from there.

Now Dicomnode assumes that you do not clean after yourself, meaning if you produce intermediate files, then dicomnode cleans them up, when you are done processing the patient.

The way it does this is by creating a directory, then changing its current working directory to the newly created directory. When the processing is complete dicomnode deletes the directory and all the intermediate files that have been produced.

As stated above a process only has a single current working directory that means if another process sends data to the dicomnode, the archiving will be incorrect because the newly send files would be stored inside of a temporary directory, which is bad.

So the way that dicomnode maintains this illusion is by spawning a new process for processing the data data. That way the program maintains this illusion.

A small lesson on os-primitives

So how does operating system create a new process? There’s two primitives for doing this:

spawn : Create a new runtime and run the program in it.
fork : The newly created process gets a read only view of the old process. If it changes anything, then the operating system will create a new copy of the data.

Now fork have been the “default” for spawning new processes because it’s much faster. See most programs have large amount of “read-only” data. Think about libraries like numpy, scipy, matplotlib, but also operating system libraries like the C standard library, rendering engines and so forth.

To put in measurable terms. A spawn call takes an additional ~1.5 seconds per process created compared to fork, but this number may be even worse if you need to initialize tensorflow or pytorch for instance.

The fork problem

Dicomnode have been using fork up to version 0.0.20, but this is introduces some problems.

First of all - It’s depreciated, meaning that python will in the future no longer support it.

But more importantly with fork dicomnode might deadlock. When forking, everything is copied including other threads! Now this is a pretty big problem, because these threads might be locked. They might be waiting on acquiring a lock to store a dataset in the dicomnode or post a log record. If the thread is copied in a locked state the operating system will not wake the thread. This means, that the child process will never finish, and because dicomnode waits on the process to finish, it will wait forever and therefore deadlock.

The fundamental differences between fork and spawn

Sadly fork and spawn are different in functionality. With the fork call you have access to the memory and the previous instruction so you can do this:

def target(arg):
  arg.call()

class Arg
  def __init__(self):
    def local_function():
      print("hello world")

    self.call = local_function



arg = Arg()

multiprocessing.get_context('fork').process(target=target, args=(arg,)) # OK
multiprocessing.get_context('spawn').process(target=target, args=(arg,)) # NOT OK

You cannot send an object that have locks to new processes. Dicomnode have a neat little function dicomnode.lib.utils.is_picklable you can use to check if you can send an object to another process.