# Configuring the pipeline

This document is an extension of the "Create a pipeline". It explains the many
build in configurations that the library offers.

## Permanent file storage

By default all the dataset is kept in memory. That means if the program is
stopped, all data is lost.
To make the pipeline save a copy to disk you need to overwrite an attribute:

```python
class MyPipeline(AbstractPipeline):
  ...
  data_directory: Union[str, Path] = Path('path/to/data/directory')
  ...
```

The pipeline will now save a copy of each dataset, and delete them when
it's done with the dataset. The file structure produced looks like this:

```text
  data_directory / {\$patient_identifier_tag} / {\$input_arg_name_1} / Image_{\$image.SeriesDescription}_{\$image.instance_number}.dcm
                                                                           ... / Image_{\$image.SeriesDescription}_{\$image.instance_number}.dcm
                                               / {\$input_arg_name_2} / Image_{\$image.SeriesDescription}_{\$instance_number}.dcm
                                                                           ... / Image_{\$image.SeriesDescription}_{\$image.instance_number}.dcm
                                           ... / ...
                 / {\$patient_identifier_tag} / {\$input_arg_name_1} / Image_{\$image.SeriesDescription}_{\$image.instance_number}.dcm
                                                                           ... / Image_{\$image.SeriesDescription}_{\$image.instance_number}.dcm
                                               / {\$input_arg_name_2} / Image_{\$image.SeriesDescription}_{\$image.instance_number}.dcm
                                                                           ... / Image_{\$image.SeriesDescription}_{\$image.instance_number}.dcm
                                           ... / ...
             ... / ...
```

The `patient_identifier_tag` is another `AbstractPipeline` attribute, which the
pipeline uses to separate images belonging to differing "batches".
The tag prevents that, if you send a PET from a patient and a CT from another
then both series are accepted, but processing doesn't start on these series.
The value of this tag is shared(equal) among all images used to generate the
`InputContainer` of the processing function.

The tag defaults to the tag PatientID. The `input_arg_name` is a key in the
`input` directory, and each file is an image stored in the input instance.

## Logging

A pipeline creates a logger by default using python standard library, you can modify the following properties to make the logger behave like you want:

```python
class MyPipeline(AbstractPipeline):
  number_of_backups: int = 8
  "Number of backups before the os starts deleting old logs"

  log_date_format = "%Y/%m/%d %H:%M:%S"
  "String format for timestamps in logs."

  log_output: Optional[Union[TextIO, Path, str]] = stdout
  """Destination of log output:
  * `None` - Disables The logger
  * `TextIO` - output to that stream, This is stdout / stderr
  * `Path | str` - creates a rotating log at the path
  """

  log_when = "w0"
  "At what points in time the log should roll over, defaults to monday midnight"

  log_level: int = logging.INFO
  "Level of Logger"

  log_format: str = "%(asctime)s %(name)s %(levelname)s %(message)s"
  "Format of log messages using the '%' style."

  pynetdicom_logger_level: int = logging.CRITICAL + 1
  """Sets the level pynetdicom logger, note that traceback from
  associations are logged to pynetdicom, which can be helpful for bugfixing"""

  ...
```

The logger is injected into most sub-libraries.

## Customizing outputs

Sometimes you want to create a report supplementing an image series or you want
to send data over some other form communication protocol. In that case you need
to start customizing the output


## Dynamics Inputs

Sometimes you want multiple series in the same input. For instance if you
wanted to take an average of multiple series. For this purpose you have:
`dicomnode.server.input.DynamicInput`, which differs from a normal input, that
it has a `separator_tag`. Each accepted dataset will be placed in a bucket
(leaf) based on the string conversion on the `separator_tag`.

All leafs are stored in the `leafs` attribute, which you should use when you
validate, so for example this code checks that all leafs are valid and there's
at least 3 leafs:
```python
  ...
  def validate(self):
    return all(validate_leaf(leaf) for leaf in self.leafs) and len(self.leafs) > 2
  ...
```
Each leaf is a `dicomnode.data_structures.storage.Storage` which is a very
simple class, that doesn't support much more than iteration of the dataset
stored in it.

When you grind, you always get a dictionary with key-value pairs, where the key
is the string key, while the value is the ground image from the inputs grinder.

## Historic Inputs

Sometimes you want historic images to compare against the current image.
Dicomnode has a build-in input for just that:
`dicomnode.server.input.HistoricAbstractInput`. However sadly this is not plug
and play solution as many things can go wrong. Hence this is why it has it own
section. So please read this section carefully.

In this section I'll refer the **pivot** series to be the dicom series that the
HistoricAbstractInput uses data from to generate and send it's C-FIND and C-MOVE
DIMSE messages. The retrieved data will be referred to as **historic** series.

In general it's assumed that the studies we retrieve will go into the historic
input and not in the normal abstract inputs, while the historic input will only
contain historic datasets.

For those with limited dicom terminology a `SCP` is just a program, that you can
send C-FIND's and C-MOVE's to and it responds like expected. These are often
specialized databases that work with dicom images.

### Goal

So inside of your pipeline you would create the following input:

```python
from datetime import date
from typing import Dict
from pydicom import Dataset

  ... # Rest of the Dicomnode node configuration:
  input = {
    "SERIES_TYPE_1" : SERIES_1_INPUT, # Abstract Input Subtype
    "SERIES_TYPE_2" : SERIES_2_INPUT, # Abstract Input Subtype
    "HISTORIC"      : HISTORIC_INPUT, # Historic Abstract Input Sub type
  }

  class Processor(AbstractProcessor):
    def process(self, input_container):
      historic: Dict[date, Dict[str, Dataset]] = input_container["HISTORIC"]

    # Where the date is the study date and the string is the series description.
```

### Assumptions and restrictions

This components requires a lot of moving parts to work together, which in turn
impose restrictions:

* All datasets have the 0x0008_0020 `StudyDate` Attribute and all historic
  Series have the `SeriesDescription` Attribute
* All non historic input datasets have the same `StudyDate`
* Patients do not have two historic studies on the same day
* You have configured another SCP to accept C-FIND, C-MOVE from the node. In
  normal lingo: The Dicomnode you create must be able to retrieve dicom datasets
* You can retrieve all the data you need over a single association, and a single
  C-FIND, but multiple C-MOVE's
* The Pipeline is configured such that the `PipelineTree` batch historic and
  current studies. (You didn't change `patient_identifier_tag` node attribute
  to StudyInstanceUID as example)

If these Assumption or restrictions are backbreaking - Send patches with the
whine.

### Modification to non-historic inputs

The "common" use case that the library attempts to handle is: Comparison between
old and new series of the same type. This poses a problem, where your normal
input would accept and add both the historic and current series, which is not
desirable because by default inputs assumes that there's only a single series in
them. You also cannot solve this by adding `enforce_single_series` because if
the historic series is added first, then the current series will be rejected.

To solve this issue you must add `enforce_single_study_date = True` to all non
historic inputs. When any non historic input accepts an image, the study_date
attribute is set for ALL abstract inputs.


### A birds eye view of `HistoricAbstractInput`

A historic input has three different states. Empty, Fetching, Filled

* **Empty** - The Input is initialized, but doesn't know which patient to query
  for.
* **Fetching** - The Input have an active association with a SCP
* **Filled** - The Input have closed it association with the SCP.

Lets get an overview of the entire process with a historic Input:

![Data transfer diagram](../_static/historic_input_timeline.svg)

An external source sends data to our service, and a `PatientContainer` is
created for our patient with an **Empty** `HistoricInput` and some other inputs

The Historic gets a dataset and, sets it status to **Fetching** spawns a thread,
that generates a query Dataset and sends a C-FIND to the SCP. We get some
answers and we pick the studies that we need, and send a C-MOVE for the studies
we need. The connection closes and we set the Input as filled.

### The nitty-gritty implementation details

#### Historic Attributes

A historic input have different requirements to a standard abstract input.
First of all a historic endpoint need an `address` attribute of type:
`dicomnode.dicom.dimse.Address`. This address must accept C-FIND and C-MOVE from
the node using the nodes AE title. The node must also have been created as a
move destination in the SCP.

Unlike normal inputs you should **NOT** overwrite the validate function. A
historic input validates to true when It has finished a single connection to the
`SCP`.

#### C-FIND, C-MOVE - Query and Retrieve

To retrieve a history, you need to use the two DICOM Message Service Element
**(DIMSE)** commands:

* C_FIND - Used to index for the historic studies.
* C_MOVE - Used to transfer the studies from your archive to the dicomnode for
  processing.

When the pivot dicom datasets are send from the source, the `add_image` is
called for all `AbstractInput`s are called as normal, but The historic inputs
`add_image` is a tad different. Instead of trying to add the image to itself,
it calls:
`check_query_dataset(self, current_study: Dataset, query_dataset: Optional[Dataset] = None) -> Optional[HistoricAction, Dataset]`
Where the `current_study` argument is the dataset send by the source.

This function purpose is to determine if the "current_study" should send a dimse
message or not. It should return one of three things:

* None - The current study should not send any DIMSE message
* Tuple[HistoricAction.FIND_QUERY, Dataset] - Dicomnode should send a C-FIND
  with the dataset as the querying dataset
* Tuple[HistoricAction.MOVE_QUERY, Dataset] - Dicomnode should send a C-MOVE
  with the dataset as the querying dataset

`check_query_dataset` is called once from pivot series with `query_dataset=None`
and for each C-FIND response with the `query_dataset` equal to the dataset used
for the query. You can (and should) chain C-FINDs together such that you only
request the series you need.
This is because most DICOM studies contain a lot of auxiliary data, that in best
case just takes extra time.