Dicom For Dummies
Digital Imaging and Communications in Medicine is a file format and a message protocol owned and partly developed by the National Electrical Manufacturers Association (NEMA). It’s primary file format for medical image equipment. An Analogy is dicom is to medical equipment, what a pdf is to a printer.
The Dicom standard what an should be capable of doing and can be found at:
https://www.dicomstandard.org/current
If you click a little bit around in it, you’ll find that it’s big, verbose and not very readable. This is because it not very restrictive, yet at the same time tries to standardize optional content. It also means there’s A LOT of details and finical details that’ll be missing for the guide. This guide is mostly focused around usage of Dicom files, and thus many technical details will be left out.
A warning: Images produced by medical equipment may be in the dicom format, but might not comply with the dicom standard. It’s your application responsibility to check, that the images you receive and produce comply with the dicom standard. There’s a number of tools in library to help with this job.
This is a python library, that builds on top of the python library pydicom https://pydicom.github.io/pydicom/stable/ and getting familiar with that library will definitely help you.
The Dicom file
A dicom file is a dictionary with integers keys and just about anything as values. To get started using pydicom you can create an empty dataset the following way:
from pydicom.dataset import Dataset
dataset = Dataset()
Value representation
A value representation (VR), informs on how to interpret the ones and zeroes forming the value. For instance the patient name tag have a VR of PN. Which means that the program should assume the value is formatted as string. However the dicom standard also imposes additional restrictions upon that string: Namely it should be formatted as: family name complex, given name complex, middle name, name prefix, name suffix. Where each component is separated by a ‘^’ character.
It also specifies that a each component is a maximum of 64 characters.
From experience this is the most often broken part of the dicom standard, where programs simply just store the name as a string, instead of formatting it.
All the Value representation can be found at https://dicom.nema.org/dicom/2013/output/chtml/part05/sect_6.2.html
Sequences
There’s a few very special VR, and one of them is the Sequence VR SQ. A Sequence is a list of zero or more dicom objects stores inside of the tag. This is often used to store associated values, which doesn’t fit within a single tag. It’s clearly specified in the standard that each dicom object of a sequence may have different tags, however for your own sanity make sure that each object of a sequence have the same tags.
UID
Most dicom objects have a few unique identifiers (UID) which determines something uniquely about the image, There’s a few UIDs that is important to know about:
SOPInstanceUID - This is the ID of the image, it should uniquely determine the image.
SeriesInstanceUID - Medical images are often related and belongs to series. For instance in a tomogram and then each image would belong to the same series.
StudyInstanceUID - This relates all pictures which relates to same study. if some post processing processing have been applied to some images, they would have the same StudyInstanceUID while having different SeriesInstanceUIDs.
SOPClassUID - This describes what type of image the file is. But more on this later.
You can determine the origin of a picture by looking at the prefix of the instance UID.
For instance the prefix 1.2.826.0.1.3680043.10.1083 indicates that the image have been generated by dicomnode library.
To generate an UID use the gen_uid method: In the following example it generate UIDs for a series:
from dicomnode.lib.dicom import gen_uid
SeriesUID = gen_uid()
for dataset in datasets:
dataset.SeriesInstanceUID = SeriesUID
dataset.SOPInstanceUID = gen_uid()
Value multiplicity
A tag also have a value multiplicity (VM). This indicates how many instances of the values should exists. For instance the tag 0x00280034 describes how what the aspect ratio of the underlying picture. So if the stored picture is HD, then it has a 16:9 aspect ratio and the values for the tag 0x00280034 should be set to [16,9]. A value multiplicity might be a range. For instance if a tag has value multiplicity of 1-n that means there can be any number of values associated with the tag.
Dicom Image
All dicom images should have a SOPClassUID, this value informs what type of image the file is. The list of valid classes can be found at https://dicom.nema.org/dicom/2013/output/chtml/part04/sect_B.5.html. For each type of image a number of modules is either Mandatory (M), Conditional (C) or optional (U). Each module consists of a set number of tags, which is again either required, required but no value is required or optional. Again a tag may be conditionally required or conditionally required with out value.
It is highly recommended to use the public tags if applicable and being verbose in filling public tags. In other words, if you can fill a tag within a module with some valid data, it’s recommended you do so. This is because other dicom application might have some functionality dependant on optional tags.