Standard, containerized Q01-protocol MRI BIDS-conversion pipeline (apptainer and Docker)

Michael Hanke cd78902bb0 feat: podman run setup This matches what the docker adaptor is doing for execution. It gets the container image ID by loading the annex key into podman.		2026-03-18 07:50:26 +01:00

Heudiconv container utility dataset

This is a DataLad dataset providing a Heudiconv container. It can be used to perform reproducible conversion of TRR379 DICOM data to a BIDS-compliant format. This homogenization is essential for implementing consortium-wide data aggregation and analysis.

The currently provided heudiconv version is: 1.3.3

At present, only container configurations for apptainer/singularity and Docker are provided. However, additional configurations for other container engines, such as podman, can be added upon request.

Example usage

The following example shows a complete conversion. It requires the DataLad software to be installed. For example, it can be installed with uv:

uv tool install datalad --with-executables-from datalad-next --with datalad-container

Assumptions regarding source data organization

It is assumed that the DICOM "sourcedata" is organized such that the DICOMs of each session are tracked in an individual DataLad dataset (one per session), and that these session datasets are subdatasets of a single DataLad (super)dataset tracking all of them. An example of such an organization is the TRR379 phantom MRI dataset.

Importantly, the names/paths of the session subdatasets use pseudonymized subject/session identifiers only to avoid leaking sensitive information in BIDS conversion provenance records.

Prepare the BIDS dataset

The key idea is to use this container dataset and the DICOM source dataset as "dependencies" of another (new) dataset that will receive the converted image data:

# create a new dataset to receive data in BIDS format
datalad create bids
cd bids

# add a dataset with DICOMs as `sourcedata/`
datalad clone -d . https://hub.trr379.de/q01/phantom-mri-dicoms.git sourcedata

# add this container dataset as a dependency
datalad clone -d . https://hub.trr379.de/q02/heudiconv-container.git code/heudiconv

# create a dedicated subdataset for managing the heudiconv per-acquisition
# overrides (contain personal data). Alternative approaches are
# - cover in .gitignore
# - keep in annexed files and not share their content
datalad create -d . .heudiconv

Run conversion

To select the source data and target destination for the DICOM conversion, we define a shell variable. It is used in the example calls below. To convert data from a different session, only this variable needs to be changed. The actual command (using this setting) is identical every time.

export SRC=AP001-001

This setting is followed by the container execution command for one of the following supported containers. These are identical, except for the container image selection.
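
The calls below derive the subject and session labels from SRC using POSIX parameter expansion. As a minimal illustration (the variable names subject and session are only used here and do not appear in the actual calls), the two expansions split the identifier at the hyphen:

```shell
# Split a session identifier of the form <subject>-<session>
SRC=AP001-001
subject="${SRC%-*}"   # remove the shortest trailing match of "-*"
session="${SRC#*-}"   # remove the shortest leading match of "*-"
echo "sub-${subject}/ses-${session}"   # prints sub-AP001/ses-001
```

This scheme assumes identifiers with exactly one hyphen. With more than one, the two expansions overlap: A-B-C would yield subject A-B and session B-C.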

Apptainer/Singularity

datalad containers-run \
  -m "Convert session ${SRC} data" \
  -n code/heudiconv/apptainer \
  -o "sub-${SRC%-*}/ses-${SRC#*-}" \
  -o ".heudiconv/sub-${SRC%-*}/ses-${SRC#*-}" \
  -i sourcedata/code/heuristic-q01.py \
  -i "sourcedata/sessions/${SRC}" \
  -- \
  --bids notop --overwrite --minmeta \
  -o . \
  -f '{inputs[0]}' \
  -s "${SRC%-*}" -ss "${SRC#*-}" \
  --files sourcedata/sessions/${SRC}

Functionality was tested with apptainer v1.4.0.

For use with singularity, replace the -n code/heudiconv/apptainer option with -n code/heudiconv/singularity.

Docker/Podman

datalad containers-run \
  -m "Convert session ${SRC} data" \
  -n code/heudiconv/docker \
  -o "sub-${SRC%-*}/ses-${SRC#*-}" \
  -o ".heudiconv/sub-${SRC%-*}/ses-${SRC#*-}" \
  -i sourcedata/code/heuristic-q01.py \
  -i "sourcedata/sessions/${SRC}" \
  -- \
  --bids notop --overwrite --minmeta \
  -o . \
  -f '{inputs[0]}' \
  -s "${SRC%-*}" -ss "${SRC#*-}" \
  --files sourcedata/sessions/${SRC}

Functionality was tested with docker v26.

For use with podman, replace the -n code/heudiconv/docker option with -n code/heudiconv/podman.
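
Because only SRC changes between sessions, several sessions can be converted in a loop. The following is a minimal sketch and not part of this dataset: the function name convert_session, the CONTAINER and DRY_RUN variables, and the list of session identifiers are hypothetical, while the arguments are copied from the Docker call above. With DRY_RUN=1 (the default in this sketch) each command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical wrapper around the containers-run call shown above.
# CONTAINER and DRY_RUN are illustration-only variables.
CONTAINER="${CONTAINER:-code/heudiconv/docker}"
DRY_RUN="${DRY_RUN:-1}"

convert_session() {
  SRC="$1"
  sub="${SRC%-*}"   # subject label, e.g. AP001
  ses="${SRC#*-}"   # session label, e.g. 001
  # Build the argument list once, so that the printed and the executed
  # command cannot diverge
  set -- datalad containers-run \
    -m "Convert session ${SRC} data" \
    -n "$CONTAINER" \
    -o "sub-${sub}/ses-${ses}" \
    -o ".heudiconv/sub-${sub}/ses-${ses}" \
    -i sourcedata/code/heuristic-q01.py \
    -i "sourcedata/sessions/${SRC}" \
    -- \
    --bids notop --overwrite --minmeta \
    -o . \
    -f '{inputs[0]}' \
    -s "$sub" -ss "$ses" \
    --files "sourcedata/sessions/${SRC}"
  if [ "$DRY_RUN" = 1 ]; then
    # Display only; shell quoting is not reproduced in the printed line
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Example identifiers; replace with the sessions to convert
for id in AP001-001 AP001-002; do
  convert_session "$id"
done
```

Running the script with DRY_RUN=0 from the root of the BIDS dataset would execute one containers-run call per session, each recorded as its own commit.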

Updating a BIDS dataset with new sessions

New sessions become available via an update of the DICOM data superdataset that is registered under sourcedata/. This registration needs to be updated to point to the new version.

The following example assumes a clean workspace, and pulls the current BIDS dataset from its hosting location.

# this is an example URL!
datalad -c annex.private=true clone https://hub.trr379.de/q02/phantom-mri-bids
cd phantom-mri-bids

In order to update the DICOM sessions dataset, we need to obtain its repository locally. We do so without pulling down all data.

# this obtains the currently registered state
datalad get -n sourcedata

Now we instruct DataLad to fetch an updated state of sourcedata/. If there is one, the subdataset is updated and the new state is recorded in the top-level BIDS dataset.

datalad update -r --how-subds ff-only sourcedata

In order to run the BIDS conversion (see above) for newly available sessions, we need to get the subdataset with the HeuDiConv container, to make it known to datalad containers-run.

datalad get -n code/heudiconv
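
To see which sessions still lack a converted counterpart, the entries under sourcedata/sessions/ can be compared with the existing sub-*/ses-* directories. The following is a rough sketch under assumptions not stated in this README: every entry in sourcedata/sessions/ is a directory named <subject>-<session>, and a session counts as converted as soon as its output directory exists. The function name list_unconverted is hypothetical. Only directory names are inspected, so no DICOM content needs to be present.

```shell
#!/bin/sh
# Print the identifiers of sessions that have no
# sub-<subject>/ses-<session> directory yet.
# Run from the root of the BIDS dataset.
list_unconverted() {
  for dir in sourcedata/sessions/*/; do
    [ -d "$dir" ] || continue        # glob did not match anything
    id="$(basename "$dir")"          # e.g. AP001-002
    [ -d "sub-${id%-*}/ses-${id#*-}" ] || echo "$id"
  done
}

list_unconverted
```

Each printed identifier can then be used as the value of SRC for the conversion call described above.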

Now the dataset is ready for adding new BIDS-conversion outputs. Once done, the converted data files and dataset updates need to be pushed back to the hosting services:

datalad push

Afterwards the workspace can be cleaned:

cd ..
# example dataset name!
datalad drop --what all -r -d phantom-mri-bids

Update DICOM sessions dataset

Whenever a new scan is made, the dataset tracking all MRI sessions needs to be updated. Here is a sketch of how this is done, again assuming a clean workspace.

# clone the dataset from its hosting location (example URL)
datalad -c annex.private=true clone https://hub.trr379.de/q01/phantom-mri-dicoms.git

Two basic types of changes need to be distinguished: the addition of a session, and the (rarer) modification of a session dataset. Adding a new session dataset is done via:

# example URL!
datalad -c annex.private=true -C phantom-mri-dicoms clone -d . \
  https://hub.trr379.de/q01/phantom-mri-dicom-aachen-2.git \
  sessions/AP001-002

If the state of a session dataset needs to be updated, we need to retrieve the dataset and then fetch its updates:

datalad -C phantom-mri-dicoms get -n sessions/AP001-002
datalad -C phantom-mri-dicoms update -r --how-subds ff-only

In both cases, the updates need to be pushed to the hosting service before the local workspace can be cleaned.

datalad -C phantom-mri-dicoms push
datalad drop --what all -r -d phantom-mri-dicoms