Phantom MRI study dataset

This dataset tracks all phantom MRI acquisitions performed for TRR379 to validate the Q01 protocol at all sites.

Key aspects of this setup

Session labels are pseudonymized identifiers

This first layer of personal data protection reduces the chances of participant identifiers appearing as part of file/path names.

Each session or acquisition is placed into a directory/dataset in sessions/ whose directory name is a project-internal pseudonymous identifier.

Multi-project ID mapping

The top-level id_map.tsv is a tab-separated table that maps session source identifiers to identifiers in any number of contexts. The source identifier is the value in the first column of each table row and corresponds to the directory name of a DICOM dataset in the sessions/ directory. Every subsequent column defines the ID mapping to a different context. The context label is defined in the header row.
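
To illustrate this layout, here is a minimal sketch that parses a table of this shape with Python's csv module. The column labels `source` and `q02`, all identifier values, and the helper name `load_id_map` are made up for the example, and the handling of empty cells is an assumption. Only the overall structure (source identifier in the first column, one context per further column, context labels in the header row) comes from the description above.

```python
import csv
import io

# Hypothetical table content; the real id_map.tsv may use other labels and IDs.
EXAMPLE_TSV = (
    "source\tq01\tq02\n"
    "sess-0001\tAP001\tXY17\n"
    "sess-0002\tAP002\t\n"
)


def load_id_map(stream):
    """Return {source_id: {context_label: context_id}} for a TSV stream."""
    reader = csv.reader(stream, delimiter="\t")
    header = next(reader)
    contexts = header[1:]  # every column after the first names a context
    id_map = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        source_id, values = row[0], row[1:]
        # Assumption: an empty cell means "no ID in this context", so skip it.
        id_map[source_id] = {c: v for c, v in zip(contexts, values) if v}
    return id_map


id_map = load_id_map(io.StringIO(EXAMPLE_TSV))
print(id_map["sess-0001"]["q01"])
```

To read the actual file instead of the inline example, pass an open file handle (for instance `open("id_map.tsv", newline="")`) to the same function.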

A script to perform "re-identification" from a particular context is provided at code/reidentify.py. It can be used like this:

python3 code/reidentify.py id_map.tsv q01 AP001

The script returns the source identifier linked to the q01 identifier AP001.
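
The kind of lookup involved can be sketched as follows. This is not the content of code/reidentify.py, only an illustration under the table layout described above. The function name `reidentify`, the column label `source`, the session identifiers, and the error handling are assumptions.

```python
import csv
import io


def reidentify(stream, context, context_id):
    """Return the source identifier mapped to context_id in the given context."""
    reader = csv.reader(stream, delimiter="\t")
    header = next(reader)
    # Search context columns only (index 1 onward); raises ValueError if absent.
    col = header.index(context, 1)
    for row in reader:
        if len(row) > col and row[col] == context_id:
            return row[0]  # first column holds the source identifier
    raise LookupError(f"no {context} identifier {context_id!r} in table")


# Hypothetical table content standing in for id_map.tsv.
example = io.StringIO("source\tq01\nsess-0001\tAP001\nsess-0002\tAP002\n")
print(reidentify(example, "q01", "AP001"))
```

The sketch scans the table row by row rather than building an index, which seems adequate for a table with one row per session.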

The file id_map.tsv is an annexed file. Once the last copy of this file is destroyed, identifier-based re-identification is no longer possible (a precondition for data anonymization).