# Phantom MRI study dataset
This dataset tracks all phantom MRI acquisitions performed for TRR379
to validate the Q01 protocol at all sites.
## Key aspects of this setup
### Session labels are pseudonymized identifiers
This first layer of personal data protection reduces the chances
of participant identifiers appearing as part of file/path names.
Each session or acquisition is placed into a directory/dataset
in `sessions/` that is given a project-internal pseudonymous identifier
as its directory name.
### Multi-project ID mapping
The top-level `id_map.tsv` is a tab-separated table, which maps
session source identifiers to any number of contexts. The source
identifier corresponds to the directory name for a DICOM dataset in the
`sessions/` directory. This is the value in the first column of each
table row. Every subsequent column defines the ID mapping to a different
context. The context label is defined in the header row.
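
To illustrate the table layout described above, here is a minimal sketch that reads such a table into a nested mapping. The table content, the `source` header label, the `other` context, and the function name `load_id_map` are hypothetical and only serve the illustration; the real `id_map.tsv` may use different labels and identifiers.

```python
import csv
import io

# Hypothetical table content: the header label of the first column,
# the "other" context, and all identifiers are made up for illustration.
EXAMPLE_TSV = (
    "source\tq01\tother\n"
    "s7f3a\tAP001\tX-17\n"
    "k2m9c\tAP002\tX-18\n"
)


def load_id_map(text):
    """Return {source_id: {context_label: context_id}} from TSV text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    contexts = header[1:]  # every column after the first is a context
    id_map = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        source_id, mapped = row[0], row[1:]
        id_map[source_id] = dict(zip(contexts, mapped))
    return id_map


id_map = load_id_map(EXAMPLE_TSV)
print(id_map["s7f3a"]["q01"])
```

With this layout, adding a new context only requires adding a column with its own header label.
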
A script to perform "re-identification" from a particular context
is provided at `code/reidentify.py`. It can be used like this:
```bash
python3 code/reidentify.py id_map.tsv q01 AP001
```
The script returns the source identifier linked to the `q01` identifier
`AP001`.
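
The lookup itself can be sketched as follows. This is not the content of `code/reidentify.py`, only an illustration of the reverse mapping under the table layout described above; the function name `reidentify`, the example table, and its identifiers are assumptions.

```python
import csv
import io


def reidentify(tsv_text, context, identifier):
    """Return the source identifier mapped to `identifier` in `context`.

    Returns None when no row matches. Illustrative sketch only; the
    actual script may behave differently.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    contexts = header[1:]  # the first column holds the source identifier
    if context not in contexts:
        raise KeyError(f"unknown context: {context}")
    col = contexts.index(context) + 1
    for row in reader:
        if len(row) > col and row[col] == identifier:
            return row[0]
    return None


# Hypothetical table content, made up for illustration.
example = "source\tq01\ns7f3a\tAP001\nk2m9c\tAP002\n"
print(reidentify(example, "q01", "AP001"))
```
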
The file `id_map.tsv` is an annexed file. Once the last copy of this file is
destroyed, identifier-based re-identification is no longer possible
(a precondition for data anonymization).