
# Phantom MRI study dataset

This dataset tracks all phantom MRI acquisitions performed for the TRR379
to validate the Q01 protocol at all sites.

## Key aspects of this setup

### Session labels are pseudonymized identifiers

This first layer of personal data protection reduces the chances
of participant identifiers appearing as part of file/path names.

Each session or acquisition is placed into a directory/dataset
in `sessions/` that is given a project-internal pseudonymous identifier
as its directory name.

### Multi-project ID mapping

The top-level `id_map.tsv` is a tab-separated table that maps each
session source identifier to its identifiers in any number of contexts.
The source identifier corresponds to the directory name of a DICOM
dataset in the `sessions/` directory and is the value in the first column
of each table row. Every subsequent column defines the ID mapping to a
different context. The context label is defined in the header row.
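
To make the table layout concrete, here is a minimal sketch that parses such a table into a nested mapping. The table contents are made up for illustration: the first-column header `source`, the second context `other`, and all identifiers except `AP001` are assumptions, and `load_id_map` is a hypothetical helper that is not part of this dataset.

```python
import csv
import io

# Hypothetical table contents; the real id_map.tsv will differ.
# The first-column header name ("source") is an assumption.
EXAMPLE_TSV = (
    "source\tq01\tother\n"
    "sess-0001\tAP001\tX17\n"
    "sess-0002\tAP002\tX42\n"
)


def load_id_map(handle):
    """Return {source_id: {context_label: context_id}} from a TSV handle."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    contexts = header[1:]  # every column after the first is a context
    return {row[0]: dict(zip(contexts, row[1:])) for row in reader if row}


id_map = load_id_map(io.StringIO(EXAMPLE_TSV))
print(id_map["sess-0001"]["q01"])
```

With the real file, the same helper could be called as `load_id_map(open("id_map.tsv", newline=""))`.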

A script to perform "re-identification" from a particular context
is provided at `code/reidentify.py`. It can be used like this:

```bash
python3 code/reidentify.py id_map.tsv q01 AP001
```

The script returns the source identifier linked to the `q01` identifier
`AP001`.
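
The lookup itself amounts to a reverse search in the table. The following is a sketch of that idea only, not the actual contents of `code/reidentify.py`: the function name `lookup_source_id`, the header `source`, and the example rows are all assumptions.

```python
import csv
import io


def lookup_source_id(handle, context, context_id):
    """Return the source identifier mapped to context_id in the given context."""
    reader = csv.DictReader(handle, delimiter="\t")
    fieldnames = reader.fieldnames or []
    # Only columns after the first one are contexts.
    if context not in fieldnames[1:]:
        raise KeyError(f"unknown context: {context!r}")
    source_column = fieldnames[0]
    for row in reader:
        if row[context] == context_id:
            return row[source_column]
    raise LookupError(f"no {context!r} identifier {context_id!r} in table")


# Hypothetical table contents; the real id_map.tsv will differ.
example = io.StringIO("source\tq01\nsess-0001\tAP001\nsess-0002\tAP002\n")
print(lookup_source_id(example, "q01", "AP001"))
```

Raising an error for an unknown context or identifier, rather than returning an empty result, is a design choice of this sketch; it keeps a mistyped identifier from being mistaken for a missing session.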

The file `id_map.tsv` is an annexed file. Once the last copy of this file is
destroyed, identifier-based re-identification is no longer possible
(a precondition for data anonymization).