WIP: Auto-generate contributor pages from the Pool #14

Closed
msz wants to merge 1 commit from msz/www.trr379.de:person-pool-preview into main
Contributor

Given that the TRR379 maintains a metadata collection in the TRR379 Knowledge Pooling Tool, and the collection includes Person records, it feels appropriate to use the public collection as a source for generating contributor pages (similar to what's now done for publications, see #13) -- the Pool records contain information very similar to what has been entered in the YAML header of the markdown documents. This approach should reduce duplication of data entry, by centralizing editing of profile information on the Pool.

This pull request applies the information extracted from the Pool using code in q02/pool-publication-page#2 (currently 165 LOC). The goal is to assess the degree of (in)compatibility between the two systems (Pool and website) and to decide what improvements are needed.

In addressing the divergence of information, we need to ask ourselves whether the missing information could simply be added to the Pool, or whether there are bigger structural challenges.

A few things which caught my attention:

1. sites may be incomplete

  • Sites in the website are "Aachen" (RWTH Aachen University), "Frankfurt" (Goethe University Frankfurt), etc.
  • In the Pool, the sites can be deduced from the part_of property.
    • That property can point to an unrelated organization (an additional affiliation with a non-TRR site), hence filtering against a list of TRR sites is needed. This is currently done against the organizations listed above.
    • However, many records in the Pool do not list those organizations, but rather related organizations (Univ. Clinic Aachen, Univ. Medical Center of JGU Mainz). They could also list child organizations with their own ROR IDs (Klinik und Poliklinik für Kinder- und Jugendpsychiatrie). We should probably (vastly) extend the list of organizations, or query ROR for more details (relationships are available in ror.org but not in the Pool). This is currently not done.
  • likely addressable in generation code alone
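The "extend the list of organizations" option could be sketched as a lookup table that maps each known organization (site organizations and related ones alike) to a TRR site. This is only an illustration: the identifiers below are made-up placeholders rather than real ROR IDs or Pool PIDs, and `SITE_BY_ORG` and `sites_for` are hypothetical names, not part of the existing generation code.

```python
# Hypothetical identifiers: real records would carry ROR IDs or Pool PIDs.
SITE_BY_ORG = {
    "org:rwth-aachen": "Aachen",              # the TRR site organization itself
    "org:uniklinik-aachen": "Aachen",         # related organization, same site
    "org:goethe-uni-frankfurt": "Frankfurt",  # another TRR site organization
}


def sites_for(part_of):
    """Return the TRR sites implied by part_of values, in order, without duplicates.

    Organizations missing from the table (non-TRR affiliations) are skipped.
    """
    sites = []
    for org in part_of:
        site = SITE_BY_ORG.get(org)
        if site is not None and site not in sites:
            sites.append(site)
    return sites
```

With such a table, a record that lists only the university clinic would still resolve to the "Aachen" site, while an unrelated affiliation would be ignored.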

2. site to person link

In the Hugo website, taxonomies are set up so that person and site are bidirectional (the site page lists persons, the person page lists sites). So far we only create / change person pages.

3. name formatting

  • The Hugo template displays the contributor's page heading as a combination of params.name-title (used for prefixes like Prof.) and title (used for the name without prefixes). It makes sense to keep the title to just the name (the title is used e.g. in the Contributors listing). However, this currently does not allow displaying honorific suffixes (allowed in the Pool).
    • this is likely an easy Hugo change
  • In the TRR concepts, the definition of an honorific name suffix includes 'For example, generation labels ("III"), or indicators of an academic degree, a profession, or a position ("MD", "BA").' In my opinion, the former should actually be glued to the name (John Doe III should not be listed as John Doe) while the latter should be treated the way we currently treat prefixes (John Doe, PhD can appear as John Doe in a listing).
    • this does not seem to affect the current data
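The distinction between the two kinds of suffix could be sketched roughly as follows. This is an illustration under stated assumptions, not the existing generation code: `GENERATION_LABELS` is an assumed, non-exhaustive set, and `listing_and_heading` is a hypothetical helper name.

```python
# Assumed, non-exhaustive set of generation labels that belong to the name itself.
GENERATION_LABELS = {"Jr.", "Sr.", "II", "III", "IV"}


def listing_and_heading(name, suffix=None, prefix=None):
    """Return (listing_title, page_heading) for a person.

    Generation labels are glued to the name in both places; any other suffix
    (degree, profession, position) appears only in the page heading.
    """
    listing = name
    degree = None
    if suffix:
        if suffix in GENERATION_LABELS:
            listing = f"{name} {suffix}"
        else:
            degree = suffix
    heading = listing
    if prefix:
        heading = f"{prefix} {heading}"
    if degree:
        heading = f"{heading}, {degree}"
    return listing, heading
```

Under these assumptions, "John Doe" with suffix "III" is listed as "John Doe III", while "John Doe" with suffix "PhD" is listed as "John Doe" and headed "John Doe, PhD".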

4. titles not available

Related to 3, very few records in the Pool, if any, use honorific_name_prefix. For this reason, most generated pages "lose" the Prof. Dr. prefix in the title. This needs to be handled tactfully.

This can be addressed by editing Pool records.

5. affiliations: level of detail

  • While not all part_of values are TRR sites (as discussed above), they are likely affiliations. However, the affiliations generated from the Pool are less detailed than the ones which were present in existing pages (e.g. neither ror.org nor the Pool has INM-7, so INM-7 affiliation became an FZJ affiliation).
  • This means we currently lose detailed affiliations.
  • This may be addressable by adding more detailed records to the Pool?

6. unlinked contributors

This PR changed 37 existing pages and added 41 new ones. However, not all people appear on the contributors page -- likely because they are not listed as contributors in other pages (see 2) -- although their pages exist and are linkable.

This is likely related to the Hugo setup, but may require including more pages in the generation procedure.

This commit is to assess the extent of changes introduced, and changes
needed.
@@ -1,20 +1,21 @@
 ---
-title: Andreas G Chiocchetti
+title: Andreas Chiocchetti
Owner

Asking for middle initial feature
Author
Contributor

The feature is already present - names will use either formatted_name property or combine given, additional, family. The particular pool record does not contain any additional names.

Should be addressed by updating metadata records.
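The fallback described above could look roughly like the sketch below. The key names `given_name`, `additional_names`, and `family_name` are assumptions made for illustration (the comment only names them as given, additional, family), and `display_name` is a hypothetical helper, not the function used in the pipeline.

```python
def display_name(record):
    """Prefer formatted_name; otherwise join the name parts that are present."""
    formatted = record.get("formatted_name")
    if formatted:
        return formatted
    parts = [
        record.get("given_name"),        # assumed key name
        record.get("additional_names"),  # assumed key name; often absent
        record.get("family_name"),       # assumed key name
    ]
    return " ".join(part for part in parts if part)
```

A record without additional names yields only given and family name, which is why the middle initial disappears until the metadata record is updated.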
@@ -0,0 +1,13 @@
+---
Owner

This file exists under a different name -- needs investigation.
Author
Contributor

The same for contributors/oliver-tuescher -- the generated pages use the last components from the Pool PID (https://trr379.de/contributors/...) as the folder name. I'm surprised it only differed from the existing folder names in 2 cases.

In this PR I did not remove the old files (there are extra files like portraits); I just placed the generated files over the existing ones.

Manual clean-up will be necessary.
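For reference, deriving the folder name from a PID can be sketched as below. The helper name `folder_name` and the example PID are hypothetical, and the sketch assumes that only the final path component is used.

```python
from urllib.parse import urlparse


def folder_name(pid):
    """Return the last path component of a Pool PID, ignoring a trailing slash."""
    path = urlparse(pid).path.rstrip("/")
    return path.rsplit("/", 1)[-1]
```

For a hypothetical PID such as `https://trr379.de/contributors/jane-doe`, this gives `jane-doe`; any existing folder spelled differently would end up next to the generated one rather than being replaced.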
@@ -17,4 +16,1 @@
 orcid: 0000-0002-0992-634X
-name-title: Prof. Dr. med.
-affiliation: Department of Psychiatry, Psychososmatics and Psychotherapy, Goethe University Frankfurt
-portrait: portrait.jpg
Author
Contributor

We currently lose all "portrait" params -- from the Pool we have no way of knowing whether the person has a dedicated portrait, or whether the portrait is webp or jpg.

Seeing the current usage, most folders have both portrait.[jpg|webp] and (smaller / more tightly cropped) thumbnail.[jpg|webp]. The custom Hugo template has this:

  {{- $portrait := $images.GetMatch (.Params.portrait | default "*thumbnail*") }}

It would probably be OK to restrict the freedom in choosing the names a little bit, remove the portrait parameter, and do something like (untested, proposed blindly):

  {{- $portrait := or ($images.GetMatch "portrait*") ($images.GetMatch "*thumbnail*") }}

Can be addressed by changing Hugo template?
Owner

I agree
Author
Contributor

Template change proposed in #16/files
msz force-pushed person-pool-preview from 645cf09849 to 6a096d441a 2026-01-23 16:20:31 +00:00 Compare
Author
Contributor

Replaced by #17 after we updated the pool and the generation pipeline.
msz closed this pull request 2026-02-06 18:20:17 +00:00

Reference
q04/www.trr379.de!14