Formalize joint index support in .tsv via {"+".join(entities)}s.tsv files #2273

@yarikoptic

Your idea

This came up prominently in the discussion of the BEP036 phenotypic data (attn @bids-standard/bep036), which wants to introduce an aggregated demographics table, bringing together information across subjects/sessions and potentially even runs.

For the immediate use case of BEP036, I suggest formalizing support for a top-level participant+sessions.tsv with a joint index across (participant_id, session_id) pairs. Then information

  • that is the same across sessions would go to participants.tsv, as it does now
  • that is the same across participants but differs across sessions would go into a top-level sessions.tsv, to be "taken" per the Inheritance Principle (IP).
    • @effigies noted a complication: for .tsv files we do not really define how to inherit records beyond "take the .tsv closest in the hierarchy"... I have yet to check.
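To make the split above concrete, here is a minimal sketch of how a reader could assemble one record from the three top-level tables. The sample values, the `scanner` column, and the precedence rule (the most specific table wins) are all assumptions for illustration; as noted above, the specification does not currently define how .tsv records are inherited.

```python
import csv
import io

# Made-up sample tables (hypothetical values, for illustration only).
PARTICIPANTS_TSV = "participant_id\tsex\nsub-01\tF\nsub-02\tM\n"
SESSIONS_TSV = "session_id\tscanner\nses-pre\tA\nses-post\tB\n"
JOINT_TSV = (
    "participant_id\tsession_id\tage\n"
    "sub-01\tses-pre\t30\n"
    "sub-01\tses-post\t31\n"
    "sub-02\tses-pre\t45\n"
)


def read_tsv(text, index_cols):
    """Map each index tuple to the remaining columns of its row."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        key = tuple(row[col] for col in index_cols)
        rows[key] = {k: v for k, v in row.items() if k not in index_cols}
    return rows


participants = read_tsv(PARTICIPANTS_TSV, ["participant_id"])
sessions = read_tsv(SESSIONS_TSV, ["session_id"])
joint = read_tsv(JOINT_TSV, ["participant_id", "session_id"])


def lookup(participant_id, session_id):
    """Layer the tables from least to most specific (assumed precedence)."""
    record = {}
    record.update(participants.get((participant_id,), {}))
    record.update(sessions.get((session_id,), {}))
    record.update(joint.get((participant_id, session_id), {}))
    return record
```

With this layering, `lookup("sub-01", "ses-post")` combines the per-participant, per-session, and per-pair columns, while a pair without a row in the joint table simply gets the single-entity columns.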

But I think the issue is common and warrants a more generic solution! Related desires came up in

  • iEEG in BIDS and microephys (BEP032), which need to disambiguate combinations of electrodes on probes. iEEG at the moment just encodes some location in addition to the index within "name", thus making it unique.

IMHO it is potentially a general desire to be able to consolidate relevant summary information in "long form" at higher levels, which requires composite indexes. E.g. for an "experimental" composition of OpenNeuro studies I already produced a similar studies_derivatives.tsv (likely to be renamed to study+derivatives.tsv).
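The naming rule from the issue title can be sketched as a pair of helpers. The function names are hypothetical, and deriving the index columns as `<entity>_id` is an assumption based on the participant_id, session_id example above, not something the specification defines.

```python
def joint_table_name(entities):
    """Build the proposed file name: "+".join(entities) followed by "s.tsv"."""
    return "+".join(entities) + "s.tsv"


def index_columns(filename):
    """Recover the assumed index columns (<entity>_id) from a table name."""
    suffix = "s.tsv"
    if not filename.endswith(suffix):
        raise ValueError(f"not a plural-entity table name: {filename!r}")
    stem = filename[: -len(suffix)]
    return [f"{entity}_id" for entity in stem.split("+")]
```

Note that only the last entity is pluralized, and only by appending "s", so an irregular plural in the last position would need extra handling. A single-entity name such as participants.tsv is just the one-element case of the same rule.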

Pros:

  • It would pair up with the summarization/inheritance principle: at higher levels, if desired, we can provide common metadata in .json or .tsv files. So at the moment participants.tsv provides the metadata common per participant, and the sub-*/sub-*_sessions.tsv files provide the metadata that differs across sessions.
  • It would allow for a new type of metadata aggregation/specification, not limited or specific to phenotype, which
    • would not break any existing tool that assumes a single index/unique IDs in {plural_entity}.tsv, and thus be backward compatible
    • would allow an efficient single-entity index table to still exist (unlike adding a composite index to a {plural_entity}.tsv, which would require duplicating each value for all values of the other entities)

Counter precedents (sneaked in):

  • microscopy uses top level samples.tsv which at the top level pairs the two: "The combination of sample_id and participant_id MUST be unique."
    • TODO: rename/auto-migrate into sample+participants.tsv while allowing for samples.tsv as well
  • any other???
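To illustrate why samples.tsv is a counter-precedent, here is a minimal sketch of a uniqueness check over an arbitrary set of index columns. The helper name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Made-up samples.tsv content: sample-01 appears for two participants.
SAMPLES_TSV = (
    "sample_id\tparticipant_id\tsample_type\n"
    "sample-01\tsub-01\ttissue\n"
    "sample-01\tsub-02\ttissue\n"
    "sample-02\tsub-01\ttissue\n"
)


def duplicate_keys(tsv_text, index_cols):
    """Return the index tuples that occur more than once, sorted."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter(tuple(row[col] for col in index_cols) for row in reader)
    return sorted(key for key, n in counts.items() if n > 1)


# Unique as a composite index, as the microscopy rule requires ...
composite_dupes = duplicate_keys(SAMPLES_TSV, ["sample_id", "participant_id"])
# ... but not unique for a tool that assumes sample_id alone is the index.
single_dupes = duplicate_keys(SAMPLES_TSV, ["sample_id"])
```

Here `composite_dupes` is empty while `single_dupes` contains `("sample-01",)`, which is the situation a tool assuming one index column per {plural_entity}.tsv would trip over.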

Alternative solutions

Flexible joint indexes

Proposed by @effigies: allow multiple columns to serve as an index, and define that at the schema level. Details to be provided by @effigies in some other issue/PR? ;)

Related
