## Existing Provenance Patterns

### Pattern 1: WorkflowExecution (Computational Provenance)

- **Location:**
- **Purpose:** Track bioinformatics workflow execution
- **Key slots:**
- **Example:** MetagenomeAssembly tracking when/where/how reads were assembled
- **Distinction:** WorkflowExecution tracks data processing; ProvenanceMetadata tracks metadata record creation

### Pattern 2: CreditAssociation (Human Agent Provenance)

- **Location:**
- **Purpose:** Link human contributors to Studies using the CRediT taxonomy
- **Structure:**

```yaml
CreditAssociation:
  class_uri: prov:Association
  slots:
    - applies_to_person
    - applied_roles
Study:
  slots:
    - has_credit_associations  # multivalued, inlined
```

- **Example:** Tracking PIs, data curators, and other contributors
- **Distinction:** CreditAssociation tracks human contributions; ProvenanceMetadata tracks systems/software

### Pattern 3: GOLD Legacy (Deprecated Temporal Provenance)

- **Location:**
- **Slots:**
- **Applied to:** Biosample, DataGeneration
- **Problems** (see issues #8, #884):
**Why this matters:** It shows the pain of not using `slot_usage`. These slots are defined generically but only make sense in a GOLD context, which causes confusion.

## Complete Provenance Coverage by Class

## What's Still Missing (Future Work)

Not blockers for this PR; just noting the gaps.

## NMDC's Move Away from PROV-O

### Historical Context

The schema originally used W3C PROV-O vocabulary (

### This PR's Approach

### Why not force OBO mappings?
## Future Extensibility

### Short-term (next 1-2 PRs)

Extend to other classes:

```yaml
Study:
  slots:
    - provenance_metadata
DataGeneration:
  slots:
    - provenance_metadata
```

### Medium-term

Deprecate GOLD legacy:

```yaml
add_date:
  deprecated: true
  deprecated_element_has_exact_replacement: created_at
```

### Long-term

Add modification tracking:

```yaml
Biosample:
  slots:
    - provenance_metadata   # creation (single)
    - modification_history  # changes (multivalued)
```

## References

### Issues

### Pull Requests

### Documentation
---
**PR:**

**Issue:** Biosample to track metadata origin #2710

## What's Good About the Current PR
- ✅ `SourceSystemEnum` (basic_classes.yaml) with `see_also` links
- ✅ `source_system_of_record` slot (basic_classes.yaml), per #2710 (Biosample to track metadata origin)
- ✅ `provenance_metadata` slot (basic_classes.yaml) (compare `has_credit_associations`)
- ✅ Biosample integration (core.yaml)
- ✅ Test data (Biosample-invalid-source-system.yaml)
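The list above shows three schema pieces working together. The sketch below is one minimal way they might fit in LinkML. Several details are assumptions and are not taken from the PR: the permissible values (`GOLD`, `NCBI`), their descriptions, and the `see_also` URLs; the slot descriptions; `inlined: true`; and the placement of `source_system_of_record` on ProvenanceMetadata.

```yaml
# Hypothetical sketch: values, descriptions, and slot placement are illustrative.
enums:
  SourceSystemEnum:
    permissible_values:
      GOLD:
        description: Record originated in the JGI Genomes OnLine Database.
        see_also:
          - https://gold.jgi.doe.gov/
      NCBI:  # hypothetical second source system
        description: Record originated in NCBI BioSample.
        see_also:
          - https://www.ncbi.nlm.nih.gov/biosample/

slots:
  source_system_of_record:
    range: SourceSystemEnum  # values outside the enum fail validation
    description: External system in which the metadata record was originally created.
  provenance_metadata:
    range: ProvenanceMetadata
    inlined: true  # embed the provenance object inside the Biosample record
    description: How, and by what software, this metadata record was created.

classes:
  ProvenanceMetadata:
    class_uri: nmdc:ProvenanceMetadata
    slots:
      - source_system_of_record
      - git_url
      - version
  Biosample:
    slots:
      - provenance_metadata  # other Biosample slots omitted here
```

With the enum as the slot's range, a test file such as Biosample-invalid-source-system.yaml only needs a value outside `SourceSystemEnum` to exercise the failure path.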
## Suggested Enhancement: Add slot_usage to ProvenanceMetadata

### Why?

The PR reuses existing slots (`git_url`, `version`) whose current descriptions are specific to `WorkflowExecution`. Without `slot_usage`, readers may be confused:

- Does `git_url` point to workflow code or translator code?
- Is `version` a workflow release or a translator release?

This is the same challenge NMDC faced with `add_date`. It was defined generically but only used for GOLD, which caused confusion (see issue #8).

### Recommendation: Use the slot_usage Pattern
The schema already uses this pattern in `WorkflowExecution` (basic_classes.yaml:491-510):

### Proposed Addition to ProvenanceMetadata Class

Add to basic_classes.yaml (lines 512-519):
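Below is a minimal sketch of what such a `slot_usage` block might look like. It assumes that ProvenanceMetadata carries `git_url` and `version` alongside `source_system_of_record`. The description wording is illustrative and should be adapted to the translator code the PR actually refers to.

```yaml
# Hypothetical sketch of slot_usage on ProvenanceMetadata; wording is illustrative.
classes:
  ProvenanceMetadata:
    class_uri: nmdc:ProvenanceMetadata
    slots:
      - source_system_of_record
      - git_url
      - version
    slot_usage:
      # Refines the shared slots only in the context of this class.
      git_url:
        description: >-
          URL of the repository containing the translator code that produced
          this metadata record (not the code of a bioinformatics workflow).
      version:
        description: >-
          Release of the translator code that produced this metadata record
          (not a workflow release).
```

`slot_usage` overrides apply only within ProvenanceMetadata. WorkflowExecution therefore keeps its existing meaning for the same `git_url` and `version` slots.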
### Why slot_usage Instead of New Slots?

❌ **Don't Do This** (slot proliferation):

✅ **Do This** (slot_usage refinement):
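The two options might look like this side by side. The slot names `translator_git_url` and `translator_version` are hypothetical and exist only to illustrate the proliferation approach.

```yaml
# Option A (hypothetical, not recommended): near-duplicate slots that
# differ from git_url / version only in which code they describe.
slots:
  translator_git_url:
    range: uri
    description: URL of the translator code repository.
  translator_version:
    range: string
    description: Release of the translator code.

# Option B (recommended): reuse the existing slots and refine them per class.
classes:
  ProvenanceMetadata:
    slots:
      - git_url
      - version
    slot_usage:
      git_url:
        description: URL of the translator code repository.
      version:
        description: Release of the translator code.
```

Option B keeps one slot, with one URI, for each concept. Queries and mappings over `git_url` then work across both WorkflowExecution and ProvenanceMetadata.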
Benefits:
### The "This is Distinct From" Comment Pattern

These comments prevent semantic confusion when slots are reused:
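One way to express the pattern is with LinkML's `comments` metaslot. This sketch assumes that is where the distinction lives. Plain `#` YAML comments would also work, but they are discarded when the schema is parsed, so they do not reach generated documentation. The wording is illustrative.

```yaml
# Hypothetical sketch: the distinction recorded in the LinkML `comments` field.
classes:
  ProvenanceMetadata:
    slot_usage:
      git_url:
        description: URL of the translator code that created this metadata record.
        comments:
          - >-
            This is distinct from git_url on WorkflowExecution, which points to
            the code of the bioinformatics workflow that processed the data.
      version:
        description: Release of the translator code that created this metadata record.
        comments:
          - >-
            This is distinct from version on WorkflowExecution, which identifies
            a workflow release rather than a translator release.
```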
Why it matters:
## Summary of Suggested Changes

- basic_classes.yaml: `slot_usage` block + comments
- core.yaml
- basic_classes.yaml

**Impact:** Establishes a clear, reusable pattern that demonstrates best practices for future schema development.
## Questions for Discussion

- Do the "this is distinct from" comments add clarity without being too verbose?
- Should we extend ProvenanceMetadata to other classes (Study, DataGeneration) now, or wait?
- Is `nmdc:ProvenanceMetadata` the right `class_uri`, or should we search harder for OBO mappings?

## Related Issues/PRs