PMID GeneReviews entries (e.g., PMID:31536184) fall back to abstract_only instead of fetching NBK full text #41

@cmungall

Summary

When caching PMID:31536184 (GeneReviews chapter for GA1), linkml-reference-validator stores content_type: abstract_only and does not fetch the full NCBI Bookshelf chapter text, even though full text is publicly available.

Version

  • linkml-reference-validator==0.1.4rc8

Repro

linkml-reference-validator cache reference PMID:31536184 --force --verbose

Output includes:

  • Content type: abstract_only

Cached file frontmatter:

reference_id: PMID:31536184
content_type: abstract_only

Why this is a problem

For GeneReviews PMIDs, users often cite statements that appear in the full NBK chapter but not in the PubMed abstract/summary text. Validation of those statements then fails because only the abstract was cached.

Expected behavior

One of:

  1. PMID source should detect NCBI Bookshelf/GeneReviews links and fetch chapter full text (e.g., printable view), or
  2. Provide a built-in NBK source and/or an automatic PubMed->Bookshelf handoff for PMIDs that resolve to Bookshelf chapters.
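
A minimal sketch of how the PubMed->Bookshelf handoff in option 2 might detect a Bookshelf chapter. It assumes the PubMed efetch XML for a book chapter carries an `ArticleId` element with `IdType="bookaccession"`, and that the printable view is reachable by appending `?report=printable` to the chapter URL; both should be checked against real responses. The function names and the sample record (with placeholder IDs) are hypothetical and are not part of linkml-reference-validator.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Illustrative, trimmed record with placeholder IDs; not real efetch output.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedBookArticle>
    <BookDocument>
      <PMID Version="1">00000000</PMID>
      <ArticleIdList>
        <ArticleId IdType="bookaccession">NBK0000</ArticleId>
      </ArticleIdList>
    </BookDocument>
  </PubmedBookArticle>
</PubmedArticleSet>"""

NBK_PATTERN = re.compile(r"^NBK\d+$")


def extract_bookshelf_accession(pubmed_xml: str) -> Optional[str]:
    """Return the NBK accession found in a PubMed XML record, or None."""
    root = ET.fromstring(pubmed_xml)
    for article_id in root.iter("ArticleId"):
        value = (article_id.text or "").strip()
        # Assumption: Bookshelf chapters are tagged IdType="bookaccession".
        if article_id.get("IdType") == "bookaccession" and NBK_PATTERN.match(value):
            return value
    return None


def bookshelf_printable_url(accession: str) -> str:
    """Build the (assumed) printable-view URL for a Bookshelf chapter."""
    return f"https://www.ncbi.nlm.nih.gov/books/{accession}/?report=printable"


if __name__ == "__main__":
    accession = extract_bookshelf_accession(SAMPLE_XML)
    if accession is not None:
        print(bookshelf_printable_url(accession))
```

A record without a `bookaccession` ID returns `None`, so ordinary journal PMIDs would continue through the existing path unchanged.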

Additional evidence

Full chapter is available at:

(HTML content is large and includes full sections/tables/references, not just the summary.)

Notes

The current PMID implementation appears to do the following:

  • fetch the PubMed abstract
  • optionally fetch PMC full text via the PMCID
  • fall back to abstract_only when there is no PMCID

This misses Bookshelf-hosted full text for GeneReviews PMIDs.
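
The fallback order described in these notes, extended with a Bookshelf step, could look roughly like the sketch below. This is not the validator's actual code: the function name and the `"pmc"` and `"bookshelf"` labels are hypothetical, and only `abstract_only` is taken from the observed output.

```python
from typing import Optional, Tuple


def choose_full_text_source(
    pmcid: Optional[str], nbk_accession: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Pick where to fetch full text from, as a (source label, identifier) pair."""
    if pmcid:
        # Existing behavior: prefer PMC full text when a PMCID is known.
        return ("pmc", pmcid)
    if nbk_accession:
        # Proposed step: use the Bookshelf chapter before giving up.
        return ("bookshelf", nbk_accession)
    # Existing fallback when no full-text source is available.
    return ("abstract_only", None)
```

Checking the PMCID first keeps the behavior for ordinary journal articles the same; the Bookshelf branch only takes effect for records that currently end up as abstract_only.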

Labels: enhancement (New feature or request)
