Forward schema_path/target_class to linkml's delimited file loader by Sigfried · Pull Request #235 · linkml/linkml-map

Sigfried · 2026-05-14T15:41:12Z

Summary

Forward schema_path and target_class through TsvFileLoader, CsvFileLoader, DataLoader, and get_file_loader() so they reach linkml's underlying TsvLoader / CsvLoader. This enables schema-aware type coercion for delimited files, preventing string-ranged and enum-ranged columns from being silently coerced to int/float.

Without these parameters, a column like subject_id containing numeric-looking strings (e.g., "00123") gets loaded as 123 (int) by pandas' default inference, losing the leading zero and breaking downstream lookups.

Background

linkml/linkml#3289 ("Make delimited file loader schema-aware to preserve string/enum columns") added schema-awareness to linkml's _DelimitedFileLoader. It was released in linkml v1.11.0. This PR exposes those parameters through linkml-map's loader API so users of linkml-map can benefit from the same fix.

Changes

TsvFileLoader.__init__ / CsvFileLoader.__init__: accept optional schema_path and target_class
iter_instances(): forward both params to the underlying linkml loader
get_file_loader(): kwargs-only schema_path / target_class, passed only to TSV/CSV loader classes (other formats already do their own type handling)
DataLoader.__init__: accept and store both params
Tests: 11 new `TestSchemaAware*` tests in `tests/test_loaders/test_data_loader.py` covering string-range preservation, enum-range preservation, and the no-schema coercion fallback for each entry point (`TsvFileLoader`, `CsvFileLoader`, `get_file_loader`, `DataLoader`)

The change is additive — existing callers that don't pass the new params get current behavior. Annotations use PEP 604 `X | Y | None` syntax to match the rest of the file's style (no new `typing` imports needed).

Test plan

42 tests pass in `tests/test_loaders/test_data_loader.py` (31 existing + 11 new)
Full `pytest` run: 831 passed, 4 skipped, 1 pre-existing failure (`test_graphviz_compiler` — fails on `main` too due to missing local `dot` binary, unrelated)
CI passes

🤖 Generated with Claude Code

Pass through the new schema-aware loading params so that string-ranged and enum-ranged columns in TSV/CSV files are not coerced to int/float. Requires linkml >=1.11 (PR linkml/linkml#3289 added schema-awareness to the underlying _DelimitedFileLoader; released in v1.11.0). Also imports nothing new from `typing` — the new annotations use PEP 604 `X | Y | None` syntax to match the rest of the file's style. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

linkml/linkml#3289 was released in linkml v1.11.0; schema-automator/#188 was released in v0.5.5. Switch both from git URL pins to PyPI version specifiers. linkml-map fix is still unreleased (PR linkml/linkml-map#235 open) — its git pin stays in place until that ships. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drop the kwargs-only schema_path/target_class from get_file_loader and let them flow through **kwargs to the loader class. The DataLoader call sites add them to the same conditional block that already gates skip_empty_rows, so YAML/JSON paths never see them. Passing them to a non-tabular loader now raises TypeError instead of being silently swallowed.

amc-corey-cox

I'd like to back out the new schema_path / target_class params on DataLoader and get_file_loader — we already have source_schemaview on Session / Transformer, and the identifier passed to data_loader[...] already names the target class. Inlines mark the spots.

amc-corey-cox · 2026-05-28T13:57:07Z

        self,
        source: str | Path,
        skip_empty_rows: bool = True,
+        schema_path: str | Path | None = None,


The leaf-level exposure is right and should stay. The reshape below is about not duplicating the same parameters higher up the stack — DataLoader and get_file_loader don't need to grow these too.

amc-corey-cox · 2026-05-28T13:57:07Z

        base_path: str | Path,
        default_format: FileFormat | None = None,
        skip_empty_rows: bool = True,
+        schema_path: str | Path | None = None,


We don't need new infrastructure here — the schema is already in scope at the orchestration layer. Session.source_schemaview and Transformer.source_schemaview carry it, and the spec declares source_schema.

Suggested: replace schema_path / target_class with an optional schemaview: SchemaView | None, and let transform_spec and the CLI pass the existing source_schemaview in. End users then get the fix without changing how they call linkml-map.

amc-corey-cox · 2026-05-28T13:57:07Z

        if file_format in (FileFormat.TSV, FileFormat.CSV):
            loader_kwargs["skip_empty_rows"] = self.skip_empty_rows
+            loader_kwargs["schema_path"] = self.schema_path
+            loader_kwargs["target_class"] = self.target_class


The identifier already names the target class — data_loader["Person"] means we want rows typed as Person. Derive target_class = identifier here instead of storing it on self.

Storing a single target_class on the loader doesn't compose with directory mode anyway: different files map to different classes.

amc-corey-cox · 2026-05-28T13:57:07Z

        if file_format in (FileFormat.TSV, FileFormat.CSV):
            loader_kwargs["skip_empty_rows"] = self.skip_empty_rows
+            loader_kwargs["schema_path"] = self.schema_path
+            loader_kwargs["target_class"] = self.target_class


Single-file mode: derive target_class = self.base_path.stem to match how identifiers map to files everywhere else in the loader.

Sigfried mentioned this pull request May 14, 2026

Enum derivations pipeline with auto-generated specs linkml/dm-bip#291

Draft

5 tasks

amc-corey-cox added this to the v0.5.3 milestone May 22, 2026

amc-corey-cox mentioned this pull request May 28, 2026

Consolidate parallel loader hierarchies in linkml-runtime and linkml.validator linkml/linkml#3579

Open

Merge branch 'main' into forward-schema-path-to-delimited-loader

d904fb7

amc-corey-cox self-requested a review May 28, 2026 18:10

amc-corey-cox requested changes May 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Forward schema_path/target_class to linkml's delimited file loader#235

Forward schema_path/target_class to linkml's delimited file loader#235
Sigfried wants to merge 3 commits into
linkml:mainfrom
Sigfried:forward-schema-path-to-delimited-loader

Sigfried commented May 14, 2026

Uh oh!

amc-corey-cox left a comment •

edited

Loading

Uh oh!

amc-corey-cox May 28, 2026

Uh oh!

amc-corey-cox May 28, 2026

Uh oh!

amc-corey-cox May 28, 2026

Uh oh!

amc-corey-cox May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Sigfried commented May 14, 2026

Summary

Background

Changes

Test plan

Uh oh!

amc-corey-cox left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

amc-corey-cox May 28, 2026

Choose a reason for hiding this comment

Uh oh!

amc-corey-cox May 28, 2026

Choose a reason for hiding this comment

Uh oh!

amc-corey-cox May 28, 2026

Choose a reason for hiding this comment

Uh oh!

amc-corey-cox May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

amc-corey-cox left a comment •

edited

Loading