Development of a metadata schema for experimental data, specifically electrochemical and electrocatalytic data.
Install pixi and get a copy of the metadata-schema:

```shell
git clone https://github.com/echemdb/metadata-schema.git
cd metadata-schema
```

The mdstools package provides tools to flatten nested YAML metadata into tabular Excel/CSV formats, with optional schema-based enrichment (descriptions and examples from the JSON schemas).
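The idea behind flattening can be sketched as follows. This is a simplified illustration, not the mdstools implementation (the dot separator and the example metadata are assumptions made for this sketch; the real tool also tracks row numbers and schema information):

```python
def flatten(data, prefix=""):
    """Recursively turn a nested dict into (key-path, value) rows."""
    rows = []
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested mappings, extending the key path
            rows.extend(flatten(value, path))
        else:
            # Leaf value: emit one tabular row
            rows.append((path, value))
    return rows

# Hypothetical electrochemical metadata
metadata = {
    "electrolyte": {"type": "aqueous", "ph": 1.0},
    "temperature": {"value": 298.15, "unit": "K"},
}
for path, value in flatten(metadata):
    print(path, value)
# electrolyte.type aqueous
# electrolyte.ph 1.0
# temperature.value 298.15
# temperature.unit K
```

Each leaf of the nested structure becomes a single row, which is what makes the Excel/CSV export possible.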
Flatten a YAML file to enriched Excel and CSV:

```shell
mdstools flatten tests/example_metadata.yaml
```

This creates three files in `generated/`:

- `example_metadata.csv` - Flat CSV with all metadata
- `example_metadata.xlsx` - Single-sheet Excel file
- `example_metadata_sheets.xlsx` - Multi-sheet Excel (one sheet per top-level key)
All exported files include Description and Example columns populated from the JSON schemas, making it easier for users to understand and fill out the metadata templates.
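For illustration, an enriched export might contain rows like the following (the values, descriptions, and examples shown here are hypothetical, made up for this sketch):

```
Key               Value    Description                  Example
electrolyte.type  aqueous  Type of the electrolyte      aqueous
electrolyte.ph    1.0      pH value of the electrolyte  13.0
```

A user filling out a template can thus see, next to each field, what the field means and what a valid entry looks like.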
```shell
mdstools flatten <yaml_file> [--schema-dir DIR] [--output-dir DIR] [--no-enrichment]
```

- `--schema-dir` - Directory with JSON schemas (default: `schemas`)
- `--output-dir` - Output directory (default: `generated`)
- `--no-enrichment` - Disable enrichment (no Description/Example columns)
```shell
mdstools unflatten generated/example_metadata.xlsx --schema-file schemas/minimum_echemdb.json
```

Note: All CLI commands can also be run via pixi, e.g., `pixi run flatten ...` and `pixi run unflatten ...`.
The mdstools package can also be used programmatically:
```python
from mdstools.metadata.metadata import Metadata
from mdstools.metadata.enriched_metadata import EnrichedFlattenedMetadata

# Load YAML metadata
metadata = Metadata.from_yaml('metadata.yaml')

# Flatten to tabular format
flattened = metadata.flatten()

# Add schema enrichment (descriptions and examples)
enriched = EnrichedFlattenedMetadata(flattened.rows, schema_dir='schemas')

# Get enriched DataFrame
df = enriched.to_pandas()

# Export to various formats
enriched.to_csv('output.csv')
enriched.to_excel('output.xlsx')
enriched.to_excel('output_multi.xlsx', separate_sheets=True)  # One sheet per top-level key
enriched.to_markdown('output.md')
```

You can also load a flat Excel/CSV file, reconstruct the nested dict, and
optionally write YAML. This workflow expects columns named `Number`, `Key`,
and `Value`, and is intended for unflattening back to a nested dict/YAML.
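Conceptually, unflattening reverses the key-path join. A minimal sketch of the idea, assuming dot-separated key paths as above (again an illustration, not the mdstools implementation):

```python
def unflatten(rows):
    """Rebuild a nested dict from (key-path, value) rows."""
    nested = {}
    for path, value in rows:
        *parents, leaf = path.split(".")
        node = nested
        for part in parents:
            # Create intermediate dicts along the key path as needed
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested

rows = [
    ("electrolyte.type", "aqueous"),
    ("electrolyte.ph", 1.0),
    ("temperature.value", 298.15),
]
print(unflatten(rows))
# {'electrolyte': {'type': 'aqueous', 'ph': 1.0}, 'temperature': {'value': 298.15}}
```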
An enriched Excel can also be loaded.
```python
from mdstools.metadata.flattened_metadata import FlattenedMetadata

flattened = FlattenedMetadata.from_excel("generated/example_metadata.xlsx")
metadata = flattened.unflatten()
data = metadata.data  # Nested dict
metadata.to_yaml("generated/example_metadata.yaml")
```

Run the test suite:

```shell
pixi run test               # Run all tests
pixi run doctest            # Run doctests only
pixi run test-comprehensive # Run integration tests only
```

or run all of them:
```shell
pixi run -e dev test-all
```

Generate JSON schemas and Pydantic models from the LinkML definitions in `linkml/`:

```shell
pixi run generate-schemas # JSON Schema only
pixi run generate-models  # Pydantic models only
pixi run generate-all     # Both
```

The generated JSON schemas are written to `schemas/`.
After intentional changes to LinkML files, update the expected baseline files:
```shell
pixi run update-expected-schemas
```

To validate the example files against the JSON schemas:
```shell
pixi run validate                 # Run all validations
pixi run validate-objects         # Validate individual object examples
pixi run validate-file-schemas    # Validate file-level YAML examples
pixi run validate-package-schemas # Validate package JSON examples
pixi run check-naming             # Enforce naming conventions
```

Package schema validation requires the Frictionless Data Package standard schemas. They are downloaded automatically on first run into `schemas/frictionless/` (gitignored) and cached for subsequent offline use.