Metadata Schema

Development of a metadata schema for experimental data, specifically electrochemical and electrocatalytic data.

Install

Install pixi and clone the metadata-schema repository:

git clone https://github.com/echemdb/metadata-schema.git
cd metadata-schema

CLI

Flatten metadata to Excel/CSV

The mdstools package provides tools to flatten nested YAML metadata into tabular Excel/CSV formats with optional schema-based enrichment (descriptions and examples from JSON schemas).
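The following minimal sketch illustrates flattening. It turns a nested mapping into rows of dotted key paths and values. It is a hypothetical illustration, not the mdstools implementation. The function name flatten and the sample metadata are made up, and lists are kept as leaf values. The tabular layout that mdstools actually produces (with Number, Key, and Value columns) may differ.

```python
from typing import Any


def flatten(data: dict[str, Any], prefix: str = "") -> list[tuple[str, Any]]:
    """Flatten a nested dict into (dotted_key, value) rows.

    Hypothetical illustration only; lists are kept as leaf values.
    """
    rows: list[tuple[str, Any]] = []
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty nested mappings, extending the dotted path.
            rows.extend(flatten(value, path))
        else:
            # Scalars, lists, and empty dicts become leaf rows.
            rows.append((path, value))
    return rows


# Made-up sample metadata for illustration.
metadata = {
    "curation": {"process": "digitized"},
    "system": {"electrolyte": {"type": "aqueous", "temperature": 298.15}},
}
for key, value in flatten(metadata):
    print(f"{key}\t{value}")
```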

Flatten a YAML file to enriched Excel and CSV:

mdstools flatten tests/example_metadata.yaml

This creates three files in generated/:

example_metadata.csv - Flat CSV with all metadata
example_metadata.xlsx - Single-sheet Excel file
example_metadata_sheets.xlsx - Multi-sheet Excel (one sheet per top-level key)
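The multi-sheet file places each top-level key on its own sheet. The sketch below shows the grouping step behind such a layout. It assumes rows of (dotted_key, value) pairs. The function split_by_top_level_key and the row format are hypothetical and are not part of the mdstools API.

```python
from collections import defaultdict
from typing import Any


def split_by_top_level_key(
    rows: list[tuple[str, Any]],
) -> dict[str, list[tuple[str, Any]]]:
    """Group (dotted_key, value) rows by the first segment of their key."""
    sheets: dict[str, list[tuple[str, Any]]] = defaultdict(list)
    for key, value in rows:
        top, _, rest = key.partition(".")
        # The remainder of the path becomes the row key on that sheet;
        # keys without a nested part keep their full name.
        sheets[top].append((rest or top, value))
    return dict(sheets)


rows = [
    ("curation.process", "digitized"),
    ("system.electrolyte.type", "aqueous"),
    ("system.electrolyte.temperature", 298.15),
]
for sheet, sheet_rows in split_by_top_level_key(rows).items():
    print(sheet, sheet_rows)
```

Each group could then be written to its own sheet, for example with pandas.ExcelWriter and DataFrame.to_excel(writer, sheet_name=...).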

Unless enrichment is disabled, all exported files include Description and Example columns populated from the JSON schemas. These columns make the metadata templates easier to understand and fill out.
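Description and Example values can be looked up in a JSON schema. The lookup follows nested properties entries and reads the standard description and examples keywords. The sketch below assumes a plain nested schema. Schemas generated from LinkML may rely on $ref and $defs, which this hypothetical describe function does not resolve. It also does not necessarily reflect how mdstools performs the enrichment.

```python
from typing import Any, Optional


def describe(schema: dict[str, Any], dotted_key: str) -> tuple[Optional[str], Any]:
    """Return (description, first example) for a dotted key path in a JSON schema.

    Only nested "properties" are followed; $ref, allOf, etc. are not resolved.
    """
    node: dict[str, Any] = schema
    for part in dotted_key.split("."):
        properties = node.get("properties", {})
        if part not in properties:
            # Unknown path: nothing to enrich with.
            return None, None
        node = properties[part]
    # "examples" is a list in JSON Schema; take the first entry if present.
    examples = node.get("examples") or [None]
    return node.get("description"), examples[0]


# Made-up schema fragment for illustration.
schema = {
    "type": "object",
    "properties": {
        "system": {
            "type": "object",
            "properties": {
                "temperature": {
                    "type": "number",
                    "description": "Temperature in K.",
                    "examples": [298.15],
                }
            },
        }
    },
}
print(describe(schema, "system.temperature"))
```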

Options

mdstools flatten <yaml_file> [--schema-dir DIR] [--output-dir DIR] [--no-enrichment]

--schema-dir - Directory with JSON schemas (default: schemas)
--output-dir - Output directory (default: generated)
--no-enrichment - Disable enrichment (no Description/Example columns)
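To make the defaults explicit, the documented flatten options can be mirrored with Python's argparse. This is a hypothetical sketch for illustration only. It is not the mdstools CLI source, which may be built differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flatten options."""
    parser = argparse.ArgumentParser(prog="mdstools flatten")
    parser.add_argument("yaml_file", help="YAML metadata file to flatten")
    parser.add_argument(
        "--schema-dir", default="schemas", help="Directory with JSON schemas"
    )
    parser.add_argument(
        "--output-dir", default="generated", help="Output directory"
    )
    # A plain on/off switch: enrichment stays enabled unless the flag is given.
    parser.add_argument(
        "--no-enrichment",
        action="store_true",
        help="Omit the Description/Example columns",
    )
    return parser


args = build_parser().parse_args(["tests/example_metadata.yaml", "--no-enrichment"])
print(args.schema_dir, args.output_dir, args.no_enrichment)
```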

Unflatten Excel/CSV back to YAML

mdstools unflatten generated/example_metadata.xlsx --schema-file schemas/minimum_echemdb.json

Note: All CLI commands can also be run via pixi, e.g., pixi run flatten ... or pixi run unflatten ...

Python API

The mdstools package can also be used programmatically:

from mdstools.metadata.metadata import Metadata
from mdstools.metadata.enriched_metadata import EnrichedFlattenedMetadata

# Load YAML metadata
metadata = Metadata.from_yaml('metadata.yaml')

# Flatten to tabular format
flattened = metadata.flatten()

# Add schema enrichment (descriptions and examples)
enriched = EnrichedFlattenedMetadata(flattened.rows, schema_dir='schemas')

# Get enriched DataFrame
df = enriched.to_pandas()

# Export to various formats
enriched.to_csv('output.csv')
enriched.to_excel('output.xlsx')
enriched.to_excel('output_multi.xlsx', separate_sheets=True)  # One sheet per top-level key
enriched.to_markdown('output.md')

You can also load a flat Excel or CSV file, reconstruct the nested dictionary, and optionally write it back to YAML. The input is expected to have columns named Number, Key, and Value. Enriched Excel files, which additionally contain Description and Example columns, can be loaded as well.

from mdstools.metadata.flattened_metadata import FlattenedMetadata

flattened = FlattenedMetadata.from_excel("generated/example_metadata.xlsx")
metadata = flattened.unflatten()

data = metadata.data  # Nested dict
metadata.to_yaml("generated/example_metadata.yaml")
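Conceptually, unflattening walks each row's key path and creates intermediate dictionaries as needed. The minimal sketch below assumes rows of (dotted_key, value) pairs. This README does not describe how mdstools encodes paths in its Number and Key columns, so the row format and the unflatten function are hypothetical. The sketch does not handle lists or conflicting paths (a key used both as a leaf and as a parent).

```python
from typing import Any


def unflatten(rows: list[tuple[str, Any]]) -> dict[str, Any]:
    """Rebuild a nested dict from (dotted_key, value) rows."""
    result: dict[str, Any] = {}
    for dotted_key, value in rows:
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Create intermediate dicts on first use, then descend into them.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


rows = [
    ("curation.process", "digitized"),
    ("system.electrolyte.type", "aqueous"),
    ("system.electrolyte.temperature", 298.15),
]
print(unflatten(rows))
```

The resulting dict could then be written to YAML, for example with PyYAML's yaml.safe_dump.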

Developer

Run tests

pixi run test              # Run all tests
pixi run doctest           # Run doctests only
pixi run test-comprehensive # Run integration tests only

Or run the test-all task in the dev environment:

pixi run -e dev test-all

Generate schemas from LinkML

Generate JSON schemas and Pydantic models from the LinkML definitions in linkml/:

pixi run generate-schemas  # JSON Schema only
pixi run generate-models   # Pydantic models only
pixi run generate-all      # Both

The generated JSON schemas are written to schemas/.

After intentional changes to LinkML files, update the expected baseline files:

pixi run update-expected-schemas
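This sketch assumes that the expected baseline files are JSON schemas and that the tests compare them against freshly generated output. The exact test mechanism is not described here. Under that assumption, comparing parsed content rather than raw text avoids failures caused only by indentation or key order. The function schemas_match and the file names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def schemas_match(generated: Path, expected: Path) -> bool:
    """Return True if both JSON files hold the same data.

    Parsing before comparing ignores indentation and key order.
    """
    with generated.open(encoding="utf-8") as g, expected.open(encoding="utf-8") as e:
        return json.load(g) == json.load(e)


with tempfile.TemporaryDirectory() as tmp:
    generated = Path(tmp, "generated.json")
    expected = Path(tmp, "expected.json")
    generated.write_text('{"title": "x", "type": "object"}', encoding="utf-8")
    expected.write_text('{\n  "type": "object",\n  "title": "x"\n}', encoding="utf-8")
    same = schemas_match(generated, expected)  # True: same content, different layout
    expected.write_text('{"title": "y", "type": "object"}', encoding="utf-8")
    changed = schemas_match(generated, expected)  # False: a value differs
print(same, changed)
```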

Validate schema files

To validate the example files against the JSON schemas:

pixi run validate                  # Run all validations
pixi run validate-objects          # Validate individual object examples
pixi run validate-file-schemas     # Validate file-level YAML examples
pixi run validate-package-schemas  # Validate package JSON examples
pixi run check-naming              # Enforce naming conventions
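The following deliberately tiny validator shows what validating an example against a JSON schema involves. It supports only the type, required, and properties keywords, with type given as a single string. It is a hypothetical sketch and not what pixi run validate runs. For real checks, use a complete implementation such as the jsonschema package (jsonschema.validate(instance, schema)).

```python
from typing import Any

# Map JSON Schema type names to Python types (subset; bool is excluded from numbers below).
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}


def validate(instance: Any, schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return error messages for a tiny subset of JSON Schema: type, required, properties.

    Unlike recent JSON Schema drafts, floats such as 1.0 are not accepted as integers here.
    """
    errors: list[str] = []
    expected = schema.get("type")
    if expected is not None:
        py_type = _TYPES[expected]  # unsupported type names raise KeyError
        is_bool = isinstance(instance, bool)
        # In Python, bool is a subclass of int, so reject it for numeric types explicitly.
        if not isinstance(instance, py_type) or (is_bool and expected in ("number", "integer")):
            errors.append(f"{path}: expected {expected}")
            return errors
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required property '{name}'")
        # Recurse into declared properties that are present in the instance.
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema, f"{path}.{name}"))
    return errors


# Made-up schema for illustration.
schema = {
    "type": "object",
    "required": ["source"],
    "properties": {
        "source": {"type": "object", "properties": {"version": {"type": "integer"}}},
    },
}
print(validate({"source": {"version": 1}}, schema))
print(validate({"source": {"version": "1"}}, schema))
print(validate({}, schema))
```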

Package schema validation requires the Frictionless Data Package standard schemas. They are downloaded automatically on first run into schemas/frictionless/ (gitignored) and cached for subsequent offline use.
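A small cache helper can model the download-once, reuse-offline behaviour. The function load_cached, the injected fetch callable, and the file name are hypothetical. The actual URLs and the file layout under schemas/frictionless/ are not specified here. In practice, fetch could wrap urllib.request.urlopen(url).read().

```python
import tempfile
from pathlib import Path
from typing import Callable


def load_cached(path: Path, fetch: Callable[[], bytes]) -> bytes:
    """Return the cached file at path, calling fetch only if it does not exist yet."""
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        # First run: obtain the content (e.g. download it) and store it locally.
        path.write_bytes(fetch())
    # Later runs read the local copy and need no network access.
    return path.read_bytes()


calls = []


def fake_fetch() -> bytes:
    """Stand-in for a network download; records each call."""
    calls.append(1)
    return b'{"title": "Data Package"}'


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp, "frictionless", "datapackage.json")
    first = load_cached(cache_file, fake_fetch)
    second = load_cached(cache_file, fake_fetch)
print(len(calls), first == second)
```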
