Commit b7d8b09

refactor: use CanonicalCodeMeta internal data model instead of CodeMeta
* build: need setuptools>=61.0 to build with only a pyproject.toml
* chore: add codemeta.json and CITATION.cff meta examples
1 parent e62a661 commit b7d8b09

File tree

20 files changed

+202
-133
lines changed

CITATION.cff

Lines changed: 18 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,18 @@
1+
abstract: validate and convert between software metadata standards with pydantic
2+
authors:
3+
- affiliation: Arizona State University
4+
5+
family-names: Foster
6+
given-names: Scott
7+
cff-version: 1.2.0
8+
keywords:
9+
- software metadata
10+
- codemeta
11+
- research software
12+
- FAIR
13+
license: GPL-3.0
14+
message: If you use this software, please cite it using the metadata from this file.
15+
repository-code: https://github.com/sgfost/codemeticulous/
16+
title: codemeticulous
17+
type: software
18+
version: 0.1.0

README.md

Lines changed: 6 additions & 9 deletions
@@ -1,14 +1,11 @@
11
![](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fsgfost%2Fcodemeticulous%2Fmain%2Fpyproject.toml) ![](https://img.shields.io/github/license/sgfost/codemeticulous)
22

3-
> [!NOTE]
3+
> [!WARNING]
44
> `codemeticulous` is in an early state of development and things are subject to change. Refer to the [table](#feature-roadmap) below to see currently supported formats and conversions.
55
6-
`codemeticulous` is a python library and command line utility for validating and converting between different metadata standards for software. Validation is done by providing [pydantic](https://docs.pydantic.dev/latest/) models that mirror the standards' schema definitions.
6+
`codemeticulous` is a Python library and command-line utility for working with different metadata standards for software. It provides several [Pydantic](https://docs.pydantic.dev/latest/) models that mirror metadata schemas, allowing simple validation, (de)serialization, and type safety for developers.
77

8-
Currently, CodeMeta is used as a central "hub" representation of software metadata as it is the most exhaustive, and provides [crosswalk definitions](https://codemeta.github.io/crosswalk/) between other formats. This is done in order to avoid the need for a bridge between every format, though custom conversion logic can be implemented where needed.
9-
10-
> [!NOTE]
11-
> This is subject to change, however. There is an argument to be made for whether an even more robust internal data model would be beneficial. Namely, that going through CodeMeta/schema.org means some conversions will be lossy.
8+
For converting between different standards, an extension of [CodeMeta](https://codemeta.github.io/), called `CanonicalCodeMeta`, is used as a canonical data model or central "hub" representation, along with conversion logic back and forth between it and supported standards. This design allows for conversion between any two formats without needing to implement each bridge. CodeMeta was chosen as it is the most exhaustive and provides [crosswalk definitions](https://codemeta.github.io/crosswalk/) between other formats. Still, some data loss can occur, so some extension is needed to fill schema gaps and resolve ambiguity. Note that `CanonicalCodeMeta` is not a proposed standard, but an internal data model used by this library.
129

1310
## Feature Roadmap
1411

@@ -96,15 +93,15 @@ $ codemeticulous validate --format cff CITATION.cff
9693
### As a python library
9794

9895
```python
99-
from codemeticulous.codemeta.models import CodeMeta, Person
100-
from codemeticulous.cff.convert import codemeta_to_cff
96+
from codemeticulous.codemeta import CodeMeta, Person
97+
from codemeticulous import convert
10198

10299
codemeta = CodeMeta(
103100
name="My Project",
104101
author=Person(givenName="Dale", familyName="Earnhardt"),
105102
)
106103

107-
cff = codemeta_to_cff(codemeta)
104+
cff = convert("codemeta", "cff", codemeta)
108105

109106
print(codemeta.json(indent=True))
110107
# {

codemeta.json

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
{
2+
"@context": "https://w3id.org/codemeta/3.0",
3+
"type": "SoftwareSourceCode",
4+
"author": [
5+
{
6+
"id": "_:author_1",
7+
"type": "Person",
8+
"affiliation": {
9+
"type": "Organization",
10+
"name": "Arizona State University"
11+
},
12+
"email": "[email protected]",
13+
"familyName": "Foster",
14+
"givenName": "Scott"
15+
}
16+
],
17+
"codeRepository": "https://github.com/sgfost/codemeticulous/",
18+
"dateCreated": "2024-11-05",
19+
"description": "validate and convert between software metadata standards with pydantic",
20+
"keywords": [
21+
"software metadata",
22+
"codemeta",
23+
"research software",
24+
"FAIR"
25+
],
26+
"license": "https://spdx.org/licenses/GPL-3.0",
27+
"name": "codemeticulous",
28+
"programmingLanguage": "Python 3",
29+
"softwareRequirements": "Python 3.10",
30+
"version": "0.1.0",
31+
"continuousIntegration": "https://github.com/sgfost/codemeticulous/actions/",
32+
"issueTracker": "https://github.com/sgfost/codemeticulous/issues"
33+
}
34+

codemeticulous/__init__.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
from .convert import convert, to_canonical, from_canonical
2+
3+
__all__ = ["convert", "to_canonical", "from_canonical"]

codemeticulous/cff/__init__.py

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,2 @@
1-
from .models import CitationFileFormat
2-
3-
__all__ = ["CitationFileFormat"]
1+
from .models import *
2+
from .convert import *

codemeticulous/cff/convert.py

Lines changed: 6 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -10,10 +10,8 @@
1010
from pydantic2_schemaorg.Person import Person as SchemaOrgPerson
1111
from pydantic2_schemaorg.Organization import Organization as SchemaOrgOrganization
1212

13-
from codemeticulous.codemeta.extract import (
14-
CodeMetaActorExtractor,
15-
extract_doi_from_codemeta_identifier,
16-
)
13+
from codemeticulous.models import CanonicalCodeMeta
14+
from codemeticulous.extract import ActorExtractor, extract_doi_from_identifier
1715
from codemeticulous.codemeta.models import (
1816
CodeMeta,
1917
Actor as CodeMetaActor,
@@ -45,7 +43,7 @@ def codemeta_actors_to_cff(actors: CodeMetaActorListOrSingle) -> list[Person | E
4543
actors = ensure_list(actors)
4644
cff_actors = []
4745
for actor in actors:
48-
extractor = CodeMetaActorExtractor(actor)
46+
extractor = ActorExtractor(actor)
4947
if extractor.is_person:
5048
cff_actors.append(
5149
Person(
@@ -213,12 +211,12 @@ def extract_main_url_from_codemeta(data: CodeMeta) -> str:
213211
)
214212

215213

216-
def codemeta_to_cff(data: CodeMeta) -> CitationFileFormat:
214+
def canonical_to_cff(data: CanonicalCodeMeta) -> CitationFileFormat:
217215
"""Extract all possible Citation File Format fields from a CodeMeta object based
218216
on the CodeMeta crosswalk and return a CitationFileFormat object
219217
"""
220218
licenses, license_urls = codemeta_license_to_cff(data.license)
221-
primary_doi = extract_doi_from_codemeta_identifier(data.identifier)
219+
primary_doi = extract_doi_from_identifier(data.identifier)
222220
return CitationFileFormat(
223221
cff_version="1.2.0",
224222
message="If you use this software, please cite it using the metadata from this file.",
@@ -250,5 +248,5 @@ def codemeta_to_cff(data: CodeMeta) -> CitationFileFormat:
250248
)
251249

252250

253-
def cff_to_codemeta(data: CitationFileFormat) -> CodeMeta:
251+
def cff_to_canonical(data: CitationFileFormat) -> CanonicalCodeMeta:
254252
raise NotImplementedError

codemeticulous/cli.py

Lines changed: 14 additions & 49 deletions
Original file line numberDiff line numberDiff line change
@@ -1,34 +1,9 @@
11
import os
22
import click
33
import json
4-
54
import yaml
65

7-
from codemeticulous.codemeta.models import CodeMeta
8-
from codemeticulous.datacite.models import DataciteV45
9-
from codemeticulous.cff.models import CitationFileFormat
10-
from codemeticulous.datacite.convert import codemeta_to_datacite, datacite_to_codemeta
11-
from codemeticulous.cff.convert import codemeta_to_cff, cff_to_codemeta
12-
13-
14-
models = {
15-
"codemeta": {"model": CodeMeta, "format": "json"},
16-
"datacite": {"model": DataciteV45, "format": "json"},
17-
"cff": {"model": CitationFileFormat, "format": "yaml"},
18-
}
19-
20-
converters = {
21-
"codemeta": {
22-
"datacite": codemeta_to_datacite,
23-
"cff": codemeta_to_cff,
24-
},
25-
"datacite": {
26-
"codemeta": datacite_to_codemeta,
27-
},
28-
"cff": {
29-
"codemeta": cff_to_codemeta,
30-
},
31-
}
6+
from codemeticulous.convert import STANDARDS, convert as _convert
327

338

349
@click.group()
@@ -40,16 +15,16 @@ def cli():
4015
@click.option(
4116
"-f",
4217
"--from",
43-
"from_format",
44-
type=click.Choice(models.keys()),
18+
"source_format",
19+
type=click.Choice(STANDARDS.keys()),
4520
required=True,
4621
help="Source format",
4722
)
4823
@click.option(
4924
"-t",
5025
"--to",
51-
"to_format",
52-
type=click.Choice(models.keys()),
26+
"target_format",
27+
type=click.Choice(STANDARDS.keys()),
5328
required=True,
5429
help="Target format",
5530
)
@@ -62,27 +37,19 @@ def cli():
6237
help="Output file name (by default prints to stdout)",
6338
)
6439
@click.argument("input_file", type=click.Path(exists=True))
65-
def convert(from_format, to_format, input_file, output_file):
66-
if to_format not in converters.get(from_format, {}):
67-
click.echo(
68-
f"Conversion from {from_format} to {to_format} is not supported", err=True
69-
)
70-
return
71-
40+
def convert(source_format: str, target_format: str, input_file, output_file):
7241
try:
73-
input_data = load_and_create_model(input_file, models[from_format]["model"])
74-
except ValueError as e:
75-
click.echo(str(e), err=True)
76-
return
42+
input_data = load_file_autodetect(input_file)
43+
except Exception as e:
44+
click.echo(f"Failed to load file: {input_file}. {str(e)}", err=True)
7745

7846
try:
79-
convert_func = converters[from_format][to_format]
80-
converted_data = convert_func(input_data)
47+
converted_data = _convert(source_format, target_format, input_data)
8148
except Exception as e:
8249
click.echo(f"Error during conversion: {str(e)}", err=True)
8350
return
8451

85-
output_format = models[to_format]["format"]
52+
output_format = STANDARDS[target_format]["format"]
8653

8754
try:
8855
output_data = dump_data(converted_data, output_format)
@@ -102,14 +69,14 @@ def convert(from_format, to_format, input_file, output_file):
10269
"-f",
10370
"--format",
10471
"format_name",
105-
type=click.Choice(models.keys()),
72+
type=click.Choice(STANDARDS.keys()),
10673
required=True,
10774
help="Format to validate",
10875
)
10976
@click.argument("input_file", type=click.Path(exists=True))
11077
def validate(format_name, input_file):
11178
try:
112-
load_and_create_model(input_file, models[format_name]["model"])
79+
load_and_create_model(input_file, STANDARDS[format_name]["model"])
11380
click.echo(f"{input_file} is a valid {format_name} file.")
11481
except ValueError as e:
11582
click.echo(str(e), err=True)
@@ -145,8 +112,6 @@ def load_file_autodetect(file_path):
145112
elif ext in [".yaml", ".yml", ".cff"]:
146113
return yaml.safe_load(file)
147114
else:
148-
raise ValueError(
149-
f"Unsupported file extension: {ext}."
150-
)
115+
raise ValueError(f"Unsupported file extension: {ext}.")
151116
except Exception as e:
152117
raise ValueError(f"Failed to load file: {file_path}. {str(e)}")
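
Review note: in the new `convert` command above, the `except` branch that reports a load failure no longer returns, so execution would continue to the conversion step with `input_data` never assigned. The following is a minimal sketch of the intended control flow, not the project's code. It uses only the standard library, handles JSON only, and the names `load_json_file` and `run_conversion` are hypothetical.

```python
import json
import os
import sys
from typing import Any, Callable


def load_json_file(file_path: str) -> Any:
    """Load a JSON file, rejecting any other extension (YAML is omitted in this sketch)."""
    ext = os.path.splitext(file_path)[1].lower()
    if ext != ".json":
        raise ValueError(f"Unsupported file extension: {ext}.")
    with open(file_path, "r", encoding="utf-8") as file:
        return json.load(file)


def run_conversion(file_path: str, converter: Callable[[Any], Any]) -> int:
    """Load, convert, and print; return a process-style exit code (0 on success)."""
    try:
        input_data = load_json_file(file_path)
    except Exception as e:
        print(f"Failed to load file: {file_path}. {e}", file=sys.stderr)
        return 1  # stop here: input_data was never assigned

    try:
        converted = converter(input_data)
    except Exception as e:
        print(f"Error during conversion: {e}", file=sys.stderr)
        return 1

    print(json.dumps(converted, indent=2))
    return 0
```

Returning immediately after reporting the load error keeps the later steps from failing with an unrelated unbound-variable error.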
Lines changed: 2 additions & 3 deletions
@@ -1,3 +1,2 @@
1-
from .models import CodeMeta
2-
3-
__all__ = ["CodeMeta"]
1+
from .models import *
2+
from .convert import *

codemeticulous/codemeta/convert.py

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
from codemeticulous.models import CanonicalCodeMeta
2+
from codemeticulous.codemeta.models import CodeMeta
3+
4+
5+
def canonical_to_codemeta(data: CanonicalCodeMeta) -> CodeMeta:
6+
return CodeMeta(**data.dict())
7+
8+
9+
def codemeta_to_canonical(data: CodeMeta) -> CanonicalCodeMeta:
10+
return CanonicalCodeMeta(**data.dict())
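
Review note: both helpers above pass every field of one model to the other model's constructor. Whether extra canonical-only fields are ignored, kept, or rejected depends on the Pydantic model configuration, which this diff does not show. The sketch below illustrates the general concern with plain dataclasses instead of Pydantic. The classes `Base` and `CanonicalBase` and the field `funding_note` are hypothetical stand-ins, not part of the library.

```python
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class Base:
    # Hypothetical stand-in for a standard model such as CodeMeta.
    name: str
    version: Optional[str] = None


@dataclass
class CanonicalBase(Base):
    # Hypothetical extension field that exists only in the canonical model.
    funding_note: Optional[str] = None


def base_to_canonical(data: Base) -> CanonicalBase:
    """Widening is safe: every Base field also exists on CanonicalBase."""
    return CanonicalBase(**asdict(data))


def canonical_to_base(data: CanonicalBase) -> Base:
    """Narrowing keeps only the fields that Base declares."""
    allowed = {f.name for f in fields(Base)}
    return Base(**{k: v for k, v in asdict(data).items() if k in allowed})
```

With dataclasses, omitting the filter would raise a `TypeError` for the unknown keyword. An explicit filter makes the lossy step visible regardless of how the target model treats extras.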

codemeticulous/convert.py

Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,55 @@
1+
from codemeticulous.codemeta.models import CodeMeta
2+
from codemeticulous.datacite.models import DataciteV45
3+
from codemeticulous.cff.models import CitationFileFormat
4+
from codemeticulous.codemeta.convert import canonical_to_codemeta, codemeta_to_canonical
5+
from codemeticulous.datacite.convert import canonical_to_datacite, datacite_to_canonical
6+
from codemeticulous.cff.convert import canonical_to_cff, cff_to_canonical
7+
8+
9+
STANDARDS = {
10+
"codemeta": {
11+
"model": CodeMeta,
12+
"format": "json",
13+
"to_canonical": codemeta_to_canonical,
14+
"from_canonical": canonical_to_codemeta,
15+
},
16+
"datacite": {
17+
"model": DataciteV45,
18+
"format": "json",
19+
"to_canonical": datacite_to_canonical,
20+
"from_canonical": canonical_to_datacite,
21+
},
22+
"cff": {
23+
"model": CitationFileFormat,
24+
"format": "yaml",
25+
"to_canonical": cff_to_canonical,
26+
"from_canonical": canonical_to_cff,
27+
},
28+
}
29+
30+
31+
def to_canonical(source_format: str, source_data):
32+
source_model = STANDARDS[source_format]["model"]
33+
if isinstance(source_data, dict):
34+
source_instance = source_model(**source_data)
35+
elif isinstance(source_data, source_model):
36+
source_instance = source_data
37+
38+
source_to_canonical = STANDARDS[source_format]["to_canonical"]
39+
canonical_instance = source_to_canonical(source_instance)
40+
41+
return canonical_instance
42+
43+
44+
def from_canonical(target_format: str, canonical_instance):
45+
canonical_to_target = STANDARDS[target_format]["from_canonical"]
46+
target_instance = canonical_to_target(canonical_instance)
47+
48+
return target_instance
49+
50+
51+
def convert(source_format: str, target_format: str, source_data):
52+
# FIXME: add tons of error handling
53+
54+
canonical_instance = to_canonical(source_format, source_data)
55+
return from_canonical(target_format, canonical_instance)
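
Review note: the `FIXME` above asks for error handling, and `to_canonical` as written leaves `source_instance` unassigned when the input is neither a `dict` nor an instance of the source model. The following is a rough sketch of one way the hub-and-spoke registry could fail loudly. It uses dataclasses rather than the project's Pydantic models, and `ConversionError`, `Canonical`, `Simple`, and `lookup` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


class ConversionError(Exception):
    """Raised for unknown formats or unsupported input types."""


@dataclass
class Canonical:
    # Hypothetical stand-in for CanonicalCodeMeta.
    name: str
    authors: list = field(default_factory=list)


@dataclass
class Simple:
    # Hypothetical stand-in for a supported standard's model.
    title: str
    creators: list = field(default_factory=list)


def simple_to_canonical(data: Simple) -> Canonical:
    return Canonical(name=data.title, authors=list(data.creators))


def canonical_to_simple(data: Canonical) -> Simple:
    return Simple(title=data.name, creators=list(data.authors))


# Each format registers one model and one converter in each direction.
STANDARDS = {
    "simple": {
        "model": Simple,
        "to_canonical": simple_to_canonical,
        "from_canonical": canonical_to_simple,
    },
}


def lookup(format_name: str) -> dict:
    """Return the registry entry or raise a descriptive error."""
    try:
        return STANDARDS[format_name]
    except KeyError:
        raise ConversionError(f"Unsupported format: {format_name!r}") from None


def to_canonical(source_format: str, source_data: Any) -> Canonical:
    entry = lookup(source_format)
    model = entry["model"]
    if isinstance(source_data, dict):
        try:
            instance = model(**source_data)
        except TypeError as e:
            raise ConversionError(f"Invalid {source_format} data: {e}") from e
    elif isinstance(source_data, model):
        instance = source_data
    else:
        # Without this branch the instance variable would be unbound below.
        raise ConversionError(
            f"Expected dict or {model.__name__}, got {type(source_data).__name__}"
        )
    return entry["to_canonical"](instance)


def convert(source_format: str, target_format: str, source_data: Any) -> Any:
    target = lookup(target_format)  # fail early on an unknown target
    canonical = to_canonical(source_format, source_data)
    return target["from_canonical"](canonical)
```

Because every format only converts to and from the hub, adding a format means adding one registry entry rather than one converter per existing format.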
