Commit 2c7a24c

feat: update datacite model to version 4.6
I am not sure what the best way to deal with schema versions is. The easiest approach, and the current strategy, is to provide the latest versions and keep them backward-compatible with prior versions (within reason), i.e.:

* `CodeMetaV3` can accept v2 fields and turn them into v3 equivalents
* `DataCiteV46` implements the latest changes to v4 and does not break compatibility with any other v4 version

Citation File Format provides a JSON Schema, so in that case it would be easy to provide separate models.
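
The "accept old field names, emit new ones" idea from the commit message can be sketched with plain dictionaries. This is only an illustration, not the code in this commit (the repository models use pydantic aliases). The function name `upgrade_v2_keys` and the rename table are hypothetical, and the two renames listed are assumed examples of v2 to v3 changes that should be checked against the CodeMeta crosswalk.

```python
from typing import Any

# Assumed, illustrative v2 -> v3 property renames; verify against the
# CodeMeta crosswalk before relying on them.
V2_TO_V3_RENAMES: dict[str, str] = {
    "contIntegration": "continuousIntegration",
    "embargoDate": "embargoEndDate",
}


def upgrade_v2_keys(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `record` with known v2 keys renamed to v3 names.

    If both the old and the new key are present, the v3 value wins.
    """
    upgraded: dict[str, Any] = {}
    for key, value in record.items():
        new_key = V2_TO_V3_RENAMES.get(key, key)
        if new_key in upgraded and key != new_key:
            continue  # a v3 value is already stored; do not overwrite it
        upgraded[new_key] = value
    return upgraded
```

Input written against either version then produces output that uses only the newer names, which matches the "output will always use the specified version" behaviour described in the README footnote.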
1 parent e97bb3d commit 2c7a24c

7 files changed: 691 additions, 34 deletions

README.md

Lines changed: 17 additions & 17 deletions
@@ -11,35 +11,31 @@ For converting between different standards, an extension of [CodeMeta](https://c

 <table><thead>
 <tr>
-<th colspan="2">Schema</th>
-<th colspan="3">Status<br></th>
+<th>Schema</th>
+<th>Pydantic model</th>
+<th>Backward-compatible with<a href="#1"><sup>[1]</sup></a></th>
+<th>Convert <i>to</i></th>
+<th>Convert <i>from</i></th>
 </tr></thead>
 <tbody>
 <tr>
-<td>Name<br></td>
-<td>Version(s)</td>
-<td>Pydantic Model</td>
-<td>convert <b>to</b><br></td>
-<td>convert <b>from</b><br></td>
-</tr>
-<tr>
-<td><a href="https://codemeta.github.io/">CodeMeta</a><br></td>
-<td><a href="https://w3id.org/codemeta/3.0"><code>v3</code></a></td>
-<td>✅ *</td>
+<td><a href="https://w3id.org/codemeta/3.0">CodeMeta v3</a></td>
+<td>✅<a href="#2"><sup>[2]</sup></a></td>
+<td>v2</td>
 <td>✅</td>
 <td>✅</td>
 </tr>
 <tr>
-<td><a href="https://schema.datacite.org/">Datacite</a></td>
-<td><a href="https://datacite-metadata-schema.readthedocs.io/en/4.5/"><code>v4.5</code></a><br></td>
+<td><a href="https://datacite-metadata-schema.readthedocs.io/en/4.6">Datacite 4.6</a></td>
 <td>✅</td>
+<td>4.0, 4.1, 4.2, 4.3, 4.4, 4.5</td>
 <td>✅</td>
 <td></td>
 </tr>
 <tr>
-<td><a href="https://citation-file-format.github.io/">Citation File Format</a></td>
-<td><a href="https://github.com/citation-file-format/citation-file-format/blob/bd0b31df69dccf11b31584585b5fb8c39d3e0e09/schema.json"><code>1.2.0</a></code></td>
+<td><a href="https://citation-file-format.github.io/">Citation File Format 1.2.0</a></td>
 <td>✅</td>
+<td></td>
 <td>✅</td>
 <td></td>
 </tr>
@@ -67,7 +63,11 @@ For converting between different standards, an extension of [CodeMeta](https://c
 </tbody>
 </table>

-\* The `CodeMeta` model is currently implemented as a pydantic **v1** model, due to a heavy reliance on [pydantic_schemaorg](https://github.com/lexiq-legal/pydantic_schemaorg) which has not been fully updated.
+##### [1]
+Lists the versions that can be safely used as input. Output will always use the specified version. For example, the `CodeMetaV3` model will accept v2 property names and automatically change them to v3 equivalents.
+
+##### [2]
+The `CodeMeta` model is currently implemented as a pydantic **v1** model, due to a heavy reliance on [pydantic_schemaorg](https://github.com/lexiq-legal/pydantic_schemaorg) which has not been fully updated.

 ## Installation

codemeticulous/cff/models.py

Lines changed: 4 additions & 1 deletion
@@ -1575,7 +1575,7 @@ class Reference(BaseModel):
     """


-class CitationFileFormat(ByAliasExcludeNoneMixin, BaseModel):
+class CitationFileFormatV120(ByAliasExcludeNoneMixin, BaseModel):
     model_config = ConfigDict(
         extra="forbid",
         populate_by_name=True,
@@ -1694,3 +1694,6 @@ class CitationFileFormat(ByAliasExcludeNoneMixin, BaseModel):
     """
     The version of the software or dataset.
     """
+
+
+CitationFileFormat = CitationFileFormatV120
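
The rename-plus-alias pattern used here (and repeated for `CodeMeta` and `DataCite` in the files that follow) can be illustrated with a minimal, self-contained sketch. `ExampleSchemaV120` and `ExampleSchema` are hypothetical stand-ins, not names from the repository.

```python
class ExampleSchemaV120:
    """Hypothetical stand-in for a versioned model such as CitationFileFormatV120."""

    schema_version = "1.2.0"


# Unversioned alias: callers that import `ExampleSchema` keep working and
# follow whichever versioned class the alias currently points to.
ExampleSchema = ExampleSchemaV120

record = ExampleSchema()
```

Because the alias is just a second name for the same class object, `isinstance` checks and existing imports are unaffected. One consequence worth knowing is that the class's `__name__` (and so its repr and error messages) reports the versioned name.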

codemeticulous/codemeta/models.py

Lines changed: 4 additions & 1 deletion
@@ -40,7 +40,7 @@ class VersionedLanguage(ComputerLanguage):
 SoftwareListOrSingle = Software | SoftwareList


-class CodeMeta(ByAliasExcludeNoneMixin, BaseModel):
+class CodeMetaV3(ByAliasExcludeNoneMixin, BaseModel):
     """CodeMeta v3 schema (supports v2 fields aliased to v3)
     see: https://codemeta.github.io/terms/
     and: https://github.com/codemeta/codemeta-generator/blob/master/js/validation/
@@ -272,3 +272,6 @@ def validate_sub_type(cls, value, base_class):

     class Config:
         allow_population_by_field_name = True
+
+
+CodeMeta = CodeMetaV3

codemeticulous/convert.py

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 from codemeticulous.codemeta.models import CodeMeta
-from codemeticulous.datacite.models import DataciteV45
+from codemeticulous.datacite.models import DataCite
 from codemeticulous.cff.models import CitationFileFormat
 from codemeticulous.codemeta.convert import canonical_to_codemeta, codemeta_to_canonical
 from codemeticulous.datacite.convert import canonical_to_datacite, datacite_to_canonical
@@ -14,7 +14,7 @@
         "from_canonical": canonical_to_codemeta,
     },
     "datacite": {
-        "model": DataciteV45,
+        "model": DataCite,
         "format": "json",
         "to_canonical": datacite_to_canonical,
         "from_canonical": canonical_to_datacite,

codemeticulous/datacite/convert.py

Lines changed: 21 additions & 9 deletions
@@ -26,7 +26,7 @@
     Contributor,
     ContributorType,
     Creator,
-    DataciteV45,
+    DataCite,
     DateModel,
     Description,
     NameIdentifier,
@@ -154,6 +154,7 @@
         "financial backer",
     },
     ContributorType.Supervisor: {"supervisor", "advisor", "overseer", "mentor"},
+    ContributorType.Translator: {"translator", "translation"},
     ContributorType.WorkPackageLeader: {
         "work package leader",
         "package leader",
@@ -222,9 +223,17 @@ def codemeta_actors_to_datacite(
             # match roles to contributor types
             matched_roles = set()
             for role in extractor.role_names:
-                normalized_role = role.lower().replace(" ", "").replace("-", "").replace("_", "")
-                for contributor_type, synonyms, in CONTRIBUTOR_TYPE_MAP.items():
-                    if normalized_role in {s.lower().replace(" ", "").replace("-", "").replace("_", "") for s in synonyms}:
+                normalized_role = (
+                    role.lower().replace(" ", "").replace("-", "").replace("_", "")
+                )
+                for (
+                    contributor_type,
+                    synonyms,
+                ) in CONTRIBUTOR_TYPE_MAP.items():
+                    if normalized_role in {
+                        s.lower().replace(" ", "").replace("-", "").replace("_", "")
+                        for s in synonyms
+                    }:
                         matched_roles.add(contributor_type)
             # if we have no matched roles, default to "Other"
             if not matched_roles:
@@ -235,13 +244,16 @@ def codemeta_actors_to_datacite(
                     Contributor(
                         name=extractor.name,
                         nameType=(
-                            "Organizational" if extractor.is_organization else "Personal"
+                            "Organizational"
+                            if extractor.is_organization
+                            else "Personal"
                         ),
                         givenName=extractor.given_names,
                         familyName=extractor.family_names,
                         nameIdentifiers=[NameIdentifier(**i) for i in name_identifiers]
                         or None,
-                        affiliation=[AffiliationItem(**a) for a in affiliations] or None,
+                        affiliation=[AffiliationItem(**a) for a in affiliations]
+                        or None,
                         contributorType=role_type.value,
                     )
                 )
@@ -307,7 +319,7 @@ def codemeta_language_fileformat_to_datacite_format(

 def canonical_to_datacite(
     data: CanonicalCodeMeta, ignore_existing_doi=False, **custom_fields
-) -> DataciteV45:
+) -> DataCite:
     primary_doi = (
         extract_doi_from_identifier(data.identifier)
         if not ignore_existing_doi
@@ -328,7 +340,7 @@ def canonical_to_datacite(
             for note in release_notes
         ]
     )
-    return DataciteV45(
+    return DataCite(
         doi=primary_doi,
         prefix=doi_prefix,
         suffix=doi_suffix,
@@ -372,7 +384,7 @@ def canonical_to_datacite(
     )


-def datacite_to_canonical(data: DataciteV45) -> CanonicalCodeMeta:
+def datacite_to_canonical(data: DataCite) -> CanonicalCodeMeta:
     raise NotImplementedError(
         "DataCite metadata is not yet supported as an input format"
     )
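
The role-matching loop reformatted in this file normalizes every synonym again on each comparison. As a hedged sketch of the same matching logic, the version below normalizes the synonyms once up front. It uses a reduced `ContributorType` enum and only the two synonym sets fully visible in the diff; `normalize`, `NORMALIZED_MAP`, and `match_roles` are hypothetical names, not functions from the repository.

```python
from enum import Enum


class ContributorType(Enum):
    # Subset of the DataCite contributor types, for illustration only.
    Supervisor = "Supervisor"
    Translator = "Translator"
    Other = "Other"


CONTRIBUTOR_TYPE_MAP = {
    ContributorType.Supervisor: {"supervisor", "advisor", "overseer", "mentor"},
    ContributorType.Translator: {"translator", "translation"},
}


def normalize(text: str) -> str:
    """Lowercase and drop spaces, hyphens, and underscores."""
    return text.lower().replace(" ", "").replace("-", "").replace("_", "")


# Normalize the synonyms once instead of on every comparison.
NORMALIZED_MAP = {
    ctype: {normalize(s) for s in synonyms}
    for ctype, synonyms in CONTRIBUTOR_TYPE_MAP.items()
}


def match_roles(role_names):
    """Map free-text role names to contributor types, defaulting to Other."""
    matched = {
        ctype
        for role in role_names
        for ctype, synonyms in NORMALIZED_MAP.items()
        if normalize(role) in synonyms
    }
    return matched or {ContributorType.Other}
```

With this map, a role such as `"Translation"` resolves to `ContributorType.Translator`, and an unrecognized role falls back to `ContributorType.Other`, mirroring the default described in the diff's comment.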

codemeticulous/datacite/models.py

Lines changed: 15 additions & 4 deletions
@@ -1,5 +1,5 @@
 """
-generated by datamodel-codegen (schema/datacite/schema45.json)
+generated by datamodel-codegen (schema/datacite/schema46.json)
 with options:
 --output-model-type pydantic_v2.BaseModel \
 --field-constraints \
@@ -11,9 +11,9 @@
 --disable-timestamp

 MANUAL CHANGES:
-- add ByAliasExcludeNoneMixin to DataciteV45
+- add ByAliasExcludeNoneMixin to DataCiteV46
 - add identifier and identifierType to Container, this is included in examples
-for schema 4.5 but not in the schema itself..
+but not in the source jsonschema
 """

 from __future__ import annotations
@@ -163,6 +163,7 @@ class ContributorType(Enum):
     RightsHolder = "RightsHolder"
     Sponsor = "Sponsor"
     Supervisor = "Supervisor"
+    Translator = "Translator"
     WorkPackageLeader = "WorkPackageLeader"
     Other = "Other"

@@ -179,6 +180,7 @@ class DateType(Enum):
     Available = "Available"
     Copyrighted = "Copyrighted"
     Collected = "Collected"
+    Coverage = "Coverage"
     Created = "Created"
     Issued = "Issued"
     Submitted = "Submitted"
@@ -190,6 +192,7 @@ class DateType(Enum):

 class ResourceTypeGeneral(Enum):
     Audiovisual = "Audiovisual"
+    Award = "Award"
     Book = "Book"
     BookChapter = "BookChapter"
     Collection = "Collection"
@@ -210,6 +213,7 @@ class ResourceTypeGeneral(Enum):
     PeerReview = "PeerReview"
     PhysicalObject = "PhysicalObject"
     Preprint = "Preprint"
+    Project = "Project"
     Report = "Report"
     Service = "Service"
     Software = "Software"
@@ -225,6 +229,7 @@ class RelatedIdentifierType(Enum):
     ARK = "ARK"
     arXiv = "arXiv"
     bibcode = "bibcode"
+    CTSR = "CTSR"
     DOI = "DOI"
     EAN13 = "EAN13"
     EISSN = "EISSN"
@@ -237,6 +242,7 @@ class RelatedIdentifierType(Enum):
     LSID = "LSID"
     PMID = "PMID"
     PURL = "PURL"
+    RRID = "RRID"
     UPC = "UPC"
     URL = "URL"
     URN = "URN"
@@ -280,6 +286,8 @@ class RelationType(Enum):
     Requires = "Requires"
     IsObsoletedBy = "IsObsoletedBy"
     Obsoletes = "Obsoletes"
+    isTranslationOf = "isTranslationOf"
+    hasTranslation = "hasTranslation"


 class RelatedObject(BaseModel):
@@ -469,7 +477,7 @@ class RelatedItem(RelatedObject):
     relationType: RelationType


-class DataciteV45(ByAliasExcludeNoneMixin, BaseModel):
+class DataciteV46(ByAliasExcludeNoneMixin, BaseModel):
     model_config = ConfigDict(
         extra="forbid",
         populate_by_name=True,
@@ -502,3 +510,6 @@ class DataciteV45(ByAliasExcludeNoneMixin, BaseModel):
         "http://datacite.org/schema/kernel-4"
     )
     container: Optional[Container] = None
+
+
+DataCite = DataciteV46
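
The enum changes in this file only add members, which is what lets a 4.6 model keep accepting 4.x input. A minimal sketch of how that "additive only" property could be checked is shown below. `DateType45`, `DateType46`, and `is_backward_compatible` are hypothetical names (the repository keeps a single `DateType`), and the member lists are limited to the values visible in the diff context above rather than the full DataCite vocabulary.

```python
from enum import Enum


class DateType45(Enum):
    # Partial, illustrative member list taken from the diff context.
    Available = "Available"
    Copyrighted = "Copyrighted"
    Collected = "Collected"
    Created = "Created"
    Issued = "Issued"
    Submitted = "Submitted"


class DateType46(Enum):
    # Same partial list plus the member added in this commit.
    Available = "Available"
    Copyrighted = "Copyrighted"
    Collected = "Collected"
    Coverage = "Coverage"
    Created = "Created"
    Issued = "Issued"
    Submitted = "Submitted"


def is_backward_compatible(newer, older) -> bool:
    """True if every value accepted by `older` is also accepted by `newer`."""
    return {m.value for m in older} <= {m.value for m in newer}
```

Under these assumptions the check passes in one direction only: every 4.5 value parses with the 4.6 enum, while `"Coverage"` is rejected by the older one.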
