Open
Creators of models and datasets are left a lot of freedom, particularly over model dependencies and data formats, because Scivision is supposed to work with a wider range of models and data than we could anticipate.
Despite this, there are certainly some recommendations we could make, even if it would be hard to make them requirements.
We can link to recommendations from others (general advice or community/library specific).
Some ideas below - please update the list with more!
General
Model authors
- Platform portability
- Package dependencies: ideally pin every primary dependency either to a range (with both a lower and an upper bound) or to the current version, which is known to work
- TensorFlow-specific advice
  - ...
- PyTorch-specific advice
  - ...
- Testing
  - Include a test that runs the model on toy data. Check the output at an appropriate level: for example, check for NaN values, or for a classifier check the most probable class. We probably don't want to insist on bitwise reproducibility.
  - Insist on pytest?
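To make the testing suggestion concrete, here is a minimal sketch of what a toy-data test could look like, written so pytest would collect it. `toy_classifier` and its weights are hypothetical stand-ins for a real model, not part of Scivision; the checks are the ones suggested above (no NaN, most probable class) rather than bitwise equality.

```python
import math


def toy_classifier(features):
    """Hypothetical stand-in for a real model: returns softmax class probabilities."""
    # Fixed illustrative weights, one row per class (assumption, not a real model).
    weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def test_output_is_finite():
    # Catches NaN and infinity without requiring exact values.
    probs = toy_classifier([2.0, 0.5])
    assert all(math.isfinite(p) for p in probs)


def test_probabilities_sum_to_one():
    # Tolerance-based check instead of bitwise reproducibility.
    probs = toy_classifier([2.0, 0.5])
    assert math.isclose(sum(probs), 1.0, rel_tol=1e-9)


def test_most_probable_class():
    # The first feature dominates, so class 0 should be the most probable.
    probs = toy_classifier([2.0, 0.5])
    assert probs.index(max(probs)) == 0
```

A real model's test would swap `toy_classifier` for the model's own prediction call and use a small bundled input; the assertions would stay at this coarse level.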
Data providers
- Some suggested options for data storage (e.g. [ENH] Investigate HuggingFace for data storage #317)
- DOI creation
- Size considerations: the expectation is that datasets can be tried out quickly, fit on available services, and be downloaded to users' machines.
  - If a dataset is 'large', include a "sample" dataset (e.g. hosted on Zenodo)
  - There should be an option to try out a dataset within a download limit of 10-100 MB
  - The sample could sit alongside a larger version of the data that is also in the catalog (consider how to link these - via a 'project'?)
  - 'Available on request' option (via a 'homepage'/'contact' URL - not currently in the data catalog)
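As one possible way for a data provider to check a sample against the size suggestion, the sketch below sums the file sizes under a local directory and compares the total with a limit. The function names and the 100 MB default are assumptions for illustration (taken from the upper end of the 10-100 MB range above), not an existing Scivision feature.

```python
from pathlib import Path


def total_size_bytes(root):
    """Sum the sizes of all regular files under root, recursively."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def fits_sample_limit(root, limit_mb=100):
    """Return True if the directory is within limit_mb megabytes (hypothetical default)."""
    return total_size_bytes(root) <= limit_mb * 1024 * 1024
```

Calling `fits_sample_limit("path/to/sample")` before uploading a sample dataset would flag anything too large to try out quickly.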