New biomol fields #400
base: develop
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2617,6 +2617,7 @@ species | |
| - :property:`nattached`: list of integers (OPTIONAL) | ||
| - :property:`mass`: list of floats (OPTIONAL) | ||
| - :property:`original_name`: string (OPTIONAL). | ||
| - :property:`_biomol_atom_name`: string (OPTIONAL). | ||
|
|
||
| - **Requirements/Conventions**: | ||
|
|
||
|
|
@@ -2655,6 +2656,8 @@ species | |
|
|
||
| **Note**: With regard to "source database", we refer to the immediate source being queried via the OPTIMADE API implementation. | ||
| The main use of this field is for source databases that use species names containing characters that are not allowed (see description of the list property `species_at_sites`_). | ||
|
|
||
| - **\_biomol\_atom\_name**: OPTIONAL. Name of the atom according to the biomolecular field standards. | ||
|
|
||
| - Systems that have only species formed by a single chemical symbol, and that have at most one species per chemical symbol, SHOULD use the chemical symbol as species name (e.g., :val:`"Ti"` for titanium, :val:`"O"` for oxygen, etc.) | ||
| However, note that this is OPTIONAL, and client implementations MUST NOT assume that the key corresponds to a chemical symbol, nor assume that if the species name is a valid chemical symbol, that it represents a species with that chemical symbol. | ||
|
|
@@ -3148,6 +3151,129 @@ Relationships with files may be used to relate an entry with any number of :entr | |
| Appendices | ||
| ========== | ||
|
|
||
| Domain Specific Fields | ||
| ---------------------- | ||
|
|
||
| The fields below are all optional and are only used within specific research fields. | ||
|
|
||
| Every field has a standard domain-specific prefix. | ||
|
|
||
| _biomol_residues | ||
| ~~~~~~~~~~~~~~~~ | ||
|
|
||
| - **Description**: For each residue in the system there is a dictionary that describes this residue. Residues are groups of related atoms (e.g. an amino acid). | ||
| Databases are allowed to add more properties as long as the properties are prefixed with the database specific prefix. | ||
| - **Type**: list of dictionaries with the properties: | ||
| - :property:`name`: string (REQUIRED) | ||
| - :property:`number`: integer (REQUIRED) | ||
| - :property:`insertion_code`: string or null (REQUIRED) | ||
| - :property:`chain`: string (OPTIONAL) | ||
|
Member
What does it mean when
Contributor
Author
It may happen in a regular PDB file that the chain column is blank and this is not necessarily wrong. I don't think there is any physical or chemical meaning. Chains are something very custom and there is not a strict criteria for setting them. In our database when chains are missing we set them automatically using a chain per fragment logic but this is just to have the data standardized. Some tools just set all atoms belonging to chain 'X' and some tools simply respect that and let the structure without chains.
Member
Thanks for explanation. But maybe then it would make sense to make
Contributor
Author
Sure, it also works to me. We also talked about not getting constrained by the limits of PDB format regarding the 1 character string in the chain name, so the missing chain could also be 'Not defined', '', null or many others. As you prefer.
||
| - **Requirements/Conventions**: | ||
| - **Query**: Support for queries on this property is OPTIONAL. | ||
| If supported, only a subset of the filter features MAY be supported. | ||
| - **name**: The residue name. | ||
| - **number**: The residue number according to source notation. | ||
| - **insertion_code**: The residue insertion code. It MUST NOT be longer than 1 character. It MAY be null. | ||
| - **chain**: The identifier of the chain this residue belongs to. | ||
| - Values in :property:`chain` SHOULD be in capital letters. | ||
| - Values in :property:`chain` SHOULD NOT be longer than 1 character when the number of chains is not greater than the number of letters in the English alphabet (26). | ||
| - There MUST NOT be two or more residues with the same integer in :property:`sites`. | ||
|
||
| - All :property:`name` and :property:`insertion_code` values SHOULD be in capital letters. | ||
|
|
||
| - **Examples**: | ||
|
|
||
| .. code:: jsonc | ||
| { | ||
| "_biomol_residues":[ | ||
| { | ||
| "name": "PHE", | ||
| "number": 17, | ||
| "insertion_code": null, | ||
| "chain": "A" | ||
| }, | ||
| { | ||
| "name": "ASP", | ||
| "number": 18, | ||
| "insertion_code": null, | ||
| "chain": "A" | ||
| }, | ||
| { | ||
| "name": "LEU", | ||
| "number": 18, | ||
| "insertion_code": "A", | ||
| "chain": "A" | ||
| } | ||
| ] | ||
| } | ||
|
|
||
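The MUST-level requirements listed for `_biomol_residues` can be checked mechanically. The following is a minimal sketch in Python, not part of the specification: the function name ``check_biomol_residues`` and the problem messages are hypothetical, and the SHOULD-level conventions (capital letters, chain length) are deliberately not enforced.

```python
from typing import Any, Dict, List

# Keys marked REQUIRED in the Type list of _biomol_residues.
REQUIRED_KEYS = ("name", "number", "insertion_code")


def check_biomol_residues(residues: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means none were found.

    Hypothetical helper: it only covers the REQUIRED keys and the
    one-character limit on insertion_code.
    """
    problems: List[str] = []
    for i, res in enumerate(residues):
        missing = [key for key in REQUIRED_KEYS if key not in res]
        if missing:
            problems.append(f"residue {i}: missing required keys {missing}")
            continue
        if not isinstance(res["name"], str):
            problems.append(f"residue {i}: name must be a string")
        number = res["number"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(number, bool) or not isinstance(number, int):
            problems.append(f"residue {i}: number must be an integer")
        code = res["insertion_code"]
        # insertion_code MAY be null, otherwise at most 1 character.
        if code is not None and not (isinstance(code, str) and len(code) <= 1):
            problems.append(
                f"residue {i}: insertion_code must be null or at most 1 character"
            )
    return problems
```

Applied to the example above (with ``null`` mapped to ``None``), the helper is expected to report no problems.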
| _biomol_residues_at_sites | ||
| ~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| - **Description**: Index of the residue at each site (where values for sites are specified in the same order as in the property `cartesian_site_positions`_). | ||
| The properties of the residues are found in the property `_biomol_residues`_. | ||
| - **Type**: list of integers. | ||
| - **Requirements/Conventions**: | ||
| - **Support**: SHOULD be supported by all biomol implementations, i.e., SHOULD NOT be :val:`null`. | ||
|
||
| - **Query**: Support for queries on this property is OPTIONAL. | ||
| If supported, filters MAY support only a subset of comparison operators. | ||
| - MUST have length equal to the number of sites in the structure (first dimension of the list property `cartesian_site_positions`_). | ||
|
||
| - Residue indices mentioned in the `_biomol_residues_at_sites`_ list MUST be lower than the length of the list property `_biomol_residues`_ (i.e. for each value in the `_biomol_residues_at_sites`_ list there MUST exist one dictionary in the `_biomol_residues`_ list with the index equal to the corresponding `_biomol_residues_at_sites`_ value). | ||
|
||
|
|
||
| - **Examples**: | ||
|
|
||
| - :val:`[0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1, ... ]` indicates that the first 8 sites belong to the first residue in the `_biomol_residues`_ list, while the following 9 sites belong to the second residue. | ||
|
|
||
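The two MUST-level constraints on `_biomol_residues_at_sites` (length equal to the number of sites, every index lower than the length of `_biomol_residues`) can be illustrated with a short Python sketch. The function names are hypothetical, and rejecting negative indices is an assumption based on the zero-based example above rather than something the text states.

```python
from typing import Dict, List


def check_residues_at_sites(
    residues_at_sites: List[int], n_sites: int, n_residues: int
) -> None:
    """Raise ValueError if the list breaks one of the two constraints."""
    if len(residues_at_sites) != n_sites:
        raise ValueError(
            f"expected {n_sites} entries, got {len(residues_at_sites)}"
        )
    for site, index in enumerate(residues_at_sites):
        # Assumption: indices are zero-based, so negative values are rejected.
        if not 0 <= index < n_residues:
            raise ValueError(f"site {site}: residue index {index} out of range")


def sites_by_residue(residues_at_sites: List[int]) -> Dict[int, List[int]]:
    """Group site indices by the residue index they point to."""
    groups: Dict[int, List[int]] = {}
    for site, index in enumerate(residues_at_sites):
        groups.setdefault(index, []).append(site)
    return groups
```

For the example list, ``sites_by_residue`` would map residue 0 to sites 0 through 7 and residue 1 to sites 8 through 16.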
| _biomol_site_sequences | ||
| ~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| - **Description**: A list of dictionaries, each representing a linear segment of covalently-linked standard or modified amino acids or nucleotides having atoms with coordinates in sites. | ||
| The order of the elements in the `_biomol_site_sequences`_ list is not relevant. | ||
| Each dictionary in the list holds two keys: :property:`sequence` and :property:`type`. | ||
| The sequence is a string of one-letter codes identifying each amino acid or nucleotide as defined by the `mmCIF standard <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entity_poly.pdbx_seq_one_letter_code.html>`__. | ||
| The type is a string defining the monomers of the sequence. | ||
| Accepted values are :val:`"polypeptide"` for amino acids, :val:`"polydeoxyribonucleotide"` for deoxyribonucleotides and :val:`"polyribonucleotide"` for ribonucleotides, according to the `mmCIF standard <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_pdbx_reference_linked_entity.link_to_entity_type.html>`__. | ||
| - **Type**: list of dictionaries with the properties: | ||
| - :property:`sequence`: string (REQUIRED) | ||
| - :property:`type`: string (REQUIRED) | ||
| - **Requirements/Conventions**: | ||
| - **Query**: Queries on this property SHOULD be supported. | ||
| - **sequence**: A string with a letter for each residue in the sequence. Letters SHOULD be capital letters. | ||
| - **type**: The type of a sequence is defined by the type of its residues (e.g. "polypeptide"). | ||
|
|
||
| - **Examples**: | ||
|
|
||
| .. code:: jsonc | ||
| { | ||
| "_biomol_site_sequences":[ | ||
| { | ||
| "sequence": "MSHHWGYG", | ||
| "type": "polypeptide" | ||
| }, | ||
| { | ||
| "sequence": "GATTACA", | ||
| "type": "polydeoxyribonucleotide" | ||
| } | ||
| ] | ||
| } | ||
|
|
||
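A client or server could check entries of `_biomol_site_sequences` against the accepted values for :property:`type` and the capital-letter convention for :property:`sequence`. The Python sketch below is only an illustration: ``check_sequences`` and its messages are hypothetical names, and the capital-letter check reports a SHOULD-level convention, not a hard error.

```python
from typing import Any, Dict, List

# Accepted values for "type" as listed in the description above.
ACCEPTED_TYPES = (
    "polypeptide",
    "polydeoxyribonucleotide",
    "polyribonucleotide",
)


def check_sequences(entries: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems: List[str] = []
    for i, entry in enumerate(entries):
        sequence = entry.get("sequence")
        seq_type = entry.get("type")
        if not isinstance(sequence, str):
            problems.append(f"entry {i}: sequence must be a string")
        elif sequence != sequence.upper():
            # SHOULD-level convention: letters SHOULD be capital letters.
            problems.append(f"entry {i}: sequence SHOULD use capital letters")
        if seq_type not in ACCEPTED_TYPES:
            problems.append(f"entry {i}: unexpected type {seq_type!r}")
    return problems
```

Because `_biomol_full_sequences` uses the same schema, the same helper could be applied to it unchanged.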
| _biomol_full_sequences | ||
| ~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| - **Description**: A list of dictionaries, each representing a linear segment of covalently-linked standard or modified amino acids or nucleotides including residues without coordinates in sites. The order of the elements in the `_biomol_full_sequences`_ list is not relevant. | ||
| Each element in the list is a dictionary, with the same schema defined for `_biomol_site_sequences`_. | ||
|
|
||
| - **Examples**: | ||
|
|
||
| .. code:: jsonc | ||
| { | ||
| "_biomol_full_sequences":[ | ||
| { | ||
| "sequence": "MSHHWGYG", | ||
| "type": "polypeptide" | ||
| }, | ||
| { | ||
| "sequence": "GATTACA", | ||
| "type": "polydeoxyribonucleotide" | ||
| } | ||
| ] | ||
| } | ||
|
|
||
|
|
||
| The Filter Language EBNF Grammar | ||
| -------------------------------- | ||
|
|
||
|
|
||