Flattening high value parameters onto parent entity #37

@patrick-austin

Taken from notes of ICAT F2F, in response to question asked by @antolinos:

Can the set of indexed entities be controlled, for example if we are only interested in Datasets and a few specific DatasetParameters (~6 valuable ones; the rest are not interesting)?

  • entitiesToIndex is a config option in ICAT server. Only these will be indexed by Lucene/the search engine backend. This config option will still be present in the "new" version of free text search.
  • Currently, all Parameters are stored in their own indices (one each for Investigation, Dataset, Datafile and Sample). When searching/faceting, under the hood we "join" the main entity index to the corresponding Parameter index.
  • Joining has a negative performance impact, but it is the only way to retain nested lists of objects (i.e. the only way to keep type.name and type.units associated with the same numericValue).
  • In your use case, where only certain Parameters are valuable, it would be better to "flatten" these Parameters into fields on the Dataset document (as you have already done), since you do not need to worry about updating these Parameters or about an explosion in the number of Parameter fields.
    • This is not currently possible with either the icat.lucene or the OS/ES backend. In principle, however, it should be possible by writing additional logic (and it would be more performant), provided you don't mind the following drawbacks:
      • Parameters need to be reduced to key:value pairs, so units would need to be embedded into either the key or the value, rangeTop/rangeBottom would need to be mapped to a single value, etc.
      • You would not be able to easily modify the "ParameterType" information, e.g. changing the ParameterType.name would mean changing the mapping of the entire index, or adding an alias for the field, which would need very specific logic compared to the rest of the functionality.
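
As a rough illustration of the flattening described above, here is a minimal Python sketch. It is not taken from icat.lucene, icat.server or DataGateway: the function name `flatten_parameters`, the `wanted` set, the `name_units` key format and the choice of the range midpoint as the single value are all assumptions made for the example. The document is modelled as a plain dict using the field names mentioned in the notes (type.name, type.units, numericValue, rangeTop, rangeBottom).

```python
from typing import Any


def flatten_parameters(dataset: dict[str, Any], wanted: set[str]) -> dict[str, Any]:
    """Return a copy of a Dataset document with selected Parameters
    flattened into top-level key:value fields (hypothetical helper)."""
    # Copy every field except the nested list of Parameter objects.
    doc = {key: value for key, value in dataset.items() if key != "parameters"}

    for param in dataset.get("parameters", []):
        name = param["type"]["name"]
        if name not in wanted:
            continue  # only the few "valuable" Parameters are flattened

        # Assumption: units are embedded into the key, e.g. "temperature_K".
        units = param["type"].get("units")
        key = f"{name}_{units}" if units else name

        value = param.get("numericValue")
        bottom, top = param.get("rangeBottom"), param.get("rangeTop")
        if value is None and bottom is not None and top is not None:
            # Assumption: a range is mapped to a single value via its midpoint.
            value = (bottom + top) / 2

        if value is not None:
            doc[key] = value

    return doc


if __name__ == "__main__":
    dataset = {
        "id": 1,
        "name": "scan_01",
        "parameters": [
            {"type": {"name": "temperature", "units": "K"}, "numericValue": 293.0},
            {"type": {"name": "energy", "units": "keV"}, "rangeBottom": 10.0, "rangeTop": 14.0},
        ],
    }
    print(flatten_parameters(dataset, {"temperature", "energy"}))
```

Because the units end up in the field name, renaming a ParameterType or changing its units would change the field key itself, which is the second drawback listed above.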

To implement this, changes would also be needed in icat.server and DataGateway.
