Skip to content

Bring canonization up to current commit and optimize. #83

@AgentElement

Description

@AgentElement

It is difficult to build our crate on Windows, because of our dependence on Nauty. We want to get rid of this dependency. We only use its canonical-representation subroutine, which obtains a hashable representation of a graph, such that two isomorphic graphs have the same representation.

@devrz45's canonization branch has a function that does exactly this, and it is specific to molecular graphs (and therefore possibly faster). However,

  1. It branched off of the main codebase early, and needs a rebase and update to interface with the current molecule::Molecule.
  2. We need to optimize it to be at least as fast as nauty.

For (2), I have a few optimization ideas:

(a) Get rid of all internal string operations (string formatting, etc). I think the main place where string operations are used is to index into MolAtomNode::canonize::DAG_vertex_map.
(b) Remove almost all uses of hashmaps/hashsets. Use bitsets/vectors in their place if feasible. Note that NodeIndexs and EdgeIndexs are contiguous, so it is possible to index a vector or bitset by {NodeIndex/EdgeIndex}::index(). Such a representation will likely be faster than a hashmap/hashset.

Ideally, we'd have a prototype that looks like this:

// Very lightweight representation, clones should be very cheap. 
// The meat of this struct might be a `Vec<usize>`, or similar.
#[derive(Hash, Copy, Clone)]
struct CanonicalRepresentation { ... }

// Should run in ~100us for most subgraphs of (say) Morphine.
fn canonical_representation(molecule: &Molecule) -> CanonicalRepresentation

Metadata

Metadata

Assignees

Labels

algorithmA new or improved algorithm

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions