-
Notifications
You must be signed in to change notification settings - Fork 2
Description
It is difficult to build our crate on Windows, because of our dependence on Nauty. We want to get rid of this dependency. We only use its canonical-representation subroutine, which obtains a hashable representation of a graph, such that two isomorphic graphs have the same representation.
@devrz45's canonization branch has a function that does exactly this, and it is specific to molecular graphs (and therefore possibly faster). However,
- It branched off of the main codebase early, and needs a rebase and update to interface with the current
molecule::Molecule. - We need to optimize it to be at least as fast as nauty.
For (2), I have a few optimization ideas:
(a) Get rid of all internal string operations (string formatting, etc). I think the main place where string operations are used is to index into MolAtomNode::canonize::DAG_vertex_map.
(b) Remove almost all uses of hashmaps/hashsets. Use bitsets/vectors in their place if feasible. Note that NodeIndexs and EdgeIndexs are contiguous, so it is possible to index a vector or bitset by {NodeIndex/EdgeIndex}::index(). Such a representation will likely be faster than a hashmap/hashset.
Ideally, we'd have a prototype that looks like this:
// Very lightweight representation, clones should be very cheap.
// The meat of this struct might be a `Vec<usize>`, or similar.
#[derive(Hash, Copy, Clone)]
struct CanonicalRepresentation { ... }
// Should run in ~100us for most subgraphs of (say) Morphine.
fn canonical_representation(molecule: &Molecule) -> CanonicalRepresentation