
Conversation

richarddushime commented Jul 25, 2025

This PR fixes a small part of #16

This PR covers tests from 3 legacy notebooks:

  • 2_polblogs_benchmark.ipynb
  • 3_verify_shapes.ipynb
  • 5_sampling.ipynb

richarddushime changed the title from "Tests misc" to "Miscellaneous tests to add (from legacy notebooks)" on Jul 27, 2025
richarddushime (Author):

@pchlenski, would you please give feedback?

pchlenski (Owner):

@richarddushime Please be patient, as I have several high-priority tasks I need to finish first. I will try to look at this within the next week.

richarddushime (Author):

> @richarddushime Please be patient, as I have several high-priority tasks I need to finish first. I will try to look at this within the next week.

Thanks 🙏

dist2_sum = sum(M.dist2(X1[:, slc], X2[:, slc]) for M, slc in zip(pm.P, slices))

assert torch.allclose(pdist2_total, pdist2_sum, atol=1e-5), "pdist2 does not match sum of component pdist2"
assert torch.allclose(dist2_total, dist2_sum, atol=1e-5), "dist2 does not match sum of component dist2"
pchlenski (Owner):
These tests would error as written because the single manifold class Manifold does not have the P attribute. This should be moved to a separate function, or possibly into test_product_manifold_methods instead.

dist2_total = pm.dist2(X1, X2)

# Compute dimension slices manually
slices = []
pchlenski (Owner):

Please use the purpose-built pm.factorize method to slice the manifold instead
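For illustration, the decomposition these asserts are after can be sketched without manify at all, using plain tensor slices in place of whatever `pm.factorize` would return. The per-factor dimensions below are made up, and Euclidean squared distance stands in for the product-manifold `dist2`:

```python
import torch

# Hypothetical per-component dimensions of a product manifold.
dims = [2, 3]
slices, start = [], 0
for d in dims:
    slices.append(slice(start, start + d))
    start += d

torch.manual_seed(0)
X1 = torch.randn(4, sum(dims))
X2 = torch.randn(4, sum(dims))

# Total squared distance should equal the sum of per-factor squared distances.
dist2_total = ((X1 - X2) ** 2).sum(dim=-1)
dist2_sum = sum(((X1[:, s] - X2[:, s]) ** 2).sum(dim=-1) for s in slices)
assert torch.allclose(dist2_total, dist2_sum, atol=1e-5)
```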

from manify.embedders._losses import dist_component_by_manifold  # type: ignore

if len(pm.P) > 1:  # Requires multiple components
    contributions = dist_component_by_manifold(pm, X1)
    assert torch.isclose(torch.tensor(sum(contributions)), torch.tensor(1.0), atol=1e-5), "Contributions do not sum to 1"
pchlenski (Owner):

Similarly to the comment above, this code will error when run with the Manifold class; please move this as well

# Compute distances to origin
distances_to_origin = m.dist(samples, m.mu0.unsqueeze(0)).squeeze()

# For hyperbolic and Euclidean manifolds, distances should be consistent
pchlenski (Owner):

Perhaps we can have something like this instead:

max_dist = float("inf") if K <= 0 else torch.pi / torch.sqrt(torch.tensor(K))

# The rest of the code then no longer needs the if K <= 0 / else control flow:

# Check that distances are reasonable (not all zero, not all infinite)
assert distances_to_origin.min() > 0, f"Distances to origin should be positive for K={K}"
assert distances_to_origin.max() < max_dist, f"Distances to origin should be less than {max_dist} for K={K}"

# Check that distances have reasonable variance (not all identical)
assert distances_to_origin.std() > 1e-6, f"Distances to origin should have variance for K={K}"

elif K > 0:
    # Spherical manifolds have bounded distances
    max_possible_distance = math.pi / math.sqrt(K)
    assert distances_to_origin.max() <= max_possible_distance, f"Distances should be bounded for spherical K={K}"
pchlenski (Owner):

In general, we have much stronger expectations here: that the distance to origin should be, on average, close to the distance of mu to the origin, and that the standard deviation of the distance should be related to Sigma in some way (need to figure out what the exact relation is)
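As a sanity check of that stronger expectation, the Euclidean (K=0) case is easy to verify in plain torch, where the wrapped normal reduces to an ordinary Gaussian: the mean distance to the origin should track the norm of the mean, and the spread of distances should track sigma. The mean, sigma, and sample count below are arbitrary illustration choices, not values from manify:

```python
import torch

torch.manual_seed(0)
mu = torch.tensor([3.0, 0.0])
sigma = 0.1

# Gaussian samples centered at mu with isotropic scale sigma.
samples = mu + sigma * torch.randn(10000, 2)
d = samples.norm(dim=-1)  # distance to origin in the Euclidean case

# Mean distance is close to the distance from mu to the origin...
assert abs(d.mean().item() - mu.norm().item()) < 0.01
# ...and the standard deviation of distances is close to sigma.
assert abs(d.std().item() - sigma) < 0.02
```

The exact relation for curved manifolds (and for anisotropic Sigma) would need to be worked out, but a loose version of this check already rules out degenerate samplers.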

assert distances_to_origin.max() <= max_possible_distance, f"Distances should be bounded for spherical K={K}"


def test_log_likelihood_differences():
pchlenski (Owner):

It seems like test_log_likelihood_differences and test_product_manifold_log_likelihood_differences can be added to the shared manifold tests instead, letting the loop over curvatures in test_manifold_methods and the loop over signatures in test_product_manifold_methods handle this variation.

assert log_likelihood_diff.std() > 0, f"Log-likelihood differences should have variance for signature={signature}"


def test_kl_divergence_equivalence():
pchlenski (Owner):

Same as this comment above, please incorporate this into shared tests rather than writing separate loops over curvatures/signatures

pchlenski (Owner):

"This" meaning KL divergence equivalence tests for single/product manifolds


curvatures = [-1.0, 0.0, 1.0]

for K in curvatures:
pchlenski (Owner):

This could be strengthened by setting random seeds specifically: in that case, we should be able to verify that setting n_samples or using stacked z_mean are equivalent, as e.g. in this code from my original _shared_tests function:

    # Sampling shapes should support a variety of inputs
    stacked_means = torch.stack([M.mu0] * 5)
    s1 = M.sample(100)
    assert s1.shape == (100, M.ambient_dim), "Sampled points should have the correct shape"
    s2 = M.sample(100, z_mean=M.mu0)
    assert s2.shape == (100, M.ambient_dim), "Sampled points should have the correct shape"
    s3 = M.sample(z_mean=stacked_means)
    assert s3.shape == (5, M.ambient_dim), "Sampled points should have the correct shape"
    s4 = M.sample(100, z_mean=stacked_means)
    assert s4.shape == (500, M.ambient_dim), "Sampled points should have the correct shape"

Try setting seeds equivalently and verifying that this behaves as intended
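The seed-equivalence idea itself is easy to verify in isolation: reseeding before each call makes sampling deterministic, so two seeded draws can be compared exactly. Plain `torch.randn` stands in for `M.sample` here:

```python
import torch

# Reseeding before each draw makes the two batches identical,
# so seeded sampler calls can be compared element-for-element.
torch.manual_seed(42)
a = torch.randn(100, 3)
torch.manual_seed(42)
b = torch.randn(100, 3)
assert torch.equal(a, b)

# A fresh draw without reseeding consumes new randomness and differs.
c = torch.randn(100, 3)
assert not torch.equal(a, c)
```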

pchlenski (Owner):

@richarddushime thanks for all of your work. I have added comments for this PR, please review if you have time. Let me know if you have any questions or if anything is unclear.

Additionally, I have a few general questions/comments:

  • No random seeds are set. This might be OK for tests that are inside the call to _shared_tests, since we set the torch random seed prior to calling it, but for any new test functions involving sampling, please make sure to set a seed.
  • Please change imports from the math library to their torch equivalents, e.g. torch.pi.
  • In the future, please run the tests and verify that they pass. I can tell you didn't do this because, for instance, Manifold objects have no P attribute.
  • I'm not convinced that the extreme/edge case tests at the end of this PR will pass. It's a nice idea, but working with these manifolds can be pretty numerically unstable. A more moderate approach may be to carve out a range of curvatures, dimensionalities, etc., which are "officially supported," and have the tests reflect the extremes of that range. Let's talk more about this once you've addressed the rest of the comments / determined which of the final tests fail.
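The math-to-torch swap in the second bullet is mechanical; a quick check that the two spellings agree (assuming K > 0 so the square root is defined):

```python
import math
import torch

K = 2.0
old = math.pi / math.sqrt(K)                  # math-library version
new = torch.pi / torch.sqrt(torch.tensor(K))  # torch equivalent

# The two agree up to float32 precision.
assert math.isclose(old, float(new), rel_tol=1e-6)
```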

…ance tests, sampling consistency, and edge cases
richarddushime (Author):

@pchlenski It's clear that I don't yet have a full grasp of the codebase and all of its functions, but I like the project a lot; it's very interesting, and I want to learn more.
