Miscellaneous tests to add (from legacy notebooks) #22
base: main
Conversation
@pchlenski Would you please give feedback?
@richarddushime Please be patient, as I have several high-priority tasks I need to finish first. I will try to look at this within the next week.
Thanks 🙏
tests/test_manifolds.py
Outdated
dist2_sum = sum(M.dist2(X1[:, slc], X2[:, slc]) for M, slc in zip(pm.P, slices))

assert torch.allclose(pdist2_total, pdist2_sum, atol=1e-5), "pdist2 does not match sum of component pdist2"
assert torch.allclose(dist2_total, dist2_sum, atol=1e-5), "dist2 does not match sum of component dist2"
These tests would error as written because the single manifold class Manifold does not have the P attribute. This should be moved to a separate function, or possibly into test_product_manifold_methods instead.
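For instance, the component-sum check could live in a helper that only ever receives a ProductManifold (a sketch; the helper name is made up, and it reuses the variables already built in the quoted test):

def _check_component_dist2(pm, X1, X2, slices):
    # Safe here: pm is always a ProductManifold, so pm.P exists
    dist2_total = pm.dist2(X1, X2)
    dist2_sum = sum(M.dist2(X1[:, slc], X2[:, slc]) for M, slc in zip(pm.P, slices))
    assert torch.allclose(dist2_total, dist2_sum, atol=1e-5), "dist2 does not match sum of component dist2"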
tests/test_manifolds.py
Outdated
dist2_total = pm.dist2(X1, X2)

# Compute dimension slices manually
slices = []
Please use the purpose-built pm.factorize method to slice the manifold instead
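Something along these lines, assuming pm.factorize(X) splits an ambient-space tensor into one tensor per component manifold (worth double-checking the exact return type against the implementation):

# Let the manifold do the slicing instead of rebuilding the slices by hand
X1_parts = pm.factorize(X1)
X2_parts = pm.factorize(X2)
dist2_sum = sum(M.dist2(x1, x2) for M, x1, x2 in zip(pm.P, X1_parts, X2_parts))
assert torch.allclose(pm.dist2(X1, X2), dist2_sum, atol=1e-5), "dist2 does not match sum of component dist2"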
tests/test_manifolds.py
Outdated
from manify.embedders._losses import dist_component_by_manifold  # type: ignore
if len(pm.P) > 1:  # Requires multiple components
    contributions = dist_component_by_manifold(pm, X1)
    assert torch.isclose(torch.tensor(sum(contributions)), torch.tensor(1.0), atol=1e-5), "Contributions do not sum to 1"
Similar to this comment above, this code will error when run with the Manifold class; please move this as well
tests/test_manifolds.py
Outdated
# Compute distances to origin
distances_to_origin = m.dist(samples, m.mu0.unsqueeze(0)).squeeze()

# For hyperbolic and Euclidean manifolds, distances should be consistent
Perhaps we can have something like this instead:
max_dist = float('inf') if K <= 0 else math.pi / math.sqrt(K)

# The rest of the code no longer needs separate if K <= 0 / else branches:
# Check that distances are reasonable (not all zero, not all infinite)
assert distances_to_origin.min() > 0, f"Distances to origin should be positive for K={K}"
assert distances_to_origin.max() < max_dist, f"Distances to origin should be less than {max_dist} for K={K}"

# Check that distances have reasonable variance (not all identical)
assert distances_to_origin.std() > 1e-6, f"Distances to origin should have variance for K={K}"
tests/test_manifolds.py
Outdated
elif K > 0:
    # Spherical manifolds have bounded distances
    max_possible_distance = math.pi / math.sqrt(K)
    assert distances_to_origin.max() <= max_possible_distance, f"Distances should be bounded for spherical K={K}"
In general, we have much stronger expectations here: that the distance to origin should be, on average, close to the distance of mu to the origin, and that the standard deviation of the distance should be related to Sigma in some way (need to figure out what the exact relation is)
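Roughly, such a check might look like the sketch below; the non-origin mean, the tolerance, and the Sigma/spread relation are all placeholders to be pinned down:

# Sample around a mean other than the origin, then compare average distance-to-origin
z_mean = some_other_point.unsqueeze(0)  # hypothetical: any manifold point that is not m.mu0
samples = m.sample(1000, z_mean=z_mean)
d = m.dist(samples, m.mu0.unsqueeze(0)).squeeze()
expected = m.dist(z_mean, m.mu0.unsqueeze(0)).squeeze()
assert torch.isclose(d.mean(), expected, rtol=0.2), "Mean distance to origin should be close to dist(z_mean, origin)"
# TODO: relate d.std() to Sigma once the exact relation is worked out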
tests/test_manifolds.py
Outdated
assert distances_to_origin.max() <= max_possible_distance, f"Distances should be bounded for spherical K={K}"


def test_log_likelihood_differences():
It seems like test_log_likelihood_differences and test_product_manifold_log_likelihood_differences can be folded into the shared manifold tests instead, letting the loop over curvatures in test_manifold_methods and the loop over signatures in test_product_manifold_methods handle this variation.
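Structurally, something like this (a sketch only; constructor details are elided because the existing loops already build the manifolds):

def _check_log_likelihood_differences(M):
    # Body of the current test_log_likelihood_differences, minus its own loop over curvatures;
    # M is whatever manifold the calling loop has already constructed.
    ...

def test_manifold_methods():
    for K in [-1.0, 0.0, 1.0]:
        M = ...  # the per-curvature manifold this test already constructs
        # ... existing shared checks ...
        _check_log_likelihood_differences(M)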
tests/test_manifolds.py
Outdated
assert log_likelihood_diff.std() > 0, f"Log-likelihood differences should have variance for signature={signature}"


def test_kl_divergence_equivalence():
Same as this comment above, please incorporate this into shared tests rather than writing separate loops over curvatures/signatures
"This" meaning KL divergence equivalence tests for single/product manifolds
curvatures = [-1.0, 0.0, 1.0]

for K in curvatures:
This could be strengthened by setting random seeds explicitly: in that case, we should be able to verify that setting n_samples and passing a stacked z_mean are equivalent, as in this code from my original _shared_tests function:
# Sampling shapes should support a variety of inputs
stacked_means = torch.stack([M.mu0] * 5)
s1 = M.sample(100)
assert s1.shape == (100, M.ambient_dim), "Sampled points should have the correct shape"
s2 = M.sample(100, z_mean=M.mu0)
assert s2.shape == (100, M.ambient_dim), "Sampled points should have the correct shape"
s3 = M.sample(z_mean=stacked_means)
assert s3.shape == (5, M.ambient_dim), "Sampled points should have the correct shape"
s4 = M.sample(100, z_mean=stacked_means)
assert s4.shape == (500, M.ambient_dim), "Sampled points should have the correct shape"

Try setting seeds equivalently and verifying that this behaves as intended.
@richarddushime Thanks for all of your work. I have added comments to this PR; please review them when you have time. Let me know if you have any questions or if anything is unclear. Additionally, I have a few general questions/comments:
…ance tests, sampling consistency, and edge cases
@pchlenski It's clear that I have not yet mastered the codebase and all of its functions, but I really like the project; it's very interesting and I want to learn more.
This PR fixes a small part of #16. It covers tests from 3 legacy notebooks:
2_polblogs_benchmark.ipynb:
3_verify_shapes.ipynb:
5_sampling.ipynb: