Allow selecting subset of variables when using as_cmdstan_fit() #1121
Still need to add tests.
    if (!is.null(variables)) {
      csv_contents$metadata$variables <- posterior::variables(csv_contents$post_warmup_draws)
    }
Overriding metadata$variables is necessary to avoid errors when subsequently calling methods like draws().
Doing csv_contents$metadata$variables <- variables doesn't work because variables can contain names of non-scalar parameters and we need the names of the individual elements (e.g. variables = "beta" but we need metadata$variables = c("beta[1]", "beta[2]"), etc.).
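To illustrate why the element-level names are needed, here is a minimal base R sketch of expanding a requested name such as "beta" into "beta[1]", "beta[2]". The helper `expand_variable_names()` and the example names are hypothetical and are not part of cmdstanr; the PR itself obtains the names from posterior::variables() on the draws that were read in.

```r
# Hypothetical helper (an assumption for illustration, not cmdstanr code):
# map requested variable names onto the element-level names found in the draws.
expand_variable_names <- function(requested, element_names) {
  # Strip a trailing index: "beta[1]" -> "beta", "Sigma[1,2]" -> "Sigma"
  base_names <- sub("\\[.*\\]$", "", element_names)
  # Keep an element if its base name or its full name was requested
  keep <- base_names %in% requested | element_names %in% requested
  element_names[keep]
}

# Illustrative element names, as they might appear in a CSV header
element_names <- c("lp__", "alpha", "beta[1]", "beta[2]", "sigma")

beta_elements <- expand_variable_names("beta", element_names)
mixed_elements <- expand_variable_names(c("alpha", "beta[2]"), element_names)
```

Assigning `variables` directly would store "beta", whereas later methods look up "beta[1]" and "beta[2]", which is the mismatch the comment describes.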
Ideally we would also override metadata$stan_variables and metadata$stan_variable_sizes, but those are a bit trickier to get right, and I don't think the methods that use them are even available after creating an object just from CSV files (e.g. methods like unconstrain_draws() and others that require calling init_model_methods() won't be available if we can't recompile).
Codecov Report
All modified and coverable lines are covered by tests.
Coverage Diff
| | master | #1121 | +/- |
|---|---|---|---|
| Coverage | 86.24% | 87.49% | +1.25% |
| Files | 14 | 14 | |
| Lines | 5955 | 5983 | +28 |
| Hits | 5136 | 5235 | +99 |
| Misses | 819 | 748 | -71 |
    metadata$variables <- union(metadata$sampler_diagnostics, metadata$variables)
    if (!user_variables_subset) {
      # because for pathfinder variables and diagnostics are read in together,
      # if user hasn't selected a custom subset of variables we need to include
      # all diagnostics
      variables <- union(metadata$sampler_diagnostics, variables)
    }
Pathfinder is a special case because diagnostics and variables are read in together (MCMC reads them in separately), so some special handling was needed to subset properly for pathfinder.
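The branch in the diff can be sketched in isolation with base R. The function `resolve_pathfinder_variables()` is a hypothetical stand-in for the logic inside read_cmdstan_csv(), and the diagnostic and parameter names below are illustrative assumptions rather than values taken from the PR.

```r
# Hypothetical stand-in for the pathfinder branch shown in the diff:
# diagnostics are prepended only when the user did not choose a subset.
resolve_pathfinder_variables <- function(variables, diagnostics,
                                         user_variables_subset) {
  if (!user_variables_subset) {
    # No custom subset: include all diagnostics; union() drops duplicates
    variables <- union(diagnostics, variables)
  }
  variables
}

# Illustrative names (assumed, not read from a real CSV file)
diagnostics <- c("lp_approx__", "lp__")

# Default case: every variable plus every diagnostic is kept
all_selected <- resolve_pathfinder_variables(
  c("lp__", "alpha", "beta"), diagnostics, user_variables_subset = FALSE
)

# User asked for "alpha" only: diagnostics are not added back
subset_selected <- resolve_pathfinder_variables(
  "alpha", diagnostics, user_variables_subset = TRUE
)
```

Tracking the user's intent with a separate flag, rather than inspecting `variables` itself, lets the reader distinguish "no selection made" from "selection that happens to exclude diagnostics".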
Pull request overview
This PR adds a variables argument to the as_cmdstan_fit() function to allow creating fitted model objects from a subset of variables in CSV files. This provides users with more flexibility when working with large models by enabling them to load only the parameters they need.
Changes:
- Added variables parameter to as_cmdstan_fit() with proper handling for all inference methods (MCMC, optimization, variational, Laplace, and pathfinder)
- Added user_variables_subset flag in read_cmdstan_csv() to track when users specify custom variable subsets
- Enhanced pathfinder-specific logic to correctly handle variable filtering with sampler diagnostics
- Added comprehensive test coverage for variable filtering across all inference methods
- Minor cleanup: removed unused vignette line and improved code style (changed = to <- for assignments)
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| R/csv.R | Added variables parameter to as_cmdstan_fit(), added user_variables_subset tracking in read_cmdstan_csv(), improved pathfinder variable handling, and added pathfinder to unavailable methods list |
| tests/testthat/test-csv.R | Reorganized tests (moved to end of file), added pathfinder to existing tests, and added new comprehensive test for variable filtering across all methods |
| tests/testthat/helper-models.R | Added "pathfinder" to the list of supported methods in testing_fit() |
| man/read_cmdstan_csv.Rd | Updated function signature to include variables parameter for as_cmdstan_fit() |
| vignettes/posterior.Rmd | Removed unused line showing fit$metadata()$model_params |
Submission Checklist
Summary
Adds a variables argument to as_cmdstan_fit() to allow creating objects from a subset of variables in the CSV files.
Copyright and Licensing
Please list the copyright holder for the work you are submitting
(this will be you or your assignee, such as a university or company):
Columbia University
By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses: