Skip to content

Conversation

@imreddyTeja
Copy link
Member

@imreddyTeja imreddyTeja commented Nov 3, 2025

The diagnostics tests can take a very long
time (8 min on gha). This commit makes two
changes to speed this test up.

  1. Override some constructors inside the @testset

The diagnostics tests do not need a real jacobian, and they do not need to perform a real step. This commit adds constructors for the implicit and explicit tendency updates, cache update, boundary fluxes update, and jacobian compuation which do nothing. It also uses a fake
field matrix and solver.

  1. Change define_diagnsotics! and default_diagnostics to only add valid diagnostics. Constructing a DiagnosticVariable is expensive, and this reduces the number of useless ones created.

The speedup I get from this PR is much larger when I run locally. It makes gha ci a few minutes faster, and the buildkite gpu tests ~8 min faster.


  • I have read and checked the items on the review checklist.

The diagnostics tests can take a very long
time (8 min on gha). This commit makes two
changes to speed this test up.

1. Override some constructors inside the @testset

The diagnostics tests do not need a real jacobian, and
they do not need to perform a real step. This commit
adds constructors for the implicit and explicit tendency
updates, cache update, boundary fluxes update, and jacobian
compuation which do nothing. It also uses a fake
field matrix and solver.

2. Change define_diagnsotics! and default_diagnostics
to only add valid diagnostics. Constructing a DiagnosticVariable
is expensive, and this reduces the number of useless ones created.

use dummy integrator for diag test

extend overriden constructors in test

conditionally add diagnostics
@imreddyTeja imreddyTeja force-pushed the tr/test-integrator branch 2 times, most recently from ae018aa to a36b12b Compare November 4, 2025 18:52
Copy link
Member

@kmdeck kmdeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The speed up is great!

Conceptually I'm a little confused about how defining diagnostics should depend on requested diagnostics. Requested diagnostics seems like more of what you want to get out of a simulation, while the definitions should be independent of that?

E.g. here we request only two diagnostics as output:

diagnostics = ClimaLand.Diagnostics.default_diagnostics(
model,
start_date;
output_vars = ["ct", "trans"],
output_writer = diag_writer,
reduction_period = :hourly,
);

Is this doing something different?

@imreddyTeja
Copy link
Member Author

The speed up is great!

Conceptually I'm a little confused about how defining diagnostics should depend on requested diagnostics. Requested diagnostics seems like more of what you want to get out of a simulation, while the definitions should be independent of that?

E.g. here we request only two diagnostics as output:

diagnostics = ClimaLand.Diagnostics.default_diagnostics(
model,
start_date;
output_vars = ["ct", "trans"],
output_writer = diag_writer,
reduction_period = :hourly,
);

Is this doing something different?

Requested diagnostics is not a great name for it. When default_diagnostics is used, requested_diagnostics would always be equal to output_vars. I'll rename it to output_vars if that makes more sense.

I think the name of define_diagnostics is a bit misleading. The diagnostic compute methods are always defined. define_diagnostics creates diagnostic variables and puts them in the ALL_DIAGNOSTICS dict. I'm not sure why, but creating diagnostic variables seems to take a very long time (maybe because the type is parameterized by the compute function, which is a closure). The current setup creates diagnostic variables that are not relevant for the model (for example the sif diagnostic variable is created for the bucket). If a user doesn't want to use every possible diagnostic, the diagnostic variable for it does not need to be created. Maybe a better solution would be to create diagnostic variables for all possible diagnostics, rather than for every single diagnostic (done currently), or only for the output vars (what this pr does)

@juliasloan25
Copy link
Member

The speed up is great!
Conceptually I'm a little confused about how defining diagnostics should depend on requested diagnostics. Requested diagnostics seems like more of what you want to get out of a simulation, while the definitions should be independent of that?
E.g. here we request only two diagnostics as output:

diagnostics = ClimaLand.Diagnostics.default_diagnostics(
model,
start_date;
output_vars = ["ct", "trans"],
output_writer = diag_writer,
reduction_period = :hourly,
);

Is this doing something different?

Requested diagnostics is not a great name for it. When default_diagnostics is used, requested_diagnostics would always be equal to output_vars. I'll rename it to output_vars if that makes more sense.

I think the name of define_diagnostics is a bit misleading. The diagnostic compute methods are always defined. define_diagnostics creates diagnostic variables and puts them in the ALL_DIAGNOSTICS dict. I'm not sure why, but creating diagnostic variables seems to take a very long time (maybe because the type is parameterized by the compute function, which is a closure). The current setup creates diagnostic variables that are not relevant for the model (for example the sif diagnostic variable is created for the bucket). If a user doesn't want to use every possible diagnostic, the diagnostic variable for it does not need to be created. Maybe a better solution would be to create diagnostic variables for all possible diagnostics, rather than for every single diagnostic (done currently), or only for the output vars (what this pr does)

Thanks Teja, this is helpful! It definitely makes sense to avoid constructing DiagnosticVariables we aren't using, especially if the construction is time consuming. Can we rename define_diagnostics to construct_diagnostics? I think that more explicitly shows that the DiagnosticVariables are being constructed via that method.

To me the name requested_diags does make sense. It's equivalent to diagnostics rather than output_vars in default_diagnostics (though if the user passes in a vector of output_vars we also have diagnostics == output_vars so all 3 are the same).

comments,
compute!,
)
if isnothing(requested_diags) || short_name in requested_diags
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using requested_diags == nothing, we can use requested_diags = get_possible_diagnostics(model) to get the entire set of diagnostics available for a model. Then we can just have:

Suggested change
if isnothing(requested_diags) || short_name in requested_diags
if short_name in requested_diags

and this will have the same effect of not restricting the DiagnosticVariables that are constructed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what I had originally, but docs/src/diagnostics/make_diagnostic_table.jl relies on adding all possible diagnostic variables while passing in a nothing model. I can't think of a nice way to deal with this. Maybe adding
get_possible_diagnostics(::Nothing) = :all_diagnostics or =[] and then checking for that.

Comment on lines +5 to +6
compute function for `land_model`. If `requested_diags` is not `nothing`, diagnostics are
only added if their `short_name` is in `requested_diags`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
compute function for `land_model`. If `requested_diags` is not `nothing`, diagnostics are
only added if their `short_name` is in `requested_diags`.
compute function for `land_model`. Diagnostics are
only constructed if their `short_name` is in `requested_diags`.

Copy link
Member

@kmdeck kmdeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining more.

Renaming define_diagnostics! to construct_diagnostics! is fine - or even construct_requested_diagnostics!? and we can keep the name requested_diags, and I like Julia's idea of the default being get_possible_diagnostics(land_model).

I stumbled a bit seeing that define_diagnostics! takes the model as input, but then does things internally that apply to all models - like, as you wrote, define_diagnostics!(bucket_model) adds canopy diagnostics, etc. so i almost wonder if a better solution might be to have different methods for different models, but this is more work for the same result, so I think what you have here is good!

Rename to `construct_diagnostics`. This name
is more reflective of its functionality and purpose.
The ! is dropped because none of the args are modified.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants