Changes from 2 commits
55 changes: 46 additions & 9 deletions README.md
@@ -71,24 +71,38 @@ draws from target log density, which may be used for Monte Carlo
estimates of posterior expectations and quantiles for uncertainty
quantification.

#### Random-walk Metropolis sampler
#### Metropolis sampler

Random-walk Metropolis (RWM) is a diffusive sampler that requires a
target log density function and a symmetric pseudorandom proposal
generator.
Metropolis is a diffusive sampler that requires a target log density
function and a symmetric pseudorandom proposal generator.

#### Metropolis-adjusted Langevin sampler
* **Random-Walk Metropolis (RWM)**: uses a zero-centered normal proposal,
resulting in Markov chains that are random walks

Metropolis-adjusted Langevin (MALA) is a diffusive sampler that adjusts
proposals with gradient-based information. MALA requires a target log
density and gradient function.
#### Metropolis-Hastings sampler

Metropolis-Hastings (MH) is a sampler that requires a target log
density function, a (not necessarily symmetric) proposal generator,
and a way to evaluate a proposal's log density.

* **Metropolis-adjusted Langevin (MALA)**: uses a random proposal with a
gradient adjustment. MALA requires a target log density and gradient
function.

* **Affine-Invariant Walker**: maintains an ensemble of parameter values
and conditions each MH proposal on complementary members of the ensemble.
Affine-invariant walkers require only a target log density function; a
usage sketch follows below.
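
A minimal usage sketch for the ensemble sampler, mirroring the
`test/test_ensemble.py` test added in this PR (`StdNormal` is the test
model used there, and the constructor arguments follow
`AffineInvariantWalker` in `bayes_kit/ensemble.py`):

```python
import numpy as np
from bayes_kit.ensemble import AffineInvariantWalker
from test.models.std_normal import StdNormal

model = StdNormal()
sampler = AffineInvariantWalker(model, a=2, walkers=10)
# sample() advances every walker once and returns a walkers x dimension array;
# copy the row because the returned array is the sampler's internal state
draws = np.array([sampler.sample()[0].copy() for _ in range(10_000)])
print(draws.mean(axis=0), draws.var(axis=0, ddof=1))  # expect roughly 0 and 1
```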

#### Hamiltonian Monte Carlo sampler

Hamiltonian Monte Carlo (HMC) simulates Hamiltonian dynamics with a
potential energy function equal to the negative log density. It
requires a target log density, gradient function, and optionally a
metric.
metric. Technically, HMC composes a Gibbs kernel that refreshes the
momentum (normally distributed, corresponding to a quadratic kinetic
energy) with a deterministic Metropolis sampler whose proposals are
generated by simulating the Hamiltonian dynamics.
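
As a rough, self-contained illustration of that composition (momentum
refresh, leapfrog simulation of the dynamics, Metropolis accept/reject),
here is a sketch for a standard normal target; this is not the package's
`HMCDiag` API, and `stepsize` and `steps` are hypothetical tuning constants:

```python
import numpy as np

def log_density(theta):  # standard normal target
    return -0.5 * np.sum(theta ** 2)

def grad(theta):  # gradient of the target log density
    return -theta

def hmc_step(theta, stepsize=0.1, steps=20):
    rho = np.random.normal(size=theta.shape)  # Gibbs refresh of the momentum
    theta_star, rho_star = theta.copy(), rho.copy()
    for _ in range(steps):  # leapfrog integration of the Hamiltonian dynamics
        rho_star = rho_star + 0.5 * stepsize * grad(theta_star)
        theta_star = theta_star + stepsize * rho_star
        rho_star = rho_star + 0.5 * stepsize * grad(theta_star)
    # Metropolis accept/reject on the joint (position, momentum) log density
    log_accept = (log_density(theta_star) - 0.5 * np.sum(rho_star ** 2)) - (
        log_density(theta) - 0.5 * np.sum(rho ** 2)
    )
    return theta_star if np.log(np.random.uniform()) < log_accept else theta
```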


### Sequential Monte Carlo samplers

@@ -108,6 +122,29 @@ p(theta | y)^t[n] * p(theta),
where the temperature `t[n]` runs from 0 to 1 across iterations.



### Posterior analysis

#### R-hat

R-hat is a statistic over multiple Markov chains that converges to 1
if the chains have the same stationary distribution (under some
mild assumptions).
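
For example (a sketch using the `rhat` function from `bayes_kit/rhat.py`,
which appears later in this diff), two chains drawn from the same
distribution should yield a value near 1:

```python
import numpy as np
from bayes_kit.rhat import rhat

chains = [np.random.normal(size=10_000), np.random.normal(size=10_000)]
print(rhat(chains))  # approximately 1 for chains sharing a stationary distribution
```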


#### Effective sample size

The effective sample size of a Markov chain for estimating a specific
expectation is the number of independent draws that would lead to the
same standard error.
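
For instance (a sketch assuming the `ess` estimator in `bayes_kit/ess.py`),
a chain of effectively independent draws has an effective sample size close
to its length:

```python
import numpy as np
from bayes_kit.ess import ess

chain = np.random.normal(size=10_000)  # stand-in for one Markov chain of draws
print(ess(chain))  # close to 10_000 for (nearly) independent draws
```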

#### Standard error

The standard error of an unbiased estimate of an expectation gives the
scale of the estimate's approximately normal error distribution.
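
Concretely (again assuming `bayes_kit.ess.ess`), the Monte Carlo standard
error of a posterior-mean estimate scales the draws' standard deviation by
the square root of the effective sample size:

```python
import numpy as np
from bayes_kit.ess import ess

chain = np.random.normal(size=10_000)  # stand-in for one Markov chain of draws
std_err = np.std(chain, ddof=1) / np.sqrt(ess(chain))
print(std_err)  # roughly 1 / sqrt(10_000) = 0.01 for this chain
```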



## Dependencies

`bayes-kit` only depends on a single external package,
2 changes: 1 addition & 1 deletion bayes_kit/__init__.py
@@ -1,4 +1,4 @@
from .hmc import HMCDiag
from .rwm import RandomWalkMetropolis
from .ensemble import Stretcher
from .ensemble import AffineInvariantWalker
from .smc import TemperedLikelihoodSMC
121 changes: 72 additions & 49 deletions bayes_kit/ensemble.py
@@ -1,68 +1,91 @@
from typing import Callable, Optional, Tuple
from typing import Callable, Iterator, Optional, Tuple
from numpy.typing import NDArray
import numpy as np

from .model_types import LogDensityModel

Sample = NDArray[np.float64]

class Stretcher:
class AffineInvariantWalker:
"""
An implementation of the affine-invariant ensemble sampler of
Goodman and Weare (2010).
References:
Goodman, J. and Weare, J., 2010. Ensemble samplers with affine invariance.
*Communications in Applied Mathematics and Computational Science*
5(1):65--80.
"""

# def __init__(
# self,
# model: LogDensityModel,
# a: Optional[float] = None,
# walkers: Optional[int] = None
# init: Optional[NDarray[np.float64]] = None)
# ):
# self._model = model
# self._dim = self._model.dims()
# if a != None and a < 1:
# raise ValueError(f"stretch bound must be greater than or equal to 1; found {a=}")
# self._a = a
# self._sqrt_a = np.sqrt(a)
# self._inv_sqrt_a = 1 / self._sqrt_a
# if walkers != NONE and (walkers <= 0 or walkers % 2 != 0) :
# raise ValueError(f"walkers must be strictly positive, even integer; found {walkers=}")
# self._walkers = walkers or 2 * self._dim
# self._halfwalkers = a / 2
# self._drawshape = (self._walkers, self._dim)
# if init != None and init.shape != self._drawshape:
# raise ValueError(f"init must be shape of draw {self._drawshape}; found {init.shape=}")
# self._thetas = init or np.random.normal(size=self._drawshape)
# self._firsthalf = range(halfwalkers)
# self._secondhalf = range(halfwalkers, walkers)
def __init__(
self,
model: LogDensityModel,
a: Optional[float] = None,
walkers: Optional[int] = None,
init: Optional[NDArray[np.float64]] = None
):
"""
Initialize the sampler with a log density model and, optionally, the
proposal bound, number of walkers, and initial parameter values.
Parameters:
model: class used to evaluate log densities
a: bound on the stretch proposal (default 2)
walkers: an even number of walkers to use (defaults to 2 times the dimensionality of `model`)
init: `walkers` x `dimension` array of initial positions (defaults to standard normal)
Throws:
ValueError: if `a` is provided and not >= 1, if `walkers` is provided and not a strictly positive, even integer,
or if `init` is provided and is not an `NDArray` of shape `walkers` x `dimension`
"""
self._model = model
self._dim = self._model.dims()
if a is not None and np.float64(a) < 1:
raise ValueError(f"stretch bound must be greater than or equal to 1; found {a=}")
self._a = np.float64(a or 2.0)
self._sqrt_a = np.sqrt(self._a)
self._inv_sqrt_a = 1 / self._sqrt_a
self._walkers = np.int64(walkers or 2 * self._dim)
if self._walkers < 2 or self._walkers % 2 != 0:
raise ValueError(f"walkers must be strictly positive, even integer; found {walkers=}")
self._halfwalkers = self._walkers // 2
self._drawshape = (int(self._walkers), self._dim)
self._thetas = np.asarray(init if init is not None else np.random.normal(size=self._drawshape))
if self._thetas.shape != self._drawshape:
raise ValueError(f"init must be shape of draw {self._drawshape}; found {self._thetas.shape=}")
self._firsthalf = range(0, int(self._halfwalkers))
self._secondhalf = range(int(self._halfwalkers), int(self._walkers))

# def __iter__(self):
# return self
def __iter__(self) -> Iterator[Sample]:
return self

# def __next__(self):
# return self.sample
def __next__(self) -> Sample:
return self.sample()

# def draw_z(self):
# """Return random draw z in (1/a, a) with p(z) propto 1 / sqrt(z)"""
# return np.square(np.random.uniform(self._inv_sqrt_a, self._sqrt_a))
def draw_z(self) -> Sample:
Collaborator
  • What's a z?

  • Should we prefix "private" methods with an underscore, as we do with fields? Doing so can make it more obvious what's the intended API.

Collaborator Author
@bob-carpenter bob-carpenter Feb 9, 2023

z is the name of the variable in the paper.

The doc explains how the function works:

Return a random draw of `z` in `(1/a, a)` with `p(z) propto 1 / sqrt(z)`

The usage shows that it's an interpolation ratio:

        z = self.draw_z()
        theta_star: NDArray[np.float64] = theta_j + z * (theta_k - theta_j)

Are you asking indirectly what it is because you think it should have a more descriptive variable name?

I'm OK prefixing internal methods with an underscore. I wasn't even aware of that convention. Please bear with me for a couple of years as I come up to speed on Python!

"""Return random draw z in (1/a, a) with p(z) propto 1 / sqrt(z)"""
return np.asarray(np.square(np.random.uniform(self._inv_sqrt_a, self._sqrt_a)))

# def stretch_move(self, theta_k: NDarray[np.float64], theta_j: NDarray[np.float64]):
# z = self.draw_z()
# theta_star = theta_j + z * (theta_k - theta_j) # (1 - z) * theta_j + z * theta_k
# log_q = (self._dims - 1) * np.log(z) + self._model.log_density(theta_star) - self._model.log_density(theta_k)
# if np.log(np.random.uniform()) < log_q:
# return theta_star
# return theta_k
def stretch_move(self, theta_k: NDArray[np.float64], theta_j: NDArray[np.float64]) -> Sample:
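# propose along the line through theta_k and the complementary walker theta_j, scaled by z;
# the (self._dim - 1) * np.log(z) term is the z**(dim - 1) factor in Goodman and Weare's
# stretch-move acceptance probability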
z = self.draw_z()
theta_star = np.asarray(theta_j + z * (theta_k - theta_j)) # (1 - z) * theta_j + z * theta_k
print(f"{theta_k=} {theta_j=} {z=} {theta_star=}")
log_q = (self._dim - 1) * np.log(z) + self._model.log_density(theta_star) - self._model.log_density(theta_k)
log_u = np.log(np.random.uniform())
print(f"{log_q=} {log_u=}")
if log_u < log_q:
return theta_star
return theta_k

# def sample(self) -> NDarray[np.float64]
# js = np.random.choice(secondhalf, size=self._halfwalkers)
# for k in firsthalf:
# self._thetas[k] = stretch_move(self._thetas[k], self._thetas[js[k]])
# js = np.random.choice(firsthalf, size=self._halfwalkers)
# for k in secondhalf:
# self_thetas[k] = stretch_move(self._thetas[k], self._thetas[js[k]])
# return self._thetas
def sample(self) -> Sample:
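# update the first half of the ensemble conditioning on walkers from the second half, then vice versa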
print(f"IN: {self._thetas=}")
js = np.random.choice(self._secondhalf, size=self._halfwalkers, replace=False)
for k in self._firsthalf:
self._thetas[k] = self.stretch_move(self._thetas[k], self._thetas[js[k]])
js = np.random.choice(self._firsthalf, size=self._halfwalkers, replace=False)
for k in self._secondhalf:
self._thetas[k] = self.stretch_move(self._thetas[k], self._thetas[js[k - self._halfwalkers]])
print(f"OUT: {self._thetas=}")
return self._thetas


# TODO(carpenter): cache log density rather than recomputing for self
12 changes: 6 additions & 6 deletions bayes_kit/ess.py
@@ -2,7 +2,7 @@
import numpy.typing as npt

FloatType = np.float64
IntType = np.int64
IntType = int
VectorType = npt.NDArray[FloatType]

def autocorr_fft(chain: VectorType) -> VectorType:
@@ -22,7 +22,7 @@ def autocorr_fft(chain: VectorType) -> VectorType:
fft = np.fft.fft(ndata, size)
pwr = np.abs(fft) ** 2
N = len(ndata)
acorr = np.fft.ifft(pwr).real / var / N
acorr: VectorType = np.fft.ifft(pwr).real / var / N
return acorr

def autocorr_np(chain: VectorType) -> VectorType:
@@ -39,7 +39,7 @@ def autocorr_np(chain: VectorType) -> VectorType:
chain_ctr = chain - np.mean(chain)
N = len(chain_ctr)
acorrN = np.correlate(chain_ctr, chain_ctr, "full")[N - 1 :]
return acorrN / N
return np.asarray(acorrN / N)

def autocorr(chain: VectorType) -> VectorType:
"""
@@ -92,9 +92,9 @@ def ess_ipse(chain: VectorType) -> FloatType:
raise ValueError(f"ess requires len(chains) >=4, but {len(chain) = }")
acor = autocorr(chain)
n = first_neg_pair_start(acor)
sigma_sq_hat = acor[0] + 2 * sum(acor[1:n])
sigma_sq_hat = acor[0] + 2 * acor[1:n].sum()
ess = len(chain) / sigma_sq_hat
return ess
return np.float64(ess)

def ess_imse(chain: VectorType) -> FloatType:
"""
@@ -132,7 +132,7 @@ def ess_imse(chain: VectorType) -> FloatType:
# end diff code
sigma_sq_hat = acor[0] + 2 * accum
ess = len(chain) / sigma_sq_hat
return ess
return np.float64(ess)

def ess(chain: VectorType) -> FloatType:
"""
6 changes: 3 additions & 3 deletions bayes_kit/rhat.py
@@ -29,10 +29,10 @@ def rhat(chains: list[SeqType]) -> FloatType:
"""
if len(chains) < 2:
raise ValueError(f"rhat requires len(chains) >= 2, but {len(chains) = }")
chain_lengths = [len(chain) for chain in chains]
chain_lengths = [len(np.asarray(chain)) for chain in chains]
mean_chain_length = np.mean(chain_lengths)
means = [np.mean(chain) for chain in chains]
vars = [np.var(chain, ddof=1) for chain in chains]
means = [np.mean(np.asarray(chain)) for chain in chains]
vars = [np.var(np.asarray(chain), ddof=1) for chain in chains]
r_hat: np.float64 = np.sqrt(
(mean_chain_length - 1) / mean_chain_length + np.var(means, ddof=1) / np.mean(vars)
)
1 change: 1 addition & 0 deletions bayes_kit/rwm.py
@@ -36,3 +36,4 @@ def sample(self) -> Sample:
self._theta = np.asanyarray(theta_star)
self._log_p_theta = log_p_theta_star
return self._theta, self._log_p_theta

20 changes: 20 additions & 0 deletions test/test_ensemble.py
@@ -0,0 +1,20 @@
from test.models.std_normal import StdNormal
from bayes_kit.ensemble import AffineInvariantWalker
import numpy as np

def test_aiw_std_normal() -> None:
Collaborator
@jsoules jsoules Feb 9, 2023

I appreciate the elegance of delegating to the actual sampling test, however I think there are two potential issues:

  • This can get a bit burdensome in test execution time; and
  • it's not clear that the "sampler worked" base test is the right level of granularity for the features being tested.

On the second point:

  • If the run_sampling_test call passes with the sampler's default parameters, it isn't clear that rerunning it with different parameters can capture how those parameters have changed the underlying behavior. Like, would specifying a different value for a be expected to produce a tighter tolerance, convergence after fewer iterations, etc.? For another example, as written, we test that an initialization value can be passed successfully, but it's not clear whether that value was actually honored, e.g. by seeing whether choosing a good initial value results in faster convergence/tighter tolerance/etc. Essentially, if the tolerances in run_sampling_test are loose enough, that function might pass cases where the desired behavior from the extra sampler parameters isn't actually implemented.
  • Conversely, we may not need to run even 10k iterations to confirm that the desired behavior is being honored. For instance, when setting the number of walkers, it may be enough to construct the sampler object, and then assert on the dimensions of its _thetas member (along with maybe some tests for e.g. stretch_move specifically, to make sure that function is actually using all available walkers appropriately).

I'm not proposing to test a bunch of volatile implementation details, but piercing the veil a little bit might lead to more specific tests and a faster test suite overall.

Collaborator Author

I'm not entirely sure what you're asking me to do here.

this can get a bit burdensome in test execution time

Our whole test suite, including this one, runs in 13s on my desktop. Is this really a concern or can I just ignore this for now?

it's not clear that the "sampler worked" base test is the right level of granularity
If the run_sampling_test call passes with the sampler's default parameters, it isn't clear that rerunning it with different parameters can capture how those parameters have changed the underlying behavior.

There's no good theory around this to test in the affine-invariant case. The hyperparameter a is similar to the proposal scale in random-walk Metropolis. There will be an optimal value. In many cases we know what that is for Metropolis but there's not much theory around the affine-invariant ensemble methods. Should we start by testing the behavior of differently scaled or differently shaped proposals in Metropolis?

If we're willing to open up the box and test internals, I could maybe rewrite parts of the code to expose as functions rather than just as code. For example, we could test that the variance of proposals is higher with larger a.

assert on the dimensions of its _thetas member (along with maybe some tests for e.g. stretch_move specifically, to make sure that function is actually using all available walkers appropriately).

That won't test that they're being used appropriately. Everything's stochastic all the way down in that updating a single walker involves selecting a random complementary walker, proposing, then doing an MH accept/reject step. I'm not sure how to test things like whether we have a uniform distribution over all complementary walkers. Again, we'd have to decompose the sampler into finer grained pieces and test those.

For another example, as written, we test that an initialization value can be passed successfully, but it's not clear whether that value was actually honored, e.g. by seeing whether choosing a good initial value results in faster convergence/tighter tolerance/etc.

Again, same problem for Metropolis and every other sampler that we use. And again, we can open up the box and test that we really are setting the values appropriately on initialization. We might want to test that everything breaks if each ensemble member is initialized at the same value.

Collaborator

First off, this does not seem to me to be a blocker for the PR going through--as @roualdes says, it's definitely a good step forward!--and I don't intend to single out this PR for particular scrutiny, it just happened to be what made me realize that we need to revisit the topic of testing strategy.

Also, I wasn't clear enough in my earlier comment. I'm not requesting even-stricter tests; I just want to make sure that each test adds new information. If we have several tests that don't capture different specific behavior of the subject-under-test, one option is to write them to test more specific details; another is to say "These are testing the same thing, maybe we'd be fine cutting some of them to save runtime." Both are valid responses, as long as we're satisfied with the resulting coverage.

Here, I'm hearing you say that for the affine-invariant sampler, there's no good theory for how changing the hyperparameters should affect overall algorithmic performance. To me that sounds like it means that even if we rerun the convergence test, we don't have an outcome to assert on afterwards. So I'm thinking there's no need to spend time rerunning it--I'd be happy to cut entirely the tests against a specific initialization point, initializing with different random generators, etc.

(That said, if we do want to test those things, rerunning the full integration test for them will take longer & probably give less new information than testing something more specific. So for the initialization point, instead of repeating the test for convergence, I'd pick a start point out in the boonies with a model centered at 0, take one draw, and assert that the result is within $APPROPRIATE_VALUE of the initialization; for the random generators, pass in a mock generator and assert that that mock was used; etc.)

I do love the idea of testing for higher-order properties that any correct implementation ought to have--you mentioned ensuring correct distribution over walkers; hearing you mention it, that sounds like a great thing to check. For that specifically, we can mock _stretch_move with a function that captures the parameters it was called with, run 500 or so draws (which will be instantaneous, since we don't actually need to evaluate the densities), and then do some assertions on the distribution of parameters used to call it--I'm happy to help with setting this up, it'll be good for me too to figure out the mechanics. But again, I don't think that needs to be present for this PR to go through. (Similarly with checking for higher proposal variance with higher a--that sounds useful to add, and maybe something we can test either through a test against _draw_z, or with some clever mocking or a minor refactor, but I think that's more an "open an issue" thing than a blocker for the PR.)

Our whole test suite, including this one, runs in 13s on my desktop. Is this really a concern or can I just ignore this for now?

Basically, it's a real concern that you can ignore for now 😄 I'm just looking at the possible matrix of parameters and thinking that even if each call to run_sampling_tests takes (say) half a second, the overall runtime could get noticeable. (And again, my objection to that is in the context where we don't have a good theoretical basis to test for new outcomes from the additional integration test calls--I'm perfectly sanguine about taking the extra time if it gives us key information, but it sounds like it might not here.)

Should we start by testing the behavior of differently scaled or differently shaped proposals in Metropolis?

I think this gets into the tension between testing the algorithm and testing the implementation of the algorithm. For the purposes of testing the implementation, I don't think we need to go beyond testing that Metropolis calls what it's given; past that point, it seems like we'd be more testing the Metropolis algorithm itself rather than this implementation of it.

Collaborator Author

I think this is the key issue that @jsoules raised:

we'd be more testing the Metropolis algorithm itself rather than this implementation of it

I agree that we should test our implementations. And that what we have now mainly tests the algorithm.

We can test that our implementation does the right thing with changes in a, namely that it gives you a broader range of proposals. That's easy to check. And that's why we were talking about mock objects on the way to lunch. It does require refactoring the code so we can test the internal details of the algorithm. Otherwise, all we have is something we can test end-to-end. You can see that I just pushed the new tests for ESS and added the IAT functionality, where the IAT functionality is required for the ESS functionality. We can do the same thing here.

Part of what I was doing with all those calls is making sure that we have code coverage. The tests can be simplified.

Collaborator

Sounds like we're on the same page!

Re: doing the right thing with changes in a, would you consider hitting _draw_z for N draws each with M different values of a (well, _inv_sqrt_a and _sqrt_a) to be sufficient? Or would you prefer to test only through "public" parts of the API?

My understanding was that the z interpolation factor was pretty specific to this sampler and so isn't likely to have outside consumers if promoted to a public API endpoint, so I'm comfortable leaving the code as currently structured and just testing the private function if it avoids some of the refactoring. Also happy to go along with a bigger restructure if there's a practical or policy reason to do so--just looking for an appropriately minimized amount of effort here.

(To be clear, I'm imagining a test that in pseudocode says basically:

def test_variance_of_interpolation_factor_increases_as_a_increases():
    def test_an_a(a: float) -> list[Sample]:
        sampler = ...  # instantiate an AffineInvariantWalker with the given a
        return [sampler._draw_z() for _ in range(100)]
    a_vals = [1, 2, 5, 10, 100]
    zs = [test_an_a(a) for a in a_vals]
    # Do something to fix dimensions, then
    assert var(zs[0]) < var(zs[1]) < var(zs[2]) < var(zs[3]) < var(zs[4])

Naturally we could get a lot more specific about the statistical properties we'd expect from _draw_z() and the tolerance to which we'd want those to be realized, this is just what I had in mind right now.)

# init with draw from posterior
init = np.random.normal(loc=0, scale=1, size=[1])
model = StdNormal()
sampler = AffineInvariantWalker(model, a = 2, walkers=10)
M = 10000  # enough draws for the 0.1 tolerances below
# copy each draw: sample() returns a view of the walker state, which is updated in place
draws = np.array([sampler.sample()[0].copy() for _ in range(M)])
mean = draws.mean(axis=0)
var = draws.var(axis=0, ddof=1)
np.testing.assert_allclose(mean, model.posterior_mean(), atol=0.1)
np.testing.assert_allclose(var, model.posterior_variance(), atol=0.1)
7 changes: 0 additions & 7 deletions test/test_rwm.py
@@ -16,14 +16,7 @@ def test_rwm_std_normal() -> None:
np.testing.assert_allclose(mean, model.posterior_mean(), atol=0.1)
np.testing.assert_allclose(var, model.posterior_variance(), atol=0.1)

accept = M - (draws[: M - 1] == draws[1:]).sum()
print(f"{accept=}")
print(f"{draws[1:10]=}")
print(f"{mean=} {var=}")


def test_rwm_repr() -> None:

init = np.random.normal(loc=0, scale=1, size=[1])
model = StdNormal()
