Acquisition Functions¶
Acquisition functions for Bayesian optimization and active learning in psychophysical experiments.
Package Overview¶
acquisition
¶
acquisition¶
Acquisition functions for adaptive experimental design.
This module provides: - AcquisitionFunction: Protocol for acquisition functions - optimize_acqf(): Functional interface for optimization - Common acquisition functions (Expected Improvement (EI), Upper Confidence Bound (UCB), Information Gain (IG))
Design
Unlike BoTorch's class-based approach, we use a functional style: acq_fn = lambda X: expected_improvement(model.posterior(X), best_f) X_next = optimize_acqf(acq_fn, bounds, q=1)
This is simpler and more composable than inheritance hierarchies.
Available Functions
- expected_improvement: Maximize expected improvement over best observation
- upper_confidence_bound: Balance exploration vs exploitation
- probability_of_improvement: Maximize probability of improvement
- mutual_information: Maximize information gain (Bayesian Active Learning by Disagreement (BALD))
Optimization Methods
- optimize_acqf_discrete: Exhaustive search over candidate set
- optimize_acqf: Gradient-based optimization (Optax)
- optimize_acqf_random: Random search baseline
Examples:
Classes:
| Name | Description |
|---|---|
AcquisitionFunction |
Protocol for acquisition functions. |
Functions:
| Name | Description |
|---|---|
log_expected_improvement |
Log expected improvement for numerical stability. |
optimize_acqf |
Optimize acquisition function over continuous domain. |
optimize_acqf_discrete |
Optimize acquisition function over discrete candidate set. |
optimize_acqf_random |
Optimize acquisition function via random search. |
AcquisitionFunction
¶
Bases: Protocol
Protocol for acquisition functions.
An acquisition function scores candidate points for selection in adaptive experimental design. Higher scores indicate more valuable points.
Examples:
Notes
We deliberately do NOT use a class hierarchy (unlike BoTorch's AcquisitionFunction base class). Functional composition is simpler and more flexible for research code.
If you need stateful acquisition (e.g., caching), use a callable class: class CachedAcquisition: def init(self, model): self.model = model self._cache = {}
1 2 3 | |
log_expected_improvement
¶
log_expected_improvement(
posterior: PredictivePosterior,
best_f: float,
maximize: bool = True,
) -> ndarray
Log expected improvement for numerical stability.
Useful when EI values span many orders of magnitude.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
posterior
|
PredictivePosterior
|
Predictive posterior |
required |
best_f
|
float
|
Best observed value |
required |
maximize
|
bool
|
If True, maximize. If False, minimize. |
True
|
Returns:
| Type | Description |
|---|---|
(ndarray, shape(n_candidates))
|
log(EI) values |
Examples:
Notes
Since we only care about ranking, log(EI) preserves the order: argmax EI(x) = argmax log(EI(x))
This is numerically more stable when EI values are near zero.
Source code in src/psyphy/acquisition/expected_improvement.py
optimize_acqf
¶
optimize_acqf(
acq_fn: Callable[[ndarray], ndarray],
bounds: ndarray,
q: int = 1,
*,
method: Literal["gradient", "random"] = "gradient",
num_restarts: int = 10,
raw_samples: int = 100,
optim_steps: int = 100,
lr: float = 0.01,
key: Any = None,
) -> tuple[ndarray, ndarray]
Optimize acquisition function over continuous domain.
Uses multi-start gradient descent to find global optimum.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
acq_fn
|
callable
|
Acquisition function. Takes (n_points, input_dim) array, returns (n_points,) scores. |
required |
bounds
|
(ndarray, shape(input_dim, 2))
|
Box constraints [[x1_min, x1_max], [x2_min, x2_max], ...] |
required |
q
|
int
|
Batch size (number of points to select) |
1
|
method
|
('gradient', 'random')
|
Optimization method |
"gradient"
|
num_restarts
|
int
|
Number of random restarts for gradient descent |
10
|
raw_samples
|
int
|
Number of random samples to initialize restarts |
100
|
optim_steps
|
int
|
Number of optimization steps per restart |
100
|
lr
|
float
|
Learning rate for gradient descent |
0.01
|
key
|
KeyArray | None
|
PRNG key for random initialization |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
X_next |
(ndarray, shape(q, input_dim))
|
Optimized points |
acq_values |
(ndarray, shape(q))
|
Acquisition values at X_next |
Examples:
Notes
For gradient-based optimization, ensure your acquisition function is differentiable through JAX. Use jax.grad() or jax.value_and_grad().
For non-differentiable acquisition functions, use method="random" or optimize_acqf_discrete() with a candidate grid.
Source code in src/psyphy/acquisition/optimize.py
82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 | |
optimize_acqf_discrete
¶
optimize_acqf_discrete(
acq_fn: Callable[[ndarray], ndarray],
candidates: ndarray,
q: int = 1,
) -> tuple[ndarray, ndarray]
Optimize acquisition function over discrete candidate set.
This is the simplest and most common approach for psychophysics, where stimulus space is often discretized.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
acq_fn
|
callable
|
Acquisition function. Takes (n_candidates, input_dim) array, returns (n_candidates,) scores. |
required |
candidates
|
(ndarray, shape(n_candidates, input_dim))
|
Discrete candidate points to evaluate |
required |
q
|
int
|
Batch size (number of points to select) |
1
|
Returns:
| Name | Type | Description |
|---|---|---|
X_next |
(ndarray, shape(q, input_dim))
|
Selected candidate points |
acq_values |
(ndarray, shape(q))
|
Acquisition values of selected points |
Examples:
Source code in src/psyphy/acquisition/optimize.py
optimize_acqf_random
¶
optimize_acqf_random(
acq_fn: Callable[[ndarray], ndarray],
bounds: ndarray,
q: int = 1,
*,
num_samples: int = 1000,
key: Any = None,
) -> tuple[ndarray, ndarray]
Optimize acquisition function via random search.
Simple baseline: sample random points, evaluate, select best.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
acq_fn
|
callable
|
Acquisition function |
required |
bounds
|
(ndarray, shape(input_dim, 2))
|
Box constraints |
required |
q
|
int
|
Batch size |
1
|
num_samples
|
int
|
Number of random samples to evaluate |
1000
|
key
|
KeyArray | None
|
PRNG key |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
X_next |
(ndarray, shape(q, input_dim))
|
Best random samples |
acq_values |
(ndarray, shape(q))
|
Acquisition values |
Examples:
Source code in src/psyphy/acquisition/optimize.py
Base¶
base
¶
base.py
Base protocol for acquisition functions.
Design
We use Protocol (not ABC) for maximum flexibility. An acquisition function is just any callable that: 1. Takes test points X 2. Returns scalar scores (higher = better)
This enables functional composition without inheritance.
Classes:
| Name | Description |
|---|---|
AcquisitionFunction |
Protocol for acquisition functions. |
Attributes:
| Name | Type | Description |
|---|---|---|
AcqFn |
|
AcquisitionFunction
¶
Bases: Protocol
Protocol for acquisition functions.
An acquisition function scores candidate points for selection in adaptive experimental design. Higher scores indicate more valuable points.
Examples:
Notes
We deliberately do NOT use a class hierarchy (unlike BoTorch's AcquisitionFunction base class). Functional composition is simpler and more flexible for research code.
If you need stateful acquisition (e.g., caching), use a callable class: class CachedAcquisition: def init(self, model): self.model = model self._cache = {}
1 2 3 | |
Expected Improvement¶
expected_improvement
¶
expected_improvement.py
Expected Improvement (EI) acquisition function.
The most popular acquisition function for Bayesian optimization. Balances exploration (high uncertainty) and exploitation (high mean).
References
Mockus, J., Tiesis, V., & Zilinskas, A. (1978). The application of Bayesian methods for seeking the extremum. Towards Global Optimization, 2, 117-129.
Functions:
| Name | Description |
|---|---|
expected_improvement |
Expected improvement acquisition function. |
log_expected_improvement |
Log expected improvement for numerical stability. |
expected_improvement
¶
expected_improvement(
posterior: PredictivePosterior,
best_f: float,
maximize: bool = True,
) -> ndarray
Expected improvement acquisition function.
Computes E[max(0, f(x) - best_f)] for each candidate point.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
posterior
|
PredictivePosterior
|
Predictive posterior p(f(X*) | data) |
required |
best_f
|
float
|
Best observed value so far |
required |
maximize
|
bool
|
If True, maximize (higher is better). If False, minimize (lower is better). |
True
|
Returns:
| Type | Description |
|---|---|
(ndarray, shape(n_candidates))
|
EI values for each candidate |
Examples:
Notes
For psychophysics:
- best_f is typically the highest accuracy observed
- Use maximize=True to find points that maximize accuracy
- EI naturally balances exploration (high variance) and
exploitation (high mean)
Mathematical Details
Let μ(x), σ(x) be the posterior mean and std at x. Let u = (μ(x) - best_f) / σ(x) (standardized improvement).
Then: EI(x) = σ(x) * [u * \Phi(u) + arphi(u)]
where \Phi is the standard normal CDF and arphi is the PDF.
When σ(x) = 0 (no uncertainty), EI(x) = max(0, μ(x) - best_f).
Source code in src/psyphy/acquisition/expected_improvement.py
log_expected_improvement
¶
log_expected_improvement(
posterior: PredictivePosterior,
best_f: float,
maximize: bool = True,
) -> ndarray
Log expected improvement for numerical stability.
Useful when EI values span many orders of magnitude.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
posterior
|
PredictivePosterior
|
Predictive posterior |
required |
best_f
|
float
|
Best observed value |
required |
maximize
|
bool
|
If True, maximize. If False, minimize. |
True
|
Returns:
| Type | Description |
|---|---|
(ndarray, shape(n_candidates))
|
log(EI) values |
Examples:
Notes
Since we only care about ranking, log(EI) preserves the order: argmax EI(x) = argmax log(EI(x))
This is numerically more stable when EI values are near zero.
Source code in src/psyphy/acquisition/expected_improvement.py
Mutual Information¶
mutual_information
¶
mutual_information.py
Mutual information (information gain) acquisition function.
Also known as BALD (Bayesian Active Learning by Disagreement). Selects points that maximize information gain about model parameters.
References
Houlsby, N., Huszár, F., Ghahramani, Z., & Lengyel, M. (2011). Bayesian active learning for classification and preference learning. arXiv preprint arXiv:1112.5745.
Functions:
| Name | Description |
|---|---|
binary_entropy |
Binary entropy H(p) = -p log(p) - (1-p) log(1-p). |
mutual_information |
Mutual information between parameters and observations. |
binary_entropy
¶
Binary entropy H(p) = -p log(p) - (1-p) log(1-p).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
p
|
ndarray
|
Probabilities in [0, 1] |
required |
Returns:
| Type | Description |
|---|---|
ndarray
|
Entropy values |
Source code in src/psyphy/acquisition/mutual_information.py
mutual_information
¶
mutual_information(
param_posterior: ParameterPosterior,
X: ndarray,
probes: ndarray | None = None,
n_samples: int = 100,
key: Any = None,
) -> ndarray
Mutual information between parameters and observations.
Computes I(θ; y | X, data) = H[p(y | X, data)] - E_θ[H[p(y | θ, X)]]
This measures how much we expect to learn about parameters θ from observing response y at location X.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
param_posterior
|
ParameterPosterior
|
Posterior over model parameters p(θ | data) |
required |
X
|
(ndarray, shape(n_candidates, input_dim))
|
Candidate reference stimuli |
required |
probes
|
(ndarray, shape(n_candidates, input_dim) | None)
|
Candidate probe stimuli. Required for discrimination tasks. |
None
|
n_samples
|
int
|
Number of posterior samples for MC approximation |
100
|
key
|
KeyArray | None
|
PRNG key for sampling |
None
|
Returns:
| Type | Description |
|---|---|
(ndarray, shape(n_candidates))
|
Mutual information scores (higher = more informative) |
Examples:
Notes
For psychophysics, mutual information is ideal for: - Threshold estimation: Find stimuli that maximally reduce uncertainty about perceptual thresholds - Model selection: Distinguish between competing perceptual models - Efficient design: Minimize trials needed for desired precision
Computational Cost
Requires n_samples posterior samples and forward passes through the model. For large candidate sets, use optimize_acqf_discrete() for efficiency.
Mathematical Details
Let θ ~ p(θ | data_observed) be the current parameter posterior. Let y be the hypothetical response at candidate X.
Mutual information: I(θ; y | X) = H[p(y | X)] - E_θ[H[p(y | θ, X)]]
where: - H[p(y | X)] is the predictive entropy (uncertainty before observing y) - E_θ[H[p(y | θ, X)]] is the expected conditional entropy (average uncertainty given a parameter sample)
This is approximated via MC: p(y | X) ≈ (1/N) Σ_i p(y | θ_i, X) where θ_i ~ p(θ | data)
BALD Interpretation
BALD (Bayesian Active Learning by Disagreement) selects points where different parameter samples θ_i disagree most about the prediction. High disagreement → high information gain.
Source code in src/psyphy/acquisition/mutual_information.py
28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 | |
Upper Confidence Bound¶
upper_confidence_bound
¶
upper_confidence_bound.py
Upper Confidence Bound (UCB) acquisition function.
Balances exploration and exploitation via a tunable parameter β.
References
Srinivas, N., Krause, A., Kakade, S. M., & Seeger, M. (2009). Gaussian process optimization in the bandit setting: No regret and experimental design. arXiv preprint arXiv:0912.3995.
Functions:
| Name | Description |
|---|---|
lower_confidence_bound |
Lower confidence bound (LCB) for minimization. |
upper_confidence_bound |
Upper confidence bound acquisition function. |
lower_confidence_bound
¶
lower_confidence_bound(
posterior: PredictivePosterior, beta: float = 2.0
) -> ndarray
Lower confidence bound (LCB) for minimization.
Alias for upper_confidence_bound(..., maximize=False).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
posterior
|
PredictivePosterior
|
Predictive posterior |
required |
beta
|
float
|
Exploration parameter |
2.0
|
Returns:
| Type | Description |
|---|---|
ndarray
|
LCB values |
Source code in src/psyphy/acquisition/upper_confidence_bound.py
upper_confidence_bound
¶
upper_confidence_bound(
posterior: PredictivePosterior,
beta: float = 2.0,
maximize: bool = True,
) -> ndarray
Upper confidence bound acquisition function.
Computes μ(x) + β * \sigma(x) for maximization. Computes μ(x) - β * \sigma(x) for minimization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
posterior
|
PredictivePosterior
|
Predictive posterior p(f(X*) | data) |
required |
beta
|
float
|
Exploration-exploitation trade-off parameter. - β = 0: Pure exploitation (greedy selection) - β = 1: Balanced - β > 2: Aggressive exploration |
2.0
|
maximize
|
bool
|
If True, maximize (higher is better). If False, minimize (lower is better). |
True
|
Returns:
| Type | Description |
|---|---|
(ndarray, shape(n_candidates))
|
UCB values for each candidate |
Examples:
Notes
For psychophysics: - Use β ∈ [1, 3] for typical experiments - Larger β explores uncertain regions more - Smaller β focuses on high-accuracy regions
UCB is often faster to compute than EI (no need for CDF/PDF), but theoretically less well-motivated for finite-sample regret.
Adaptive β
Theoretically, β should grow with number of trials: β_t = sqrt(2 * log(input_dim * t^2 * \pi^2 / 6\delta))
where t is the trial number and delta is a confidence parameter. In practice, a fixed β ∈ [1, 3] works well.
Source code in src/psyphy/acquisition/upper_confidence_bound.py
Optimization¶
optimize
¶
optimize.py
Optimization utilities for acquisition functions.
Provides functional interface for maximizing acquisition functions: - optimize_acqf_discrete: Exhaustive search over candidate set - optimize_acqf: Gradient-based optimization (continuous) - optimize_acqf_random: Random search baseline
Design
Following BoTorch's API: X_next, acq_value = optimize_acqf(acq_fn, bounds, q=1)
But adapted for psyphy: - Support for both continuous and discrete optimization - JAX-based gradient descent (Optax) - Batch acquisition (q > 1) support
Functions:
| Name | Description |
|---|---|
optimize_acqf |
Optimize acquisition function over continuous domain. |
optimize_acqf_discrete |
Optimize acquisition function over discrete candidate set. |
optimize_acqf_random |
Optimize acquisition function via random search. |
optimize_acqf
¶
optimize_acqf(
acq_fn: Callable[[ndarray], ndarray],
bounds: ndarray,
q: int = 1,
*,
method: Literal["gradient", "random"] = "gradient",
num_restarts: int = 10,
raw_samples: int = 100,
optim_steps: int = 100,
lr: float = 0.01,
key: Any = None,
) -> tuple[ndarray, ndarray]
Optimize acquisition function over continuous domain.
Uses multi-start gradient descent to find global optimum.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
acq_fn
|
callable
|
Acquisition function. Takes (n_points, input_dim) array, returns (n_points,) scores. |
required |
bounds
|
(ndarray, shape(input_dim, 2))
|
Box constraints [[x1_min, x1_max], [x2_min, x2_max], ...] |
required |
q
|
int
|
Batch size (number of points to select) |
1
|
method
|
('gradient', 'random')
|
Optimization method |
"gradient"
|
num_restarts
|
int
|
Number of random restarts for gradient descent |
10
|
raw_samples
|
int
|
Number of random samples to initialize restarts |
100
|
optim_steps
|
int
|
Number of optimization steps per restart |
100
|
lr
|
float
|
Learning rate for gradient descent |
0.01
|
key
|
KeyArray | None
|
PRNG key for random initialization |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
X_next |
(ndarray, shape(q, input_dim))
|
Optimized points |
acq_values |
(ndarray, shape(q))
|
Acquisition values at X_next |
Examples:
Notes
For gradient-based optimization, ensure your acquisition function is differentiable through JAX. Use jax.grad() or jax.value_and_grad().
For non-differentiable acquisition functions, use method="random" or optimize_acqf_discrete() with a candidate grid.
Source code in src/psyphy/acquisition/optimize.py
82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 | |
optimize_acqf_discrete
¶
optimize_acqf_discrete(
acq_fn: Callable[[ndarray], ndarray],
candidates: ndarray,
q: int = 1,
) -> tuple[ndarray, ndarray]
Optimize acquisition function over discrete candidate set.
This is the simplest and most common approach for psychophysics, where stimulus space is often discretized.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
acq_fn
|
callable
|
Acquisition function. Takes (n_candidates, input_dim) array, returns (n_candidates,) scores. |
required |
candidates
|
(ndarray, shape(n_candidates, input_dim))
|
Discrete candidate points to evaluate |
required |
q
|
int
|
Batch size (number of points to select) |
1
|
Returns:
| Name | Type | Description |
|---|---|---|
X_next |
(ndarray, shape(q, input_dim))
|
Selected candidate points |
acq_values |
(ndarray, shape(q))
|
Acquisition values of selected points |
Examples:
Source code in src/psyphy/acquisition/optimize.py
optimize_acqf_random
¶
optimize_acqf_random(
acq_fn: Callable[[ndarray], ndarray],
bounds: ndarray,
q: int = 1,
*,
num_samples: int = 1000,
key: Any = None,
) -> tuple[ndarray, ndarray]
Optimize acquisition function via random search.
Simple baseline: sample random points, evaluate, select best.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
acq_fn
|
callable
|
Acquisition function |
required |
bounds
|
(ndarray, shape(input_dim, 2))
|
Box constraints |
required |
q
|
int
|
Batch size |
1
|
num_samples
|
int
|
Number of random samples to evaluate |
1000
|
key
|
KeyArray | None
|
PRNG key |
None
|
Returns:
| Name | Type | Description |
|---|---|---|
X_next |
(ndarray, shape(q, input_dim))
|
Best random samples |
acq_values |
(ndarray, shape(q))
|
Acquisition values |
Examples: