Executive Summary

SimV3 will introduce Gaussian Process (GP) surrogates and Bayesian optimization to enable efficient design-space exploration and parameter identification for photovoltaic (PV) thermal systems. By constructing probabilistic emulators of the expensive SimV1/SimV2 solvers, SimV3 aims to reduce the cost of optimization from thousands of high-fidelity solver evaluations to fewer than 100 strategically selected queries, targeting global optima with quantified uncertainty bounds.

Gaussian Process Regression Framework

Probabilistic Model

A Gaussian Process defines a distribution over functions $f: \mathcal{X} \rightarrow \mathbb{R}$, specified by a mean function $m(\mathbf{x})$ and covariance kernel $k(\mathbf{x}, \mathbf{x}')$:

$$f(\mathbf{x}) \sim \mathcal{GP}\left(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')\right)$$

For PV thermal modeling, $\mathbf{x} \in \mathbb{R}^d$ represents design parameters (e.g., encapsulant thickness, thermal conductivity, surface emissivity), and $f(\mathbf{x})$ is the output of interest (e.g., peak temperature, power conversion efficiency).

Posterior Predictive Distribution

Given training data $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^n$ where $y_i = f(\mathbf{x}_i) + \epsilon_i$, $\epsilon_i \sim \mathcal{N}(0, \sigma_n^2)$, the posterior GP at test point $\mathbf{x}_*$ is:

$$\begin{aligned} \mu(\mathbf{x}_*) &= \mathbf{k}_*^T (\mathbf{K} + \sigma_n^2 \mathbf{I})^{-1} \mathbf{y} \\[10pt] \sigma^2(\mathbf{x}_*) &= k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^T (\mathbf{K} + \sigma_n^2 \mathbf{I})^{-1} \mathbf{k}_* \end{aligned}$$

The posterior variance $\sigma^2(\mathbf{x}_*)$ quantifies epistemic uncertainty (reducible via additional data), distinct from aleatoric noise $\sigma_n^2$.
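The posterior equations above translate directly into a few lines of linear algebra. The following is a minimal sketch of a zero-mean GP posterior in NumPy, using a Cholesky factorization rather than an explicit matrix inverse for numerical stability; the function name `gp_posterior` and the `kernel` callback are illustrative, not part of SimV3's API.

```python
import numpy as np

def gp_posterior(X_train, y_train, X_test, kernel, noise_var=1e-6):
    """Posterior mean and variance of a zero-mean GP at test points.

    kernel(A, B) must return the (len(A), len(B)) covariance matrix.
    """
    K = kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = kernel(X_train, X_test)                 # k_* for each test point
    L = np.linalg.cholesky(K)                     # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = K_s.T @ alpha                            # k_*^T (K + sigma_n^2 I)^{-1} y
    v = np.linalg.solve(L, K_s)
    var = np.diag(kernel(X_test, X_test)) - np.sum(v**2, axis=0)
    return mu, var
```

With near-zero observation noise, the posterior mean interpolates the training targets and the posterior variance collapses to zero at the training inputs, consistent with the epistemic/aleatoric distinction above.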

Kernel Design for PV Systems

Matérn 5/2 Kernel

SimV3 will employ the Matérn 5/2 kernel for its balance between smoothness and flexibility:

$$k_{\text{M52}}(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \left(1 + \frac{\sqrt{5}r}{\ell} + \frac{5r^2}{3\ell^2}\right) \exp\left(-\frac{\sqrt{5}r}{\ell}\right)$$

where $r = \|\mathbf{x} - \mathbf{x}'\|_2$, $\ell > 0$ is the length-scale, and $\sigma_f^2$ is signal variance.

Rationale: Matérn 5/2 yields sample paths that are exactly twice differentiable (vs. infinitely differentiable for the squared-exponential kernel), making it better suited to PV response surfaces with limited smoothness, such as the kinks introduced by bypass diode activation.
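The kernel above can be implemented compactly with an isotropic length-scale. This is a sketch under that assumption; the function name `matern52` is illustrative and ARD (per-dimension length-scales) would replace the scalar `lengthscale` in practice.

```python
import numpy as np

def matern52(X1, X2, lengthscale=1.0, signal_var=1.0):
    """Matérn 5/2 covariance between point sets of shape (n, d) and (m, d)."""
    r = np.sqrt(((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1))
    s = np.sqrt(5.0) * r / lengthscale          # sqrt(5) r / ell
    return signal_var * (1.0 + s + s**2 / 3.0) * np.exp(-s)
```

Note that s²/3 = 5r²/(3ℓ²), matching the middle term of the formula, and k(x, x) = σ_f² as required.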

Bayesian Optimization Algorithm

Expected Improvement Acquisition Function

Bayesian Optimization (BO) sequentially selects query points by maximizing an acquisition function $\alpha(\mathbf{x})$ that balances exploration (high uncertainty) and exploitation (high predicted performance).

Expected Improvement over current best $f^+ = \max_{i \leq n} y_i$:

$$\text{EI}(\mathbf{x}) = \mathbb{E}\left[\max(0, f(\mathbf{x}) - f^+)\right] = (\mu(\mathbf{x}) - f^+) \Phi(Z) + \sigma(\mathbf{x}) \phi(Z)$$

where $Z = \frac{\mu(\mathbf{x}) - f^+}{\sigma(\mathbf{x})}$, and $\Phi(\cdot)$ and $\phi(\cdot)$ are the CDF and PDF of the standard normal distribution. When $\sigma(\mathbf{x}) = 0$, $\text{EI}(\mathbf{x}) = 0$.

At each iteration $t$, solve:

$$\mathbf{x}_{t+1} = \underset{\mathbf{x} \in \mathcal{X}}{\arg\max} \, \text{EI}(\mathbf{x} \mid \mathcal{D}_t)$$
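The closed-form EI expression admits a direct vectorized implementation. Below is a hedged sketch operating on posterior mean/std arrays (e.g., evaluated over a dense candidate grid); the name `expected_improvement` is illustrative, and in practice the inner arg-max over $\mathcal{X}$ would use multi-start gradient ascent rather than a grid.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for maximization given posterior mean/std arrays."""
    sigma = np.asarray(sigma, dtype=float)
    safe_sigma = np.maximum(sigma, 1e-12)     # guard the division
    z = (mu - f_best) / safe_sigma
    ei = (mu - f_best) * norm.cdf(z) + safe_sigma * norm.pdf(z)
    return np.where(sigma > 0, ei, 0.0)      # EI = 0 where variance vanishes

# Illustrative inner arg-max over a candidate grid of posterior evaluations:
# x_next = candidates[np.argmax(expected_improvement(mu, sigma, f_best))]
```

EI is nonnegative by construction; it is large either where the mean prediction exceeds $f^+$ (exploitation) or where the posterior standard deviation is large (exploration).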

