1 Introduction
Although the fundamental principles of quantum electrodynamics (QED) are known for their precise experimental validation, their implications for sufficiently strong electromagnetic fields remain theoretically intricate and lack experimental data. Colliding accelerated electrons with high-intensity laser pulses is an emerging pathway to such experimental data[1–4]. The local interaction is characterized by the dimensionless ratio of the electron acceleration in its rest frame to the acceleration that would be caused by the Schwinger field ${E}_{\mathrm{crit}}$:

(1) $$\begin{align}\chi =\frac{\gamma_{\mathrm{e}}}{E_{\mathrm{crit}}}\sqrt{{\left(\vec{E}+\vec{v}\times \vec{B}\right)}^2-{\left(\vec{v}\cdot \vec{E}/c\right)}^2},\end{align}$$
where $\vec{v},{\gamma}_\mathrm{e}$ are the velocity and Lorentz factor of the electron, respectively, whereas $\vec{E},\vec{B}$ are the electromagnetic field vectors. Here, ${E}_{\mathrm{crit}}={m}_\mathrm{e}^2{c}^3/\left({q}_\mathrm{e}\mathrm{\hslash}\right)\approx {10}^{18}\ \mathrm{V}/\mathrm{m}$, where $\mathrm{\hslash}$ is the reduced Planck constant, $c$ is the speed of light and ${m}_\mathrm{e}$ and ${q}_\mathrm{e}$ are the mass and charge of the electron, respectively. At $\chi \ll 1$ the electrons are subject to classical emission and the corresponding radiation reaction[5]. Photon emission and the corresponding recoil at $\chi \sim 1$ are described by nonlinear Compton scattering and have been observed in several experiments[6–9]. Measuring quantitative properties of the photon emission (e.g., its energy, angular or polarization distribution) at $\chi \sim 1$ is a logical next step, while results for $\chi \gg 1$ can potentially facilitate theoretical developments or even lead to fundamental discoveries (see Ref. [10] and references therein).
A severe obstacle for the outlined efforts is the complexity of the interaction. The value of $\chi$ for each electron in the beam varies in time and depends on the electron position relative to the laser pulse, which can itself vary from shot to shot due to spatio-temporal mismatches. In addition, for contemporary laser pulse durations, many electrons can lose a significant part of their initial energy before reaching the strong-field region, where they have a chance to emit at high $\chi$. Furthermore, due to the Breit–Wheeler process the emitted photons can decay into electron–positron pairs, which can lead to the onset of an electromagnetic cascade. In combination, these effects can make the measurable post-collision distributions of photons, electrons and positrons predominantly determined by low-$\chi$ emissions, giving no direct information about emissions at high $\chi$, even if such emissions occurred.
One known way of dealing with such difficulties is Bayesian binary hypothesis testing, which is based on comparing experimental results with the outcomes computed on the basis of each of two competing theories. However, even in the absence of a distinct hypothesis to be tested, one can use a similar technique to determine parameters that quantify deviations from an approximate theory (sometimes referred to as parameter calibration[11–13]), which in our case is the theory of nonlinear Compton scattering that is valid for moderate $\chi$ values. One practical advantage of this approach is the possibility of gaining statistically rigorous knowledge from many experiments even in the case of low repeatability. For example, inference about high-$\chi$ events is feasible even if the collision between the laser pulse and the electron bunch happens with a spatial offset that varies from collision to collision without being measured.
In this paper we consider the possibility of using the technique of approximate Bayesian computation (ABC) in forthcoming experiments[11,14,15]. Since the application of ABC is known to require problem-specific developments and analysis, we examine its essential elements using a proof-of-principle problem: applying the method to measure the constant that quantifies the effective mass shift[10,16–18]. We assess the use of the ABC technique in the context of possible experimental conditions and analyze the main requirements, difficulties and opportunities for improvements. The paper is arranged as follows. In Section 2 we demonstrate a proof-of-principle approach to infer the effective mass change, assessing the difficulties and limitations. In Section 3 we motivate the use of likelihood-free inference and state the ABC algorithm. Section 4 covers the numerical aspects of simulating the experiment and discusses the prospects of the outlined methodology. We draw conclusions in Section 5.
2 Problem statement
Our goal is to analyze the ABC method for quantitative studies of the processes described by strong-field quantum electrodynamics (SFQED). For this purpose, we choose to consider the problem of detecting and measuring the extent of the effective mass shift of the electron due to its coupling with the strong-field environment[10,16–18]. In this section, we first give a rather general description of a potential experiment and then motivate a simplified problem formulation that serves as a proof-of-principle case. In the following sections we elaborate the application of ABC and describe how the developed routine can be generalized to deal with realistic experimental conditions.
The presence of a strong background electromagnetic field is conjectured to drive the expansion parameter of QED to ${\alpha}_\mathrm{f}{\chi}^{2/3}$, where ${\alpha}_\mathrm{f}\approx 1/137$ is the fine-structure constant[17,19]. When ${\alpha}_\mathrm{f}{\chi}^{2/3}\gtrsim 1$, the theory becomes nonperturbative. In this domain, electrons and positrons can be thought of as acquiring an effective mass as a result of radiative corrections. Specifically, the effective mass of the electron ${\tilde{m}}_\mathrm{e}$ can be estimated as follows[16]:
which implies an effective value of $\chi$ (the mass enters Equation (1) through ${E}_{\mathrm{crit}}$ ):
To benchmark this effect and determine its extent, one can treat this constant, whose nominal value is ${\theta}^\mathrm{true}=0.84$, as a model parameter $\theta$ to be inferred from the data of repeated experiments:
In general, a chosen parameterization is not necessarily unique and several possibilities can exist. Here, another option is to replace ${\chi}^{2/3}\to {\chi}^{\theta }$ to test the conjectured power-law behavior that causes QED to become nonperturbative.
Upon the replacements ${m}_\mathrm{e}\to {\tilde{m}}_\mathrm{e}$ and $\chi \to \tilde{\chi}$, the rates of photon emission and pair formation computed from QED valid at moderate $\chi$ become modified. For the former, the rate can be written as follows[20,21]:
where $\zeta =\frac{2}{3\tilde{\chi}}\frac{\delta }{1-\delta }$ and $\delta =\frac{\mathrm{\hslash}\omega }{{\tilde{m}}_\mathrm{e}{c}^2{\gamma}_\mathrm{e}}$ is the photon energy, with frequency $\omega$, normalized to the emitting electron energy, while ${F}_1(x)$ and ${F}_2(x)$ denote the first and second synchrotron functions, ${F}_1(x)=x\int_x^{\infty }{K}_{5/3}(y)\,\mathrm{d}y$ and ${F}_2(x)=x{K}_{2/3}(x),$
in which ${K}_{\nu }(y)$ are the modified Bessel functions of the second kind.
Equation (4) indicates that the dependency of the emission rate (Equation (5)) on $\theta$ becomes weaker as $\chi$ decreases; ideally we would need to reach values of the order of $\chi \sim {\alpha}_\mathrm{f}^{-3/2}\approx 1600$ to have a prominent dependency. Since such values are currently unattainable, achieving as large values of $\chi$ as possible is clearly crucial. The collision of tightly focused laser pulses with accelerated electron bunches provides a promising layout for these experimental efforts. The use of optimal focusing, provided by the so-called bi-dipole wave, can yield values of $\chi$ as large as $5.25\left(\varepsilon /\left(1\;\mathrm{GeV}\right)\right){\left(P/\left(1\;\mathrm{PW}\right)\right)}^{1/2}\left(\left(1\;\unicode{x3bc} \mathrm{m}\right)/\lambda \right)$, where $\varepsilon$ is the energy of the electrons, whereas $P$ and $\lambda$ are the laser power and wavelength, respectively[22]. Nevertheless, bi-dipole focusing, like any tight focusing, implies a spatial localization of the strong-field region to a size of approximately $\lambda$, which can be comparable to or smaller than the typical shot-to-shot variation of the relative electron beam location provided by current alignment capabilities. This leads to a reasonable question: to what extent can information about $\theta$ be retrieved despite a large variation of experimental outcomes caused by the varied and unmeasured offset between the laser pulse and electron bunch inherent in current experiments? This difficulty is further compounded by the probabilistic nature of emissions and the dominance of the signal produced by emissions at low values of $\chi$. We try to assess the prospects of overcoming these difficulties by the use of ABC, and to do so we consider the problem of retrieving information about $\theta$ from a sequence of measured experimental outcomes assuming that there is an unknown offset for each measurement.
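For orientation, the scaling above can be evaluated directly; the following minimal Python sketch uses illustrative parameter values that are not prescribed by the setup considered here:

```python
def chi_max(energy_GeV, power_PW, wavelength_um):
    """Attainable chi for bi-dipole focusing,
    chi ~ 5.25 * (eps / 1 GeV) * (P / 1 PW)**0.5 * (1 um / lambda)."""
    return 5.25 * energy_GeV * power_PW ** 0.5 / wavelength_um

# e.g., 10 GeV electrons colliding with a 1 PW pulse at 0.8 um gives chi ~ 66,
# still far below alpha_f**(-3/2) ~ 1600
print(chi_max(10.0, 1.0, 0.8))
```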
To focus on the outlined difficulty, we make a number of simplifications, while a more realistic problem formulation is left for future work. Firstly, we neglect the generation of pairs, which in practice can be suppressed by the use of short laser pulses[23] and/or high electron energies. We also consider a circularly polarized laser pulse and the use of the angular-energy distribution of produced photons, because in this case high-energy photons emitted toward the directions that deviate the most from the collision direction are predominantly produced at large values of $\chi$[22]. Although the deviation angle is used for our diagnostics, we still assume that the high energy of the electrons makes this angle sufficiently small so that the instantaneous transverse deviations of electrons are much less than $\lambda$ and thus do not affect the observed field amplitude. Moreover, we assume that the laser pulse is sufficiently short not to cause multiple emissions, so that all electrons retain their initial energy throughout the interaction. Finally, we need to account for the presence of an unknown spatio-temporal offset between the electron beam and the focused laser field for each collision. Since such a mismatch leads to a reduced field amplitude observed by the electrons, we choose to model these variations by assuming that the electron beam propagates through a 1D laser pulse with an unknown amplitude that varies from collision to collision. In what follows, we detail this model of hypothetical experiments.
In defining the model of the experiment, we simulate a single electron with momentum ${p}_z=-{m}_\mathrm{e} c\gamma$ interacting with a circularly polarized plane-wave laser pulse. The corresponding vector potential and electric field amplitudes are given by the following:
where $\xi =z- ct$ is the moving coordinate, ${A}_0$ is the peak vector potential amplitude, $k$ and $L$ are the wavenumber and pulse length of the laser, respectively, and $\Pi (x)$ is a function equal to unity when $\mid x\mid <1/2$ and zero otherwise. The electromagnetic field amplitude is measured in dimensionless units ${a}_0=\frac{q_\mathrm{e}{A}_0}{m_\mathrm{e} ckL}$, and $0\le d\le 1$ expresses the uncontrollable misalignment mentioned previously; it is referred to as a latent parameter.
The inclusion of the latent parameter $d$ becomes the primary reason to require the angular-energy spectra of photons as opposed to the energy distribution alone. Indeed, from Equations (1) and (4) one can see that the same value of $\chi$ (which completely determines radiation) can be achieved by various combinations of $d$ and $\theta$ . For instance, the case of ( $\theta ={\theta}^\mathrm{true}$ ; $d=0$ ) yields the same value for $\chi$ as the combination of $\theta =0$ and the following:
Because of this degeneracy, inferring the value of $\theta$ from the energy spectrum alone is impossible.
Let us now analyze the implications of including the angular part of the spectrum. Firstly, consider the transverse motion of an electron in a plane electromagnetic wave[24]:
where ${\vec{p}}_{\perp }$ and ${\vec{A}}_{\perp }$ denote the transverse components of the electron momentum and vector potential, respectively. Consequently, at each instant of time, the electron propagates in a direction that deviates from the initial direction by an angle $\alpha$:
where we assume that the motion remains highly relativistic. Evidently, emitted photons retain this angle and, given the circular polarization of the wave, it becomes correlated with the value of $\chi$[22]. We assume that in the case of highly relativistic motion with $\alpha \ll 1$, the change of effective mass ($\theta \ne 0$) does not lead to a change of ${p}_z$ before the emission, while ${\vec{p}}_{\perp }$ is entirely defined by the vector potential according to Equation (10). Following Equation (11), this means that the value of $\theta$ does not affect $\alpha$. In contrast, an increase of $d$ implies a diminished electromagnetic field amplitude perceived by the electrons, which leads to smaller angles $\alpha$. Note that our idea here is to break the described degeneracy, which does not necessarily require $\alpha$ to be independent of $\theta$ (other cases can be analyzed in a similar way).
Having discussed the necessity of learning from both the energy and angle of emitted photons, we define the measured data of our model experiment as a fractional energy distribution ${\partial}^2E/\partial \delta \partial \alpha$, which is a function of the normalized photon energy $\delta$ and deviation angle $\alpha$. For the computational implementation this function is represented by a matrix ${x}_{\mathrm{obs}}$, whose elements ${x}_{\mathrm{obs}}^{(k)(l)}$ can be computed from the result of a simulation by counting the total energy of photons belonging to each cell: $k\Delta \delta <\delta \le \left(k+1\right)\Delta \delta;\ l\Delta \alpha <\alpha \le \left(l+1\right)\Delta \alpha$, where the choice of cell sizes $\Delta \delta$ and $\Delta \alpha$ is a matter of a trade-off between the level of noise and the resolution. The problem formulation is then to infer the value of $\theta$ from a sequence of measured ${x}_{\mathrm{obs}}$ despite the unknown $d$ in each case.
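As an illustration of how such a matrix could be assembled, the sketch below accumulates hypothetical arrays of photon energies $\delta$ and angles $\alpha$ (weighted by the photon energy) into a grid; the ranges, cell counts and input arrays are placeholders rather than values fixed by the text:

```python
import numpy as np

def bin_photons(delta, alpha, n_delta=100, n_alpha=100,
                delta_max=1.0, alpha_max=1e-3):
    """Accumulate the fractional energy distribution x_obs[k, l]: the total
    photon energy (in units of delta) collected in each (delta, alpha) cell."""
    x_obs = np.zeros((n_delta, n_alpha))
    k = np.clip((delta / (delta_max / n_delta)).astype(int), 0, n_delta - 1)
    l = np.clip((alpha / (alpha_max / n_alpha)).astype(int), 0, n_alpha - 1)
    np.add.at(x_obs, (k, l), delta)  # each photon contributes its energy delta
    return x_obs

# usage with randomly generated stand-in photon data
rng = np.random.default_rng(0)
x_obs = bin_photons(rng.uniform(0, 1, 10000), rng.uniform(0, 1e-3, 10000))
```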
3 Methodology
We now establish the methodology of ABC sampling using a general problem formulation in which Bayesian statistics is employed to make inferences based on the results of repeated experiments. Starting from Bayes' theorem, the problem of intractable likelihoods is discussed and the appeal of 'likelihood-free' methods is introduced by considering standard rejection sampling. By reformulating the rejection step to impose exact agreement between simulated and experimental data, one obtains a likelihood-free technique. We shall see that this suffers from the curse of dimensionality and that, through dimensionality reduction, we arrive at ABC sampling. Lastly, we consider two additional improvements to ABC sampling with the aim of accelerating convergence.
Let us start by considering the task of characterizing a probabilistic process (for instance, nonlinear Compton scattering with an effective electron mass shift) by carrying out experiments. Each experiment yields measurement data ${x}_{\mathrm{obs}}$. We have a model $M\left(\theta, d\right)$ that gives predictions $x=M\left(\theta, d\right)$ for this data for any given value of a model parameter $\theta$ and a latent parameter $d$. Here, $\theta$ is a fundamental parameter that quantifies the process itself and thus its unique value is of interest, whereas $d$ denotes an unmeasured parameter that can vary from experiment to experiment and determines the outcome $x$ in accordance with the model $M$. We assume that there exists a value of $\theta$ for which the model describes (to some extent) the observations given an appropriate value of $d$ for each experiment. Our task is to infer the probability distribution for the value of $\theta$ from a series of repeated experimental measurements. Put differently, the objective is to infer the most probable range for $\theta$ given the observed data ${x}_{\mathrm{obs}}$. Bayesian statistics provides a framework for the outlined problem. The probability distribution to be determined is referred to as a posterior distribution $p\left(\theta |{x}_{\mathrm{obs}}\right)$, which explicitly indicates the data ${x}_{\mathrm{obs}}$ used for making the inference. Let us start from the case of no latent parameter. The posterior can then be calculated using Bayes' theorem:

(12) $$\begin{align}p\left(\theta |{x}_{\mathrm{obs}}\right)=\frac{p\left({x}_{\mathrm{obs}}|\theta \right)p\left(\theta \right)}{p\left({x}_{\mathrm{obs}}\right)},\end{align}$$
where $p\left(\theta \right)$ quantifies the prior knowledge about possible values of $\theta$, the likelihood $p\left({x}_{\mathrm{obs}}|\theta \right)$ conveys how likely a measurement yielding ${x}_{\mathrm{obs}}$ is for a given $\theta$ and $p\left({x}_{\mathrm{obs}}\right)=\int p\left({x}_{\mathrm{obs}}|\theta \right)p\left(\theta \right)\mathrm{d}\theta$ appears as a normalizing factor. To incorporate the dependence on the latent parameter we integrate over all its possible values, denoting $p\left({x}_{\mathrm{obs}}|\theta, d\right)$ as the corresponding joint likelihood:

(13) $$\begin{align}p\left(\theta |{x}_{\mathrm{obs}}\right)=\frac{p\left(\theta \right)}{p\left({x}_{\mathrm{obs}}\right)}\int p\left({x}_{\mathrm{obs}}|\theta, d\right)p(d)\,\mathrm{d}d,\end{align}$$
where $p(d)$ specifies prior knowledge related to values of the latent parameter $d$ . Now we can sequentially account for all observations, each time using the obtained posterior as the prior for processing the next observation. Note that we do not update the prior for $d$ because its value is assumed to be different in all the experiments.
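To make Equations (12) and (13) and the sequential updating concrete, the toy sketch below performs the update on a grid of $\theta$ values, marginalizing over $d$ at each step. The Gaussian likelihood is purely a placeholder: for the actual model of Section 2 no closed-form likelihood is available, which is exactly the issue addressed next.

```python
import numpy as np

theta_grid = np.linspace(0.0, 2.0, 201)
d_grid = np.linspace(0.0, 0.5, 101)
posterior = np.full(theta_grid.size, 1.0 / theta_grid.size)  # flat prior p(theta)
prior_d = np.full(d_grid.size, 1.0 / d_grid.size)            # prior p(d), not updated

def toy_likelihood(x_obs, theta, d):
    # stand-in for p(x_obs | theta, d); intractable for the real simulation
    return np.exp(-0.5 * ((x_obs - theta * (1.0 - d)) / 0.1) ** 2)

T, D = np.meshgrid(theta_grid, d_grid, indexing="ij")
for x_obs in [0.80, 0.77, 0.83]:                   # sequence of measurements
    marginal = (toy_likelihood(x_obs, T, D) * prior_d[None, :]).sum(axis=1)
    posterior *= marginal                          # Bayes' theorem, Eq. (13)
    posterior /= posterior.sum()                   # normalization by p(x_obs)
print(theta_grid[np.argmax(posterior)])            # most probable theta
```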
A closed form of the posterior rarely exists and numerical approaches are often used. A common strategy is to approximate the posterior by collecting a finite number of samples from it. Methods such as importance sampling, Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC)[25–27] are prevalent choices. However, all of the above require direct evaluation of the likelihood, which can be computationally prohibitive for high-dimensional datasets[28]. If the model $M$ is implicitly defined through a computer simulation, its concomitant likelihood can be intractable[14]. A remedy is offered by the rapidly developing field of simulation-based inference[29], in which the direct calculation of the likelihood is averted. To motivate its use we adopt and develop the discussion made in Ref. [28].
Consider the standard rejection sampling algorithm with the goal of sampling a target density $T\left(\theta \right)$ provided some auxiliary sampling density $A\left(\theta \right)$ with the requirement $A\left(\theta \right)>0\;\mathrm{if}\;T\left(\theta \right)>0$ . Then, the algorithm reads as follows.
Algorithm 1. Standard rejection sampling algorithm.
1: Sample a proposal ${\theta}^{\ast}\sim A\left(\theta \right).$
2: Admit the proposal with a probability of $\frac{T\left({\theta}^{\ast}\right)}{CA\left({\theta}^{\ast}\right)}$, where $C\ge \underset{\theta }{\max}\left(\frac{T\left(\theta \right)}{A\left(\theta \right)}\right)$.
3: If ${\theta}^{\ast }$ was not admitted, discard the proposal and repeat steps (1) and (2) as many times as necessary.
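A minimal Python sketch of Algorithm 1 is given below; the target and auxiliary densities are arbitrary placeholders chosen only to make the example self-contained:

```python
import numpy as np

rng = np.random.default_rng(1)

def rejection_sample(target_pdf, aux_pdf, aux_sampler, C, n_samples):
    """Algorithm 1: accept theta* ~ A with probability T(theta*)/(C A(theta*)),
    where C >= max_theta T(theta)/A(theta)."""
    samples = []
    while len(samples) < n_samples:
        theta = aux_sampler()                                  # step 1
        if rng.uniform() < target_pdf(theta) / (C * aux_pdf(theta)):
            samples.append(theta)                              # step 2
        # step 3: otherwise the proposal is discarded and we try again
    return np.array(samples)

# usage: a Gaussian-shaped target sampled via a uniform auxiliary density on [0, 2]
target = lambda t: np.exp(-0.5 * ((t - 0.84) / 0.2) ** 2)
aux_pdf = lambda t: 0.5                                        # density of U(0, 2)
draws = rejection_sample(target, aux_pdf,
                         lambda: rng.uniform(0.0, 2.0), C=2.0, n_samples=1000)
```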
After $N$ trials, a collection of samples from $T\left(\theta \right)$ is obtained. The connection to Bayesian statistics is made by selecting $T\left(\theta \right)=p\left(\theta |{x}_{\mathrm{obs}}\right)$ and $A\left(\theta \right)=p\left(\theta \right)$. Then, Equation (12) states that the acceptance rate appearing in Algorithm 1 becomes proportional to the likelihood, $\frac{p\left({\theta}^{\ast }|{x}_{\mathrm{obs}}\right)}{p\left({\theta}^{\ast}\right)}\propto p\left({x}_{\mathrm{obs}}|{\theta}^{\ast}\right)$, which by our premise cannot be computed. Still, it is possible to determine whether or not to accept proposals without explicit computation of the likelihood. To demonstrate this, we first note that the model $M\left(\theta, d\right)$ is capable of generating samples of observations $x\sim p\left(x|\theta, d\right)$ for given values of $\theta$ and $d$. The key aspect is then to recognize that the probability of producing $x={x}_{\mathrm{obs}}$ coincides with $p\left({x}_{\mathrm{obs}}|\theta, d\right)$. Put differently, calculating the acceptance rate from the likelihood can be replaced by enforcing an exact match between the datasets, meaning that a likelihood-free approach has been found. Therefore, the rejection step of Algorithm 1 is reformulated to read as follows.
Algorithm 2. Likelihood-free rejection sampling.
1: Sample proposals ${\theta}^{\ast}\sim p\left(\theta \right)$ , ${d}^{\ast}\sim p(d)$ .
2: Generate data ${x}^{\ast}=M\left({\theta}^{\ast },{d}^{\ast}\right)$ from the model.
3: If ${x}^{\ast}={x}_{\mathrm{obs}}$ , the proposal is admitted; if not, it is discarded.
4: Repeat steps (1)–(3) as many times as necessary.
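A corresponding sketch of Algorithm 2 is shown below, assuming a simulator `M(theta, d)` that returns data binned in the same way as ${x}_{\mathrm{obs}}$; the exact-match condition of step 3 is what renders this version impractical, as discussed next:

```python
import numpy as np

def likelihood_free_rejection(M, x_obs, sample_theta, sample_d, n_samples):
    """Algorithm 2: likelihood-free rejection sampling with an exact match."""
    accepted = []
    while len(accepted) < n_samples:
        theta, d = sample_theta(), sample_d()   # step 1: sample from the priors
        x = M(theta, d)                         # step 2: generate data from the model
        if np.array_equal(x, x_obs):            # step 3: demand x* = x_obs
            accepted.append(theta)
        # step 4: repeat as many times as necessary
    return np.array(accepted)
```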
While avoiding direct computation of the likelihood, demanding ${x}^{\ast}={x}_{\mathrm{obs}}$ introduces a notable impediment. To illustrate it, consider the binning of data from an experiment into $\dim \left({x}_{\mathrm{obs}}\right)=B$ bins such that the following are obtained:

(14) $$\begin{align}{x}_{\mathrm{obs}}=\left({c}_1,{c}_2,\dots, {c}_B\right),\end{align}$$

(15) $$\begin{align}{x}^{\ast}=\left({c}_1^{\prime },{c}_2^{\prime },\dots, {c}_B^{\prime}\right),\end{align}$$
where ${c}_b,{c}_b^{\prime}\in \mathrm{\mathbb{Z}}$ denote the integer counts belonging to the $b$th bin. Denote by ${p}_b$ the probability of the coincidence ${c}_b={c}_b^{\prime }$ at bin $b$, assumed to be independent between bins. Then, the probability to accept a proposal ${\theta}^{\ast }$ becomes as follows:

(16) $$\begin{align}\prod_{b=1}^B{p}_b,\end{align}$$
which approaches zero in the limit of high-dimensional datasets, $B\to \infty$. The acceptance rate becomes even lower, or the exact match becomes altogether infeasible, for continuous data in which ${c}_b,{c}_b^{\prime}\in \mathrm{\mathbb{R}}$ are real numbers. Hence, the demand for a precise match has to be relaxed to make the sampling efficiency practical. Realizing that this rate becomes significantly higher if samples are admitted whenever $x\approx {x}_{\mathrm{obs}}$ prompts us to define a rule for when data are sufficiently close:

(17) $$\begin{align}\left\Vert {x}^{\ast }-{x}_{\mathrm{obs}}\right\Vert \le \epsilon, \end{align}$$
where $\left\Vert \cdot \right\Vert$ is a suitable distance metric and $\epsilon$ is a threshold. Samples accepted in accordance with Equation (17) are inevitably drawn from an approximate posterior $p\left(\theta |{x}_{\mathrm{obs}}\right)$, whose accuracy is dictated by $\epsilon$, which also affects the sampling efficiency. Rejection based on Equation (17) provides a notable improvement, but high dimensionality remains an issue. Consider the aforementioned example with a Euclidean distance metric, so that Equation (17) reads as follows:

(18) $$\begin{align}\sqrt{\sum_{b=1}^B{\left({c}_b-{c}_b^{\prime}\right)}^2}\le \epsilon, \end{align}$$
and examine the favorable case in which ${c}_b-{c}_b^{\prime}\sim {\Delta}_c\ll 1$ varies negligibly between bins. We can then naively state Equation (18) as follows:

(19) $$\begin{align}\sqrt{B}\,{\Delta}_c\le \epsilon \quad \Rightarrow \quad B\le {\left(\epsilon /{\Delta}_c\right)}^2.\end{align}$$
Evidently, Equation (19) states that the dimension of ${x}_{\mathrm{obs}}$ is bounded from above by the threshold $\epsilon$ and the error ${\Delta}_c$. However, for the quality of the inference $\epsilon \to 0$ is desired, which puts a stringent limit on the dimensionality of ${x}_{\mathrm{obs}}$. The solution, being the central component of ABC, is to introduce so-called summary statistics:

(20) $$\begin{align}S\left({x}_{\mathrm{obs}}\right)=\left({s}_1,{s}_2,\dots, {s}_{\beta}\right),\end{align}$$
being a function that transforms data of a potentially noisy nature into a vector of indicative characteristics sought to unambiguously characterize the data with respect to all possible $\theta$ . Clearly, the dimensionality $\beta$ of the space of such vectors can be much less than the number of cells $B$ . Moreover, the function of summary statistics can even be defined in an agnostic way with respect to the binning choice. As an example, one could construct a vector containing the sample mean $\mu$ and variance ${\sigma}^2$ of ${x}_{\mathrm{obs}}$ : $S\left({x}_{\mathrm{obs}}\right)=\left(\mu, {\sigma}^2\right)$ .
By converting ${x}_{\mathrm{obs}}\to S\left({x}_{\mathrm{obs}}\right)$, the third step of Algorithm 2 can be reformulated to accept samples if the following applies:

(21) $$\begin{align}\left\Vert S\left({x}^{\ast}\right)-S\left({x}_{\mathrm{obs}}\right)\right\Vert \le \epsilon .\end{align}$$
Note that the mapping ${x}_{\mathrm{obs}}\to S\left({x}_{\mathrm{obs}}\right)$ into a vector of summary statistics is always possible in terms of reducing the data dimension. The question is how well the chosen vector of summary statistics retains the information contained in ${x}_{\mathrm{obs}}$ , which is discussed later in Section 4.
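A sketch of such a mapping is given below, reducing the binned matrix to the two-component example $S\left({x}_{\mathrm{obs}}\right)=\left(\mu, {\sigma}^2\right)$ mentioned above, together with the relaxed acceptance rule of Equation (21):

```python
import numpy as np

def summary_statistics(x):
    """Map a binned data matrix onto a low-dimensional summary vector S(x),
    here simply the sample mean and variance of the cell contents."""
    return np.array([x.mean(), x.var()])

def close_enough(x_sim, x_obs, eps):
    """Acceptance rule of Equation (21): compare summaries rather than raw data."""
    return np.linalg.norm(summary_statistics(x_sim)
                          - summary_statistics(x_obs)) <= eps
```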
We now have a methodologically accurate and, in some cases, practically feasible routine for sampling the posterior. However, we shall examine two more improvements for accelerating the sampling convergence. Firstly, note that Equation (21) implies an acceptance probability of either zero or one, without accounting for how close the match is. To enhance the contribution of the cases yielding a more accurate agreement relative to the ones giving only a marginal agreement, one can use a so-called kernel function:

(22) $$\begin{align}{K}_{\epsilon}\left(\left\Vert S\left({x}^{\ast}\right)-S\left({x}_{\mathrm{obs}}\right)\right\Vert /\epsilon \right),\end{align}$$
which defines an acceptance probability that transitions from one in the case of a perfect match (${K}_{\epsilon}(0)=1$) to zero for deviations of the summary-statistics distance of the order of $\epsilon$ and greater.
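Two kernel choices consistent with this description are sketched below: the uniform kernel (the $\Pi$ function of Section 2, used in Section 4) and a smooth Gaussian alternative; the latter is merely an illustrative option:

```python
import numpy as np

def uniform_kernel(u):
    """K_eps(u) = Pi(u): unity for |u| < 1/2 and zero otherwise."""
    return 1.0 if abs(u) < 0.5 else 0.0

def gaussian_kernel(u):
    """A smooth alternative with K_eps(0) = 1, decaying for |u| of order unity."""
    return float(np.exp(-0.5 * u ** 2))
```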
The second improvement concerns the fact that Algorithm 1 either accepts or rejects cases, which means that many accepted cases are needed to mitigate the noise related to this additional probabilistic element in the algorithm. Effectively, this means that we benefit only marginally from cases of low acceptance probability. To avoid this, one can instead interpret the acceptance probability as the weight of a sample, thereby accounting for all the proposals that yield a non-zero acceptance probability.
We can now return to the inclusion of the latent variable $d$. In this case, we can generate proposals ${d}^{\ast}\sim p(d)$ based on our prior knowledge of it and again accept the cases with good enough matches based on the outlined procedure. Effectively, we try to guess $d$ using as many attempts as needed. Finally, we note that we can sequentially update our posterior using each ${x}_{\mathrm{obs}}$ in a sequence of measurements. To do so, we compute the posterior for each new measurement using the previous posterior as the prior. The algorithm for processing the $i$th observation ($i=1$ denotes the first measurement in the sequence) ${x}_{\mathrm{obs}}^i$, computing the posterior $p\left(\theta\, |\ {x}_{\mathrm{obs}}^i,{x}_{\mathrm{obs}}^{i-1},\dots, {x}_{\mathrm{obs}}^1\right)$ from the previous $p\left(\theta |{x}_{\mathrm{obs}}^{i-1},\dots, {x}_{\mathrm{obs}}^1\right)$, then takes the following form.
Algorithm 3. ABC sampling with a latent variable.
1: Sample proposals ${\theta}^{\ast}\sim p\left(\theta |{x}_{\mathrm{obs}}^{i-1},\dots, {x}_{\mathrm{obs}}^1\right)$ , ${d}^{\ast}\sim p(d)$ .
2: Perform a simulation, retrieve ${x}^{\ast}=M\left({\theta}^{\ast },{d}^{\ast}\right)$ and compute the weight:
(23) $$\begin{align}{w}^{\ast}=\frac{K_{\epsilon}\left(\left\Vert S\left({x}_{\mathrm{obs}}^i\right)-S\left({x}^{\ast}\right)\right\Vert /\epsilon \right)}{p\left({\theta}^{\ast }|{x}_{\mathrm{obs}}^{i-1},\dots, {x}_{\mathrm{obs}}^1\right)p\left({d}^{\ast}\right)}.\end{align}$$
3: If ${w}^{\ast }>0$ , accept the proposal with the computed weight.
4: Repeat steps (1)–(3) as many times as needed to approximate the posterior $p\left(\theta |{x}_{\mathrm{obs}}^i,\dots, {x}_{\mathrm{obs}}^1\right)$ .
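A compact Python sketch of Algorithm 3 is given below. The simulator `M`, summary statistics `S`, kernel and priors are placeholders to be supplied as discussed in Section 4; the posterior after processing ${x}_{\mathrm{obs}}^i$ is represented by the returned weighted samples.

```python
import numpy as np

def abc_update(M, S, kernel, sample_theta, theta_pdf, sample_d, d_pdf,
               x_obs, eps, n_proposals):
    """One sequential ABC update (Algorithm 3): weighted theta samples that
    approximate the posterior after processing the observation x_obs."""
    thetas, weights = [], []
    s_obs = S(x_obs)
    for _ in range(n_proposals):
        theta, d = sample_theta(), sample_d()       # step 1: proposals
        x = M(theta, d)                             # step 2: simulate
        dist = np.linalg.norm(s_obs - S(x))
        w = kernel(dist / eps) / (theta_pdf(theta) * d_pdf(d))  # Equation (23)
        if w > 0.0:                                 # step 3: keep weighted sample
            thetas.append(theta)
            weights.append(w)
    return np.array(thetas), np.array(weights)      # step 4: weighted posterior samples
```

The returned weighted samples can then be used to construct the prior for the next observation in the sequence, as is done in Section 4.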
In practice, one central difficulty of the ABC routine is choosing valid summary statistics, that is, summary statistics that differentiate all the cases in terms of $\theta$ and $d$. This means that the summary statistics must not yield nearly identical values for any two different pairs of $\theta$ and $d$. Clearly, if this is not the case, the procedure admits the acceptance of cases with a wrong ${\theta}^{\ast }$ when ${d}^{\ast }$ provides a compensation that makes $S\left({x}^{\ast}\left({\theta}^{\ast },{d}^{\ast}\right)\right)$ close to $S\left({x}_{\mathrm{obs}}\left({\theta}^\mathrm{true},{d}^\mathrm{true}\right)\right)$. This can entirely preclude the convergence of the ABC sampling procedure. Finding robust summary statistics is known to be a problem-dependent task that requires analysis of the possible cases. In the next section we determine valid summary statistics for the proof-of-principle problem and outline the simulation routines.
4 Analysis
With respect to the simulation, the plane-wave pulse is assigned a wavelength of $\lambda =0.8\ \unicode{x3bc} \mathrm{m}$, pulse length $L=6\lambda$ and peak amplitude ${a}_0=100$ (excluding the factor of $1-d$). Electrons are assigned an initial energy of $170$ GeV (${\gamma}_\mathrm{e}\sim {10}^5$) and are situated a distance ${z}_{\mathrm{s}}=5\lambda$ from the origin (the numerical layout can be seen in Figure 1). Both the electron and the pulse are allowed to counter-propagate for ${N}_t=100$ time steps $\Delta t=\frac{\left(L+{z}_{\mathrm{s}}/2\right){c}^{-1}}{N_t}$. Here, $x\left(\delta, \delta +\Delta \delta, \alpha, \alpha +\Delta \alpha \right)$ is discretized into a $100\times 100$ grid of cells $x\left(m\Delta \delta, n\Delta \alpha \right)$, each of size $\Delta \delta \times \Delta \alpha$, with $m,n=0,1,2,\dots, 99$. At each time step $q$, Equations (10) and (11) are used to estimate the index $n\approx \alpha /\Delta \alpha$. Then, for each $m$ we accumulate the following:
where we have suppressed the arguments of $x$ for readability and the subscripts denote the time step. In this analysis, we perform blind tests of the simulated cases against an ‘experiment’ ${x}_{\mathrm{obs}}=M\left({\theta}^\mathrm{true},d\right)$ that serves as a ground truth. Here, the $\theta$ value is fixed to ${\theta}^\mathrm{true}=0.84$ but the latent variable $d$ varies randomly between experiments.
It was concluded in Section 2 that the fractional energy distribution ${x}_{\mathrm{obs}}$ needs to be measured in order to infer the effect of the effective mass shift. For this to serve as input to ABC sampling, corresponding summary statistics must be chosen to map ${x}_{\mathrm{obs}}$ onto a smaller subspace while still retaining the information necessary to separate the effects of $\theta$ and $d$ . There might exist many configurations that achieve this given the broad definition of Equation (20). However, it is also beneficial to select a mapping that is tolerant (insensitive) to the probabilistic random variations of the experimental outcome. To identify some robust and simple enough options we consider moments of the 2D data ${x}_{\mathrm{obs}}$ to orders $i$ and $j$ :
Now, let us try to select a set of moments such that any combination $\left(\theta, d\right)$ maps to a unique value of this set. To select a valid option we analyze the dependency of several of the lowest distribution moments on $\theta$ and $d$. Figure 2 illustrates contours of four distinct moments ${M}_{ij}$ in the parameter space of $\theta$ and $d$. The set of moments in Figure 2(a) is a practical choice as the contours are not parallel anywhere, suggesting a unique pair of values for every $\theta$ and $d$. In contrast, Figure 2(b) depicts a scenario in which the contours become parallel at several points in the parameter space, meaning that the values of the plotted moments do not unambiguously indicate a single pair of $\theta$ and $d$. For instance, the contour lines that intersect the x-axis at $\theta \approx 1.25$ imply that the values of $\left({M}_{01},{M}_{11}\right)$ are almost identical along this line, which holds for many pairs $\left(\theta, d\right)$. We conclude that selecting $S\left({x}_{\mathrm{obs}}\right)=\left({M}_{00},{M}_{12}\right)$ is a valid choice to proceed with ABC sampling.
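The sketch below illustrates how such moments could be evaluated from the binned matrix. The exact normalization and coordinate convention used for ${M}_{ij}$ are not specified above, so the raw moments over bin centers shown here are an assumption made for illustration only:

```python
import numpy as np

def moment(x, i, j, d_delta, d_alpha):
    """Raw moment M_ij of the binned distribution x[k, l] over (delta, alpha),
    using bin centers as coordinate values (an assumed convention)."""
    delta = (np.arange(x.shape[0]) + 0.5) * d_delta
    alpha = (np.arange(x.shape[1]) + 0.5) * d_alpha
    return float(np.sum(x * delta[:, None] ** i * alpha[None, :] ** j))

def summary(x, d_delta, d_alpha):
    """The summary statistics selected in the text: S(x) = (M_00, M_12)."""
    return np.array([moment(x, 0, 0, d_delta, d_alpha),
                     moment(x, 1, 2, d_delta, d_alpha)])
```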
Note that for a larger number of model parameters (representing both theory inquiries and latent parameters of the collisions) one would need to inspect the dependency of the summary statistics in a multidimensional space, which can be infeasible. In this case one might need physics insight to devise an informed strategy for disentangling specific types of degeneracy. In the considered case we can interpret the identified choice through the following logic. The value of ${M}_{00}$ indicates the overall rate of emission, which does not differentiate the effects of $\theta$ and $d$. The value of ${M}_{12}$, in contrast, reflects the change of the spectrum shape (mean energy) with angle $\alpha$. This is exactly what is needed because, due to the circular polarization, the emissions with large $\alpha$ happen exclusively at large values of $\chi$ (for details, see Ref. [22]) and have a shape altered by the modification of $\chi$ (see Equation (4)) to the extent defined by $\theta$.
Let us now continue the analysis for the chosen summary statistics. We make the following choice of priors over $\theta$ and $d$: $\pi \left(\theta \right)=\mathcal{U}\left(0,150\right)$ and $\pi (d)=\mathcal{U}\left(0,0.5\right)$,
where $\mathcal{U}\left(a,b\right)$ denotes the uniform distribution with lower and upper bounds $a$ and $b$, respectively. Although there is no prior knowledge apart from $\theta \ge 0$ and $0\le d\le 1$, we note that the given simulation parameters yield ${\chi}_0\approx 100$ and that a value of $\theta =150$ would reduce the value of ${\tilde{\chi}}_0$ below unity, approaching the classical description. For each round of sampling, the prior is replaced by the obtained posterior as described in Algorithm 3. For instance, the new prior obtained from the first observation (and those following it) is modeled as a normal distribution $\pi \left(\theta \right)\to p\left(\theta |{x}_{\mathrm{obs}}^1\right)\approx \mathcal{N}\left({\mu}_{\theta },{\sigma}_{\theta}\right)$, where ${\mu}_{\theta }$ and ${\sigma}_{\theta }$ are the weighted mean and standard deviation of the accepted samples, respectively. As for the prior over $d$, one could construct it from empirical values obtained in a real experiment. Lacking this option, we assume that the amplitude can vary by at most $50\%$. Allowing $d$ to vary by as much as $100\%$ would not affect the final result; it would only reduce the sampling efficiency, since larger values of $d$ can alter the dataset to the extent that proposals are rejected more often.
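A sketch of these prior choices and of the replacement of the prior by a normal fit to the weighted accepted samples is given below; it assumes weighted samples of the kind returned by the ABC update sketched in Section 3:

```python
import numpy as np
from scipy import stats

theta_prior = stats.uniform(loc=0.0, scale=150.0)  # pi(theta) = U(0, 150)
d_prior = stats.uniform(loc=0.0, scale=0.5)        # pi(d) = U(0, 0.5): at most 50% variation

def fit_normal_prior(thetas, weights):
    """Model the new prior as N(mu_theta, sigma_theta) using the weighted
    mean and standard deviation of the accepted samples."""
    mu = np.average(thetas, weights=weights)
    sigma = np.sqrt(np.average((thetas - mu) ** 2, weights=weights))
    return stats.norm(loc=mu, scale=sigma)
```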
During sampling, the following distance is calculated to discriminate between observations:
where ${d}_{ij}=\mid 1-\frac{M_{ij}^{\mathrm{sim}}}{M_{ij}}\mid$ (not to be confused with the latent parameter), in which the superscript labels moments evaluated from simulated data. A uniform kernel ${K}_{\epsilon}\left(\cdot \right)=\Pi \left(\cdot \right)$ is chosen, together with a threshold that is progressively decreased from $\epsilon =1.1$ to $\epsilon =0.065$ over the course of the processed observations ${x}_{\mathrm{obs}}$ included in the analysis. Here, the value of $\epsilon$ is selected manually so as to yield a narrower posterior for each processed observation ${x}_{\mathrm{obs}}$ while maintaining a feasible sampling duration. For every $50$th proposal ${\theta}^{\ast }$ we generate new observed data ${x}_{\mathrm{obs}}$ so as not to bias the result toward the existing value of ${d}^{\ast}\sim p(d)$.
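A sketch of the corresponding comparison of moments is given below; the way the two relative deviations ${d}_{00}$ and ${d}_{12}$ are combined into a single distance is not stated explicitly above, so the Euclidean combination used here is an assumption:

```python
import numpy as np

def moment_distance(moments_sim, moments_obs):
    """Relative deviations d_ij = |1 - M_ij^sim / M_ij|, combined (here in
    quadrature, an assumption) into a single distance for the rejection rule."""
    d_ij = np.abs(1.0 - np.asarray(moments_sim) / np.asarray(moments_obs))
    return float(np.sqrt(np.sum(d_ij ** 2)))

def accept(moments_sim, moments_obs, eps):
    """Uniform kernel Pi(distance / eps): accept when the argument is below 1/2."""
    return moment_distance(moments_sim, moments_obs) / eps < 0.5
```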
In Figure 3 we present the result of sampling the posterior with the described ABC routine after processing $i=1, 5, 8$ observations ${x}_{\mathrm{obs}}$. Each observation corresponds to the simulated outcome of a single collision experiment with an unknown value of $d$ and fixed $\theta ={\theta}^\mathrm{true}$. The fact that the accepted samples are distributed around the actually selected value of ${\theta}^\mathrm{true}=0.84$ demonstrates the claimed capability of the method. A narrower posterior can be attained by reducing the threshold $\epsilon$.
Having discussed the use of ABC via the selected proof-of-principle case, we can outline further developments that could help to deal with real experiments. The first direction concerns the elaboration of an appropriate model of the experiments, which implies a relevant 3D interaction geometry with an account of unmeasured variations via an extended list of latent parameters, such as all spatio-temporal offsets, as well as parameters quantifying both the electron bunch and the focused laser pulse. The second direction concerns computational methods to deal with the chosen geometry of the experiment, as well as accounting for all physical processes of relevance, such as the generation of electron–positron pairs and the interaction between particles and photons. Efficient use of supercomputer resources can be crucial to compensate for the increase of computational demands related to both the complexity of simulations and the necessity of obtaining a meaningfully narrow posterior using a large number of collisions. The third direction concerns improvements related to the use of ABC itself. This may include a better choice of summary statistics, a modification of the diagnostics or the experiment layout, as well as a reduction of the rejection rate by employing machine learning methods for an informed generation of proposals. Finally, the fourth direction concerns the very formulation of the theoretical questions, which takes the form of determining the models and parameters to be inferred from experiments.
5 Conclusions
We have considered the prospects for an experiment capable of inferring a parameter $\theta$ that signifies deviations from nonlinear Compton scattering via the notion of an effective mass in the regime $\chi \gg 1$. The results inform the strategies necessary to incorporate ABC sampling in analogous experiments, which are scalable to the inclusion of several model parameters ($\theta$) as well as latent parameters that account for unmeasured shot-to-shot variations in the collision conditions. An improved implementation of the interaction will be needed for designing future experiments. This can be achieved by, for example, simulating a realistically focused laser pulse, devising a more comprehensive description via latent parameters and accounting for electromagnetic cascades. Carrying this out will increase the computational load, which can be mitigated by further developments. The convergence can be accelerated by investigating additional summary statistics, nonuniform kernels and the use of machine learning to suggest better proposals. In addition, the use of high-performance computing to run many ABC samplers in parallel can alleviate both of these burdens.
Acknowledgements
The authors acknowledge support from the Swedish Research Council (Grant No. 2017-05148 and No. 2019-02376). The computations were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at Tetralith, partially funded by the Swedish Research Council through grant agreement No. 2022-06725. The authors would like to thank Tom Blackburn for useful discussions.