Impact statement
Modeling and solving complex systems based on Partial Differential Equations (PDE) and machine learning (ML) can unlock a new era of robust and reliable forecasting for complex systems such as wildfires, epidemics, and climate change. This interdisciplinary approach promises to revolutionize scientific discovery and empower effective decision-making in critical areas.
1. Introduction
“As all of you are, I am concerned about the world in which we are going to live tomorrow, a world in which a new machine, the digital computer, may be of even greater importance than the atomic bomb.” These are the words of screen actor David Wayne in a chat with Professor Jerome Wiesner (MIT) on developments in Artificial Intelligence (AI) that aired on the US television network CBS in 1961 (available at https://techtv.mit.edu/videos/10268-the-thinking-machine-1961---mit-centennial-film). This prediction proved prescient: 30 years later, following the ban on system-level nuclear testing, computer simulations were proposed as a viable alternative to physical bomb detonations. Simulations have indeed become a key tool in the U.S. nuclear stockpile maintenance program. In the 30 years since the ban, researchers around the world have raced to make such simulations a reality, leading to immense progress in simulation capabilities of complex systems across scientific domains. A key enabler has been the exponential growth in transistor density on microchips, roughly doubling every two years, which has significantly increased computational power while reducing relative costs. This growth, described as Moore’s law, forged new connections between disciplines and laid the foundations of computational science, which is now widely recognized as the third pillar of scientific inquiry, along with theory and experiments. Today, predictive simulations with quantified uncertainties of critical phenomena described by PDEs, such as climate and epidemics, are becoming a key element in human decision-making and engineering design.
However, many experts believe that Moore’s law is nearing its end, or at least slowing down, due to physical and economic constraints (Shalf, 2020). There are major concerns about the energy demands of computers and the veracity and reliability of computer simulations. At the same time, new hope for modeling and predicting complex systems is emerging, driven by recent advances in Machine Learning (ML) and AI. The idea of using AI for modeling complex systems has a long history as a part of the semi-forgotten field of cybernetics (Wiener, 1950; Tsetlin, 1961; Ivanenkho and Lapa, 1967). However, what is novel in our times is the trillion-fold increase in hardware capabilities and the dizzying pace of acquiring, transmitting, and processing massive amounts of data. Despite these unprecedented capabilities, there is broad consensus that resolving the full range of spatiotemporal scales in complex systems such as turbulence, ocean circulation, and cloud dynamics will remain out of reach for the foreseeable future (Palmer, 2015). An alternative is reduced-order models (ROMs) that may bypass the need for expensive simulations. ML and, more broadly, AI methodologies can be invaluable tools for constructing data-driven ROMs that may also serve as companions to expensive simulations. Indeed, as we are experiencing unprecedented capabilities of computers, new forms of computational thinking emerge that were unimaginable only a couple of decades ago. One example is Digital Twins (DTs), which have become the epitome and frontier of our computing capabilities.
DTs play an essential role in the intricate tasks of designing, constructing, operating, and maintaining complex systems across disciplines ranging from medicine to aircraft design (Glaessgen and Stargel, 2012; Venkatesh et al., 2022; Ferrari and Willcox, 2024). DTs are central to decision-making frameworks, which often rely on designs tailored to specific cases and may lack the flexibility needed to handle situations with limited data or unclear system dynamics. We believe that there are open horizons in the embedding of DTs within decision-making frameworks. We envision probabilistic DTs that go beyond PDEs and employ multifidelity models and machine learning, while adhering to physical constraints, and whose predictive uncertainties are continuously updated with available data over different time horizons to support diverse decision-making scenarios. At the same time, new concepts such as foundation models and generative AI open up new vistas for DTs and beyond that need to be explored (Bommasani et al., 2021; Jacobsen et al., 2023; Bodnar et al., 2024; Li et al., 2024; Price et al., 2024; Gao et al., 2024a). The perspective of researchers familiar with PDEs can be invaluable in elucidating the capabilities and limitations of foundation models.
In this article, I share a personal perspective shaped by 25 years of experience (Müller et al., 1999; Brunton et al., 2020) in solving complex PDE problems using scientific computing and machine learning, discussing successes, failures, and the balance between hype and hope at the ML-PDE interface. With all due respect to the numerous colleagues who have influenced my work, this perspective is primarily reflected in the publications of our group on Neural Networks (NN) for evolving systems described by PDEs, Reinforcement Learning for closures of ROMs, and recent work on generative AI for capturing the evolution of complex physical systems. I advocate for the creation of algorithmic alloys that combine scientific computing and AI to tackle some of the most challenging problems of our times. I believe that a particularly attractive feature of ML is its inherently statistical nature that allows for a systematic quantification of uncertainty. By forming algorithmic alloys with scientific computing, we obtain access to robust ways of computing the uncertainty of the predictions obtained from solving PDEs. I believe that AI not only enhances automation by maximizing machine capabilities but also unlocks new paths for human thought and scientific discovery.
2. Benchmarking is critical when comparing scientific computing and machine learning
Computing can be understood as a process of dimensionality reduction, where infinite or high-dimensional fields are translated into reduced-order, query-driven information. This perspective aligns with computations ranging from forecasting hurricane trajectories in scientific computing to recognizing cats in image collections through machine learning. In order to juxtapose computations made in these fields, I use an example that I first saw in a talk by Christopher Bishop (Microsoft). One of the most well-known algorithms for dimensionality reduction is linear principal component analysis (PCA). The $ N $-dimensional data vectors are used to form a covariance matrix, and extracting the $ M \ll N $ eigenvectors that correspond to its largest eigenvalues provides a reduced-order basis for representing the data. However, venturing beyond the linear representation offered by PCA is not a straightforward task. The task of developing a non-linear PCA is much easier if one adopts a neural network (NN) architecture. A NN with a single hidden layer and linear activation functions that learns the identity mapping is a linear autoencoder equivalent to linear PCA (Baldi and Hornik, 1989). Unlike the eigenvalue problem, it is possible to render the NN nonlinear by simply changing the node activation functions. Adding more layers allows for more flexibility and more powerful nonlinear dimensionality reduction algorithms. This was the architecture we adopted in order to compare dimensionality reduction using NNs and Proper Orthogonal Decomposition (POD, the name by which linear PCA is known in the fluid mechanics community) (Berkooz et al., 1993). To the best of my knowledge, this was the first application of deep neural networks (DNNs) in fluid mechanics and turbulent flows (Milano and Koumoutsakos, 2002). In that same paper, NNs complemented an analytical expression for reconstructing the flow above the wall using only wall-based information, which suggests the hybridization of numerical/analytical approaches with ML. One of the reviewers of that same paper raised the issue of interpretability in using DNN encodings at the time. Indeed, interpreting the behavior of DNNs is not as straightforward as for a linear eigenvalue problem. But again, interpretability is a human trait, and this view may only apply to researchers trained in classical numerical analysis concepts. New generations of scientists may be more comfortable with DNNs and their ablation studies than with linear stability analysis. Furthermore, the training of DNNs may become stuck in local minima of the objective function.
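The linear reduction described above can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: the synthetic snapshot matrix, the helper names `pod_basis` and `reconstruct`, and the choice of three modes are hypothetical and are not taken from the cited papers. The encode and decode steps play the role of the two weight matrices of a linear autoencoder, which spans the same subspace at its optimum.

```python
import numpy as np

def pod_basis(snapshots, m):
    """Return the mean, the m leading POD/PCA modes, and all singular values."""
    mean = snapshots.mean(axis=0)
    centered = snapshots - mean
    # Rows of vt are eigenvectors of the (unnormalized) covariance matrix
    # centered.T @ centered, ordered by decreasing eigenvalue s**2.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:m], s

def reconstruct(snapshots, mean, modes):
    """Encode N-dimensional data into M coefficients, then decode back."""
    coeffs = (snapshots - mean) @ modes.T   # encoder: N -> M
    return mean + coeffs @ modes            # decoder: M -> N

# Hypothetical data: 200 snapshots of dimension N = 50 driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
data = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

mean, modes, s = pod_basis(data, m=3)
recon = reconstruct(data, mean, modes)
rel_err = np.linalg.norm(data - recon) / np.linalg.norm(data - mean)
print(f"relative reconstruction error with M=3 modes: {rel_err:.4f}")
```

The squared reconstruction error equals the sum of the discarded squared singular values, which is what makes the truncation optimal among linear maps of rank M.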
Training a DNN 25 years ago was roughly 5–6 orders of magnitude slower than it is today due to the speed of computers (2 minutes versus a year for the same DNN, based on the speed of the top supercomputer of the Top500 list). I believe that several interesting ideas in ML for PDEs today owe their inception to these capabilities.
In recent years, ML algorithms have exploited the ample availability of data and powerful computing architectures (Kurth et al., 2018) to provide new solutions to problems across all scientific fields (Baldi et al., 2014; Carleo et al., 2019; Kochkov et al., 2024). The advent of large language models adds yet another dimension to the capabilities of human discovery (Buehler, 2024; Meng et al., 2024). This excitement is unavoidably accompanied by a certain degree of hype on the potential of AI and ML as a replacement for scientific computing. Indeed, ML has shown potential for bypassing classical scientific computing and the forecasting of systems based on PDEs by taking the alternate route of data-driven modeling. Successes in climate and weather forecasting are a testament to this capability (Lam et al., 2023; Bodnar et al., 2024; Price et al., 2024).
At the same time, there have been limited advances in solving challenging PDEs using ML methods. Efforts to adopt (or plug in) ML techniques in scientific computing frameworks (such as using NNs for Galerkin-type numerical methods) are attractive due to the relative simplicity of using a cost function and the availability of automatic differentiation tools. Furthermore, it is more attractive these days to use NNs in the title of a paper instead of Galerkin. However, this fusion may not be the final answer in forecasting complex systems. Several popular methods (Bar-Sinai et al., 2019; Li et al., 2020; Karniadakis et al., 2021) have solved a plethora of low-dimensional, simplified problems, but extensions to complex multiscale systems, such as flows at high Reynolds numbers, remain an open question. Hence, I wonder if for future research, the right direction is to try to replace, for example, complex flux limiters and shock-capturing schemes with deep NNs to solve supersonic flows. There are limitations of ML methods in forecasting, for example, the evolution of the complete (near body and wake) vorticity field in flows past a circular cylinder or vortex merging at Reynolds numbers above 5000, even in 2D. This fact does not discount the current developments but suggests that more potent computer hardware is an enabler for innovation and new modes of inquiry in human thinking. Today, ML offers new impulses and hope (mixed with some hype) for solving PDEs across disciplines. However, in order to achieve progress in understanding and forecasting complex systems, we need to specify rigorous metrics that quantify the veracity and robustness of ML solutions for PDEs (McGreivy and Hakim, 2024).
There is an ongoing debate about whether the physics-informed approaches are superior to the physics-consistent approaches. The latter have been ingrained for decades in the development of numerical methods, but indeed, they can be dauntingly complex. The former promise simplicity along with abundant software. Which direction should we choose? I believe that to decide on a path forward, we need a critical evaluation of these approaches. The concepts of validation, verification, and uncertainty quantification that have become the cornerstones of scientific computing are making inroads in machine learning (Gal et al., 2022). It would be very useful if the ROMs developed through AI or scientific computing techniques were accompanied by guarantees and error bounds on their predictions. Although consistency with first-principles models is not at the core of data-driven approaches, the use of synthetic data from solving PDEs can guide the development of guarantees for ML methods.
Perhaps, instead of trying to answer the question “can AI replace numerical methods?” we need to reflect more on whether this is the right question to ask. Probabilities are a powerful framework on which to base this discussion (Jaynes, 2003). In a departure from deterministic solutions of PDEs, the outcomes of ML are statistical objects that we can use to quantify the element of surprise in our assumptions. Bayesian model selection suggests mixture models (possibly based on PDEs) with quantified probabilities as weights. In case a single model needs to be selected, Bayesian inference offers relevant methods (Zavadlav et al., 2019). In turn, the fusion of ROMs and expensive simulations may offer the best of both worlds. Generative models can sample data from high-dimensional target distributions and can be conditioned on specific quantities or latent structures (Gao et al., 2024a). This path provides exciting opportunities for a novel way of handling Uncertainty Quantification (UQ) in systems described by PDEs.
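The idea of a mixture of models weighted by their probabilities can be sketched in a few lines of standard-library Python. This is a minimal illustration, assuming that each candidate model comes with a log-evidence value; the numbers below and the helper names `posterior_model_weights` and `mixture_prediction` are hypothetical and do not come from the cited works.

```python
import math

def posterior_model_weights(log_evidences, log_priors=None):
    """Posterior model probabilities p(M_i | D) from log-evidences log p(D | M_i)."""
    n = len(log_evidences)
    if log_priors is None:
        log_priors = [-math.log(n)] * n      # uniform prior over the models
    log_post = [le + lp for le, lp in zip(log_evidences, log_priors)]
    shift = max(log_post)                    # subtract the max for numerical stability
    unnorm = [math.exp(v - shift) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def mixture_prediction(predictions, weights):
    """Model-averaged prediction: sum_i p(M_i | D) * prediction_i."""
    return sum(w * p for w, p in zip(weights, predictions))

# Hypothetical log-evidences of three candidate models and their point forecasts.
predictions = [1.0, 1.4, 0.7]
weights = posterior_model_weights([-120.0, -118.0, -125.0])
forecast = mixture_prediction(predictions, weights)
print(weights, forecast)
```

Selecting a single model corresponds to keeping only the entry with the largest weight, whereas the mixture retains the uncertainty about which model is appropriate.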
3. Simplify interfaces: patterns across scales in space and time
The growth of ML has led to a “gold rush” across all scientific fields. As the number of publications increases at an exponential pace, it is sometimes difficult to identify novelty, in particular, in advances occurring at the important interfaces between disciplines. I believe that the simplification of questions, algorithms, and results can facilitate the identification of common elements at the interface of AI and scientific computing. The breakthrough in ML in pattern recognition came with the publication of the ImageNet classification achieved by a convolutional neural network (Krizhevsky et al., 2012). Today, it is well-known that ML has an advantage in pattern recognition across a broad range of feature spaces (Theodoridis and Koutroumbas, 2006). Patterns are also present in dynamical systems, not only in space and time but also, more importantly, in a suitable latent space whose dynamics may govern the effective dynamics of the system. The challenge here is to identify these latent spaces. Can machine learning extend its undisputed capabilities for spatial pattern recognition to such cases? The automated identification and characterization of these patterns may offer a pathway to solving a wide range of scientific problems and engineering designs with dynamics that span multiple spatio-temporal scales (Wilcox, 1988; Dura-Bernal et al., 2019).
Several works in our group have shown that RNN-LSTMs, echo state networks, transformers, and reservoir computing cannot forecast complex systems through data alone (Vlachas et al., 2018, 2020) for times that are large multiples of the Lyapunov time (the inverse of the largest Lyapunov exponent) of the system dynamics. Is there a way to use the capabilities of RNN-LSTMs and transformers to forecast complex systems for unlimited time frames?
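To make the notion of a predictability horizon concrete, the following sketch estimates the largest Lyapunov exponent of a simple chaotic system and converts it into a Lyapunov time. The logistic map is used here only as a stand-in toy system (an assumption of this example, not a system studied in the cited works), and the function names are hypothetical. For the map parameter r = 4 the exponent is known analytically to be ln 2.

```python
import math

def logistic_lyapunov(r=4.0, x0=0.2, n_steps=100_000, n_transient=1_000):
    """Estimate the Lyapunov exponent of x -> r x (1 - x) as the orbit average of log|f'(x)|."""
    x = x0
    for _ in range(n_transient):             # discard the initial transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_steps):
        slope = abs(r * (1.0 - 2.0 * x))      # |f'(x)|
        total += math.log(max(slope, 1e-300)) # guard against log(0)
        x = r * x * (1.0 - x)
    return total / n_steps

def predictability_horizon(initial_error, tolerance, lyapunov_exponent):
    """Steps until an error growing like e0 * exp(lambda * t) reaches the tolerance."""
    return math.log(tolerance / initial_error) / lyapunov_exponent

lam = logistic_lyapunov()
lyapunov_time = 1.0 / lam
horizon = predictability_horizon(1e-6, 1e-1, lam)
print(f"lambda = {lam:.3f}, Lyapunov time = {lyapunov_time:.2f} steps, horizon = {horizon:.1f} steps")
```

Because the horizon grows only logarithmically with the accuracy of the initial condition, any forecast, whether data-driven or equation-based, is limited to a modest multiple of the Lyapunov time.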
Multiscale methods in scientific computing rely on judicious approximations of the interactions between processes occurring over different scales, and a number of potent frameworks have been proposed. The equation-free framework (EFF) (Kevrekidis et al., 2004) pioneered the notion of alternating between coarse and fine descriptions to efficiently capture the evolution of dynamical systems. In these descriptions, success depends on the separation of scales in the system dynamics and their capability to capture the transfer of information between scales. Perhaps the most important aspect is identifying a suitable latent space (Wiewel et al., 2019). Can ML methods identify and evolve these latent spaces? There have been several efforts to answer this question (for example, Lusch et al., 2018; Regazzoni et al., 2019; Lee et al., 2020; Maulik et al., 2021; Khoo et al., 2021; Simpson et al., 2021; Floryan and Graham, 2022; and Seyyedi et al., 2023 for a recent survey). Several of these efforts are distinguished by developing a ROM in space, followed by tracking its evolution in time, reminiscent of the classical methods for solving PDEs by assuming solutions that are a product of functions of space and time. The learning effective dynamics (LED) algorithm belongs to this category of methods (Vlachas et al., 2022).
LED extended the EFF by deploying probabilistic and variational autoencoders (AE) to transfer information between coarse- and fine-scale descriptions and recurrent neural networks (RNNs) with long short-term memory (LSTM) (Hochreiter, 1997) gating to evolve the coarse-grained (latent) dynamics. The development of novel attention-based architectures offers new avenues for greater expressiveness and specialized mechanisms in computational models. Originally designed for speech and NLP applications (Bahdanau, 2014; Devlin et al., 2019; Radford et al., 2019), these architectures, particularly transformers (Vaswani et al., 2017), are now being adapted for other domains, including dynamical systems (Lee et al., 2019; Fenton et al., 2020). The use of memory in the dynamics of evolving latent spaces is critical, as projection to this space may have distorted the Markovian character associated with PDEs. This subtle but important feature of ML versus classical scientific computing should not be overlooked. Transformers, which are made up of encoder-decoder modules with attention mechanisms, excel in handling sequential data, making them promising for time series analysis. However, their quadratic complexity in the sequence length limits scalability for multiscale systems (Baldi, 2021). Recent efforts address the theoretical foundation for attention mechanisms, advancing their understanding and computational capabilities (Baldi and Vershynin, 2023).
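The origin of the quadratic cost can be seen in a minimal NumPy sketch of single-head scaled dot-product self-attention. The sequence length, feature size, and random projection matrices below are hypothetical placeholders chosen for illustration; this is not the architecture used in LED or G-LED.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a sequence of T latent states."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (T, T): every state attends to every other
    scores -= scores.max(axis=-1, keepdims=True)   # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d = 128, 16                                     # sequence length and latent dimension
x = rng.normal(size=(T, d))
wq, wk, wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

out, weights = self_attention(x, wq, wk, wv)
print(out.shape, weights.shape)                    # the weight matrix has T * T entries
```

Doubling the number of time steps T quadruples the size of the attention matrix, which is the scalability limit referred to above.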
Transformers can be integrated into LED and, as mentioned below, may play a critical role in the expansion of generative AI advances to the prediction of complex systems (Gao et al., 2024a).
LED was able to accurately simulate the dynamics of flow past a cylinder at Re = 100 using only two degrees of freedom in the latent space and one degree of freedom in evolving small molecular systems (Vlachas et al., 2021). However, LED did not predict the flow dynamics at Re = 1000 with the same accuracy, exhibiting higher errors in the vicinity of the cylinder surface, indicating that the vorticity generation mechanisms were not captured. At the same time, this failure may indicate ways of combining domain knowledge with the building of suitable encoders. Machine learning algorithms should be consistent with the underlying physics not simply by adding a term in the cost function, but by designing learning algorithms that can identify causal relationships.
I remark that the EFF introduced one more concept for the simulation of multiscale systems: the separate treatment of spatial and temporal dynamics (first coarsen in space, evolve in time, then refine, and evolve). This situation may be reminiscent of the classical method of solving PDEs by the separation of variables. However, this separation of variables has intrinsic limitations and does not account for the coupling between scales in the system dynamics. A possible solution, building on key features of the LED framework, is the adoption of generative AI to couple the space and time evolution of the system dynamics. In the Generative Learning of Effective Dynamics (G-LED) framework (Gao et al., 2024a), instances of high-dimensional data are down-sampled to a lower-dimensional manifold that is evolved through a multi-head, autoregressive attention model, leveraging its low memory footprint and expressivity. In turn, Bayesian diffusion models that map this low-dimensional manifold onto its corresponding high-dimensional space capture the statistics of the system dynamics. G-LED uses a Bayesian diffusion model as a decoder, incorporating physical information through conditional diffusion and virtual observables. The reverse diffusion process flexibly captures the statistics of fields governed by PDEs (Gao et al., 2024b). It is important to note that the G-LED sequence of snapshots is correlated with the underlying physical process through macro-sequences. Moreover, G-LED decodes multiple consecutive macro-states together as a batch to enhance temporal coherence and increase temporal smoothness in the results.
The decoding of multiple macro-states that are contextually connected is reminiscent of OpenAI’s Sora, a text-to-video generative AI model that can generate videos of realistic or imaginative scenes from text prompts (Liu et al., 2024).
The exploration of generative AI for solving PDEs is in its infancy. This is a fertile area for a wide range of explorations that can lead to new ways of interfacing AI and scientific computing. The need for plurality in the interface of AI and scientific computing is captured by Jürgen Schmidhuber, who wrote in his pioneering diploma thesis in 1987 (Schmidhuber, 1987) that “… we cannot capture the essence of learning by relying on a small number of algorithms. In contrast, there is a need for a host of context-dependent learning strategies to acquire domain-specific information using information that is already available. Due to the complexity and richness of these strategies and their triggering conditions, the obvious escape seems to be the following. Lastly, give the system the ability to learn how to learn, too. A system with such meta-learning capabilities should view every problem as consisting of at least two problems: Solving it and improving the strategies employed to solve it. Of course, we do not want to stop at the first meta-level!”
4. Discover: the best of both worlds
4.1. Solving forward and inverse problems for PDEs using machine learning
The foundations of scientific computing have been laid on the development of numerical methods for solving PDEs and the effective deployment of these numerical methods on supercomputers. In particular, the field of fluid dynamics and the numerical solution of the Navier–Stokes equations have driven innovations applied to disciplines ranging from epidemics to wildfires. The Navier–Stokes equations involve many classes of PDEs that we distinguish as parabolic, elliptic, and hyperbolic. As mentioned in the abstract, learning numerical solutions of PDEs is a prominent example of inductive learning that is transferable and generalizable. It is a sort of tragedy that today, the curriculum of many universities no longer includes related classes. The development of numerical discretizations that are consistent with the PDE, while being stable and accurate, has driven the development (and the complexity) of these numerical methods. These methods have been used broadly for solving forward simulations and inverse problems. As already mentioned, scientific computing has limitations on the scales that its methods can resolve. Can ML algorithms replace numerical methods for solving PDEs?
An answer to this question comes from the so-called Physics-Informed ML methods that employ a NN for the approximation of the unknown field. The term “Physics informed” implies employing a cost function in terms of the PDE that, when minimized, produces the weights of the NN and, as such, the solution field. This approach was pioneered for forward problems in PDEs thirty years ago by Lagaris et al. (1998). Recently, the method was revived by Raissi et al. (2019), who used modern ML tools to improve its performance and popularize it as Physics-Informed NN (PINN). The tens of thousands of citations of this paper are a manifestation of interest in developing simple solutions to PDEs without the hassle associated with classical numerical methods. However, there is no free lunch. Although there have been successful demonstrations of PINNs in several benchmark problems (usually 1D in space and time or steady 2D problems), concerns remain regarding their training and the associated computational cost (Rathore et al., 2024). In PINNs, the cost of evaluating the solution at one point is proportional to the number of weights of the NN. More importantly, the NN approximation may not be consistent with the character of the original differential problem. At least for low-dimensional fields, PINNs are not a viable alternative for solving forward problems. However, there is merit in their use for inverse problems, in particular, as they can easily blend data and equations in formulating the cost function to be minimized. At the same time, there are questions about the well-posedness of such inverse problems and about the computational cost, as the resulting dense Hessians limit the applicability of efficient second-order optimization methods such as Newton’s method.
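For concreteness, a cost function of the kind described above can be written in the following generic form. The notation is an assumption of this sketch and is not taken from the cited papers: $ \mathcal{N} $ denotes the differential operator of the PDE, $ u_\theta $ the NN with weights $ \theta $, $ x_i $ the collocation points, $ (x_j, u_j) $ the available data (including boundary and initial conditions), and $ \lambda $ a user-chosen weight.

```latex
\mathcal{L}(\theta) =
  \frac{1}{N_r} \sum_{i=1}^{N_r} \left| \mathcal{N}[u_\theta](x_i) \right|^2
  + \frac{\lambda}{N_d} \sum_{j=1}^{N_d} \left| u_\theta(x_j) - u_j \right|^2
```

The first term penalizes the PDE residual at the collocation points and the second the mismatch with the data. Since every unknown weight influences the value of $ u_\theta $ at every point, the Hessian of this cost function with respect to $ \theta $ is in general dense.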
An extensive study (McGreivy and Hakim, 2024) revealed issues of weak baselines and reporting biases when comparing physics-informed machine learning algorithms with classical numerical methods. Nevertheless, the discussion on physics-informed/enhanced/constrained methods is in its infancy, and one may expect new advances as they serve as convergence points for researchers from different disciplines (Haywood-Alexander et al., 2024). A valuable aspect of physics-informed ML methods (He et al., 2020; Wang et al., 2020; Karniadakis et al., 2021) is their applicability to problems where either the PDEs have missing parameters or insufficient data are available to form a correct initial-value problem. Such problems are encountered in various fields of science and engineering, and they are handled by various methods such as PDE-constrained optimization (Gunzburger, 2002), data assimilation (Lewis et al., 2006), and system identification (Ljung, 1999). Possible extensions of PINNs to high-dimensional problems in these fields could be very valuable. Research in our group has been inspired by the revival of PINNs. While we had been aware of the work of Lagaris and his coworkers (Lagaris et al., 1998), it was the paper by the Karniadakis group (Cai et al., 2021) that sparked our idea of combining the discretized form of the equations and data to solve inverse problems. However, instead of using NNs to represent the solution, we introduce the discrete form of the equations in the loss function.
Minimization of this loss function produces the solution of the equation at the discretization points, a method we call Optimizing the Discrete Loss (ODIL) (Karnakov et al., 2023, 2024). ODIL targets inverse problems and combines discrete formulations of PDEs with modern ML tools. The former capability builds on decades-long efforts in numerical analysis and guarantees consistent discretizations. The latter involves automatic differentiation tools such as JAX to allow for flexible software development. ODIL is also related to a number of other approaches that were developed before the recent access to massive data. Solving the discrete equations as a minimization problem is known as the discretize-then-differentiate approach in PDE-constrained optimization (Gunzburger, 2002), and it has been formulated for linear problems as the penalty method (van Leeuwen and Herrmann, 2015). ODIL is also related to the 4D-VAR problem in data assimilation (Lewis et al., 2006). ODIL differs from these methods as its sparse linearization is constructed using automatic differentiation tools, and it is geared towards problems with noisy and gappy data. ODIL has two key components: (i) the discretization that defines the accuracy, stability, and consistency of the method and (ii) the optimization algorithm to solve the discrete problem. In case the underlying problem is sparse, this sparsity is preserved, and the optimization can use a Hessian and achieve a quadratic convergence rate. This rate remains out of reach for stochastic gradient-based training of NNs (Bottou et al., 2018). Moreover, the use of automatic differentiation to compute the Hessian makes the implementation as convenient as applying gradient-based methods.
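The idea of placing the discrete equations and the data in a single loss can be sketched on a toy inverse problem. The example below is not the authors' ODIL implementation: it is a minimal illustration, assuming a 1D Poisson problem $ u'' = a\,g(x) $ with an unknown amplitude $ a $, noise-free pointwise observations, dense NumPy linear algebra instead of sparse automatic differentiation, and a hypothetical function name. Because the problem is linear, one least-squares solve coincides with a single Gauss-Newton step on the discrete loss.

```python
import numpy as np

def discrete_loss_inverse_poisson(n=65, n_obs=8):
    """Recover the field u on a grid and the source amplitude a in u'' = a * g(x)."""
    x = np.linspace(0.0, 1.0, n)
    h = x[1] - x[0]
    g = -np.pi**2 * np.sin(np.pi * x)          # known shape of the source term
    u_ref = np.sin(np.pi * x)                  # reference field used to generate the data
    obs_idx = np.linspace(1, n - 2, n_obs).astype(int)

    rows, rhs = [], []
    # Discrete PDE residuals: (u[i-1] - 2 u[i] + u[i+1]) / h^2 - a * g[i] = 0
    for i in range(1, n - 1):
        r = np.zeros(n + 1)                    # unknowns: u[0..n-1] followed by a
        r[i - 1], r[i], r[i + 1] = 1.0 / h**2, -2.0 / h**2, 1.0 / h**2
        r[n] = -g[i]
        rows.append(r)
        rhs.append(0.0)
    # Boundary conditions u(0) = u(1) = 0, treated as additional loss terms
    for i in (0, n - 1):
        r = np.zeros(n + 1)
        r[i] = 1.0
        rows.append(r)
        rhs.append(0.0)
    # Data terms: u[k] should match the observation at the measurement points
    for k in obs_idx:
        r = np.zeros(n + 1)
        r[k] = 1.0
        rows.append(r)
        rhs.append(u_ref[k])

    # Minimize the sum of squared residuals over all unknowns at once
    z, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return x, z[:n], z[n]

x, u_est, a_est = discrete_loss_inverse_poisson()
print(f"recovered amplitude a = {a_est:.5f}")
```

Each residual row touches at most three neighboring grid values and the parameter, so the linearization is sparse; a dense matrix is assembled here only to keep the sketch short. The recovered amplitude differs from 1 by the second-order discretization error of the stencil.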
Comparisons of the computational cost of ODIL and PINNs on several benchmark problems have shown (Karnakov et al., Reference Karnakov, Litvinov and Koumoutsakos2022) that, for the same number of parameters, the methods have comparable accuracy, but ODIL is up to five orders of magnitude faster. A notable application of ODIL has been its recent extension to forecasting the evolution as well as the initiation of gliomas in brain tumors (GLI-ODIL) using real patient multimodal data (Balcerak et al., Reference Balcerak, Ezhov, Karnakov, Litvinov, Koumoutsakos, Weidner, Zhang, Lowengrub, Wiestler and Menze2025).
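To make the idea of a discrete loss concrete, the following is a minimal sketch, not the ODIL implementation itself. It assumes a toy one-dimensional problem, -u'' + u^3 = f, with no boundary conditions and only three point observations of u. The unknowns are the grid values of u, and the loss stacks the finite-difference residual with the data misfit. For brevity, a hand-written dense Jacobian and a NumPy least-squares solve (Gauss-Newton) stand in for the sparse, automatically differentiated linearization described above; all function names and parameters are hypothetical.

```python
import numpy as np

def residuals(u, f, h, obs_idx, obs_val, w_data=1.0):
    """Stack the discrete PDE residual and the data misfit into one vector."""
    # Second-order central differences for -u'' + u^3 = f at interior points.
    pde = -(u[2:] - 2.0 * u[1:-1] + u[:-2]) / h**2 + u[1:-1] ** 3 - f[1:-1]
    data = w_data * (u[obs_idx] - obs_val)
    return np.concatenate([pde, data])

def jacobian(u, h, obs_idx, w_data=1.0):
    """Dense Jacobian of residuals() with respect to the grid values u."""
    n = u.size
    m = len(obs_idx)
    jac = np.zeros((n - 2 + m, n))
    rows = np.arange(n - 2)
    jac[rows, rows] = -1.0 / h**2
    jac[rows, rows + 1] = 2.0 / h**2 + 3.0 * u[1:-1] ** 2
    jac[rows, rows + 2] = -1.0 / h**2
    jac[n - 2 + np.arange(m), obs_idx] = w_data
    return jac

def solve_discrete_loss(f, h, obs_idx, obs_val, iters=20, tol=1e-9):
    """Gauss-Newton minimization of 0.5 * ||residuals(u)||^2 from u = 0."""
    u = np.zeros_like(f)
    for _ in range(iters):
        r = residuals(u, f, h, obs_idx, obs_val)
        jac = jacobian(u, h, obs_idx)
        du = np.linalg.lstsq(jac, -r, rcond=None)[0]
        u = u + du
        if np.linalg.norm(du) < tol:
            break
    return u

# Manufactured test case (assumption): u_true = sin(pi x) on [0, 1].
n = 41
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
u_true = np.sin(np.pi * x)
f = np.pi**2 * u_true + u_true**3

# "Gappy" data: the field is observed at three interior points only.
obs_idx = np.array([5, 20, 35])
obs_val = u_true[obs_idx]

u = solve_discrete_loss(f, h, obs_idx, obs_val)
max_err = float(np.max(np.abs(u - u_true)))
```

In this sketch, the equation and the data enter the loss on equal footing, so the observations take over the role that boundary conditions would play in a forward solve. A production code would replace the dense Jacobian with a sparse one obtained by automatic differentiation.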
However, I believe that physics-informed machine learning tries to impose principles of scientific computing onto machine learning. A fresh look that may completely bypass PDEs and the usual scientific computing ideas may be a much more powerful approach, as we have already seen in successful applications of data-driven learning techniques to weather forecasting and ocean dynamics (Kochkov et al., Reference Kochkov, Yuval, Langmore, Norgaard, Smith, Mooers, Klöwer, Lottes, Rasp, Düben, Hatfield, Battaglia, Sanchez-Gonzalez, Willson, Brenner and Hoyer2024; Price et al., Reference Price, Sanchez-Gonzalez, Alet, Andersson, El-Kadi, Masters, Ewalds, Stott, Mohamed, Battaglia, Lam and Willson2024).
4.2. Machine learning for closures of under-resolved PDEs
The development of closures for Reduced Order Models (ROMs), such as coarse-grained or under-resolved partial differential equations (PDEs), is of tremendous importance for fields ranging from aircraft design to weather forecasting (Moser, Reference Moser2023). In fluid mechanics, closures for large eddy simulation (LES) and the Reynolds-averaged Navier–Stokes (RANS) equations have traditionally been developed using physical insight and engineering intuition (Jimenez and Moser, Reference Jimenez and Moser2000). The vast majority of turbulence models that use ML are based on Supervised Learning (SL). However, there are lingering questions regarding the generalization of these models beyond the training data (Hickel et al., Reference Hickel, Franz, Adams and Koumoutsakos2004; Gamahara and Hattori, Reference Gamahara and Hattori2017; Maulik and San, Reference Maulik and San2017; Vollant et al., Reference Vollant, Balarac and Corre2017; Duraisamy et al., Reference Duraisamy, Iaccarino and Xiao2019; Fukami et al., Reference Fukami, Fukagata and Taira2019; Xie et al., Reference Xie, Wang, Li, Wan and Chen2019). In SL, the neural network (NN) parameters are commonly obtained by stochastic gradient descent to minimize the model prediction error. As the error is required to be differentiable with respect to the model parameters, and due to the computational challenge of computing chain-rule derivatives through complex, large-scale solvers, SL approaches often define one-step target values for the model (e.g., subgrid-scale (SGS) stresses computed from filtered direct numerical simulations (DNS)). Because of the single-step cost function in SL, the NN model is not trained to compensate for the systematic discrepancies between DNS and LES and the compounding of numerical discretization errors (Nadiga and Livescu, Reference Nadiga and Livescu2007; Wu et al., Reference Wu, Xiao and Paterson2018; Beck et al., Reference Beck, Flad and Munz2019). 
One way to resolve this difficulty is through the iterative algorithm of Multi-Agent Deep Reinforcement Learning (MADRL). Our group pioneered MADRL as a framework for the systematic construction of such closures (Novati et al., Reference Novati, de Laroussilhe and Koumoutsakos2021). The key idea of the scientific MADRL method is to treat the grid points simultaneously as agents that learn to correct discretization errors while considering external cost functions and constraints. Unlike SL, RL optimizes a parametric model by directly exploring the underlying task. In addition, the performance of the RL strategy is measured not by a differentiable objective function but by a cumulative reward. These features are especially beneficial in turbulence modeling, as they allow for avoiding the distinction between a priori and a posteriori evaluations. In the case of LES, the performance of the RL policy is measured by comparing the statistical properties of the simulation to those of the reference data. MADRL can be trained with limited data as it does not require knowledge of the fully resolved flow field but rather global quantities such as energy spectra. Rather than perfectly recovering SGS stresses computed from filtered simulations, which may produce numerically unstable LES (Nadiga and Livescu, Reference Nadiga and Livescu2007), RL can develop novel models that are optimized to reproduce quantities of interest (QoI) accurately. In addition, automated discovery promoted by RL shifts the focus from specific models and their parameters to the exploration of spatio-temporal patterns that are inherent to turbulent flows and can allow the generalization of the learned models.
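As an illustration of a reward built from global statistics rather than from pointwise targets, the sketch below compares the energy spectrum of a coarse one-dimensional periodic field with a reference spectrum. It is only one plausible form of such a reward, assumed here for clarity: the mean squared log-ratio of the two spectra, a k^(-5/3) reference, and all function names are hypothetical, and the rewards used in the cited works may differ.

```python
import numpy as np

def energy_spectrum(u):
    """Energy per wavenumber k = 1 .. n/2 - 1 of a real periodic signal (n even)."""
    n = u.size
    u_hat = np.fft.rfft(u) / n
    # For a real signal, |u_hat_k|^2 is the combined energy of modes +k and -k.
    return np.abs(u_hat[1:n // 2]) ** 2

def spectrum_reward(u_les, e_ref, eps=1e-30):
    """Negative mean squared log-ratio between simulated and reference spectra."""
    e_les = energy_spectrum(u_les)
    log_ratio = np.log(e_les + eps) - np.log(e_ref + eps)
    return -float(np.mean(log_ratio ** 2))

# Reference spectrum (assumption): a k^(-5/3) power law on the resolved modes.
n = 64
k = np.arange(1, n // 2, dtype=float)
e_ref = k ** (-5.0 / 3.0)

# Synthesize a field with exactly this spectrum and random phases.
rng = np.random.default_rng(0)
phases = rng.uniform(0.0, 2.0 * np.pi, size=k.size)
u_hat = np.zeros(n // 2 + 1, dtype=complex)
u_hat[1:n // 2] = n * np.sqrt(e_ref) * np.exp(1j * phases)
u_match = np.fft.irfft(u_hat, n=n)

# An over-dissipative field: every mode carries a quarter of the reference energy.
u_damped = 0.5 * u_match

r_match = spectrum_reward(u_match, e_ref)
r_damped = spectrum_reward(u_damped, e_ref)

# In a multi-agent setting, every grid-point agent could receive this same
# global reward (one simple choice among several).
agent_rewards = np.full(n, r_damped)
```

The reward needs only the reference spectrum, not the fully resolved field, and it does not have to be differentiable with respect to the closure parameters.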
We have obtained state-of-the-art results for homogeneous turbulence (Novati et al., Reference Novati, de Laroussilhe and Koumoutsakos2021) and wall-bounded turbulent flows (Bae and Koumoutsakos, Reference Bae and Koumoutsakos2022) that outperform established dynamic SGS modeling approaches. A key part of this success has been the capability to perform policy optimization with the Remember and Forget Experience Replay (ReF-ER) (Novati and Koumoutsakos, Reference Novati and Koumoutsakos2019), which incorporates efficient sampling, has shown state-of-the-art performance on benchmark problems, and can even surpass optimal control algorithms (Novati et al., Reference Novati, Mahadevan and Koumoutsakos2019). MADRL develops the SGS model as a policy that relates agent observations and actions. The learning agents are distributed among the discretization grid points and minimize the discrepancies between the energy spectrum based on the LES and that calculated from fully resolved simulations (DNS). We emphasize that MADRL does not require DNS for its training but can use global quantities such as energy spectra available from experiments or observations. MADRL maximizes high-level objectives and produces SGS models that are stable under perturbation and resistant to compounding errors. Moreover, it offers new paths to solve many of the classic challenges of LES, such as wall-layer modeling, which are difficult to formulate in terms of SL. We believe that this property of MADRL is well suited to wind-wave interaction problems that are faced with relatively limited data. There is great potential for the interpretability of MADRL through the analysis of effective policies using symbolic computations (Cranmer et al., Reference Cranmer, Gonzalez, Battaglia, Xu, Cranmer, Spergel and Ho2020). 
Such an analysis involves observation-action pairs that can guide the identification of causal processes in turbulent energy dissipation and the distillation of mechanistic models for multiphysics simulations. Finally, MADRL closures enable systematic ablation studies, in terms of the observed quantities, actions, and rewards of the system, that can be used to extract causal information about the processes that determine complex flow dynamics. We argue that, unlike neural network-based approaches, MADRL allows for interpretable learning and identification of causal relationships for extreme events. We already have evidence of this capability in 2D oceanic flow simulations (Mojgani et al., Reference Mojgani, Waelchli, Guan, Koumoutsakos and Hassanzadeh2023). In these simulations, sciMARL was trained to capture the enstrophy spectrum, but it also captured extreme events in terms of the tails of the probability distribution of the vorticity field. This was not achievable with classical Smagorinsky models. The intervention character of MADRL may be a gateway to explainable ML (Roscher et al., Reference Roscher, Bohn, Duarte and Garcke2020) and effective closures for PDEs. MADRL is a novel, revolutionary strategy for automating the derivation of closures for multi-fidelity ROMs using sparse data.
5. Summary
The speed, veracity, and reliability of simulations for complex systems have a great impact on science and society. Traditionally, PDEs have been the main modeling tool for describing such systems. Simulations based on the solutions of such PDEs have led to many breakthroughs, but they are facing major limitations. Machine learning and data-driven methodologies present new opportunities and frontiers. I suggest that developing algorithmic alloys between AI and scientific computing offers exciting new tools to tackle complex problems, ranging from weather forecasting to epidemic modeling. The focus and cultures of the two disciplines are highly complementary. Where scientific computing is based on numerical analysis, AI is based on modules and architectures; precision is complemented by statistics, and scientific knowledge can be fused with goal-oriented discoveries. In addition to these methods, I believe that exchanges between the two cultures can be beneficial. The scientific computing community can draw a major lesson from the openness in exchanging ideas and software that has empowered the AI and Data Science communities (Donoho, Reference Donoho2024). At the same time, the rigor of scientific computing testing and checks can greatly benefit the rapid evaluation of new ideas in AI (Koumoutsakos, Reference Koumoutsakos2024). The clear identification of strengths and weaknesses in each field is a powerful tool to advance science.
We live in very exciting times, with advances in computing and artificial intelligence in all fields of science. One of the greatest contributions of AI has been its ability to capture the imagination of scientists of different disciplines, offering a medium for exchanging ideas. More importantly, it has captured the interest and imagination of a new generation of scientists. It is exciting to see exchanges of ideas between scientists in disciplines such as computer graphics, fluid dynamics, archeology, and psychology. Understanding the world through machine learning models that interact with and challenge those built around PDEs offers new perspectives and new scientific frontiers. We live in very exciting times!
Data availability statement
This is not applicable to this article, as no new data was created or analyzed in this study.
Acknowledgments
I have had the privilege to interact and learn over the last few years from numerous insightful discussions with Dr. Lucas Amoudruz, George Arampatzis, Han Gao, Petr Karnakov, Sebastian Kaltenbach, and Sergey Litvinov. I am grateful to the Swiss Supercomputing Center (CSCS) and, in particular, Dr. Maria Grazia Giufredda for their unwavering support that has made our research possible (and enjoyable) for over 25 years. Last but not least, I wish to express my gratitude to Professor Eleni Chatzi (ETHZ) for her kindness, encouragement, and patience with me in writing this perspective.
Author contribution
Conceptualization: P.K.; Investigation: P.K.; Resources: P.K.; Writing – original draft: P.K.; Writing – review & editing: P.K.
Funding statement
I am grateful for funding from the US National Science Foundation, DARPA, AFOSR, and the European Research Council.
Competing interests
The author declares none.