Special Session 107: Recent Advances in Data Assimilation with Machine Learning

An Efficient Online Smoother and Sampling Algorithm for Partially Observed Nonlinear Dynamical Systems
Marios Andreou
University of Wisconsin-Madison
USA
Co-Author(s):    Nan Chen, Yingda Li
Abstract:
State estimation is a critical task in practice as it is the prerequisite for effective parameter estimation (PE), skillful prediction, optimal control, and the development of robust and complete datasets. Of essence is the probabilistic state estimation of the unobserved variables in high-dimensional complex nonlinear turbulent dynamical systems with intermittent instability, given a partially observed time series. Data assimilation (DA) through Bayesian inference combines the partial observations and the model-induced likelihood, potentially plagued by a small error in model structure or initial condition, to form the posterior distribution for optimal state estimation. In this work, an efficient algorithm with closed-form explicit expressions is developed for the optimal (in the mean-squared sense) online smoother and sampling of a rich class of nonlinear turbulent dynamical systems widely appearing in modeling natural phenomena like physics-constrained stochastic models (noisy Lorenz models, low-order models of Charney-DeVore flows, paradigm models for topographic mean flow interaction), stochastically coupled reaction-diffusion models in neuroscience and ecology (stochastically coupled FitzHugh-Nagumo models, stochastically coupled SIR epidemic models), multiscale models for geophysical flows (noisy Boussinesq equations, stochastically forced rotating shallow water equation), and to develop realistic systems for the Madden-Julian oscillation and Arctic sea ice. The conditional Gaussian nonlinear system framework is highly flexible since it encompasses the linear Kalman filter and the Rauch-Tung-Striebel smoother and facilitates studying extreme events, stochastic parameterisation, DA, and online PE. The method particularly handles systems with hidden intermittency and extreme events along with the associated highly non-Gaussian features, such as heavy tails in the probability density functions, by using a computationally effective and systematic method to determine the adaptive lag for the online procedure. A rigorous analysis of the performance of the framework will be illustrated through numerical simulations of high-dimensional Lagrangian DA and online model identification of highly intermittent time series via an expectation-maximization procedure.