On Discovering Physics From Data
"The most incomprehensible thing about the world is that it is comprehensible. The fact that it is comprehensible is a miracle." - Albert Einstein
“The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve.” - Eugene Wigner

Our understanding of how the world works has evolved with the means by which we shape it. Once upon a time, it was language that first opened our eyes to the infinitely vast universe of abstract representations. Yet it wasn’t until the Renaissance that society was radically transformed by the idea that space and time can be quantified and modeled by simple mathematical forms that predict their behavior with surprising accuracy.

Galileo Galilei, one of the earliest pioneers of this revolution, dropped objects from the Leaning Tower of Pisa to deduce relationships between time and displacement. Then came Isaac Newton to formalize the relationship between force, mass and acceleration. The empirical laws of the 17th and 18th centuries emerged from cleverly designed experiments whose controllable inputs affected their measured outputs linearly. As the measurement devices and the mathematical tools evolved, our paradigms of how the world works - and how we exist in it - evolved with them.

The last century saw many such paradigm shifts, from the relativity of time to the quantum nature of matter. Yet, the most revolutionary shift in how we do science happened thanks to the advent of computers. 

Until the mid-1950s, scientists discovered the mathematical rules that animate the universe on paper. Modeling the universe heavily relied on developing analytical techniques to formalize and use those rules for prediction. But the limits of our methods (pen, paper, and a vivid imagination) meant that we had a strong bias towards simpler representations.

In the absence of sophisticated computational tools, we favor well-posed linear equations that involve as few variables as possible. We also tend to think in terms of simple building blocks (like atoms, cells, agents, etc.) that can be put together in simple input-output relationships. Additionally, we prefer working with systems where superposition applies, avoiding models where the sum of the parts gives rise to unpredictable dynamics. These are understandable practical considerations if we’re hoping to generalize beyond the behavior of the objects we perceive using analytical tools alone. But these assumptions come with great limitations. In particular, they cannot capture the true complexity of important systems we’re interested in understanding: living things, ecosystems, self-organizing systems, turbulent flows, and so on - ultimately, everything in the universe.

With computers becoming faster and widely available, scientists were now able to crunch numbers at very high speeds to solve complicated nonlinear systems of equations they never imagined possible. The ability to perform these so-called numerical “experiments” opened the door to the study of complex, nonlinear, high-dimensional, multi-scale, and multiphysics systems. The bias towards linear models could now be relaxed, at least in part.

Over time, studying complex physical systems has slowly transformed into an iterative process of deriving a system of equations, designing numerical solvers for it, simulating it, analyzing its predictions, and finally changing the model until observations match predictions. It became possible to completely automate the process of making forward predictions given a well-posed model - a model that was still mostly derived by human brains doing the good old work of putting together a set of equations from first principles. But just as human brains are too slow to compute millions of algebraic operations per second, the hand-driven process of modeling the world can benefit from computational automation.

In the past two decades, the speed of floating point operations increased exponentially thanks to Moore’s law, the amount of available data exploded thanks to the internet, and optimization algorithms, particularly for neural networks, became powerful enough to optimize intelligent systems that beat humans at many complex tasks. Was it finally possible to automate the modeling of complex systems?

Physics-Informed Machine Learning

Scientists use a large toolbox of modeling techniques that have been accumulated over the centuries. These range from the descriptive (e.g. matter is made of indivisible particles) to the quantitative (e.g. those particles have velocities and positions and follow Newton’s laws). Some modeling techniques, like differential equations, have been around for centuries because they’ve proven effective and versatile. Most new methods are extensions of existing ones, designed to handle problems where the originals traditionally do not work.

Depending on the field and the problems of interest, some methods gain prominence while others fade away, according to their relative utility in the landscape of problems people care about solving. With the rise of data-driven inverse modeling, many new techniques are either replacing or complementing existing tools.

Algorithmic approaches to discovering scientific models from data have been around for a long time. This is also the case for neural networks - proposed by McCulloch and Pitts in 1943 - which were responsible for the latest boom in artificial intelligence. But it was only relatively recently that computers became efficient enough to provide the data-handling and computational power required to make those methods appealing for solving real-world problems.

For example, discovering scientific laws from data was proposed as early as the 1980s by Pat Langley and others, summarized in Scientific Discovery: Computational Explorations of the Creative Processes. Solving differential equations with neural networks was proposed by Lagaris and others in 1998. But these approaches were revived in recent years thanks, in part, to computational and algorithmic resources that were made openly available by big technology companies whose primary sources of income increasingly depend on them.

With the rise of large language models like OpenAI’s ChatGPT and Google’s Bard, it is becoming self-evident that the intelligence required to solve complex problems that traditionally required human brains will become much cheaper and much more widely available in the coming few years. Science cannot be immune to that transformation.

But popular deep learning techniques that do well at generating language and images, recommending books and ads, etc., do not necessarily perform up to standard when modeling physical systems, where expectations for prediction accuracy are much higher. Scientific discovery also relies more heavily on extrapolation, which modern deep learning algorithms do not guarantee.

Why I’m Interested

I’ve approached this field of research through the good old big questions of science - namely, how does the world work?

At first, my fascination was focused on understanding the relationship between scales - or, how big stuff forms from small stuff. In particular, I wanted to understand how atoms self-assemble into cells and cells into human beings (like me!). I believed that the rigorous first-principles approach of the hard sciences would lead me to an answer, or at least provide me with the tools I needed to construct one. While I wasn’t completely wrong, I discovered that those tools are much more limited than I originally thought. It turns out modeling complex systems from first principles can only take you so far.

This is where data-driven modeling comes in. Combining modern machine learning techniques with widely available data has proven to be a promising approach for discovering the rules that animate multiscale complex systems. Whether that promise is fulfilled will depend on how research pans out in the next decade. One thing is for sure: 21st-century scientific research will look quite different from that of the 20th century, in that it will involve far more computational resources.

Over the years, the questions that motivate much of my work have continued to revolve around understanding how the world works, but they have slowly transformed to include the intelligences - both machine and human - that do the understanding. Now, I spend more time thinking about how understanding how the world works works. This might have been a purely philosophical question a couple of decades ago. But recently, it has become an engineering and scientific question.

In practice, I’ve worked on a few fundamental methods that power the data-driven discovery of scientific laws. Those include: 

  • Multiscale modeling
  • Equation discovery from partial measurements
  • Non-dimensionalization from data

Data-Driven Coarse Graining

One of the most fascinating aspects of the universe is that it is made of scales. On its own, there is nothing special about the fact that large things are made of smaller things. But it gets interesting when you realize that their behaviors can be wildly different from one another.

While many of the rules that we discover neatly apply to a wide range of phenomena within the same scale, that generalization often fails across different scales. For example, Newton’s laws will do very well at spatial and temporal scales that are similar to ours. But they fail at the atomic level, where the rules of quantum mechanics have to be used.

Additionally, the rules that bridge those scales are not trivial. Indeed, one of the most challenging scientific questions of our time is how large-scale complex behavior emerges from simple building blocks. It is often easier to find the governing laws of the different scales than to deduce one scale from another.

This being said, there is a common theme in how scales are related. When smaller things are uniform, larger things can be studied statistically. This was first discovered in thermodynamics and statistical mechanics during the 19th century. In simple cases, like ideal gases, it is even possible to derive large-scale equations for the probability distribution of small-scale interacting particles. Boltzmann’s equation is one of the earliest examples of such a feat.

But the analytical techniques used by Boltzmann only give rise to closed-form equations in simple cases where the correlations in the field of smaller particles are limited and well understood. In most cases, deriving a set of probability density function (PDF) equations requires many approximations to close the equations and make them solvable. This is called closure modeling, and it often arises in computational fluid dynamics, where designing efficient solvers requires statistical modeling of turbulent fluctuations.

In a recent paper, we address the challenge of discovering PDF equations and their closure approximations from data in the context of uncertainty quantification.

We are given measurements of a field $u(x, t)$, where $u$ can be the concentration of a pollutant in the air or of a dye in a fluid, $x$ is the location of that measurement, and $t$ its time. The field is random because we have uncertainty about the initial conditions or about certain parameters that $u$ depends on. This means that $u$ has an associated probability density function $f_u(U; x, t)$, where $U$ is the sample space variable: a deterministic value that the random variable $u(x, t)$ can take. We’re also assuming that the equations governing $u(x, t)$ are given by

$$ \dot u = \mathcal L_\mu u $$

where $\mathcal L$ is a differential operator (like $\partial / \partial x + \partial^2 / \partial x^2$ in 1D), and $\mu$ is a vector of input parameters.

When everything is deterministic, one can solve the equation without worrying about probabilities. But given some uncertainties - say in the initial conditions described by $f_u^0(U; x) = f_u(U; x, t=0)$ - we would like to predict how that distribution evolves in time according to an equation of the form

$$ \frac{\partial f_u}{\partial t} = \mathcal K_{\nu} f_u $$

where $\mathcal K_{\nu}$ is a differential operator that typically appears in advection-diffusion form (i.e. a Fokker-Planck equation). Now the question is: how can we find $\mathcal K_{\nu}$ from $\mathcal L_\mu$?
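
To make the relationship between $\mathcal L_\mu$ and $\mathcal K_{\nu}$ concrete, consider a classical textbook example (not taken from our paper): linear advection, $\partial u / \partial t + c \, \partial u / \partial x = 0$, with a random initial condition, so that $\mathcal L_\mu u = -c \, \partial u / \partial x$. Since $u(x, t) = u_0(x - ct)$, the single-point PDF is simply transported along characteristics, $f_u(U; x, t) = f_u^0(U; x - ct)$, and therefore satisfies

$$ \frac{\partial f_u}{\partial t} + c \frac{\partial f_u}{\partial x} = 0 $$

so here $\mathcal K_{\nu}$ is the same advection operator, with no dependence on the sample-space variable $U$ and no closure needed. In general, though, random coefficients, nonlinearities, or terms like diffusion that couple neighboring points introduce unclosed terms - which is exactly where closure modeling comes in.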

There are analytical techniques to do that, like the PDF method used in turbulence, but they often run into closure problems, as I mentioned earlier. Our approach uses the sparse identification of differential equations method (a variant of sparse identification of nonlinear dynamics, or SINDy) to find $\mathcal K_{\nu}$ from Monte Carlo simulation data.
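
To make the workflow concrete, here is a minimal sketch (illustrative, not our actual code) of the sparse-regression step, assuming the single-point PDF has already been estimated on a space-time-sample-space grid from Monte Carlo histograms; the array name `f`, the candidate library, and the threshold value are assumptions made for this example.

```python
# Sketch: discover a PDF equation f_t = sum_i w_i * Theta_i(f) by sparse regression
# on Monte Carlo estimates of f_u(U; x, t), stored in an array `f` of shape
# (n_t, n_x, n_U) with uniform grid spacings dt, dx, dU.
import numpy as np

def stlsq(Theta, target, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: the core regression step of SINDy."""
    w = np.linalg.lstsq(Theta, target, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(w) < threshold
        w[small] = 0.0
        big = ~small
        if big.any():
            w[big] = np.linalg.lstsq(Theta[:, big], target, rcond=None)[0]
    return w

def discover_pdf_equation(f, dt, dx, dU):
    # Finite-difference derivatives of the PDF along t, x, and the sample-space
    # variable U (axes 0, 1, 2 respectively).
    f_t = np.gradient(f, dt, axis=0)
    f_x = np.gradient(f, dx, axis=1)
    f_xx = np.gradient(f_x, dx, axis=1)
    f_U = np.gradient(f, dU, axis=2)
    f_UU = np.gradient(f_U, dU, axis=2)

    # Candidate library: each column is one possible right-hand-side term.
    names = ["f", "f_x", "f_xx", "f_U", "f_UU"]
    Theta = np.stack([g.ravel() for g in (f, f_x, f_xx, f_U, f_UU)], axis=1)

    w = stlsq(Theta, f_t.ravel())
    return {name: coef for name, coef in zip(names, w) if coef != 0.0}

# Usage (with a precomputed Monte Carlo histogram `f`):
# print(discover_pdf_equation(f, dt=0.05, dx=0.1, dU=0.02))
```

For the advected-scalar example above, one would hope to see a single surviving term, $f_x$, with a coefficient close to $-c$; richer libraries (e.g. terms like $U f_U$ or mixed products) are needed when the dynamics are nonlinear or randomly forced.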

The results are promising, capable of discovering full PDF equations and approximating closure terms when part of the equation can be derived analytically. Our proposed method has found exciting applications in active matter (like schooling fish), neuroscience, uncertainty quantification and beyond.

Discovering Hidden Variables and their Associated Dynamics

Recent work in physics-informed machine learning has mostly focused on two areas: discovering models from data and accelerating numerical solvers. But the most fundamental questions of physics start with identifying the variables required to make useful predictions. Much of the work that went into early scientific modeling went into identifying relevant variables that best fit within a generalizing law.

The problem of finding low-dimensional latent variables in high dimensional measurements is related to the problem of coarse-graining I’ve just spoken about. It is also related to a long tradition in reduced-order modeling and dimensionality reduction in engineering systems where the underlying dynamics are distilled from high-dimensional correlated measurements of a system.

The recent popularity of neural networks has opened the possibility of using autoencoder architectures that perform nonlinear dimensionality reduction to find the latent variables that best represent the underlying dynamics. Because the latent variables are not unique - they depend on the architecture and the data - current research focuses on how best to constrain the hypothesis class (through the loss function) to obtain physically meaningful latent variables.

Our latest work addresses the central challenge of modeling systems where only partial measurements are available. Say you’re interested in modeling a three-dimensional system but are only given measurements of a single variable: would it be possible to recover a three-dimensional system of equations?

This sounds like an almost impossible problem to solve (given its ill-posedness), but Takens’ theorem provides conditions under which it is possible to augment these partial measurements with time-delayed information, resulting in an attractor that is diffeomorphic to that of the original full-state system.
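
To make this concrete, here is a small illustrative sketch (not our implementation) of the delay embedding itself: stacking delayed copies of a single measured coordinate reconstructs a higher-dimensional state. The Lorenz system, the delay `tau`, and the embedding dimension `d` are assumptions chosen for this example.

```python
# Sketch: build time-delay coordinates from a single measured variable.
import numpy as np

def delay_embed(y, d=3, tau=10):
    """Stack d delayed copies of the scalar series y into a d-dimensional state."""
    n = len(y) - (d - 1) * tau
    return np.stack([y[i * tau : i * tau + n] for i in range(d)], axis=1)

def lorenz_x(n_steps=20000, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Crude forward-Euler integration of Lorenz; returns only the x-coordinate."""
    x, y, z = 1.0, 1.0, 1.0
    xs = np.empty(n_steps)
    for i in range(n_steps):
        dx, dy, dz = sigma * (y - x), x * (rho - z) - y, x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xs[i] = x
    return xs

# Embedding the measured x-coordinate with a hand-picked delay reconstructs, in
# practice, the familiar Lorenz butterfly; Takens' theorem guarantees a
# diffeomorphic reconstruction for a sufficiently large embedding dimension.
embedded = delay_embed(lorenz_x(), d=3, tau=100)
print(embedded.shape)  # (19800, 3)
```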

However, the coordinate transformation back to the original attractor is typically unknown, and learning the dynamics in the embedding space has remained an open challenge for decades. Our paper designs a custom deep autoencoder network to learn a coordinate transformation from the delay-embedded space into a new space where it is possible to represent the dynamics in a sparse, closed form. We show that it is possible to simultaneously find the hidden variables and the associated coordinate system for partially observed dynamics.
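
The flavor of this approach can be sketched as follows. The snippet below is a hedged, simplified sketch (not the network from the paper): an autoencoder compresses each delay-embedded state into a few latent coordinates, while a trainable coefficient matrix forces the latent time derivative to match a sparse combination of polynomial library terms. The layer sizes, loss weights, and finite-difference derivative are all simplifying assumptions.

```python
# Sketch: autoencoder with a sparse latent-dynamics penalty (SINDy-style).
import torch
import torch.nn as nn

class LatentDynamicsAE(nn.Module):
    def __init__(self, n_embed=7, n_latent=3, n_lib=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_embed, 64), nn.ELU(),
                                     nn.Linear(64, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 64), nn.ELU(),
                                     nn.Linear(64, n_embed))
        # Sparse coefficients of the latent dynamics: dz/dt ~ Theta(z) @ Xi
        self.Xi = nn.Parameter(torch.zeros(n_lib, n_latent))

    def library(self, z):
        """Candidate terms Theta(z): constant, linear, and quadratic monomials."""
        ones = torch.ones_like(z[:, :1])
        quad = torch.stack([z[:, i] * z[:, j]
                            for i in range(z.shape[1])
                            for j in range(i, z.shape[1])], dim=1)
        return torch.cat([ones, z, quad], dim=1)

    def forward(self, h, dt):
        # h: delay-embedded measurements in time order, shape (n_time, n_embed)
        z = self.encoder(h)
        h_hat = self.decoder(z)
        dz_dt = (z[1:] - z[:-1]) / dt            # finite-difference derivative
        dyn = self.library(z[:-1]) @ self.Xi     # predicted derivative Theta(z) Xi
        return (nn.functional.mse_loss(h_hat, h)             # reconstruction
                + 1e-1 * nn.functional.mse_loss(dz_dt, dyn)  # latent dynamics
                + 1e-3 * self.Xi.abs().mean())               # L1 promotes sparsity
```

In practice `h` would come from a delay-embedding step like the one above, and the loss weights would be tuned so that only a handful of library terms survive - which is precisely why the hyperparameter space mentioned below becomes large.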

While the initial results are promising, there is still much to be done to make these latent-dynamics-discovery algorithms computationally efficient and robust to noise. One of the main challenges is the dimensionality of the hyperparameter space, due to the large number of weighted terms in the loss function.

We are actively working on algorithmic improvements and applications to real-world measurements, in hopes of making these latent-variable discoveries capable of solving modeling problems in all fields of science and engineering.

Discovering Dimensionless Groups

In the absence of governing equations, physicists have always fallen back on dimensional analysis for extracting insights and finding symmetries in physical systems.

Dimensional analysis is based on the simple idea that physical laws do not depend on the units of measurements. As a consequence, any function that expresses a physical law has the fundamental property of so-called generalized homogeneity and does not depend on the observer. 

Although the concepts of dimensional analysis go back to the time of Newton and Galileo, they were formalized mathematically by the pioneering contributions of Edgar Buckingham in 1914. Specifically, Buckingham proposed a principled method for extracting the most general form of physical equations by simple dimensional considerations of the seven fundamental units of measurement: length (metre), mass (kilogram), time (second), electric current (ampere), temperature (kelvin), amount of substance (mole), and luminous intensity (candela).

From electromagnetism to gravitation, measurements can be directly related to these seven fundamental units. For example, force is measured in newtons, i.e. $kg \cdot m \cdot s^{-2}$, and electric potential in volts, i.e. $kg \cdot m^2 \cdot s^{-3} \cdot A^{-1}$. The resulting Buckingham Pi theorem was originally contextualized in terms of physically similar systems, or groups of parameters that relate similar physics.

But given knowledge of the variables and parameters alone, the Buckingham Pi theorem provides a procedure for finding a set of dimensionless groups that spans the solution space, although this set is not unique. 
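
Before describing the data-driven version, it helps to see the classical procedure in code: candidate dimensionless groups are exponent vectors in the null space of the dimension matrix. The sketch below is an illustrative example (not our method) that recovers the Reynolds number from the dimensions of density, velocity, length, and viscosity.

```python
# Sketch: classical Buckingham Pi via the null space of the dimension matrix.
import sympy as sp

# Columns: rho (M L^-3), U (L T^-1), L (L), mu (M L^-1 T^-1)
D = sp.Matrix([
    [1,  0, 0,  1],   # exponents of mass
    [-3, 1, 1, -1],   # exponents of length
    [0, -1, 0, -1],   # exponents of time
])

for pi in D.nullspace():
    # Up to an overall sign and scaling, the single basis vector is (1, 1, 1, -1),
    # i.e. rho * U * L / mu: the Reynolds number.
    print(pi.T)
```

Any invertible recombination of such null-space vectors is an equally valid set of groups, which is exactly the non-uniqueness that a data-driven criterion can help resolve.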

In recent work, we propose an automated approach using the symmetric and self-similar structure of available measurement data to discover the dimensionless groups that best collapse these data to a lower dimensional space according to an optimal fit. We developed three data-driven techniques that use the Buckingham Pi theorem as a constraint: (1) a constrained optimization problem with a non-parametric input–output fitting function, (2) a deep learning algorithm (BuckiNet) that projects the input parameter space to a lower dimension in the first layer and (3) a technique based on sparse identification of nonlinear dynamics to discover dimensionless equations whose coefficients parameterize the dynamics. 

We’re currently working on applying these techniques to problems where the dimensionless groups are not known, or are only partially known in simpler cases, such as non-Newtonian flows and granular materials.

On Intelligence

Much of this work is motivated by bigger questions of how our understanding of the physical world works, how much of it we can automate, and how we can use that automation to assist us in discovering more fundamental questions.

The most profound realization I’ve had in the past few years is that the rules that animate our world cannot be divorced from the means by which they were discovered. And changing the means will change the rules and their properties in non-trivial ways. In other words, the process of understanding and the assumptions it presumes are an integral part of the understanding we seek.

This calls into question our dogma of objectivity: the ability to find representations of the world that are independent of the observer. If the reality we conceive is heavily dependent on our means of discovery (including our thinking patterns), speaking of an independent ‘reality’ out there is almost meaningless. Furthermore, if the intelligence we use to access reality is poorly understood, how can we agree on a unified objective truth, independent of us?

For example, machine intelligence is very different from human intelligence in many respects; but one cannot deny its superhuman ability to process information and make predictions. Do we then claim that machines aren’t intelligent simply because they do not fit our traditional definition, or do we readjust our view of what reality is in terms of that intelligence?

The perspective that reality is not accessible objectively, independently of the observer, is not new. My favorite perspective on this problem comes from enactivism: the view that reality arises through the organism’s interaction with its environment. Reality and the self that accesses it are so deeply intertwined that isolating one from the other - as we are deeply programmed to do - fails to fully capture what we are and what the world is.

This philosophical question will become more relevant as we grow increasingly concerned with automating the intelligence that finds the rules of motion, rather than relying purely on our intuition. Questions of how fundamental the rules we find are, how well they represent the reality they predict, and how far we can expect them to generalize become engineering rather than philosophical questions.

My work attempts to address these questions from both applied and theoretical perspectives, while constantly grappling with the ethical issues that arise from them. I’m excited about how the next 10 years will transform how we do science, and I’m excited to contribute to that revolution.