Physics-informed neural networks

Date

Friday, February 27, 2026

Links of interest

Notes

In class, we first introduced two concepts needed for the transformer (embeddings and the encoder-decoder), then covered physics-informed neural networks.

Below are key concepts from this lecture:

Embeddings

An embedding is a vector representation of an object — a word, an image patch, a geographic location, a time series window — that encodes meaning as a position in a continuous, high-dimensional space.

Unlike one-hot encoding, embeddings place objects that appear in similar contexts near each other. For example, “granite” and “basalt” sit close together; “granite” and “fossil” are further apart.
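
To make the contrast with one-hot encoding concrete, here is a minimal sketch that measures closeness with cosine similarity. The three-dimensional vectors are hand-picked for illustration and do not come from a trained model; the function name `cosine_similarity` is also just a choice made here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-D embeddings (assumed values, not learned from data).
embeddings = {
    "granite": [0.9, 0.8, 0.1],
    "basalt":  [0.8, 0.9, 0.2],
    "fossil":  [0.1, 0.3, 0.9],
}

sim_rock = cosine_similarity(embeddings["granite"], embeddings["basalt"])
sim_other = cosine_similarity(embeddings["granite"], embeddings["fossil"])

# One-hot vectors for two different words never overlap, so their similarity is zero.
one_hot_sim = cosine_similarity([1, 0, 0], [0, 1, 0])

print(f"granite vs basalt (embedding): {sim_rock:.3f}")
print(f"granite vs fossil (embedding): {sim_other:.3f}")
print(f"granite vs basalt (one-hot):   {one_hot_sim:.3f}")
```

With these assumed vectors, the granite-basalt similarity is much higher than the granite-fossil similarity, while one-hot encoding treats every pair of distinct words as equally unrelated.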

Embeddings are not limited to words. Any data that can be broken into discrete units can be embedded, including geospatial locations, image patches, and time series segments.

Encoder-decoder architecture

The encoder compresses an input into a compact context representation.
The decoder expands that representation to generate an output.

Embedding models are themselves a type of encoder. This architecture will be central to the transformer.
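
The compress-then-expand idea can be sketched with two fixed, hand-written functions. This is only a toy under strong assumptions: real encoder-decoder models learn both maps from data, whereas here `encode` simply averages neighbouring pairs and `decode` repeats each value. Both function names and the sample signal are hypothetical.

```python
def encode(signal):
    """Compress: average neighbouring pairs, halving the length (even length assumed)."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]

def decode(context):
    """Expand: repeat each context value twice to restore the original length."""
    return [value for value in context for _ in range(2)]

signal = [1.0, 1.0, 4.0, 4.0, 2.0, 2.0]   # hypothetical input
context = encode(signal)                   # compact context representation
reconstruction = decode(context)           # output generated from the context

print(context)
print(reconstruction)
```

The context is half the length of the input, yet for this piecewise-constant signal the decoder recovers the input exactly. For inputs without that structure, information is lost in the compression step.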

What are PINNs?

Physics-informed neural networks (PINNs) incorporate governing equations — PDEs or ODEs — directly into the cost function during training.

Rather than fitting data alone, the network simultaneously minimizes:

Data mismatch: how far predictions deviate from observations.
Physics violations: how far predictions deviate from the governing equation.

PINNs are useful when data are sparse or noisy, the underlying physics are well understood, and traditional numerical solvers are expensive.

Physics as regularization

We have already seen regularization in the form of L1 and L2 penalties on weights. A PINN regularizes differently: rather than penalizing large weights, it adds a physics loss term, a penalty for violating the governing equation at a set of evaluation points.

Example: reconstructing a subsurface temperature profile from sparse borehole measurements. Assuming constant thermal conductivity and no internal heat sources, the steady-state 1-D heat equation is:

$$\frac{d^2 T}{d x^2} = 0$$

We penalize any departure from this at interior points where no measurements exist.

The combined loss function

$$\mathcal{C}_\text{total} = \underbrace{\frac{1}{M}\sum_{i=1}^{M}\!\left(\hat{T}_i - T_i^{\text{obs}}\right)^2}_{\mathcal{C}_\text{data}} + \lambda\,\underbrace{\frac{1}{N}\sum_{j=1}^{N}\!\left(\frac{d^2\hat{T}}{dx^2}\bigg|_{x_j}\right)^2}_{\mathcal{C}_\text{physics}}$$

$T_i^{\text{obs}}$: observed temperature at measurement $i$.
$\hat{T}_i$: network prediction at $x_i$.
$\lambda$: weight balancing data fidelity against physics fidelity.
The data sum runs over the $M$ measurements; the physics sum runs over $N$ interior evaluation points $x_j$ with no measurements.
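
As a rough illustration of how the two terms trade off, the sketch below evaluates the combined loss for two candidate profiles. It rests on several assumptions: the borehole data and evaluation points are made up, the candidates are plain Python functions rather than a trained network, and the second derivative is estimated with central finite differences as a stand-in for automatic differentiation. The names `second_derivative` and `total_loss` are hypothetical.

```python
def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x) (stand-in for autodiff)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def total_loss(T_hat, x_obs, T_obs, x_col, lam):
    """Return (C_total, C_data, C_physics) for a callable profile T_hat(x)."""
    c_data = sum((T_hat(x) - t) ** 2 for x, t in zip(x_obs, T_obs)) / len(x_obs)
    c_phys = sum(second_derivative(T_hat, x) ** 2 for x in x_col) / len(x_col)
    return c_data + lam * c_phys, c_data, c_phys

# Hypothetical measurements: position x and observed temperature.
x_obs = [0.0, 0.5, 1.0]
T_obs = [10.0, 26.0, 40.0]
# Hypothetical interior evaluation points with no measurements.
x_col = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

def linear(x):
    """Straight line: satisfies the heat equation but misses the middle point."""
    return 10.0 + 30.0 * x

def curved(x):
    """Quadratic through all three data points: fits the data, violates the physics."""
    return 10.0 + 34.0 * x - 4.0 * x * x

loss_linear = total_loss(linear, x_obs, T_obs, x_col, lam=1.0)
loss_curved = total_loss(curved, x_obs, T_obs, x_col, lam=1.0)
print("linear:", loss_linear)
print("curved:", loss_curved)
```

The straight line pays a small data penalty and essentially no physics penalty. The quadratic matches every measurement but is penalized for its constant curvature of $-4$. Which candidate has the lower total loss depends on $\lambda$.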

Key insight: we do not need the solution

We do not give the network the solution to the PDE. We use the PDE itself as a rule:

At each evaluation point, compute $\hat{T}(x_j)$.
Differentiate twice using automatic differentiation.
Penalize any residual from zero.
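
The three steps above can be sketched without any framework by propagating a value together with its first and second derivatives through each operation (forward-mode automatic differentiation truncated at second order). This is a minimal sketch under stated assumptions: the `Jet` class, the tiny one-hidden-layer `T_hat`, and its weights are all hypothetical, and in practice a framework such as PyTorch or JAX would supply the derivatives.

```python
import math

class Jet:
    """A value with its first and second derivatives with respect to x."""
    def __init__(self, v, d1=0.0, d2=0.0):
        self.v, self.d1, self.d2 = v, d1, d2

    @staticmethod
    def lift(other):
        """Treat plain numbers as constants (zero derivatives)."""
        return other if isinstance(other, Jet) else Jet(float(other))

    def __add__(self, other):
        o = Jet.lift(other)
        return Jet(self.v + o.v, self.d1 + o.d1, self.d2 + o.d2)
    __radd__ = __add__

    def __mul__(self, other):
        o = Jet.lift(other)
        # Product rule, applied once and twice.
        return Jet(self.v * o.v,
                   self.d1 * o.v + self.v * o.d1,
                   self.d2 * o.v + 2.0 * self.d1 * o.d1 + self.v * o.d2)
    __rmul__ = __mul__

def jet_tanh(z):
    """tanh with chain rule up to second order."""
    t = math.tanh(z.v)
    s = 1.0 - t * t                      # derivative of tanh at z.v
    return Jet(t, s * z.d1, -2.0 * t * s * z.d1 ** 2 + s * z.d2)

def T_hat(x, params):
    """Tiny network: c + sum_k a_k * tanh(w_k * x + b_k)."""
    w, b, a, c = params
    out = Jet(c)
    for wk, bk, ak in zip(w, b, a):
        out = out + ak * jet_tanh(wk * x + bk)
    return out

params = ([1.5, -0.7], [0.1, 0.4], [2.0, -1.0], 3.0)   # hypothetical weights
x_col = [0.2, 0.5, 0.8]                                 # hypothetical evaluation points

# Step 1: evaluate the network; step 2: read off d2T/dx2; step 3: penalize the residual.
residuals = [T_hat(Jet(xj, 1.0, 0.0), params).d2 for xj in x_col]
physics_loss = sum(r * r for r in residuals) / len(residuals)
print(residuals, physics_loss)
```

Seeding the input as `Jet(xj, 1.0, 0.0)` says that $dx/dx = 1$ and $d^2x/dx^2 = 0$. The `.d2` field of the output is then the second derivative of the network at $x_j$, computed exactly rather than by finite differences.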

The optimizer adjusts network weights to minimize both losses simultaneously. If the second derivative is near zero, the physics are satisfied; if it is large, the network is penalized.
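
To see the optimizer balance the two losses, here is a deliberately small sketch that swaps the neural network for a three-parameter quadratic, $\hat{T}(x) = a + bx + cx^2$, so the gradients can be written by hand. For this stand-in model the physics residual is $2c$ at every point. The data, learning rate, and step count are assumptions chosen for illustration.

```python
def train(x_obs, T_obs, lam, lr=0.05, steps=20000):
    """Gradient descent on C_data + lam * C_physics for T_hat(x) = a + b*x + c*x**2."""
    a = b = c = 0.0
    m = len(x_obs)
    for _ in range(steps):
        ga = gb = gc = 0.0
        for x, t in zip(x_obs, T_obs):
            r = (a + b * x + c * x * x) - t   # data residual at this measurement
            ga += 2.0 * r / m
            gb += 2.0 * r * x / m
            gc += 2.0 * r * x * x / m
        # Physics residual is d2T/dx2 = 2c everywhere, so C_physics = (2c)**2
        # and its gradient with respect to c is 8c.
        gc += lam * 8.0 * c
        a, b, c = a - lr * ga, b - lr * gb, c - lr * gc
    return a, b, c

# Hypothetical sparse measurements; the middle value is slightly off a straight line.
x_obs = [0.0, 0.5, 1.0]
T_obs = [10.0, 26.0, 40.0]

a0, b0, c0 = train(x_obs, T_obs, lam=0.0)   # data loss only
a1, b1, c1 = train(x_obs, T_obs, lam=1.0)   # data loss plus physics loss
print("no physics:   c =", c0)
print("with physics: c =", c1)
```

Without the physics term, the fit bends to pass through every measurement and ends up with noticeable curvature. With the physics term, the curvature coefficient is pushed close to zero and the profile becomes nearly linear, at the cost of a small data mismatch.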