
Integration points

This page describes where and how AI components can be integrated into a differentiable, implicit, constrained simulation pipeline such as SOFAx v2.

The emphasis is on solver-aware integration: AI models interact with residuals, operators, and iterative solvers rather than bypassing them.


Guiding principle

AI components are integrated at specific operator-level interfaces within the solver, preserving the residual-based structure of FEM–Newton methods and their energy-conserving properties. The implicit constrained solve remains intact; AI assists or enriches local operations rather than replacing the core physics.


1) Learned constitutive laws

Role

Replace or augment the internal force model:

\[ f_{\mathrm{int}}(u) \;\longrightarrow\; f_{\theta}(u, \nabla u, \text{state}). \]

Operator signature: \(f_\theta: \mathbb{R}^n \times \mathbb{R}^{n \times d} \times \mathcal{S} \to \mathbb{R}^n\)

where \(n\) is the number of DOFs, \(d\) is the spatial dimension, and \(\mathcal{S}\) represents additional state (e.g., history variables, material parameters).

Interface

The learned constitutive law is embedded within the residual operator evaluation. During the upward pass of the scene graph traversal, each field contributes to the residual:

\[ R_{\mathrm{phys}} = M\,a + B\,v + f_\theta(u, \nabla u, \text{state}) - f_{\mathrm{ext}}. \]

The model operates at the field level, meaning it replaces the internal force computation for specific material regions or elements.
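As a minimal sketch, the residual assembly with an embedded learned force could look like the following. Here `f_theta` is a linear stand-in for a real MLP or GNN, and all names (`residual`, the parameter tuple) are illustrative, not the SOFAx v2 API:

```python
import numpy as np

def f_theta(u, grad_u, state, params):
    """Hypothetical learned internal force. A linear map W @ u + b
    stands in for a real neural constitutive model."""
    W, b = params
    return W @ u + b

def residual(a, v, u, M, B, f_ext, state, params):
    """Physics residual with the learned constitutive term embedded:
    R_phys = M a + B v + f_theta(u, grad_u, state) - f_ext."""
    grad_u = None  # placeholder: in FEM this comes from shape-function gradients
    return M @ a + B @ v + f_theta(u, grad_u, state, params) - f_ext
```

The learned term occupies exactly the slot of `f_int(u)` in the residual; everything else (mass, damping, external forces) stays under physics control.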


2) Warm starts for implicit solves

Role

Predict an initial guess for the Newton solve:

\[ y^{(0)} = (a^{(0)}, \lambda^{(0)}), \]

given the previous state and context.

Operator signature: \(\mathcal{W}_\theta: \mathcal{S}_n \times \mathcal{C} \to \mathbb{R}^n \times \mathbb{R}^{n_c}\)

where \(\mathcal{S}_n\) is the state at time \(t_n\) (displacements, velocities, accelerations), \(\mathcal{C}\) is context (time step, solver history, etc.), and the output is the initial guess for \((a_{n+1}, \lambda_{n+1})\).

Interface

The warm start is applied before the first Newton iteration in the Newton–Krylov solver.

This is a pure prediction step: the model does not participate in the solver loop itself; it only provides the starting point.
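A sketch of the wiring, under the assumption that the solver exposes `residual` and a linear-solve callable; `predict` is the hypothetical learned warm-start model \(\mathcal{W}_\theta\):

```python
import numpy as np

def warm_started_newton(residual, solve_linear, predict, state, context,
                        tol=1e-10, max_iters=20):
    """Newton iteration whose initial guess y0 comes from a learned
    predictor instead of, e.g., the previous step's solution.
    `residual`, `solve_linear`, `predict` are illustrative callables."""
    y = predict(state, context)          # learned warm start: y0 = W_theta(S_n, C)
    for _ in range(max_iters):
        R = residual(y)
        if np.linalg.norm(R) < tol:      # a good warm start exits after few iterations
            break
        y = y + solve_linear(y, -R)      # standard Newton update, unchanged
    return y
```

Note that only the first line differs from a standard Newton driver; the loop body is untouched.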

Benefits

  • Significantly reduces Newton iterations: A good initial guess can reduce iterations from 5–10 to 1–3, providing substantial speedup
  • Improves robustness: For stiff or highly constrained systems, poor initial guesses can cause divergence; learned warm starts improve convergence
  • Trivial to integrate: No changes to solver structure, just replace the initial guess computation

Typical models

  • MLPs on global features: Simple and fast, operates on aggregated state (e.g., mean velocity, total energy)
  • GNNs operating on mesh/graph state: Respects spatial structure, predicts per-node accelerations and multipliers
  • Neural operators predicting global fields: Generalizes across mesh resolutions and geometries

3) Pseudo-Newton corrections

Role

Replace or augment the linear solve by predicting update directions:

\[ \Delta y^{(k)} \approx \mathcal{N}_\theta\big(R(y^{(k)}), J(y^{(k)})v, \text{context}\big). \]

Operator signature: \(\mathcal{N}_\theta: \mathbb{R}^n \times \mathbb{R}^{n_c} \times \mathbb{R}^{n+n_c} \times \mathcal{C} \to \mathbb{R}^n \times \mathbb{R}^{n_c}\)

The inputs are:

  • Residual \(R(y^{(k)}) = (R_u, \Phi)\)
  • Directional Jacobian action \(J(y^{(k)})v\) (for some direction \(v\))
  • Context (iteration count, solver history, etc.)

The output is the predicted update \(\Delta y = (\Delta a, \Delta \lambda)\).

Integration point

The model replaces or augments the Krylov linear solve within the Newton loop (see Newton–Krylov Solver):

y = y_0
for k in range(max_newton_iterations):
    R_k = residual(y)
    # Instead of: delta_y = solve_linear_system(J_k, -R_k)
    delta_y = pseudo_newton_model(R_k, jacobian_vector_product(y, v), context)
    y = y + delta_y
    # Still apply projections and check convergence
    y = apply_projections(y)
    if converged(y):
        break

The update is still inserted into the Newton loop:

\[ y^{(k+1)} = y^{(k)} + \Delta y^{(k)}. \]

Key idea

The model learns to behave like a solver step, not like a full simulator. It predicts corrections that move toward the solution, respecting the residual structure.

This keeps:

  • Constraint enforcement: Projections and constraint satisfaction remain under physics control
  • Convergence checks: The solver still verifies that \(F(y^{(k)}) \to 0\)
  • Robustness: If the model prediction is poor, the solver can detect divergence and fall back to traditional Krylov methods
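The fallback logic in the last point can be sketched as a guarded step; all callables here are hypothetical stand-ins for solver components:

```python
import numpy as np

def guarded_step(y, residual, learned_update, krylov_solve, context):
    """Take a learned pseudo-Newton step, but fall back to the classical
    linear solve if the learned step does not reduce the residual norm."""
    R = residual(y)
    y_trial = y + learned_update(R, context)
    if np.linalg.norm(residual(y_trial)) < np.linalg.norm(R):
        return y_trial                    # learned step made progress
    return y + krylov_solve(y, -R)        # fallback: standard Newton-Krylov step
```

Because the residual is always re-evaluated, a bad prediction costs one extra residual evaluation, never correctness.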

For detailed discussion, see Pseudo-Newton Methods.


4) Learned preconditioners

Role

Approximate the action of an inverse operator:

\[ r \;\mapsto\; \tilde A^{-1} r, \]

or, in constrained settings (see Newton–Krylov Solver for the primal–dual structure),

\[ (r_u, r_c) \;\mapsto\; (\delta u, \delta \lambda). \]

Operator signature: \(\mathcal{P}_\theta: \mathbb{R}^n \times \mathbb{R}^{n_c} \to \mathbb{R}^n \times \mathbb{R}^{n_c}\)

The preconditioner approximates the action of the inverse Jacobian:

\[ \mathcal{P}_\theta(r_u, r_c) \approx J^{-1} \begin{bmatrix} r_u \\ r_c \end{bmatrix} \]

Integration

The preconditioner is inserted into the Krylov solver (GMRES) as a right or left preconditioner:

# Right preconditioning: solve (J P^{-1}) (P delta_y) = -R
# Left preconditioning:  solve (P^{-1} J) delta_y = -P^{-1} R
def preconditioned_gmres(apply_J, R, apply_P_inv):
    # apply_P_inv is a pure, matrix-free function r -> P^{-1} r
    z = apply_P_inv(-R)
    # ... GMRES iterations, applying apply_J and apply_P_inv
    #     to each new Krylov vector ...

Key requirements:

  • Pure function: No side effects, compatible with JAX transformations
  • Fully differentiable: Gradients can flow through the preconditioner
  • Matrix-free: Operates via function application, not matrix assembly
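A classical Jacobi (diagonal) preconditioner already satisfies all three requirements, and a learned preconditioner would expose exactly the same `r -> P^{-1} r` interface. A sketch, with the diagonal recovered by probing the Jacobian action (an assumption made for illustration; in practice the diagonal often comes from assembly):

```python
import numpy as np

def jacobi_preconditioner(apply_J, n):
    """Build a pure, matrix-free preconditioner r -> diag(J)^{-1} r.
    apply_J is the Jacobian-vector product; the diagonal is recovered
    by probing with canonical basis vectors."""
    diag = np.array([apply_J(np.eye(n)[i])[i] for i in range(n)])
    def apply_P_inv(r):
        return r / diag    # no side effects, differentiable, matrix-free
    return apply_P_inv
```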

Typical architectures

  • GNNs respecting mesh adjacency: Exploit spatial structure, predict per-node corrections
  • Multilevel / coarse-space models: Learn coarse-to-fine corrections, similar to geometric multigrid
  • Low-rank neural corrections: Combine physics-based preconditioners with learned low-rank updates

Block structure

For the coupled primal–dual system, the preconditioner can respect the block structure (see Preconditioning):

\[ \mathcal{P}_\theta^{-1} \approx \begin{bmatrix} \tilde A^{-1} & 0 \\ 0 & \tilde S^{-1} \end{bmatrix} \]

or the block-triangular form:

\[ \mathcal{P}_\theta^{-1} \approx \begin{bmatrix} \tilde A^{-1} & -\tilde A^{-1} G^\top \tilde S^{-1} \\ 0 & \tilde S^{-1} \end{bmatrix} \]

where \(\tilde A^{-1}\) approximates the dynamics inverse and \(\tilde S^{-1}\) approximates the Schur complement inverse.
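Applying the block-triangular form matrix-free amounts to one approximate Schur solve followed by one approximate dynamics solve. A sketch, where `A_tilde_inv` and `S_tilde_inv` are hypothetical approximate-inverse callables (learned or physics-based):

```python
import numpy as np

def apply_block_triangular(r_u, r_c, A_tilde_inv, S_tilde_inv, G):
    """Action of the block-triangular preconditioner:
        delta_lambda = S~^{-1} r_c
        delta_u      = A~^{-1} (r_u - G^T delta_lambda)
    which equals the 2x2 block matrix above applied to (r_u, r_c)."""
    d_lam = S_tilde_inv(r_c)
    d_u = A_tilde_inv(r_u - G.T @ d_lam)
    return d_u, d_lam
```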

This is a high-impact but more delicate integration point: a good preconditioner can reduce Krylov iterations by an order of magnitude, but a poor one can cause divergence or slow convergence.


5) Schur complement assistance

Role

Assist or approximate the Schur complement action:

\[ S\,\delta\lambda = G A^{-1} G^\top \delta\lambda. \]

Operator signature: \(\mathcal{S}_\theta: \mathbb{R}^{n_c} \to \mathbb{R}^{n_c}\)

The Schur complement \(S = G A^{-1} G^\top\) represents the dynamic compliance seen by the constraint (see Preconditioning and Constraints & Physical Interactions). The learned model approximates this action:

\[ \mathcal{S}_\theta(\delta\lambda) \approx S \delta\lambda = G A^{-1} G^\top \delta\lambda. \]

Integration point

The Schur complement appears in two contexts:

  1. Schur-based preconditioning: When using block preconditioners, the Schur complement inverse \(\tilde S^{-1}\) is needed. A learned model can approximate this action.

  2. Schur-based decoupling: For systems with \(n_c \ll n\) (few constraints relative to DOFs), the problem can be reduced to the constraint space:

\[ S \,\delta\lambda = r_c - G A^{-1} r_u. \]

A learned model can approximate either \(A^{-1}\) (the dynamics inverse) or \(S^{-1}\) (the Schur complement inverse) directly.
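The Schur action itself is just a composition of operator applications, which makes the substitution point explicit. A sketch, with hypothetical callables for \(G\), \(G^\top\), and the dynamics solve:

```python
import numpy as np

def schur_action(delta_lam, apply_G, apply_GT, solve_A):
    """Matrix-free Schur complement action S dl = G A^{-1} G^T dl.
    solve_A is the dynamics solve x -> A^{-1} x, exactly the expensive
    piece a learned model could replace or approximate."""
    return apply_G(solve_A(apply_GT(delta_lam)))
```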

Possible uses

  • Learned approximation of \(A^{-1}\): The model learns to approximate the dynamics inverse, enabling Schur complement computation without solving the full dynamics system
  • Learned mapping from constraint residuals to multiplier updates: Directly learn \(S^{-1}: r_c \mapsto \delta\lambda\), bypassing the need to compute \(G A^{-1} G^\top\)
  • Hybrid physics/learning Schur solvers: Combine physics-based approximations (e.g., diagonal approximations) with learned corrections

This is particularly attractive when the number of constraints \(n_c\) is moderate: large enough that classical Schur solves are costly, yet small enough that the learned map remains tractable.


6) Reduced-order latent solvers

Role

Operate the solver in a latent space:

\[ u \;\xrightarrow{\text{enc}}\; z \;\xrightarrow{\text{solve}}\; z_{n+1} \;\xrightarrow{\text{dec}}\; u_{n+1}. \]

Integration

  • the latent solver may be implicit or explicit
  • constraints can be enforced in latent or physical space
  • decoding reintroduces the full field

This trades physical fidelity for speed and is best suited for exploration or control.
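One encode–solve–decode step can be sketched with a linear (POD-like) basis as a stand-in for a learned encoder/decoder; `advance_z` is the hypothetical latent solver:

```python
import numpy as np

def latent_solver_step(u, V, advance_z):
    """One reduced-order step: encode the full field with a basis V
    (orthonormal columns), advance in latent space, decode back.
    A nonlinear autoencoder would replace V.T @ u and V @ z."""
    z = V.T @ u              # encode: u -> z
    z_next = advance_z(z)    # latent solve (implicit or explicit)
    return V @ z_next        # decode: z -> u_{n+1}
```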


7) Inverse problems and data assimilation

Role

Use AI around the solver to:

  • estimate material parameters
  • infer boundary conditions
  • assimilate measurements

Because the solver is differentiable, gradients flow through:

\[ \theta \;\longrightarrow\; u(t;\theta) \;\longrightarrow\; \mathcal{L}. \]
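A minimal worked example of this gradient path, using a scalar implicit Euler step for \(\dot u = -\theta u\) and the implicit function theorem (all names illustrative; a real pipeline would use automatic differentiation through the solver):

```python
def implicit_step(u_n, theta, h):
    """Implicit Euler for du/dt = -theta*u: the root of
    F(u, theta) = u - u_n + h*theta*u, here solvable in closed form."""
    return u_n / (1.0 + h * theta)

def dloss_dtheta(u_n, theta, h):
    """Gradient of L = 0.5*u_{n+1}^2 w.r.t. theta via the implicit
    function theorem: du/dtheta = -(dF/dtheta) / (dF/du)."""
    u = implicit_step(u_n, theta, h)
    du_dtheta = -(h * u) / (1.0 + h * theta)
    return u * du_dtheta
```

The same chain rule, applied operator by operator, is what a differentiable solver automates for full FEM states and constraint multipliers.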

AI can act as:

  • a prior
  • a regularizer
  • an amortized inverse model

Summary

The most natural integration points are:

  • Learned constitutive fields: Drop-in replacement at the field level, fully compatible with automatic differentiation
  • Warm starts: Lowest-risk entry point, provides initial guess without affecting solver structure
  • Preconditioning: High-impact but delicate, can dramatically accelerate Krylov convergence
  • Solver corrections: Pseudo-Newton methods that learn to predict update directions

Each integration point preserves the physical structure of the solver while enabling learning-based acceleration or enrichment. The matrix-free, operator-based design of SOFAx v2 makes these integrations seamless.


See also