Concepts
THRML does block Gibbs sampling of graphical models at scale. From a user's perspective there are three things to work with: blocks, factors, and programs. This page is the mental model behind them.
Blocks
Blocks are fundamental to THRML because it implements block sampling. A Block is a collection of nodes of the same type with an implicit ordering. Colouring the graph so that no two neighbouring nodes share a block makes the nodes within a block conditionally independent given the other blocks, so every node in a block can be resampled at once. That is the parallelism the hardware exploits.
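To make the colouring idea concrete, here is a minimal sketch in plain Python, independent of THRML's API. It assumes a 4-connected grid and splits the nodes into two blocks by checkerboard parity; the helper names (`grid_neighbours`, `checkerboard_blocks`, `is_valid_blocking`) are illustrative and not part of the library.

```python
from itertools import product


def grid_neighbours(r, c, rows, cols):
    """Yield the 4-connected neighbours of cell (r, c) on a rows x cols grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def checkerboard_blocks(rows, cols):
    """Split grid nodes into two blocks by the parity of r + c."""
    blocks = ([], [])
    for r, c in product(range(rows), range(cols)):
        blocks[(r + c) % 2].append((r, c))
    return blocks


def is_valid_blocking(blocks, rows, cols):
    """Return True if no node has a neighbour inside its own block."""
    for block in blocks:
        members = set(block)
        for r, c in block:
            if any(n in members for n in grid_neighbours(r, c, rows, cols)):
                return False
    return True


even, odd = checkerboard_blocks(3, 4)
print(len(even), len(odd), is_valid_blocking((even, odd), 3, 4))  # 6 6 True
```

Putting every node into a single block fails the same check, which is why a grid needs at least two colours before its nodes can be updated in parallel.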
Factors and interaction groups
Factors and their conditionals are the backbone of sampling. AbstractFactors take their name from factor graphs and organize interactions between variables into a bipartite graph of factors and variables. A factor synthesizes its interactions into InteractionGroups through a to_interaction_groups() method, which is the array-friendly form the sampler consumes.
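The sketch below illustrates the general idea of flattening a factor into an array-friendly form. It is a toy in plain Python, not THRML's implementation: `ToyPairFactor`, `ToyInteractionGroup`, and `local_fields` are hypothetical names, and only the method name `to_interaction_groups` is borrowed from the text above. It assumes pairwise factors, where each undirected edge contributes to the conditional of both of its endpoints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyInteractionGroup:
    """Hypothetical flat record: head_nodes[k] is updated using tail_nodes[k]."""
    head_nodes: tuple
    tail_nodes: tuple
    weights: tuple


@dataclass(frozen=True)
class ToyPairFactor:
    """Hypothetical pairwise factor: edges[k] couples two nodes with weights[k]."""
    edges: tuple    # ((i, j), ...)
    weights: tuple  # (w_ij, ...)

    def to_interaction_groups(self):
        # An undirected edge (i, j) influences both conditionals, so it is
        # emitted twice: once with i as the head and once with j as the head.
        first = tuple(i for i, _ in self.edges)
        second = tuple(j for _, j in self.edges)
        return [
            ToyInteractionGroup(first, second, self.weights),
            ToyInteractionGroup(second, first, self.weights),
        ]


def local_fields(groups, state, n_nodes):
    """Accumulate sum_j w_ij * s_j for every node from the flat groups."""
    field = [0.0] * n_nodes
    for g in groups:
        for head, tail, w in zip(g.head_nodes, g.tail_nodes, g.weights):
            field[head] += w * state[tail]
    return field


factor = ToyPairFactor(edges=((0, 1), (1, 2)), weights=(0.5, -1.0))
groups = factor.to_interaction_groups()
print(local_fields(groups, state=[1, -1, 1], n_nodes=3))  # [-0.5, -0.5, 1.0]
```

The point of the flat form is that a sampler only needs index arrays and weights to compute each node's conditional, without walking the factor graph.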
Programs
Programs are the orchestrating data structures. BlockSamplingProgram handles the mapping and bookkeeping for padded block Gibbs sampling, managing the global state representation efficiently for JAX. FactorSamplingProgram is a convenient wrapper that converts factors to interaction groups. A program coordinates free and clamped blocks, samplers, and interactions to actually run the algorithm.
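As a rough picture of what a program coordinates, the following is a toy block Gibbs loop in plain Python. It does not use THRML's classes: `block_gibbs` and its arguments are hypothetical, and it assumes spins in $\{-1, 1\}$ with $P(s) \propto \exp(\sum_{(i,j)} w_{ij} s_i s_j)$. Free blocks are resampled in turn and clamped nodes keep their fixed values.

```python
import math
import random


def block_gibbs(edges, free_blocks, clamped, n_sweeps, seed=0):
    """Toy block Gibbs for spins with P(s) proportional to exp(sum w * s_i * s_j).

    edges: list of (i, j, w); free_blocks: list of node lists;
    clamped: dict mapping node -> fixed spin value.
    """
    rng = random.Random(seed)
    n_nodes = 1 + max(max(i, j) for i, j, _ in edges)
    state = [rng.choice((-1, 1)) for _ in range(n_nodes)]
    for node, value in clamped.items():
        state[node] = value

    # Adjacency list: for each node, its neighbours and coupling weights.
    neighbours = {n: [] for n in range(n_nodes)}
    for i, j, w in edges:
        neighbours[i].append((j, w))
        neighbours[j].append((i, w))

    samples = []
    for _ in range(n_sweeps):
        for block in free_blocks:
            # Compute every conditional from the current state first, then
            # write the new values: this mimics a simultaneous block update.
            probs_up = []
            for node in block:
                field = sum(w * state[j] for j, w in neighbours[node])
                probs_up.append(1.0 / (1.0 + math.exp(-2.0 * field)))
            for node, p in zip(block, probs_up):
                state[node] = 1 if rng.random() < p else -1
        samples.append(tuple(state))
    return samples


# A 4-node chain: nodes 0 and 2 form one block, node 1 another, node 3 is clamped.
chain = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
samples = block_gibbs(chain, free_blocks=[[0, 2], [1]], clamped={3: 1}, n_sweeps=200)
print(len(samples), all(s[3] == 1 for s in samples))  # 200 True
```

The bookkeeping here is trivial because the state is a single flat list; the padded, pytree-aware version of the same loop is what the program classes manage.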
The global state
From a developer's perspective, the core approach is to represent as much as possible as contiguous arrays and pytrees, operate on those structures, then map to and from them for the user. Internally this representation is called the global state, in contrast to the block state that users work with. It is the same data-oriented idea as a struct-of-arrays layout, and it is similar to other JAX graphical model packages such as PGMax. The difference is that THRML supports pytree and heterogeneous states: nodes are split by their pytree type, and the global state is a list of those pytrees, stacked where several blocks share a type.
Since JAX does not support ragged arrays, every block must be the same size in its array leaves. THRML solves this by stacking blocks of the same pytree type and padding them out as needed. There is a tradeoff between padding, which adds some runtime overhead, and looping over blocks, which would pay a likely untenable compile-time cost instead. Everything else in THRML exists to make building and running a program convenient; the focused core is block index management and padding, which keeps the codebase lightweight and hackable at around 1,000 lines.
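The stacking-and-padding step can be sketched as follows. This is plain Python on lists rather than JAX arrays, and `stack_with_padding` and `unstack` are illustrative helpers, not THRML functions. A validity mask records which slots hold real nodes so that padded entries can be ignored or dropped later.

```python
def stack_with_padding(blocks, pad_value=0):
    """Stack ragged blocks into a rectangular list-of-lists plus a validity mask."""
    width = max(len(b) for b in blocks)
    stacked = [list(b) + [pad_value] * (width - len(b)) for b in blocks]
    mask = [[True] * len(b) + [False] * (width - len(b)) for b in blocks]
    return stacked, mask


def unstack(stacked, mask):
    """Recover the original ragged blocks by dropping padded slots."""
    return [[v for v, keep in zip(row, m) if keep] for row, m in zip(stacked, mask)]


blocks = [[1, -1, 1], [1, -1], [-1]]
stacked, mask = stack_with_padding(blocks)
print(stacked)  # [[1, -1, 1], [1, -1, 0], [-1, 0, 0]]

# Fraction of slots that are padding: the runtime overhead mentioned above.
total_slots = sum(len(row) for row in stacked)
padded_slots = sum(not keep for row in mask for keep in row)
print(padded_slots, total_slots)  # 3 9
```

In this example a third of the stacked slots are padding, which shows how the overhead grows when block sizes differ widely.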

Limitations
THRML is fast and efficient, but sampling itself is a genuinely hard problem. Drawing samples from a distribution in a high-dimensional space can take prohibitively many steps even when proposals are parallelized. THRML also focuses on Gibbs sampling, since that is what Extropic's hardware accelerates. For general problems it is not known when Gibbs is substantially faster or slower than other MCMC methods, so some problems will call for other tools. As a small example, consider a two-node Ising model with a single edge at $J = -\infty$ and $h = 0$. Its ground states are $(-1,-1)$ and $(1,1)$, and Gibbs sampling never mixes between them: once the chain reaches one ground state, a single-spin update would have to pass through an infinitely unlikely misaligned state, so no spin ever flips. A Metropolis-Hastings sampler with a uniform proposal would converge quickly.
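The two-node example can be checked numerically with a large finite coupling standing in for $J = -\infty$. The sketch below makes two assumptions that the text does not spell out: the energy convention $E = J s_1 s_2$ (so that negative $J$ favours aligned spins, matching the stated ground states) and unit temperature. It also reads "uniform Metropolis-Hastings move" as a proposal drawn uniformly over joint states.

```python
import math


def energy(s1, s2, J):
    """Assumed convention: E = J * s1 * s2, so negative J favours aligned spins."""
    return J * s1 * s2


def gibbs_flip_probability(J):
    """Probability that a single-site Gibbs update moves s1 from +1 to -1 while s2 = +1."""
    e_flip = energy(-1, 1, J)
    e_stay = energy(1, 1, J)
    # P(s1 = -1 | s2 = +1) = exp(-e_flip) / (exp(-e_flip) + exp(-e_stay))
    return 1.0 / (1.0 + math.exp(e_flip - e_stay))


def mh_joint_acceptance(J):
    """Metropolis acceptance probability for the joint move (+1, +1) -> (-1, -1)."""
    delta = energy(-1, -1, J) - energy(1, 1, J)
    return min(1.0, math.exp(-delta))


for J in (-1.0, -5.0, -20.0):
    print(J, gibbs_flip_probability(J), mh_joint_acceptance(J))
```

Under these assumptions the Gibbs flip probability is $1 / (1 + e^{-2J})$, which vanishes as $J \to -\infty$, while the joint move between the two ground states has zero energy difference and is always accepted once proposed.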
Factor and sampler hierarchies
THRML ships two parallel hierarchies, one for factors that define energy and one for the conditionals that sample them:
Factors
AbstractFactor
    WeightedFactor: parameterized by weights
    EBMFactor: defines energy functions for energy-based models
        DiscreteEBMFactor: EBMs with discrete states (spin and categorical)
            SquareDiscreteEBMFactor: optimized for square interaction tensors
                SpinEBMFactor: spin-only interactions ($\{-1, 1\}$ variables)
                SquareCategoricalEBMFactor: square categorical interactions
            CategoricalEBMFactor: categorical-only interactions
Samplers
AbstractConditionalSampler
    AbstractParametricConditionalSampler
        BernoulliConditional: spin-valued Bernoulli sampling
            SpinGibbsConditional: Gibbs updates for spin variables in EBMs
        SoftmaxConditional: categorical softmax sampling
            CategoricalGibbsConditional: Gibbs updates for categorical variables in EBMs