Introduction

Covering ML-DSA, or acting as an introduction to lattice signatures is outside the scope of this blog post. If you need to get up to speed, I highly recommend Lyubashevsky Basic Lattice Cryptography. In this blog post, we see how rejection sampling plays with lattice parameters and signature sizes, and we explore a recent paper at Crypto 2025 that introduces techniques for “incrementally rejection sampling”, and how they intuitively and mathematically pan out. The result is a lattice signature with roughly half the size of ML-DSA; we might have standardized ML-DSA a bit too soon…

Rejection Sampling in ML-DSA

Throughout, $\Rqf := \ZZ_\q[X]/(X^\d + 1)$, where for ML-DSA, $\q = 2^{23} - 2^{13} + 1$ and $\d = 256$. ML-DSA is a “sigma protocol with abort”, i.e. the prover sometimes aborts instead of finishing the execution; this paradigm originates in Lyubashevsky’s Fiat-Shamir with Aborts from Asiacrypt 2009. The underlying sigma protocol looks like:

Private: \(\vs_1 \in [\beta]^m,\ \vs_2 \in [\beta]^n\)
Public: \(\vA \in \Rqf^{\,n\times m},\ \vt = \vA\vs_1 + \vs_2 \in \Rqf^{\,n}\)
Prover
Verifier
\(\vy_1 \leftarrow [\gamma + \bbeta]^m\)
\(\vy_2 \leftarrow [\gamma + \bbeta]^n\)
\(\vw := \vA\vy_1 + \vy_2\)
\(\vw\)
\(c \leftarrow \Chal\)
\(c\)
\(\vz_1 := c\vs_1 + \vy_1\)
\(\vz_2 := c\vs_2 + \vy_2\)
if \(\vz_1 \notin [\bbeta]^m\) or \(\vz_2 \notin [\bbeta]^n\)
then \((\vz_1, \vz_2) := \bot\)
\((\vz_1, \vz_2)\)
Accept iff \(\vz_1 \in [\bbeta]^m\) and \(\vz_2 \in [\bbeta]^n\)
and \(\vA\vz_1 + \vz_2 - c\vt = \vw\)

Let us start by reminding ourselves, why does the prover abort if $\vz_1 \not\in [\bbeta]^m$ or $\vz_2 \not\in [\bbeta]^n$? The answer is easy: if we did not, then for a fixed $c$ and $\vs$, the distrbution of $c \vs + \vy$, where $\vy$ is uniformly distributed in $[\gamma + \bar{\beta}]$, would be centered around $c \vs$. For a fixed $c$ we could recover $c \vs$ by averaging, which in turn leaks $\vs$…

Of course, $c$ is not fixed, but even then, you clearly have issues: inuitively, a large value of $c \vs + \vy$ would be more unlikely if $\vs$ was small and correspondingly more likely if $\vs$ was large. For each $\vs$ there is a range that $c \vs + \vy$ can take, and these ranges are not identical, hence some $c \vs + \vy$ will rule out some candidates for $\vs$. This is illustrated by the following figure:

Without Rejection: Some $\vz$ Leak Information About $\vs$

Range of $z$

The block $z=\chl\cdot s+y$ is centered at $\chl\cdot s$; its span depends on the secret

$\chl\cdot s=+\gam$
$\chl\cdot s=0$
$\chl\cdot s=-\gam$
only $\chl\cdot s=+\gam$ lands here

Each $\chl\cdot s$ shifts the block; the marked $z$ lies in only the $\chl\cdot s=+\gam$ range, so observing it rules out the other secrets.

However, there is a common overlap: on the interval $\left[-\bar{\beta}, \bar{\beta}\right]$ all the distributions have the same probability mass, regardless of $c \cdot \vs$: for any $\vz \in [\bbeta]$ within the internal, any challenge $\chl \in \Chal$ and any two secrets $\vs^{(1)} \in [\beta]$ and $\vs^{(2)} \in [\beta]$: $$ \Pr\left[\vz = \chl \cdot \vs^{(1)} + \vy \right] = \Pr\left[\vz = \chl \cdot \vs^{(2)} + \vy\right] $$ In other words, observing a value within this window reveals nothing about the secrets. So to fix the leakage, we simply reject all samples that fall outside this window:

With Rejection: $\vz$ is Independent of $\vs$

Kept Rejected Window

Every $\chl\cdot s$ is clamped to $[-\bbeta,\,\bbeta]$; the accepted $z$ is the same for every secret

$-\bbeta$
$\bbeta$
$\chl\cdot s=+\gam$
$\chl\cdot s=0$
$\chl\cdot s=-\gam$

The block always covers $[-\bbeta,\,\bbeta]$; reject whatever lands outside and the kept block is identical for every $\chl\cdot s$, hence the accepted $\vz$ is independent of the secret.

The result is a distribution that is independent of $\vs$. ML-DSA employs a particularly simply form of rejection sampling, but this hints at the general idea: we have some secret-dependent distribution of $\vz$, by rejecting certain values of $\vz$ with some probability, we shift the probability mass of the distribution into a secret-independent one. In the ML-DSA case, we always reject values outside of $[\bbeta]$, however, as we shall see, for e.g. Gaussian distributions (as opposed to uniform), we want to reject $\vz$ with a value that depends on “how far” $\vz$ is from the mean: which means we don’t always reject large values. It goes without saying, but obviously rejection sampling can only “remove” values from the distribution of $\vz$, so we want a distribution of $\vz$ that we can more efficiently “sculpt” into some secret-independent distribution. We will get a lot more intuition for this later.

“The sculptor produces the beautiful statue by chipping away such parts of the marble block as are not needed — it is a process of elimination.” — Elbert Hubbard

Rejection Sampling Trade-Off

First, we want to get an understanding for how rejection sampling affects performance and security, beyond just making sure that the distribution of $\vz$ is independent of $\vs$. So what happens to ML-DSA if we make $\bbeta$ smaller? If we make $\bbeta$ smaller, then we are rejecting more potential signatures, which means that signing becomes slower. However it makes signatures smaller: clearly $\vz$ is now smaller, so it requires fewer bits to encode, however the bigger win is that the verifier checks:

$$ A \cdot \vz_1 + \vz_2 = \chl \cdot \vt + \vw $$

For this to not be trivially forgeable, we need that Ring-SIS is hard for $[A | I] \cdot \vz$ with norm bound $\bbeta$. The hardness of Ring-SIS is related to the norm bound $\bbeta$, as well as the size of the ring $\Rqf$ and the dimensions $n, m$:

  • Smaller $\bbeta$ $\implies$ Ring-SIS is harder to solve.
  • Smaller $\Rqf, n, m$ $\implies$ Ring-SIS is easier to solve.

Therefore, as we make norm bound on the solution $\bbeta$ smaller, Ring-SIS becomes harder, which in turn allows the parameters of $\Rqf$, $n, m$ to be smaller for the same security level. This also means that we additionally want our “secret independent distribution” to be as small as possible: so that Ring-SIS for the sigma response message is as hard as possible.

128-bit 192-bit 256-bit ML-DSA
512102415362048256-bit192-bit128-bitML-DSA-44ML-DSA-65ML-DSA-87
$2^{10}$
$2^{11}$
$2^{12}$
$2^{13}$
$2^{14}$
$2^{15}$
$2^{16}$
$2^{17}$
$2^{18}$
$2^{19}$
$2^{20}$
$2^{21}$
Norm Bound  $\bbeta=\lVert\vz\rVert_\infty$
Lattice Dimension  $n=k\cdot 256$

The ring is fixed ($q=2^{23}-2^{13}+1$, $d=256$); only the module rank $k$ grows, enlarging the Module-SIS lattice dimension $n=k\cdot256$. Each curve is the Pareto front at a fixed security level: on it you cannot admit a larger norm $\bbeta$ without raising $n$, nor lower $n$ without tightening $\bbeta$. Core-SVP classical hardness, computed with the lattice estimator (commit 27a581b).

Pure BLISS & Better Marble Blocks

Obviously, the input distribution of $\vz$ does not need to be a shifted uniform distribution, we can pick whatever distribution it is easiest to “chisel” into the target distribution. Furthermore, the target distribution of $\vz$ also does not need to have the same “form” as the input distribution: in the case of ML-DSA, both $\vz$ and $\chl \cdot \vs + \vy$ are uniformly distributed, but in principle they could have different distributions. All of this makes sense if we can pick $\vz$ to be a “short” distribution, while ensuring that the distribution of $\chl \cdot \vs + \vy$ has enough probability mass in the places that the distribution of $\vz$ requires it to: it is easier to chisel a block which already has a lot of marble in the right places into a statue.

Bimodal Gaussians

This is the central idea behind the “BLISS” signature scheme introduced in the Lattice Signatures and Bimodal Gaussians from Crypto 2013: the original work of Vadim Lyubashevsky masked the response $\vz$ using a $\vy$ sampled from a (discrete) Gaussian distribution, this means that the pre-rejection distroibution is a Gaussian centered at $\chl \cdot \vs$. The issue is that this distribution is not at all symmetric around $0$, which means that if $\chl \cdot \vs$ is large, then the probability of getting a negative value is very very small because the tails of a Gaussian diminish very very quickly:

before rejection target
$0$

Here the arrows (and their numbers) indicate how much the probability of a sample at that point needs to be increased/decreased to get back to the centered Gaussian from whatever shifted Gaussian $\chl \cdot \vs + \vy$ has. Long red arrows (with small fractions) indicates that samples in that region needs to be rejected very often. You can play around with the illustration by dragging the midpoint $\chl \cdot \vs$. The innovation of BLISS is deceptively simple, instead of always outputting $\vy + \chl \cdot \vs$ instead extends to having a $\vz$ that is either:

$$ \vz = \begin{cases} \vy + \chl \cdot \vs & \text{With Probability } 1/2 \\ \vy - \chl \cdot \vs & \text{With Probability } 1/2 \end{cases} $$

Now this distribution is symmetric around $0$: it is a bimodal Gaussian distribution with centers at $\chl \cdot \vs$ and $-\chl \cdot \vs$. The result is that you need to reject a lot less to “boost” the probability on the other side of $0$ to get back at a centered Gaussian. Compared to the unimodal Gaussian, the rejection probability for the example is roughly ten times smaller:

before rejection target
$0$

If you flip this around, and “fix” the amount of rejection sampling that you are willing to do, the result of using the bimodal distribution is that you can set the distribution of the target Gaussian to have a much smaller standard deviation, meaning SIS is harder, meaning you can have smaller lattice dimensions, meaning you can have smaller signature sizes. You know the story.

Less Pictures, More Math

Now that you have seen the pretty pictures, let’s briefly make this a bit more formal. In the case of ML-DSA (the uniform distribution), it was easy, you always rejected outside of $[\bbeta]$ and always accepted inside of it. The general case is only slight more involved. We have a source distribution with probability density function $\src$, namely what $\chl \cdot \vs + \vy$ actually produces, and a target distribution with some probability density function $\tgt$. The two figures above plot exactly this pair: the shaded “before rejection” curve is $\src$, the dotted “target” curve is $\tgt$. The ratio $\tgt(\vz)/\src(\vz)$ is precisely the correction factor drawn as arrows in the figures: where the source has a deficit of mass ($\tgt/\src > 1$) we keep almost everything, where it has an excess ($\tgt/\src < 1$) we throw most of it away. We are looking for an acceptance probability $\p(\vz) \in [0, 1]$ — the probability with which we keep a candidate $\vz$, discarding it and resampling with probability $1 - \p(\vz)$ — such that $\tgt(\vz) = \p(\vz) \cdot \src(\vz) \cdot \rep$, meaning we accept a sample $\vz$ with probability:

$$ \p(\vz) = \frac{\tgt(\vz)}{\src(\vz)} \cdot \frac{1}{\rep} $$

Here $\rep \geq 1$ is a constant large enough that $\p(\vz) \leq 1$ everywhere: rejection can only remove probability mass, never add it and the job of $\rep$ is to “scale up” the whole $\src$ distribution by $\rep$, so that even the most starved point asks for an acceptance probability $\leq 1$, i.e. we avoid $\p(\vz) > 1$, which is nonsensical. This “worst point” is what fixes $\rep$:

Now, let’s look at some examples:

Uniform (ML-DSA). The simple $\vz \in [\bbeta]$ acceptance criteria in ML-DSA can be rephrased in this formalism: the target distribution $\tgt$ is the centered uniform distribution over the small window $[\bbeta]$, while the source $\src$ is the shifted uniform distribution that $\chl \cdot \vs + \vy$ actually produces, i.e. uniform over the larger window $\chl \cdot \vs + [\gamma + \bbeta]$. The centered window always sits inside the shifted one (this is exactly the common overlap from the start of the post), and on that window both densities are constant, so the ratio takes only two values:

$$ \frac{\tgt(\vz)}{\src(\vz)} = \begin{cases} \rep & \tgt(\vz) > 0 \text{ i.e. } \vz \in [\bbeta] \\ 0 & \tgt(\vz) = 0 \text{ i.e. } \vz \notin [\bbeta] \end{cases} $$

Both distributions $\tgt$, $\src$ are uniform, so each density is just the reciprocal of the size of its support. A response here has $\d(m+n)$ uniform integer coordinates, per coordinate the target window $[\bbeta]$ offers $2\bbeta + 1$ values while the source window $[\gamma + \bbeta]$ offers $2(\gamma + \bbeta) + 1$, so on their respective supports the densities are:

$$ \tgt(\vz) = \frac{1}{\left(2\bbeta+1\right)^{\d(m+n)}} \qquad \src(\vz) = \frac{1}{\left(2(\gamma+\bbeta)+1\right)^{\d(m+n)}} $$

Dividing, the ratio on the overlap is the constant:

$$ \rep = \frac{\tgt(\vz)}{\src(\vz)} = \left(\frac{2(\gamma+\bbeta)+1}{2\bbeta+1}\right)^{\d(m+n)} $$

This is the smallest $\rep$ that makes $\p(\vz) = 1/\rep \cdot \tgt(\vz) / \src(\vz) \leq 1$ everywhere: since $\rep = \tgt(\vz) / \src(\vz)$, we have $\p(\vz) = 1$ inside $[\bbeta]$ and $\p(\vz) = 0$ outside, which is the always-accept-inside, always-reject-outside rule we started with.

Unimodal Gaussian. The source is a single Gaussian centered at $\vv$, so $\src(\vz) \propto \rho_\sigma(\vz - \vv)$ and the correction factor is:

$$ \frac{\tgt(\vz)}{\src(\vz)} = \frac{\rho_\sigma(\vz)}{\rho_\sigma(\vz - \vv)} = \exp\left(\frac{\|\vv\|^2 - 2\langle \vz, \vv\rangle}{2\sigma^2}\right) $$

Hence the acceptance probability is:

$$ \p(\vz) = \frac{1}{\rep}\exp\left(\frac{\|\vv\|^2 - 2\langle \vz, \vv\rangle}{2\sigma^2}\right) $$

The trouble is the linear term $-2\langle \vz, \vv\rangle$: it is unbounded below, so as $\langle \vz, \vv\rangle \to -\infty$ the acceptance blows past $1$ and no finite $\rep$ keeps $\p \leq 1$ for every $\vz$. We settle for $\p \leq 1$ except with negligible probability over $\vz$ drawn from the target, cutting the tail at $\langle \vz, \vv\rangle \geq -\tau\sigma\|\vv\|$, which forces:

$$ \rep = \exp\left(\frac{\|\vv\|^2}{2\sigma^2} + \frac{\tau\|\vv\|}{\sigma}\right) $$

This is the $\rep$ from the unimodal figure, where $\tau = 1.5$: the unavoidable quadratic cost $\|\vv\|^2 / 2\sigma^2$, plus a linear penalty for the tail we had to discard.

Bimodal Gaussian. Now the source is the symmetric mixture $\src(\vz) \propto \frac{1}{2}\rho_\sigma(\vz - \vv) + \frac{1}{2}\rho_\sigma(\vz + \vv)$.

Each shifted Gaussian expands as:

$$ \rho_\sigma(\vz \mp \vv) = \rho_\sigma(\vz) \rho_\sigma(\vv) \exp\left(\pm\frac{\langle \vz, \vv\rangle}{\sigma^2}\right) $$

Pulling out the common factor $\rho_\sigma(\vz) \rho_\sigma(\vv)$:

$$ \src(\vz) \propto \rho_\sigma(\vz)\rho_\sigma(\vv)\cosh\left(\frac{\langle \vz, \vv\rangle}{\sigma^2}\right) $$

So the correction factor is:

$$ \frac{\tgt(\vz)}{\src(\vz)} = \frac{\exp\left(\|\vv\|^2 / 2\sigma^2\right)}{\cosh\left(\langle \vz, \vv\rangle / \sigma^2\right)} $$

The average over the two signs replaced the runaway $\exp(-2\langle \vz, \vv\rangle / 2\sigma^2)$ with $\frac{1}{2}(e^x + e^{-x}) = \cosh(x)$, which is $\geq 1$ everywhere. Hence we may take:

$$ \rep = \exp\left(\frac{\|\vv\|^2}{2\sigma^2}\right) $$

This gives $\p(\vz) = 1/\cosh(\langle \vz, \vv\rangle / \sigma^2) \leq 1$ for every $\vz$, with no tail to cut and no leftover linear term. This $\rep$ is strictly smaller than the unimodal one: the same quadratic cost, minus the linear penalty. That gap is exactly the order-of-magnitude improvement we saw in the pictures, and it is the whole point of sampling $\vz$ bimodal.

Fixing Verification

I glossed over one thing, for the sake of exposition: since the prover may produce signatures using both $\vy + \chl \cdot \vs$ and $\vy - \chl \cdot \vs$, we need to make sure that the verifier accepts either of these; otherwise half the signatures will be broken. To do this, we work modulo $2\q$ and pick the secret/public key $(\vs, \vA)$ such that:

$$ \vA\vs \equiv \q \pmod{2\q} $$

This way the verification equations become:

$$ \begin{align*} \vA \left( \vy + \chl \cdot \vs \right) &\equiv \vA \vy + \chl \cdot \vA \vs \equiv \vA \vy + \chl \q \pmod{2\q} \\ \vA \left( \vy - \chl \cdot \vs \right) &\equiv \vA \vy - \chl \cdot \vA \vs \equiv \vA \vy - \chl \q \pmod{2\q} \end{align*} $$

However $\q = -\q \pmod{2\q}$ so the two expressions are the same: a verifer checking $\vA \left( \vy + \chl \cdot \vs \right) = \vA \vz$ accepts the prover sending both $\vy + \chl \cdot \vs$ and $\vy - \chl \cdot \vs$. Changing the key to be like this, instead of MLWE samples $\vt$, may seem like a big change, but it is mostly notational, in particular it can be constructed by sampling the key as usual:

  1. Sample $\vA_0$ uniformly modulo $\q$.
  2. Sample short $\vs_0, \ve$ modulo $\q$.
  3. Publish $\vt = \vA_0\vs_0 + \ve \bmod \q$.

And “assembling” the keys $\vs$ and $\vA$ over $\ZZ_{2\q}$ as:

$$ \vA = \left[\, \q - 2\vt \mid 2\vA_0 \mid 2I \,\right], \qquad \vs = (1, \vs_0, \ve) $$

To see why this key “works”, simply observe that:

$$ \vA\vs = \q - 2\vt + 2\vA_0\vs_0 + 2\ve \equiv \q \pmod{2\q} $$

Where the middle terms drop because $\vt \equiv \vA_0\vs_0 + \ve \pmod{\q}$, so $-2\vt + 2(\vA_0\vs_0 + \ve)$ is a multiple of $2\q$.

Finally, I have another confession to make, the version above is not the version in BLISS and the Crypto 2025 paper: they rely on (a generalization of) NTRU. I will explain this at the end, because I want to keep this complexity seperate from the central idea of improved rejection sampling, which is the primary novel contributions of these works.

proposal accepted output window \(\pm\bbeta\) (verifier check)
Uniform (ML-DSA)Unimodal (Lyubashevsky)Bimodal (BLISS)

Uniform accept

secret-free, fills the window

Unimodal accept

leaks the secret

Bimodal accept

secret-free, centred

All three accept exactly when \(\lVert\vz\rVert_\infty\le\bbeta\) (the same window, dotted) and mask the same secret \(\Sec\), so the forgery problem — short vector in that box — is the same MSIS instance: equally hard by construction. Move \(\Sec\): only the signer’s acceptance cost and leakage move. The Gaussian std is fixed at \(r=\bbeta/\tau\), \(\tau\approx3.4\), so the window passes \(\sim\!50\%\) jointly (a second mild reject the verifier also checks).

Same hardness
\[ \text{accept}\iff\lVert\vz\rVert_\infty\le\bbeta,\qquad \text{forge}\iff \vz\in\ker[\,\vA\mid I\,],\ \lVert\vz\rVert_\infty\le\bbeta \]The forger is bound only by the verifier’s box, never by the honest distribution. Same \(\bbeta\) (and ring, \(n,m\)) \(\Rightarrow\) identical MSIS \(\Rightarrow\) equal hardness, whatever shape fills the box. The lattice estimator confirms the bit-security depends on \(\bbeta\) alone.
Uniform output
\[ \vz=\Sec+\vy,\quad \vy\sim\mathcal U\big[-(\bbeta+\Sec),\ \bbeta+\Sec\big],\quad \text{accept iff}\ \ \vz\in[-\bbeta,\bbeta] \]The flat box, shifted by the secret, still covers the centred window, so the accepted \(\vz\) is exactly uniform there — secret hidden by coverage. It pays by filling the whole window, and acceptance \(=\big(\tfrac{\bbeta}{\bbeta+\lVert\Sec\rVert_\infty}\big)^{\dim}\) degrades gently.
Unimodal output
\[ \Pr[\text{accept}\mid\vz]=\min\!\left(1,\ e^{-\langle\vz,\Sec\rangle/r^{2}}\right) \]The cap starves only the side opposite \(\Sec\), so the output stays shifted toward \(\Sec\): it leaks. The surviving tail \(Q(\lVert\Sec\rVert/r)\) also drags acceptance below the bimodal rate at the same secret.
Bimodal output
\[ \Pr[\text{accept}\mid\vz]=\min\!\left(1,\ \frac{1}{\cosh\!\big(\langle\vz,\Sec\rangle/r^{2}\big)}\right) \]\(\cosh\) cancels the secret on both sides, so the output is exactly the centred \(\mathcal D_{r}\) — secret-free and concentrated well inside the window. Acceptance \(=e^{-\lVert\Sec\rVert^{2}/2r^{2}}\), the best of the three, then \(\times\,{\sim}50\%\) for the window cap.
Reading the cost. The numbers are per-coordinate. The window cap (\(\sim\!50\%\)) is a joint effect over all \(\dim\) coordinates; per coordinate it is nearly invisible, since \(\bbeta\approx3.4\,r\) sits far out in the tail. Real signing rates are the product across coordinates, but the relative ordering — bimodal cheapest and clean, unimodal cheaper-looking but leaking, uniform robust but window-filling — is what the panels show.

Iterative Bimodal Rejection Sampling

The observation central in the new paper is this:

$$ c \cdot \vs $$

Is a product of polynomials. Recall that $c \in \{0,1\} \subseteq R_q$. It is the sum of:

$$ \sum_{x_j \text{ has a non-zero coeff in } c} x^j \cdot \vs $$

So its norm is roughly the sum of $\eta \cdot |\vt|$: each $x_j \cdot \vs$ is a “cylic rotation” of the polynomial(s) $\vt$. This makes the norm larger, by a factor $\eta$. What if we could rejection sample a bit smarter than “all-or-nothing” on the final sum?

In BLISS (G+G), we flipped a coin to pick either $\vy + c \vs$ or $\vy - c \vs$, what if:

  • We flipped a coin at every step of the sum above?
  • What if it was a bent coin? Depending on the current running sum?

Let’s denote by $\vz_i$ the running sum after $i$ steps.

Then the goal:

Ensure that $\vz_i$ is distributed as a secret-independent central Gaussian for every $i$

We are going to do this by induction:

  • $\vz_{i-1}$ is distributed as a secret-independent central Gaussian
  • We flip-and-reject st. $\vz_{i}$ is also distributed as a central Gaussian.

The trick is that:

  • We only need to “hide” something “smaller” $|\vt|$-sized at each step.
  • We can adaptively “flip”, depending on how much $\vz_{i-1}$ is on one-or-the other side of zero. Trying to “drive” it towards the middle.
\(\vz_{i-1}\)
\(\vz-\vv\)
\(\vz+\vv\)
\(0\)

drag \(\vz_{i-1}\) along the axis: the coin (bar below) bends toward the centre as the sum drifts out; grey is the rare rejection

Biased Random Walk

\(\vz_0=\vy\)
\(\vz\)

Same Target \(D_r\), Far Less Rejection

All at once BLISS hides the whole \(\Sec\), \(\lVert\Sec\rVert\approx\eta|\vt|\)
One rotation at a time this paper, \(\eta\) cheap steps

Per-step keep
\[ \Pr[\text{keep}\mid \vz']=\frac{1}{\cosh\!\big(\langle \vz',\vv\rangle/r^{2}\big)},\qquad \vz'=\vz_{i-1}\pm\vv \]Flip a uniform sign, land at a candidate \(\vz'\), keep it with this probability (the BLISS rule with the small piece \(\vv\), \(M=e^{|\vv|^2/2r^2}\)). If \(\vz_{i-1}\sim D_r\) then the kept \(\vz_i\sim D_r\), exactly central, for any secret. Because \(|\vv|\approx|\vt|\) is small, \(\cosh\approx 1\) and a step is almost always kept.
The bent coin
\[ \frac{\Pr[\text{toward centre}]}{\Pr[\text{away}]}=\frac{\cosh\!\big((z\!+\!v)v/r^2\big)}{\cosh\!\big((z\!-\!v)v/r^2\big)}\ \ge\ 1 \]The candidate nearer the centre has the larger \(1/\cosh\), so it is kept more often: a fair coin at \(z=0\), bent back toward the centre as \(|z_{i-1}|\) grows. That is what keeps the running sum from wandering off as the pieces accumulate.
Why it is cheaper
\[ \text{cost}_{\text{iter}}\propto \eta\,|\vt|^{2}\quad\ll\quad \big(\eta|\vt|\big)^{2}=\lVert\Sec\rVert^{2}\propto\text{cost}_{\text{all}} \]Hiding \(\eta\) small \(|\vt|\)-pieces costs a factor \(\eta\) less rejection than hiding the whole \(\Sec\) at once. So for the same rejection budget you can target a narrower \(D_r\) (shorter \(\vz\), smaller signatures) without paying more.

Of course, the sum of (centered) Gaussians is a Gaussian, but by doing this, we make it narrower, meaning everything gets smaller; without having to increase the rejection sampling…

A Simplified Signature Scheme

Keys

  • Secret: a short $\vt\in R^{m}$.
  • Public: $A\in R_{2q}^{\,1\times m}$, generated so that

$$A\vt\equiv q \pmod{2q}.$$

The image of the secret under $A$ is the fixed constant $q$, a multiple of $q$, which is precisely what the bimodal cancellation below needs. No separate per-key syndrome is published.

Built as in the previous section, this key is just an MLWE instance dressed up modulo $2q$: distinguishing it from random is MLWE, and forging $(\vz,c)$ is MSIS. The paper pushes the key smaller still with NTWE, the subject of the final section.

Signing

  1. Sample the mask $\vy\leftarrow D_{R,r}^{\,m}$.
  2. Commit: $u = A\vy\bmod 2q$.
  3. Challenge: $c = H(u,\mu)$.
  4. The rejection-sampling steps fix a sign on each nonzero coefficient, producing a ternary $c'$ with the same support as $c$ and $c'\equiv c\bmod 2$, i.e. $c$ has coefficients $\{ 0, 1 \}$ and $c' \in \{ -1, 1 \}$ is the conditional “flip”:

$$ c' = \sum_{j \in \operatorname{supp}(c)} (-1)^{a_j}\, x^{\,j} \qquad a_j \in \{0,1\}. $$

Set:

$$\vz = \vy + \vt \cdot c'$$

  1. Rejection sampling forces $\vz$ to follow the secret-independent Gaussian $D_{R,r}^{\,m}$; restart on abort or if $\lVert\vz\rVert>B$.
  2. Output the signature $(\vz,c)$.

In BLISS the sign step is a single global bit, $c'=(-1)^{b}c$; the paper generalizes it to one independent sign per coefficient.

Check

Accept $(\vz,c)$ on message $\mu$ if and only if

$$\lVert\vz\rVert \le B $$

$$c = H\!\left(A \vz-q\,c \bmod 2q,\ \mu\right)$$

Why it holds

The verification equation follows from:

\begin{align*} A\vz - qc &= A(\vy + \vt c') - qc \\ &= A\vy + A\vt c' - qc \\ &\equiv u + qc' - qc \pmod{2q} \\ &= u + q(c' - c) \\ &\equiv u \pmod{2q} \end{align*}

The last step uses $c' \equiv c \pmod{2}$, coefficient-wise.

Why modulo $2q$

The cancellation rests on

$$-q \equiv q \pmod{2q}\qquad(\text{since } -q+2q=q),$$

so both signs of a $q$-coefficient land on the same residue. Modulo $q$ alone they would both be $0$ and the relation would be vacuous; the factor of two is what hides the sign while keeping the equation meaningful. It is also why the scheme is bimodal: the two lobes $\vy+\vtc$ and $\vy-\vtc$ both map to $u+qc \bmod 2q$, and that bimodal structure is what the paper’s improved rejection sampling is built on.

Smaller Keys: NTWE

The MLWE key from Fixing Verification is flexible: you choose the shape of $\vA_0$ freely. What it is not is small. The secret $\vs = (1, \vs_0, \ve)$ drags along both $\vs_0$ and the full error $\ve$, and the response $\vz$ grows with all of it.

BLISS sits at the opposite end. Its public key is a ratio of two short polynomials:

$$ \vb = (2g + 1)\, f^{-1} \bmod \q $$

With $f, g$ short this is about as compact as a key gets, and there is no error term to ship. You pay for it twice: the parameters turn rigid (you can no longer choose the two module dimensions independently, and the NTT pins $\d$ to a power of two), and security now rests on the NTRU assumption that this ratio is indistinguishable from a uniform element, despite being stitched from small pieces.

The 2024 paper takes the middle road: the NTWE problem, “NTRU with Errors” (Gärtner, NTWE: A natural combination of NTRU and LWE, PQCrypto 2023). Keep $\vA_0$ uniform, as in MLWE, but divide the LWE sample by a short invertible $f = 2f_0 + 1$:

$$ \vb = (\vA_0\vs_0 + \ve)\, f^{-1} \bmod \q, \qquad \vs = (f, \vs_0, \ve) $$

The public key is assembled exactly as before, $\vA = [\, \q\vj - 2\vb \mid 2\vA_0 \mid 2I \,]$, and the target relation survives the inversion:

$$ \vA\vs = (\q\vj - 2\vb)f + 2\vA_0\vs_0 + 2\ve \equiv \q\vj f \equiv \q\vj \pmod{2\q} $$

Where the cross terms cancel because $\vb f \equiv \vA_0\vs_0 + \ve \pmod{\q}$, and the last step uses that $f$ is odd. The short $f$ stands in for a whole block of the secret, so the key shrinks toward NTRU sizes, while the uniform $\vA_0$ keeps MLWE’s freedom over the dimensions. Hence the name: NTRU, but with errors.

The assumption is the interpolation: NTWE sits between NTRU and MLWE, collapsing to one or the other in a limiting case. But notice what does not move. In all three constructions the public $\vA$ looks uniform modulo $\q$, and a forgery is a short vector in its kernel, so forging always reduces to MSIS. The choice of NTRU, MLWE, or NTWE only decides how the key hides and how small it gets, never the unforgeability argument. That is the sense in which the scheme is “NTWE-based”: it forges like MSIS, but packs like NTRU.