Introduction
This series of posts aims to be a comprehensive collection of facts, protocols, and theorems related to the information-theoretic foundations of multilinear proof systems. By “multilinear proof system” we refer to a system with multilinear polynomials as the underlying “arithmetization” of the proof system: satisfiability of the computation, a (RAM) machine or circuit, is expressed as randomized relations between multilinear polynomials, and the witness is the evaluation of multilinear polynomials over some tensor product.
This is in contrast to “univariate proof systems”, where the computation is expressed as relations between univariate polynomials and the witness is the set of evaluations of univariate polynomials (or rational functions) over some subgroup of the field.
In this first installment, we look broadly at the fundamental facts, theorems and techniques required to argue about the properties of multilinear polynomials and the soundness of multivariate proof systems. We will see how to interpolate a multilinear polynomial with a particular set of evaluations over a “tensor product”, we will see that such a polynomial is uniquely defined, and we will explore how many roots multivariate polynomials can have.
The Factor Theorem
Because this series aims to start from first principles, we begin by recalling the factor theorem. It states that if $f(X) \neq 0$ and $f(x) = 0$, then we can write $f(X) = (X - x) \cdot g(X)$ for some polynomial $g(X)$. These polynomials will be polynomials over “$\mathcal{F}$”, e.g.
$$ f(X) = \sum_i c_i X^i \ \ \text{ Where } \ \ \forall i. c_i \in \mathcal{F} $$
For what follows, $\mathcal{F}$ is a commutative ring. If the term “commutative ring” is unfamiliar, just think of a finite field; however, commutative rings include every structure where you can “multiply” and “add”, e.g. the integers $\ZZ$ or the 32-bit integers $\ZZ_{2^{32}}$. The “commutative” part simply means that the order of multiplication is irrelevant, i.e.
$$ \forall a, b \in \mathcal{F}, a \cdot b = b \cdot a $$
For instance, even though we can multiply and add matrices, matrix multiplication does not commute ($A \cdot B \neq B \cdot A$ in general) and hence matrices are not a commutative ring. For now, think of $\mathcal{F}$ as a finite field or the integers $\ZZ$, but the theorems and corollaries here will hold more generally. With this out of the way, here is the factor theorem:
Theorem (“The Factor Theorem”): Let $f(X) \in \mathcal{F}[X]$, with $f \neq 0$ and let $x \in \mathcal{F}$ be such that $f(x) = 0$. Then there exists $g(X) \in \mathcal{F}[X]$ such that $f(X) = (X - x) \cdot g(X)$.
We first consider a special case, then use this to prove the general case:
Case x = 0. First we show the theorem when $f(0) = 0$; the claim is then that $f(X) = (X - 0) \cdot g(X) = X \cdot g(X)$. To see this, observe that if $ f(X) = \sum_{i = 0}^{d} c_i \cdot X^i $ then $f(0) = c_0$, and since $f(0) = 0$ we have $c_0 = 0$. Therefore $f(X)$ is of the form: $$ \begin{align} f(X) &= c_1 \cdot X + \ldots + c_d \cdot X^d \\ &= X \cdot (c_1 + \ldots + c_d \cdot X^{d-1}) \end{align} $$ If we define $g(X) = c_1 + \ldots + c_d \cdot X^{d-1}$, we see that: $$ f(X) = X \cdot g(X) = (X - 0) \cdot g(X) $$ As desired.
Case x $\mathbf{\neq}$ 0. Suppose $f(x) = 0$. Define the polynomial $f^* (X) = f(X + x)$ and observe that $f^* (0) = 0$. Therefore, by the “x = 0” case, we conclude that $f^* (X) = X \cdot g^* (X)$ for some $g^* (X) \in \mathcal{F}[X]$. Next write: $$ \begin{align} f(X) &= f((X - x) + x) \\ &= f^* (X - x) = (X - x) \cdot g^* (X - x) \end{align} $$ If we define $g(X) = g^*(X - x)$, then we see that: $$ \begin{align} f(X) &= (X - x) \cdot g^* (X - x) \\ &= (X - x) \cdot g(X) \end{align} $$
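The proof of the factor theorem is constructive, and the construction is exactly synthetic (Horner) division. Here is a minimal sketch in Python, assuming a small prime field $p = 97$ chosen purely for illustration (the helper name `divide_out_root` is ours, not standard):

```python
p = 97  # an arbitrary small prime, so Z_p is a field

def divide_out_root(f, x):
    """Divide f (coefficient list, lowest degree first) by (X - x) over GF(p)
    via synthetic (Horner) division. Returns (g, r) such that
    f(X) = (X - x) * g(X) + r, where r = f(x)."""
    d = len(f) - 1
    g = [0] * d                          # the quotient has degree d - 1
    carry = 0
    for i in range(d, 0, -1):
        carry = (f[i] + x * carry) % p   # b_{i-1} = c_i + x * b_i
        g[i - 1] = carry
    r = (f[0] + x * carry) % p           # the remainder equals f(x)
    return g, r

f = [p - 1, 0, 1]                        # f(X) = X^2 - 1, with root x = 1
g, r = divide_out_root(f, 1)
assert r == 0 and g == [1, 1]            # f(X) = (X - 1) * (X + 1)
```

When $f(x) = 0$ the remainder vanishes and we recover exactly the $g(X)$ promised by the theorem.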
The Fundamental Theorem
After showing this general theorem, we are going to immediately restrict ourselves a bit, but just a bit. Namely, we are going to require that $\mathcal{F}$ is an integral domain. An integral domain is a commutative ring which has no zero divisors, which is a fancy way of saying:
$$ \forall a, b \in \mathcal{F}. a \cdot b = 0 \implies a = 0 \text{ or } b = 0 $$
In other words: the only way to get a zero product is to multiply by zero. Fields satisfy this, and so do the integers $\ZZ$; however, e.g. $\ZZ_{2^{32}}$ does not:
$$ 2^{16} \cdot 2^{16} = 2^{32} = 0 \mod 2^{32} $$
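A quick sanity check of this zero-divisor example, using Python integers to model $\ZZ_{2^{32}}$ (reduction mod $2^{32}$ plays the role of 32-bit wraparound):

```python
# 2^16 is a zero divisor in Z_{2^32}: two non-zero elements multiply to zero.
assert (2**16 * 2**16) % 2**32 == 0   # the product wraps around to 0
assert 2**16 % 2**32 != 0             # yet neither factor is 0
```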
The fact that $\mathcal{F}$ is an integral domain is important, because it allows us to conclude that if we have a product of polynomials $f(X) = f_1(X) \cdot f_2(X)$ and a point $x \in \mathcal{F}$ such that $f(x) = 0$, then either $f_1(x) = 0$ or $f_2(x) = 0$. Combined with the factor theorem, this allows us to upper bound the number of roots of any non-zero polynomial $f(X)$ by the degree of $f(X)$ as follows:
Theorem (“The Fundamental Theorem”): Let $\mathcal{F}$ be an integral domain and let $f(X) \in \mathcal{F}[X]$ be a non-zero polynomial of degree $d$. Then $f(X)$ has at most $d$ roots in $\mathcal{F}$.
We prove this using induction over the degree $d$ of $f(X)$.
Base d = 0. If $d = 0$, then $f(X) = c$ for some non-zero $c \in \mathcal{F}$. Clearly, $f(X)$ has $0$ roots.
Step d > 0. Let $f(X)$ be a polynomial of degree $d$. If $f(X)$ has $0$ roots in $\mathcal{F}$, then we are done. Otherwise, let $x_0 \in \mathcal{F}$ be a root of $f(X)$. Then, using the factor theorem, we write $f(X) = (X - x_0) \cdot g(X)$ for some polynomial $g(X)$ of degree $d-1$. By the inductive hypothesis, $g(X)$ has at most $d-1$ roots, and $X - x_0$ has exactly $1$ root. Finally, $f(X)$ has at most $d$ roots because $\mathcal{F}$ is an integral domain: if $x \in \mathcal{F}$ is such that $f(x) = (x - x_0) \cdot g(x) = 0$, then either $(x - x_0) = 0$ or $g(x) = 0$; in other words, $x$ must be a root of either $X - x_0$ or $g(X)$, of which there are at most $1 + (d - 1) = d$.
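For small parameters we can simply verify the bound by exhaustion. A sketch over GF(97) (the prime and the example polynomial are arbitrary choices):

```python
p = 97  # an arbitrary small prime, so GF(p) is an integral domain

def eval_poly(f, x):
    """Evaluate f (coefficient list, lowest degree first) at x via Horner."""
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % p
    return acc

f = [3, 0, 1, 5]  # f(X) = 5*X^3 + X^2 + 3, degree d = 3
roots = [x for x in range(p) if eval_poly(f, x) == 0]
assert len(roots) <= 3  # the fundamental theorem: at most d roots
```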
The theorem also allows us to conclude that if two polynomials $f_0(X)$ and $f_1(X)$ of degree $d$ agree on more than $d$ points of $\mathcal{F}$, then the polynomials must be equal.
Corollary (Equality of Polynomials): Let $f_0(X)$ and $f_1(X)$ be polynomials of degree $d$ over an integral domain $\mathcal{F}$. Let $x_0, \ldots, x_d \in \mathcal{F}$ be distinct elements. Then: $$ \forall i \in \{0, \ldots, d\}. f_0(x_i) = f_1(x_i) $$ $$\iff$$ $$f_0(X) = f_1(X)$$ Proof: Define $f(X) = f_0(X) - f_1(X)$. Note that $f(X)$ is a polynomial in $\mathcal{F}[X]$ of degree at most $d$ and observe that $f(x_i) = 0$ for all $i \in \{0, \ldots, d\}$: $f(X)$ has at least $d+1$ roots in $\mathcal{F}$. By the fundamental theorem, a non-zero polynomial of degree at most $d$ has at most $d$ roots, so $f(X)$ must be the zero polynomial, i.e. $f_0(X) = f_1(X)$. The other direction is immediate.
We can turn the corollary above into a probabilistic check of equality between polynomials. This technique is known as the Schwartz-Zippel lemma and is widely used; we will use it a lot.
Corollary (Schwartz-Zippel): Let $f_0(X) \neq f_1(X)$ be distinct polynomials of degree $d$ and let $\mathcal{C} \subseteq \mathcal{F}$ be an arbitrary subset of the integral domain $\mathcal{F}$, then: $$ \prob_{x \sample \mathcal{C}}[f_0(x) = f_1(x)] \leq \frac{d}{|\mathcal{C}|} $$
Proof: Note that the statement is vacuous if $|\mathcal{C}| \leq d$. When $|\mathcal{C}| > d$, suppose towards a contradiction that the probability exceeds $d / |\mathcal{C}|$. Then there must exist a subset $S \subseteq \mathcal{C}$ with $|S| > d$ such that: $$\forall x \in S, f_0(x) = f_1(x)$$ In which case the previous corollary shows that $f_0(X) = f_1(X)$, contradicting the assumption that the polynomials are distinct.
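In code, the resulting equality check is one random evaluation (or a few, to drive the error down). A sketch, where the field size $p$ and the helper names are illustrative and we take $\mathcal{C} = \mathrm{GF}(p)$:

```python
import random

p = 2**31 - 1  # a Mersenne prime; we sample challenges from all of GF(p)

def eval_poly(f, x):
    """Evaluate f (coefficient list, lowest degree first) at x via Horner."""
    acc = 0
    for c in reversed(f):
        acc = (acc * x + c) % p
    return acc

def probably_equal(f0, f1, trials=10):
    """Randomized equality check: if f0 != f1 (both of degree <= d), each
    trial wrongly agrees with probability at most d / p, so `trials`
    independent challenges all agree with probability at most (d / p)^trials."""
    return all(eval_poly(f0, x) == eval_poly(f1, x)
               for x in (random.randrange(p) for _ in range(trials)))
```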
Tensor Products & Hypercubes
The “Boolean Hypercube” is a tensor product, i.e. the set of all lists with entries from the set $\bin$:
$$ \begin{align} \mathbb{H}_k &= \bin^k \subseteq \FF^k \\ &= \big\{ (x_1, \ldots, x_k) \mid x_1, \ldots, x_k \in \bin \big\} \end{align} $$
For instance:
$$\mathbb{H}_2 = \bin^2 = \big\{ (0, 0), (0, 1), (1, 0), (1, 1) \big\}$$
It is clear that the $k$-dimensional hypercube has $2^k$ elements: $|\mathbb{H}_k| = 2^k$. In many ways, the choice of $\bin$ is arbitrary; another popular choice is $\{ -1, 1 \}$, which has the slight advantage that it is a group under multiplication, making some things slightly nicer: it naturally encodes “XOR” of bits (can you see how?).
The most important part about $\mathbb{H}_k$ is not the particular set $\bin$ or $\{ -1, 1 \}$, nor that every coordinate is drawn from the same set, but the tensor structure. In the most general case: $$ \begin{align} \mathbb{H}_k &= S_1 \tensor S_2 \tensor \ldots \tensor S_k \\ &= \big\{ (x_1, \ldots, x_k) \mid x_1 \in S_1, \ldots, x_k \in S_k \big\} \end{align} $$ We use small sets $S_1, \ldots, S_k$, most naturally $\forall i. |S_i| = 2$, hence the name “Boolean hypercube”.
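Computationally, the tensor product is just a Cartesian product of the coordinate sets; a small sketch using Python's standard library:

```python
from itertools import product

# The Boolean hypercube H_2 = {0,1} (x) {0,1}:
H2 = list(product([0, 1], repeat=2))
assert H2 == [(0, 0), (0, 1), (1, 0), (1, 1)]

# The general case allows a different set per coordinate,
# e.g. S_1 (x) S_2 (x) S_3 with binary, sign and ternary coordinates:
H = list(product([0, 1], [-1, 1], [0, 1, 2]))
assert len(H) == 2 * 2 * 3  # |H| = |S_1| * |S_2| * |S_3|
```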
The only change from picking other/different $S_1, \ldots, S_k$ would be the Lagrange basis used during the interpolation we explore next: how to interpolate the polynomial passing through $\{ (x, y) \}$ for $x \in S_i, y \in \FF$. I encourage the reader to always consider how things in this post would change if we were to use $\{ -1, 1 \}$ instead, or even what should happen if we used a ternary $\{ 0, 1, 2 \}$; this is a great way to test your understanding of the material.
Multivariate Polynomials
A multivariate polynomial is, as the name suggests, simply a polynomial in one or more variables.
For instance:
- $f(X_1, X_2) = X_1^2 + X_2^2 + X_1 X_2 + 1$ is a multivariate polynomial of individual degrees $2$ and $2$ in the variables $X_1$ and $X_2$.
- $f(X_1, X_2) = X_1^2 + X_2^2 + X_1 X_2 + X_1 X_2^3 + 1$ is a multivariate polynomial of individual degrees $2$ and $3$ in the variables $X_1$ and $X_2$.
- $f(X_1) = 5 \cdot X_1^6 + 3 \cdot X_1^4 + 2 \cdot X_1^2 + 1$ is a multivariate polynomial of individual degree $6$ in the variable $X_1$.
Definition. A multilinear polynomial $f(X_1, \ldots, X_k) \in \FF[X_1, \ldots, X_k]$ is a multivariate polynomial where the individual degree in each variable is at most one. For instance:
- $f(X_1, X_2) = 8 \cdot X_1 X_2 + 5 \cdot X_1 + X_2$
- $f(X_1) = 5 \cdot X_1 + 37$
In contrast, none of the earlier examples of multivariate polynomials are multilinear.
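Since every variable appears with degree at most one, a multilinear monomial is determined by the subset of variables it contains. This suggests a simple (illustrative, not canonical) representation: a dictionary from tuples of variable indices to coefficients.

```python
p = 97  # an arbitrary small prime field

# f(X1, X2) = 8*X1*X2 + 5*X1 + X2, the first multilinear example above;
# each key is the set of variables appearing in a monomial.
f = {(1, 2): 8, (1,): 5, (2,): 1}

def eval_multilinear(poly, point):
    """Evaluate poly at point (dict: variable index -> field element) mod p."""
    total = 0
    for monomial, coeff in poly.items():
        term = coeff
        for var in monomial:
            term = term * point[var] % p
        total = (total + term) % p
    return total

assert eval_multilinear(f, {1: 1, 2: 1}) == (8 + 5 + 1) % p
```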
Roots over Tensor Products
An important way for us to view multivariate polynomials will be as univariate polynomials over polynomial rings. To this end, it is useful to verify that polynomial rings over integral domains are themselves integral domains, allowing us to apply the fundamental theorem:
Theorem. Let $\mathcal{F}$ be an integral domain, then $\mathcal{F}[X]$ is an integral domain.
Proof. We can multiply and add polynomials, and it is clear that polynomial multiplication is commutative (since $\mathcal{F}$ is), i.e. for $f(X) \in \mathcal{F}[X]$ and $g(X) \in \mathcal{F}[X]$, we have: $$ f(X) \cdot g(X) = g(X) \cdot f(X) $$ Finally, let us verify that there are no zero divisors in $\mathcal{F}[X]$, i.e. show: $$ f(X) \cdot g(X) = 0 \implies f(X) = 0 \text{ or } g(X) = 0 $$ To see this, suppose $f(X) \neq 0$ and $g(X) \neq 0$, let $j$ and $i$ be the degrees of $f(X)$ and $g(X)$ respectively, and denote by $f_j \in \mathcal{F}$ and $g_i \in \mathcal{F}$ the leading coefficients of $f(X)$ and $g(X)$; by the definition of degree, $f_j \neq 0$ and $g_i \neq 0$. Then the leading coefficient of $f(X) \cdot g(X)$ is $f_j \cdot g_i$, which is non-zero since $\mathcal{F}$ is an integral domain. Therefore $f(X) \cdot g(X) \neq 0$, which establishes the contrapositive.
Corollary. By applying the theorem above $n$ times, we can conclude that $\FF[X_1, \ldots, X_n]$ is an integral domain: let $\mathcal{R}_0 = \FF$ and $\mathcal{R}_i = \mathcal{R}_{i-1}[X_i]$ for $i = 1, \ldots, n$. Observe that: $$ \mathcal{R}_i = (((\FF[X_1])[X_2])\ldots)[X_i] = \FF[X_1, \ldots, X_i] $$ This “iterative” construction of $\FF[X_1, \ldots, X_n]$ allows us to view $k$-variate polynomials over $\FF$ as univariate polynomials over a polynomial ring $\mathcal{F}$:
$$ f(X_1, \ldots, X_k) \in \FF[X_1, \ldots, X_{k}] $$ $$ \text{Equivalently} $$ $$ f(X_k) \in \mathcal{F}[X_k] \text{ where } \mathcal{F} = \FF[X_1, \ldots, X_{k-1}] $$ With this interpretation $f(X_k)$ is a polynomial over $\mathcal{F} = \FF[X_1, \ldots, X_{k-1}]$ and therefore $X_k$ can take any value in $\mathcal{F}$ (not just $\FF$), i.e. we can evaluate $f(X_k)$ at every $(k-1)$-variate polynomial $x_k \in \mathcal{F}$, which includes the constant polynomials, since $\FF \subseteq \mathcal{F}$.
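This change of viewpoint is easy to experiment with. A sketch using sympy (assuming it is available): `Poly(f, X3)` collects $f$ by powers of $X_3$, with coefficients that are themselves polynomials in $X_1, X_2$.

```python
from sympy import symbols, Poly

X1, X2, X3 = symbols('X1 X2 X3')
f = 8*X1*X2*X3 + 5*X1 + X2 + 3

# View f as a univariate polynomial in X3 over the ring F[X1, X2]:
f_in_X3 = Poly(f, X3)
print(f_in_X3.all_coeffs())  # [8*X1*X2, 5*X1 + X2 + 3]
```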
If we apply the fundamental theorem to this particular setting we get:
Corollary: Let $f(X_1, \ldots, X_{k}) = f(X_k) \in \mathcal{F}[X_k]$ with $\mathcal{F} = \FF[X_1, \ldots, X_{k-1}]$ be a non-zero $k$-variate polynomial. Let $d$ be the degree in the $X_k$ variable, then there exist at most $d$ distinct $(k-1)$-variate polynomials $x_k \in \mathcal{F}$ such that $f(x_k) = 0 \in \mathcal{F}$. In particular, there exist at most $d$ field elements (constant polynomials) $x_k \in \FF$ such that $f(x_k) = 0 \in \FF[X_1, \ldots, X_{k-1}]$.
If we apply this observation recursively, we can conclude that, for sufficiently large tensor products, a polynomial vanishes over the tensor product if and only if it is the zero polynomial:
Theorem: Let $f(X_1, \ldots, X_k) \in \FF[X_1, \ldots, X_{k}]$ be a $k$-variate polynomial with degree at most $d_i$ in each variable $X_i$. Let $\mathbb{H}_k$ be the tensor product of $S_1, \ldots, S_k$ where $\forall i. |S_i| > d_i$: $$ \mathbb{H}_k = S_1 \tensor S_2 \tensor \cdots \tensor S_k \subseteq \FF^k $$ Then: $$ \forall (x_1, \ldots, x_k) \in \mathbb{H}_k. f(x_1, \ldots, x_k) = 0 $$ $$\iff$$ $$f(X_1, \ldots, X_k) = 0 \in \FF[X_1, \ldots, X_k]$$ The non-trivial direction is that a non-zero polynomial cannot vanish on all of $\mathbb{H}_k$; we prove this by induction:
Base k = 1. When $\vec{X} = (X_1)$ the “multivariate” polynomial is a univariate polynomial $f(X_1) \in \FF[X_1]$. By applying the fundamental theorem with $\mathcal{F} = \FF$ and $d = d_1$, we observe that the number of roots of $f(X_1)$ is at most $d_1$; however, since $|S_1| > d_1$, the polynomial cannot evaluate to zero on all of $S_1$. So the claim holds.
Step k > 1. Define $\mathcal{F} = \FF[X_1, \ldots, X_{k-1}]$ and rewrite $f(\vec{X})$ as a polynomial with coefficients in $\mathcal{F}$: $$ f(X_1, \ldots, X_k) = \sum_i X_k^i \cdot f_i(X_1, \ldots, X_{k-1}) \in \mathcal{F}[X_k] $$ Since $\mathcal{F}$ is an integral domain, we can apply the fundamental theorem, this time with $\mathcal{F} = \FF[X_1, \ldots, X_{k-1}]$ rather than $\mathcal{F} = \FF$. We conclude that at most $d_k$ values $x_k \in \mathcal{F}$ satisfy: $$ f(x_k) = \sum_i x_k^i \cdot f_i(X_1, \ldots, X_{k-1}) = 0 $$ And, in particular, there exist at most $d_k$ elements $x_k \in S_k \subseteq \FF \subseteq \mathcal{F}$ (constant polynomials) satisfying this. On the other hand, since $|S_k| > d_k$ there must exist at least one $x_k \in S_k$ which is not a root, in other words: $$ f(x_k) = g(X_1, \ldots, X_{k-1}) \neq 0 \in \mathcal{F} $$ We then apply the induction hypothesis to $g(X_1, \ldots, X_{k-1})$ to conclude that it does not vanish over $\mathbb{H}_{k-1}$; in other words, there is at least one $(x_1, \ldots, x_{k-1}) \in \mathbb{H}_{k-1}$ such that $g(x_1, \ldots, x_{k-1}) \neq 0$, which also allows us to conclude: $$ f(x_1, \ldots, x_{k-1}, x_k) = g(x_1, \ldots, x_{k-1}) \neq 0 \in \FF $$ So $f(X_1, \ldots, X_k)$ cannot vanish over $\mathbb{H}_k$ and the claim holds for $k$ as well.
Corollary: Setting $d_1 = d_2 = \ldots = d_k = 1$ and $\mathbb{H}_k = \bin \tensor \ldots \tensor \bin$, the $k$-dimensional hypercube, we conclude that a non-zero multilinear polynomial cannot vanish on the hypercube, i.e. if $f(X_1, \ldots, X_k) \in \FF[X_1, \ldots, X_k]$ is multilinear and non-zero, then there exists at least one $(x_1, \ldots, x_k) \in \bin^k$ such that $f(x_1, \ldots, x_k) \neq 0$.
An easy, but very important, observation is that two multilinear polynomials agree on the entire hypercube if and only if they are equal as polynomials.
Corollary: Let $f, g \in \FF[X_1, \ldots, X_k]$ be two multilinear polynomials such that: $$ \forall \vec{x} \in \mathbb{H}_k. f(\vec{x}) = g(\vec{x}) $$ Then we can form the multilinear polynomial: $$ h(X_1, \ldots, X_k) = f(X_1, \ldots, X_k) - g(X_1, \ldots, X_k) $$ By assumption $\forall \vec{x} \in \mathbb{H}_k. h(\vec{x}) = f(\vec{x}) - g(\vec{x}) = 0$, therefore we conclude that $h(X_1, X_2, \ldots, X_k) = 0$ by the theorem. Hence $f(X_1, \ldots, X_k) = g(X_1, \ldots, X_k)$.
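For small $k$ these corollaries can be checked exhaustively. A sketch, reusing the dictionary representation of multilinear polynomials from earlier (names are ours):

```python
from itertools import product

p = 97  # an arbitrary small prime field

def eval_ml(poly, x):
    """poly: dict mapping monomials (tuples of 1-based variable indices) to
    coefficients; x: a point of the hypercube, given as a tuple."""
    total = 0
    for monomial, coeff in poly.items():
        for var in monomial:
            coeff = coeff * x[var - 1] % p
        total = (total + coeff) % p
    return total

# h(X1, X2) = X1 - X2 is multilinear and non-zero ...
h = {(1,): 1, (2,): p - 1}
witnesses = [x for x in product([0, 1], repeat=2) if eval_ml(h, x) != 0]
assert witnesses == [(0, 1), (1, 0)]  # ... so it cannot vanish on all of H_2
```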
Schwartz-Zippel
We can extend the techniques above to bound the probability that a multivariate polynomial $f(X_1, \ldots, X_k)$ vanishes at a random point $\vec{x} \sample \mathbb{H}_k$.
Theorem (“Multivariate Schwartz-Zippel”). Let $\mathbb{H}_k = S_1 \tensor \ldots \tensor S_k$ and let $f(X_1, \ldots, X_k) \neq 0$ be a multivariate polynomial of individual degree $d_i$ in each $X_i$. Then the probability that the polynomial vanishes at a uniformly random $\vec{x} \sample \mathbb{H}_k$ can be bounded as follows: $$ \prob_{\vec{x} \sample \mathbb{H}_k } \left[ f(\vec{x}) = 0 \right] \leq \sum_{i=1}^{k} \frac{d_i}{|S_i|} $$
We show this by induction:
Base k = 1. When $k = 1$ the “multivariate” polynomial is a univariate polynomial $f(X_1) \in \FF[X_1]$. By applying the fundamental theorem with $\mathcal{F} = \FF$, we observe that the number of roots of $f(X_1)$ is at most $d_1$. Hence for uniform $x_1 \sample S_1$, the probability that $f(x_1) = 0$, i.e. that $x_1$ is one of the at most $d_1$ roots, is at most $d_1 / |S_1|$.
Step k > 1. Basically, there are two ways that $f(x_1, \ldots, x_k)$ could be zero:
- When we partially evaluate we get the zero polynomial $f(X_1, \ldots, X_{k-1}, x_k) = 0$
- Or, $g(X_1, \ldots, X_{k-1}) = f(X_1, \ldots, X_{k-1}, x_k) \neq 0$, but $g(x_1, \ldots, x_{k-1}) = 0$.
Define $\mathcal{F} = \FF[X_1, \ldots, X_{k-1}]$ and view $f(\vec{X})$ as a polynomial in $\mathcal{F}[X_k]$: $$ f(X_1, \ldots, X_k) = \sum_i X_k^i \cdot f_i(X_1, \ldots, X_{k-1}) \in \mathcal{F}[X_k] $$ Since $\mathcal{F}$ is an integral domain, we conclude that at most $d_k$ values $x_k \in \mathcal{F}$ satisfy: $$ f(x_k) = \sum_i x_k^i \cdot f_i(X_1, \ldots, X_{k-1}) = 0 $$ And, in particular, there exist at most $d_k$ elements $x_k \in S_k \subseteq \FF \subseteq \mathcal{F}$ which make $f(x_k) = 0$; hence the probability that $x_k \sample S_k$ makes $f(x_k) = 0$ is at most $d_k / |S_k|$.
On the other hand, if $x_k \sample S_k$ is not a root: $$ f(x_k) = g(X_1, \ldots, X_{k-1}) \neq 0 \in \mathcal{F} $$ We can apply the induction hypothesis to $g(X_1, \ldots, X_{k-1})$ to conclude that: $$ \prob_{\vec{x} \sample \mathbb{H}_{k-1}} \left[ g(\vec{x}) = 0 \right] \leq \sum_{i=1}^{k-1} \frac{d_i}{|S_i|} $$ By applying a union bound over both these events we conclude that: $$ \begin{align} \prob_{\vec{x} \sample \mathbb{H}_k } \left[ f(\vec{x}) = 0 \right] &\leq \frac{d_k}{|S_k|} + \left( \sum_{i=1}^{k-1} \frac{d_i}{|S_i|} \right) \\ &= \sum_{i=1}^{k} \frac{d_i}{|S_i|} \end{align} $$
Corollary: When $S_1 = S_2 = \ldots = S_k = \FF$, i.e. we sample challenge points from the entire field, and $d_1 = d_2 = \ldots = d_k = d$, we get the following special case: $$ \prob_{\vec{x} \sample \FF^k } \left[ f(\vec{x}) = 0 \right] \leq \frac{k \cdot d}{|\FF|} $$ In particular, when $f(X_1, \ldots, X_k)$ is multilinear ($d_1 = d_2 = \ldots = d_{k} = 1$) we have: $$ \prob_{\vec{x} \sample \FF^k } \left[ f(\vec{x}) = 0 \right] \leq \frac{k}{|\FF|} $$
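To see the bound in action, we can estimate the vanishing probability of a fixed multilinear polynomial by Monte Carlo sampling; the field size and the polynomial below are arbitrary choices for illustration:

```python
import random

p = 101  # an arbitrary small prime field
k = 3

def f(x):
    # f(X1, X2, X3) = X1*X2 + X3: multilinear (every d_i = 1) and non-zero.
    return (x[0] * x[1] + x[2]) % p

trials = 100_000
hits = sum(f([random.randrange(p) for _ in range(k)]) == 0
           for _ in range(trials))
# The observed rate (about 1/p for this f) stays below the k/|F| bound.
print(hits / trials, "<=", k / p)
```

For this particular $f$ the true probability is exactly $1/p$ (for each choice of $x_1, x_2$ exactly one $x_3$ makes $f$ vanish), comfortably below the worst-case bound $k/p$.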