Given any binary boolean function \(\funcdecl{\varphi}{\mathbb{F}_2^n}{\mathbb{F}_2}\), there exist \(a_u \in \mathbb{F}_2\) such that:

\[\begin{equation} \varphi(x) = \sum_{u\in \mathbb{F}_2^n}a_u\left(\prod_{i=1}^n x_i^{u_i}\right) \label{eq:anf1}\tag{1} \end{equation}\]This representation is called the **algebraic normal form** (ANF) of \(\varphi\), and the bulk of this text is about the way we compute these coefficients.^{1} But first, we need to show that this representation always exists. One way to do so, is by reasoning as follows: define a function \(\funcdecl{\delta}{\mathbb{F}_2^n}{\mathbb{F}_2}\), such that \(\delta(x)=1\) if \(x=0\), and \(\delta(x)=0\) if \(x\neq 0\).^{2} We can now define \(\varphi\) as:

This works because \(\delta(z+x)\) is \(0\) except when \(z=x\), in which case it is \(1\). Equation \(\ref{eq:phi_kronecker1}\) might seem a circular definition, specially if one is thinking of \(\varphi\) as being defined by a “formula”. However, the most general definition of a function from, say, set \(X\) to set \(Y\) is as a subset — that is, a relation — of \(X \times Y\), subject to the caveat that, for every \(x\in X\), there *must exist exactly one pair* \((x, y)\) in that relation (in the case of function \(\varphi\), we would say that for all \(x\), there must exist exactly one \(y\) such that \(y=\varphi(x)\)). Hence equation \(\ref{eq:phi_kronecker1}\) is just an analytical description of that subset.

Now, we can replace the \(\delta\) function with \(\prod_{i=1}^n (1+x_i+z_i)\), which is \(1\) if and only if \(x=z\), just like \(\delta\). Equation \(\ref{eq:phi_kronecker1}\) now becomes:

\[\begin{equation} \varphi(x) = \sum_{z\in \mathbb{F}_2^n} \varphi(z)\prod_{i=1}^n (1+x_i+z_i) \label{eq:phi_kronecker2}\tag{3} \end{equation}\]When \(\varphi\) is a concrete function, both \(\varphi(z)\) and the \(z_i\)’s will be concrete \(0\)’s and \(1\)’s — the only variables will be the \(x_i\)’s. Furthermore, the \(x_i\)’s belong to \(\mathbb{F}_2\), and hence we have that \(x_i^2 = x_i\). Thus when developing (\(\ref{eq:phi_kronecker2}\)) for a concrete function, we will end up with an equation of the form of (\(\ref{eq:anf1}\)). This shows that **the ANF always exists**.

The ANF is also unique, which we can show through a cardinality argument. Consider the set of all the functions \(\funcdecl{\varphi}{\mathbb{F}_2^n}{\mathbb{F}_2}\); this set has \(2^{2^n}\) elements; denote it set \(\mathcal{BBF}\).^{3} Now consider the set \(\mathcal{ANF}\) consisting of all the expressions of the form of the right hand side of equation \(\ref{eq:anf1}\). Its elements are in one-to-one correspondence with the set of subgroups of the set of all expressions of the form \(\prod_{i=1}^n x_i^{u_i}\). There are \(2^n\) of these expressions (one for each \(u\in \mathbb{F}_2^n\)), and hence the number of elements of \(\mathcal{ANF}\) is \(\sum_{j=0}^{n} \binom{2^n}{j}\), which, by a well-known identity relating the binomial coefficients,^{4} equals \(2^{2^n}\). As every element of \(\mathcal{ANF}\) corresponds exactly to one binary boolean function — i.e. an element of \(\mathcal{BBF}\) — and as both sets have the same cardinality, we conclude that **the ANF is unique**.^{5}

We usually abbreviate \(\prod_{i=1}^n x_i^{u_i}\) as \(x^u\), so (\(\ref{eq:anf1}\)) becomes \(\varphi(x) = \sum_{u\in \mathbb{F}_2^n}a_u x^u\). But we can simplify it even further. For any \(x\in \mathbb{F}_2^n\), the set \(\{i\in \{1..n\}\ \vert\ x_i\neq 0\}\) is called the *support* of \(x\), denoted \(supp(x)\). Note that the cardinality of the support of \(x\) equals its *Hamming weight*, \(hw(x)\).^{6} Going back to \(\prod_{i=1}^n x_i^{u_i}\), it equals \(1\) if and only if for all \(i\), \(u_i = 1 \rightarrow x_i = 1\) (otherwise it is \(0\)) — but this is the same as requiring that \(supp(u)\subseteq supp(x)\). Let us denote this latter condition as \(u \preccurlyeq x\) (we thus say that \(u\) *is covered* by \(x\), or equivalently, that \(x\) *covers* \(u\)). This allows us to rewrite (\(\ref{eq:anf1}\)) as:

But how to compute the different \(a_u\)? We are aided by the fact that the “converse” of (\(\ref{eq:anf2}\)) also holds:

**Theorem 1.** For all \(u\in \mathbb{F}_2\), the following holds:

\[\tag*{$\square$}\]

**Proof.** Replace \(a_u\) in (\(\ref{eq:anf2}\)) with the right hand side of (\(\ref{eq:anf3}\)) to obtain:

These two nested summations run over all the pairs \((u, y)\) such that \(u\preccurlyeq x \land y\preccurlyeq u\). We can join both conditions in one: \(y\preccurlyeq u \preccurlyeq x\); and this means that we can cover all the pairs \((y, u)\), where both \(y\) and \(u\) verify the same condition as above, by iterating over all \(y\) covered by \(x\), and then for each \(y\), go over all \(u\) that covers \(y\) (and is covered by \(x\)). That is to say that we can rewrite the nested summation as follows:

\[\begin{equation} \sum_{y \in \finitefield{F}_2^n \mid \covered{y}{x}} \left(\sum_{u \in \finitefield{F}_2^n \mid \covered{y}{u} \preccurlyeq x} \varphi(y)\right) \label{eq:anf5}\tag{7} \end{equation}\]This can be further rearranged as:

\[\begin{equation} \sum_{y \in \finitefield{F}_2^n \mid \covered{y}{x}} \varphi(y) \left(\sum_{u \in \finitefield{F}_2^n \mid \covered{y}{u} \preccurlyeq x} 1\right) = \varphi(x) \tag{8} \end{equation}\]The last equality follows because of the set \(\{u \in \finitefield{F}_2^n \mid \covered{y}{u} \preccurlyeq x\}\): if \(x \neq y\), then it is either empty (if \(y \not\preccurlyeq x\)), or (if \(y \preccurlyeq x\)) it is nonempty, and has \(2^{hw(x)-hw(y)}\) elements — but in both cases, the inner summation evaluates to zero (remember we are summing in \(\finitefield{F}_2\)). If \(y=x\), the set has just one element, and the result follows. \(\blacksquare\)

## The ANF/truth table algorithm

The duality shown in equations \(\ref{eq:anf2}\) and \(\ref{eq:anf3}\) yields a simple algorithm to compute the ANF coefficients (\(a_u\)’s) from \(\varphi\)’s truth table, or vice-versa (i.e. it can also be used to compute the value of the function, given the \(a_u\)’s). I will focus on the former. Let us take as an example the function defined by \((x_1 \rightarrow x_2) \rightarrow x_3\). Here is its truth table:

\[ \begin{array}{c|c|c|c} x_1 & x_2 & x_3 & (x_1 \rightarrow x_2) \rightarrow x_3 \\ \hline 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ \end{array} \]

The algorithm goes like this: start in the column that corresponds to the “most significant bit”; \(x_1\) in the table above. Where it has the value \(0\), leave the value of the function unchanged; where it has the value \(1\), add to the value of the function at the current entry the value of the function at the entry obtained by setting \(x_1\) to \(0\). This is illustrated in the table below, where instead of replacing the values of the function, they are shown in an auxiliary column:

\[ \begin{array}{c|c|c|c|c} x_1 & x_2 & x_3 & (x_1 \rightarrow x_2) \rightarrow x_3 & \textrm{aux_1} \\ \hline 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ \end{array} \]

And now you just do the same thing for \(x_2\), and then for \(x_3\) (if there were more columns, you would just continue until reaching the one corresponding to the “least significant bit”). Again to illustrate, here’s the result, with two more auxiliary columns (aux_2 for \(x_2\) and aux_3 for \(x_3\)). Note that, as we are not replacing the function values, aux_2 is computed using the values of aux_1, and aux_3 using the values of aux_2.

\[ \begin{array}{c|c|c|c|c|c|c} x_1 & x_2 & x_3 & (x_1 \rightarrow x_2) \rightarrow x_3 & \textrm{aux_1} & \textrm{aux_2} & \textrm{aux_3} \\ \hline 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 & 0 & 1 \\ \end{array} \]

Now, the values in column aux_3 are the \(a_u\)’s that correspond to \(x^u\); for example the coefficient for \(x_1x_2\) (i.e. \(u=110\)) is \(1\), which means the monomial shows up in the final expression. Doing this for all the values in the table, yields that final expression:

\[\begin{equation} (x_1 \rightarrow x_2) \rightarrow x_3 \equiv x_1+x_3 + x_1x_2 + x_1x_3 + x_1x_2x_3 \tag{9} \end{equation}\]And as mentioned above, if we start with a table listing of the \(a_u\)’s, this algorithm yields the original function:

\[ \begin{array}{c|c|c|c|c|c|c} u_1 & u_2 & u_3 & \textrm{aux_3 } (\text{i.e. } a_u) & \textrm{aux_4} & \textrm{aux_5} & \textrm{aux_6} \text{ i.e. } (x_1 \rightarrow x_2) \rightarrow x_3\\ \hline 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \end{array} \]

Doing this for a few different functions, it is very easy to get a “feeling” that this works. Of course the *high level* reason why this works is obvious: because by the time the algorithm get to the aux_3 column, it has already, for each row (where the variables’ columns are seen as an \(u\) value) summed all the function values for all the \(x\)’s that that \(u\) value covers. **But how to prove this?**

To see the lower level reason why this works, we borrow a bit of notation from Python: indexing starts from \(0\), and \(x[m:n+1]\) means the substring of \(x\) that starts at index \(m\) and ends at index \(n\), including this last element (but not the one at position \(n+1\)) — this is how it works on Python. The element of \(x\) at index \(m\) is denoted \(x[m]\). Also, \(x[m:]\) is the strings that goes from (and includes) position \(m\), to the end of string \(x\). Lastly, if \(x\) and \(y\) are strings, then \(x == y\) means the strings are equal, and \(x + y\) denotes concatenation. A string that contains just the element \(a\) is denoted \([a]\) (this is also a “Python-ism”).

Also, to explain why the algorithm works (we will only look at the case going from the function to the \(a_u\)’s), it is a lot easier to forget the ANF and just think of it as a transversal problem: for a given \(x\), which elements does the algorithm touch? We will show it touches (once and only once) precisely those elements \(u\) such that \(u\preccurlyeq x\).

Indeed, if we are dealing with strings of length \(1\), it is obvious that the algorithm works. The case of length \(2\) can be easily checked by hand, and furthermore, the truth table for length two essentially “doubles” that of the case of length \(1\) — and this provides the motivation for an inductive reasoning. Suppose that for a given string \(x\) of length \(n\) (i.e. when working in \(\mathbb{F}_2^n\)), the algorithm works as expected: i.e., it passes by every string \(u\) of the same length, such that \(u\preccurlyeq x\) holds, *exactly once*. Consider now the strings of length \(n+1\), where the “extra bit” is added to the left: then, this set can be constructed taking the set of strings of length \(n\), and prepending to it a column of all \(0\)’s, and then doing the same thing, but now with all \(1\)’s column. To take a concrete example, in the truth tables above, observe that the set of tuples of \(x_2\) and \(x_3\) are repeated, once for \(x_1=0\) and again for \(x_1=1\).

Let \(x\in \mathbb{F}_2^{n+1}\), and suppose that \(x[0] = 0\). From the induction hypothesis (and the above discussion about truth table “duplication”), it is clear the algorithm will touch exactly once in all strings covered by \([0] + x[1:] = x\) — which is what we wanted to show. If \(x[0] = 1\), then a similar argument shows that the algorithm will touch exactly once all strings covered by \([1] + x[1:] = x\). However, because the the first element is a \(1\), it will also touch once on the string \([0] + x[1:]\) — which entails that, as it moves throughout the remaining columns, it will also touch exactly once on all strings covered by this latter string. Thus, again we conclude it touches once and just once on all strings covered by \(x\). As \(x\) is an arbitrary string, this holds for all elements of \(\mathbb{F}_2^{n+1}\).

**EDIT November 22, 2018**: added the table showing how to go from the \(a_u\)’s to the function (the inverse case to what is explained).

**EDIT March 30, 2019**: simplified the explanation of the algorithm, and of the double summation swap.

\(\mathbb{F}_2^n\) denotes the vector space of dimension \(n\) over the Galois field with two elements. It will be clear from context the field in which the summations take place (i.e. \(\mathbb{F}_2\) or \(\mathbb{Z}\)). Also, this formalism implies the convention — widely accepted within the “discrete domains” of mathematics, but by no means universal outside of it — that \(0^0 = 1\). If the reader finds himself mystified by this, checking out the relevant Wikipedia page is very recommended.↩

Such a function is called the Kronecker delta.↩

There two choices — \(0\) or \(1\) — for each element of the domain, and each element of the domain must have exactly one image. Hence we have two choices for the first element of the domain, and for each of those, another two choices for the second element, … which yields a total number of functions of \(2^{2^n}\). More generally, for a function \(\funcdecl{\psi}{X}{Y}\), with \(X\) and \(Y\) finite, the total number of functions is \(|Y|^{|X|}\).↩

Wikipedia to the rescue.↩

Actually, this argument also shows the ANF always exists, because the correspondence between \(\mathcal{BBF}\) and \(\mathcal{ANF}\) is a bijection.↩