Kyber is a post-quantum cryptography algorithm that was standardized by NIST in 2022 and is considered secure against attacks from quantum computers. Kyber is a Key Encapsulation Mechanism (KEM). It allows two parties to securely establish a shared secret key, even over an insecure channel.

This post provides the basic intuition behind Kyber and the ideas that make it secure. We’ll look at a simplified version of the underlying concept. The goal is to understand how the core idea works, before diving into the details of Kyber itself.

In a later post, we’ll look at Kyber in more detail and explain how it builds upon these foundations to create a practical, efficient, and quantum-safe cryptosystem.

The scheme we’ll look at here comes from Oded Regev’s paper “On Lattices, Learning With Errors, Random Linear Codes, and Cryptography”, published in 2005. An updated version is available on arXiv. It defines the Learning With Errors (LWE) problem, which is the basis for many post-quantum cryptographic schemes, including Kyber.

You don’t need any background in quantum computing. The math behind the the scheme actually quite approachable and basic linear algebra and modular arithmetic is sufficient to follow.

The clever trick behind the scheme is to intentionally introduce tiny random mistakes into the data. Without these errors, the secret would be easy to uncover. With them, even quantum computers can’t do that. Due to these errors, there is a small chance that decryption might fail. However, if the parameters are chosen wisely, these errors almost never cause any real decryption failures.

Parameters

The scheme uses three parameters.

The parameter $n$ determines the security level of the scheme. The larger $n$ is, the more secure the scheme becomes. However, increasing $n$ also increases the computational cost for the participating parties as well as the size of the keys.

The parameter $p$ defines the size of the numerical range in which computations are performed. All operations are carried out modulo $p$, meaning results are reduced to the range from $0$ to $p - 1$. The value of $p$ should be a prime number, and it should lie between $n^2$ and $2n^2$.

The third parameter, $m$, influences the size of the public key. A larger value of $m$ results in a larger public key but also increases the security of the scheme. The value of $m$ is chosen based on $n$ and $p$, and should be set to

$$ (1 + \epsilon)(n + 1)\log(p), $$

where $\epsilon$ is any constant greater than $0$.

Since $p$ is on the order of $n^2$, it follows that $m = \mathcal{O}(n \log n).$

Generating the Private Key

The private key is a secret vector $\mathbf{s} \in \mathbb{Z}_p^n$. This vector has $n$ entries which are selected uniformly at random from the range $0$ to $p - 1$. This vector remains private and serves as the secret key.

Building the Public Key

To derive the public key from the private key, a matrix $\mathbf{A} \in \mathbb{Z}_p^{m \times n}$ is generated. This matrix has $m$ rows and $n$ columns. Its entries are chosen uniformly at random from the range $0$ to $p - 1$.

We also choose an error vector $\mathbf{e} \in \mathbb{Z}_p^m$. The entries of this vector are also selected at random, but they are required to be small random values.

From the matrix $\mathbf{A}$, the private key $\mathbf{s}$ and the error vector $\mathbf{e}$, we compute the vector $\mathbf{b} \in \mathbb{Z}_p^m$:

$$ \mathbf{b} = \mathbf{A} \mathbf{s} + \mathbf{e} $$

As mentioned earlier, all computations are performed modulo $p$.

The public key consists of the pair $(\mathbf{A}, \mathbf{b})$.

If the error vector $\mathbf{e}$ were not present, one could easily recover the private key $\mathbf{s}$ from the public key. For example, if we had $\mathbf{e} = \mathbf{0}$, then $\mathbf{b} = \mathbf{A} \mathbf{s}$. Since $\mathbf{A}$ and $\mathbf{b}$ are part of the public key and therefore known, one could compute $\mathbf{s}$ using the inverse of $\mathbf{A}$ (assuming that $\mathbf{A}$ is invertible):

$$ \mathbf{s} = \mathbf{A}^{-1} \mathbf{b} $$

The inverse of a matrix can be computed very easily, for example using the Gauss–Jordan elimination algorithm.

Since the error vector $\mathbf{e}$ is not zero, computing $\mathbf{s}$ becomes significantly more complicated. The error introduces uncertainty that makes it extremely difficult to reconstruct the private key from the public key. This is the Learning With Errors (LWE) problem (the goal is to learn the secret vector from noisy linear equations), and it forms the core of the scheme’s security, a problem believed to be hard even for quantum computers.

Size of the Public Key

Because the public key consists of the matrix $\mathbf{A}$ and the vector $\mathbf{b}$, it contains a total of $mn + m$ entries. Since $m$ is on the order of $n \log n$, the public key has $\mathcal{O}(n^2 \log n)$ entries. Each entry requires $\log_2(p)$ bits to represent. Because $p$ is on the order of $n^2$, we have $\log_2(p) = \mathcal{O}(\log n)$. Thus, representing the public key requires $\mathcal{O}(n^2 \log^2 n)$ bits.

Encrypting a Single Bit

To encrypt a message, in this case a single bit $m \in {0, 1}$, the sender randomly chooses a subset $S$ from all $2^m$ possible subsets of $M = {1, 2, \ldots, m}$. This subset may theoretically be empty or contain all elements of $M$, but on average it will contain half of the elements of $M$.

Next, the sender selects the rows $\mathbf{a_i}$ from the matrix $\mathbf{A}$ whose indices are contained in $S$. The sender then computes a vector $\mathbf{u}$ and a scalar $v$ as follows:

$$ \mathbf{u} = \sum_{i \in S} \mathbf{a_i}, \qquad v = \left\lfloor \frac{p}{2} \cdot m \right\rfloor + \sum_{i \in S} b_i $$

The ciphertext consists of the pair $(\mathbf{u}, v)$.

Decrypting the Ciphertext

To decrypt the ciphertext $(\mathbf{u}, v)$, the receiver uses their private key $\mathbf{s}$. They compute

$$ v - \mathbf{u} \cdot \mathbf{s} $$

where $\mathbf{u} \cdot \mathbf{s}$ denotes the dot product of $\mathbf{u}$ and $\mathbf{s}$.

If the result is closer to $0$ than to $\left\lfloor \frac{p}{2} \right\rfloor$, the decrypted message is interpreted as $0$. If the result is closer to $\left\lfloor \frac{p}{2} \right\rfloor$, the message is interpreted as $1$.

Why Decryption Recovers the Original Message

Let’s take a closer look at why the scheme works.

When the receiver decrypts the ciphertext, they compute:

$$ v - \mathbf{u} \cdot \mathbf{s} = \left\lfloor \frac{p}{2} \cdot m \right\rfloor + \sum_{i \in S} b_i - \mathbf{u} \cdot \mathbf{s} $$

The vector $\mathbf{b}$ is the result of the computation $\mathbf{A} \mathbf{s} + \mathbf{e}$. Let $\mathbf{b} = (b_1, b_2, \ldots, b_m)$ and $\mathbf{e} = (e_1, e_2, \ldots, e_m)$. Then each $b_i$ is the dot product of the $i$-th row of $\mathbf{A}$ with the vector $\mathbf{s}$ plus the error $e_i$:

$$ b_i = \mathbf{a_i} \cdot \mathbf{s} + e_i $$

We therefore obtain

$$ \begin{aligned} v - \mathbf{u} \cdot \mathbf{s} &= \left\lfloor \frac{p}{2} \cdot m \right\rfloor + \left( \sum_{i \in S} \mathbf{a_i} \cdot \mathbf{s} + e_i \right) - \mathbf{u} \cdot \mathbf{s} \\[1.5em] &= \left\lfloor \frac{p}{2} \cdot m \right\rfloor + \sum_{i \in S} \mathbf{a_i} \cdot \mathbf{s} + \sum_{i \in S} e_i - \mathbf{u} \cdot \mathbf{s} \\[1.5em] &= \left\lfloor \frac{p}{2} \cdot m \right\rfloor + \sum_{i \in S} \mathbf{a_i} \cdot \mathbf{s} + \sum_{i \in S} e_i - \left( \sum_{i \in S} \mathbf{a_i} \right) \cdot \mathbf{s} \end{aligned} $$

Because

$$ \left( \sum_{i \in S} \mathbf{a_i} \right) \cdot \mathbf{s} = \sum_{i \in S} (\mathbf{a_i} \cdot \mathbf{s}) $$

the receiver obtains, when computing $v - \mathbf{u} \cdot \mathbf{s}$, the value

$$ \left\lfloor \frac{p}{2} \cdot m \right\rfloor + \sum_{i \in S} e_i $$

If the error vector $\mathbf{e}$ were zero, then $\sum_{i \in S} e_i = 0$, and the receiver would obtain exactly $0$ for $m = 0$ and exactly $\left\lfloor \frac{p}{2} \right\rfloor$ for $m = 1$.

However, since the entries of $\mathbf{e}$ are small random values, the sum $\sum_{i \in S} e_i$ is also a small random value. For example, if each $e_i$ were chosen uniformly from ${-1, 0, 1}$, then $\sum_{i \in S} e_i$ would typically be small, because the positive and negative contributions tend to cancel out. As long as this value remains sufficiently small, the result for $m = 0$ will still lie closer to $0$ and the result for $m = 1$ will still lie closer to $\left\lfloor \frac{p}{2} \right\rfloor$.

A Small Example

Suppose we choose $p = 7$ and $n = 3$. Our private key $\mathbf{s}$ might be:

$$ \mathbf{s} = \begin{pmatrix} 3 \\ 5 \\ 2 \end{pmatrix} $$

To generate the public key, we choose a random matrix $\mathbf{A} \in \mathbb{Z}_7^{4 \times 3}$

$$ \mathbf{A} = \begin{pmatrix} 1 & 4 & 3 \\ 4 & 2 & 6 \\ 0 & 1 & 2 \\ 2 & 5 & 0 \end{pmatrix} $$

and an error vector $\mathbf{e} \in \mathbb{Z}_7^4$, for example:

$$ \mathbf{e} = \begin{pmatrix} 1 \\ 0 \\ 6 \\ 1 \end{pmatrix} $$

As described earlier, the entries of $\mathbf{e}$ should be small random values. In this example, we have a very large value with $6$. However, since we are working modulo $7$, this can also be interpreted as $-1$.

We can now compute $\mathbf{b}$ and obtain:

$$ \mathbf{b} = \mathbf{A} \mathbf{s} + \mathbf{e} = (2, 6, 1, 4)^T $$

With $\mathbf{A}$ and $\mathbf{b}$ as the public key, we can now encrypt a message. Suppose we want to encrypt the bit $m = 1$.

We randomly select rows from the matrix $\mathbf{A}$. Assume we choose the rows with indices $0$ and $2$. Then we compute:

$$ \mathbf{u} = \mathbf{a_0} + \mathbf{a_2} = (1, 4, 3) + (0, 1, 2) = (1, 5, 5) $$

Next, we compute $v$:

$$ v = \left\lfloor \frac{7}{2} \cdot 1 \right\rfloor + b_0 + b_2 = 3 + 2 + 1 = 6 $$

Thus, we obtain the ciphertext $(\mathbf{u}, v) = ((1, 5, 5), 6)$.

To decrypt the ciphertext, the receiver computes:

$$ v - \mathbf{u} \cdot \mathbf{s} = 6 - \big( (1, 5, 5) \cdot (3, 5, 2) \big) = 3 $$

Since $3$ is closer to $\left\lfloor \frac{7}{2} \right\rfloor = 3$ than to $0$, the receiver interprets the decrypted message as $1$, which is correct.

If we increase the amount of error, it may happen that the result becomes closer to $0$, even though $m = 1$. For example, if we choose the error vector $\mathbf{e} = (3, 0, 3, 1)^T$, the decryption of the new ciphertext (which contains the larger error) yields $2$. This value is closer to $0$. In this case, the receiver would incorrectly interpret the message as $0$.

Summary

This article introduced a simple scheme for encrypting a single bit. It is based on the Learning With Errors (LWE) problem, in which a secret is hidden inside a system of linear equations with small, intentionally added errors. These errors make the underlying problem hard to solve, even for quantum computers.

The scheme shown here is deliberately simplified and is meant to show the core idea behind LWE-based approaches without going into the many optimizations and security details found in modern constructions such as Kyber. Nevertheless, this exact principle forms the foundation of Kyber and many other post-quantum secure schemes. Kyber replaces the approach shown here with polynomial algebra, but the conceptual framework remains very similar.

In a later article, I will discuss how Kyber extends this basic idea and enables the encryption of many bits instead of just one.

Kyber Part 1: Learning With Errors - The Math That Makes Post-Quantum Cryptography Possible