Ascon - Cryptography for Constrained Devices

Introduction

In 2025 NIST published SP 800-232, a standard for lightweight cryptography. It is the outcome of NIST's Lightweight Cryptography standardization process, whose goal was to identify cryptographic algorithms that are well suited for constrained environments such as IoT devices. Ascon, which was developed in 2014 at TU Graz, was selected in this process in 2023. Before that, it had already been chosen in 2019 as the first choice for lightweight authenticated encryption in the CAESAR competition (Competition for Authenticated Encryption: Security, Applicability, and Robustness), which was announced in 2013.

The standard defines:

  • Ascon-AEAD128: An authenticated encryption algorithm with a 128-bit key and a 128-bit nonce.
  • Ascon-Hash256: A hashing algorithm that produces a 256-bit hash value.
  • Ascon-XOF128 and Ascon-CXOF128: Extendable-output functions (XOFs) that can produce variable-length output and provide 128 bit security.

The hashing and the encryption algorithms are based on a sponge construction and the whole design focuses on simplicity and efficiency.

Sponge construction

In the following diagram the sponge construction is illustrated. It consists of a state which is divided into two parts: the rate (r) and the capacity (c). The rate is the part of the state that is used for input and output, while the capacity is never directly read or written. The capacity is crucial for the security of the sponge construction: if there were only a rate part, an attacker would see the complete state in the output and could simply run the (public and invertible) permutation backwards to reconstruct earlier states.

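In keyed modes such as authenticated encryption, the state is initialized with the secret key (for hashing it starts from a fixed initial value). It is then updated using a permutation function $p$ (shown as $f$ in the diagram below) after each input or output operation.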


Source: https://en.wikipedia.org/wiki/Sponge_function

The diagram shows how a hash function can be built on the sponge construction. The input $P$ is split into blocks, and each block is XORed into the rate part of the state, after which the permutation function is applied to the state. This process is repeated until all input data has been absorbed. Finally, the output $Z$ is extracted from the rate part of the state; if more output is needed, the permutation is applied again between output blocks.
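To make the absorb/squeeze pattern concrete, here is a minimal Python sketch of a generic sponge hash. The 320-bit state, the 64-bit rate, the 10* padding and especially `toy_permutation` are illustrative assumptions, not Ascon's actual definitions, and the toy permutation is not secure. In this sketch the rate occupies the most significant bits of the state.

```python
WIDTH_BITS = 320                            # total state size b = r + c (illustrative)
RATE_BYTES = 8                              # rate r = 64 bits (illustrative)
RATE_SHIFT = WIDTH_BITS - 8 * RATE_BYTES    # the rate is the most significant part
MASK = (1 << WIDTH_BITS) - 1


def toy_permutation(state: int) -> int:
    """INSECURE stand-in for f: a simple bijection on 320-bit integers."""
    state = ((state << 7) | (state >> (WIDTH_BITS - 7))) & MASK   # rotate left by 7 bits
    state ^= 0x5A5A5A5A                                           # XOR a constant
    return (state * 0x9E3779B97F4A7C15) & MASK                    # odd factor: invertible mod 2^320


def sponge_hash(message: bytes, out_len: int, f=toy_permutation) -> bytes:
    # Pad with a 1 bit (byte 0x80) and 0 bits up to a multiple of the rate.
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % RATE_BYTES)

    # Absorbing: XOR each block P_i into the rate, then apply the permutation.
    state = 0
    for i in range(0, len(padded), RATE_BYTES):
        block = int.from_bytes(padded[i:i + RATE_BYTES], "big")
        state = f(state ^ (block << RATE_SHIFT))

    # Squeezing: output the rate as Z_i, permuting between output blocks.
    out = b""
    while len(out) < out_len:
        out += (state >> RATE_SHIFT).to_bytes(RATE_BYTES, "big")
        if len(out) < out_len:
            state = f(state)
    return out[:out_len]


print(sponge_hash(b"hello", 32).hex())
```

Because squeezing only appends further rate blocks, a shorter output is always a prefix of a longer one, which is exactly the property extendable-output functions build on.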

The advantage of the sponge construction is that it can be used to build various cryptographic primitives such as hash functions, message authentication codes, and authenticated encryption schemes. It has been well studied, and its security can be proven under the assumption that the underlying permutation behaves like a randomly chosen permutation.

The same design principle is also used for SHA-3, which NIST standardized in 2015. Since then, no weaknesses have been found in the sponge construction itself.

Ascon-AEAD128

Ascon-AEAD128 is an authenticated encryption algorithm that provides both confidentiality and integrity for messages. Unlike AES, which requires a separate mode of operation (e.g., GCM) to provide authentication, Ascon-AEAD128 combines encryption and authentication in a single algorithm.

Ascon-AEAD128 takes a 128-bit key, a 128-bit nonce, the associated data (AD) and the plaintext as input. The associated data is authenticated but not encrypted. The resulting ciphertext has the same length as the plaintext, and a 128-bit authentication tag is used to verify the integrity of both the associated data and the ciphertext.

The following diagram shows how the encryption process works.


Source: SP 800-232

There are different phases during the encryption process:

  • Initialization
  • Processing of associated data
  • Plaintext encryption
  • Finalization

In the following, these phases are explained in more detail.

Initialization

During the initialization the 128 bit key (K), the 128 bit nonce (N) and a 64 bit initialization vector (IV) are concatenated to produce a 320 bit state. The IV is constructed from the parameters of the algorithm. The standard defines the following construction:


Source: SP 800-232

Once the initial state has been constructed, a permutation with 12 rounds is applied to $S$ (the details of the permutation are discussed later). After that, the last 128 bits of the state are XORed with the key.

Processing of associated data

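In this phase the associated data (AD) is split into blocks of 128 bits (the rate size $r$), where the last block is padded in the same way as the plaintext (see below). Each block is XORed with the first 128 bits of the state $S$, and then a permutation with 8 rounds is applied to $S$. This is repeated until all associated data has been processed. If there is no associated data, this phase is skipped.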

Afterwards (also when there is no associated data), a single 1 bit is XORed into the last bit of the state $S$, which belongs to the capacity part. This ensures domain separation (more on that later).

Plaintext encryption

Like the associated data, the plaintext (P) is processed in blocks of 128 bits. The process is similar to the processing of the associated data, with one difference: after a plaintext block has been XORed into the first 128 bits of $S$, these 128 bits are output as the ciphertext block (C). Again, a permutation with 8 rounds is applied to $S$ after each block, except after the last one, which is followed directly by the finalization.

The last plaintext block is always padded: a 1 bit followed by 0 bits is appended until the block is 128 bits long (if the plaintext length is already a multiple of 128 bits, this yields an extra block that consists only of padding). The padded block is XORed with the first 128 bits of $S$, and only the first $l$ bits of the result are output as ciphertext, where $l$ is the length of the unpadded last block (possibly 0).

Finalization

After all plaintext blocks have been processed, the authentication tag is computed. For this, the key K is XORed into the 128 bits of $S$ that directly follow the rate. Then a permutation with 12 rounds is applied to $S$. Finally, the last 128 bits of $S$ are XORed with K again, and the result is used as the authentication tag (T). The standard also allows the tag to be truncated.
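The four phases can be put together in a short Python sketch. It follows this article's bit-string notation (a 320-bit integer whose most significant 128 bits are the rate $S_r$) rather than the exact byte ordering and IV encoding of SP 800-232, so it will not reproduce official test vectors. `toy_permutation` is an insecure stand-in for the Ascon permutation, `PLACEHOLDER_IV` is a hypothetical IV, and all function names are illustrative.

```python
import hmac

STATE_BITS, RATE_BITS, KEY_BITS = 320, 128, 128
CAP_BITS = STATE_BITS - RATE_BITS            # 192-bit capacity S_c
RATE_BYTES = RATE_BITS // 8
CAP_MASK = (1 << CAP_BITS) - 1
MASK128 = (1 << 128) - 1
PLACEHOLDER_IV = 0x0123456789ABCDEF          # hypothetical, NOT the IV from SP 800-232


def toy_permutation(state: int, rounds: int) -> int:
    """INSECURE stand-in for p^rounds; plug in a real Ascon permutation instead."""
    mask = (1 << STATE_BITS) - 1
    for r in range(rounds):
        state = ((state << 7) | (state >> (STATE_BITS - 7))) & mask
        state = (state * 0x9E3779B97F4A7C15 + r) & mask
    return state


def pad(block: bytes) -> int:
    """Append a 1 bit and 0 bits so that a block shorter than 128 bits fills the rate."""
    l = 8 * len(block)
    return (int.from_bytes(block, "big") << (RATE_BITS - l)) | (1 << (RATE_BITS - 1 - l))


def init_and_absorb_ad(key, nonce, ad, p, iv):
    k = int.from_bytes(key, "big")
    state = (iv << 256) | (k << 128) | int.from_bytes(nonce, "big")   # S = IV || K || N
    state = p(state, 12) ^ k                       # p^12, then XOR K into the last 128 bits
    if ad:                                         # AD blocks; the last one is always padded
        full = len(ad) - len(ad) % RATE_BYTES
        for i in range(0, full, RATE_BYTES):
            state = p(state ^ (int.from_bytes(ad[i:i + RATE_BYTES], "big") << CAP_BITS), 8)
        state = p(state ^ (pad(ad[full:]) << CAP_BITS), 8)
    return state ^ 1, k                            # domain separation bit in the capacity


def finalize(state, k, p):
    state ^= k << (STATE_BITS - RATE_BITS - KEY_BITS)     # K directly after the rate
    state = p(state, 12)
    return ((state & MASK128) ^ k).to_bytes(16, "big")    # T = last 128 bits XOR K


def encrypt(key, nonce, ad, plaintext, p=toy_permutation, iv=PLACEHOLDER_IV):
    state, k = init_and_absorb_ad(key, nonce, ad, p, iv)
    out = bytearray()
    full = len(plaintext) - len(plaintext) % RATE_BYTES
    for i in range(0, full, RATE_BYTES):
        state ^= int.from_bytes(plaintext[i:i + RATE_BYTES], "big") << CAP_BITS
        out += (state >> CAP_BITS).to_bytes(RATE_BYTES, "big")      # C_i = S_r
        state = p(state, 8)
    last = plaintext[full:]
    state ^= pad(last) << CAP_BITS                  # no permutation after the last block
    out += ((state >> CAP_BITS) >> (RATE_BITS - 8 * len(last))).to_bytes(len(last), "big")
    return bytes(out), finalize(state, k, p)


def decrypt(key, nonce, ad, ciphertext, tag, p=toy_permutation, iv=PLACEHOLDER_IV):
    state, k = init_and_absorb_ad(key, nonce, ad, p, iv)
    out = bytearray()
    full = len(ciphertext) - len(ciphertext) % RATE_BYTES
    for i in range(0, full, RATE_BYTES):
        c = int.from_bytes(ciphertext[i:i + RATE_BYTES], "big")
        out += ((state >> CAP_BITS) ^ c).to_bytes(RATE_BYTES, "big")  # P_i = S_r XOR C_i
        state = p((state & CAP_MASK) | (c << CAP_BITS), 8)           # S_r <- C_i
    last = ciphertext[full:]
    l = 8 * len(last)
    p_last = ((state >> CAP_BITS) >> (RATE_BITS - l)) ^ int.from_bytes(last, "big")
    p_bytes = p_last.to_bytes(len(last), "big")
    out += p_bytes
    state ^= pad(p_bytes) << CAP_BITS               # rebuild the padded last block
    if not hmac.compare_digest(finalize(state, k, p), tag):
        raise ValueError("authentication failed")
    return bytes(out)


ct, tag = encrypt(bytes(16), bytes(16), b"DEVICE_ID=1234\r\n", b"ALLOW_REMOTE=0\r\n")
assert decrypt(bytes(16), bytes(16), b"DEVICE_ID=1234\r\n", ct, tag) == b"ALLOW_REMOTE=0\r\n"
```

Note that `decrypt` only calls the same permutation `p` as `encrypt`: instead of XORing, it overwrites the rate with the received ciphertext block, which yields the same state, so the inverse permutation is never needed.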

Domain separation

As mentioned above, after processing the associated data, the state $S$ is XORed with a 1. This ensures domain separation between the associated data and the plaintext. Without domain separation an attacker could manipulate the ciphertext and associated data in such a way that the authentication tag remains valid even though the data has been changed.

For demonstration purposes let’s assume we don’t have domain separation and we have one block of associated data ($AD$) and the plaintext $P = P_1||P_2||P_3$. Furthermore, let $S = S_r || S_c$ and the function $f$ be the finalization. The encryption would look like this:

\[\begin{align*} S_r & \leftarrow S_r \oplus AD \\ S & \leftarrow p(S) \\ S_r & \leftarrow S_r \oplus P_1 \qquad C_1 \leftarrow S_r \\ S & \leftarrow p(S) \\ S_r & \leftarrow S_r \oplus P_2 \qquad C_2 \leftarrow S_r \\ S & \leftarrow p(S) \\ S_r & \leftarrow S_r \oplus P_3 \qquad C_3 \leftarrow S_r \\ S & \leftarrow p(S) \\ T & \leftarrow f(S) \\ \end{align*}\]

The output is $(C, T)$ with $C = C_1 || C_2 || C_3$.

We can see that if we used $AD || P_1$ as associated data and $P_2 || P_3$ as plaintext, exactly the same operations would be performed, and the same tag T would result.

Therefore, since

$\quad \text{Enc}(AD,\ P_1 || P_2 || P_3)\ \text{ produces }\ (C_1 || C_2 || C_3,\ T)$

and

$\quad \text{Enc}(AD || P_1,\ P_2 || P_3)\ \text{ produces }\ (C_2 || C_3,\ T)$

with $T$ being the same in both cases, an attacker who knows $P_1$ and intercepts the ciphertext $(C_1 || C_2 || C_3,\ T)$ could construct a new valid ciphertext $(C_2 || C_3,\ T)$ with associated data $AD || P_1$.

Example

Let’s have a look at a concrete example. In the following we want to send a configuration to devices. The associated data contains the device ID of the device which should receive the configuration. The configuration itself consists of several parameters. One of these parameters is ALLOW_REMOTE which controls whether remote access to the device is allowed or not. The configuration is encrypted.

A valid pair of associated data and plaintext could look like this:

  • $AD$ = “DEVICE_ID=1234\r\n…”
  • $P_1$ = “ALLOW_REMOTE=0\r\n”
  • $P_2$ = …

An attacker who can intercept $(AD,\ C_1||C_2||\ldots,\ T)$ could instead send $(AD || P_1,\ C_2||\ldots,\ T)$ to the device. Now, “ALLOW_REMOTE=0” is part of the associated data and no longer part of the configuration. Hence, if the device’s default configuration is ALLOW_REMOTE=1, remote access is now enabled.

The domain separation bit ensures that a different tag is generated in these two cases.

\[\begin{align*} S_r & \leftarrow S_r \oplus AD &\\ S & \leftarrow p(S) &\\ S_c & \leftarrow S_c \oplus 1 & \text{(domain separation)} \\ S_r & \leftarrow S_r \oplus P_1 \qquad &C_1 \leftarrow S_r \\ S & \leftarrow p(S) &\\ \vdots \\ T & \leftarrow f(S) &\\ \end{align*}\]

Once the domain separation bit has been added to $S_c$, the state is different and hence the resulting tag T will also be different. Because the domain separation bit is added to the capacity part $S_c$, which is never XORed with user-controlled data, it cannot be canceled out by modifying $AD$ or $P_1$.
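The tag collision can be reproduced with a small Python sketch of the simplified scheme from the equations above (full 128-bit blocks, no padding). `toy_p` and `toy_f` are hypothetical, deliberately trivial stand-ins for the permutation and the finalization, chosen so the outcome is easy to check by hand; `S0` stands in for the key-dependent initial state, and the parameters `PARAM_A`/`PARAM_B` are made up for the example.

```python
STATE_BITS, CAP_BITS = 320, 192
MASK = (1 << STATE_BITS) - 1
S0 = 0x123456789ABCDEF0 << 200          # stand-in for the key-dependent initial state


def toy_p(s: int) -> int:
    """INSECURE, linear stand-in for p: rotate the 320-bit state left by 1 bit."""
    return ((s << 1) | (s >> (STATE_BITS - 1))) & MASK


def toy_f(s: int) -> int:
    """Stand-in for the finalization f: take the last 128 bits as the tag."""
    return s & ((1 << 128) - 1)


def simplified_encrypt(ad_blocks, pt_blocks, domain_separation, p=toy_p, f=toy_f):
    s = S0
    for a in ad_blocks:
        s = p(s ^ (a << CAP_BITS))          # S_r <- S_r XOR AD; S <- p(S)
    if domain_separation:
        s ^= 1                              # S_c <- S_c XOR 1
    ciphertext = []
    for m in pt_blocks:
        s ^= m << CAP_BITS                  # S_r <- S_r XOR P_i
        ciphertext.append(s >> CAP_BITS)    # C_i <- S_r
        s = p(s)
    return ciphertext, f(s)


def block(text: bytes) -> int:
    """Turn up to 16 bytes into one 128-bit block (space padded)."""
    return int.from_bytes(text.ljust(16, b" "), "big")


AD = block(b"DEVICE_ID=1234\r\n")
P1 = block(b"ALLOW_REMOTE=0\r\n")
P2, P3 = block(b"PARAM_A=1\r\n"), block(b"PARAM_B=2\r\n")   # hypothetical parameters

for ds in (False, True):
    _, t_orig = simplified_encrypt([AD], [P1, P2, P3], ds)
    _, t_forged = simplified_encrypt([AD, P1], [P2, P3], ds)
    print(f"domain separation={ds}: same tag? {t_orig == t_forged}")
```

Without domain separation the forged pair $(AD || P_1,\ C_2 || C_3)$ receives the same tag for any choice of `p` and `f`, because both runs execute identical operations. With the domain separation bit, the flipped bit enters the state at a different point in the two runs, and here the tags differ.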

Permutations

Each round of the permutation consists of three layers:

  • Substitution layer $p_S$ (S-Boxes)
  • Linear diffusion layer $p_L$
  • Addition of constants $p_C$

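Each round applies every layer exactly once: first $p_C$, then $p_S$, and finally $p_L$. This is written as $p = p_L \circ p_S \circ p_C$. The permutation repeats this round 12 times (initialization and finalization) or 8 times (processing of associated data and plaintext in Ascon-AEAD128).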
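The round structure can be sketched in Python on the five 64-bit words $S_0, \ldots, S_4$. The S-box and linear-layer formulas are the ones described in the following subsections, and the constant index `16 - rounds + i` assumes that a permutation with $n$ rounds uses the last $n$ entries of the constant table. The function names are illustrative; this is a readability sketch that has not been checked against official test vectors and is not constant-time.

```python
MASK64 = (1 << 64) - 1

# Round constants c_0..c_15; a permutation with n rounds uses the last n of them.
CONSTANTS = [0x3c, 0x2d, 0x1e, 0x0f, 0xf0, 0xe1, 0xd2, 0xc3,
             0xb4, 0xa5, 0x96, 0x87, 0x78, 0x69, 0x5a, 0x4b]
# Right-rotation amounts of the linear layer for S_0..S_4.
ROTATIONS = [(19, 28), (61, 39), (1, 6), (10, 17), (7, 41)]


def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (64 - n))) & MASK64


def p_c(s: list, c: int) -> list:
    return [s[0], s[1], s[2] ^ c, s[3], s[4]]          # XOR the constant into S_2


def p_s(s: list) -> list:
    x0, x1, x2, x3, x4 = s                              # bitsliced 5-bit S-box
    x0 ^= x4
    x4 ^= x3
    x2 ^= x1
    t0, t1, t2, t3, t4 = ~x0 & x1, ~x1 & x2, ~x2 & x3, ~x3 & x4, ~x4 & x0
    x0, x1, x2, x3, x4 = x0 ^ t1, x1 ^ t2, x2 ^ t3, x3 ^ t4, x4 ^ t0
    x1 ^= x0
    x0 ^= x4
    x3 ^= x2
    x2 = ~x2 & MASK64
    return [x0, x1, x2, x3, x4]


def p_l(s: list) -> list:
    return [x ^ rotr(x, a) ^ rotr(x, b) for x, (a, b) in zip(s, ROTATIONS)]


def ascon_permutation(state, rounds: int = 12) -> tuple:
    s = list(state)
    for i in range(rounds):
        s = p_l(p_s(p_c(s, CONSTANTS[16 - rounds + i])))   # p = p_L o p_S o p_C
    return tuple(s)
```

Reading the composition from the inside out shows the order: `p_c` is applied first and `p_l` last.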

Substitution layer

The substitution layer divides the state $S$ into five 64 bit words:

$S = S_0 || S_1 || S_2 || S_3 || S_4$

Then, it modifies the state as follows:

It takes the bits which are at the same position in each word and applies a 5-bit S-Box to these bits. The S-Box produces a 5-bit output which is then used to replace the original bits. This is done for all 64 bit positions and can be done in parallel. The process is illustrated in the following diagram:

The S-Box can be implemented as a lookup table or as a sequence of bitwise operations. The lookup table is shown in the following:


Source: SP 800-232

However, implementing the S-Box with a lookup table requires more memory and could also allow cache-timing attacks. Therefore, the S-Box was designed in a way that allows an efficient implementation with bitwise operations:


Source: SP 800-232
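To illustrate that both implementations agree, the following Python sketch evaluates the bitwise S-box formula on five words and checks it against the 5-bit lookup table for all 32 inputs. The table values are Ascon's S-box (they reproduce the two cycles listed further below); treating $x_0$ (the bit from $S_0$) as the most significant bit is the convention assumed here, and the function names are illustrative.

```python
MASK64 = (1 << 64) - 1

# 5-bit lookup table; bit 4 (MSB) of an index/value belongs to S_0, bit 0 to S_4.
SBOX = [0x04, 0x0b, 0x1f, 0x14, 0x1a, 0x15, 0x09, 0x02,
        0x1b, 0x05, 0x08, 0x12, 0x1d, 0x03, 0x06, 0x1c,
        0x1e, 0x13, 0x07, 0x0e, 0x00, 0x0d, 0x11, 0x18,
        0x10, 0x0c, 0x01, 0x19, 0x16, 0x0a, 0x0f, 0x17]


def sbox_bitsliced(x0, x1, x2, x3, x4):
    """Apply the S-box to all 64 columns at once using only XOR, AND and NOT."""
    x0 ^= x4
    x4 ^= x3
    x2 ^= x1
    t0, t1, t2, t3, t4 = ~x0 & x1, ~x1 & x2, ~x2 & x3, ~x3 & x4, ~x4 & x0
    x0, x1, x2, x3, x4 = x0 ^ t1, x1 ^ t2, x2 ^ t3, x3 ^ t4, x4 ^ t0
    x1 ^= x0
    x0 ^= x4
    x3 ^= x2
    x2 = ~x2 & MASK64
    return x0, x1, x2, x3, x4


def sbox_single(v: int) -> int:
    """Evaluate the bitsliced S-box on one 5-bit column value v (in bit position 0)."""
    bits = [(v >> (4 - i)) & 1 for i in range(5)]        # x0 is the MSB
    out = sbox_bitsliced(*bits)
    return sum((w & 1) << (4 - i) for i, w in enumerate(out))


assert all(sbox_single(v) == SBOX[v] for v in range(32))
```

The bitwise version processes all 64 columns with a fixed sequence of word operations and no table lookups, which is why it avoids cache-timing leaks in compiled code (Python itself gives no such timing guarantees).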

S-boxes usually have several useful properties. Ascon's S-box is invertible (i.e. if $p_S(x_0) = p_S(x_1)$ then $x_0 = x_1$) and has no fixed points (i.e. $p_S(x) \neq x$ for all $x$). Another interesting property is the algebraic degree of the S-box, which is 2 for Ascon. This means that each output bit can be expressed as a polynomial in the input bits in which no term multiplies more than 2 variables.

The advantage of a low algebraic degree is that the S-box is easier to protect with masking, a countermeasure against side-channel attacks in which sensitive data is split into multiple random shares. The shares are processed independently and only combined at the end of the computation. This makes it harder for an attacker to extract sensitive information by observing the power consumption or electromagnetic radiation of a device. The lower the algebraic degree of a function, the fewer shares are needed to achieve a given level of security.

Example

Let’s assume we have a register $x$. We want to perform some operations on $x$, e.g. we compute $x = x << n$ (left shift by n bits). If we compute the left shift directly, an attacker might be able to observe the value of $x$ via side-channel attacks.

To reduce the risk of side-channel attacks, we can use masking:

Instead of using one register, we can use two registers $x_0$ and $x_1$. We assign a random value $r$ to register $x_0$ and the value $x \oplus r$ to register $x_1$:

\[\begin{align*} x_0 &\leftarrow r \\ x_1 &\leftarrow x \oplus r \end{align*}\]

Now, we can perform the left shift operation on both registers independently:

\[\begin{align*} y_0 &\leftarrow x_0 << n\\ y_1 &\leftarrow x_1 << n \end{align*}\]

Since $x_0$ is random, an attacker gains no information about $x$ by observing $x_0$ alone. Similarly, because $x_1$ is masked with the random value $r$, observing $x_1$ alone also reveals no information about $x$.

Finally, we can combine the results from both registers to get the final result:

\[\begin{align*} y_0 \oplus y_1 & = (x_0 <<n) \oplus (x_1 << n) \\ & = (r << n) \oplus ((x \oplus r) << n) \\ & = (r << n) \oplus ((x << n) \oplus (r << n)) \\ & = (r << n) \oplus (x << n) \oplus (r << n) \\ & = x << n \end{align*}\]

Of course, if an attacker can observe both registers $x_0$ and $x_1$ at the same time, he can easily compute $x$. However, if the operations on $x_0$ and $x_1$ are performed independently and not at the same time, it becomes less likely for an attacker to extract sensitive information.
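The masked shift can be written as a short Python sketch; the function names are illustrative, 64-bit registers are modeled by masking, and `secrets.randbits` stands in for a hardware random number generator. It works because the shift (including the truncation to 64 bits) is linear with respect to XOR, so each share can be processed on its own; non-linear operations such as the AND in the S-box cannot be split this easily, which is where the algebraic degree matters. Python offers no side-channel guarantees, so this only demonstrates the arithmetic.

```python
import secrets

MASK64 = (1 << 64) - 1          # model 64-bit registers


def mask(x: int) -> tuple:
    """Split x into two shares: x0 = r and x1 = x XOR r."""
    r = secrets.randbits(64)
    return r, x ^ r


def masked_shift_left(shares: tuple, n: int) -> tuple:
    """Shift each share independently; x itself is never formed."""
    x0, x1 = shares
    return (x0 << n) & MASK64, (x1 << n) & MASK64


def unmask(shares: tuple) -> int:
    """Recombine the shares: y0 XOR y1."""
    y0, y1 = shares
    return y0 ^ y1


x = 0x0123456789ABCDEF
assert unmask(masked_shift_left(mask(x), 4)) == (x << 4) & MASK64
```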

Linear diffusion layer

The linear diffusion layer provides diffusion within each of the five 64-bit words of the state $S$ via rotation and XOR operations. Although each word is only mixed with itself, the substitution layer combines bits from all five words, so over multiple rounds these local changes spread across the entire state $S$. The operations that are performed on each word are as follows:


Source: SP 800-232
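In the following Python sketch, the rotation amounts are those of Ascon's $\Sigma_0, \ldots, \Sigma_4$, where each $\Sigma_i(x) = x \oplus (x \ggg a) \oplus (x \ggg b)$ uses right rotations; the function names are illustrative.

```python
MASK64 = (1 << 64) - 1

# Right-rotation amounts (a, b) for Sigma_0 .. Sigma_4.
ROTATIONS = [(19, 28), (61, 39), (1, 6), (10, 17), (7, 41)]


def rotr(x: int, n: int) -> int:
    """Rotate a 64-bit word right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def linear_layer(words):
    """p_L: each word S_i is XORed only with rotated copies of itself."""
    return tuple(x ^ rotr(x, a) ^ rotr(x, b) for x, (a, b) in zip(words, ROTATIONS))


# A single set bit in any word becomes three set bits after p_L.
print([bin(w).count("1") for w in linear_layer((1, 1, 1, 1, 1))])   # [3, 3, 3, 3, 3]
```

A word that is all 0s or all 1s is left unchanged, since $1 \oplus 1 \oplus 1 = 1$ and $0 \oplus 0 \oplus 0 = 0$ in every bit position.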

Addition of constants

In addition to the linear diffusion layer and the substitution layer, each round also includes an addition of constants: a round-dependent constant is XORed into the word $S_2$ of the state. The standard defines 16 constants $c_i$, shown in the following table; a permutation with $n$ rounds uses the last $n$ of them, so 12 rounds use $c_4, \ldots, c_{15}$ and 8 rounds use $c_8, \ldots, c_{15}$:

$i$   constant              $i$   constant
0     0x000000000000003c    8     0x00000000000000b4
1     0x000000000000002d    9     0x00000000000000a5
2     0x000000000000001e    10    0x0000000000000096
3     0x000000000000000f    11    0x0000000000000087
4     0x00000000000000f0    12    0x0000000000000078
5     0x00000000000000e1    13    0x0000000000000069
6     0x00000000000000d2    14    0x000000000000005a
7     0x00000000000000c3    15    0x000000000000004b
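The selection of constants can be expressed in a few lines of Python. The list holds $c_0, \ldots, c_{15}$ from the table above; `round_constants` is an illustrative helper that assumes a permutation with $n$ rounds uses the last $n$ entries.

```python
# Round constants c_0 .. c_15 from the table above.
CONSTANTS = [0x3c, 0x2d, 0x1e, 0x0f, 0xf0, 0xe1, 0xd2, 0xc3,
             0xb4, 0xa5, 0x96, 0x87, 0x78, 0x69, 0x5a, 0x4b]


def round_constants(rounds: int) -> list:
    """Constants XORed into S_2 by a permutation with `rounds` rounds."""
    if not 1 <= rounds <= 16:
        raise ValueError("rounds must be between 1 and 16")
    return CONSTANTS[16 - rounds:]


print([hex(c) for c in round_constants(8)])
# ['0xb4', '0xa5', '0x96', '0x87', '0x78', '0x69', '0x5a', '0x4b']

# In every constant the two nibbles add up to 0xf: one counts down, the other up.
assert all((c >> 4) + (c & 0xF) == 0xF for c in CONSTANTS)
```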

Without that addition, the cipher might be vulnerable to certain attacks. For instance, if no constant were added, it might be possible that after a certain number of rounds the state returns to its initial value.

For instance, the S-Box of Ascon has two cycles:

  • Cycle 1 (length 26):
    0 → 4 → 1a → 1 → b → 12 → 7 → 2 → 1f → 17 → 18 → 10 → 1e → f → 1c → 16 → 11 → 13 → e → 6 → 9 → 5 → 15 → d → 3 → 14 → 0
  • Cycle 2 (length 6):
    8 → 1b → 19 → c → 1d → a → 8

If the initial state is 0, every column has the value 0, and after 26 rounds we end up in the same state again (when only $p_L$ and $p_S$ are used as round functions).

If we instead start with the state where all bits are 0 except those of $S_1$, which are all 1s (so every column has the value 8 from the second cycle), i.e.

\[S_0 = 0, \quad S_1 = \text{0xffffffffffffffff}, \quad S_2 = 0, \quad S_3 = 0, \quad S_4 = 0\]

we end up in the same state again after only 6 rounds (note that $p_L$ does not change a word if all bits of the word are set to 0 or all are set to 1). This is of course not desirable for a cryptographic primitive. Adding a round constant mitigates this risk.
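The following Python sketch reproduces both cycle lengths by iterating rounds that consist only of $p_S$ and $p_L$ (no constants); the helper names are illustrative.

```python
MASK64 = (1 << 64) - 1
ROTATIONS = [(19, 28), (61, 39), (1, 6), (10, 17), (7, 41)]


def rotr(x, n):
    return ((x >> n) | (x << (64 - n))) & MASK64


def p_s(s):
    """Bitsliced Ascon S-box on the five 64-bit words."""
    x0, x1, x2, x3, x4 = s
    x0 ^= x4
    x4 ^= x3
    x2 ^= x1
    t0, t1, t2, t3, t4 = ~x0 & x1, ~x1 & x2, ~x2 & x3, ~x3 & x4, ~x4 & x0
    x0, x1, x2, x3, x4 = x0 ^ t1, x1 ^ t2, x2 ^ t3, x3 ^ t4, x4 ^ t0
    x1 ^= x0
    x0 ^= x4
    x3 ^= x2
    x2 = ~x2 & MASK64
    return (x0, x1, x2, x3, x4)


def p_l(s):
    return tuple(x ^ rotr(x, a) ^ rotr(x, b) for x, (a, b) in zip(s, ROTATIONS))


def period_without_constants(start, max_rounds=100):
    """Number of rounds p_L o p_S (no p_C) until the state returns to `start`."""
    s = start
    for n in range(1, max_rounds + 1):
        s = p_l(p_s(s))
        if s == start:
            return n
    return None


print(period_without_constants((0, 0, 0, 0, 0)))        # 26
print(period_without_constants((0, MASK64, 0, 0, 0)))   # 6
```

Setting $S_2$ to all 1s instead gives every column the value 4, which lies in the first cycle, so that state also repeats only after 26 rounds.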

Some additional facts

  • The ciphertext has exactly the same length as the plaintext. The last block is padded internally, but only as many ciphertext bits as there are plaintext bits are output, so there is no padding overhead even if the plaintext length is not a multiple of 128 bits.
  • No separate mode of operation is necessary (such as GCM when data is encrypted with AES) since Ascon-AEAD128 is already an authenticated encryption scheme with associated data.
  • Ascon-AEAD128 cannot process blocks in parallel, because each block depends on the state left behind by the previous blocks. Instead, it was optimized for low latency, low memory usage, short messages, and parallelism within the permutation (the 64 S-boxes of a round can be evaluated simultaneously).
  • The original paper recommended not encrypting more than $2^{64}$ bytes with the same key. NIST is more conservative and reduced the limit to $2^{54}$ bytes (16 PiB).
    • If 100 MB are encrypted per second, the same key can be used for about 5 years.
    • With the original limit of $2^{64}$ bytes, a single key could even be used for about 5,500 years.
  • Ascon is based on simple primitives (bitwise operations, rotations, XOR) which makes it suitable for many platforms and constrained devices such as IoT devices.
  • Ascon is inverse-free: decryption uses the same permutation as encryption, so the inverse permutation never needs to be implemented. This reduces the implementation size and the memory requirements. The decryption is shown in the image below.
  • NIST has standardized Ascon with 128-bit keys. However, the original paper also specifies a variant with 160-bit keys (Ascon-80pq) for additional post-quantum security. This 160-bit variant is not part of the NIST standard.
  • The best known attack only works on a reduced version with 7 instead of 12 rounds and still requires about $2^{104}$ operations.


Source: SP 800-232
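The key-lifetime figures in the list above can be checked with a short calculation, assuming that 100 MB means $100 \cdot 2^{20}$ bytes and that a year has 365.25 days:

\[\begin{align*} \frac{2^{54}\ \text{bytes}}{100 \cdot 2^{20}\ \text{bytes/s}} &= \frac{2^{34}}{100}\ \text{s} \approx 1.72 \cdot 10^{8}\ \text{s} \approx 5.4\ \text{years} \\ \frac{2^{64}\ \text{bytes}}{100 \cdot 2^{20}\ \text{bytes/s}} &= \frac{2^{44}}{100}\ \text{s} \approx 1.76 \cdot 10^{11}\ \text{s} \approx 5{,}575\ \text{years} \end{align*}\]

With decimal megabytes ($10^8$ bytes per second), the first value becomes about 5.7 years.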

Hashing

The same construction can also be used for hashing. The standard defines Ascon-Hash256, which produces a 256-bit hash value. Unlike Ascon-AEAD128, it uses no key and a rate of only 64 bits (the word $S_0$).

After absorbing the complete message, the output $H$ is extracted from the rate part of the state. Since the output length is 256 bits, four blocks of 64 bits are extracted, with the permutation applied between them. Hence, the hash $H$ is given by $H = H_0 || H_1 || H_2 || H_3$.


Source: SP 800-232
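The squeezing step can be sketched in Python as follows. The state is modeled as five 64-bit words with $S_0$ as the rate; `toy_p12` is an insecure, hypothetical stand-in for the 12-round permutation, and the big-endian byte conversion is an assumption of this sketch rather than the exact encoding of SP 800-232.

```python
from typing import Callable, Tuple

State = Tuple[int, int, int, int, int]      # S_0 .. S_4, 64-bit words


def squeeze_256(state: State, p12: Callable[[State], State]) -> bytes:
    """Extract H = H_0 || H_1 || H_2 || H_3 from the 64-bit rate word S_0."""
    blocks = []
    for i in range(4):
        blocks.append(state[0].to_bytes(8, "big"))   # H_i <- S_0 (the rate)
        if i < 3:
            state = p12(state)                        # permute between output blocks
    return b"".join(blocks)


def toy_p12(state: State) -> State:
    """INSECURE stand-in for p^12: rotate the five words by one position."""
    return state[1:] + state[:1]


print(squeeze_256((1, 2, 3, 4, 5), toy_p12).hex())
```

An XOF works the same way, except that the loop simply continues until the requested number of output bits has been produced.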

The same construction can also be used to build extendable-output functions (XOFs) which can produce variable-length hashes.

Conclusion

Ascon brings modern, lightweight cryptography to constrained devices. Standardized in NIST SP 800-232, it unifies authenticated encryption (Ascon-AEAD128), hashing (Ascon-Hash256), and XOFs (Ascon-XOF128/CXOF128) on a simple sponge design with strong domain separation and well-chosen permutations. Implementations are small, fast, and inverse-free, making them ideal for IoT and embedded systems.

Interactive Demo

This interactive demo allows you to explore the Ascon permutation. You can toggle individual bits in the state and see how they propagate through the rounds. You can also enable/disable the different round functions (pC, pL, pS) to see their effects.
