Given a plaintext \(P=P_1P_2P_3\dots\) and an initialization vector IV, cipher block chaining (CBC) works as follows:

\[C_i=E(P_i\oplus C_{i-1})\]

for \(i>0\) and \(C_0 = IV\).

When using CBC ensure to fullfill the following requirements.

Unique IV

The IV should be unique for each message encrypted under the same key. Otherwise, if two messages have a common prefix, the ciphertext for the two prefixes will be the same. Thus, the ciphertext would leak information about shared prefixes of two messages, compromising the security. This is easy to see:

To illustrate this, consider two plaintexts, \(P=P_1P_2P_3\dots\) and \(P'=P'_1P'_2P'_3\dots\) both encrypted with the same IV. Assuming the first \(n\) plaintext blocks are identical, i.e. \(P_i = P'_i\) for \(1 \leq i \leq n\), the encryption process starts as follows:

The first ciphertext block of both plaintexts is computed as follows:

\[C_1 = E(P_1\oplus IV) \qquad C'_1 = E(P'_1\oplus IV)\]

Since \(P_1 = P'_1\), it follows that \(P_1\oplus IV = P'_1\oplus IV\). Encrypting both with the same key yields \(C_1 = E(P_1\oplus IV) = E(P'_1\oplus IV) = C'_1\).

Now, when encrypting the next plaintext blocks \(P_2\) and \(P'_2\), we compute:

\[C_2 = E(P_2\oplus C_1) \qquad C'_2 = E(P'_2\oplus C'_1)\]

Instead of the IV, we now use the ciphertext blocks which we got from the previous step to compute the input for the encryption function. If \(C_1 = C'_1\) the ciphertext block \(C_2\) will be identical to \(C'_2\) if \(P_2 = P'_2\).

This pattern continues until we encounter two plaintext blocks which are not equal anymore.

Maximum Size of Plaintext

The extend of data that can be securely encrypted using Cipher Block Chaining with a single key is constrained and varies based on the desired level of security you want to achieve. This limitation arises because as soon as we encounter identical ciphertext blocks for the same key, the ciphertext leaks information about the plaintext.

Let’s assume we have two identical ciphertext blocks \(C_i\) and \(C_j\). The ciphertexts blocks are the result of

\[C_i=E(P_i\oplus C_{i-1}) \qquad C_j=E(P_j\oplus C_{j-1})\]

As both ciphtertext blocks are equal we get:

\[\begin{split} E(P_i\oplus C_{i-1}) &= E(P_j\oplus C_{j-1}) \\[2ex] P_i\oplus C_{i-1} &= P_j\oplus C_{j-1} \\[2ex] P_i\oplus P_j &= C_{j-1}\oplus C_{i-1} \\[2ex] \end{split}\]

So by computing XOR between the ciphertext blocks \(C_{i-1}\) and \(C_{j-1}\), we get the result of the XOR between \(P_i\) and \(P_j\). Due to that, if one plaintext block is known we could easily recover the other plaintext block. But even if neither of the two blocks is known, we might be able to recover the plaintext blocks by exploiting patterns or statistical analysis.

The question that remains is how many data we should encrypt with CBC at most using the same key before the risk becomes unacceptable.

The encryption functions behaves like a function that gives us blocks of random data. Therefore, we just need to determine when this function is likely to return a block of random data that we’ve seen before. Let \(b\) be the size of a block measured in number of bits. Then, from the birthday attack, we know that this happens with high probability (~50%) when we have seen \(2^{b/2}\) blocks. When we use AES as the block cipher, this is likely to happen after the encryption of \(2^{64}\) plaintext blocks.

However, a probability of ~50% is already rather high and typically we want a much better security level. For instance, in NIST SP 800-38D section 8 the recommendation is to have a probability that is not worse than \(2^{-32}\). When using this value as the probability \(p\) in equation

\[n \approx \sqrt{2^{b+1}\ln\left(\frac{1}{1-p}\right)}\]

which we derived for the birthday attack we get

\[n \approx \sqrt{2^{b+1}\ln\left(\frac{1}{1-p}\right)} = \sqrt{2^{129}\ln\left(\frac{1}{1-2^{-32}}\right)} \approx 2^{48}\]

So we should not encrypt more than \(2^{48}\) plaintext blocks to have a security level of \(2^{-32}\). Given the 128 bit (16 bytes) block size of AES, we should encrypt no more than \(2^{48}\cdot 2^4\) bytes.

Therefore, AES-CBC allows us to encrypt data sizes of up to 4 petabytes (PiB).

Unpredictable IV

The IV used for CBC must be unpredictable. Otherwise, an attacker might be able to determine whether or not a ciphertext is the encryption of a particular plaintext.

Consider a scenario where an attacker is in the posession of a ciphertext \(C = C_1C_2C_3\dots\) along with the initialization vector \(IV_c\) that was used for the encryption. Let’s assume the attacker has access to the encryption function and can encrypt arbitrary plaintexts using this function to get their corresponding ciphertexts. We also assume that this function uses the same key and that the attacker knows the IV that is used by the encryption function beforehand. Let’s call this initialization vector \(IV_n\).

The ciphertext \(C\) is the result of the encryption of \(P=P_1P_2P_3\dots\), i.e.

\[C_1 = E(P_1 \oplus IV_c) \qquad C_2 = E(P_2 \oplus C_1) \quad \dots\]

The attacker wants to check if the plaintext \(P\) is equal to some message \(M\), i.e. \(P=M\). When the attacker encrypts the message \(M\), the attacker gets the following ciphertext for the initialization vector \(IV_n\):

\[C'_1 = E(M_1 \oplus IV_n) \qquad C'_2 = E(M_2 \oplus C'_1) \quad \dots\]

Even if \(M=P\), the ciphertexts would differ as the initialization vectors are different. However, if the attacker knows \(IV_n\) beforehand, the attacker could modify \(M_1\) as follows:

\[M'_1 = M_1\oplus IV_n \oplus IV_c\]

Now, if \(M' = M'_1M_2M_3\dots\) is encrypted, the attacker gets the ciphertext:

\[\begin{split} C''_1 &= E(M'_1 \oplus IV_n) = E((M_1\oplus IV_n \oplus IV_c) \oplus IV_n) = E(M_1 \oplus IV_c) \\[2ex] C''_2 &= E(M_2 \oplus C''_1) \\[2ex] \dots \end{split}\]

Now, if \(M\) equals \(P\), the ciphertexts \(C\) and \(C''\) would be equal, revealing to the attacker that the message \(M\) was encrypted, i.e. \(M=P\). To avoid this kind of attack, the IV must be unpredictable.