status: draft

category: specification

date: June 2026

AAD Canonicalization Specification

abstract

This document specifies the requirements and implementation details for AAD canonicalization. It is intended for implementers and developers working with the gnu.foo platform.

AEAD Additional Authenticated Data (AAD) Canonicalization Specification

Version: 2.0 | Status: Draft | Date: 2026-04-27

Abstract

This document defines a JSON-based profile for constructing Additional Authenticated Data (AAD) values for use with Authenticated Encryption with Associated Data (AEAD). The profile uses the JSON Canonicalization Scheme (JCS, RFC 8785) to produce deterministic byte strings and defines a default set of contextual binding fields intended to reduce cross-context ciphertext reuse and confused-deputy behavior. Applications and protocols MAY define narrower profiles using the canonicalization and validation rules in this document. This document specifies construction of canonical AAD bytes. It does not modify AEAD algorithms, define an envelope format, or specify encryption/decryption APIs.

Introduction

AAD in AEAD constructions provides integrity protection for metadata without encrypting it. AEAD verification is byte-exact: if two parties serialize the same logical context differently, verification fails. Without a canonicalization rule, this causes interoperability failures across language runtimes and serialization libraries.

This document defines two layers:

Core canonicalization rules — how a JSON object becomes the byte string supplied as AAD to an AEAD algorithm. These rules apply to any conforming AAD object regardless of which field set it uses.
Default context-binding profile — one recommended field set for multi-tenant or multi-context applications. Implementors MAY define other profiles as long as they follow the core rules.

The core rules address non-deterministic serialization across implementations, ambiguous handling of Unicode and key ordering, and integer range inconsistencies. The default profile addresses confused-deputy attacks and cross-context ciphertext reuse by binding ciphertext to tenant, resource, and purpose.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

Applicability Statement

When this profile is a good fit

This profile is well-suited for multi-tenant storage encryption where a structured context object is already available, for encrypted tokens with structured metadata such as purpose, tenant, and resource, for cross-language systems that require deterministic AAD bytes from the same logical context, and for systems already using JSON metadata at the application layer.

When this profile is a poor fit

This profile is NOT RECOMMENDED for compact binary protocols where JSON serialization overhead is unacceptable, for COSE/CBOR ecosystems with existing AAD conventions, for protocols that already specify their own AAD construction rules, or for cases requiring negotiated schema registries or dynamic field sets unknown at build time. Applications in these categories SHOULD use a canonicalization approach native to their data format rather than adopting this profile.

Terminology

AAD — Additional Authenticated Data. Bytes passed to an AEAD algorithm that are integrity-protected but not encrypted.

AEAD — Authenticated Encryption with Associated Data. An encryption mode that provides both confidentiality and integrity, defined in [RFC5116].

JCS — JSON Canonicalization Scheme. A deterministic serialization algorithm for JSON defined in [RFC8785].

Canonical AAD bytes — the UTF-8 byte sequence produced by applying the JCS algorithm to a conforming AAD object.

Core AAD object — a JSON object conforming to §4.

Default profile — the field set defined in §6, providing context binding via v, tenant, resource, purpose, and optional ts.

Extension field — an application-defined field whose key begins with x_, as specified in §6.3.

Core AAD Object Rules

These rules apply to all conforming AAD objects. A default-profile object (§6) MUST also conform to these rules.

Object Structure

The AAD input MUST be a flat JSON object. Member values MUST NOT be nested objects, arrays, null, or booleans. Implementations MUST reject any input that is not a JSON object or that contains non-conforming value types.

All member keys MUST be non-empty strings matching the pattern [a-z][a-z0-9_]*. Keys MUST be unique within the object; implementations MUST reject inputs with duplicate member names.

Value Types

Each member value MUST be either a JSON string as specified in §4.3, or a non-negative JSON integer in the range 0 to 2⁵³ − 1 inclusive as specified in §4.4. No other value types are permitted.

String Constraints

String values are Unicode strings. After JSON string decoding, string values MUST NOT contain U+0000. String values MUST NOT be empty (minimum one Unicode code point after decoding). The canonical serialized JSON output MUST be encoded as UTF-8 and MUST NOT include a byte-order mark.

Note: Because [RFC8259] requires control characters, including U+0000, to be escaped inside JSON strings, a NUL can reach a decoded string value only through an escape sequence such as \u0000. JSON parsers accept such escapes as valid input; implementations MUST therefore inspect the decoded value and reject any string that contains U+0000.

Applications SHOULD normalize string inputs to Unicode Normalization Form C (NFC) before constructing AAD objects. AAD verification is byte-sensitive: two visually identical strings with different Unicode representations produce different byte sequences and cause verification to fail. Applications MUST document their normalization policy to ensure consistent AAD construction across implementations.
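To illustrate why the NFC recommendation matters, here is a minimal Python sketch using the standard unicodedata module. The helper name nfc and the sample strings are illustrative assumptions, not part of this specification.

```python
import unicodedata


def nfc(value: str) -> str:
    """Normalize an application-supplied string to NFC before it enters an AAD object."""
    return unicodedata.normalize("NFC", value)


# Two spellings of "café": precomposed U+00E9, and "e" followed by combining U+0301.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# They render identically but differ at the byte level, so AEAD verification
# would fail if the encrypting and decrypting parties each used a different form.
print(composed.encode("utf-8"))    # b'caf\xc3\xa9'
print(decomposed.encode("utf-8"))  # b'cafe\xcc\x81'

# After NFC both collapse to the same code points and therefore the same UTF-8 bytes.
print(nfc(decomposed) == composed)  # True
```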

Integer Bounds

Integer values MUST be in the range 0 to 2⁵³ − 1 (9007199254740991) inclusive. Implementations MUST reject values outside this range during both serialization and deserialization, regardless of the native integer capacity of the runtime. Implementations MUST NOT silently truncate or round out-of-range values.

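This range ensures safe round-trip handling across common runtimes, including JavaScript, whose Number type is an IEEE 754 double-precision value: every integer in [0, 2⁵³ − 1] is represented exactly, but above that bound distinct integers can collapse to the same value (for example, 2⁵³ and 2⁵³ + 1).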
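The following Python sketch shows one way to enforce the integer bounds and why the bound sits at 2⁵³ − 1. The function name check_aad_integer is hypothetical. Note that Python's bool is a subclass of int, so a JSON true would pass a naive isinstance check.

```python
MAX_SAFE_INTEGER = 2**53 - 1  # 9007199254740991, same value as JavaScript's Number.MAX_SAFE_INTEGER


def check_aad_integer(value: object) -> int:
    """Reject anything that is not an int in [0, 2**53 - 1]; never truncate or round."""
    # bool is a subclass of int in Python, so exclude it explicitly (booleans are not permitted).
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"not an integer: {value!r}")
    if not 0 <= value <= MAX_SAFE_INTEGER:
        raise ValueError(f"integer out of range: {value}")
    return value


# Why the bound exists: an IEEE 754 double cannot distinguish 2**53 from 2**53 + 1,
# so a peer that stores integers as doubles could silently change the value.
print(float(2**53) == float(2**53 + 1))                          # True
print(float(MAX_SAFE_INTEGER) == float(MAX_SAFE_INTEGER - 1))    # False
```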

Size Limit

The total canonical serialized AAD MUST NOT exceed 16,384 bytes (16 KiB). Implementations MUST reject inputs whose canonical form would exceed this limit.

The 16 KiB limit is informed by buffer constraints in hardware security modules and embedded cryptographic accelerators. It accommodates rich AAD while remaining practical for resource-constrained environments. Implementations SHOULD minimize AAD size where practical; excessive AAD may degrade performance on hardware-accelerated cryptographic modules.

Canonicalization

Serialization

Canonical AAD bytes are produced by applying the JSON Canonicalization Scheme [RFC8785] to a conforming core AAD object (§4) and encoding the result as UTF-8.

Conforming serialization MUST order member keys as specified by RFC 8785, which sorts member names by their UTF-16 code units. Because §4.1 restricts keys to ASCII characters, this ordering is identical to sorting by Unicode code point or by byte value. Serialized output MUST contain no whitespace between tokens and MUST NOT include trailing commas. String values MUST be serialized using minimal JCS escape sequences. Integer values MUST be serialized without leading zeros, fractional parts, or exponents. Non-BMP characters MUST be serialized as their UTF-8 byte sequences; surrogate-pair encoding and \uXXXX escape sequences MUST NOT be used for non-BMP code points.

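Implementations MUST use a conforming JCS library. Native JSON.stringify MUST NOT be used as a substitute: although RFC 8785 adopts the ECMAScript serialization rules for strings and numbers, JSON.stringify emits object members in property insertion order rather than in the sorted order JCS requires. General-purpose serializers in other languages commonly diverge further, for example by inserting whitespace or escaping non-ASCII characters by default.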
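The same pitfall exists outside JavaScript. As an illustration in Python, the standard library's json.dumps with default options differs from the JCS form of a simple AAD object in key order, whitespace, and non-ASCII handling. The object and the expected canonical string below are example data chosen for this sketch.

```python
import json

aad = {"v": 1, "tenant": "caf\u00e9"}

# Python's defaults: insertion order, a space after each separator, and
# non-ASCII characters escaped as \uXXXX.
default_output = json.dumps(aad)
print(default_output)  # {"v": 1, "tenant": "caf\u00e9"}

# The RFC 8785 canonical form of the same object: sorted keys, no whitespace,
# and "é" emitted as a literal character (UTF-8 encoded afterwards).
canonical = '{"tenant":"caf\u00e9","v":1}'  # \u00e9 is a Python escape here, i.e. a literal "é"

print(default_output == canonical)  # False
```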

UTF-8 Encoding

The canonical JSON string produced by §5.1 MUST be encoded as UTF-8. The resulting byte sequence is the AAD value passed to the AEAD algorithm. No byte-order mark is permitted.

Canonicalization Algorithm

The following steps produce canonical AAD bytes from a candidate input:

Implementations MUST verify the input is a JSON object and MUST reject any input that is not.
Implementations MUST reject any input in which two or more member names are identical.
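Implementations MUST reject any input in which a member key does not match the pattern [a-z][a-z0-9_]* in its entirety.
Implementations MUST reject any input in which a member value is neither a string nor an integer (§4.2).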
For each string value, implementations MUST decode JSON escape sequences and MUST reject the input if the decoded value contains U+0000 or is empty.
For each integer value, implementations MUST reject the input if the value is outside [0, 2⁵³ − 1].
If validating against a named profile, implementations MUST apply the profile-specific field constraints defined for that profile (e.g., §6 for the default profile).
Implementations MUST serialize the object using RFC 8785 (JCS).
Implementations MUST encode the resulting JSON string as UTF-8 and MUST reject the output if the byte length exceeds 16,384.
The resulting byte sequence is the canonical AAD value.
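The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it hand-rolls JCS serialization only for the restricted value domain of §4 (flat objects, ASCII keys, strings, and safe-range integers), where RFC 8785 reduces to sorted keys, the string escapes shown, and plain decimal integers. The function names are hypothetical, profile validation is omitted, and a production implementation would still use a conforming JCS library as §5.1 requires.

```python
import json
import re

MAX_SAFE_INTEGER = 2**53 - 1   # §4.4
MAX_AAD_BYTES = 16_384         # §4.5
KEY_PATTERN = re.compile(r"[a-z][a-z0-9_]*")  # §4.1, matched against the whole key

# RFC 8785 string escapes: these seven characters use short forms; every other
# code point below U+0020 is written as \u00xx with lowercase hex digits.
SHORT_ESCAPES = {'"': '\\"', "\\": "\\\\", "\b": "\\b", "\f": "\\f",
                 "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def reject_duplicates(pairs):
    """object_pairs_hook: json.loads would otherwise keep the last duplicate silently."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate member name: {key!r}")
        obj[key] = value
    return obj


def quote(s: str) -> str:
    """Serialize a string as JCS does; other characters, including non-BMP, stay literal."""
    parts = []
    for ch in s:
        if ch in SHORT_ESCAPES:
            parts.append(SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20:
            parts.append(f"\\u{ord(ch):04x}")
        else:
            parts.append(ch)
    return '"' + "".join(parts) + '"'


def canonical_aad(source: str) -> bytes:
    """Core rules of §4 plus §5; profile checks such as §6 are not performed here."""
    obj = json.loads(source, object_pairs_hook=reject_duplicates)
    if not isinstance(obj, dict):
        raise ValueError("AAD input must be a JSON object")
    for key, value in obj.items():
        if not KEY_PATTERN.fullmatch(key):
            raise ValueError(f"invalid member key: {key!r}")
        if isinstance(value, str):
            if value == "" or "\x00" in value:
                raise ValueError(f"empty string or U+0000 in {key!r}")
        elif isinstance(value, int) and not isinstance(value, bool):
            if not 0 <= value <= MAX_SAFE_INTEGER:
                raise ValueError(f"integer out of range in {key!r}")
        else:
            # Nested objects, arrays, null, booleans, and any number json.loads
            # returns as a float (this sketch therefore also rejects 1.0).
            raise ValueError(f"unsupported value type in {key!r}")
    # Keys are ASCII (checked above), so Python's code-point sort matches JCS order.
    members = [
        quote(key) + ":" + (quote(v) if isinstance(v, str) else str(v))
        for key, v in sorted(obj.items())
    ]
    # UTF-8 encoding raises UnicodeEncodeError (a ValueError) on lone surrogates.
    data = ("{" + ",".join(members) + "}").encode("utf-8")
    if len(data) > MAX_AAD_BYTES:
        raise ValueError("canonical AAD exceeds 16,384 bytes")
    return data
```

The object_pairs_hook matters because Python's json.loads, like many parsers, otherwise keeps the last duplicate member without any error, which would let two parties disagree about the AAD content.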

Default Context-Binding Profile

The default profile defines a field set for multi-tenant or multi-context applications. It is one recommended profile; applications MAY define other profiles conforming to §4–§5.

Required Fields

field	type	constraints
`v`	integer	schema version; MUST be `1` for this profile version
`tenant`	string	tenant or user identifier; 1–256 bytes after UTF-8 encoding
`resource`	string	resource path or identifier; 1–1024 bytes after UTF-8 encoding
`purpose`	string	usage context (e.g., `encryption-at-rest`, `token-signing`); minimum 1 byte

All four fields MUST be present. Implementations MUST reject objects missing any required field.

Optional Fields

field	type	constraints
`ts`	integer	Unix epoch seconds (UTC); MUST be a whole-second value in [0, 2⁵³ − 1]

ts provides temporal context for audit and policy purposes only. It does NOT provide replay protection; replay prevention requires stateful nonce tracking, which is outside the scope of this document. Applications requiring sub-second precision SHOULD use an extension field (§6.3).

Extension Fields

Applications MAY include additional fields whose keys begin with x_. Extension field keys MUST match x_[a-z0-9_]+. Applications that need collision avoidance SHOULD use a second component identifying the application, organization, or protocol — for example, x_example_region or x_vault_cluster.

The reserved field names (v, tenant, resource, purpose, ts) cannot collide with extension fields, which always begin with x_, and MUST NOT be used to carry application-specific data.

Extension fields are exempt from unknown-field rejection (§6.4) and MAY be present in any profile version.

Profile Constraints

Implementations validating against the default profile MUST reject objects that contain field names not in the set {v, tenant, resource, purpose, ts} ∪ {extension fields matching x_[a-z0-9_]+}. This strict validation prevents downgrade attacks where a newer field is silently ignored and ensures fail-fast behavior on version mismatches.

Applications SHOULD maintain a configurable minimum supported v value to enable deprecation of older profile versions with known issues.
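A hedged Python sketch of the default-profile checks, meant to run after the core rules have accepted an object. SUPPORTED_VERSIONS stands in for the reader configuration described above and is an assumption of this sketch, as are the function names. The tenant and resource limits are counted in UTF-8 bytes, so 86 copies of a three-byte CJK character exceed the 256-byte tenant limit even though that is only 86 characters.

```python
import re

SUPPORTED_VERSIONS = frozenset({1})   # hypothetical reader configuration
RESERVED = frozenset({"v", "tenant", "resource", "purpose", "ts"})
EXTENSION_KEY = re.compile(r"x_[a-z0-9_]+")  # matched against the whole key


def is_int(value) -> bool:
    return isinstance(value, int) and not isinstance(value, bool)


def utf8_len(value: str) -> int:
    return len(value.encode("utf-8"))


def validate_default_profile(obj: dict) -> None:
    """Apply the default-profile rules to an object that already passed the core rules.

    Empty strings and out-of-range integers are assumed to have been rejected by
    the core rules, so only profile-specific constraints are checked here.
    """
    for name in ("v", "tenant", "resource", "purpose"):
        if name not in obj:
            raise ValueError(f"missing required field: {name}")
    if not is_int(obj["v"]) or obj["v"] not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported profile version: {obj['v']!r}")
    for name, limit in (("tenant", 256), ("resource", 1024)):
        value = obj[name]
        # Limits are in UTF-8 bytes, not characters.
        if not isinstance(value, str) or not 1 <= utf8_len(value) <= limit:
            raise ValueError(f"{name} must be a string of 1-{limit} UTF-8 bytes")
    if not isinstance(obj["purpose"], str):
        raise ValueError("purpose must be a string")
    if "ts" in obj and not is_int(obj["ts"]):
        raise ValueError("ts must be an integer")
    for name in obj:
        # Strict unknown-field rejection; x_ extension fields are exempt.
        if name not in RESERVED and not EXTENSION_KEY.fullmatch(name):
            raise ValueError(f"unknown field: {name}")
```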

Threat Model

Attacks Mitigated

Confused-deputy attacks — an attacker moves a valid ciphertext from one context to another. Binding the ciphertext to resource, tenant, and purpose causes AEAD verification to fail outside the original context.

Cross-tenant decryption — in multi-tenant systems where tenants share encryption infrastructure, the tenant field ensures that ciphertext produced for tenant A cannot be verified under tenant B's context.

Purpose confusion — a ciphertext intended for one use (e.g., password-reset token) is replayed in another context (e.g., session token). The purpose field binds ciphertext to its intended use.

Trusted-Context Requirement

These protections depend on the decrypting party constructing or validating AAD fields from trusted context. If an attacker can choose both the ciphertext and the AAD context supplied for verification, AAD binding does not prevent cross-context replay.

In the reconstructed AAD model (§11.3), the decrypting party builds AAD from its own trusted state; this is the stronger model. In the transmitted AAD model, the AAD bytes travel alongside the ciphertext and MUST be treated as untrusted input until AEAD verification succeeds.

Systems that combine this profile with AEAD decryption SHOULD avoid exposing distinguishable protocol-level failures that reveal whether rejection was caused by ciphertext corruption, authentication tag failure, AAD mismatch, or AAD validation error.

Out of Scope

This document does not mitigate: key compromise, side-channel attacks against the AEAD implementation, nonce reuse (an envelope-layer concern), implementation bugs in underlying cryptographic primitives, or key commitment attacks. Key commitment properties depend on algorithm selection (§8), not AAD structure.

Algorithm Considerations

This document is agnostic to the specific AEAD algorithm used. Implementors SHOULD evaluate nonce-misuse resistance (see, for example, [RFC8452]) where nonce uniqueness cannot be guaranteed, and the adequacy of the nonce space for high-volume encryption with random nonces. Standard AEAD algorithms, including AES-256-GCM, generally do not provide key commitment [AEAD-KEY-COMMIT]; where key commitment is required, implementors SHOULD apply a committing transform or use envelope encryption with a unique data encryption key per ciphertext.

Nonce generation and uniqueness guarantees are the responsibility of the encryption envelope layer, not this document.

Versioning

The v field enables profile evolution. Readers MUST support all published profile versions until explicitly deprecated. A reader that does not recognize the v value MUST reject the AAD object. Writers MUST use the current profile version.

Ciphertexts are immutable. To move existing data to a newer profile version, re-encrypt it with a new key and new AAD. Version bumps require a new revision of this specification with a changelog entry.

Extension fields (x_*) are version-agnostic and MAY appear in any profile version without triggering unknown-field rejection.

AEAD Integration Guidance

This section is non-normative guidance for integrating canonical AAD bytes with an AEAD construction. It does not impose normative requirements on encryption/decryption APIs, ciphertext structure, tag verification, nonce management, or key selection.

Constructing AAD for encryption: apply §5.3 to produce canonical bytes; pass those bytes as the AAD argument to the AEAD encrypt call.

Verifying AAD at decryption: two deployment models exist.

In the reconstructed AAD model, the decrypting party rebuilds the AAD object from its own trusted context (session, database record, request metadata), canonicalizes it per §5.3, and passes the result to the AEAD verify call. This model provides the strongest binding because an attacker has no influence over the AAD supplied to verification.

In the transmitted AAD model, the AAD bytes travel alongside the ciphertext and the AEAD verify call uses the transmitted bytes directly. Those bytes are passed to AEAD verify exactly as received, without re-parsing or re-canonicalization; the application parses the AAD structure for policy or audit purposes only after successful tag verification, and then checks the parsed fields against its expected context (§7.2).

Applications SHOULD prefer the reconstructed model where context is available at decryption time.

Security Considerations

No Secrets in AAD

AAD is not encrypted. Implementations MUST NOT include sensitive data (passwords, private keys, API secrets, raw PII) in AAD fields.

Resource identifiers derived from URLs may inadvertently contain PII (e.g., usernames in paths such as /users/jdoe@example.com/files). Applications SHOULD hash or tokenize such identifiers before including them in the resource field — for example, use a keyed HMAC of the raw identifier rather than the raw value itself.
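A minimal sketch of the keyed-HMAC tokenization suggested above, using Python's standard hmac and hashlib modules with HMAC-SHA256. The function name, the key, and the users/<token>/files path layout are hypothetical.

```python
import hashlib
import hmac


def tokenize_identifier(raw: str, key: bytes) -> str:
    """Replace a PII-bearing identifier with a keyed HMAC-SHA256 token in lowercase hex."""
    return hmac.new(key, raw.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key for illustration only; a real deployment would manage it
# like any other secret key.
TOKEN_KEY = b"example-tokenization-key"

token = tokenize_identifier("jdoe@example.com", TOKEN_KEY)
resource = f"users/{token}/files"  # used as the resource field instead of the raw path
print(resource)
```

Because the token is deterministic for a given key, a decrypting party using the reconstructed model can recompute the same resource value from the raw identifier. Using a different key per service yields the per-service pseudonyms discussed under Privacy Considerations.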

Logging

AAD is not encrypted and MUST be treated according to the sensitivity of its fields. Implementations SHOULD avoid logging raw AAD unless the application has classified the fields as non-sensitive or has tokenized sensitive identifiers. See §12 for privacy considerations.

Reconstruction vs. Transmission

Applications MUST decide whether to reconstruct AAD from trusted context at decryption time or to transmit AAD bytes alongside ciphertext. Reconstruction is the safer model because the verifier never relies on context supplied with the ciphertext. Transmission is more flexible, but AEAD verification proves only that the transmitted AAD bytes are the ones used at encryption; an attacker can still substitute another valid ciphertext together with its own AAD, so the verifier has to check the verified AAD fields against its expected context. This decision is application-specific and outside the scope of this document.

Trusted-Context Requirement

See §7.2. The security properties of AAD binding are contingent on the decrypting party using trusted context to construct or validate AAD fields. Systems where the verifying party accepts attacker-supplied AAD without independent validation do not benefit from the mitigations described in §7.1.

Privacy Considerations

AAD fields may contain identifiers that are sensitive or that enable user tracking.

Tenant identifiers may be user IDs, organization names, or other persistent identifiers. Applications SHOULD evaluate whether raw identifiers are appropriate or whether pseudonymous or HMAC-derived tokens should be used instead.

Resource paths may encode user-visible structure such as filenames, URL paths, or account numbers. URL-derived resource identifiers frequently contain email addresses, usernames, or other PII embedded in REST paths. Applications SHOULD tokenize or hash path components that identify natural persons before including them in the resource field.

AAD bytes that appear in logs or audit trails MUST be treated in accordance with applicable data retention and minimization requirements. Because AAD is not encrypted, it persists in cleartext wherever it is logged.

Consistent AAD field values across systems — for example, using the same tenant value in multiple services — enable cross-system correlation of encrypted records. Applications operating under data minimization requirements SHOULD evaluate whether per-service pseudonyms are appropriate.

IANA Considerations

This document has no IANA actions.

Test Vectors

Implementations MUST pass all test vectors in this section. The "Octets" value is the lowercase hex encoding of the canonical UTF-8 byte sequence. The SHA-256 value is the hex-encoded SHA-256 digest of those bytes, provided for implementation verification.

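Vectors §14.2 and §14.4 exercise default-profile features (§6): the optional ts field and extension fields, respectively. Vectors §14.1, §14.3, and §14.5 exercise core canonicalization rules (§4–§5). All five inputs are valid default-profile objects.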
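Implementers checking the Octets and SHA-256 entries can derive both from a Canonical line with a few lines of Python. This sketch covers only that last step; it does not canonicalize the Input line, and the helper name vector_entries is hypothetical.

```python
import hashlib
from typing import Tuple


def vector_entries(canonical: str) -> Tuple[str, str]:
    """Return the (Octets, SHA-256) entries for a Canonical line, in lowercase hex."""
    data = canonical.encode("utf-8")  # UTF-8, no byte-order mark
    return data.hex(), hashlib.sha256(data).hexdigest()


octets, digest = vector_entries(
    '{"purpose":"encryption","resource":"secrets/db","tenant":"org_abc","v":1}'
)
print(octets)
print(digest)
```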

Core: Minimal Object

A minimal default-profile object with only required fields.

Input

{"v":1,"tenant":"org_abc","resource":"secrets/db","purpose":"encryption"}

Canonical

{"purpose":"encryption","resource":"secrets/db","tenant":"org_abc","v":1}

Octets

7b22707572706f7365223a22656e6372797074696f6e222c227265736f75726365223a22736563726574732f6462222c2274656e616e74223a226f72675f616263222c2276223a317d

SHA-256

03fdc63d2f82815eb0a97e6f1a02890e152c021a795142b9c22e2b31a3bd83eb

Default Profile: All Fields

Default profile with the optional ts field present.

Input

{"v":1,"tenant":"org_abc","resource":"secrets/db/prod","purpose":"encryption-at-rest","ts":1706400000}

Canonical

{"purpose":"encryption-at-rest","resource":"secrets/db/prod","tenant":"org_abc","ts":1706400000,"v":1}

Octets

7b22707572706f7365223a22656e6372797074696f6e2d61742d72657374222c227265736f75726365223a22736563726574732f64622f70726f64222c2274656e616e74223a226f72675f616263222c227473223a313730363430303030302c2276223a317d

SHA-256

5cf973318b78e082bb71331cab473bb3c5d3bdae5e6ae0c334139cf1d3973993

Core: Unicode Values

String values containing non-ASCII Unicode. The tenant field contains Chinese characters (组织_测试, U+7EC4 U+7EC7 U+005F U+6D4B U+8BD5). The resource field contains a non-BMP emoji (🔐, U+1F510). JCS serializes non-BMP characters as their UTF-8 byte sequences, not as surrogate pairs.

Input

{"v":1,"tenant":"组织_测试","resource":"data/🔐/secret","purpose":"encryption"}

Canonical

{"purpose":"encryption","resource":"data/🔐/secret","tenant":"组织_测试","v":1}

Octets

7b22707572706f7365223a22656e6372797074696f6e222c227265736f75726365223a22646174612ff09f94902f736563726574222c2274656e616e74223a22e7bb84e7bb875fe6b58be8af95222c2276223a317d

SHA-256

e13ac7151a48d4dfddbca3b92a7a9bf2aabcfde98c9b9e1a83739c216589cb46

Default Profile: Extension Fields

Default profile with an application extension field.

Input

{"v":1,"tenant":"org_abc","resource":"vault/key","purpose":"key-wrapping","x_vault_cluster":"us-east-1"}

Canonical

{"purpose":"key-wrapping","resource":"vault/key","tenant":"org_abc","v":1,"x_vault_cluster":"us-east-1"}

Octets

7b22707572706f7365223a226b65792d7772617070696e67222c227265736f75726365223a227661756c742f6b6579222c2274656e616e74223a226f72675f616263222c2276223a312c22785f7661756c745f636c7573746572223a2275732d656173742d31227d

SHA-256

7d689eb3e966ce7190c39559ea05b09c34ca14af562ffbdc77bfca4b4dd6fce0

Core: JCS Edge Cases

This vector exercises JCS-specific serialization edge cases: control character escaping, embedded double-quote escaping, and the integer precision boundary.

The tenant value contains a literal newline character (U+000A). In JSON source this is represented as \u000A. JCS serializes it as \n (two characters: backslash + n) in the canonical output — not as the six-character sequence \u000A.

The resource value contains a literal double-quote character. In JSON source this is represented as \". JCS preserves this as \" in the canonical output.

The ts value is 9007199254740991, which is 2⁵³ − 1 — the maximum permitted integer (§4.4).

Input (shown with JSON escape sequences as they appear in the source)

{"v":1,"tenant":"org\u000Atest","resource":"path/with\"quotes","purpose":"test","ts":9007199254740991}

After JSON decoding, tenant is the 8-byte string org + U+000A + test, and resource is the string path/with"quotes.

Canonical (JCS output; newline serialized as \n, quote serialized as \")

{"purpose":"test","resource":"path/with\"quotes","tenant":"org\ntest","ts":9007199254740991,"v":1}

Octets

7b22707572706f7365223a2274657374222c227265736f75726365223a22706174682f776974685c2271756f746573222c2274656e616e74223a226f72675c6e74657374222c227473223a393030373139393235343734303939312c2276223a317d

SHA-256

46490c9c926b35501cde5f7b7f874c36174a47aec03bc674f79e663e3e9665fd

This vector validates: control character escaping (U+000A → \n in JCS output, not \u000A), double-quote escaping within string values, integer serialization at the 2⁵³ − 1 precision boundary, and lexicographic ordering of keys that share a prefix (tenant sorts before ts).

References

Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. https://datatracker.ietf.org/doc/html/rfc2119

[RFC5116] McGrew, D., "An Interface and Algorithms for Authenticated Encryption", RFC 5116, January 2008. https://datatracker.ietf.org/doc/html/rfc5116

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, May 2017. https://datatracker.ietf.org/doc/html/rfc8174

[RFC8259] Bray, T., "The JavaScript Object Notation (JSON) Data Interchange Format", RFC 8259, December 2017. https://datatracker.ietf.org/doc/html/rfc8259

[RFC8785] Rundgren, A., Jordan, B., and S. Erdtman, "JSON Canonicalization Scheme (JCS)", RFC 8785, June 2020. https://datatracker.ietf.org/doc/html/rfc8785

Informative References

[RFC8452] Gueron, S., Langley, A., and Y. Lindell, "AES-GCM-SIV: Nonce Misuse-Resistant Authenticated Encryption", RFC 8452, April 2019. https://datatracker.ietf.org/doc/html/rfc8452

[AEAD-KEY-COMMIT] Albertini, A., Duong, T., Gueron, S., Kölbl, S., Luykx, A., and S. Schmieg, "How to Abuse and Fix Authenticated Encryption Without Key Commitment", USENIX Security 2022. https://www.usenix.org/system/files/sec22-albertini.pdf

Changelog

2.0 (2026-04-27) — Major restructure for IETF/IRTF review. Reframed document as a JSON/JCS profile rather than a universal AAD schema. Separated core canonicalization rules (§4–§5) from default context-binding profile (§6). Added Applicability Statement (§2). Added AEAD Integration Guidance (§10) with reconstructed vs. transmitted AAD models. Removed normative requirements on decryption APIs, error codes, and tag verification — these are caller responsibilities. Fixed integer contradiction: replaced “unsigned 64-bit integer” language with non-negative integers in [0, 2⁵³ − 1] throughout. Clarified string model to distinguish JSON string decoding from UTF-8 output encoding; added explicit U+0000 constraint. Added trusted-context requirement to §7.2 and §11.4. Fixed logging guidance in §11.2: removed claim that AAD is safe to log; added sensitivity-based guidance. Added Privacy Considerations section (§12). Added IANA Considerations section (§13). Added BCP 14 boilerplate (RFC 2119 + RFC 8174). Split References into Normative and Informative (§15). Replaced JavaScript reference implementation with algorithm steps (§5.3). Fixed test vector section numbering (now §14.x). Corrected §14.5 edge-case vector rendering; verified all five vectors against canonical byte sequences. Fixed changelog cross-references (previously referenced wrong section numbers).

1.2 (2026-02-23) — Corrected test vector SHA-256 hash. Added 16 KiB rationale. Added NFC documentation requirement. Added version rejection clause. Simplified validation timing guidance. Added RFC 8259 to references.

1.1 (2026-02-08) — Corrected algorithm guidance: removed specific algorithm recommendations; standard AEAD algorithms do not natively provide key commitment properties (see [AEAD-KEY-COMMIT]). Corrected test vector SHA-256 hash. Added SHA-256 hashes and octet encodings to remaining test vectors. Removed erroneous whitespace-stripping regex from reference implementation.

1.0 (2026-01-28) — Initial draft.

Appendix A: Relationship to Envelope Format

This document defines AAD contents only. The relationship between AAD, ciphertext, nonce, and key identifier in the wire format is the responsibility of a separate envelope specification.

A typical envelope might include: a header containing algorithm identifier, key ID, and nonce; the ciphertext bytes; the authentication tag; and optionally the serialized AAD (if using the transmitted model — see §10). The key ID is part of the envelope header, not the AAD, so that key rotation does not change the logical AAD content and does not require re-encryption solely to update the key identifier.