status: draft
category: specification
date: March 2026

AAD Canonicalization Specification

abstract

This document specifies the requirements and implementation details for AAD canonicalization. It is intended for implementers and developers working with the gnu.foo platform.

AEAD Additional Authenticated Data (AAD) Canonicalization Specification

Version: 1.2

Status: Draft

Date: 2026-02-23

Abstract

This specification defines a canonical schema for Additional Authenticated Data (AAD) used with Authenticated Encryption with Associated Data (AEAD) algorithms. It leverages RFC 8785 (JSON Canonicalization Scheme) for deterministic serialization and defines required contextual binding fields to prevent confused deputy attacks and cross-context ciphertext reuse.

Introduction

AAD in AEAD constructions provides integrity protection for metadata without encrypting it. However, AAD is byte-sensitive: two logically identical contexts that serialize differently will cause decryption to fail. This specification addresses three problems: non-deterministic serialization across implementations, lack of standardized contextual binding fields, and ambiguous handling of edge cases such as Unicode normalization and key ordering.

Threat Model

Attacks Mitigated

This specification mitigates the following attacks. Confused deputy attacks occur when an attacker moves a valid ciphertext from one context to another; binding the ciphertext to resource, tenant, and purpose prevents this. Cross-tenant decryption occurs in multi-tenant systems where tenants share encryption keys; the tenant identifier in AAD ensures that a ciphertext created for one tenant fails authentication when decryption is attempted in another tenant's context. Purpose confusion occurs when a ciphertext intended for one use (e.g., a password reset token) is replayed in another context (e.g., as a session token); the purpose field prevents this reuse.

Out of Scope

This specification does not mitigate key compromise, side-channel attacks against the AEAD implementation, nonce reuse (which is an envelope specification concern), or implementation bugs in the underlying cryptographic primitives. Key commitment attacks are addressed through algorithm selection requirements in Section 7, not through AAD structure.

Terminology

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.

Schema Definition

Structure

The AAD context is a flat JSON object with no nested structures. All keys are lowercase ASCII matching the pattern [a-z_]+. All values are either strings (UTF-8, no NUL bytes) or unsigned integers within the range defined in Section 5.6.

Required Fields

Field     Type     Description
v         integer  Schema version. Current version is 1.
tenant    string   Tenant or user identifier. Minimum 1 byte, maximum 256 bytes.
resource  string   Resource path or identifier. Opaque string, maximum 1024 bytes.
purpose   string   Usage context (e.g., "encryption-at-rest", "token-signing").

Optional Fields

Field  Type     Description
ts     integer  Unix epoch seconds (UTC). MUST be an integer; sub-second precision is not supported.

See Section 5.6 for precision limits. Applications requiring sub-second precision SHOULD use extension fields.

NOTE: The ts field provides temporal context for audit and policy purposes only. It does NOT provide replay protection; replay prevention requires stateful nonce tracking, which is outside the scope of this specification.

Extension Fields

Applications MAY include additional fields using the prefix x_, for example x_vault_cluster or x_myapp_region. The reserved field names (v, tenant, resource, purpose, ts) MUST NOT be used for application-specific data. Organizations with multiple independent services or multi-vendor environments SHOULD use a more specific namespace under the x_ prefix, one that also names the organization or service (as x_myapp_region does), to avoid collisions.
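The naming rules above can be sketched as a small key classifier. The helper name classifyKey and its three return labels are illustrative assumptions, as is the choice to require at least one character after the x_ prefix; none of them is defined by this specification.

```javascript
// Reserved field names defined by this specification (schema version 1).
const RESERVED_KEYS = new Set(["v", "tenant", "resource", "purpose", "ts"]);

// Key pattern from the Structure section: lowercase ASCII letters and underscores.
const KEY_PATTERN = /^[a-z_]+$/;

// classifyKey is a hypothetical helper, not part of the specification.
// It returns "reserved", "extension", or "invalid".
function classifyKey(key) {
  if (typeof key !== "string" || !KEY_PATTERN.test(key)) return "invalid";
  if (RESERVED_KEYS.has(key)) return "reserved";
  // Assumption: an extension key needs at least one character after "x_".
  if (key.startsWith("x_") && key.length > 2) return "extension";
  return "invalid";
}

console.log(classifyKey("tenant"));          // reserved
console.log(classifyKey("x_vault_cluster")); // extension
console.log(classifyKey("region"));          // invalid (no x_ prefix)
console.log(classifyKey("x_Region"));        // invalid (uppercase letter)
```

A validator would typically reject the whole AAD object as soon as any key classifies as invalid.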

Constraints

Required fields MUST have non-empty values (minimum 1 byte for strings). Values MUST NOT contain NUL (0x00) bytes. The total serialized AAD MUST NOT exceed 16 KiB. Duplicate keys are forbidden; implementations MUST reject AAD objects containing duplicate keys. Value comparison is byte-level; "Tenant_A" and "tenant_a" are distinct values.

The 16 KiB limit is informed by buffer constraints in hardware security modules and embedded cryptographic accelerators, and accommodates rich AAD while remaining practical for resource-constrained environments. Implementations SHOULD minimize AAD size where practical. Excessive AAD may impact performance on resource-constrained devices or hardware-accelerated cryptographic modules.
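A minimal sketch of how these constraints might be checked on an already-parsed AAD object. The helper name validateConstraints, its error messages, and its two-argument shape are assumptions made for illustration. Duplicate-key rejection is deliberately not shown: JSON.parse keeps only the last occurrence of a repeated key, so that check has to live in the parser.

```javascript
const MAX_AAD_BYTES = 16 * 1024; // 16 KiB limit on the serialized AAD
const REQUIRED = ["v", "tenant", "resource", "purpose"];
// Per-field byte limits taken from the Required Fields table.
const MAX_FIELD_BYTES = new Map([["tenant", 256], ["resource", 1024]]);
const utf8 = new TextEncoder();

// validateConstraints is a hypothetical helper. `aad` is a flat, parsed
// object; `serialized` is the canonical UTF-8 byte array for that object.
function validateConstraints(aad, serialized) {
  for (const field of REQUIRED) {
    if (!Object.prototype.hasOwnProperty.call(aad, field)) {
      throw new Error(`missing required field: ${field}`);
    }
  }
  for (const [key, value] of Object.entries(aad)) {
    if (typeof value === "string") {
      if (value.includes("\u0000")) throw new Error(`NUL byte in ${key}`);
      const size = utf8.encode(value).length; // limits are in bytes, not characters
      if (REQUIRED.includes(key) && size < 1) throw new Error(`empty value: ${key}`);
      const limit = MAX_FIELD_BYTES.get(key);
      if (limit !== undefined && size > limit) {
        throw new Error(`${key} exceeds ${limit} bytes`);
      }
    } else if (typeof value !== "number") {
      throw new Error(`unsupported value type for ${key}`);
    }
  }
  if (serialized.length > MAX_AAD_BYTES) throw new Error("AAD exceeds 16 KiB");
}

const aad = { v: 1, tenant: "org_abc", resource: "secrets/db", purpose: "encryption" };
const canonical = '{"purpose":"encryption","resource":"secrets/db","tenant":"org_abc","v":1}';
validateConstraints(aad, utf8.encode(canonical)); // passes silently
```

Measuring the encoded length rather than the string length matters for non-ASCII tenants, where one character can occupy up to four bytes.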

Integer Precision

All integer values MUST be in the range 0 to 2^53 - 1 (9007199254740991) inclusive. To ensure interoperability, ALL implementations, regardless of native integer capacity, MUST reject values exceeding this limit during both serialization and deserialization. This constraint is universal and applies equally to languages with native 64-bit integers (Go, Rust, C++) and precision-limited environments (JavaScript).

This range ensures safe handling across all common runtime environments. For the ts field, this range covers Unix timestamps until approximately year 285 million, which is sufficient for all practical purposes. Implementations MUST NOT silently truncate or round values that exceed this limit; they MUST reject such values with an error.
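In JavaScript the range check can be expressed with Number.isSafeInteger, since Number.MAX_SAFE_INTEGER equals 2^53 - 1. The helper name checkAadInteger below is an assumption; the sketch rejects out-of-range input with an error and never rounds.

```javascript
// Upper bound from this section: 2^53 - 1.
const MAX_AAD_INTEGER = 9007199254740991;

// checkAadInteger is a hypothetical helper. It returns the value unchanged
// or throws; it never truncates or rounds.
function checkAadInteger(value) {
  if (typeof value !== "number" || !Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("integer out of range");
  }
  return value;
}

console.log(MAX_AAD_INTEGER === Number.MAX_SAFE_INTEGER); // true
console.log(checkAadInteger(1706400000));                 // 1706400000
console.log(checkAadInteger(MAX_AAD_INTEGER));            // 9007199254740991

for (const bad of [-1, 1.5, 2 ** 53, "1"]) {
  try {
    checkAadInteger(bad);
  } catch (err) {
    console.log(`${String(bad)} rejected: ${err.message}`);
  }
}
```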

Serialization

Canonicalization

AAD MUST be serialized according to RFC 8785 (JSON Canonicalization Scheme). This requires: keys sorted lexicographically by UTF-16 code unit (which, for the ASCII-only keys permitted by Section 5.1, is the same as byte order), no whitespace between tokens, no trailing commas, strings using minimal escape sequences, and integers without leading zeros or fractional parts.

WARNING: AAD verification is byte-sensitive. Two visually identical strings using different Unicode representations (e.g., precomposed vs. decomposed characters) will produce different byte sequences and cause verification to fail. Applications SHOULD normalize all string inputs to Unicode Normalization Form C (NFC) before constructing AAD objects. This normalization is the responsibility of the application layer, not the serialization layer. The choice of normalization form is application-local; applications MUST document their normalization policy to ensure consistent AAD construction across implementations.
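The byte-level difference described in the warning can be seen with a short sketch using String.prototype.normalize. The sample strings and the toHex helper are illustrative assumptions.

```javascript
const utf8 = new TextEncoder();
// toHex is a small illustrative helper for printing byte arrays.
const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");

const precomposed = "caf\u00e9"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const decomposed = "cafe\u0301"; // "e" followed by U+0301 COMBINING ACUTE ACCENT

console.log(toHex(utf8.encode(precomposed))); // 636166c3a9
console.log(toHex(utf8.encode(decomposed)));  // 63616665cc81
console.log(precomposed === decomposed);      // false

// After NFC normalization both inputs yield the same bytes.
console.log(decomposed.normalize("NFC") === precomposed); // true
```

Normalizing at the point where application input enters the AAD object keeps the serializer itself free of any Unicode policy.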

Encoding

The canonical JSON string MUST be encoded as UTF-8 to produce the final byte array passed to the AEAD algorithm. No byte-order mark (BOM) is permitted.
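A minimal sketch of the encoding step, assuming the canonical string has already been produced. TextEncoder always emits UTF-8 and does not add a BOM of its own; the explicit check covers a U+FEFF that arrived in the input string. The names encodeCanonical and toHex are assumptions.

```javascript
const utf8 = new TextEncoder(); // UTF-8 only; never prepends a BOM

// encodeCanonical is a hypothetical helper. `canonical` is the JCS output string.
function encodeCanonical(canonical) {
  // A leading U+FEFF would encode to the BOM bytes EF BB BF, so reject it.
  if (canonical.charCodeAt(0) === 0xfeff) throw new Error("BOM not permitted");
  return utf8.encode(canonical);
}

const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");

console.log(toHex(encodeCanonical('{"v":1}'))); // 7b2276223a317d
```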

Reference Implementation

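function canonicalAAD(context) {
  // Illustration only: sort the top-level keys, then pass them as the
  // property list so JSON.stringify emits them in that order.
  return JSON.stringify(context, Object.keys(context).sort());
}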

WARNING: The above snippet is for illustration only. Production implementations MUST use a dedicated RFC 8785 library. Many native JSON serializers do not comply with JCS requirements for Unicode escaping (non-BMP character handling) and number formatting (exponential notation rules). Relying on a native serializer whose output has not been verified against the test vectors in Section 11 can produce non-compliant output that causes interoperability failures.

Algorithm Requirements

This specification is agnostic to the specific AEAD algorithm used. Implementations SHOULD evaluate their algorithm selection for nonce-misuse resistance, nonce space adequacy, and key commitment properties based on their threat model. Standard AEAD algorithms generally do not provide key commitment; where required, implementations SHOULD apply a committing transform or use envelope encryption with unique data encryption keys. If AES-256-GCM is required (e.g., for FIPS compliance), implementations MUST use envelope encryption with a unique data encryption key (DEK) per ciphertext. Nonce generation and uniqueness guarantees are the responsibility of the encryption envelope specification, not this document.
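To make the AES-256-GCM case concrete, the following sketch uses the Node.js crypto module with a fresh random data encryption key per ciphertext and the canonical AAD bytes passed through setAAD. The function names and the returned object shape are assumptions; wrapping the DEK under a key encryption key and the envelope layout belong to the envelope specification and are omitted.

```javascript
const crypto = require("node:crypto");

// Sketch only: one fresh 256-bit DEK per ciphertext.
function encryptWithFreshDek(plaintext, aadBytes) {
  const dek = crypto.randomBytes(32);
  const nonce = crypto.randomBytes(12); // 96-bit GCM nonce
  const cipher = crypto.createCipheriv("aes-256-gcm", dek, nonce);
  cipher.setAAD(aadBytes); // authenticated, not encrypted
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { dek, nonce, ciphertext, tag: cipher.getAuthTag() };
}

function decryptWithDek({ dek, nonce, ciphertext, tag }, aadBytes) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", dek, nonce);
  decipher.setAAD(aadBytes);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not match the key, nonce, ciphertext, and AAD.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

const aadA = Buffer.from('{"purpose":"encryption","resource":"secrets/db","tenant":"org_abc","v":1}');
const aadB = Buffer.from('{"purpose":"encryption","resource":"secrets/db","tenant":"org_xyz","v":1}');
const sealed = encryptWithFreshDek(Buffer.from("hello"), aadA);

console.log(decryptWithDek(sealed, aadA).toString()); // hello
try {
  decryptWithDek(sealed, aadB); // different tenant in AAD
} catch (err) {
  console.log("decryption failed with mismatched AAD");
}
```

The second call fails even though key, nonce, and ciphertext are untouched, which is the contextual binding this specification relies on.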

Versioning

The v field enables schema evolution. Readers MUST support all published versions until explicitly deprecated. A reader that does not support the version indicated by the v field MUST reject the AAD. Writers MUST use the current version only. Ciphertexts are immutable; to upgrade AAD schema, re-encrypt the data with a new key and new AAD. Version bumps require a new revision of this specification with a changelog documenting the differences.

Implementations MUST reject AAD containing fields not defined for the specified schema version. This strict validation prevents downgrade attacks where a newer field is silently ignored, and ensures fail-fast behavior on version mismatches. Extension fields (x_* prefix) are exempt from this rule and MAY be present in any version.

Applications SHOULD maintain a configurable minimum supported version to enable deprecation of older schema versions with known issues. When a security flaw is discovered in a schema version, operators can update this configuration to reject vulnerable AAD without requiring code changes.
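The versioning rules in this section might be combined as follows. This is a sketch under assumptions: FIELDS_BY_VERSION, config.minSupportedVersion, and checkVersionAndFields are hypothetical names, and only version 1 is listed because it is the only version this document defines.

```javascript
// Fields defined per schema version (only version 1 exists in this document).
const FIELDS_BY_VERSION = new Map([
  [1, new Set(["v", "tenant", "resource", "purpose", "ts"])],
]);

// Hypothetical operator-configurable floor for deprecating old versions.
const config = { minSupportedVersion: 1 };

function checkVersionAndFields(aad) {
  const allowed = FIELDS_BY_VERSION.get(aad.v);
  if (allowed === undefined || aad.v < config.minSupportedVersion) {
    throw new Error("unsupported schema version");
  }
  for (const key of Object.keys(aad)) {
    // Extension fields are exempt from the per-version field list.
    if (!key.startsWith("x_") && !allowed.has(key)) {
      throw new Error(`field not defined for version ${aad.v}: ${key}`);
    }
  }
}

checkVersionAndFields({ v: 1, tenant: "t", resource: "r", purpose: "p", x_myapp_region: "eu" });

try {
  checkVersionAndFields({ v: 1, tenant: "t", resource: "r", purpose: "p", scope: "all" });
} catch (err) {
  console.log(err.message); // field not defined for version 1: scope
}
```

Raising config.minSupportedVersion is then enough to make every older AAD fail, without touching the validation code.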

Error Handling

Implementations MUST return a single, generic error for all decryption failures: DECRYPTION_FAILED. Implementations MUST NOT distinguish between invalid ciphertext, invalid authentication tag, or invalid AAD in error responses to callers. Detailed failure reasons MAY be logged server-side for debugging. All validation MUST use constant-time comparison where applicable to prevent timing side-channels.

Implementations SHOULD perform AAD constraint validation (field lengths, character restrictions, size limits) after the AEAD primitive has successfully verified the integrity tag, to avoid leaking information about expected AAD structure through timing differences.
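One way to keep caller-visible errors uniform is to funnel every failure through a single wrapper. In this sketch the class name DecryptionFailed and the callbacks decryptFn and logFn are assumptions supplied for illustration; the wrapper addresses error content only, not timing.

```javascript
class DecryptionFailed extends Error {
  constructor() {
    super("DECRYPTION_FAILED");
    this.name = "DecryptionFailed";
  }
}

// `decryptFn` performs the actual work and may throw for any reason;
// `logFn` is a hypothetical server-side logger.
function decryptOpaque(decryptFn, logFn = () => {}) {
  try {
    return decryptFn();
  } catch (err) {
    logFn(err);                   // detailed reason stays server-side
    throw new DecryptionFailed(); // caller sees one generic error
  }
}

const reasons = ["bad tag", "bad AAD", "truncated ciphertext"];
for (const reason of reasons) {
  try {
    decryptOpaque(() => { throw new Error(reason); });
  } catch (err) {
    console.log(err.message); // DECRYPTION_FAILED
  }
}
```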

Security Considerations

No Secrets in AAD

AAD is transmitted and stored in plaintext. Implementations MUST NOT include sensitive data such as passwords, social security numbers, API keys, or personally identifiable information in AAD fields.

Resource identifiers derived from URLs may inadvertently contain PII (e.g., email addresses or usernames in REST paths like /users/jdoe@example.com/files). Applications SHOULD hash or tokenize such identifiers before including them in the resource field. For example, use a keyed HMAC of the user identifier rather than the raw value.
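The keyed-HMAC suggestion can be sketched with the Node.js crypto module. The helper name tokenizeIdentifier, the sample key, and the path template are assumptions; in practice the HMAC key would be an application secret managed outside this specification.

```javascript
const crypto = require("node:crypto");

// tokenizeIdentifier is a hypothetical helper. `hmacKey` is an assumed
// application secret; the output is a 64-character lowercase hex string.
function tokenizeIdentifier(hmacKey, identifier) {
  return crypto.createHmac("sha256", hmacKey).update(identifier, "utf8").digest("hex");
}

const key = Buffer.from("example-key-not-for-production");
const token = tokenizeIdentifier(key, "jdoe@example.com");
const resource = `/users/${token}/files`;

console.log(resource.includes("jdoe")); // false
console.log(token.length);              // 64
```

Because the token is deterministic for a given key, the same resource value can be rebuilt at decryption time without storing the raw identifier.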

Logging

AAD MAY be logged in plaintext for audit purposes since it contains no secrets. Applications SHOULD log AAD on decryption failures to aid debugging.

Reconstruction vs. Transmission

Applications must decide whether to transmit AAD alongside ciphertext or reconstruct it from context at decryption time. Reconstruction is safer (cannot be tampered with) but requires all context to be available. Transmission is more flexible but relies on envelope integrity. This decision is application-specific and outside the scope of this specification.

Test Vectors

Implementations MUST pass these test vectors. Octet values represent the byte sequence of the canonical output per Section 6.2. The SHA-256 hash is provided for verification of canonical output.

Minimal Required Fields

Input: {
  "v": 1,
  "tenant": "org_abc",
  "resource": "secrets/db",
  "purpose": "encryption"
}

Canonical: {"purpose":"encryption","resource":"secrets/db","tenant":"org_abc","v":1}

Octets: 7b22707572706f7365223a22656e6372797074696f6e222c227265736f75726365223a22736563726574732f6462222c2274656e616e74223a226f72675f616263222c2276223a317d

SHA-256: 03fdc63d2f82815eb0a97e6f1a02890e152c021a795142b9c22e2b31a3bd83eb
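As a sketch of how a test harness might check this vector, the following reproduces the canonical string with the illustration-only serializer from the Reference Implementation section, compares the octets, and prints a SHA-256 digest to compare by eye against the SHA-256 line above. Variable names are assumptions; a real harness would call the RFC 8785 library under test instead.

```javascript
const crypto = require("node:crypto");

// Values copied from the "Minimal Required Fields" vector.
const input = { v: 1, tenant: "org_abc", resource: "secrets/db", purpose: "encryption" };
const expectedCanonical =
  '{"purpose":"encryption","resource":"secrets/db","tenant":"org_abc","v":1}';
const expectedOctets =
  "7b22707572706f7365223a22656e6372797074696f6e222c227265736f75726365223a22" +
  "736563726574732f6462222c2274656e616e74223a226f72675f616263222c2276223a317d";

// Illustration-only serializer; substitute the library under test here.
const canonical = JSON.stringify(input, Object.keys(input).sort());
const bytes = Buffer.from(canonical, "utf8");

console.log(canonical === expectedCanonical);          // true
console.log(bytes.toString("hex") === expectedOctets); // true

// Digest of the canonical bytes, for comparison with the vector's SHA-256 line.
console.log(crypto.createHash("sha256").update(bytes).digest("hex"));
```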

All Fields Including Optional

Input: {
  "v": 1,
  "tenant": "org_abc",
  "resource": "secrets/db/prod",
  "purpose": "encryption-at-rest",
  "ts": 1706400000
}

Canonical: {"purpose":"encryption-at-rest","resource":"secrets/db/prod","tenant":"org_abc","ts":1706400000,"v":1}

Octets: 7b22707572706f7365223a22656e6372797074696f6e2d61742d72657374222c227265736f75726365223a22736563726574732f64622f70726f64222c2274656e616e74223a226f72675f616263222c227473223a313730363430303030302c2276223a317d

SHA-256: 5cf973318b78e082bb71331cab473bb3c5d3bdae5e6ae0c334139cf1d3973993

Unicode in Values

Input: {
  "v": 1,
  "tenant": "组织_测试",
  "resource": "data/🔐/secret",
  "purpose": "encryption"
}

Canonical: {"purpose":"encryption","resource":"data/🔐/secret","tenant":"组织_测试","v":1}

Octets: 7b22707572706f7365223a22656e6372797074696f6e222c227265736f75726365223a22646174612ff09f94902f736563726574222c2274656e616e74223a22e7bb84e7bb875fe6b58be8af95222c2276223a317d

SHA-256: e13ac7151a48d4dfddbca3b92a7a9bf2aabcfde98c9b9e1a83739c216589cb46

Extension Fields

Input: {
  "v": 1,
  "tenant": "org_abc",
  "resource": "vault/key",
  "purpose": "key-wrapping",
  "x_vault_cluster": "us-east-1"
}

Canonical: {"purpose":"key-wrapping","resource":"vault/key","tenant":"org_abc","v":1,"x_vault_cluster":"us-east-1"}

Octets: 7b22707572706f7365223a226b65792d7772617070696e67222c227265736f75726365223a227661756c742f6b6579222c2274656e616e74223a226f72675f616263222c2276223a312c22785f7661756c745f636c7573746572223a2275732d656173742d31227d

SHA-256: 7d689eb3e966ce7190c39559ea05b09c34ca14af562ffbdc77bfca4b4dd6fce0

JCS Edge Cases (Serializer Validation)

This test vector targets JCS-specific edge cases to validate custom serializer implementations.

Input: {
  "v": 1,
  "tenant": "org\u000Atest",
  "resource": "path/with\"quotes",
  "purpose": "test",
  "ts": 9007199254740991
}

Canonical: {"purpose":"test","resource":"path/with\"quotes","tenant":"org\ntest","ts":9007199254740991,"v":1}

Octets: 7b22707572706f7365223a2274657374222c227265736f75726365223a22706174682f776974685c2271756f746573222c2274656e616e74223a226f72675c6e74657374222c227473223a393030373139393235343734303939312c2276223a317d

SHA-256: 46490c9c926b35501cde5f7b7f874c36174a47aec03bc674f79e663e3e9665fd

This vector validates: correct escaping of control characters (newline as \n, not \u000A, in output), quote escaping within strings, an integer at the precision boundary (2^53 - 1), and lexicographic key sorting.
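The escaping behavior this vector targets can be observed with ECMAScript's built-in serializer, whose string and number formatting RFC 8785 adopts. The sketch below is for exploration only and is not a substitute for a dedicated RFC 8785 library.

```javascript
// Control character: U+000A is emitted as the two-character escape \n.
console.log(JSON.stringify("org\u000Atest"));    // "org\ntest"

// A quotation mark inside a string is emitted as \".
console.log(JSON.stringify('path/with"quotes')); // "path/with\"quotes"

// The largest permitted integer is emitted in full, without exponent notation.
console.log(JSON.stringify(9007199254740991));   // 9007199254740991

// One past the boundary is not a safe integer and must be rejected upstream.
console.log(Number.isSafeInteger(9007199254740992)); // false
```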

References

RFC 2119 - Key words for use in RFCs to Indicate Requirement Levels

RFC 8259 - The JavaScript Object Notation (JSON) Data Interchange Format

RFC 8785 - JSON Canonicalization Scheme (JCS)

RFC 5116 - An Interface and Algorithms for Authenticated Encryption

RFC 8452 - AES-GCM-SIV: Nonce Misuse-Resistant Authenticated Encryption

Albertini et al. - How to Abuse and Fix Authenticated Encryption Without Key Commitment (USENIX Security 2022)

Changelog

1.2 (2026-02-23) — Corrected test vector 11.1 SHA-256 hash. Added 16 KiB rationale to Section 5.5. Added NFC documentation requirement to Section 6.1. Added version rejection clause to Section 8. Simplified validation timing guidance in Section 9. Added RFC 8259 to references.

1.1 (2026-02-08) — Corrected algorithm guidance in Section 7: removed specific algorithm recommendations; standard AEAD algorithms do not natively provide key commitment properties (see Albertini et al., USENIX Security 2022). Corrected test vector 11.1 SHA-256 hash. Added SHA-256 hashes and octet encodings to test vectors 11.2–11.5. Removed erroneous whitespace-stripping regex from Section 6.3 reference implementation.

1.0 (2026-01-28) — Initial draft.

Appendix A: Relationship to Envelope Format

This specification defines AAD contents only. The relationship between AAD, ciphertext, nonce, and key identifier in the wire format is the responsibility of a separate envelope specification. A typical envelope might include: a header containing algorithm identifier, key ID, and nonce; the ciphertext; the authentication tag; and optionally the AAD (if not reconstructed). The Key ID is part of the envelope header, not the AAD, to support key rotation without changing logical AAD content.