Crunch
A Message Definition Language for Getting Things Right
Crunch supports pluggable serialization layouts. This document specifies the wire format of each layout.
All multi-byte values are serialized in little-endian byte order.
Every message includes a 6-byte header before the payload:
| Offset | Field | Size | Description |
|---|---|---|---|
| 0 | Version | 1 byte | Protocol version (0x02) |
| 1 | Format | 1 byte | Serialization format identifier |
| 2 | MessageId | 4 bytes | Message type identifier (little-endian) |
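As an illustration, the header can be built with a few shifts. This is a minimal sketch, not Crunch's API: `write_header` and the `Format` enum are hypothetical names, and the enum values mirror the Format values listed in this document.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Format byte values (the enum and enumerator names are illustrative).
enum class Format : std::uint8_t { Packed = 0x01, Aligned4 = 0x02, Aligned8 = 0x03, Tlv = 0x04 };

// Hypothetical helper: builds the 6-byte header that precedes every payload.
std::array<std::uint8_t, 6> write_header(std::uint8_t version, Format format,
                                         std::uint32_t message_id) {
    std::array<std::uint8_t, 6> out{};
    out[0] = version;                            // offset 0: protocol version
    out[1] = static_cast<std::uint8_t>(format);  // offset 1: format identifier
    for (int i = 0; i < 4; ++i) {                // offsets 2-5: MessageId, little-endian
        out[2 + i] = static_cast<std::uint8_t>(message_id >> (8 * i));
    }
    return out;
}
```

For example, `write_header(0x02, Format::Packed, 0x12345678)` yields the bytes `02 01 78 56 34 12`.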
Format values:

- 0x01: Packed (Alignment = 1)
- 0x02: Aligned4 (Alignment = 4)
- 0x03: Aligned8 (Alignment = 8)
- 0x04: TLV

serdes::StaticLayout<Alignment> produces a deterministic, fixed-size binary format. The buffer size is calculated at compile time.
Zero-Fill Guarantee: All padding bytes and unset field regions are explicitly zeroed during serialization. This ensures consistent CRC/checksum values regardless of uninitialized memory content.
The alignment parameter controls padding insertion. For each value, padding is inserted to align the value to:

`AlignTo = min(sizeof(T), Alignment)`

This means:

- StaticLayout<1> (Packed): no padding is ever inserted
- StaticLayout<4>:
  - Bool (1 byte) → no padding (AlignTo = 1)
  - Int16 (2 bytes) → aligned to 2 bytes
  - Int32 (4 bytes) → aligned to 4 bytes
  - Int64 (8 bytes) → aligned to 4 bytes (capped at Alignment)
- StaticLayout<8>:
  - Bool (1 byte) → no padding
  - Int16 (2 bytes) → aligned to 2 bytes
  - Int32 (4 bytes) → aligned to 4 bytes
  - Int64 (8 bytes) → aligned to 8 bytes

After the 6-byte header, the payload is aligned to Alignment:
| Alignment | Header End | Padding | Payload Start |
|---|---|---|---|
| 1 | 6 | 0 | 6 |
| 4 | 6 | 2 | 8 |
| 8 | 6 | 2 | 8 |
The fields begin immediately at the payload start offset.
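The alignment rule and the payload-start table can be checked with a few `constexpr` helpers. This is a sketch under the rules stated above; `align_to`, `padding_for`, and `payload_start` are hypothetical names, not Crunch's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// AlignTo = min(sizeof(T), Alignment): the boundary a value of size
// `type_size` is padded to under StaticLayout<alignment>.
constexpr std::size_t align_to(std::size_t type_size, std::size_t alignment) {
    return std::min(type_size, alignment);
}

// Number of zero bytes needed to move `offset` up to a multiple of `boundary`.
constexpr std::size_t padding_for(std::size_t offset, std::size_t boundary) {
    return (boundary - offset % boundary) % boundary;
}

// The payload begins at the first multiple of Alignment at or after
// the end of the 6-byte header.
constexpr std::size_t payload_start(std::size_t alignment) {
    return 6 + padding_for(6, alignment);
}

// Compile-time checks against the payload-start table.
static_assert(payload_start(1) == 6, "Packed: no padding after the header");
static_assert(payload_start(4) == 8, "Aligned4: 2 padding bytes");
static_assert(payload_start(8) == 8, "Aligned8: 2 padding bytes");
static_assert(align_to(8, 4) == 4, "Int64 is capped at Alignment = 4");
```

Because the payload start is itself a multiple of Alignment, padding computed from absolute buffer offsets and from payload-relative offsets is the same.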
For each scalar field:

1. is_set (1 byte): 0x01 if set, 0x00 otherwise
2. Padding: aligns the value to min(sizeof(T), Alignment)
3. Value: sizeof(T) bytes, little-endian

| Type | sizeof(T) | Alignment=4 AlignTo | Alignment=8 AlignTo |
|---|---|---|---|
| Bool | 1 | 1 | 1 |
| Int8/UInt8 | 1 | 1 | 1 |
| Int16/UInt16 | 2 | 2 | 2 |
| Int32/UInt32 | 4 | 4 | 4 |
| Int64/UInt64 | 8 | 4 | 8 |
| Float32 | 4 | 4 | 4 |
| Float64 | 8 | 4 | 8 |
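Example: a message with a Bool field f1 and an Int64 field f2, serialized with StaticLayout<8>. Offsets in the table are absolute buffer offsets; the payload starts at offset 8, after the 6-byte header and 2 bytes of padding.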
| Offset | Content | Description |
|---|---|---|
| 8 | f1 is_set | 1 byte |
| 9 | No padding | Bool aligns to 1 |
| 9 | f1 Bool value | 1 byte |
| 10 | f2 is_set | 1 byte |
| 11-15 | Padding | 5 bytes to align Int64 to offset 16 (8-byte boundary) |
| 16-23 | f2 Int64 value | 8 bytes |
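The scalar rules can be sketched as a small writer that appends `[is_set][padding][value]`; with Alignment = 8 it reproduces the example layout above. `write_scalar` is a hypothetical helper, not Crunch's API, and it only handles Bool and integer types (Float32/Float64 would first need their bit patterns copied into an integer).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical StaticLayout scalar writer: appends [is_set][padding][value].
// Offsets are taken from buf.size(), so buf must already contain the header
// and the padding that aligns the payload start.
template <typename T>
void write_scalar(std::vector<std::uint8_t>& buf, std::size_t alignment,
                  bool is_set, T value) {
    static_assert(std::is_integral<T>::value, "sketch handles Bool and integer types only");
    buf.push_back(is_set ? 0x01 : 0x00);  // is_set byte

    // AlignTo = min(sizeof(T), Alignment); padding bytes are zeroed.
    const std::size_t align_to = std::min(sizeof(T), alignment);
    while (buf.size() % align_to != 0) buf.push_back(0x00);

    // Value: sizeof(T) bytes, least significant first. Unset values are zero-filled.
    const std::uint64_t bits = is_set ? static_cast<std::uint64_t>(value) : 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        buf.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
    }
}
```

Starting from an 8-byte buffer (header plus padding), writing f1 = `true` and then f2 = `std::int64_t{-2}` produces a 24-byte buffer with padding at offsets 11-15 and the f2 value at offsets 16-23, matching the table.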
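For string fields, the encoded value consists of:

- Length (4 bytes): uint32_t length of the current content
- Data (MaxSize bytes): string content (full capacity, zero-padded)

Note: The full MaxSize is always written, regardless of the current length. This ensures fixed buffer sizes.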
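A sketch of the length-and-data portion of a string field. It assumes the preceding is_set byte and alignment padding are written separately, as for other fields; `write_string_body` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: appends [length:4][data:MaxSize] for a string field.
// The is_set byte and alignment padding before it are assumed to be written
// by the caller.
template <std::size_t MaxSize>
void write_string_body(std::vector<std::uint8_t>& buf, const std::string& s) {
    assert(s.size() <= MaxSize && "string exceeds its declared capacity");
    const auto len = static_cast<std::uint32_t>(s.size());
    for (int i = 0; i < 4; ++i) {
        buf.push_back(static_cast<std::uint8_t>(len >> (8 * i)));  // little-endian length
    }
    buf.insert(buf.end(), s.begin(), s.end());                     // current content
    buf.insert(buf.end(), MaxSize - s.size(), std::uint8_t{0});    // zero-fill to full capacity
}
```

For a `String<8>` holding "hi", this always emits 4 + 8 = 12 bytes, whatever the current length.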
For submessage fields, the nested content is aligned to Alignment bytes. If is_set = 0, the submessage region is zero-filled.
Arrays do not have an is_set byte.
Each array is serialized as:

- Length: uint32_t element count
- Elements: all MaxSize slots (active elements hold their values; inactive slots are zero-filled)

Elements are serialized based on their type:
- Scalar: [padding][value] (no is_set byte)
- String: [padding][length:4][data:MaxSize]
- Submessage: [padding][MessageId:4][fields...]

Maps do not have an is_set byte.
Each map is serialized as:

- Length: uint32_t entry count
- Entries: all MaxSize key-value pairs

Each pair is serialized as [key][value], where key and value are serialized according to their type (scalar, string, submessage, array, or nested map).
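Arrays and maps share one shape in StaticLayout: a count followed by every one of the MaxSize slots. The sketch below shows this for Int32 arrays and Int32-to-Int32 maps. The helper names are hypothetical; padding the count like a 4-byte scalar, and zero-filling unused map pairs, are assumptions, since the text above does not state either.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Appends [padding][value] for a 4-byte slot (no is_set byte), padding
// to min(4, alignment) and writing the value little-endian.
void put_u32(std::vector<std::uint8_t>& buf, std::size_t alignment, std::uint32_t v) {
    const std::size_t align_to = std::min<std::size_t>(4, alignment);
    while (buf.size() % align_to != 0) buf.push_back(0x00);
    for (int i = 0; i < 4; ++i) buf.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Array: uint32_t count, then all MaxSize slots; inactive slots are zero-filled.
// Assumption: the count is padded like a 4-byte scalar.
template <std::size_t MaxSize>
void write_int32_array(std::vector<std::uint8_t>& buf, std::size_t alignment,
                       const std::vector<std::int32_t>& values) {
    assert(values.size() <= MaxSize);
    put_u32(buf, alignment, static_cast<std::uint32_t>(values.size()));
    for (std::size_t i = 0; i < MaxSize; ++i) {
        const std::int32_t v = i < values.size() ? values[i] : 0;
        put_u32(buf, alignment, static_cast<std::uint32_t>(v));
    }
}

// Map: uint32_t count, then all MaxSize [key][value] pairs.
// Assumption: unused pairs are zero-filled, as inactive array slots are.
template <std::size_t MaxSize>
void write_int32_map(std::vector<std::uint8_t>& buf, std::size_t alignment,
                     const std::vector<std::pair<std::int32_t, std::int32_t>>& entries) {
    assert(entries.size() <= MaxSize);
    put_u32(buf, alignment, static_cast<std::uint32_t>(entries.size()));
    for (std::size_t i = 0; i < MaxSize; ++i) {
        const bool active = i < entries.size();
        put_u32(buf, alignment, static_cast<std::uint32_t>(active ? entries[i].first : 0));
        put_u32(buf, alignment, static_cast<std::uint32_t>(active ? entries[i].second : 0));
    }
}
```

The output size depends only on MaxSize and the element types, never on how many slots are in use, which is what keeps StaticLayout buffers fixed-size.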
serdes::TlvLayout uses Tag-Length-Value encoding for compact, forward-compatible serialization.
After the 6-byte header, each field is prefixed with a tag that encodes both the field ID and the wire type:
| Wire Type | Value | Used For |
|---|---|---|
| Varint | 0 | Int8-64, UInt8-64, Bool, Enum, Float32, Float64 |
| LengthDelimited | 1 | String, Submessage, Packed Array, Map Entry |
All integers, bools, and floats are encoded as varints; a Bool is encoded as 1 for true and 0 for false.
Varint format: 7 bits per byte, with the MSB indicating continuation.
A 64-bit value requires up to 10 bytes as a varint (64 bits at 7 bits per byte, rounded up). Crunch does not support a wire type that enforces fixed-size encoding.
If required, a new serialization policy that enforces fixed-size encoding can be written.
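The varint format can be sketched as follows, assuming the common base-128 (LEB128-style) ordering used by Protocol Buffers, with the least significant 7-bit group first; the text above fixes only the 7-bits-per-byte and continuation-bit rules. This covers unsigned values only, since how Crunch maps signed integers and floats onto varints is not specified here. `write_varint` and `read_varint` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Base-128 varint: 7 payload bits per byte, low-order group first;
// the MSB (0x80) is set on every byte except the last.
void write_varint(std::vector<std::uint8_t>& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(v | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Decodes a varint starting at `pos` and advances `pos` past it.
std::uint64_t read_varint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += 7) {  // at most 10 bytes
        assert(pos < in.size() && "truncated varint");
        const std::uint8_t byte = in[pos++];
        result |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;
    }
    assert(false && "varint longer than 10 bytes");
    return result;
}
```

Under this ordering, 300 encodes as `AC 02`, and the largest 64-bit value takes the full 10 bytes.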
Only set fields are serialized. Unset fields are omitted entirely.
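When serializing a submessage (or any length-delimited content), the final length is not known until all content has been written. Crunch handles this by reserving space for the length prefix, writing the content, and then encoding the actual length and using memmove to move the content backwards so that it directly follows the prefix.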
This avoids needing to calculate sizes in a separate pass.
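A sketch of this reserve-then-shift approach, under two assumptions not stated above: the length prefix is a varint, and 5 bytes are reserved because a 32-bit length needs at most 5 varint bytes. `write_length_delimited` and `encode_varint` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumption: a varint length prefix with 5 reserved bytes,
// enough for any 32-bit length (ceil(32 / 7) = 5).
constexpr std::size_t kReservedLenBytes = 5;

// Encodes v as a base-128 varint into dst and returns the byte count.
std::size_t encode_varint(std::uint8_t* dst, std::uint64_t v) {
    std::size_t n = 0;
    while (v >= 0x80) {
        dst[n++] = static_cast<std::uint8_t>(v | 0x80);
        v >>= 7;
    }
    dst[n++] = static_cast<std::uint8_t>(v);
    return n;
}

// Hypothetical helper: writes [length][content] in a single pass.
template <typename WriteContent>
void write_length_delimited(std::vector<std::uint8_t>& buf, WriteContent write_content) {
    const std::size_t prefix_pos = buf.size();
    buf.resize(prefix_pos + kReservedLenBytes);  // 1. reserve room for the prefix
    const std::size_t content_pos = buf.size();
    write_content(buf);                          // 2. append the nested content
    const std::size_t length = buf.size() - content_pos;
    assert(length <= 0xFFFFFFFFu && "length must fit in 32 bits");

    std::uint8_t prefix[kReservedLenBytes];
    const std::size_t prefix_len = encode_varint(prefix, length);  // 3. encode the real length
    std::memcpy(buf.data() + prefix_pos, prefix, prefix_len);
    // 4. Shift the content back over the unused reserved bytes (ranges may overlap).
    std::memmove(buf.data() + prefix_pos + prefix_len, buf.data() + content_pos, length);
    buf.resize(prefix_pos + prefix_len + length);
}
```

With 50 bytes of nested content, the length fits in a single varint byte, so the content moves back by 4 bytes to close the unused part of the reservation.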
All arrays use a unified packed encoding with element count:
For length-delimited elements (strings, submessages), each element includes its length prefix:
Example: ArrayField<1, Int32<None>, 5> with values [1, 2, 3]:
Example: ArrayField<1, String<32>, 5> with values ["hi", "bye"]:
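The two array examples can be approximated with a sketch that builds the `[Length][Count][Elements...]` part of the field; the tag is omitted because its exact encoding is not given here. Assumptions: Length and Count are varints, Length counts the bytes that follow it, and elements are non-negative so they can be written as plain unsigned varints. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Base-128 varint, least significant group first (assumed ordering).
void put_varint(std::vector<std::uint8_t>& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(v | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Prefixes a packed body with its byte length: [Length][body].
std::vector<std::uint8_t> with_length(const std::vector<std::uint8_t>& body) {
    std::vector<std::uint8_t> out;
    put_varint(out, body.size());
    out.insert(out.end(), body.begin(), body.end());
    return out;
}

// [Length][Count][Elements...]: scalar elements are bare varints, no tags.
std::vector<std::uint8_t> pack_uint_array(const std::vector<std::uint32_t>& values) {
    std::vector<std::uint8_t> body;
    put_varint(body, values.size());
    for (std::uint32_t v : values) put_varint(body, v);
    return with_length(body);
}

// [Length][Count][[Len][Data]...]: each string keeps its own length prefix.
std::vector<std::uint8_t> pack_string_array(const std::vector<std::string>& values) {
    std::vector<std::uint8_t> body;
    put_varint(body, values.size());
    for (const std::string& s : values) {
        put_varint(body, s.size());
        body.insert(body.end(), s.begin(), s.end());
    }
    return with_length(body);
}
```

Under these assumptions, [1, 2, 3] encodes as `04 03 01 02 03`, and ["hi", "bye"] as `08 02 02 68 69 03 62 79 65`.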
Maps use packed encoding with entry count:
For length-delimited keys/values, each includes its length prefix.
Example: MapField<1, Int32<None>, String<32>, 10> with {42: "foo"}:
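The map example can be sketched the same way, again omitting the tag and assuming varint Length and Count fields, a Length that counts the bytes after it, and non-negative keys written as unsigned varints. `pack_uint_string_map` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Base-128 varint, least significant group first (assumed ordering).
void put_varint(std::vector<std::uint8_t>& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(v | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// [Length][Count] then, per entry, [Key][Len][Data] with no per-entry tags.
std::vector<std::uint8_t> pack_uint_string_map(const std::map<std::uint32_t, std::string>& entries) {
    std::vector<std::uint8_t> body;
    put_varint(body, entries.size());        // entry count
    for (const auto& kv : entries) {
        put_varint(body, kv.first);          // scalar key as a varint
        put_varint(body, kv.second.size());  // string value keeps its length prefix
        body.insert(body.end(), kv.second.begin(), kv.second.end());
    }
    std::vector<std::uint8_t> out;
    put_varint(out, body.size());            // Length of everything after it
    out.insert(out.end(), body.begin(), body.end());
    return out;
}
```

Under these assumptions, {42: "foo"} encodes as `06 01 2A 03 66 6F 6F`.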
Maps support any field type as keys or values:
| Entry Type | Encoding (no tag) |
|---|---|
| Scalar | Varint |
| String | [Length][Data] |
| Submessage | [Length][NestedFields...] |
| Array | [Length][Count][Elements...] |
| Nested Map | [Length][Count][Pairs...] |
Each nested container uses the same packed format.
| Layout | Size Predictability | Compactness | Best For |
|---|---|---|---|
| StaticLayout<1> | Fixed at compile time | Moderate | Deterministic protocols |
| StaticLayout<4> | Fixed at compile time | Less compact | 32-bit aligned systems |
| StaticLayout<8> | Fixed at compile time | Least compact | 64-bit aligned systems |
| TlvLayout | Variable (*) | Most compact (**) | Evolving protocols, bandwidth-constrained links |

(*) For any given message type, the encoded size varies up to a statically determinable maximum. No dynamic memory allocation is required.
(**) TlvLayout is not the most compact for all data because of varint encoding. For example, very large integers require more space than a fixed-size encoding.