Status: implemented Implemented in: 2026-03-24 App: prototext
prototext aims to be a superset of protoc --decode: for canonical
wire-level input (no anomalies), its output should be byte-for-byte identical
to protoc --decode (modulo the annotation comment, which protoc does not
emit).
Two known divergences exist today:
protoc --decode renders a packed repeated field as N separate lines, one
per element, identical to non-packed repeated fields:
int32Pk: 1
int32Pk: 2
int32Pk: 3
prototext renders packed fields using a bracket array syntax:
int32Pk: [1, 2, 3] #@ repeated int32 [packed=true] = 85
This is a fundamental output format difference.
protoc --decode renders all NaN values (canonical or not) as bare nan.
prototext currently renders non-canonical NaNs as nan(0x…), which is not
valid protoc syntax and would be rejected by protoc --encode.
For canonical wire input, NaN values are always the canonical quiet NaN
(0x7FC00000 for float, 0x7FF8000000000000 for double), which already
renders as bare nan — so divergence B does not affect canonical input.
It only matters for non-canonical NaN payloads, which are anomalies by
definition.
prototext output (without annotations) is
byte-for-byte identical to protoc --decode output.protoc --decode output for non-canonical input (protoc cannot
decode anomalies; this is out of scope by definition).protoc --encode compatibility on the encoding side (a
separate concern).Render each element of a packed field on its own line, using the field name, exactly as protoc does:
int32Pk: 1
int32Pk: 2
int32Pk: 3
Every element line carries its own annotation with the full field declaration.
Each annotation also carries any element-level anomaly modifiers that
apply to that specific element (ohb, neg, nan_bits, etc.):
int32Pk: 1 #@ repeated int32 [packed=true] = 85
int32Pk: 2 #@ repeated int32 [packed=true] = 85; ohb: 3
int32Pk: 3 #@ repeated int32 [packed=true] = 85
pack_sizeA protobuf message may contain multiple consecutive packed wire records
for the same field number. Each is a distinct (tag, LEN, payload) triplet
on the wire. The per-line format is ambiguous about boundaries: three lines
for int32Pk could be one record of three elements, three records of one
element each, or any other split.
To preserve this information and enable lossless round-trip, the first
element of each wire record carries a pack_size: N modifier, where N is
the number of elements in that record:
int32Pk: 1 #@ repeated int32 [packed=true] = 85; pack_size: 3
int32Pk: 2 #@ repeated int32 [packed=true] = 85
int32Pk: 3 #@ repeated int32 [packed=true] = 85
int32Pk: 4 #@ repeated int32 [packed=true] = 85; pack_size: 2
int32Pk: 5 #@ repeated int32 [packed=true] = 85
This represents two wire records: [1, 2, 3] and [4, 5].
Record-level anomaly modifiers (tag_ohb, TAG_OOR, len_ohb) apply
to the wire tag and LEN varint of a single record. They are placed on the
first element of that record (alongside pack_size):
int32Pk: 1 #@ repeated int32 [packed=true] = 85; pack_size: 3; tag_ohb: 2; len_ohb: 1
int32Pk: 2 #@ repeated int32 [packed=true] = 85
int32Pk: 3 #@ repeated int32 [packed=true] = 85
An empty packed wire record (tag + len=0) contains no elements, so there
is no value line to carry the annotation. It is rendered as a
comment-only annotation line:
#@ repeated int32 [packed=true] = 85; pack_size: 0
Record-level anomaly modifiers go on this comment-only line as well:
#@ repeated int32 [packed=true] = 85; pack_size: 0; tag_ohb: 1
The encoder recognises a pack_size: 0 comment-only line and emits
tag + len=0 with any accompanying anomaly modifiers.
Special case: INVALID_PACKED_RECORDS. If the packed payload cannot be
decoded (truncated varint, etc.), the field is rendered as a single
INVALID_PACKED_RECORDS line as today — no change.
The encoder uses pack_size: N on the first element line (or the
comment-only line for empty records) to know exactly how many element lines
to consume for each wire record.
For the example above:
int32Pk: 1 #@ repeated int32 [packed=true] = 85; pack_size: 3
int32Pk: 2 #@ repeated int32 [packed=true] = 85
int32Pk: 3 #@ repeated int32 [packed=true] = 85
int32Pk: 4 #@ repeated int32 [packed=true] = 85; pack_size: 2
int32Pk: 5 #@ repeated int32 [packed=true] = 85
→ two wire records: tag (85<<3)|2 len=3 [0x01,0x02,0x03], then
tag (85<<3)|2 len=2 [0x04,0x05].
The encoder does not need lookahead: pack_size tells it exactly how many
lines to buffer before flushing a record. A comment-only line with
pack_size: 0 flushes a zero-element record immediately.
Ordering invariant: All elements of a packed field are contiguous within a wire record. The encoder may assert this.
All existing packed fixture files must be regenerated to use the new per-line
format. The craft_a_matches_committed_fixtures test (spec 0009) will catch
any divergence.
Existing round-trip tests that construct packed wire by hand
(float_packed_noncanonical_nan_roundtrips, etc.) continue to work — the
round-trip is wire→text→wire regardless of text format.
Non-canonical NaN → value token nan(0x7f800001) (float) or
nan(0xfff8000000000000) (double).
Canonical NaN → value token nan.
Move the non-canonical NaN bit pattern from the value token into the annotation:
floatOp: nan #@ float = 22; nan_bits: 0x7f800001
For canonical NaN:
floatOp: nan #@ float = 22
For a packed array with mixed NaNs:
floatPk: nan #@ repeated float [packed=true] = 87
floatPk: nan #@ repeated float [packed=true] = 87; nan_bits: 0x7f800001
floatPk: nan #@ repeated float [packed=true] = 87; nan_bits: 0xffc00000
The value token is always bare nan — identical to protoc output.
The nan_bits modifier in the annotation carries the full 32-bit (float) or
64-bit (double) bit pattern, formatted as lower-case hex with 0x prefix.
The encoder parses nan_bits: 0x… from the annotation when the value is
nan and the field type is float or double, and uses the bit pattern
directly (with exponent forced to all-ones, as today).
When no nan_bits modifier is present, bare nan encodes as the canonical
quiet NaN (current behaviour).
Per element: if a nan_bits modifier is present on the element's annotation
line, it applies to that element's encoding.
Decoder side: moderate effort.
The current render_packed() function produces a single line with [...]
syntax. Replacing it with N individual scalar lines requires:
render_scalar() for each element instead of accumulating into a
bracket string.pack_size: N on the first element's annotation and record-level
anomaly modifiers (tag_ohb, len_ohb) alongside it.ohb, neg, nan_bits) on
each respective element's annotation.pack_size: 0.The decode_packed_to_str() function currently returns a String. It
would need to be restructured to return a Vec of (value_str, modifiers)
pairs, plus a record-level modifiers struct for the first element. This is
a moderate refactor of ~100 lines.
Encoder side: moderate effort (reduced from "significant" by pack_size).
pack_size eliminates the need for lookahead or a two-pass approach. The
encoder reads pack_size: N off the first element (or comment-only line),
buffers exactly N element lines, then flushes one wire record. The main
loop structure is unchanged; the new logic is a small state machine that
activates when a [packed=true] annotation with pack_size is encountered.
Steps:
pack_size: N on a line with [packed=true].tag + len=0 immediately.Risk: moderate. The encoder change is well-scoped thanks to pack_size.
Existing tests will catch regressions.
Interaction with INVALID_PACKED_RECORDS: the current single-line
fallback is unaffected — it stays as a single bytes-literal line.
Empty packed fields: rendered as a comment-only line with pack_size: 0.
The encoder emits tag + len=0 when it encounters this line.
Performance note: At implementation time, carefully assess the impact on the encoder's and decoder's performance properties:
Decoder: the current packed decoder is single-pass and allocation-free
for the output path. Switching to per-line output requires buffering N
(value_str, modifiers) pairs before emitting — this introduces a
per-record allocation. Evaluate whether the Vec can be replaced by an
iterator that yields lines one at a time to preserve the single-pass,
low-allocation character.
Encoder: buffering N element lines before flushing a wire record
requires holding N parsed values in memory simultaneously. For large
packed arrays this could be significant. Evaluate whether the wire output
(tag + length prefix + payload) can be written in two passes over the same
input slice (first to compute the payload length, second to emit bytes)
rather than accumulating a Vec of decoded values.
Low effort.
This is essentially a revert of spec 0008's value-token approach, replacing it with an annotation modifier.
Decoder side: in format_float_protoc / format_double_protoc, change
the non-canonical NaN branch to always return "nan", and pass the bit
pattern up to the annotation layer instead. The render_scalar call site
gains an optional nan_bits argument.
Encoder side: in parse_num, bare nan always produces canonical NaN.
The annotation parser gains a nan_bits key whose value overrides the bit
pattern.
Packed arrays: the per-element annotation already carries modifiers
(see A.1 above). A nan_bits modifier per element fits naturally.
Key concern previously identified: the annotation approach does not work
for packed arrays when they are rendered as a single […] line — there is
no per-element annotation slot. This concern goes away if A (per-line
packed rendering) is implemented first, since each element then has its own
annotation.
Conclusion: B depends on A. B alone (without A) reintroduces the packed NaN problem. The two changes must be implemented together.
| Change | Compatibility impact | Effort | Depends on |
|---|---|---|---|
| A. Per-line packed rendering | Restores protoc compatibility for packed fields | Moderate (decoder) + Moderate (encoder) | — |
| B. NaN bits in annotation | Restores protoc compatibility for canonical NaN; no impact on canonical input | Low | A |
Recommended order: implement A first, then B. Neither is a blocker for other work; both can be deferred until the test infrastructure from spec 0009 is in place, since that suite will make regressions visible immediately.
--no-annotations mode. Without annotations, packed
fields become indistinguishable from non-packed repeated fields in the
text output, and wire record boundaries are lost. The encoder has no way
to know whether to emit a packed or non-packed wire encoding, nor how to
split elements across records. This is an existing limitation (not
introduced by this spec) and is acceptable: without annotations,
lossless round-trip is only guaranteed for canonical input, and canonical
packed fields re-encode correctly as a single packed record if the schema
says [packed=true].prototext-core/src/serialize/render_text/packed.rs — current packed rendererprototext-core/src/serialize/render_text/helpers.rs — render_scalar,
AnnWriterprototext-core/src/serialize/encode_text/mod.rs — encoder main loop,
encode_packed_array_lineprototext-core/src/serialize/common.rs — format_float_protoc,
format_double_protocprototext/fixtures/cases/test_varint_packed.pb — current packed fixtureprototext/fixtures/cases/canonical_repeated.pb — protoc repeated format
reference