This document is the definitive reference for the text representation emitted
by prototext -d (decode) and consumed by prototext -e (encode).
The format is a superset of the
protobuf text format
as produced by protoc --decode. For canonical wire input, prototext -d
output (ignoring the #@ annotation comment) is byte-for-byte identical to
protoc --decode output. Every field line carries an inline annotation
comment (#@) that encodes enough information to reconstruct the exact binary
bytes on re-encoding, including all non-canonical or anomalous aspects.
The file starts with a header line:
#@ prototext: protoc
Every field occupies one line for scalar values, or an opening brace line plus content lines plus a closing brace line for nested messages and groups:
{indent}{field_name}: {value} #@ {annotation}
{indent}{field_name} { #@ {annotation}
{indent} {child fields…}
{indent}}
Two spaces before #@ separate the value from the annotation. The annotation
runs to end of line.
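The line layout above can be split mechanically. The following is a minimal sketch, not the tool's own parser; the helper name `split_field_line` is hypothetical, and it assumes that the annotation marker is the last `#@` on the line (annotation tokens never contain `#@` themselves, while a quoted string value might).

```python
def split_field_line(line: str):
    """Split one prototext line into (content, annotation).

    Assumption: the annotation starts at the LAST "#@" on the line.
    Lines without a marker (e.g. a closing brace) return annotation None.
    """
    content, marker, annotation = line.rpartition("#@")
    if not marker:
        # No "#@" at all: rpartition put the whole line in `annotation`.
        return line.rstrip(), None
    return content.rstrip(), annotation.strip()


if __name__ == "__main__":
    print(split_field_line("int32Op: 42  #@ int32 = 25"))
    print(split_field_line("}"))
```

Splitting on the last marker rather than the first keeps a string value such as `"a #@ b"` intact. Note that the header line and comment-only lines come back with an empty content part.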
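Field name key rules:

- Known fields use the field name from the schema (e.g. int32Op).
- Unknown fields use the raw field number (e.g. 999).
- The field number is also used for fields flagged TYPE_MISMATCH.
- Extension fields use the bracketed, fully qualified extension name (e.g. [acme.blade_count]).

Annotation syntax:
#@ [wire_type ";"] [field_decl] [";" modifier]*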
All parts are optional. Tokens are separated by "; ". No trailing ";".
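As an illustration of the token rules, here is a hedged sketch that splits an annotation (the text after `#@`) into its three optional parts. The function name `parse_annotation` is hypothetical; the sketch assumes that a leading token is a wire type only if it exactly matches one of the wire type tokens listed in this document, and that a field declaration is recognisable by its `" = "` separator.

```python
VALID_WIRE_TYPES = {"varint", "fixed64", "bytes", "fixed32", "group"}
INVALID_WIRE_TYPES = {
    "INVALID_TAG_TYPE", "INVALID_VARINT", "INVALID_FIXED64",
    "INVALID_FIXED32", "INVALID_LEN", "TRUNCATED_BYTES",
    "INVALID_PACKED_RECORDS", "INVALID_STRING", "INVALID_GROUP_END",
}


def parse_annotation(annotation):
    """Return (wire_type, field_decl, modifiers) for one annotation string."""
    tokens = annotation.split("; ") if annotation else []
    wire_type = field_decl = None
    # A whole-token match avoids confusing "fixed64 = 26" with the wire type.
    if tokens and tokens[0] in VALID_WIRE_TYPES | INVALID_WIRE_TYPES:
        wire_type = tokens.pop(0)
    if tokens and " = " in tokens[0]:
        field_decl = tokens.pop(0)
    return wire_type, field_decl, tokens  # remaining tokens are modifiers


if __name__ == "__main__":
    print(parse_annotation("group; GroupOp = 30; tag_ohb: 1; etag_ohb: 1"))
```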
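A keyword naming the binary wire type. Emitted only for:

- unknown fields (field number not in the schema);
- fields whose wire type conflicts with the schema (TYPE_MISMATCH);
- group fields (group), because groups are structurally distinct from messages and must be preserved for re-encoding.

Omitted for known, well-typed non-group fields (the wire type is unambiguously implied by the proto type).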
Valid wire types (lower case):
| Token | Wire encoding |
|---|---|
| varint | VARINT (wire type 0) |
| fixed64 | FIXED64 (wire type 1) |
| bytes | LEN (wire type 2) |
| fixed32 | FIXED32 (wire type 5) |
| group | SGROUP/EGROUP (wire types 3/4); also emitted for known group fields |
Invalid wire types (ALL CAPS) indicate a structural decode failure:
| Token | Meaning |
|---|---|
| INVALID_TAG_TYPE | Tag carries an unrecognised wire type (3-bit field not in {0,1,2,3,4,5}) |
| INVALID_VARINT | Varint value field is malformed |
| INVALID_FIXED64 | FIXED64 payload is truncated |
| INVALID_FIXED32 | FIXED32 payload is truncated |
| INVALID_LEN | LEN length prefix is malformed |
| TRUNCATED_BYTES | LEN length prefix is valid but the declared bytes are not all present |
| INVALID_PACKED_RECORDS | LEN payload is present but cannot be decoded as packed records |
| INVALID_STRING | LEN payload is present but the bytes are not valid UTF-8 (for string fields) |
| INVALID_GROUP_END | Group END tag varint is malformed |
For invalid fields, no field name or field declaration is emitted — only the
raw field number key and the INVALID_* wire type name. Exception:
INVALID_TAG_TYPE uses field number 0 as the key (no valid number exists).
Format: [label " "] type [" [packed=true]"] " = " field_number

- label: repeated or required; optional is omitted (it is the default).
- type: proto scalar type name (e.g. int32, double, string), message or group type short name (e.g. SwissArmyKnife, GroupOp), or enum type short name. For enum fields the type takes the form EnumTypeName(N), where N is the raw wire integer value. For packed enum fields the form is EnumTypeName([n1, n2, …]). Packed non-enum varint types (int32, int64, bool, sint32, sint64, uint32, uint64) also use the scalar proto type name (e.g. repeated int32 [packed=true]).
- [packed=true]: appended when the field uses packed wire encoding.
- field_number: the field number declared in the .proto file.

The field declaration is omitted for:

- unknown fields (no schema entry for the field number);
- invalid fields (INVALID_* wire types);
- mismatched fields, which carry the wire type and TYPE_MISMATCH but no field declaration.

For group fields, the wire type token group precedes the field declaration:
GroupOp { #@ group; GroupOp = 30
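The declaration format can be matched with a single regular expression. This is only a sketch under stated assumptions: the names `FIELD_DECL` and `parse_field_decl` are hypothetical, and the pattern covers the scalar enum form `EnumTypeName(N)` but not the list form `EnumTypeName([n1, n2, …])`.

```python
import re

# [label " "] type ["(" N ")"] [" [packed=true]"] " = " field_number
FIELD_DECL = re.compile(
    r"^(?:(?P<label>repeated|required) )?"
    r"(?P<type>[A-Za-z_][A-Za-z0-9_]*)"
    r"(?:\((?P<enum_value>-?\d+)\))?"
    r"(?P<packed> \[packed=true\])?"
    r" = (?P<number>\d+)$"
)


def parse_field_decl(decl: str):
    """Parse a field declaration token; return None if it does not match."""
    m = FIELD_DECL.match(decl)
    if m is None:
        return None
    enum_value = m.group("enum_value")
    return {
        "label": m.group("label") or "optional",  # optional is the omitted default
        "type": m.group("type"),
        "enum_value": int(enum_value) if enum_value is not None else None,
        "packed": m.group("packed") is not None,
        "number": int(m.group("number")),
    }


if __name__ == "__main__":
    print(parse_field_decl("repeated int32 [packed=true] = 85"))
    print(parse_field_decl("Color(99) = 3"))
```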
Zero or more name: value pairs (or bare flag names) describing non-canonical
or anomalous aspects of the binary encoding. On packed field lines, the order is:

1. record-level modifiers (pack_size, tag_ohb, TAG_OOR, len_ohb)
2. element-level modifiers (ohb, neg, nan_bits, ENUM_UNKNOWN, etc.)

Non-canonical encodings are losslessly recoverable: they round-trip exactly.
| Name | Value type | Meaning |
|---|---|---|
| tag_ohb: N | integer | Tag varint uses N redundant continuation bytes |
| val_ohb: N | integer | Value varint (scalar field) uses N redundant continuation bytes |
| len_ohb: N | integer | Length-prefix varint uses N redundant continuation bytes |
| etag_ohb: N | integer | END_GROUP tag varint uses N redundant continuation bytes |
| truncated_neg | flag | Negative int32/enum encoded as 5-byte truncated varint instead of canonical 10-byte sign-extended form |
| nan_bits: 0xHH… | hex integer | Non-canonical NaN bit pattern for a float (8 hex digits) or double (16 hex digits) field |
| pack_size: N | integer | Number of elements in this packed wire record (on the first element line of each record) |
| ohb: N | integer | Per-element varint overhang bytes (packed varint fields, on each element line) |
| neg | flag | Per-element truncated-negative int32/enum (packed fields, on each element line) |
Invalid encodings indicate data integrity issues.
| Name | Value type | Meaning |
|---|---|---|
| TAG_OOR | flag | Tag field number is 0 or >= 2^29 (out of valid range) |
| ETAG_OOR | flag | END_GROUP tag field number is out of valid range |
| MISSING: N | integer | N bytes are missing from a truncated field (used with TRUNCATED_BYTES) |
| END_MISMATCH: N | integer | END_GROUP tag carries field number N instead of the opening tag's number |
| OPEN_GROUP | flag | GROUP field has no END_GROUP tag before end of buffer |
| TYPE_MISMATCH | flag | Wire type conflicts with the declared proto type for the field |
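
The TAG_OOR condition can be checked directly on a decoded tag value. The sketch below relies on the standard protobuf tag layout (field number in the upper bits, wire type in the low 3 bits); the function name `split_tag` is hypothetical and this is not the tool's own code.

```python
def split_tag(tag: int):
    """Split a decoded tag varint into (field_number, wire_type, tag_oor).

    tag_oor mirrors the TAG_OOR rule: field number 0 or >= 2**29.
    """
    field_number = tag >> 3
    wire_type = tag & 0x7
    tag_oor = field_number == 0 or field_number >= 2**29
    return field_number, wire_type, tag_oor


if __name__ == "__main__":
    print(split_tag((25 << 3) | 0))  # field 25, varint, in range
    print(split_tag(1))              # field 0, fixed64, out of range
```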
| Name | Value type | Meaning |
|---|---|---|
| ENUM_UNKNOWN | flag | Enum value is not in the schema's value table (integer emitted as field value) |
Packed repeated fields are rendered as one line per element, identical to
non-packed repeated fields (matching protoc --decode output). Each element
line carries its own annotation.
The first element of each wire record carries a pack_size: N modifier
indicating how many elements belong to that record. Record-level anomaly
modifiers (tag_ohb, TAG_OOR, len_ohb) also appear on the first element
line. Element-level anomaly modifiers (ohb, neg, nan_bits) appear on
each respective element's line.
int32Pk: 1 #@ repeated int32 [packed=true] = 85; pack_size: 3
int32Pk: 2 #@ repeated int32 [packed=true] = 85
int32Pk: 3 #@ repeated int32 [packed=true] = 85
Multiple consecutive wire records for the same field number each begin with
their own pack_size:
int64Pk: 1 #@ repeated int64 [packed=true] = 83; pack_size: 4; ohb: 3
int64Pk: 2 #@ repeated int64 [packed=true] = 83
int64Pk: 3 #@ repeated int64 [packed=true] = 83
int64Pk: 4 #@ repeated int64 [packed=true] = 83
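To re-encode, a consumer has to regroup element lines into wire records using pack_size. The following sketch shows one way this could be done; the function name `group_packed_records` and the `(value, annotation)` input shape are assumptions made for illustration, with a value of `None` standing for a comment-only line.

```python
import re

# pack_size appears as its own "; "-separated token in the annotation.
PACK_SIZE = re.compile(r"(?:^|; )pack_size: (\d+)(?:;|$)")


def group_packed_records(lines):
    """Group (value, annotation) element lines into packed wire records."""
    records = []
    remaining = 0  # elements still expected in the current record
    for value, annotation in lines:
        m = PACK_SIZE.search(annotation)
        if m:
            if remaining:
                raise ValueError("new record started before previous one ended")
            remaining = int(m.group(1))
            records.append([])
            if remaining == 0:
                continue  # empty record: comment-only line, no value
        elif remaining == 0:
            raise ValueError("element line outside any packed record")
        records[-1].append(value)
        remaining -= 1
    if remaining:
        raise ValueError("last record is shorter than its pack_size")
    return records


if __name__ == "__main__":
    decl = "repeated int64 [packed=true] = 83"
    print(group_packed_records([(None, decl + "; pack_size: 0"),
                                ("4", decl + "; pack_size: 1")]))
```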
An empty packed wire record (tag + len=0) has no value line. It is
rendered as a comment-only annotation line with no leading spaces before #@:
#@ repeated int64 [packed=true] = 83; pack_size: 0
If the packed payload cannot be decoded, the field is rendered as a single
INVALID_PACKED_RECORDS line with the raw bytes:
85: "\200\200\200\200\020\002\003\004" #@ INVALID_PACKED_RECORDS
NaN values of float and double fields are rendered as nan (bare token) in all
cases, matching protoc --decode.
For a non-canonical NaN (bit pattern differing from Rust's canonical quiet
NaN: 0x7FC00000 for float, 0x7FF8000000000000 for double), the full bit
pattern is recorded in a nan_bits annotation modifier:
floatOp: nan #@ float = 22; nan_bits: 0x7f800001
doubleOp: nan #@ double = 21; nan_bits: 0xfff8000000000000
For a canonical NaN, no nan_bits modifier is emitted:
floatOp: nan #@ float = 22
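The rule for when nan_bits is emitted can be sketched as follows. The canonical patterns are the ones quoted above; the function names are hypothetical, and the NaN test uses the IEEE 754 definition (exponent all ones, non-zero fraction).

```python
CANONICAL_NAN = {32: 0x7FC00000, 64: 0x7FF8000000000000}


def is_nan_bits(bits: int, width: int) -> bool:
    """True if the raw bit pattern encodes a NaN (IEEE 754 binary32/binary64)."""
    exp_bits, frac_bits = (8, 23) if width == 32 else (11, 52)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return exponent == (1 << exp_bits) - 1 and fraction != 0


def nan_bits_modifier(bits: int, width: int):
    """Return the nan_bits modifier text, or None for the canonical NaN."""
    if not is_nan_bits(bits, width):
        raise ValueError("bit pattern is not a NaN")
    if bits == CANONICAL_NAN[width]:
        return None
    digits = width // 4  # 8 hex digits for float, 16 for double
    return f"nan_bits: 0x{bits:0{digits}x}"


if __name__ == "__main__":
    print(nan_bits_modifier(0x7F800001, 32))
    print(nan_bits_modifier(0x7FC00000, 32))
```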
In packed arrays, nan_bits appears on the element line of the non-canonical
NaN element:
floatPk: nan #@ repeated float [packed=true] = 87; pack_size: 3
floatPk: nan #@ repeated float [packed=true] = 87; nan_bits: 0x7f800001
floatPk: nan #@ repeated float [packed=true] = 87; nan_bits: 0xffc00000
Finite float and double values are formatted to match protoc --decode output:

- double: shortest representation using 15 significant digits, falling back to 17 if needed for exact round-trip.
- float: shortest representation using 6 significant digits, falling back to 9.

Scientific notation is used when the exponent is >= 15 (double) or outside
the [1e-4, 1e15) range. Example: 3.1415926535897931, 1.23e-10,
1.7976931348623157e+308.
Without a schema, float and double fields are rendered as raw hex
(FIXED32 / FIXED64): 0x40490fdb, 0x4005bf0a8b145769.
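A reader who suspects that a schema-less hex value is really a float or double can reinterpret the bits. The helper names below are hypothetical; the sketch only uses Python's standard `struct` module.

```python
import struct


def fixed32_as_float(token: str) -> float:
    """Reinterpret a FIXED32 hex token (e.g. "0x40490fdb") as a float."""
    return struct.unpack("<f", int(token, 16).to_bytes(4, "little"))[0]


def fixed64_as_double(token: str) -> float:
    """Reinterpret a FIXED64 hex token as a double."""
    return struct.unpack("<d", int(token, 16).to_bytes(8, "little"))[0]


if __name__ == "__main__":
    print(fixed32_as_float("0x40490fdb"))
    print(fixed64_as_double("0x4005bf0a8b145769"))
```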
For bytes fields, every byte is escaped by numeric value:
| Byte value | Emitted form |
|---|---|
| \ (0x5C) | \\ |
| " (0x22) | \" |
| ' (0x27) | \' |
| \n (0x0A) | \n |
| \r (0x0D) | \r |
| \t (0x09) | \t |
| 0x20–0x7E (printable ASCII, excl. above) | literal byte |
| all others | \NNN (3-digit octal) |
This matches protoc --decode exactly for bytes fields.
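The escaping table translates almost line for line into code. A minimal sketch, with a hypothetical function name `escape_bytes`, that returns the quoted value as it would appear on a field line:

```python
# Named escapes from the table; everything else is literal or 3-digit octal.
_NAMED = {
    0x5C: "\\\\", 0x22: '\\"', 0x27: "\\'",
    0x0A: "\\n", 0x0D: "\\r", 0x09: "\\t",
}


def escape_bytes(data: bytes) -> str:
    """Render a bytes payload as a quoted, escaped text-format value."""
    out = []
    for b in data:
        if b in _NAMED:
            out.append(_NAMED[b])
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))          # printable ASCII stays literal
        else:
            out.append("\\%03o" % b)    # 3-digit octal escape
    return '"' + "".join(out) + '"'


if __name__ == "__main__":
    print(escape_bytes(b"binary\x00\xff\xfe data"))
```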
String fields deliberately diverge from protoc --decode, which octal-escapes
every byte >= 0x80 in string fields. prototext instead emits multi-byte UTF-8
sequences as raw UTF-8: for a field containing "café", protoc emits
"caf\303\251" and prototext emits "café".
Control characters (0x00–0x1F) other than \n, \r, and \t, and DEL (0x7F), are
octal-escaped in both tools.
If the wire bytes of a string field are not valid UTF-8, prototext emits
INVALID_STRING.
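The string rule can be sketched in the same style. The function name `render_string` is hypothetical. The sketch assumes that backslash, quotes, \n, \r, and \t use the same named escapes as in bytes fields, and it returns `None` where a real decoder would fall back to an INVALID_STRING line.

```python
def render_string(raw: bytes):
    """Render a string payload, keeping valid multi-byte UTF-8 unescaped.

    Returns None when the payload is not valid UTF-8.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
    named = {"\\": "\\\\", '"': '\\"', "'": "\\'",
             "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for ch in text:
        if ch in named:
            out.append(named[ch])
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("\\%03o" % ord(ch))  # other control characters and DEL
        else:
            out.append(ch)                  # printable ASCII and raw UTF-8
    return '"' + "".join(out) + '"'


if __name__ == "__main__":
    print(render_string("café".encode("utf-8")))
    print(render_string(b"\xff\xfe"))
```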
-- Top level
message := header NEWLINE field*
header := "#@ prototext: protoc"
-- Field lines
field := scalar_field | message_field
scalar_field := field_key ": " value " #@ " annotation NEWLINE
| "#@ " annotation NEWLINE -- comment-only: empty packed record
message_field := field_key " { #@ " annotation NEWLINE field* "}" NEWLINE
field_key := IDENTIFIER | NUMBER | "[" IDENTIFIER ("." IDENTIFIER)* "]"
value := STRING | NUMBER | BOOL | IDENTIFIER
-- Annotation
annotation := unknown_field_ann | known_field_ann
unknown_field_ann := wire_type [";" modifier]*
known_field_ann := ["group" ";"] field_decl [";" modifier]*
-- Field declaration (optional is omitted as default label)
field_decl := [label " "] type [" [packed=true]"] " = " NUMBER
label := "repeated" | "required"
type := proto_scalar_type
| IDENTIFIER -- message or group type name
| IDENTIFIER "(" NUMBER ")" -- enum: scalar numeric value
| IDENTIFIER "([" NUMBER ("," NUMBER)* "])" -- enum: packed numeric values
proto_scalar_type := "double" | "float" | "int64" | "uint64" | "int32"
| "fixed64" | "fixed32" | "bool" | "string" | "bytes"
| "uint32" | "sfixed32" | "sfixed64" | "sint32" | "sint64"
-- Wire types
wire_type := valid_wire_type | invalid_wire_type
valid_wire_type := "varint" | "fixed64" | "bytes" | "fixed32" | "group"
invalid_wire_type := "INVALID_TAG_TYPE" | "INVALID_VARINT" | "INVALID_FIXED64"
| "INVALID_FIXED32" | "INVALID_LEN" | "TRUNCATED_BYTES"
| "INVALID_PACKED_RECORDS" | "INVALID_STRING" | "INVALID_GROUP_END"
-- Modifiers
modifier := noncanon_valued | noncanon_flag | invalid_valued | invalid_flag | info_flag
noncanon_valued := ("tag_ohb" | "val_ohb" | "len_ohb" | "etag_ohb" | "ohb" | "pack_size") ":" SP INTEGER
| "nan_bits: 0x" HEX+
noncanon_flag := "truncated_neg" | "neg"
invalid_valued := ("MISSING" | "END_MISMATCH") ":" SP INTEGER
invalid_flag := "TAG_OOR" | "ETAG_OOR" | "OPEN_GROUP" | "TYPE_MISMATCH"
info_flag := "ENUM_UNKNOWN"
-- Tokens
IDENTIFIER := /[a-zA-Z_][a-zA-Z0-9_]*/
NUMBER := /-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?/ | "0x" HEX+ | "inf" | "-inf" | "nan"
INTEGER := /[0-9]+/
HEX := /[0-9a-f]/
STRING := /"([^"\\]|\\.)*"/
BOOL := "true" | "false"
SP := " "
NEWLINE := "\n"
Notes:

- Two spaces before #@ are required as the value/annotation separator.
- Comment-only lines (empty packed records) start with #@ with no leading spaces.
- Group fields use the same brace syntax as message fields, with the group prefix in the annotation.
- group (lower case) is the wire type token; the group type name in the field declaration follows after "; " (e.g. #@ group; GroupOp = 30).

The examples below are taken from actual prototext -d output against the
SwissArmyKnife and EnumCollision test schemas.
#@ prototext: protoc
doubleOp: 2.7182818284590451 #@ double = 21
floatOp: 3.14159274 #@ float = 22
int64Op: -123456789 #@ int64 = 23
uint64Op: 18446744073709551615 #@ uint64 = 24
int32Op: 42 #@ int32 = 25
fixed64Op: 987654321 #@ fixed64 = 26
fixed32Op: 123456 #@ fixed32 = 27
boolOp: true #@ bool = 28
uint32Op: 999 #@ uint32 = 33
sfixed32Op: -999 #@ sfixed32 = 35
sfixed64Op: -123456789 #@ sfixed64 = 36
sint32Op: -42 #@ sint32 = 37
sint64Op: 123456789 #@ sint64 = 38
optional is omitted (default label). Wire types are omitted for all
known fields (implied by proto type).
Without a schema, all fields render by wire type. Float/double fields appear as raw hex:
#@ prototext: protoc
21: 0x4005bf0a8b145769 #@ fixed64
22: 0x40490fdb #@ fixed32
23: 18446744073586094827 #@ varint
25: 42 #@ varint
26: 0x000000003ade68b1 #@ fixed64
27: 0x0001e240 #@ fixed32
28: 1 #@ varint
#@ prototext: protoc
int32Op: 100 #@ int32 = 25
messageOp { #@ SwissArmyKnife = 31
int32Op: 200 #@ int32 = 25
stringOp: "nested" #@ string = 29
}
messageRp { #@ repeated SwissArmyKnife = 51
stringOp: "first nested" #@ string = 29
uint32Op: 1 #@ uint32 = 33
}
messageRp { #@ repeated SwissArmyKnife = 51
stringOp: "second nested" #@ string = 29
uint32Op: 2 #@ uint32 = 33
}
#@ prototext: protoc
int32Op: 42 #@ int32 = 25
GroupOp { #@ group; GroupOp = 30
uint64Op: 111 #@ uint64 = 130
}
GroupRp { #@ group; repeated GroupRp = 50
uint64Op: 10 #@ uint64 = 150
}
GroupRp { #@ group; repeated GroupRp = 50
uint64Op: 20 #@ uint64 = 150
}
#@ prototext: protoc
int32Op: 42 #@ int32 = 25
uint32Op: 100 #@ uint32 = 33
999: 123456 #@ varint
1000: "binary\000\377\376 data" #@ bytes
#@ prototext: protoc
stringOp: "tab:\there\nnewline\\backslash\"quote" #@ string = 29
bytesOp: "\000\001\002\003\004" #@ bytes = 32
#@ prototext: protoc
1: 42 #@ varint; val_ohb: 3
Value 42 encoded with 3 extra continuation bytes. Round-trips byte-exact.
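To make the overhang concrete, the sketch below encodes a varint with extra continuation bytes and recovers the count on decode. It assumes that "N redundant continuation bytes" means N additional bytes beyond the canonical encoding, each carrying zero payload bits; the function names are hypothetical.

```python
def encode_varint(value: int, ohb: int = 0) -> bytes:
    """Encode a non-negative integer as a varint with `ohb` overhang bytes."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            break
    if ohb:
        out[-1] |= 0x80                 # canonical last byte now continues
        out.extend(b"\x80" * (ohb - 1)) # zero-payload continuation bytes
        out.append(0x00)                # zero-payload terminator
    return bytes(out)


def decode_varint(data: bytes):
    """Return (value, overhang_bytes) for a varint at the start of `data`."""
    value = shift = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, (i + 1) - len(encode_varint(value))
    raise ValueError("truncated varint")


if __name__ == "__main__":
    print(encode_varint(42, ohb=3).hex())
    print(decode_varint(bytes.fromhex("aa808000")))
```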
#@ prototext: protoc
GroupOp { #@ group; GroupOp = 30; tag_ohb: 1
uint64Op: 0 #@ uint64 = 130
}
GroupOp { #@ group; GroupOp = 30; tag_ohb: 1; etag_ohb: 1
uint64Op: 0 #@ uint64 = 130
}
GroupOp { #@ group; GroupOp = 30; etag_ohb: 1
uint64Op: 0 #@ uint64 = 130
}
#@ prototext: protoc
int32Rp: -2147483648 #@ repeated int32 = 45; truncated_neg
int32Rp: -2147483648 #@ repeated int32 = 45
int32Rp: -1 #@ repeated int32 = 45; truncated_neg
int32Rp: -1 #@ repeated int32 = 45
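The difference between the two encodings of each value above comes down to how many bits of the two's-complement value reach the varint encoder. A sketch, with hypothetical function names:

```python
def encode_varint(value: int) -> bytes:
    """Canonical varint encoding of a non-negative integer."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value: int, truncated_neg: bool = False) -> bytes:
    """Encode an int32 as a varint.

    Canonical form sign-extends negatives to 64 bits (10 bytes);
    the truncated form keeps only the low 32 bits (5 bytes).
    """
    mask = 0xFFFFFFFF if truncated_neg else 0xFFFFFFFFFFFFFFFF
    return encode_varint(value & mask)


if __name__ == "__main__":
    print(encode_int32(-1).hex())
    print(encode_int32(-1, truncated_neg=True).hex())
```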
#@ prototext: protoc
int32Pk: 1 #@ repeated int32 [packed=true] = 85; pack_size: 4
int32Pk: 2 #@ repeated int32 [packed=true] = 85
int32Pk: 3 #@ repeated int32 [packed=true] = 85
int32Pk: 4 #@ repeated int32 [packed=true] = 85
#@ prototext: protoc
int32Pk: 23 #@ repeated int32 [packed=true] = 85; pack_size: 3; ohb: 2
int32Pk: 24 #@ repeated int32 [packed=true] = 85
int32Pk: 35 #@ repeated int32 [packed=true] = 85; ohb: 3
#@ prototext: protoc
int32Pk: 1 #@ repeated int32 [packed=true] = 85; pack_size: 5
int32Pk: -1 #@ repeated int32 [packed=true] = 85; neg
int32Pk: -2147483648 #@ repeated int32 [packed=true] = 85; neg
int32Pk: -1 #@ repeated int32 [packed=true] = 85
int32Pk: 2 #@ repeated int32 [packed=true] = 85
#@ prototext: protoc
doublePk: 0 #@ repeated double [packed=true] = 81; pack_size: 3
doublePk: 3.1415926535897931 #@ repeated double [packed=true] = 81
doublePk: 1.7976931348623157e+308 #@ repeated double [packed=true] = 81
#@ prototext: protoc
#@ repeated int64 [packed=true] = 83; pack_size: 0
int64Pk: 4 #@ repeated int64 [packed=true] = 83; pack_size: 1
#@ prototext: protoc
85: "\200\200\200\200\020\002\003\004" #@ INVALID_PACKED_RECORDS
#@ prototext: protoc
color: GREEN #@ Color(1) = 2
#@ prototext: protoc
unknown_color: 99 #@ Color(99) = 3; ENUM_UNKNOWN
#@ prototext: protoc
colors_pk: RED #@ repeated Color(0) [packed=true] = 5; pack_size: 3
colors_pk: GREEN #@ repeated Color(1) [packed=true] = 5
colors_pk: BLUE #@ repeated Color(2) [packed=true] = 5
#@ prototext: protoc
colors_pk: RED #@ repeated Color(0) [packed=true] = 5; pack_size: 3
colors_pk: 99 #@ repeated Color(99) [packed=true] = 5; ENUM_UNKNOWN
colors_pk: BLUE #@ repeated Color(2) [packed=true] = 5
#@ prototext: protoc
48: 2 #@ varint; TYPE_MISMATCH
Field 48 is declared bool (valid range 0–1) but wire value is 2. No field
declaration emitted; field number used as key.
#@ prototext: protoc
GroupOp { #@ group; GroupOp = 30; OPEN_GROUP
uint64Op: 0 #@ uint64 = 130
}
#@ prototext: protoc
4 { #@ group; END_MISMATCH: 44
11: 0 #@ varint
}
#@ prototext: protoc
0: 0x02010405a2040302 #@ fixed64; TAG_OOR
0 { #@ group; TAG_OOR; ETAG_OOR
}
#@ prototext: protoc
0: "\364\201\200" #@ INVALID_TAG_TYPE
Field number 0 used as key (no valid field number available).
#@ prototext: protoc
0: "\212\003\032Bogus END_GROUP just above" #@ INVALID_GROUP_END; TAG_OOR
#@ prototext: protoc
99: "\001\002" #@ TRUNCATED_BYTES; MISSING: 5
Length prefix declared 7 bytes; only 2 available.
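The MISSING count is the declared length minus the bytes actually present. The sketch below reproduces the numbers from this example on a hand-built buffer; the function names are hypothetical and the buffer is constructed for illustration, not taken from the test data.

```python
def read_varint(buf: bytes, pos: int):
    """Read a varint at `pos`; return (value, next_pos)."""
    value = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def missing_bytes(buf: bytes) -> int:
    """For a buffer holding one LEN field, return how many bytes are missing."""
    tag, pos = read_varint(buf, 0)
    if tag & 0x7 != 2:
        raise ValueError("not a LEN (wire type 2) field")
    declared, pos = read_varint(buf, pos)
    available = len(buf) - pos
    return max(0, declared - available)


if __name__ == "__main__":
    # Field 99, wire type 2 -> tag 794 -> bytes 9a 06; length 7; 2 payload bytes.
    print(missing_bytes(bytes([0x9A, 0x06, 0x07, 0x01, 0x02])))
```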
#@ prototext: protoc
floatRp: 3.14159274 #@ repeated float = 42
42: "\333\017" #@ INVALID_FIXED32
#@ prototext: protoc
doublePk: 3.1415926535897931 #@ repeated double [packed=true] = 81
81: "\030-DT\373!\t" #@ INVALID_FIXED64
#@ prototext: protoc
[acme.blade_count]: 42 #@ int32 = 1000