Status: implemented
Implemented in: 2026-04-30
App: reproto
Spec 0016 added polyglot support (with `--force-proto2-output` as the opt-out) and fixed packed encoding. Six further rendering issues remain in `re_field.py`, `re_descriptor.py`, and `re_file.py`:

1. Field labels: reproto currently emits `optional T f = N;` for every singular non-oneof field regardless of syntax. In proto3, implicit singular fields must have no label keyword. Emitting `optional` on such a field does not round-trip: protoc either treats the field as an explicit-presence field (and creates a synthetic oneof for it) or, in older releases without proto3 `optional` support, rejects the file.
2. Synthetic oneofs: when a proto3 source has `optional T f = N;`, protoc records a synthetic oneof `_f` in the descriptor. Reproto must suppress this synthetic oneof block and instead emit `optional T f = N;` as a message-level field. Currently the synthetic oneof block is already skipped during oneof rendering (partial fix), but the `optional` label only appears because the proto2 path emits it unconditionally, not because of any proto3-aware logic. In polyglot mode, the proto3 rendering path must explicitly check `proto3_optional` and produce the correct output.
3. Default values in proto3: reproto currently emits `[default = ...]` for any field whose `default_value` is set in the descriptor. Proto3 does not allow explicit defaults; emitting them produces a file protoc rejects. When `ctx.target_syntax == "proto3"`, the default-value option must be suppressed and a `cli_warning` emitted.
4. `json_name` over-emission: `FieldDescriptorProto.json_name` is always populated by protoc (for both syntaxes) with the auto-derived camelCase value. Reproto currently emits `[json_name = "..."]` whenever the field is set in the descriptor, which is always, producing spurious annotations on every field. The option must be emitted only when the stored value differs from the auto-derived camelCase of the field name. This fix is syntax-independent (applies in both proto2 and proto3 modes).
5. `import weak` in proto3: `import weak` is proto2-specific. Reproto already renders it correctly for proto2 (via `weak_dependency` indices in `FileDescriptorProto`). When `ctx.target_syntax == "proto3"`, a weak import is an inconsistency: reproto must fall back to plain `import` and emit a `cli_warning`.
6. Extension ranges and `extend` blocks in proto3: extension ranges (`extensions N to M;`) and `extend Foo { ... }` blocks are proto2-only constructs (with the sole exception of extending `*Options` messages for custom options, which is handled separately). When `ctx.target_syntax == "proto3"`, both must be omitted and a `cli_warning` emitted per occurrence.

Issues 1–3, 5–6 are proto3-specific; issue 4 is a correctness bug in both syntaxes. All are fully specified in spec 0015 §2, §11, §3, §16, §8, and §6 respectively, and require no new empirical research.

## Goals

1. When `ctx.target_syntax == "proto3"`, suppress the `optional` label on implicit singular fields (`label == LABEL_OPTIONAL` and `proto3_optional == False` and not in a real oneof).
2. When `ctx.target_syntax == "proto3"`, emit `optional` on fields with `proto3_optional == True` (these fields are rendered outside any oneof block).
3. When `ctx.target_syntax == "proto3"`, suppress the synthetic oneof block entirely and do not render its single member field inside the oneof (that field is rendered at message level instead; see goal 2).
4. Add two helpers to `re_syntax.py` (created in spec 0016 as `syntax.py`; see §Note on module name): `field_label()` and `is_synthetic_oneof()`.
5. When `ctx.target_syntax == "proto3"`, suppress `[default = ...]` on any field whose `default_value` is set in the descriptor; emit a `cli_warning` per field. Add a `should_render_default()` helper to `syntax.py`.
6. Emit `[json_name = "..."]` only when the stored value is non-empty and differs from the auto-derived camelCase of the field name (using protoc's exact algorithm; see spec 0039). Add a `_camel_case()` utility and a `should_render_json_name()` helper to `syntax.py`.
7. When `ctx.target_syntax == "proto3"`, render weak imports as plain `import` and emit a `cli_warning` per occurrence. Add `allow_weak_import(target_syntax)` to `syntax.py`.
8. When `ctx.target_syntax == "proto3"`, omit `extensions N to M;` declarations and `extend Foo { ... }` blocks; emit a `cli_warning` per omitted declaration/block. Add `allow_extensions(target_syntax)` to `syntax.py`.
9. All existing roundtrip tests (including those run with `--force-proto2-output`) must continue to pass.

> Note on module name: Spec 0016 created `reproto/src/reproto/syntax.py` with `fdp_syntax()` and `packed_option()`. Spec 0015 §Architecture names this module `re_syntax.py`. This spec adds to whichever file was actually created by spec 0016; if both exist, consolidate into `re_syntax.py` and update the import in `re_field.py`. If only `syntax.py` exists, add the new functions there and rename it in a single commit.

## Non-goals

- Proto2-only features beyond default values, `json_name`, `import weak`, and extensions (such as groups and `message_set_wire_format`): deferred to later specs in the 0015 series.
- The `--syntax-overrides` mechanism (deferred to spec 0015).

Proto3 singular fields always carry `label = LABEL_OPTIONAL` in the
descriptor, regardless of whether the source had an explicit optional
keyword. The distinction between implicit and explicit presence is encoded
in proto3_optional:

| Source form | `label` | `proto3_optional` | synthetic oneof |
|---|---|---|---|
| `T f = N;` (implicit) | `LABEL_OPTIONAL` | `False` | none |
| `optional T f = N;` (explicit) | `LABEL_OPTIONAL` | `True` | `_f` created |
| `repeated T f = N;` | `LABEL_REPEATED` | `False` | none |
| field inside `oneof` | `LABEL_OPTIONAL` | `False` | none (real oneof) |

The rendering rules follow directly:

- Implicit singular (`proto3_optional == False`, not in real oneof): emit no label.
- Explicit optional (`proto3_optional == True`): emit `optional` (the field is rendered outside any oneof block).
- `LABEL_REPEATED`: emit `repeated` (same in proto2 and proto3).
- `LABEL_REQUIRED` cannot appear in a well-formed proto3 descriptor; inconsistency handling is out of scope for this spec (see spec 0015 §Inconsistency handling).

Detection rule: a oneof is synthetic iff all three conditions hold:

1. `oneof.name` starts with `_`.
2. Exactly one field belongs to the oneof.
3. That field has `proto3_optional == True`.

A synthetic oneof must be:

- suppressed (no `oneof _foo { ... }` block emitted), and
- replaced by its member field, rendered at message level as `optional T f = N;`.

Real oneofs (even those whose name happens to start with `_`) must never be suppressed. The three-condition rule is sufficient to distinguish them because protoc guarantees that a real oneof always has more than one field or its field has `proto3_optional == False`.

Empirically confirmed (mockup f10_synthetic_oneof.proto and
f06_field_labels_proto3.proto in docs/mockup/): protoc creates one
synthetic oneof per optional field, never merging two optional fields
into the same synthetic oneof.
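
The detection rule above can be sketched as follows. This is a minimal illustration, not the implementation in `syntax.py`: `FakeField` and `FakeContext` are hypothetical stand-ins that carry only the attributes the rule reads, and the proto3 target check is included as a first guard so the function is safe to call for any syntax.

```python
from dataclasses import dataclass


@dataclass
class FakeField:
    """Hypothetical stand-in for FieldDescriptorProto."""
    name: str
    proto3_optional: bool = False


@dataclass
class FakeContext:
    """Hypothetical stand-in for the rendering context."""
    target_syntax: str = "proto3"


def is_synthetic_oneof(ctx, oneof_name, members):
    """Proto3 target, '_' prefix, exactly one member, member is proto3_optional."""
    if ctx.target_syntax != "proto3":
        return False
    if not oneof_name.startswith("_"):
        return False
    return len(members) == 1 and members[0].proto3_optional


ctx = FakeContext()
# optional T f = N;  ->  synthetic oneof "_f" with a single optional member
synthetic = is_synthetic_oneof(ctx, "_f", [FakeField("f", proto3_optional=True)])
# A real oneof whose name merely starts with '_' has non-optional members.
real_underscore = is_synthetic_oneof(ctx, "_choice", [FakeField("a"), FakeField("b")])
real_single = is_synthetic_oneof(ctx, "_solo", [FakeField("a")])
# Same descriptor, proto2 target: never treated as synthetic.
proto2_target = is_synthetic_oneof(
    FakeContext("proto2"), "_f", [FakeField("f", proto3_optional=True)]
)
```

The two `real_*` cases show why the name prefix alone is not enough: only the combination with the single `proto3_optional` member identifies a synthetic oneof.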
Proto2 allows [default = <value>] on optional/required scalar fields.
Proto3 forbids explicit defaults entirely (the zero value is always the
implicit default and is never stored in the descriptor).
FieldDescriptorProto.default_value is a string field that is absent
(HasField("default_value") == False) when no default was declared. In a
well-formed proto3 descriptor this field is never set. If it is set (e.g.
in a hand-crafted .pb), reproto must treat it as an inconsistency: emit a
cli_warning and omit the [default = ...] option from the output.

| Condition | proto2 rendering | proto3 rendering |
|---|---|---|
| `HasField("default_value") == False` | nothing | nothing |
| `HasField("default_value") == True` | `[default = <val>]` | omit + `cli_warning` |

The warning must include the file name and the fully-qualified field name.
Empirically confirmed (mockup f11_default_values_proto2.proto):
protoc never sets default_value in proto3 descriptors.
### `json_name` (spec 0015 §16)

> Note: The camelCase algorithm described in the original version of this section was incorrect. See spec 0039 for the full findings and the correct specification. The summary below reflects the corrected understanding.

FieldDescriptorProto.json_name is always set by protoc in both proto2 and
proto3. It cannot be used to detect a user-supplied override — protoc writes
the same value (user-supplied or auto-derived) either way, and source_code_info
(the only other signal) is absent from most .pb files in practice.
The auto-derived camelCase of a field name follows protoc's character-by-character
algorithm (from descriptor.cc): consume an underscore and capitalize the next
character only when that next character is an ASCII letter. When the character
after _ is a digit or the string ends, the underscore is kept as-is. This
differs from a naive split-on-underscore approach in several edge cases:

| Input | Correct (protoc) | Wrong (split-based) |
|---|---|---|
| `foo_` | `'foo_'` | `'foo'` |
| `foo_1bar` | `'foo_1bar'` | `'foo1bar'` |
| `foo__bar` | `'foo_Bar'` | `'fooBar'` |
| `FOO_BAR` | `'FOOBAR'` | `'FOOBar'` |

Additionally, when json_name is absent from the .pb, the Python protobuf
library returns "" (the default for an unset string field). Emitting
[json_name = ""] is syntactically invalid; an empty value must be treated as
absent.
Emit `[json_name = "..."]` only when:

1. `field.json_name` is non-empty, and
2. it differs from the auto-derived camelCase of `field.name`.

This fix applies in both proto2 and proto3 modes.

Empirically confirmed (mockup f09_json_name.proto): protoc stores the
user-supplied value when it differs from the auto-derived name, and the
auto-derived value when it is the same — so the stored value can always be
compared directly against the auto-derivation to decide whether to emit the
option. One edge case: same_as_auto in the mockup has an explicit
[json_name = "sameAsAuto"] in source, but the stored value equals the
auto-derived value, so the option is correctly suppressed.
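
As a worked illustration of the emit rule, the sketch below applies the character-by-character derivation exactly as this section describes it (it is not checked against protoc here) and then decides, for a few `(name, stored json_name)` pairs, whether the option would be emitted. The function names and the sample pairs are illustrative assumptions; the `custom`/`"My"` and `same_as_auto` cases mirror the mockup described above.

```python
def camel_case(name):
    """Derive the default JSON name following the rule described in this spec."""
    out = []
    i = 0
    while i < len(name):
        nxt = name[i + 1] if i + 1 < len(name) else ""
        # Consume '_' only when the next character is an ASCII letter.
        if name[i] == "_" and nxt.isascii() and nxt.isalpha():
            out.append(nxt.upper())
            i += 2
        else:
            out.append(name[i])
            i += 1
    return "".join(out)


def should_emit_json_name(name, stored):
    """Emit only for a non-empty stored value that differs from the derived one."""
    return bool(stored) and stored != camel_case(name)


samples = [
    ("field_name", "fieldName"),      # auto-derived: suppressed
    ("custom", "My"),                 # user override: emitted
    ("same_as_auto", "sameAsAuto"),   # explicit in source, equals auto: suppressed
    ("unset", ""),                    # absent from the .pb: suppressed
]
decisions = {name: should_emit_json_name(name, stored) for name, stored in samples}
```

Only `custom` ends up with an emitted option; the empty-string case shows why the non-empty check has to come first.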
### `import weak` (spec 0015 §8)

`import weak` is a proto2-specific directive. In the descriptor,
FileDescriptorProto.dependency lists all imports (both regular and weak)
by path. FileDescriptorProto.weak_dependency contains the indices into
dependency that are weak. Reproto already uses these fields to emit
import weak "..." for proto2 files.
In proto3, weak imports are illegal. A proto3 descriptor with non-empty
weak_dependency is inconsistent. The degraded rendering is: emit
import "..." (plain import, same path) and a cli_warning per weak
import. The import itself is preserved so that type resolution in the
output is not broken.
Empirically confirmed (mockup f14_weak_import_proto2.proto and
f14_weak_import_proto2_dep.proto): weak_dependency contains the
0-based index of the weak import within dependency.
Proto2 supports two extension constructs:

1. Extension ranges: `extensions N to M;` inside a message body. These appear as `DescriptorProto.extension_range` entries in the descriptor.
2. `extend` blocks: `extend Foo { ... }` at file or message scope. File-level extensions appear in `FileDescriptorProto.extension`; nested extensions appear in `DescriptorProto.extension`.

Proto3 forbids both for user-defined message types. (Extending `*Options`
messages for custom options is an exception, but such extensions appear in
FileDescriptorProto.extension just like any other extension — this spec
does not attempt to distinguish them. Omitting all extension constructs in
proto3 output is the safe, conservative choice for roundtrip purposes.)
Degraded rendering in proto3 mode:

- Omit each `extensions N to M;` declaration; emit a `cli_warning` naming the message and the range.
- Omit each `extend Foo { ... }` block (file-level or nested); emit a `cli_warning` naming the extendee and the enclosing scope.

Empirically confirmed (mockup `f12_extensions_proto2.proto`): both
file-level and message-nested extend blocks appear in the descriptor as
expected.
### 1. `syntax.py` (or `re_syntax.py`)

Convention: every helper in `syntax.py` takes `ctx` as its first
argument and reads whatever it needs (ctx.syntax, ctx.target_syntax,
etc.) from it directly. This keeps call sites simple and avoids the
parameter list growing as more context-dependent decisions are added.
fdp_syntax() is the sole exception: it is called to populate ctx.syntax
and therefore cannot receive ctx.
Add the following two functions:

    def field_label(ctx: Context, field, is_oneof: bool) -> str:
        """
        Return the label keyword to emit before the field type (with a trailing
        space), or '' if no label should be emitted.

        Args:
            ctx: rendering context (reads ctx.target_syntax)
            field: FieldDescriptorProto
            is_oneof: True if this field is rendered inside a real oneof block
                (synthetic oneof members are passed is_oneof=False)

        Rules:
            - is_oneof → ''
            - field.label == LABEL_REPEATED → 'repeated '
            - ctx.target_syntax == "proto2":
                field.label == LABEL_REQUIRED → 'required '
                field.label == LABEL_OPTIONAL → 'optional '
            - ctx.target_syntax == "proto3":
                field.proto3_optional → 'optional '
                else → '' (implicit singular)
        """

    def is_synthetic_oneof(ctx: Context, oneof_name: str, members: list) -> bool:
        """
        Return True iff the given oneof is a proto3 synthetic oneof.

        Returns False immediately when ctx.target_syntax != "proto3", making
        the function safe to call unconditionally regardless of syntax.

        Detection rule (all conditions must hold):
            1. ctx.target_syntax == "proto3"
            2. oneof_name starts with '_'
            3. exactly one field is in members
            4. that field has proto3_optional == True
        """

### 2. `re_field.py`: use `field_label()`

Replace the existing label-emission logic with a call to `field_label()`.
Key detail: a field whose synthetic oneof is suppressed is rendered at
message level (not inside a oneof block), so is_oneof must be False
for such a field. The caller (re_descriptor.py, see §3) is responsible
for passing the correct is_oneof value.
The render(ctx, depth, is_oneof) signature is unchanged. Inside:

    from .syntax import field_label

    label_str = field_label(ctx, self.this, is_oneof)
    # emit: f"{label_str}{type_str} {field.name} = {field.number}{opts};"

The existing branch that hardcodes optional/required/repeated is
removed and replaced by label_str.
The packed_option() call is similarly updated to the ctx-first
convention: packed_option(ctx, has_packed, effective_packed).
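
The rules listed in the `field_label()` docstring could be implemented roughly as below. This is a sketch under stated assumptions: `FakeField` and `FakeContext` are hypothetical stand-ins for `FieldDescriptorProto` and the rendering context, and the label constants use the numeric values from `descriptor.proto`.

```python
from dataclasses import dataclass

# Numeric values of FieldDescriptorProto.Label in descriptor.proto.
LABEL_OPTIONAL, LABEL_REQUIRED, LABEL_REPEATED = 1, 2, 3


@dataclass
class FakeField:
    """Hypothetical stand-in for FieldDescriptorProto."""
    label: int = LABEL_OPTIONAL
    proto3_optional: bool = False


@dataclass
class FakeContext:
    """Hypothetical stand-in for the rendering context."""
    target_syntax: str = "proto3"


def field_label(ctx, field, is_oneof):
    """Return the label keyword with a trailing space, or '' for no label."""
    if is_oneof:
        return ""  # members of a real oneof never carry a label
    if field.label == LABEL_REPEATED:
        return "repeated "  # identical in proto2 and proto3
    if ctx.target_syntax == "proto2":
        return "required " if field.label == LABEL_REQUIRED else "optional "
    # proto3: only explicit-presence fields get a keyword
    return "optional " if field.proto3_optional else ""


p2, p3 = FakeContext("proto2"), FakeContext("proto3")
implicit = field_label(p3, FakeField(), is_oneof=False)
explicit = field_label(p3, FakeField(proto3_optional=True), is_oneof=False)
```

Returning the keyword with its trailing space keeps the call site a plain string concatenation, which is what the `f"{label_str}{type_str} ..."` template above relies on.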
### 3. `re_descriptor.py`: synthetic oneof suppression

This is the most structurally significant change in this spec.

At the top of `ReDescriptorProto.render()`, before iterating over fields and oneofs, build a map of oneof members and a set of synthetic oneof indices:

    from collections import defaultdict

    from .syntax import is_synthetic_oneof

    # Map from oneof_index to the list of fields in that oneof.
    oneof_fields: dict[int, list] = defaultdict(list)
    for f in self.this.field:
        # oneof_index reads as 0 when unset, so test presence explicitly.
        if f.HasField('oneof_index'):
            oneof_fields[f.oneof_index].append(f)

    # Indices of oneofs that are proto3 synthetic oneofs.
    synthetic_oneof_indices: set[int] = set()
    for idx, oneof in enumerate(self.this.oneof_decl):
        if is_synthetic_oneof(ctx, oneof.name, oneof_fields[idx]):
            synthetic_oneof_indices.add(idx)

> Implementation note on `HasField('oneof_index')`: `FieldDescriptorProto` is defined in `descriptor.proto`, which uses proto2 syntax, so `oneof_index` is an optional `int32` with explicit presence and `HasField('oneof_index')` is available in the Python API, including for descriptors of proto3 files. Do not test `f.oneof_index >= 0` instead: an unset `oneof_index` reads as `0`, which is also a valid index. Implementors should use whichever API the existing codebase already uses for oneof membership detection.

When iterating self.this.oneof_decl to emit oneof blocks, skip any
index in synthetic_oneof_indices.
The current code likely separates "oneof fields" from "non-oneof fields"
when iterating self.this.field. Fields whose oneof_index is in
synthetic_oneof_indices must be treated as non-oneof fields for
rendering purposes:

- They are not rendered inside any `oneof` block.
- `is_oneof=False` is passed to their `render()` call.
- They have `proto3_optional == True`, so `field_label()` returns `'optional '` for them.

Fields whose `oneof_index` refers to a real oneof (one not in `synthetic_oneof_indices`) continue to be rendered inside the oneof block with `is_oneof=True`.

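
The partitioning described in this section can be sketched end to end as follows, assuming a proto3 target. `FakeField` and `partition_fields` are hypothetical names; `oneof_index=None` models an unset `oneof_index`, and the sample message mimics one real oneof followed by one synthetic oneof.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeField:
    """Hypothetical stand-in for FieldDescriptorProto."""
    name: str
    oneof_index: Optional[int] = None  # None models "oneof_index not set"
    proto3_optional: bool = False


def partition_fields(fields, oneof_names):
    """Split fields into message-level fields and real-oneof member lists."""
    members = {idx: [] for idx in range(len(oneof_names))}
    for f in fields:
        if f.oneof_index is not None:
            members[f.oneof_index].append(f)

    # Synthetic: '_' prefix, exactly one member, member is proto3_optional.
    synthetic = {
        idx
        for idx, name in enumerate(oneof_names)
        if name.startswith("_")
        and len(members[idx]) == 1
        and members[idx][0].proto3_optional
    }

    # Synthetic members are rendered at message level with is_oneof=False.
    message_level = [
        f.name for f in fields
        if f.oneof_index is None or f.oneof_index in synthetic
    ]
    real_oneofs = {
        oneof_names[idx]: [f.name for f in fs]
        for idx, fs in members.items()
        if idx not in synthetic
    }
    return message_level, real_oneofs


fields = [
    FakeField("id"),
    FakeField("a", oneof_index=0),
    FakeField("b", oneof_index=0),
    FakeField("opt_scalar", oneof_index=1, proto3_optional=True),
]
message_level, real_oneofs = partition_fields(fields, ["real_choice", "_opt_scalar"])
```

Here `opt_scalar` joins `id` at message level while `real_choice` keeps both of its members, which is the layout the rendering loop needs.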
### 4. `syntax.py`: `should_render_default()` and `json_name` helpers

Add three more functions to `syntax.py`:

    def should_render_default(target_syntax: str, field) -> bool:
        """
        Return True iff [default = ...] should be rendered for this field.

        Args:
            target_syntax: "proto2" or "proto3" (ctx.target_syntax)
            field: FieldDescriptorProto

        This function does not emit any warning itself. When it returns False
        for a proto3 target although default_value is set, the caller must
        emit the cli_warning, including file/field context.
        """
        has_default = field.HasField('default_value')
        if not has_default:
            return False
        if target_syntax == "proto3":
            return False  # caller must emit cli_warning
        return True

    def _camel_case(name: str) -> str:
        """
        Derive the default JSON name (camelCase) for a proto field name.

        Implements protoc's exact character-by-character algorithm: consume an
        underscore and capitalize the next character only when that next character
        is an ASCII letter. When the character after '_' is a digit or the string
        ends, the underscore is kept as-is.

        Examples (matching protoc):
            'field_name' → 'fieldName'
            'x'          → 'x'
            'foo_'       → 'foo_'      (trailing underscore kept)
            'foo_1bar'   → 'foo_1bar'  (underscore before digit kept)
            'foo__bar'   → 'foo_Bar'   (first _ kept, second consumed)
            'FOO_BAR'    → 'FOOBAR'    (no lowercasing of existing caps)

        See spec 0039 for the full edge-case analysis.
        """
        result = []
        i = 0
        while i < len(name):
            if name[i] == '_' and i + 1 < len(name) and name[i + 1].isalpha():
                result.append(name[i + 1].upper())
                i += 2
            else:
                result.append(name[i])
                i += 1
        return ''.join(result)

    def should_render_json_name(field) -> bool:
        """
        Return True iff [json_name = "..."] should be emitted for this field.

        Emit only when the stored json_name is non-empty (empty means absent from
        the .pb) and differs from the auto-derived camelCase of field.name.
        This is syntax-independent.
        """
        return bool(field.json_name) and field.json_name != _camel_case(field.name)

should_render_default() returns False for proto3; the caller in
re_field.py must separately emit the cli_warning when
target_syntax == "proto3" and field.HasField('default_value') is True.
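
The division of labour between the predicate and its caller might look like the sketch below. Everything here is an illustrative assumption: `FakeField` imitates only `HasField('default_value')`, `default_option` is a hypothetical caller, warnings are collected in a list instead of going through `cli_warning`, and the value is emitted without any type-specific formatting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeField:
    """Hypothetical stand-in for FieldDescriptorProto."""
    name: str
    default_value: Optional[str] = None  # None models "default_value not set"

    def HasField(self, field_name):
        assert field_name == "default_value"
        return self.default_value is not None


def should_render_default(target_syntax, field):
    """True only for a declared default with a non-proto3 target."""
    return field.HasField("default_value") and target_syntax != "proto3"


def default_option(target_syntax, file_name, fq_name, field, warnings):
    """Hypothetical caller: render the option or record the proto3 warning."""
    if should_render_default(target_syntax, field):
        return f"[default = {field.default_value}]"
    if target_syntax == "proto3" and field.HasField("default_value"):
        warnings.append(
            f"{file_name}: field '{fq_name}': "
            "explicit default values are not valid in proto3; omitting"
        )
    return ""


warnings = []
with_default = FakeField("f", default_value="5")
proto2_out = default_option("proto2", "a.proto", "pkg.M.f", with_default, warnings)
proto3_out = default_option("proto3", "a.proto", "pkg.M.f", with_default, warnings)
no_default_out = default_option("proto3", "a.proto", "pkg.M.g", FakeField("g"), warnings)
```

Passing the fully-qualified name into the caller is one way to satisfy the requirement that the warning names both the file and the field.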
### 5. `re_field.py`: default value and `json_name` gates

Replace the existing guard around `[default = ...]` emission:

    from .syntax import should_render_default

    if not should_render_default(ctx.target_syntax, self.this):
        if (ctx.target_syntax == "proto3"
                and self.this.HasField('default_value')):
            cli_warning(
                f"{ctx.current_file}: field '{self.this.name}': "
                f"explicit default values are not valid in proto3; omitting"
            )
    else:
        # existing default-value rendering code unchanged
        ...

#### `json_name`

Replace the existing `json_name` emission guard (or add one if missing):

    from .syntax import should_render_json_name

    if should_render_json_name(self.this):
        opt_block.append(BlockLine(f'json_name = "{self.this.json_name}",', depth + 1))

Remove any unconditional json_name emission.
### 6. `syntax.py`: `allow_weak_import()` and `allow_extensions()`

Add two more predicate functions:

    def allow_weak_import(target_syntax: str) -> bool:
        """Return True iff import weak is legal in this syntax."""
        return target_syntax == "proto2"

    def allow_extensions(target_syntax: str) -> bool:
        """Return True iff extension ranges and extend blocks are legal."""
        return target_syntax == "proto2"

### 7. `re_file.py`: `import weak` degradation

Reproto already iterates `fdp.dependency` and uses `fdp.weak_dependency` to decide whether to emit `import weak "..."` or `import "..."`. Add a guard around the `weak` keyword:

    from .syntax import allow_weak_import

    # weak_dependency holds 0-based indices into dependency.
    weak_set = set(self.this.weak_dependency)
    for i, dep in enumerate(self.this.dependency):
        is_weak = i in weak_set
        if is_weak and not allow_weak_import(ctx.target_syntax):
            cli_warning(
                f"{ctx.current_file}: 'import weak' is not valid in proto3; "
                f"rendering as plain import: \"{dep}\""
            )
            is_weak = False
        keyword = "weak " if is_weak else ""
        lines.append(f'import {keyword}"{dep}";')

The import path is always emitted; only the weak keyword is suppressed.
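
A self-contained version of the degradation, useful for checking the behaviour in isolation, might look like this. `render_imports` is a hypothetical free function; it collects warnings in a list rather than calling `cli_warning`, and it ignores `import public` entirely.

```python
def render_imports(target_syntax, dependency, weak_dependency, file_name, warnings):
    """Render import lines, downgrading weak imports when the target is not proto2."""
    weak_set = set(weak_dependency)  # 0-based indices into dependency
    lines = []
    for i, dep in enumerate(dependency):
        is_weak = i in weak_set
        if is_weak and target_syntax != "proto2":
            warnings.append(
                f"{file_name}: 'import weak' is not valid in proto3; "
                f'rendering as plain import: "{dep}"'
            )
            is_weak = False
        keyword = "weak " if is_weak else ""
        lines.append(f'import {keyword}"{dep}";')
    return lines


deps = ["a.proto", "b.proto"]
proto2_warnings, proto3_warnings = [], []
proto2_lines = render_imports("proto2", deps, [1], "x.proto", proto2_warnings)
proto3_lines = render_imports("proto3", deps, [1], "x.proto", proto3_warnings)
```

In the proto3 case both paths are still emitted, so type resolution in the output keeps working; only the keyword is lost and one warning is recorded.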
### 8. `re_descriptor.py`: extension range and `extend` block guards

#### 8a. Extension ranges (`extensions N to M;`)

When iterating `self.this.extension_range` to emit `extensions` statements, skip ranges and warn when extensions are not allowed:

    from .syntax import allow_extensions

    if not allow_extensions(ctx.target_syntax):
        for er in self.this.extension_range:
            cli_warning(
                f"{ctx.current_file}: message '{self.this.name}': "
                f"extension range [{er.start}, {er.end}) is not valid in "
                f"proto3; omitting"
            )
    else:
        # existing extension_range rendering unchanged
        ...

#### 8b. File-level `extend` blocks (`FileDescriptorProto.extension`)

File-level extensions are rendered in `re_file.py` (or wherever `fdp.extension` is iterated). Wrap that iteration:

    if not allow_extensions(ctx.target_syntax):
        for ext in self.this.extension:
            cli_warning(
                f"{ctx.current_file}: top-level extend block for "
                f"'{ext.extendee}' is not valid in proto3; omitting"
            )
    else:
        # existing file-level extension rendering unchanged
        ...

#### 8c. Nested `extend` blocks (`DescriptorProto.extension`)

Same pattern as 8b, applied wherever `msg.extension` is iterated:

    if not allow_extensions(ctx.target_syntax):
        for ext in self.this.extension:
            cli_warning(
                f"{ctx.current_file}: message '{self.this.name}': "
                f"nested extend block for '{ext.extendee}' is not valid "
                f"in proto3; omitting"
            )
    else:
        # existing nested extension rendering unchanged
        ...

Copy six mockup files into reproto/src/reproto/tests/fixtures/:

- `field_labels_proto3.proto`, sourced from `docs/mockup/f06_field_labels_proto3.proto`
- `synthetic_oneof.proto`, sourced from `docs/mockup/f10_synthetic_oneof.proto`
- `default_values_proto2.proto`, sourced from `docs/mockup/f11_default_values_proto2.proto`
- `json_name.proto`, sourced from `docs/mockup/f09_json_name.proto`
- `weak_import_proto2.proto`, sourced from `docs/mockup/f14_weak_import_proto2.proto` (also copy `f14_weak_import_proto2_dep.proto` → `weak_import_proto2_dep.proto` since the import path inside the fixture references the dependency by name)
- `extensions_proto2.proto`, sourced from `docs/mockup/f12_extensions_proto2.proto`

Drop the `f06_`/`f09_`/`f10_`/`f11_`/`f12_`/`f14_` prefixes. Leave `package mockup;` as-is. Update the `import weak "..."` path inside `weak_import_proto2.proto` to reference `weak_import_proto2_dep.proto`.

Proto2 fixtures go into the existing DEFAULT_FIXTURES list.
Proto3 polyglot fixtures are split into two categories based on whether a
lossless roundtrip is possible with --force-proto2-output:
Strict (POLYGLOT_FIXTURES_STRICT): fixtures where the only
force-proto2 .pb difference is syntax. The existing
differing <= {"syntax"} assertion applies.
Lossy (POLYGLOT_FIXTURES_LOSSY): fixtures that use proto3-only
descriptor fields (proto3_optional, synthetic oneof_decl, oneof_index)
which are structurally impossible to reproduce from proto2 source. The
force-proto2 roundtrip still runs for crash-safety, but the field-diff
assertion is widened to differing <= PROTO3_ONLY_FIELDS where:
PROTO3_ONLY_FIELDS = {"syntax", "proto3_optional", "oneof_index", "oneof_decl", "name"}
"name" appears in the set because pb_diff_fields traverses into
missing oneof_decl sub-messages and surfaces their name child field;
it is an artifact of the diff algorithm, not a top-level field change.
Fixture assignments:

- `packed_proto2.proto` → `POLYGLOT_FIXTURES_STRICT` (spec 0016)
- `packed_proto3.proto` → `POLYGLOT_FIXTURES_STRICT` (spec 0016)
- `json_name.proto` → `POLYGLOT_FIXTURES_STRICT` (proto3, but no synthetic oneofs)
- `field_labels_proto3.proto` → `POLYGLOT_FIXTURES_LOSSY` (has synthetic oneofs)
- `synthetic_oneof.proto` → `POLYGLOT_FIXTURES_LOSSY` (has synthetic oneofs)
- `default_values_proto2.proto` → `DEFAULT_FIXTURES`
- `weak_import_proto2.proto` → `DEFAULT_FIXTURES`
- `extensions_proto2.proto` → `DEFAULT_FIXTURES`

`test_roundtrip_polyglot` is updated to accept both lists. For strict
fixtures it behaves exactly as before. For lossy fixtures it substitutes
PROTO3_ONLY_FIELDS for {"syntax"} in the force-proto2 assertion. The
default (polyglot) pass (full .pb + .proto comparison) is identical for
both categories.
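
The strict/lossy distinction boils down to which set of differing field names the force-proto2 assertion tolerates. A minimal sketch, in which `allowed_force_proto2_diff` and `force_proto2_diff_ok` are hypothetical helper names and `differing` stands for the set of field names reported by `pb_diff_fields`:

```python
PROTO3_ONLY_FIELDS = {"syntax", "proto3_optional", "oneof_index", "oneof_decl", "name"}


def allowed_force_proto2_diff(lossy):
    """Field names that may differ after a --force-proto2-output roundtrip."""
    return PROTO3_ONLY_FIELDS if lossy else {"syntax"}


def force_proto2_diff_ok(differing, lossy):
    """True when every differing field is tolerated for this fixture category."""
    return set(differing) <= allowed_force_proto2_diff(lossy)


strict_ok = force_proto2_diff_ok({"syntax"}, lossy=False)
strict_bad = force_proto2_diff_ok({"syntax", "oneof_decl"}, lossy=False)
lossy_ok = force_proto2_diff_ok({"syntax", "oneof_decl", "name"}, lossy=True)
lossy_bad = force_proto2_diff_ok({"json_name"}, lossy=True)
```

Keeping the widened set explicit means an unexpected difference such as `json_name` still fails a lossy fixture.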
After this spec is implemented, running pytest must show:

1. All existing `test_roundtrip[*]` tests pass (no regression).
2. `test_roundtrip_polyglot[packed_proto2.proto]` and `test_roundtrip_polyglot[packed_proto3.proto]` pass (spec 0016 regression, strict category).
3. `test_roundtrip_polyglot[json_name.proto]` passes (strict category).
4. `test_roundtrip_polyglot[field_labels_proto3.proto]` passes (lossy category):
   - The force-proto2 field diff is a subset of `PROTO3_ONLY_FIELDS`.
   - Implicit singular fields render with no label; `optional` fields render with the `optional` label outside any oneof; `repeated` fields render with the `repeated` label; fields inside a real `oneof` render with no label inside the oneof block.
5. `test_roundtrip_polyglot[synthetic_oneof.proto]` passes (lossy category):
   - The force-proto2 field diff is a subset of `PROTO3_ONLY_FIELDS`.
   - The synthetic oneofs (`_opt_scalar`, `_opt_string`) are suppressed; their member fields render at message level as `optional int32 opt_scalar = 1;` etc.; the real `oneof real_choice { ... }` block is preserved.
6. `test_roundtrip[default_values_proto2.proto]` passes:
   - All `[default = ...]` annotations are reproduced exactly.
   - A field without a declared default (`no_def`) produces no default option.
7. `test_roundtrip_polyglot[json_name.proto]` passes:
   - `field_name`, `already_camel`, `under_score_heavy`: no `[json_name]` emitted (value equals auto-derived camelCase).
   - `custom`: `[json_name = "My"]` emitted (differs from auto).
   - `same_as_auto`: no `[json_name]` emitted (stored value equals auto-derived value, even though it was explicit in source).
8. `test_roundtrip[weak_import_proto2.proto]` passes:
   - `import weak "weak_import_proto2_dep.proto";` is reproduced exactly.
9. `test_roundtrip[extensions_proto2.proto]` passes:
   - `extensions 100 to 199;` inside `Extendable` is reproduced.
   - The file-level `extend Extendable { ... }` block is reproduced.
   - The nested `extend Extendable { ... }` inside `Holder` is reproduced.

None.