
Commit 69dbf46

Update arena fusion documentation to cover one way refs
PiperOrigin-RevId: 828552912
1 parent 2e61833 commit 69dbf46

1 file changed: +126 -4 lines changed

docs/upb/arena_fusion.md

Lines changed: 126 additions & 4 deletions
@@ -9,7 +9,7 @@ in Visual Studio Code:
https://marketplace.visualstudio.com/items?itemName=shd101wyy.markdown-preview-enhanced
--->

-# μpb Arena Fusion
+# μpb Arena Fusion and References

μpb generally follows a thread-compatibility model where only operations on a
`const` pointer may occur concurrently from multiple threads; non-const
@@ -32,7 +32,8 @@ in synchronizing lifetimes from multiple things observing the same arena (which
may be a non-`const` pointer for multiple writers in a single threaded context,
or a `const` pointer for multiple readers in a multithreaded context).

-To be usable everywhere, it has a lock-free implementation, documented below.
+To be usable everywhere, fusion has a lock-free implementation, documented
+below.

## Data Structure

@@ -60,8 +61,8 @@ a subset of fused arenas while racing with fuse calls.
## Finding the root
-Each set of fused arenas is uniquely identified by the root node of its tree.
-To find the root for a given arena, traverse the `parent_or_count` pointer until
+Each set of fused arenas is uniquely identified by the root node of its tree. To
+find the root for a given arena, traverse the `parent_or_count` pointer until
you reach a node with a count instead of a parent pointer - that's the root. To
avoid expensive lookups if frequently repeated, this data structure uses path
splitting to halve the distance from the root for every node traversed.
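A minimal sketch of the root lookup with path splitting follows. It assumes a hypothetical single-threaded node whose `parent_or_count` holds either a parent pointer (low bit clear) or a tagged count (low bit set) on the root; upb's real implementation is lock-free, and its exact tagging may differ. The names `node_t`, `tag_count`, and `find_root` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified node (assumption, not upb's layout): the root keeps
 * a count tagged with low bit 1; every other node keeps a pointer to its
 * parent, whose low bit is 0 because node_t is pointer-aligned. */
typedef struct node {
  uintptr_t parent_or_count;
} node_t;

static int is_count(uintptr_t v) { return (v & 1) != 0; }

static uintptr_t tag_count(uintptr_t count) { return (count << 1) | 1; }

/* Returns the root of n's tree. Path splitting: each visited node is
 * re-pointed at its grandparent before we step to its old parent. The root's
 * direct children are left alone because the root stores a count, not a
 * parent. Single-threaded sketch; the real code uses atomics. */
static node_t *find_root(node_t *n) {
  while (!is_count(n->parent_or_count)) {
    node_t *parent = (node_t *)n->parent_or_count;
    uintptr_t grandparent = parent->parent_or_count;
    if (!is_count(grandparent)) {
      n->parent_or_count = grandparent;
    }
    n = parent;
  }
  return n;
}
```

For a chain `n4 -> n3 -> n2 -> n1 -> n0`, a single `find_root(&n4)` returns `n0` and re-points `n4` at `n2`, `n3` at `n1`, and `n2` at `n0`, so the next lookup from `n4` takes half as many steps.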
@@ -260,6 +261,30 @@ C -> B [weight=0.001, color="red"]
}
```

## One-Way References

Fusion creates a bidirectional lifetime dependency between arenas: fused arenas
have the same lifetime. In some cases, only a one-way dependency is needed. If
messages in arena `A` contain pointers to messages in arena `B` but not the
reverse, then arena `B` only needs to live at least as long as arena `A`.

`upb_Arena_RefArena(A, B)` establishes such a one-way dependency by having arena
`A` increment `B`'s refcount. When arena `A` is freed, it releases its reference
on `B`. This is implemented by allocating a special block in `A` that holds a
pointer to `B`; this block is processed during `upb_Arena_Free(A)`. Care is
taken to ensure that these blocks are processed before the memory blocks
containing them are returned to the underlying allocator.

Unlike fusion, `upb_Arena_RefArena(A, B)` is not thread-safe when called
concurrently with other operations on arena `A`, but it is safe to call
concurrently with other operations on `B`.

It is an error to create cycles of references (e.g., `upb_Arena_RefArena(A, B);
upb_Arena_RefArena(B, A)`), or to create a reference between arenas that are
fused. Since fusion creates a bidirectional dependency, fusing arenas `A` and `B`
and then calling `upb_Arena_RefArena(B, A)` would create the cycle
`A <-> B -> A`. In debug builds, upb checks for such cycles, with the
implementation documented below.
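The lifetime rules above can be modeled with a toy refcounted arena. This is a sketch under assumptions: `toy_arena`, `toy_new`, `toy_ref`, and `toy_free` are hypothetical, and a fixed-size array stands in for the special block that upb allocates inside `A` to hold the pointer to `B`. It shows outgoing refs being released before the arena's own memory is freed, and why a cycle of refs can never be collected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX_REFS 4

/* Hypothetical toy arena (not upb's type): a refcount plus the arenas it
 * holds one-way refs on. The fixed array stands in for the special block
 * that upb allocates inside the referencing arena. */
typedef struct toy_arena {
  int refcount;
  struct toy_arena *refs[TOY_MAX_REFS];
  int num_refs;
  bool *freed; /* set to true when this arena's memory is released */
} toy_arena;

static toy_arena *toy_new(bool *freed) {
  toy_arena *a = calloc(1, sizeof(*a));
  if (a == NULL) abort();
  a->refcount = 1; /* the creator's own reference */
  a->freed = freed;
  return a;
}

/* Drops one reference. On the last one, outgoing refs are released first,
 * and only then is the memory that stored them freed. */
static void toy_free(toy_arena *a) {
  if (--a->refcount > 0) return;
  for (int i = 0; i < a->num_refs; i++) toy_free(a->refs[i]);
  if (a->freed != NULL) *a->freed = true;
  free(a);
}

/* Models RefArena(from, to): `to` must now outlive `from`. */
static void toy_ref(toy_arena *from, toy_arena *to) {
  assert(from->num_refs < TOY_MAX_REFS);
  to->refcount++;
  from->refs[from->num_refs++] = to;
}
```

Here `toy_free` plays the role of `upb_Arena_Free`. If `A` refs `B`, freeing `B` first leaves it alive until `A` is freed. If two arenas ref each other, each keeps the other's count above zero, so neither is ever freed, which is the leak the cycle rule above forbids.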
## Counting allocated space

Traversing the linked list while the arenas are still live can be tricky, as we
@@ -271,3 +296,100 @@ fuse operation that joined them is still in progress. But counting allocated
space is always consistent with itself and any fully complete fuses: nodes are
only appended or prepended to the list, so starting the traversal at the same
point each time guarantees that future scans see a superset of previous ones.
Arenas that are only referenced (not fused) are not included in the count of
allocated space.
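A single-threaded sketch of this traversal: starting from any member, walk backwards to the head of the list, then forwards, summing each fused arena's own space. The `member` type and `group_space_allocated` are hypothetical, and plain `prev`/`next` pointers replace upb's `previous_or_tail` and `next`. The `ref` field models a one-way reference that is deliberately not followed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified fusion-group member (assumption): upb's
 * `previous_or_tail` also encodes the tail on the head node; here the head
 * just has prev == NULL. Single-threaded; no atomics. */
typedef struct member {
  struct member *prev;
  struct member *next;
  size_t space_allocated; /* bytes in this arena's own blocks */
  struct member *ref;     /* a one-way ref target; never counted */
} member;

/* Sums allocated space over the fusion group containing `start` by walking
 * back to the head and then forward to the tail. The `ref` target is not
 * visited, so referenced arenas do not contribute. */
static size_t group_space_allocated(const member *start) {
  const member *head = start;
  while (head->prev != NULL) head = head->prev;
  size_t total = 0;
  for (const member *m = head; m != NULL; m = m->next) {
    total += m->space_allocated;
  }
  return total;
}
```

Starting from any member yields the same total, which matches the consistency argument above for a list that only grows at its ends.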
## Detecting reference cycles

To help prevent memory leaks from uncollectable arenas, upb checks for cycles in
debug builds when creating a reference or fusing arenas. A cycle can be formed
purely by references (e.g., `A->B->A`) or by a combination of references and
fusions (e.g., `Fuse(A, B)`, then `RefArena(B, A)` creates the cycle `A<->B->A`).
If such cycles were permitted, the arenas involved could never be freed.

The cycle detector runs after a fusion or reference has been performed. The
check comes after the operation because cycle detection cannot be performed
atomically: if the check were done before the fusion or reference, two
concurrent operations could together create a cycle that neither detects. Both
would check for a possible cycle, find none, and then proceed.
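The race can be replayed deterministically in a toy model. The sketch below assumes a hypothetical adjacency matrix of refs between two arenas `A` and `B` (not upb's data structure) and runs `RefArena(A, B)` and `RefArena(B, A)` under two interleavings: one where each call validates before either ref is added, and one where each validates after adding.

```c
#include <assert.h>
#include <stdbool.h>

enum { A, B, N };       /* two arenas; N is the arena count */
static bool edge[N][N]; /* edge[x][y]: RefArena(x, y) was called */

static void reset_graph(void) {
  for (int x = 0; x < N; x++)
    for (int y = 0; y < N; y++) edge[x][y] = false;
}

/* Plain recursive DFS over refs; only used here on graphs where any cycle
 * passes through `target`, so it terminates. */
static bool reachable(int from, int target) {
  if (from == target) return true;
  for (int y = 0; y < N; y++)
    if (edge[from][y] && reachable(y, target)) return true;
  return false;
}

/* The ref from -> to closes a cycle iff `from` is reachable from `to`. */
static bool closes_cycle(int from, int to) { return reachable(to, from); }

/* Interleaving 1: both "threads" validate before either ref exists. */
static bool check_before_catches_race(void) {
  reset_graph();
  bool t1_saw_cycle = closes_cycle(A, B); /* thread 1 validates A->B */
  bool t2_saw_cycle = closes_cycle(B, A); /* thread 2 validates B->A */
  edge[A][B] = true;                      /* thread 1 adds its ref */
  edge[B][A] = true;                      /* thread 2 adds its ref */
  return t1_saw_cycle || t2_saw_cycle;
}

/* Interleaving 2: the same race, but each thread validates after adding. */
static bool check_after_catches_race(void) {
  reset_graph();
  edge[A][B] = true;
  edge[B][A] = true;
  return closes_cycle(A, B) || closes_cycle(B, A);
}
```

`check_before_catches_race()` returns false even though the graph ends up containing `A->B->A`, while `check_after_catches_race()` returns true.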
Consider arenas with references `A->B->C`.

```dot
digraph {
  rankdir=LR;
  A; B; C;
  A -> B [label="ref", color=blue];
  B -> C [label="ref", color=blue];
}
```

If we attempt `RefArena(C, A)`, the call will add the ref `C->A`, then check
that `C` is not reachable from `A`.

```dot
digraph {
  rankdir=LR;
  A -> B [label="ref", color=blue];
  B -> C [label="ref", color=blue];
  C -> A [label="ref", color=blue];
}
```

The traversal will reveal `A -> B -> C` and trigger an assertion failure.
If we instead attempt `Fuse(A, C)`, the fusion will occur:

```dot
digraph {
  rankdir=LR;
  node [shape=box, style=rounded];
  A -> B [label="ref", color=blue];
  B -> C [label="ref", color=blue];
  C -> A [label="fuse", dir=both, color=red];
}
```

The traversal will find `C <-> A -> B -> C` and trigger an assertion failure.
### Cycle detection algorithm

Cycles are detected with a non-memoizing recursive depth-first search (DFS). A
path can traverse fusions bidirectionally and references unidirectionally, and
the search looks for a cycle that contains at least one unidirectional edge.
While this is not asymptotically optimal, as it may traverse the same set of
nodes numerous times, it doesn't require allocating memory, making it less
intrusive as a debug-only check. It may also recurse infinitely if a cycle is
created on one thread while another thread is performing cycle checks, but that
cycle would have caused an assertion failure anyway.
#### Fast check for fusion

If a directional ref was added and `from` and `to` are already fused
(`upb_Arena_IsFused(from, to)` returns true), they are mutually reachable, so a
path containing a directed edge exists and the assertion fails.
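A sketch of this fast path, assuming fusion groups are modeled as a plain union-find over arena indices. `init_sets`, `fuse`, `is_fused`, and `ref_trivially_cycles` are hypothetical; `is_fused` stands in for `upb_Arena_IsFused` but is not its implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_ARENAS 4

/* Hypothetical fusion sets: parent[i] == i marks a root. */
static int parent[NUM_ARENAS];

static void init_sets(void) {
  for (int i = 0; i < NUM_ARENAS; i++) parent[i] = i;
}

static int root(int x) {
  while (parent[x] != x) x = parent[x];
  return x;
}

static void fuse(int a, int b) { parent[root(a)] = root(b); }

static bool is_fused(int a, int b) { return root(a) == root(b); }

/* Run after recording the ref from -> to: if both arenas are already fused,
 * `to` reaches `from` through the fusion, so the new directed edge closes a
 * cycle and no graph search is needed. */
static bool ref_trivially_cycles(int from, int to) {
  return is_fused(from, to);
}
```

A `false` result only means the fast path did not fire; the full search described next still has to run.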
#### Traversing fusion group members

To check all references originating from `from`'s fusion group, we must visit
every arena fused with `from`. Because fusion operations can race with this
check, we cannot rely on traversing from the fusion root, which might change.
Instead, like `upb_Arena_SpaceAllocated`, the algorithm finds the members of
`from`'s fusion group by first traversing backwards from `from` using
`previous_or_tail`, then iterating forwards.
#### Traverse fusion group and DFS on references

From the head of the list segment, we traverse forwards using `next` to visit
all nodes in `from`'s fusion group. For each arena `X` in this group, we iterate
through all of its outgoing references (the list of arenas `Y` such that
`RefArena(X, Y)` was called). For each such reference `X -> Y`:

* If `Y == to`, then `to` is directly referenced, so a path exists. We return
  `true` and eventually trigger an assertion failure.
* If `Y != to`, we recurse to continue the depth-first search from `Y`,
  finding members of its fusion group and inspecting their outgoing
  references. If this recursive call returns `true`, then `to` is reachable
  via `Y`, and we return `true` to trigger an assertion failure.

If we explore all arenas in the fusion group and all their transitive references
without finding a path to `to`, no path exists, and we return `false`.
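The steps above can be sketched as a single-threaded model. It assumes hypothetical types and helpers (`arena`, `group_head`, `same_group`, `path_exists`, `ref_creates_cycle`, `fuse_creates_cycle`, `add_ref`, `fuse`) rather than upb's internals; plain `prev`/`next` pointers stand in for `previous_or_tail` and `next`, and there are no atomics. One deliberate deviation, which is an assumption of this sketch: it treats reaching any arena fused with `to` as reaching `to` (`same_group(y, to)` instead of `y == to`), which also lets the same routine check a fusion by asking whether the merged group can reach itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 4

/* Hypothetical simplified arena: fusion-list links plus one-way refs. */
typedef struct arena {
  struct arena *prev; /* fusion list; the head has prev == NULL */
  struct arena *next;
  struct arena *refs[MAX_REFS];
  int num_refs;
} arena;

static const arena *group_head(const arena *a) {
  while (a->prev != NULL) a = a->prev; /* walk backwards to the head */
  return a;
}

static bool same_group(const arena *a, const arena *b) {
  return group_head(a) == group_head(b);
}

/* Non-memoizing DFS: is `to` reachable from `from`, crossing fusions in
 * either direction and refs only forwards? */
static bool path_exists(const arena *from, const arena *to) {
  for (const arena *x = group_head(from); x != NULL; x = x->next) {
    for (int i = 0; i < x->num_refs; i++) {
      const arena *y = x->refs[i];
      if (same_group(y, to) || path_exists(y, to)) return true;
    }
  }
  return false;
}

/* Check run after RefArena(src, dst) has recorded the ref. */
static bool ref_creates_cycle(const arena *src, const arena *dst) {
  if (same_group(src, dst)) return true; /* fast check for fusion */
  return path_exists(dst, src);
}

/* Check run after a fusion: the merged group must not reach itself. */
static bool fuse_creates_cycle(const arena *a) { return path_exists(a, a); }

static void add_ref(arena *from, arena *to) {
  assert(from->num_refs < MAX_REFS);
  from->refs[from->num_refs++] = to;
}

/* Toy fusion: splices b's list after the tail of a's list. */
static void fuse(arena *a, arena *b) {
  if (same_group(a, b)) return;
  while (a->next != NULL) a = a->next; /* tail of a's list */
  while (b->prev != NULL) b = b->prev; /* head of b's list */
  a->next = b;
  b->prev = a;
}
```

For the worked examples above: after `A->B->C`, recording `C->A` makes `ref_creates_cycle(&c, &a)` return true, and fusing `A` with `C` makes `fuse_creates_cycle(&a)` return true.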
