Batch pushes and pops#103

Open
raphael-s-steiner wants to merge 15 commits into max0x7ba:master from raphael-s-steiner:master

Conversation

@raphael-s-steiner

This pull request adds batch pushes and pops using iterator semantics to alleviate pressure on the atomic heads and tails of the queue.

In particular, it adds the following functions, with the following signatures:

  1. ATOMIC_QUEUE_INLINE InputIt try_push(InputIt first, InputIt const last) noexcept
  2. ATOMIC_QUEUE_INLINE int try_pop(OutputIt& first, int n) noexcept
  3. ATOMIC_QUEUE_INLINE InputIt push(InputIt first, InputIt const last) noexcept
  4. ATOMIC_QUEUE_INLINE OutputIt pop(OutputIt first, unsigned n) noexcept

Some details:

  1. The return iterator is one past the last element pushed to the queue, which of course can differ from last when the queue is full.
  2. The return value is the number of successful pops, which can be smaller than the desired number of pops n. The iterator is taken by reference; after the call, its value is one past the last element popped from the queue. This allows the function to be used to implement the RetryDecorator more effectively, as well as other use cases where both the number of pops and the iterator are required. Note that returning the iterator alone is not sufficient for iterator mimics such as std::back_inserter.
  3. The return iterator is one past the last pushed element to the queue.
  4. The return iterator is one past the last popped element from the queue.

Note that int in 2. and unsigned in 4. are deliberately chosen so that the implementation performs fewer conversions and is more efficient.
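The return-value semantics above can be illustrated with a toy, single-threaded bounded buffer. This is a hypothetical model for the proposed interface, not atomic_queue's lock-free implementation; the ToyQueue name and its std::vector storage are illustration only:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy single-threaded model of the proposed batch semantics.
// Not atomic_queue's actual implementation.
struct ToyQueue {
    std::vector<int> buf_;
    std::size_t capacity_;

    explicit ToyQueue(std::size_t capacity) : capacity_(capacity) {}

    // Pushes elements from [first, last) until the queue is full.
    // Returns one past the last pushed element, which differs from
    // `last` when the queue fills up (detail 1 above).
    template<class InputIt>
    InputIt try_push(InputIt first, InputIt const last) {
        while(first != last && buf_.size() < capacity_)
            buf_.push_back(*first++);
        return first;
    }

    // Pops up to n elements into `first` (taken by reference so that
    // iterator mimics like std::back_inserter also work) and returns
    // the number of successful pops, which can be smaller than n
    // (detail 2 above).
    template<class OutputIt>
    int try_pop(OutputIt& first, int n) {
        int popped = 0;
        while(popped < n && !buf_.empty()) {
            *first++ = buf_.front();
            buf_.erase(buf_.begin());
            ++popped;
        }
        return popped;
    }
};
```

For example, pushing five elements into a three-slot ToyQueue returns an iterator three past the start of the range, and a subsequent try_pop requesting eight elements reports only three successful pops.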

@max0x7ba
Owner

Wow, Raphael, this is a massive contribution with no precedents in the past.

I am most thankful, of course, but the feeling of being most impressed overwhelms me.

«You will know them by their fruit» and your fruit looks like a product of the most delicate labour of love to me 🤷‍♂️💯.

Comment thread src/tests.cc
@raphael-s-steiner
Author

> Wow, Raphael, this is a massive contribution with no precedents in the past.
>
> I am most thankful, of course, but the feeling of being most impressed overwhelms me.
>
> «You will know them by their fruit» and your fruit looks like a product of the most delicate labour of love to me 🤷‍♂️💯.

Thank you for your kind comments and for such a great library - truly one of the greatest and most thoroughly engineered multi-producer multi-consumer queues out there.

@max0x7ba
Owner

The unit-tests fail because of taking too much time to execute:

2/2 tests   TIMEOUT        30.04s   killed by signal 15 SIGTERM

It looks like the tests get stuck in:

src/tests.cc(103): Entering test case "stress_batch<atomic_queue__CapacityArgAdaptor<atomic_queue__AtomicQueueB<unsigned int_ std__allocator<unsigned int>_ 0u_ true_ false_ true>_ 4096ul>>"

Owner

@max0x7ba left a comment


It is probably the missing checks for sizes in push and pop that cause the unit-tests to deadlock.

unsigned head;
if(Derived::spsc_) {
    head = head_.load(X);
    head_.store(head + n, X);
Owner


n can be greater than the buffer size or the number of free slots in the queue. These conditions must be checked.
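One possible shape of the missing check, sketched as a standalone SPSC ring fragment: clamp the batch size to the free slots before advancing head_. The member names mirror the snippet under review, but the SpscSketch type, the claim_push helper, and the explicit memory orders are assumptions for illustration, not the library's code:

```cpp
#include <atomic>

// Illustrative sketch only: bound a batch push by the free capacity
// of a fixed-size SPSC ring before bumping head_.
struct SpscSketch {
    std::atomic<unsigned> head_{0}; // next slot to produce into
    std::atomic<unsigned> tail_{0}; // next slot to consume from
    unsigned size_;                 // ring capacity

    explicit SpscSketch(unsigned size) : size_(size) {}

    // Returns how many of the n requested slots were actually claimed,
    // which can be fewer than n when the ring is (nearly) full.
    unsigned claim_push(unsigned n) {
        unsigned head = head_.load(std::memory_order_relaxed);
        unsigned used = head - tail_.load(std::memory_order_acquire);
        unsigned free_slots = size_ - used;
        unsigned claimed = n < free_slots ? n : free_slots; // the missing bound
        head_.store(head + claimed, std::memory_order_relaxed);
        return claimed;
    }
};
```

Without the clamp, head + n could run past tail by more than the capacity, and a consumer waiting for elements that were never written would spin forever, matching the observed deadlock.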

Author


There does seem to be an issue here, but it is not so trivial to pinpoint the exact data race, involving wrap-around of the buffer, that triggers it.

I will need some time to figure this one out.

unsigned tail;
if(Derived::spsc_) {
    tail = tail_.load(X);
    tail_.store(tail + n, X);
Owner


Same issue here.

@raphael-s-steiner
Author

raphael-s-steiner commented Apr 21, 2026

> The unit-tests fail because of taking too much time to execute:
>
> 2/2 tests   TIMEOUT        30.04s   killed by signal 15 SIGTERM
>
> It looks like the tests get stuck in:
>
> src/tests.cc(103): Entering test case "stress_batch<atomic_queue__CapacityArgAdaptor<atomic_queue__AtomicQueueB<unsigned int_ std__allocator<unsigned int>_ 0u_ true_ false_ true>_ 4096ul>>"

The batch sizes in the tests are too small relative to the capacity to trigger the mentioned issue (to be addressed). The more likely culprit here is that the tests simply take longer with the sanitizers. On my machine they run in a little over 60s: the additional tests in this PR increase the test time by about 5x, and the pre-existing tests take around 12-13s.

@max0x7ba
Owner

Maybe do shorter tests with sanitisers?

Building with sanitizers defines extra macros that can be used to adjust the number of test iterations.

Maybe use random batch sizes.
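One common way to detect a sanitized build and scale the iteration count down, sketched under assumptions: __SANITIZE_THREAD__/__SANITIZE_ADDRESS__ are GCC's predefined macros and __has_feature is Clang's mechanism (both real), but the AQ_UNDER_SANITIZER name, the base iteration count, and the 10x divisor are arbitrary illustrations, not values from this PR:

```cpp
#include <cstddef>

// Detect sanitized builds portably across GCC and Clang.
#if defined(__has_feature)
#   if __has_feature(thread_sanitizer) || __has_feature(address_sanitizer)
#       define AQ_UNDER_SANITIZER 1 // Clang (and recent GCC) path
#   endif
#endif
#if !defined(AQ_UNDER_SANITIZER) && \
    (defined(__SANITIZE_THREAD__) || defined(__SANITIZE_ADDRESS__))
#   define AQ_UNDER_SANITIZER 1 // GCC's predefined macros
#endif

// Run fewer stress iterations under sanitizers, since each iteration
// is several times slower when instrumented.
#ifdef AQ_UNDER_SANITIZER
constexpr std::size_t STRESS_ITERATIONS = 1000000 / 10;
#else
constexpr std::size_t STRESS_ITERATIONS = 1000000;
#endif
```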

@max0x7ba
Owner

max0x7ba commented Apr 21, 2026

Thinking more about iterators: it is conceivable that the caller of push knows the exact size of the iterator range, yet the iterators may not be of the random-access category. The complexity of std::distance is O(1) for random-access iterators only, and O(n) for everything else.

Calling std::distance has a non-zero cost in general, so we must not call it. Let the caller supply the length of the iterator range; it may have the length available already.

The zero-cost batch interface is:

push(input_iterator begin, unsigned size);

It also enables passing in any kind of iterator, including single-pass input iterators, which are often generator objects producing the next value in their overloaded operator*(). The latter can also be handy for unit-testing.
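A minimal sketch of such a generator object, usable with a push(begin, size) style interface. The CountingGenerator type and the consume_n stand-in for the proposed push are hypothetical names for illustration, not part of atomic_queue:

```cpp
#include <cstddef>
#include <iterator>

// Single-pass input iterator that generates values on demand in
// operator*(); no backing array is needed.
struct CountingGenerator {
    using iterator_category = std::input_iterator_tag;
    using value_type = unsigned;
    using difference_type = std::ptrdiff_t;
    using pointer = unsigned const*;
    using reference = unsigned;

    unsigned next = 0;

    unsigned operator*() const { return next; }
    CountingGenerator& operator++() { ++next; return *this; }
    CountingGenerator operator++(int) { auto t = *this; ++next; return t; }
};

// Stand-in for a push(input_iterator begin, unsigned size) interface:
// consumes exactly `size` elements from a possibly single-pass
// iterator and returns the advanced iterator.
template<class InputIt, class OutputIt>
InputIt consume_n(InputIt it, unsigned size, OutputIt out) {
    for(unsigned i = 0; i != size; ++i)
        *out++ = *it++;
    return it;
}
```

Because the caller passes the count explicitly, no std::distance call is needed, and the same interface works whether the source is an array, a std::vector, or a generator like the one above.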

@max0x7ba
Owner

> The batch sizes in the tests are too small relative to the capacity to trigger the mentioned issue

GitHub Actions hosts may be using the cheapest shared CPUs, so threads get little CPU time. Consumer threads may get delayed, and the queues can easily fill up in the unit-tests.
