We're using oncrpc4j version 3.3.2 (which, from what I can see from diffing the releases, is functionally identical to 3.3.3 - the latest).
here's some log snippet from our product for context:
2025-06-16 19:18:07.641 +0000,1837242509882 {} DEBUG c.p.d.m.MobilityControl:318 [msm-message-consumer-2-0] msmId=2-1750099925069: objId=2-2562: COW: SET_INSTANCES_COW_SPLIT: calling SICS (async) with pdfsSessionId=2370338415347, ...
2025-06-16 19:18:07.641 +0000,1837242675698 {} WARN c.p.d.m.MobilityControl:318 [msm-message-consumer-2-0] msmId=2-1750099925069: objId=2-2562: COW: SET_INSTANCES_COW_SPLIT: SICS (async) call failed: did not get a response within 5 MINUTES
The important part to note here is that we're calling an async RPC endpoint with a timeout of 5 minutes in the first line at 19:18:07.641. Literally within the same millisecond, we're getting a TimeoutException on the failed handler from the oncrpc4j library indicating "did not get a response within 5 MINUTES". This exception message attached to the TimeoutException comes from rpc/ReplyQueue here:
public void registerKey(int xid, SocketAddress addr, CompletionHandler<RpcReply, RpcTransport> callback, final long timeout, final TimeUnit timeoutUnits) throws EOFException {
ScheduledFuture<?> scheduledTimeout = null;
if (timeout > 0 && timeoutUnits != null) {
scheduledTimeout = executorService.schedule(() -> {
CompletionHandler<RpcReply, RpcTransport> handler = get(xid);
if (handler != null) { //means we're 1st, no response yet
handler.failed(new TimeoutException("did not get a response within " + timeout + " " + timeoutUnits), null);
}
}, timeout, timeoutUnits);
}
_queue.put(xid, new PendingRequest(addr, callback, scheduledTimeout));
}
I don't really know how this can be happening - it would imply a bug in the scheduler I think, which I find hard to imagine, but the proof is clear. The message "did not get a response within" is clearly coming from oncrpc4j ReplyQueue code, and the timeout is clearly 5 minutes, based on the content of the same message. Yet, we're failing immediately. It's like schedule is running the event on the same thread inline (though that is not proven here).
[update] We're building our code against the 3.3.2 version of oncrpc4j library on maven central - we compile with Java 21.
[update 2] I've analyzed our logs and found that there were hundreds of legitimate 5 MINUTE timeouts in this session. Only this anomalous zero-delay "5 MINUTE" timeout occurred during the session. Submissions and time outs that occurred on either side if this event seemed to work perfectly.
We're using oncrpc4j version 3.3.2 (which, from what I can see from diffing the releases, is functionally identical to 3.3.3 - the latest).
here's some log snippet from our product for context:
The important part to note here is that we're calling an async RPC endpoint with a timeout of 5 minutes in the first line at 19:18:07.641. Literally within the same millisecond, we're getting a TimeoutException on the failed handler from the oncrpc4j library indicating "did not get a response within 5 MINUTES". This exception message attached to the TimeoutException comes from rpc/ReplyQueue here:
I don't really know how this can be happening - it would imply a bug in the scheduler I think, which I find hard to imagine, but the proof is clear. The message "did not get a response within" is clearly coming from oncrpc4j ReplyQueue code, and the timeout is clearly 5 minutes, based on the content of the same message. Yet, we're failing immediately. It's like schedule is running the event on the same thread inline (though that is not proven here).
[update] We're building our code against the 3.3.2 version of oncrpc4j library on maven central - we compile with Java 21.
[update 2] I've analyzed our logs and found that there were hundreds of legitimate 5 MINUTE timeouts in this session. Only this anomalous zero-delay "5 MINUTE" timeout occurred during the session. Submissions and time outs that occurred on either side if this event seemed to work perfectly.