Add Python client and server for UiPath.Ipc #125

Open
eduard-dumitru wants to merge 52 commits into master from feature/python

eduard-dumitru commented May 28, 2026

Collaborator

Summary

Adds a Python client and server for UiPath.Ipc that speak the same wire protocol as the .NET package, plus the CI/CD plumbing to build, test, and publish them. The package is published and resolvable on the project-scoped uipath-ipc-deps feed. The Python package version tracks UiPath.CoreIpc (<Version>2.5.1</Version>) and is stamped to a PEP 440 local-version string per CI run (e.g. 2.5.1+<build>) by src/CI/stamp-python-version.py, so pyproject.toml's version is a placeholder that the pipeline overwrites.

Scope — landed incrementally, each in its own commit:

  1. Client: dynamic-proxy RPC over Named Pipe / TCP.
  2. Bidirectional callbacks: the client hosts callback services the server invokes.
  3. Server: IpcServer hosts services that a .NET or Python client calls.
  4. Handler reach-back: message.client.get_callback(...), the Python analog of .NET's m.Client.GetCallback<T>().

Streams (upload / download) remain explicitly out of scope.

Python client (src/Clients/python/uipath-ipc/)

  • Pure-Python, asyncio-based; talks the same wire to any UiPath.Ipc peer.
  • Transports: NamedPipe (Win + POSIX via /tmp/CoreFxPipe_*) and TCP — both client and server halves — with bounded retry on FileNotFoundError to ride out the peer's accept-loop warm-up.
  • IpcClient(transport=..., callbacks={IClientCallback: impl}): lazy connect, dynamic __getattr__ proxy via get_proxy(IContract), auto-reconnect on transport drop, optional request_timeout.
  • RemoteException preserves message / type_name / stack_trace / inner chain, sets __cause__ for traceback display.
  • Cancellation is task-based; no CancellationToken parameter on signatures. The proxy forwards CancellationRequest on the wire when its task is cancelled; peer-initiated cancellation aborts the matching in-flight handler.
  • Wheel ships as py3-none-any with PEP 561 py.typed.

Callback (server → client) support

  • IpcClient(callbacks={Contract: instance}) registers handler objects keyed by interface name.
  • IpcConnection dispatches incoming REQUEST frames: looks up the endpoint by interface name, resolves the method by name, json.loads each parameter individually (matching the .NET convention), invokes (awaitable or sync), and writes a Response back. A write lock keeps concurrent outbound frames aligned.
  • Handler exceptions wire back as RemoteException. Peer-initiated CancellationRequest cancels the matching handler task; the emitted error mimics .NET's OperationCanceledException.
  • Callback methods must NOT declare CancellationToken parameters — the caller doesn't include CT in the wire Parameters array (matches the existing .NET IComputingCallback convention).
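
The dispatch steps in this list can be sketched as follows. This is a minimal illustration, not the code in IpcConnection: the `dispatch` function, the `Computing` class, and the choice to return `None` for a `None` result are assumptions made for the example.

```python
import asyncio
import inspect
import json
from typing import Any, Optional


async def dispatch(services: dict[str, Any], endpoint: str,
                   method_name: str, parameters: list[str]) -> Optional[str]:
    """Resolve endpoint and method by name, decode args, invoke, encode the result."""
    service = services.get(endpoint)
    if service is None:
        raise LookupError(f"unknown endpoint: {endpoint}")
    method = getattr(service, method_name, None)
    if not callable(method):
        raise LookupError(f"unknown method: {endpoint}.{method_name}")
    # Each wire parameter is its own JSON document, so decode them one by one.
    args = [json.loads(p) for p in parameters]
    result = method(*args)
    if inspect.isawaitable(result):  # handlers may be sync or async
        result = await result
    # Assumption for this sketch: a None result travels as missing data.
    return None if result is None else json.dumps(result)


class Computing:
    """Hypothetical handler object; it need not inherit from a contract."""

    async def AddFloats(self, a: float, b: float) -> float:
        return a + b

    def Echo(self, text: str) -> str:
        return text
```

Keying the `services` dict by interface name (for example `{"IComputing": Computing()}`) mirrors the lookup-by-name behaviour described above.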

Server — IpcServer (NEW)

  • IpcServer(transport, services={IContract: impl}, request_timeout=None): listens via a ServerTransport and wraps each accepted client in the existing symmetric IpcConnection, whose callbacks dict is the hosted-service set. Duck-typed dispatch by contract __name__ (the instance need not inherit the contract).
  • Server transports: TcpServerTransport (asyncio.start_server) and NamedPipeServerTransport (start_serving_pipe on Windows, start_unix_server at /tmp/CoreFxPipe_<name> on POSIX).
  • start / serve_forever / aclose + async context manager; connection_count introspection. Live connections are pruned via a new IpcConnection.add_close_callback hook that fires once on explicit close or peer disconnect.
  • aclose closes connections before awaiting the listener, so Python 3.12+'s Server.wait_closed() (which blocks until active connections finish) doesn't hang.

Handler reach-back — Message.Client / GetCallback (NEW)

  • message.py: Message / Message[T] + an IClient protocol. A service or callback method opts into a handle on its caller by declaring a Message parameter; the wire never carries it (mirrors .NET's trailing-Message convention — the caller sends the same args either way).
  • IpcConnection.get_callback(Contract) returns a proxy bound to that connection, so a handler can call the peer back mid-request — the inverse direction of the in-flight call. Python analog of m.Client.GetCallback<T>(). Reached as m.client.get_callback(IContract).
  • Dispatch (_invoke_callback) does signature-aware binding: wire args fill the real parameters, a Message (.client = the connection, .request_timeout carried through) is injected at its slot; the no-Message fast path is unchanged and cached per function. Works symmetrically for client-hosted callbacks and server-hosted services, since both run through the one IpcConnection.
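
A rough sketch of signature-aware binding, assuming a simplified `Message` dataclass and a hypothetical `bind_arguments` helper. It matches only a plain `Message` annotation (not `Optional[Message]` or string annotations), passes everything by keyword, and drops surplus wire arguments.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Optional

_MISSING = object()


@dataclass
class Message:
    """Simplified stand-in for the injected handle on the caller."""
    client: Any = None
    request_timeout: Optional[float] = None


def bind_arguments(func: Callable[..., Any], wire_args: list[Any],
                   message: Message) -> dict[str, Any]:
    """Fill real parameters from wire args; inject the Message at its own slot."""
    kwargs: dict[str, Any] = {}
    remaining = iter(wire_args)
    for name, param in inspect.signature(func).parameters.items():
        if param.annotation is Message:
            kwargs[name] = message  # never read from the wire
            continue
        value = next(remaining, _MISSING)
        if value is not _MISSING:
            kwargs[name] = value  # otherwise rely on the parameter's default
    return kwargs


def handler(a: int, b: int, m: Message) -> tuple[int, Optional[float]]:
    """Hypothetical handler that opts into the Message parameter."""
    return a + b, m.request_timeout
```

Calling `handler(**bind_arguments(handler, [1, 2], Message(request_timeout=5.0)))` binds the two wire arguments and injects the message. A handler without a `Message` parameter simply receives its wire arguments.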

Dedicated .NET test server (src/IpcSample.PythonClientTestServer/)

Purpose-built for the Python integration suite — net8.0, console logging on, stable READY pipe=<name> startup marker. Hosts IComputingService / ISystemService (callback-free) plus ICallbackTester, whose handlers call m.Client.GetCallback<IClientCallback>() to exercise the server→client callback path against the real .NET Message.Client reach-back.

Wire-shape tests

tests/wire/test_dotnet_compatibility.py asserts our serialized JSON against the .NET schema literally (field sets, types, TimeoutInSeconds is a non-null double, etc.) — catches a class of regression at unit-test time that previously only surfaced when the live integration suite ran. This came out of finding (and fixing) an actual mismatch: TimeoutInSeconds: null made Newtonsoft drop the entire Request silently.

CI / pipeline redesign

  • Parameterized publish: publishNuGet, publishNpm, publishPyPI (all default false). Default CI runs build + test only. Old behavior of always-pushing NuGet on every branch and always-popping an approval prompt is gone — and a rejected approval no longer leaves the run as Failed.
  • Parameterized build (opt-out): buildNuGet, buildNpm, buildPython (all default true).
  • Per-technology Build stages so NuGet / NPM / Python race to the finish line independently.
  • reuseArtifactsFromBuildId parameter lets a Publish stage replay against a prior successful build, skipping Build entirely.
  • Job names shifted from environment-centric (".NET on Windows", "node.js on Ubuntu") to deliverable-centric ("NuGet — .NET on Windows", "NPM — Node + Web on Linux (test-only)", "Python — Windows"). The (test-only) marker disambiguates the Linux matrix runs from the artifact-producing Windows ones.
  • New environments NuGet-Packages and PyPI-Packages mirror the existing NPM-Packages approval shape (same approver, same 12h timeout).

Feed swap + Supply Chain Guard bypass

  • npm install customFeed moved from the org-level npm-packages (managed outside CoreIpc) to a project-scoped uipath-ipc-deps (CoreIpc-owned, mirrors npmjs.org + PyPI + NuGet).
  • Pipeline-level SCG_KILL_SWITCH=true to bypass the org-wide Aikido Safe Chain shim, which was interfering with npm tarball downloads. Comments in azp-start.yaml and azp-nodejs.yaml document the trail (with #devops Slack links) so the next person on this doesn't redo the dead-ends.

NPM publish

  • Now writes to uipath-ipc-deps first (uses the pipeline's built-in identity — no PAT). GitHub Packages stays wired up as continueOnError: true until the platform team ships the post-Mini-Shai-Hulud pipeline-auth replacement (tracked in Liviu Bud's #dev thread).
  • Runs of Publish_NPM finish as "Succeeded with issues" until the GitHub side resolves — packages still ship to uipath-ipc-deps.

Test plan

  • CI green on Windows + Linux for Build (all jobs).
  • 111 non-.NET + 11 .NET-interop Python tests pass (122 total) locally (pytest) and in CI.
  • Client RPC over Named Pipe + TCP, incl. RemoteException propagation and task-based cancellation.
  • Server→client callbacks: 3 round-trips against the .NET ICallbackTester (real m.Client.GetCallback<IClientCallback>()) — test_dotnet_interop.py#L136-L183.
  • Server (IpcServer): TCP + named-pipe loopback (Python client ↔ Python server), lifecycle, connection-count pruning — test_ipc_server.py.
  • Handler reach-back: full-duplex re-entrancy E2E — client → server handler → get_callback → client callback → back — test_ipc_server.py#L196-L242; plus Message-injection / get_callback units — test_message_injection.py.
  • Close-callback hook (server connection pruning) — test_connection.py#L133-L173.
  • Publish_PyPI publishes the wheel to uipath-ipc-deps, version stamped from UiPath.CoreIpc + build number (PEP 440 local segment, e.g. 2.5.1+<build>).
  • Consumer-side install round-trips: uipath-robot-client (separate repo, own venv, index-url = uipath-ipc-deps) pins uipath-ipc>=2.5.1+20260611.6 and its 100-test suite passes against the published wheel.
  • .NET-client ↔ Python-IpcServer interop (reverse of IpcSample.PythonClientTestServer): a real .NET client drives the Python server — direct calls, reach-back into a .NET-hosted callback, and a RemoteException round-trip — test_dotnet_client_interop.py + IpcSample.PythonServerTestClient. Surfaced two dispatch fixes: bind wire args to the handler signature (tolerate a .NET client's trailing CancellationToken), and close the writer on peer disconnect (transport-leak fix).
  • High-effort code-review pass (7 finder angles) — fixed a real bug (serve_forever() returned immediately for Windows named pipes, tearing the server down), made Message injection keyword-based so keyword-only / Optional[Message] params inject correctly, unified connection teardown (now cancels in-flight handlers on peer disconnect), and dropped the _ConnectionInvoker shim. 4 regression tests added.
  • (deferred) Pre-existing .NET flake SystemTestsOverTcp.NotPassingAnOptionalMessage_ShouldWork (tight 800ms timeout, pre-dates this PR). Tracked separately.
  • (deferred) Restore GitHub Packages NPM publish + remove continueOnError once platform team ships sanctioned pipeline auth.
  • (deferred) Re-enable Safe Chain Guard once DevOps fixes the underlying shim.
  • Deleted src/Clients/python/_attempt0/ (the reference port; shipping impl is src/Clients/python/uipath-ipc/).

🤖 Generated with Claude Code

eduard-dumitru and others added 24 commits March 31, 2026 20:35
Implements a Python port of the CoreIpc framework, wire-compatible
with the .NET server/client. Includes RPC server and client with
TCP and Named Pipe transports, asyncio-based, zero mandatory deps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add playground/interop_with_dotnet.py that starts the .NET
IpcSample.ConsoleServer and calls IComputingService + ISystemService
from the Python client over named pipes.

Fix named pipe client to use ProactorEventLoop's native pipe I/O
instead of blocking win32file calls in an executor, which deadlocked
on concurrent read/write with a non-overlapped handle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Preserves the prior client+server sketch as a reference while we rebuild
the Python client from scratch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Empty package skeleton (pyproject.toml + README + __init__) ready for the
phased port. Built with hatchling, requires Python >= 3.10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add pytest as a dev extra in pyproject.toml + [tool.pytest.ini_options]
- Configure VS pyproj to use pytest as the test framework
- Add tests/ folder with a smoke test that verifies the package imports
- Update CoreIpc.sln to drop the old Py projects and add uipath-ipc

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the explicit <Compile> and <Folder> lists with a single
**\*.py glob (excluding .venv, __pycache__, build, dist, egg-info).
Manage files via the filesystem to keep the glob intact; PTVS will
rewrite the entry on UI-driven Add/Remove.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces uipath_ipc.wire with the four wire message types (Request,
Response, CancellationRequest, Error) and the MessageType enum. All
DTOs are frozen+slotted dataclasses with explicit to_dict/from_dict
(snake_case <-> PascalCase) and to_json/from_json convenience methods.

Covers the .NET wire-format gotcha that Request.Parameters is a list
of *already JSON-encoded* strings, one per argument.

14 tests in tests/wire/test_messages.py verify round-tripping and
match captured .NET-shape JSON payloads.
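
To illustrate the Parameters convention: each argument is JSON-encoded on its own, and the resulting strings are placed in a list. This is a minimal sketch; the helper names and the `Id` / `MethodName` keys in the sample request are illustrative assumptions, only `Parameters` is named in the text.

```python
import json


def encode_parameters(*args: object) -> list[str]:
    """Serialize each positional argument separately, one JSON string per argument."""
    return [json.dumps(arg) for arg in args]


def decode_parameters(parameters: list[str]) -> list[object]:
    """Inverse of encode_parameters: parse each element as its own JSON document."""
    return [json.loads(p) for p in parameters]


# Illustrative request shape; keys other than "Parameters" are assumed names.
request = {
    "Id": "1",
    "MethodName": "AddFloats",
    "Parameters": encode_parameters(1.5, 2.5),
}
payload = json.dumps(request)  # the strings end up double-encoded in the payload
```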

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5-byte header (uint8 MessageType + int32 LE PayloadLength) + payload.
read_frame and write_frame operate against asyncio.StreamReader and
a structural FrameWriter protocol (any object with write/drain).

Also pulls in pytest-asyncio as a dev extra and enables asyncio_mode=auto
so async test funcs run without per-test markers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-platform: \\<server>\pipe\<name> on Windows (via ProactorEventLoop's
create_pipe_connection), /tmp/CoreFxPipe_<name> Unix Domain Socket on
POSIX (matches .NET's NamedPipeClient cross-platform convention).
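
As an illustration of the addressing rule, a hypothetical `pipe_address` helper might pick the path per platform like this. The `platform` parameter exists only to make the sketch testable, and the `"."` default is the usual Windows notation for the local machine.

```python
import sys


def pipe_address(name: str, server: str = ".", platform: str = sys.platform) -> str:
    """Return the OS-level address for a named pipe with the given logical name."""
    if platform == "win32":
        # Builds \\<server>\pipe\<name>; backslashes are escaped in the literals.
        return "\\\\" + server + "\\pipe\\" + name
    # POSIX: a Unix Domain Socket file using the CoreFxPipe_ prefix.
    return "/tmp/CoreFxPipe_" + name
```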

ClientTransport ABC abstracts the connect step, returning an
(asyncio.StreamReader, asyncio.StreamWriter) pair that downstream layers
(connection, proxy) consume. Transport instances are frozen+slotted
dataclasses; each connect() opens a fresh stream.

Top-level package re-exports ClientTransport + NamedPipeClientTransport.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps one (StreamReader, StreamWriter) pair with a background receive
loop that decodes frames and resolves pending response futures by
Request.id. send_request(req) sends a Request frame and awaits the
matching Response.

Supports `async with` for lifecycle management. Failure modes covered:
underlying-stream close fails all in-flight futures; send on a closed
connection raises ConnectionError.

Exception translation (Error -> RemoteException) and cancellation
forwarding (CancellationRequest on caller cancel) are deferred to
Phase C.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
IpcClient lazily opens an IpcConnection over the configured
ClientTransport; reused across calls. get_proxy(contract) returns a
proxy that satisfies the contract type. __getattr__ on the proxy
intercepts method access — each call json.dumps each positional arg
into Request.Parameters, sends the Request, awaits the matching
Response, json.loads(Data) for non-null Data (or returns None).

Server-returned Errors raise RemoteException (placeholder — Phase C.1
will refine the exception model: chain, type-name mapping).

Keyword arguments are not supported (.NET wire is positional only);
unknown method names raise AttributeError up front.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RemoteException now exposes message, type_name, stack_trace, and inner
as first-class attributes. from_error(error) walks the nested wire
Error chain producing a matching RemoteException chain, and sets
__cause__ so Python tracebacks display the full chain naturally.

str(exc) renders as "[Type] Message" when type is known.
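
A simplified sketch of how such a chain might be built from a nested error payload. The dictionary keys (`Message`, `Type`, `StackTrace`, `InnerError`) are assumed names for illustration, and this class is a stand-in rather than the package's own RemoteException.

```python
from typing import Any, Optional


class RemoteException(Exception):
    """Stand-in exception carrying the remote error details as attributes."""

    def __init__(self, message: str, type_name: Optional[str] = None,
                 stack_trace: Optional[str] = None,
                 inner: Optional["RemoteException"] = None) -> None:
        super().__init__(message)
        self.message = message
        self.type_name = type_name
        self.stack_trace = stack_trace
        self.inner = inner
        self.__cause__ = inner  # lets tracebacks show the chain

    def __str__(self) -> str:
        if self.type_name:
            return f"[{self.type_name}] {self.message}"
        return self.message

    @classmethod
    def from_error(cls, error: dict[str, Any]) -> "RemoteException":
        """Walk the nested error recursively, innermost first."""
        inner_error = error.get("InnerError")
        inner = cls.from_error(inner_error) if inner_error else None
        return cls(error.get("Message", ""), error.get("Type"),
                   error.get("StackTrace"), inner)
```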

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On asyncio.CancelledError during send_request, IpcConnection fires off
a best-effort CancellationRequest frame (matching the original
Request.id) before re-raising. The send is a background task so the
caller's cancellation propagates immediately and the message goes out
asynchronously on the same writer.

Failures during the cancellation send are swallowed (writer may already
be closing). The original CancelledError reaches the caller intact.
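
The pattern can be sketched with plain asyncio. Here `send_cancel` is a hypothetical coroutine function standing in for the code that writes the CancellationRequest frame, and the response is modelled as a bare future.

```python
import asyncio
from typing import Awaitable, Callable

_background: set[asyncio.Task] = set()  # keeps references so tasks are not collected


async def _best_effort(awaitable: Awaitable[None]) -> None:
    try:
        await awaitable
    except Exception:
        pass  # the writer may already be closing; ignore send failures


async def send_request(request_id: str, response: "asyncio.Future[str]",
                       send_cancel: Callable[[str], Awaitable[None]]) -> str:
    """Await the response; on caller cancellation, notify the peer in the background."""
    try:
        return await response
    except asyncio.CancelledError:
        task = asyncio.ensure_future(_best_effort(send_cancel(request_id)))
        _background.add(task)
        task.add_done_callback(_background.discard)
        raise  # the caller sees the original CancelledError immediately


async def demo() -> list[str]:
    sent: list[str] = []

    async def send_cancel(request_id: str) -> None:
        sent.append(request_id)

    response: "asyncio.Future[str]" = asyncio.get_running_loop().create_future()
    call = asyncio.ensure_future(send_request("42", response, send_cancel))
    await asyncio.sleep(0)  # let the call start waiting
    call.cancel()
    try:
        await call
    except asyncio.CancelledError:
        pass
    await asyncio.sleep(0)  # give the background send a chance to run
    return sent
```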

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
IpcClient(transport, request_timeout=5.0) configures a single knob
that both:
  - sets Request.TimeoutInSeconds on every outgoing call (server-side
    deadline), and
  - wraps each proxy call in asyncio.wait_for(...) (client-side
    deadline → asyncio.TimeoutError).

When the client-side timeout fires, asyncio.wait_for cancels
send_request, which triggers the existing C.2 cancellation forwarding
— the server receives a CancellationRequest matching the timed-out id.

Per-call override is left to the caller via `async with asyncio.timeout(t)`
or `asyncio.wait_for(...)`. No timeout parameter on signatures.
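
A small sketch of the client-side half of this behaviour, using a hypothetical `call_with_timeout` wrapper. The inner coroutine observes the cancellation, which is the point where a CancellationRequest would be forwarded.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


async def call_with_timeout(make_call: Callable[[], Awaitable[Any]],
                            request_timeout: Optional[float]) -> Any:
    """Apply the client-side deadline only when a timeout is configured."""
    if request_timeout is None:
        return await make_call()
    return await asyncio.wait_for(make_call(), timeout=request_timeout)


async def demo() -> list[str]:
    events: list[str] = []

    async def slow_call() -> None:
        try:
            await asyncio.sleep(10)
        except asyncio.CancelledError:
            events.append("cancelled")  # cancellation forwarding would happen here
            raise

    try:
        await call_with_timeout(slow_call, 0.01)
    except asyncio.TimeoutError:
        events.append("timeout")
    return events
```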

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
IpcConnection's receive loop now sets is_closed=True in a finally,
regardless of which path exited (clean EOF, OSError, unexpected
exception). IpcClient._ensure_connected sees the dead connection,
acloses it cleanly (idempotent), and re-dials via the transport.

The proxy instance is stable across reconnects — same get_proxy
result keeps working after the underlying stream is replaced.

In-flight calls when the drop happens still propagate the underlying
error (no silent retry). Auto-reconnect only fires on the *next* call.
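
The reconnect-on-next-call behaviour might look roughly like the sketch below. `FakeConnection` and `ReconnectingClient` are hypothetical stand-ins, and `dial_count` exists only so the example can be checked.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class FakeConnection:
    """Stand-in for a real connection; only tracks whether it is closed."""

    def __init__(self) -> None:
        self.is_closed = False

    async def aclose(self) -> None:
        self.is_closed = True  # safe to call more than once


class ReconnectingClient:
    def __init__(self, connect: Callable[[], Awaitable[FakeConnection]]) -> None:
        self._connect = connect
        self._connection: Optional[FakeConnection] = None
        self.dial_count = 0

    async def ensure_connected(self) -> FakeConnection:
        """Reuse a live connection; replace a dead one on the next call."""
        conn = self._connection
        if conn is not None and not conn.is_closed:
            return conn
        if conn is not None:
            await conn.aclose()
        self._connection = await self._connect()
        self.dial_count += 1
        return self._connection


async def demo() -> tuple[bool, bool, int]:
    async def connect() -> FakeConnection:
        return FakeConnection()

    client = ReconnectingClient(connect)
    first = await client.ensure_connected()
    same = await client.ensure_connected()
    first.is_closed = True  # simulate a transport drop
    second = await client.ensure_connected()
    return first is same, second is not first, client.dial_count
```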

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps asyncio.open_connection(host, port) behind the same
ClientTransport interface. Same shape as NamedPipeClientTransport —
frozen+slotted dataclass, connect() returns the standard
(StreamReader, StreamWriter) pair.

Includes a loopback smoke test that spins up an asyncio TCP server,
connects, and exchanges bytes — covering the actual networking path
in addition to the constructor/immutability unit tests.

Re-exported at the top-level package alongside the named-pipe one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- README.md replaces the stub; mirrors the .NET README structure
  (install, quick start, contracts, cancellation/timeouts/errors,
  auto-reconnect, transports, what's out of scope) but Python-idiomatic
  throughout.
- src/uipath_ipc/py.typed (PEP 561 marker) — signals to mypy/pyright
  that this package ships inline type information.
- pyproject.toml: explicit [tool.hatch.build.targets.wheel].packages
  so hatchling reliably picks up the src layout and includes py.typed.
- Smoke tests now verify the documented public surface stays exported
  and that py.typed travels into the package.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tests/integration/ holds tests that exercise the Python client against
the real IpcSample.ConsoleServer. A session-scoped fixture launches
`dotnet run --framework net6.0`, waits for "Server started" on stdout,
yields, then signals CTRL_BREAK (Win) / SIGINT (POSIX) to shut it down.

Gated behind `--integration`. Default `pytest` skips them so the unit
loop stays fast (62 passed, 7 skipped, ~0.36s). Run with
`pytest --integration` to exercise the live interop path.

Coverage: AddFloats, MultiplyInts, EchoString, AddComplexNumbers,
DivideByZero (verifying RemoteException with type_name), and
multi-call reuse on a single client.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The --integration flag becomes --no-integration. Integration tests
now run as part of the default `pytest` invocation; pass
--no-integration to skip them.

VS Test Explorer's Run All will now include the .NET interop suite
(launching IpcSample.ConsoleServer via dotnet run). First cold run
incurs the dotnet build cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
.NET's NamedPipeServerStream pattern creates pipe instances on demand
— there's a small window between accepting one connection and creating
the next during which CreateFile returns ERROR_FILE_NOT_FOUND. This
shows up as `FileNotFoundError: [WinError 2]` for clients connecting
at the wrong moment (test sessions, server restarts, deploys).

NamedPipeClientTransport._connect_windows now retries on
FileNotFoundError with a bounded backoff (total ~1.85s across 6 tries)
before giving up. All other errors still propagate immediately.
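
A bounded retry of this kind can be sketched as below. The delay schedule is a made-up placeholder (the real values are not listed here), and `connect_with_retry` is a hypothetical helper name.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence

# Hypothetical backoff schedule; the actual delays are not given in the text.
RETRY_DELAYS: Sequence[float] = (0.01, 0.02, 0.04)


async def connect_with_retry(connect: Callable[[], Awaitable[Any]],
                             delays: Sequence[float] = RETRY_DELAYS) -> Any:
    """Retry only on FileNotFoundError; any other error propagates at once."""
    for delay in delays:
        try:
            return await connect()
        except FileNotFoundError:
            await asyncio.sleep(delay)
    return await connect()  # final attempt; its error reaches the caller
```

With this shape the number of attempts is `len(delays) + 1`, and the total wait is bounded by the sum of the delays.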

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds IpcSample.PythonClientTestServer/ — a net8.0 console host purpose-
built for the Python integration suite:
  - AddConsole() logging so handler activity is visible.
  - Simple callback-free service implementations (the existing
    IpcSample.ConsoleServer's MultiplyInts depends on a client-side
    callback, unusable from a callback-less Python client).
  - Stable "READY pipe=<name>" startup marker.
  - Pipe name configurable via CLI arg (defaults to "uipath-ipc-py-test").

The Python integration fixture launches this project, runs a background
thread to drain the server's stdout, and dumps the full transcript at
session teardown for diagnostics.

Fixes a Python-side wire bug: .NET Request.TimeoutInSeconds is a
non-nullable `double`, with 0 as the "no timeout, use default" sentinel.
Emitting JSON null on this field made Newtonsoft.Json reject the entire
Request during constructor binding and the server silently dropped the
connection. Request.to_dict now emits 0.0 in place of None;
from_dict symmetrically decodes 0/0.0 back to None.

Adds tests/wire/test_dotnet_compatibility.py — 14 tests asserting the
serialized wire shape literally matches the .NET schema
(UiPath.CoreIpc/Wire/Dtos.cs). Catches this class of regression at
unit-test time without needing the integration suite to run.
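
The sentinel mapping described above amounts to a pair of small conversions. The function names below are hypothetical; only the `TimeoutInSeconds` field and the 0-means-default rule come from the text.

```python
import json
from typing import Optional


def timeout_to_wire(timeout: Optional[float]) -> float:
    """Encode 'no timeout' as 0.0 so the field is never JSON null."""
    return 0.0 if timeout is None else float(timeout)


def timeout_from_wire(value: float) -> Optional[float]:
    """Decode the 0 sentinel (int or float) back to None."""
    return None if value == 0 else float(value)


wire_fragment = json.dumps({"TimeoutInSeconds": timeout_to_wire(None)})
```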

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VS's Python Tools (and other IDE debuggers) need debugpy on the active
interpreter to launch a debug session. Without it, "Debug Test" in VS
Test Explorer fails with the vague "Path to debug adapter executable
not specified" dialog.

Putting it in [project.optional-dependencies].dev means a fresh
`pip install -e ".[dev]"` after clone Just Works for debugging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two CI-side changes needed before the Python work can land green:

- Switch Npm@1's customFeed from the org-level `npm-packages` feed
  (managed outside CoreIpc) to a project-scoped `uipath-ipc-deps`
  (CoreIpc/9a5bdfb1-...). Same npmjs.org upstream, project ownership,
  PyPI upstream already enabled for the eventual Python publishing.

- Disable the org-wide Safe Chain Guard pipeline decorator via
  SCG_KILL_SWITCH=true at pipeline scope. The Aikido shim that SCG
  injects ahead of every npm/python invocation started failing
  installs with Azure-Storage-SAS-shaped 403s (last green CoreIpc
  build was 2026-04-30, after the SCG rollout). Pipeline scope is
  required — task-env scope is too late, the decorator runs in
  pre-job. CoreIpc temporarily opts out of SCG-side malware scanning;
  revisit when the DevOps fix lands.

See azp-nodejs.yaml's inline comment for the full Slack-referenced
story so the next person on this trail doesn't repeat the dead-ends.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
eduard-dumitru and others added 3 commits May 29, 2026 00:04
Reshapes the pipeline so:

- Publishing is opt-in via parameters (`publishNuGet`, `publishNpm`,
  default false). Default runs build + test, never push. When a
  publish parameter is true, its stage runs and gates on its
  environment's approval check.
- NuGet now follows the same approval-gated pattern as NPM, via a
  new `NuGet-Packages` environment that mirrors `NPM-Packages`'
  approval check. The `dotnet nuget push` moves out of
  azp-dotnet-dist.yaml (which built+pushed unconditionally) into a
  new `azp-nuget.publish.steps.yaml` under the gated stage.
- Publish stages can replay against a previous successful build via
  `reuseArtifactsFromBuildId`. When set, the Build stage is Skipped
  (not Failed) and the Publish stages download from the specified
  build. When unset, both behave as before.
- Job names move from environment-centric (".NET on Windows",
  "node.js on Windows", "node.js on Ubuntu") to deliverable-centric
  ("NuGet — .NET on Windows", "NPM — Node + Web on Windows",
  "NPM — Node + Web on Linux (test-only)"). The "test-only" marker
  signals the Linux job is a cross-platform check, not a second
  source of artifacts.
- Rejecting an approval no longer leaves the run as Failed — the
  Publish stages start in `Skipped` state when their parameter is
  false, so the rejection-as-failure footgun is gone.

Out of this pass: Python jobs in Build, Python publish stage, and any
move of `PublishSymbols` into the gated NuGet publish — left as
follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reshape the NPM publish so the project-scoped Azure Artifacts feed
(uipath-ipc-deps) is the primary, always-working target — the pipeline's
build-service identity is already an administrator on it, no PAT
rotation involved.

The existing GitHub Packages publish stays wired up but is marked
continueOnError: true. It's currently expected to fail: post Mini
Shai-Hulud (2026-05-11/12 npm supply-chain incident), UiPath revoked
classic PATs org-wide and migrated everyone to fine-grained PATs, which
don't expose the Packages permission at org level. Per Liviu Bud's
2026-05-25 #dev announcement, a sanctioned pipeline-auth replacement
is in progress but not yet available.

When the platform team ships the replacement, updating the PublishNPM
service connection and dropping continueOnError will restore the
GitHub Packages publish without any other code change.

Until then, runs of Publish_NPM finish as "Succeeded with issues"
rather than Failed — packages still ship to uipath-ipc-deps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous attempt used publishFeedCombined which the Npm@1 task ignored,
making it fall back to a default registry URL (uipath.pkgs.visualstudio
.com/_packaging/npm/registry/ — no project, no feed id) and 404 on PUT.

The correct input name on the task is publishFeed. The value format
"project/feedId" stays the same as we used for customFeed on the
install side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
eduard-dumitru and others added 10 commits May 29, 2026 01:04
Build stage gains two parallel jobs:
- Python_Windows (windows-2022): installs Python 3.12, pip installs the
  package with [dev] extras, runs pytest (unit + integration against
  the dedicated .NET test server), then python -m build to produce
  wheel + sdist, published as the "Python package" artifact.
- Python_Linux (ubuntu-22.04): same install + pytest run; test-only,
  no artifact. Mirrors the NPM cross-platform pattern.

New Publish_PyPI stage, gated on a new `publishPyPI` parameter
(default false), uploads wheel + sdist to the project-scoped Azure
Artifacts feed `uipath-ipc-deps` via twine. Approval-gated by the
new `PyPI-Packages` environment, mirroring the NuGet/NPM pattern
(same approver, same 12h timeout).

Templates added:
- azp-python.yaml         (install + tests)
- azp-python-dist.yaml    (build wheel + sdist, publish artifact)
- azp-python.publish.steps.yaml  (twine upload + reuseArtifactsFromBuildId)

reuseArtifactsFromBuildId works for Python the same way as for the
NuGet/NPM stages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Linux CI run of tests/integration/test_add_floats failed with
asyncio.open_unix_connection raising FileNotFoundError on
/tmp/CoreFxPipe_uipath-ipc-py-test — the .NET test server prints its
"READY" marker before the accept loop has actually bound the Unix
Domain Socket file. Only the first integration test of the session
hits the window; by the time the second runs, the UDS is up.

The Windows _connect_windows already has a bounded retry loop on
FileNotFoundError for the same class of race. Refactor: shared
_CONNECT_RETRY_DELAYS now covers both code paths; _connect_posix
gets the same retry shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Azure Pipelines UI marks string parameters with empty defaults as
visually required, even when our condition logic accepts empty. Use
'0' as the default sentinel value meaning "no reuse, run Build
normally" — natural to type, clearly not a real build id.

Conditions in all four affected files now accept both '0' and ''
as no-reuse for backward compatibility with any in-flight runs
parameterized the old way.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Until now uipath-ipc shipped with the literal "0.1.0" from
pyproject.toml. Bring it in line with the other artifacts: the version
comes from $(FullVersion), which azp-initialization.yaml computes from
UiPath.CoreIpc.csproj plus $(Build.BuildNumber).

PEP 440 doesn't allow .NET-style pre-release suffixes ("2.5.1-20260528-08"),
so the mapping is:
  - Release version "2.5.1"               -> "2.5.1"             (unchanged)
  - Pre-release "2.5.1-20260528-08"       -> "2.5.1+20260528.08" (local version)

Implementation: a tiny src/CI/stamp-python-version.py rewrites the
version line in pyproject.toml right before `python -m build` runs.
Tests already happen against the editable install (which keeps the
pre-stamp version) — they don't care about the value, just that the
package is importable.

Also wires azp-initialization.yaml into both Python jobs (previously
only used by .NET/NPM jobs), so $(FullVersion) is defined when the
stamper runs.
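
The version mapping above can be sketched as a single function. This is an illustration of the rule only, not the contents of stamp-python-version.py; the `to_pep440` name and the regular expression are assumptions.

```python
import re

# "<release>-<suffix>", e.g. "2.5.1-20260528-08"; plain releases do not match.
_PRERELEASE = re.compile(r"^(?P<release>\d+(?:\.\d+)*)-(?P<suffix>[0-9A-Za-z-]+)$")


def to_pep440(version: str) -> str:
    """Map a .NET-style pre-release version to a PEP 440 local version."""
    match = _PRERELEASE.match(version)
    if match is None:
        return version  # release versions pass through unchanged
    local = match.group("suffix").replace("-", ".")
    return f"{match.group('release')}+{local}"
```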

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each technology's Build now runs as its own stage with `dependsOn: []`,
so all three race in parallel. Each Publish_X depends only on its
matching Build_X, so a slow technology (e.g. Python integration tests
against the .NET test server) doesn't gate a fast one (NuGet) from
publishing as soon as it's ready.

Matrix jobs (NPM Windows + Linux, Python Windows + Linux) stay
together inside their stage as sibling jobs — the artifact-producing
Windows one and the test-only Linux one finish around the same time,
so there's no benefit to splitting them further.

No change to the publish-gating, environment approvals, or the
reuseArtifactsFromBuildId behaviour. The only externally visible
difference is wall-clock time and the stage graph in the run UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new boolean parameters, all default true. Unchecking one
compile-time excludes its Build_X stage from the run — useful when
you only care about one technology and don't want to wait for the
others (or for selective re-runs).

The matching Publish_X stage handles the missing Build_X by switching
its dependsOn to [] at compile time. Combined with
reuseArtifactsFromBuildId, this lets you publish-without-building from
a specific prior build's artifacts, even if some technologies aren't
in the current run at all.

Parameter ordering on the queue-time form: Build_* first, then
Publish_*, then reuseArtifactsFromBuildId — matches the natural
top-to-bottom reading of "what should this run do?".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The .NET server can already call back into clients via
`m.Client.GetCallback<TContract>()`. This adds the matching path on the
Python side: an `IpcClient` constructed with `callbacks={Contract: impl}`
hosts the contract; inbound REQUEST frames are dispatched to the
registered instance and a Response frame is written back.

Wire/dispatch details:
- IpcConnection now owns a write lock (concurrent outbound responses
  from multiple callback handlers stay frame-aligned) and a handler-task
  registry keyed by request id.
- Incoming REQUEST: deserialize, look up endpoint by interface name
  (Contract.__name__), look up method by name, json.loads each parameter
  individually (matches the .NET wire convention), invoke (awaitable or
  sync), encode the result into Response.data.
- Handler exceptions: build Response.Error with type_name / message /
  stack_trace; the .NET side raises RemoteException as normal.
- Incoming CANCELLATION_REQUEST: cancels the matching handler task; the
  emitted error mimics .NET's OperationCanceledException so the server
  sees what it would from any other client.
- Callback methods must NOT declare CancellationToken parameters — the
  server-side caller doesn't include CT in the wire Parameters array
  (matches the existing .NET IComputingCallback convention).
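
The dispatch steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in connection.py: `dispatch_request`, the `services` dict, and the `Calc` class are hypothetical names, and encoding the result with `json.dumps` is an assumption about how `Response.data` gets filled.

```python
import asyncio
import inspect
import json


async def dispatch_request(services, endpoint, method_name, parameters):
    """Invoke a hosted method from wire parameters (hypothetical helper)."""
    service = services[endpoint]              # keyed by the contract's __name__
    method = getattr(service, method_name)    # look up the method by name
    # Each wire parameter is its own JSON document, so decode them one by one.
    args = [json.loads(p) for p in parameters]
    result = method(*args)
    if inspect.isawaitable(result):           # handlers may be sync or async
        result = await result
    return json.dumps(result)                 # assumed encoding of the result


class Calc:
    """Illustrative service with one async and one sync method."""

    async def Add(self, a, b):
        return a + b

    def Echo(self, value):
        return value
```

Unknown endpoints or methods would surface here as `KeyError` / `AttributeError`; the commit describes turning such failures into an Error response, which this sketch leaves out.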

Tests:
- tests/client/test_callbacks.py — 7 unit tests covering happy path,
  multi-arg, concurrent inbound requests, handler exception → error
  response, unknown endpoint, unknown method, server cancellation.
- tests/integration/test_dotnet_interop.py — 3 round-trip tests against
  the new ICallbackTester on the .NET sample server.
- IpcSample.PythonClientTestServer gains IClientCallback /
  ICallbackTester / CallbackTester so the Python integration tests have
  a real server to bounce callbacks off of.

Version bumped to 0.2.0 (callbacks are an additive feature; the
client-only-no-callback API from 0.1.0 is unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The proxy did `if resp.data is None: return None` then `json.loads(resp.data)`.
But void / fire-and-forget operations answer with an empty Data *string*
(not null) — e.g. .NET CoreIpc's response for a Task-returning method — so
json.loads("") raised JSONDecodeError. This surfaced in a consumer as a
crash on IUserOperations.Subscribe() (a void op): the call succeeds on the
wire, but parsing its empty Data blew up.

Fix: `if not resp.data: return None` (covers null and empty string). Adds
test_proxy_empty_data_return alongside the existing void (null) test.
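
A minimal sketch of the fixed decode step, assuming the response data arrives as `None`, an empty string, or a JSON document. `decode_result` is a hypothetical name used only for illustration.

```python
import json


def decode_result(data):
    """Decode Response.data; None and "" both mean 'no return value'."""
    if not data:          # covers null and the empty string
        return None
    return json.loads(data)
```

A JSON value such as `"0"` or `"false"` is still a non-empty string, so the truthiness check does not drop legitimate falsy results.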

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add server-side support so Python can host services that a .NET or Python
client calls — the inbound half of the duplex protocol, mirroring the
existing client. A server is a thin listen/accept layer over the symmetric
IpcConnection whose callbacks dict is the set of hosted services.

- transport: ServerTransport ABC + ServerHandle protocol; TcpServerTransport
  (asyncio.start_server) and NamedPipeServerTransport (start_serving_pipe on
  Windows, start_unix_server at /tmp/CoreFxPipe_<name> on POSIX).
- IpcConnection.add_close_callback: fire-once close hook (explicit aclose or
  peer disconnect) so the server can prune dead connections.
- IpcServer(transport, services): one IpcConnection per accepted client,
  duck-typed dispatch by contract __name__; start/serve_forever/aclose +
  async context manager; connection_count introspection.
- Export IpcServer, ServerTransport, {Tcp,NamedPipe}ServerTransport.

Tests: TCP loopback (call/void/error/concurrent-clients/conn-count),
named-pipe loopback (real Windows pipe, Proactor loop), lifecycle, and
close-callback unit tests. 101 passed.

Server->client calls from inside a handler are deferred to a follow-up
(needs a per-connection handle exposed to the service).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Handlers can now call their specific caller back mid-request — the Python
analog of .NET's `m.Client.GetCallback<T>()`. This is what makes the new
server useful for the executor-host direction (a hosted service pushing
work back to the connected client).

- message.py: `Message`/`Message[T]` + `IClient` protocol. A service or
  callback method opts in by declaring a `Message` parameter; the wire
  never carries it (mirrors .NET's trailing-Message convention).
- IpcConnection.get_callback(Contract): a proxy bound to THIS connection
  (via a small _ConnectionInvoker adapter over _IpcProxy), so reach-back
  needs no owning IpcClient. The connection now carries a default
  request_timeout for these proxies.
- _invoke_callback: signature-aware arg binding injects a Message (with
  .client = the connection) into any Message-typed parameter; the
  no-Message fast path is unchanged. Message-param detection is cached
  per function (WeakKeyDictionary) and tolerates string annotations from
  `from __future__ import annotations`.
- Thread request_timeout: IpcClient -> connection, and a new optional
  IpcServer(request_timeout=...) -> each accepted connection.
- Export Message, IClient.

Works symmetrically for client-hosted callbacks and server-hosted
services, since both dispatch through the one IpcConnection.

Tests: Python<->Python full-duplex re-entrancy (client -> server handler
-> get_callback -> client callback -> back), Message-injection units, and
get_callback wire-format unit. 107 unit + 10 .NET integration pass (the
.NET callback tests exercise real m.Client.GetCallback into Python).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@eduard-dumitru eduard-dumitru changed the title from "Add Python client for UiPath.Ipc" to "Add Python client and server for UiPath.Ipc" Jun 9, 2026
eduard-dumitru and others added 10 commits June 9, 2026 20:58
…atch fixes

Prove a real .NET client drives a Python-hosted IpcServer end to end — the
reverse of IpcSample.PythonClientTestServer. Two dispatch fixes fell out of
making it pass against an idiomatic .NET client.

New .NET client (src/IpcSample.PythonServerTestClient): connects over a named
pipe and checks direct calls (AddFloats/EchoString/MultiplyInts), an error
round-trip (RemoteException), and handler-initiated reach-back — the Python
handler calls back into THIS client's IClientCallback via
message.client.get_callback(...). Reports per-check PASS/FAIL + exit code.

New integration test (tests/integration/test_dotnet_client_interop.py): hosts
the Python IpcServer in-process on a named pipe and launches the .NET client
via `dotnet run`; awaiting the subprocess keeps the loop serving. Asserts
exit 0, the ALL TESTS PASSED marker, and the in-process service's recorded
calls. Skips without `dotnet` / off a ProactorEventLoop.

Dispatch fixes (connection.py):
- Bind wire args positionally to the handler signature and IGNORE extra
  trailing args. An idiomatic .NET client contract carries a trailing
  CancellationToken, which ServiceClient.SerializeArguments sends as one
  extra wire parameter (JSON "") — so Python handlers must tolerate it, the
  way the .NET server tolerates optional trailing Message/CT params. Replaces
  the no-Message fast path with a cached per-function binding plan.
- Close the writer when the receive loop ends. On peer disconnect the
  connection is pruned from its IpcServer, so aclose() never runs for it;
  without this the accepted pipe transport leaked until GC (surfaced as a
  ResourceWarning on Windows).
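
A rough sketch of the positional binding rule from the first fix, assuming a handler whose wire-facing parameters are all positional. `bind_wire_args` is a hypothetical helper; the actual binding plan is cached per function and also deals with `Message` injection, which is omitted here.

```python
import inspect


def bind_wire_args(handler, wire_args):
    """Keep only as many wire args as the handler has positional parameters."""
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    params = [
        p for p in inspect.signature(handler).parameters.values()
        if p.kind in positional_kinds
    ]
    # Extra trailing args (e.g. the "" placeholder a .NET client sends for a
    # trailing CancellationToken) are dropped rather than raising TypeError.
    return list(wire_args[:len(params)])


def add_floats(a, b):
    """Illustrative handler with two positional parameters."""
    return a + b
```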

118 tests pass (107 non-.NET + 11 .NET-interop, incl. the new reverse test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Preserved during the initial port as a reference; the shipping
implementation lives in src/Clients/python/uipath-ipc/. No longer needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixes from a high-effort review of the server/reach-back/interop work.

- serve_forever() no longer returns immediately for a named-pipe server on
  Windows. _PipeServerHandle.wait_closed() was a no-op, so the documented
  `async with server: await server.serve_forever()` (which uses a pipe) tore
  the server down at once. It now blocks on an Event set by close(), matching
  asyncio.Server.wait_closed() semantics. Only TCP was previously tested.
- Message injection is now by KEYWORD, so a keyword-only `Message` parameter
  is injected (was tagged "skip" and never filled -> TypeError) and a trailing
  Message is injected even when an optional positional arg is omitted (the old
  positional `break` could skip it).
- _is_message_annotation recognizes Optional[Message] / `Message | None` (was
  tagged a wire param -> not injected, consumed a wire slot).
- Unify connection teardown: aclose() and the receive-loop finally now share
  one idempotent _teardown() that also cancels in-flight incoming handlers
  (the finally previously didn't), removing the divergence/duplication.
- Drop the _ConnectionInvoker shim: IpcConnection gains _ensure_connected(),
  so get_callback() builds _IpcProxy(self, ...) directly.
- Hoist the per-request "missing arg" sentinel to a module-level _MISSING.
- Honest _bind_handler_args docstring re: positional CT-tolerance.
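
The `wait_closed()` fix in the first bullet can be pictured with a small sketch. `PipeServerHandle` below is a hypothetical stand-in for `_PipeServerHandle`; it only shows the Event-based blocking and none of the pipe plumbing.

```python
import asyncio


class PipeServerHandle:
    """Stand-in handle whose wait_closed() blocks until close() is called."""

    def __init__(self):
        self._closed = asyncio.Event()

    def close(self):
        # Wake every waiter; calling close() again is harmless.
        self._closed.set()

    async def wait_closed(self):
        await self._closed.wait()


async def demo():
    handle = PipeServerHandle()
    waiter = asyncio.create_task(handle.wait_closed())
    await asyncio.sleep(0)                 # give the waiter a turn to start
    still_waiting = not waiter.done()      # a no-op wait_closed would be done
    handle.close()
    await asyncio.wait_for(waiter, timeout=1)
    return still_waiting, waiter.done()
```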

Noted (not changed): Message[T] wire-payload binding (documented follow-up),
endpoint-name / posix-path / test-harness duplication, sequential aclose drain.

New regression tests: serve_forever blocks for a named pipe; keyword-only and
Optional Message injection; extra trailing wire arg (Ct placeholder) ignored.
122 tests pass (111 non-.NET + 11 .NET-interop).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two entry points replace the single parameterized azp-start.yaml:

- azp-ci.yaml — always builds & tests everything (NuGet / NPM / Python) in
  parallel and publishes each as a pipeline artifact. No parameters.
- azp-publish.yaml — manual, parameterized: a required `buildId` (the CI run
  to publish from) plus publishNuGet/publishNpm/publishPyPI checkboxes (all
  default-on). Compile-time guards fail fast if buildId is missing or no
  package is selected. Each selected package is pulled from the CI build and
  pushed behind its existing approval-gated environment.

The publish-step templates gain a `sourcePipelineId` parameter (default
$(System.DefinitionId), preserving the old single-pipeline reuse path); the
Publish pipeline passes the CI definition id via a `ci` pipeline resource so
the artifact download targets the CI build, not the Publish run.

azp-start.yaml is left in place for the transition; it can be deleted once the
two new pipelines are provisioned.

SETUP: in azp-publish.yaml set the `ci` resource `source:` to the CI
pipeline's name (currently a placeholder), and authorize the resource on the
first Publish run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The CI pipeline definition (and every branch) already points at
src/CI/azp-start.yaml, so keep that stable path as the CI entry and let its
content be the param-free always-build pipeline. This way CI transitions
cleanly as branches merge — no repointing, no "YAML file not found", and the
"Run" panel stops showing the old build/publish parameters once a branch
carries the new content.

- azp-start.yaml: now the no-parameters CI pipeline (was the combined
  parameterized build+publish). Builds NuGet/NPM/Python in parallel and
  publishes pipeline artifacts; pushes to no feed.
- azp-ci.yaml: removed — its content moved into azp-start.yaml.
- azp-publish.yaml: unchanged behavior; comments updated to reference
  azp-start.yaml as the CI source.

Publishing remains the separate manual azp-publish.yaml (buildId + per-package
checkboxes), which consumes a CI build's artifacts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Parity with .NET/TS: a `Message` argument can carry its own timeout for a
single call. The proxy reads `Message.request_timeout` (when set) as the
per-call timeout — overriding the client-wide default for that call only —
and uses it for both the client-side `asyncio.wait_for` deadline and the wire
`TimeoutInSeconds`. The Message is serialized to its wire form (`{}` for a
payload-less Message, `{"Payload": ...}` for `Message[T]`); `client` /
`request_timeout` stay transport-only. Non-Message args are unchanged.

Lets a caller do e.g. `await svc.HandleConsentCode(code, Message(
request_timeout=60))` while leaving the client default infinite — matching
the TS client's per-operation deadlines (40s default / 20-min install / 60s
consent / infinite sign-in).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…a param

The `resources.pipelines: ci` entry failed compile-time validation ("Pipeline
Resource ci Input Must be Valid") because its `source:` placeholder had to
match an existing CI pipeline by name. Replace it with a required
`ciPipelineId` parameter (the CI pipeline's definition id, from its URL
?definitionId=N); the publish-step templates already accept it as
`sourcePipelineId` for the cross-pipeline artifact download. No resource means
no name to match and no compile-time dependency on the CI pipeline existing.

Validation now requires both buildId and ciPipelineId.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Parity with .NET BeforeConnect / BeforeOutgoingCall / BeforeIncomingCall.

- hooks.py: `CallInfo` (endpoint, method_name, arguments) + the
  BeforeConnectHandler / BeforeCallHandler type aliases (sync or async).
- IpcClient(before_connect=, before_call=): before_connect is awaited before
  each (re)connect; before_call is awaited before each OUTGOING request with
  its CallInfo. Reach-back proxies are bound to a bare connection (no
  before_call), so callbacks skip it — matching .NET ("calls not callbacks").
- IpcServer(before_call=) -> IpcConnection.before_incoming_call: awaited
  before each INCOMING request is dispatched to a service.
- Either hook raising aborts the guarded connect/call (server side surfaces it
  as an Error response) — so hooks can gate, not just observe.
- Export CallInfo + handler types.
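
As an illustration of hooks that may be sync or async and that can gate a call by raising, here is a minimal sketch. `run_hook` is a hypothetical helper, and the `CallInfo` dataclass is a stand-in with the three fields named above, not the exported type.

```python
import asyncio
import inspect
from dataclasses import dataclass, field


@dataclass
class CallInfo:
    """Stand-in for the exported CallInfo (endpoint, method_name, arguments)."""
    endpoint: str
    method_name: str
    arguments: list = field(default_factory=list)


async def run_hook(hook, *args):
    """Call a sync or async hook; an exception aborts the guarded operation."""
    if hook is None:
        return
    result = hook(*args)
    if inspect.isawaitable(result):
        await result


async def guarded_call(before_call, info, send):
    """Run the hook first; `send` only runs if the hook did not raise."""
    await run_hook(before_call, info)
    return await send(info)
```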

Tests: before_connect fires before connect; before_call fires with the right
CallInfo and aborts on raise; server before_call fires before dispatch and
its raise becomes an Error response. 118 unit + 11 .NET-interop pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The CI pipeline now exists (named "CI"), so the resource validates. Drop the
interim ciPipelineId parameter: the publish stages resolve the CI definition
via $(resources.pipeline.ci.pipelineID) again, and the run dialog is back to
buildId + the three package checkboxes. The resource also grants
cross-pipeline artifact access (authorize on first run if prompted).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…healing)

Per-call timeout, triangulated against the real .NET server (2s default):
- control: Wait(3s) with no Message dies at the server default
- override: WaitWithMessage(3s, Message(request_timeout=10)) completes —
  the Message timeout rides the wire and beats the default
- deadline: WaitWithMessage(10s, Message(request_timeout=0.5)) raises
  asyncio.TimeoutError client-side in <2s
The test server gains WaitWithMessage(TimeSpan, Message, CT) — a slow method
with the Message slot the per-call mechanism requires.

before_call:
- outgoing (.NET parity test): the client hook sees TriggerEcho but NOT the
  inbound EchoToClient callback (mirrors
  BeforeCall_ShouldApplyToCallsButNotToCallbacks)
- incoming from a real .NET client (reverse interop): the server hook sees
  every .NET-initiated call incl. FailWith, but NOT the server's own
  outgoing Decorate reach-back
- incoming over real TCP loopback (non-.NET path)

before_connect — the killer app, emphasized two ways:
- test_self_healing.py: the hook spawns the real .NET server binary (built
  once, launched directly so kill() truly kills it). First call launches;
  healthy calls don't refire; hard-kill the server; the next ordinary call
  relaunches it and succeeds — client-owned, self-healing server lifecycle.
- Python<->Python loopback: same launch/no-refire/aclose/relaunch sequence
  in-process over a named pipe.

120 unit + 16 .NET-interop tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@eduard-dumitru

Collaborator Author

/azp run CI

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

… form)

Two refinements needed for exact TS/.NET timeout parity in consumers:

- INFINITE_REQUEST_TIMEOUT = -0.001: the wire rendition of .NET
  Timeout.InfiniteTimeSpan (-1 ms) — exactly what the TS client sends for
  Timeout.infiniteTimeSpan (sign-in / disconnect). A negative request_timeout
  now applies NO client-side deadline and rides the wire verbatim; the .NET
  server maps it back to an infinite timeout (verified e2e: a 3s operation
  survives the test server's 2s default).
- Message(wire_body=dict): the rendition of a .NET Message *subclass*, whose
  own properties serialize at the top level (SignInParameters,
  InstallProcessParameters, HandleConsentCodeMessage, ...). wire_body is the
  argument's exact wire form; mutually exclusive with payload.
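
A minimal sketch of the client-side deadline rule described in the first bullet, assuming `None` means "no timeout configured". `await_with_deadline` is a hypothetical name; only the `INFINITE_REQUEST_TIMEOUT` value comes from the commit message.

```python
import asyncio

INFINITE_REQUEST_TIMEOUT = -0.001  # wire rendition of .NET Timeout.InfiniteTimeSpan


async def await_with_deadline(awaitable, request_timeout):
    """Apply a client-side deadline only for non-negative timeouts."""
    if request_timeout is None or request_timeout < 0:
        # Unset or "infinite": no local deadline; the value still rides the wire.
        return await awaitable
    return await asyncio.wait_for(awaitable, request_timeout)


async def slow_operation():
    """Illustrative stand-in for a request that takes a while."""
    await asyncio.sleep(0.05)
    return "ok"
```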

Tests: late-response survival under a negative timeout, wire -0.001, top-
level wire_body serialization, exclusivity; e2e infinite-override against
the real .NET server. 123 unit + 17 .NET-interop pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
"""
header = await reader.readexactly(_HEADER_LEN)
msg_type_byte, payload_len = struct.unpack(_HEADER_FORMAT, header)
payload = await reader.readexactly(payload_len) if payload_len > 0 else b""


payload_len is a signed int32 fed to readexactly with no bound — a header claiming ~2 GB OOMs the process, and a negative length silently desyncs the stream. .NET validates the length against _maxMessageSize before allocating (Connection.cs:260; 2 MB server default). Bound it here; ideally thread a configurable cap from IpcConnection to mirror MaxReceivedMessageSizeInMegabytes.

Suggested change
- payload = await reader.readexactly(payload_len) if payload_len > 0 else b""
+ if not 0 <= payload_len <= 2 * 1024 * 1024:  # match .NET's 2 MB server cap
+     raise ValueError(f"frame payload length {payload_len} out of bounds")
+ payload = await reader.readexactly(payload_len) if payload_len > 0 else b""

Claude said this was a problem on JS as well.
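
For context, a self-contained sketch of a bounded frame read. The header layout (`"<Bi"`: one type byte plus a little-endian signed int32 length) is an assumption made for this example, as are the upper-case constant names; only the 2 MB cap and the bounds check come from the suggestion above.

```python
import asyncio
import struct

HEADER_FORMAT = "<Bi"                        # assumed layout: type byte + int32 length
HEADER_LEN = struct.calcsize(HEADER_FORMAT)
MAX_PAYLOAD_BYTES = 2 * 1024 * 1024          # mirrors the .NET 2 MB server default


async def read_frame(reader, max_payload=MAX_PAYLOAD_BYTES):
    """Read one frame, rejecting negative or oversized lengths before reading."""
    header = await reader.readexactly(HEADER_LEN)
    msg_type, payload_len = struct.unpack(HEADER_FORMAT, header)
    if not 0 <= payload_len <= max_payload:
        raise ValueError(f"frame payload length {payload_len} out of bounds")
    payload = await reader.readexactly(payload_len) if payload_len > 0 else b""
    return msg_type, payload


async def read_from_bytes(data, max_payload=MAX_PAYLOAD_BYTES):
    """Feed raw bytes through a StreamReader and read a single frame."""
    reader = asyncio.StreamReader()
    reader.feed_data(data)
    reader.feed_eof()
    return await read_frame(reader, max_payload)
```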



@property
def _posix_address(self) -> str:
    return f"/tmp/CoreFxPipe_{self.pipe_name}"


macOS interop: don't hardcode /tmp/. The .NET side builds this path as Path.Combine(Path.GetTempPath(), "CoreFxPipe_"), and GetTempPath() honors $TMPDIR (the TS port does too, Platform.ts:66-73). On macOS $TMPDIR is always set, so .NET binds under /var/folders/… while we look in /tmp/ and the connect fails; vanilla Linux/Windows are unaffected. Same fix needed on the server property at line 136.

Suggested change
- return f"/tmp/CoreFxPipe_{self.pipe_name}"
+ return os.path.join(os.environ.get("TMPDIR") or "/tmp", f"CoreFxPipe_{self.pipe_name}")
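
A standalone version of the suggested path rule, written as a sketch. `posix_pipe_path` and its `environ` parameter are hypothetical, added so the behavior can be checked without touching the real process environment.

```python
import os


def posix_pipe_path(pipe_name, environ=None):
    """Build the POSIX socket path, honoring $TMPDIR with a /tmp fallback."""
    env = os.environ if environ is None else environ
    # An unset or empty TMPDIR both fall back to /tmp.
    temp_dir = env.get("TMPDIR") or "/tmp"
    return os.path.join(temp_dir, f"CoreFxPipe_{pipe_name}")
```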



@property
def _posix_address(self) -> str:
    return f"/tmp/CoreFxPipe_{self.pipe_name}"


Same $TMPDIR fix here on the server side — see the client property above.

Suggested change
- return f"/tmp/CoreFxPipe_{self.pipe_name}"
+ return os.path.join(os.environ.get("TMPDIR") or "/tmp", f"CoreFxPipe_{self.pipe_name}")


    self._handle_incoming_request(payload)
elif msg_type == MessageType.CANCELLATION_REQUEST:
    self._handle_incoming_cancellation(payload)
# UPLOAD_REQUEST / DOWNLOAD_RESPONSE are not yet handled.


When a stream request is initiated from the other side, the stream frame silently desyncs the connection. Skipping UPLOAD_REQUEST/DOWNLOAD_RESPONSE leaves the trailing 8-byte length + raw stream bytes on the wire, so the next read_frame reads them as a header and every later frame is garbage. Streams are out of scope, but fail closed instead of mis-framing (the JS port errors the channel on unknown types, RpcChannel.ts:167).

Suggested change
- # UPLOAD_REQUEST / DOWNLOAD_RESPONSE are not yet handled.
+ else:
+     # streams are out of scope; we can't consume the trailing
+     # length+bytes, so fail closed instead of desyncing
+     raise ValueError(f"unsupported message type {msg_type!r}")
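
To make the fail-closed idea concrete, here is a small routing sketch. The numeric values of `MessageType` are assumptions for this example, and `route_frame` is a hypothetical helper rather than the connection's real dispatch code.

```python
from enum import IntEnum


class MessageType(IntEnum):
    """Frame types named in this thread; the numeric values are assumed."""
    REQUEST = 0
    RESPONSE = 1
    CANCELLATION_REQUEST = 2
    UPLOAD_REQUEST = 3
    DOWNLOAD_RESPONSE = 4


def route_frame(msg_type, handlers):
    """Return the handler for msg_type, or raise instead of skipping the frame."""
    handler = handlers.get(msg_type)
    if handler is None:
        # The trailing stream bytes cannot be consumed here, so continuing
        # would mis-frame everything that follows on the connection.
        raise ValueError(f"unsupported message type {msg_type!r}")
    return handler


def handle_request(payload):
    """Illustrative handler that just reports the payload size."""
    return len(payload)
```

Raising from the receive loop tears the connection down, which is the intended outcome while streams stay out of scope.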


raise
except (asyncio.IncompleteReadError, ConnectionResetError, OSError) as ex:
self._fail_pending(ex)
except Exception as ex: # noqa: BLE001 — surface anything unexpected via futures


Receive loop swallows unexpected errors silently — add logging (and consider narrowing the blast radius)

_receive_loop's catch-all except Exception funnels every non-transport error into _fail_pending(ex) and then closes the connection, with no log anywhere on the failure path. The intent (per the # noqa: BLE001 comment) is to "surface anything unexpected via futures" — but that channel only works if a call happens to be in flight. The result is a few sharp edges:

  • Silent loss. If the error originates while parsing an incoming frame (_handle_incoming_request / _handle_incoming_cancellation run synchronously on the loop) and there are no pending outgoing requests, _fail_pending iterates an empty dict and the exception evaporates — the connection just drops with zero diagnostics.
  • Misattribution. If a call is in flight, an unrelated incoming-frame parse error (e.g. a malformed CancellationRequest) fails that call with a JSONDecodeError / KeyError that has nothing to do with it — looks like that call returned garbage.
  • Bugs masquerade as disconnects. A genuine programming error in the dispatch path is indistinguishable from a peer hangup.

For reference, the .NET receive loop logs both exits — Connection_ReceiveLoopEndedSuccessfully on clean EOF and Connection_ReceiveLoopFailed(DebugName, ex) in its catch (Connection.cs:267/271). We have no equivalent here.

Minimal ask: log on the failure path, and split the clean-EOF case from the unexpected case:

# module level
import logging
_logger = logging.getLogger(__name__)

# in _receive_loop:
        except (asyncio.IncompleteReadError, ConnectionResetError, OSError) as ex:
            _logger.debug("receive loop ended (transport closed): %r", ex)
            self._fail_pending(ex)
        except Exception as ex:  # noqa: BLE001
            # Unexpected: a protocol/parse error or a genuine bug. The futures
            # channel only surfaces this when a call is in flight, so log it.
            _logger.exception("receive loop failed")
            self._fail_pending(ex)

Optional, if you want to go further (happy to file as a follow-up):

  • Treat a payload parse error as recoverable rather than fatal — read_frame has already consumed the whole frame, so the stream is still aligned; only a true framing desync needs to tear the connection down. Logging + continue on a single bad payload would localize the blast radius.
  • Don't fail unrelated outgoing _pending futures with an error that came from an incoming frame; if the request id parsed, an Error response back to the peer is more honest than nuking local calls.

Not blocking — the fail-fast instinct (vs. hanging callers forever) is right. This is about observability and not letting one bad frame disappear without a trace. Relatedly, there's no test that feeds a malformed frame; worth adding one that asserts the intended behavior here.


type_name="System.OperationCanceledException",
),
)
except BaseException as ex:


except BaseException here is broader than the C# behavior it mirrors — it swallows SystemExit / KeyboardInterrupt

The dedicated except asyncio.CancelledError just above is correct and necessary: CancelledError derives from BaseException (not Exception) since 3.8, so it must be caught explicitly to emit the System.OperationCanceledException wire type. (.NET gets this for free — its OperationCanceledException derives from Exception, so the server's single catch (Exception) handles cancellation as just another exception — Server.cs:103.)

But the follow-on except BaseException is wider than that catch (Exception). Once CancelledError is already handled above, the only extra types BaseException catches over Exception are SystemExit, KeyboardInterrupt, and GeneratorExit — and here they're converted into an Error response and never re-raised, so a process-termination signal raised inside a handler is silently turned into a wire error and swallowed. C#'s catch (Exception) never touches those (Ctrl+C is Console.CancelKeyPress, not an exception injected into running code), so this is strictly broader than the behavior we're matching.

The intent is presumably "always answer the peer so its pending future never hangs," which is reasonable. The best-of-both keeps that guarantee but still lets fatal signals propagate:

except BaseException as ex:
    resp = Response(request_id=req.id, error=Error(
        message=str(ex) or type(ex).__name__,
        type_name=type(ex).__name__, stack_trace=traceback.format_exc()))
    _logger.exception("callback %s.%s failed", req.endpoint, req.method_name)
    if isinstance(ex, (SystemExit, KeyboardInterrupt)):
        if not self._closed:
            try: await self._send_frame(MessageType.RESPONSE, resp.to_json().encode("utf-8"))
            except Exception: pass
        raise

Minor / non-blocking — under asyncio these signals are usually delivered to the loop driver rather than into a task, so it rarely triggers. Two related notes: (1) there's no log on this path either (same observability gap as the receive-loop comment), and (2) swallowing CancelledError without re-raising in the branch above is the usual asyncio smell — fine when the cancel is peer-initiated for this task, but on connection teardown you'd ideally re-raise (the self._closed check already guards the teardown send, so this is cosmetic).


Comment thread src/CI/azp-python.yaml Outdated
- task: UsePythonVersion@0
  displayName: 'Use Python 3.12'
  inputs:
    versionSpec: '3.12'


CI only runs Python 3.12, but pyproject.toml declares requires-python = ">=3.10" — and the suite does not pass on 3.10/3.11

Running pytest on Python 3.10 (the declared minimum) fails 3 tests:

  • tests/client/test_ipc_client.py::test_message_arg_sets_per_call_timeout
  • tests/client/test_ipc_client.py::test_message_arg_with_payload_serializes_payload
  • tests/client/test_timeout.py::test_request_includes_timeout_in_seconds_field

All three set a per-call/client timeout, so the proxy goes through asyncio.wait_for(conn.send_request(req), ...) (client/proxy.py:101). Each then does:

task = asyncio.create_task(svc.DoWork(Message(request_timeout=2.0)))
await asyncio.sleep(0)         # assumes ONE event-loop turn is enough
req = _sent_request(t.writer)  # buffer still empty -> JSONDecodeError / IndexError

On 3.10/3.11, asyncio.wait_for wraps the coroutine in a Task (ensure_future) that is scheduled after the test's own sleep(0) resumption, so the REQUEST frame is written one event-loop turn later than the test reads the buffer. On 3.12, wait_for was reimplemented to await the coroutine inline, so the write lands in the same turn. (The no-timeout tests take the plain await send_request path and pass on every version — which is why only these three, the timeout path, fail.)

This is a test-synchronization bug, not a product bug — the request is written correctly, just one turn later. Repro on 3.10.19:

no-timeout (direct await):    bytes after 1x sleep(0) = 105
per-call timeout (wait_for):  bytes after 1x sleep(0) = 0;  written after 1 extra turn = 109

Two fixes, ideally both:

  1. Make the 3 tests poll for the bytes instead of assuming a single turn — the codebase already has the right pattern in test_timeout_sends_cancellation_to_server (client/test_timeout.py:86):
    for _ in range(20):
        await asyncio.sleep(0)
        if len(t.writer.buffer) >= 5:
            break
  2. Add 3.10 (and 3.11) to this CI matrix, or bump requires-python to >=3.12 if 3.10 is not actually supported. As-is, CI runs only 3.12, so it can never catch a regression on the declared-minimum interpreter.

Not blocking the protocol work, but a consumer on 3.10 (the stated floor) hits this the moment they run the suite.
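
The polling pattern from fix 1 could be factored into a helper along these lines. `wait_for_bytes` is a hypothetical name, and the `late_writer` task merely simulates a write that lands a few event-loop turns late.

```python
import asyncio


async def wait_for_bytes(buffer, min_len, max_turns=20):
    """Yield to the event loop until the buffer holds at least min_len bytes."""
    for _ in range(max_turns):
        if len(buffer) >= min_len:
            return True
        await asyncio.sleep(0)       # give other tasks one more turn
    return len(buffer) >= min_len


async def demo():
    buffer = bytearray()

    async def late_writer():
        # Simulate a write that happens several turns after the call starts.
        for _ in range(3):
            await asyncio.sleep(0)
        buffer.extend(b"\x00" * 5)

    task = asyncio.create_task(late_writer())
    arrived = await wait_for_bytes(buffer, 5)
    await task
    return arrived, len(buffer)
```

Bounding the loop keeps a genuinely missing write from hanging the test; the caller can assert on the returned flag.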


eduard-dumitru and others added 3 commits June 12, 2026 11:53
…ogging, signals, 3.10/3.11)

All seven review comments from PR #125:

- framing: bound payload length to MAX_PAYLOAD_BYTES (2 MB, matching .NET
  MaxReceivedMessageSizeInMegabytes); negative or oversized lengths raise
  instead of OOMing / desyncing. Cap is a read_frame param for future
  configurability.
- named pipes (POSIX): honor $TMPDIR (fallback /tmp) in both client and
  server addresses — matches .NET Path.GetTempPath() / TS Platform.ts; fixes
  macOS interop where .NET binds under /var/folders.
- connection: fail closed on UPLOAD_REQUEST/DOWNLOAD_RESPONSE instead of
  silently skipping (their trailing stream bytes would desync the framing).
- connection: log receive-loop exits — debug for transport-closed, exception
  for unexpected (mirrors .NET Connection_ReceiveLoopFailed); malformed
  frames no longer vanish without a trace.
- connection: handlers still answer the peer on any failure, but
  SystemExit/KeyboardInterrupt are re-raised after the response (C#'s
  catch (Exception) never swallows those); failures are logged.
- tests: poll for the REQUEST frame instead of assuming one event-loop turn
  (asyncio.wait_for schedules a turn later on 3.10/3.11 than 3.12+).
- CI: Linux Python job is now a 3.10/3.11/3.12 matrix (versionSpec is a
  template parameter), so the declared requires-python floor is exercised.

131 unit + 17 .NET-interop tests pass (7 new).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…gaps

A request naming a known endpoint but unknown method previously threw
ArgumentOutOfRangeException BEFORE the inner try that produces error
responses — swallowed by the outer catch (log-only), so the client hung
until its RequestTimeout. And EndpointNotFoundException was only ever
tested via callbacks, never regular calls.

.NET:
- New UiPath.Ipc.MethodNotFoundException (mirrors EndpointNotFoundException;
  ServerDebugName/EndpointName/MethodName), sent via OnError exactly like
  the endpoint branch — unknown-method calls now fail fast with
  RemoteException matching Is<MethodNotFoundException>().
- Server.TryGetMethod: cache fast path unchanged; not-found stays uncached
  (no negative-cache growth from garbage method names); the generic-method
  AOORE is preserved bit-for-bit. Helpers gains GetInterfaceMethodOrDefault.
- Tests (SystemTests, all transports): ClientCallingInexistentEndpoint /
  ClientCallingInexistentMethod (decoy interface with a colliding Type.Name)
  / ServerCallingInexistentCallbackMethod (mirrors CallUnregisteredCallback).

Python parity:
- New EndpointNotFoundError / MethodNotFoundError (RuntimeError subclasses)
  raised by the dispatcher; they cross the wire as the .NET type names so
  .NET callers match with Is<T>(). Generic handler honors wire_type_name.
- Reverse interop check #6: .NET client calls a method missing on the
  Python server and asserts Is<MethodNotFoundException>().
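
A rough sketch of the Python-parity idea, under stated assumptions: the two error classes and the `wire_type_name` attribute come from the bullets above, while `resolve_handler`, `to_error_payload`, the payload keys, and the `UiPath.Ipc` namespace for the endpoint exception are hypothetical and only illustrate the shape.

```python
class EndpointNotFoundError(RuntimeError):
    # Assumption: same namespace as UiPath.Ipc.MethodNotFoundException.
    wire_type_name = "UiPath.Ipc.EndpointNotFoundException"


class MethodNotFoundError(RuntimeError):
    wire_type_name = "UiPath.Ipc.MethodNotFoundException"


def resolve_handler(endpoints: dict, endpoint_name: str, method_name: str):
    """Hypothetical dispatcher lookup: fail fast instead of letting the caller hang."""
    service = endpoints.get(endpoint_name)
    if service is None:
        raise EndpointNotFoundError(f"Endpoint '{endpoint_name}' not found.")
    method = getattr(service, method_name, None)
    if not callable(method):
        raise MethodNotFoundError(
            f"Method '{method_name}' not found on endpoint '{endpoint_name}'."
        )
    return method


def to_error_payload(exc: BaseException) -> dict:
    """Hypothetical generic handler: prefer wire_type_name over the Python class name."""
    return {
        "type_name": getattr(exc, "wire_type_name", type(exc).__name__),
        "message": str(exc),
    }
```

Because the type name crosses the wire in its .NET form, a .NET caller can match it with `Is<MethodNotFoundException>()` without knowing the peer is Python.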

69 SystemTests x2 TFMs, 131 Python unit + 17 interop pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The step occasionally wedges forever on hosted agents with no output (e.g.
build 12340525: 50+ min on a ~1-min step, zero log lines, undiagnosable;
both full local suites pass in ~33s). Make the next occurrence fast and
self-diagnosing:

- dotnet test --blame-hang --blame-hang-timeout 10m --blame-hang-dump-type
  mini: vstest aborts the run, names the in-flight test(s), and attaches
  mini dumps to the build.
- 20-min step timeout + 30-min job timeout as backstops (normal job ~5 min),
  so a wedge outside vstest can't eat the 60-min default.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
async def EchoString(self, value: str) -> str: ...

@abstractmethod
async def ReverseBytes(self, bytes_: list[int]) -> list[int]: ...

byte[] / BCL value types. byte[] round-trips silently wrong — .NET returns it base64, so the caller gets 'BAMCAQ==', not [4,3,2,1] (same shape issue for DateTime / Decimal / Guid). The robot client's _wire.py doesn't cover these either — only enums + nested dataclasses.

ISystemService.ReverseBytes is declared in the Python contract but no test actually calls it, and the obvious round-trip test wouldn't pass — for exactly the base64 reason above.

Question: the robot client had to hand-roll _wire.py (dataclass↔dict via fields + type hints, enum→value, PascalCase) to bridge all of this. Should we pull something like that into uipath-ipc itself as a base conversion layer — typed (de)serialization out of the box, instead of every consumer reinventing it?
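
To make the question concrete, here is a minimal sketch of what such a base conversion layer might look like. It is not the robot client's `_wire.py`: `to_wire`, `bytes_from_wire`, `to_pascal`, and the sample `Person` / `Color` types are hypothetical, and it assumes the .NET peer accepts base64 strings for `byte[]` on input as well as returning them.

```python
import base64
from dataclasses import dataclass, fields, is_dataclass
from enum import Enum
from typing import Any


def to_pascal(name: str) -> str:
    """first_name -> FirstName (hypothetical naming bridge)."""
    return "".join(part[:1].upper() + part[1:] for part in name.split("_"))


def to_wire(value: Any) -> Any:
    """Convert a Python value into a JSON-ready structure for the peer."""
    if isinstance(value, Enum):
        return value.value
    if isinstance(value, (bytes, bytearray)):
        # Assumption: byte[] travels as a base64 string in both directions.
        return base64.b64encode(bytes(value)).decode("ascii")
    if is_dataclass(value) and not isinstance(value, type):
        return {to_pascal(f.name): to_wire(getattr(value, f.name)) for f in fields(value)}
    if isinstance(value, (list, tuple)):
        return [to_wire(item) for item in value]
    if isinstance(value, dict):
        return {key: to_wire(item) for key, item in value.items()}
    return value


def bytes_from_wire(text: str) -> list[int]:
    """Decode a base64 byte[] result into the list[int] the contract declares."""
    return list(base64.b64decode(text))


class Color(Enum):
    RED = 1


@dataclass
class Person:
    first_name: str
    last_name: str
    favorite: Color = Color.RED


print(to_wire(Person("Ada", "Lovelace")))
print(bytes_from_wire("BAMCAQ=="))  # [4, 3, 2, 1]
```

With a layer like this, a `ReverseBytes` round-trip test could decode the base64 result before comparing it with the expected list.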

async def test_add_complex_numbers(dotnet_server) -> None:
    async with _new_client() as client:
        svc = client.get_proxy(IComputingService)
        a = {"I": 1.0, "J": 2.0}

@Radu-Niculae Jun 12, 2026

Key-mismatch silent loss is unguarded. A wrong, snake_case, typo'd, or nested-mismatched key is silently dropped and the field defaults, with no error (Person snake_case → {FirstName: None, LastName: None}).

A small test of the behavior:

class ISerializationProbe(ABC):
    @abstractmethod
    async def EchoPerson(self, p: dict) -> dict: ...

async def test_C_person_snake_case_silently_lost(dotnet_server) -> None:
    """Idiomatic snake_case keys can't be case-bridged (the underscore), so
    every field is silently dropped: the .NET Person comes back all-null, no
    error raised."""
    async with _client() as client:
        probe = client.get_proxy(ISerializationProbe)
        result = await probe.EchoPerson(
            {"first_name": "Ada", "last_name": "Lovelace"}
        )
        assert result == {"FirstName": None, "LastName": None}

.NET declares Person like this:

public sealed record Person
{
    public string? FirstName { get; init; }
    public string? LastName { get; init; }
    public override string ToString() =>
        $"FirstName={FirstName ?? "<null>"}, LastName={LastName ?? "<null>"}";
}

I think we should be stricter on deserialization: raise errors for missing required fields and for type mismatches. Extra fields can be ignored for compatibility with future versions. On the .NET side strong typing should help us, but on the Python side we might need to bring _wire.py into the lib (Claude suggested pydantic v2 as an alternative) to check type integrity, e.g. for dataclasses. What do you think?


2 participants