fix: 5 critical bugs — TEE randomization, HTTP 402 hint, int dtype, inference None crash, gas estimation #201
Conversation
Previously get_llm_tee() always returned tees[0], the first TEE in the registry list. This caused all clients to hit the same TEE, providing no load distribution and no resilience when that TEE starts failing. Fix: use random.choice(tees) so each LLM() construction independently picks from all currently active TEEs. Successive retries or re-initializations will therefore naturally spread across the healthy pool. Closes OpenGradient#200
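The selection logic described above can be sketched roughly as follows; the function body and the shape of the `tees` list are illustrative stand-ins, not the SDK's exact code:

```python
import random

def get_llm_tee(tees: list):
    """Pick one TEE uniformly at random from the active pool.

    `tees` is assumed to be the list of currently active, registry-verified
    TEE records (the exact record type is an assumption here).
    """
    if not tees:
        raise RuntimeError("No active LLM proxy TEEs available in the registry")
    # random.choice spreads successive LLM() constructions across the pool,
    # instead of pinning every client to tees[0].
    return random.choice(tees)
```

Because each call draws independently, retries after a failing TEE naturally land elsewhere in the pool.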
Previously any HTTP error from the TEE (including 402 Payment Required)
was silently wrapped into a generic RuntimeError("TEE LLM chat failed: ...").
This caused the confusing traceback seen in issue OpenGradient#188 — the real cause
(insufficient OPG allowance) was buried inside the exception message.
Fix:
- Add `import httpx` and intercept httpx.HTTPStatusError before the
generic `except Exception` handler in both _chat_request() and
completion().
- When status == 402, raise a RuntimeError with a clear, actionable hint
telling the user to call llm.ensure_opg_approval(opg_amount=<amount>).
- Also handle 402 in _parse_sse_response() for the streaming path.
- All other HTTP errors continue to surface as before.
Closes OpenGradient#188
Previously convert_to_float32() always returned np.float32 regardless of the decimals field, forcing callers to manually cast integer results (issue OpenGradient#103 — add proper type conversions from Solidity contract to Python).
Fix:
- Add convert_fixed_point_to_python(value, decimals) that returns int when decimals == 0 and np.float32 otherwise. np.array() automatically picks the correct dtype (int64 vs float32) based on the element types.
- Update both convert_to_model_output() and convert_array_to_model_output() to call the new function.
- Keep convert_to_float32() as a deprecated backward-compatible alias so any external code that imports it directly continues to work.
Partially closes OpenGradient#103
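The behaviour this commit describes can be sketched as follows (a minimal illustration, not the committed implementation):

```python
from typing import Union

import numpy as np

def convert_fixed_point_to_python(value: int, decimals: int) -> Union[int, np.float32]:
    # decimals == 0 means the on-chain value was an integer: keep it as int
    # so np.array() later infers an integer dtype instead of float32.
    if decimals == 0:
        return int(value)
    return np.float32(value / 10**decimals)
```

For example, a tensor of zero-decimal values then materializes with an integer dtype, which is exactly the casting issue #103 complains about.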
… in run_workflow

Bug 4 — None crash in infer(): _get_inference_result_from_node() can legitimately return None when the inference node has no result yet. Previously this None was passed directly to convert_to_model_output(), which calls event_data.get(), causing an AttributeError: 'NoneType' object has no attribute 'get'.
Fix: after calling _get_inference_result_from_node(), check for None and raise a clear RuntimeError with a human-readable message and the inference_id so callers know what to retry. Also guard the precompile log fetch: if parsed_logs is empty, raise RuntimeError instead of crashing with an IndexError on parsed_logs[0].

Bug 5 — hardcoded 30M gas in run_workflow(): run_workflow() built the transaction with gas=30000000 (30 million) unconditionally. This is wasteful (users overpay for gas) and can fail on networks with a lower block gas limit.
Fix: call estimate_gas() first (consistent with infer()), then multiply by 3 for headroom. Fall back to 30M only if estimation itself fails. Also fixes new_workflow() deploy to use the INFERENCE_TX_TIMEOUT constant instead of the previous hardcoded 60 seconds.
Hey @adambalogh 👋 — this PR addresses 5 critical bugs found during a code audit, 3 of which directly close or partially close open issues you and the team have filed:
All changes are backward-compatible. Happy to adjust anything before review! 🙏
Hey @amathxbt thanks for your contributions, I added it to my list to review, will get back to you tomorrow!
Ok @adambalogh thank you see you tomorrow |
```diff
 """LLM chat and completion via TEE-verified execution with x402 payments."""

-import json
+import json as _json
```
what is the reason for this?
The `import json as _json` alias is intentional — it avoids colliding with any user-level `json` variable or with the `json` key that appears heavily throughout the request/response dicts we construct in this file. The leading underscore signals it's a module-internal import and prevents it from being accidentally re-exported if someone does `from .llm import *`. No behaviour change; purely a namespace-safety convention.
Pull request overview
This PR addresses several correctness and UX issues in the OpenGradient Python SDK across TEE selection, LLM error reporting, fixed-point conversions, inference robustness, and workflow gas handling.
Changes:
- Randomize LLM TEE endpoint selection from the active registry pool to improve load distribution/failover behavior.
- Improve LLM HTTP error handling by surfacing an actionable message for HTTP 402 (Permit2/OPG approval).
- Add guards in `Alpha.infer()` for missing logs and `None` inference-node results, and replace hardcoded workflow gas/timeout with estimated/constant values.
- Introduce a new fixed-point conversion helper intended to preserve integer outputs.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/opengradient/client/tee_registry.py` | Randomizes `get_llm_tee()` selection and adds debug logging. |
| `src/opengradient/client/llm.py` | Adds explicit HTTP 402 hint handling for completion/chat/streaming. |
| `src/opengradient/client/_conversions.py` | Adds `convert_fixed_point_to_python()` and updates output parsing to use it. |
| `src/opengradient/client/alpha.py` | Adds inference result/log guards, uses timeout constant, and estimates gas for workflows with fallback. |
```python
selected = random.choice(tees)
logger.debug(
    "Selected TEE %s (endpoint=%s) from %d active LLM proxy TEE(s)",
```
This change makes get_llm_tee() non-deterministic, which will break/flakify the existing unit test TestGetLlmTee.test_returns_first_active_tee (it asserts the first endpoint). Update the tests to patch random.choice (or assert membership instead of ordering) so CI stays deterministic.
Would be good to fix this
Fixed ✅ — committed in tests/tee_registry_test.py. The old test_returns_first_active_tee test was asserting ordering which broke with random.choice. Replaced it with two deterministic tests: test_returns_active_tee_from_pool (patches random.choice via unittest.mock.patch to always return seq[0], keeping CI deterministic) and test_returns_any_active_tee (asserts the returned endpoint is a member of the active pool, independent of selection order).
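A minimal sketch of the two tests described above; the inline `get_llm_tee` here is a stand-in for the real function in tee_registry.py, and the test bodies are illustrative rather than the committed test file:

```python
import random
import unittest
from unittest.mock import patch

def get_llm_tee(tees):
    # Stand-in for the SDK function under test (real one lives in tee_registry.py).
    return random.choice(tees)

class TestGetLlmTee(unittest.TestCase):
    def test_returns_active_tee_from_pool(self):
        # Patch random.choice so CI is deterministic: always pick seq[0].
        with patch("random.choice", side_effect=lambda seq: seq[0]):
            self.assertEqual(get_llm_tee(["tee-a", "tee-b"]), "tee-a")

    def test_returns_any_active_tee(self):
        # Order-independent: only membership in the active pool matters.
        self.assertIn(get_llm_tee(["tee-a", "tee-b"]), {"tee-a", "tee-b"})
```

Patching `random.choice` at the module attribute works here because `get_llm_tee` looks the function up on the `random` module at call time.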
```python
def convert_fixed_point_to_python(value: int, decimals: int) -> Union[int, np.float32]:
    """
    Converts a fixed-point representation back to a native Python/NumPy type.

    Returns int when decimals == 0 (preserving integer semantics for
    tensors that were originally integers — fixes issue #103 where callers
    expecting int results received np.float32 and had to cast manually).
    Returns np.float32 for all other cases.

    Args:
        value: The integer significand stored on-chain.
        decimals: The scale factor exponent (value / 10**decimals).

    Returns:
        int if decimals == 0, np.float32 otherwise.
    """
    if decimals == 0:
        return int(value)
```
Returning int solely when decimals == 0 can misclassify float tensors that happen to contain whole numbers (e.g., 1.0 normalizes to decimals==0). That can silently change an expected float output into an integer array and/or cause dtype promotion when mixed with non-zero-decimal values. Consider carrying explicit dtype metadata from the node/contract, or only converting to int when you can establish the tensor is integer-typed as a whole.
Suggested change:

```diff
-def convert_fixed_point_to_python(value: int, decimals: int) -> Union[int, np.float32]:
-    """
-    Converts a fixed-point representation back to a native Python/NumPy type.
-
-    Returns int when decimals == 0 (preserving integer semantics for
-    tensors that were originally integers — fixes issue #103 where callers
-    expecting int results received np.float32 and had to cast manually).
-    Returns np.float32 for all other cases.
-
-    Args:
-        value: The integer significand stored on-chain.
-        decimals: The scale factor exponent (value / 10**decimals).
-
-    Returns:
-        int if decimals == 0, np.float32 otherwise.
-    """
-    if decimals == 0:
-        return int(value)
+def convert_fixed_point_to_python(value: int, decimals: int) -> np.float32:
+    """
+    Converts a fixed-point representation back to a NumPy float32.
+
+    This function is intentionally type-stable and always returns np.float32,
+    regardless of the value of `decimals`. Callers that require integer
+    semantics should perform an explicit cast (e.g., int(...)) based on
+    their own dtype metadata or application logic.
+
+    Args:
+        value: The integer significand stored on-chain.
+        decimals: The scale factor exponent (value / 10**decimals).
+
+    Returns:
+        np.float32 corresponding to `value / 10**decimals`.
+    """
```
Is there a reason we are mixing up float and int? It might make the handling of results more difficult.
Fixed ✅ — committed in src/opengradient/client/_conversions.py. convert_fixed_point_to_python is now type-stable: it always returns np.float32 regardless of decimals, matching Copilot's suggestion. The int-when-decimals==0 branch has been removed. The return type annotation is updated from Union[int, np.float32] → np.float32. Callers that genuinely need integer semantics can do an explicit int(...) cast based on their own dtype metadata.
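A sketch of the type-stable version under those assumptions (an illustration, not the committed code):

```python
import numpy as np

def convert_fixed_point_to_python(value: int, decimals: int) -> np.float32:
    # Type-stable: always float32, even when decimals == 0, so a float
    # tensor whose values happen to be whole numbers never silently
    # becomes an integer array downstream.
    return np.float32(value / 10**decimals)
```

Callers needing integers do an explicit `int(...)` cast based on their own dtype metadata.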
```diff
     logging.debug(f"    Converting {tensor_data} to np array")
     tensor_data = np.array(tensor_data)

     if isinstance(tensor_data, (str, int, float)):
-        logging.debug(f"\tConverting single entry {tensor_data} to a list")
+        logging.debug(f"    Converting single entry {tensor_data} to a list")
```
This debug string includes a literal tab character inside the f-string. Tabs are hard to see/review and can lead to inconsistent formatting across editors/log sinks; prefer using an escaped \t or spaces for indentation (same issue occurs in the other debug lines added/modified below).
Fixed ✅ — committed in src/opengradient/client/_conversions.py. Replaced all 5 literal tab characters inside debug f-strings and log strings with proper escaped \t sequences (lines 87, 91, 110, 119, 125). No more invisible tabs in the source.
@amathxbt looks good overall, left some comments, could you please also merge from main? I think there are changes in llm.py that need to be resolved
…2), replace literal tabs with \t
…ll_with_tee_retry; keep 402 hint, json alias, OPG 0.05
Done @adambalogh ✅ — merged main into the PR branch, which was missing several methods that were added on main. Restored from main:
PR fixes kept on top:
Summary
This PR fixes 5 critical bugs identified from open issues and code audit, spanning TEE selection, LLM error handling, type conversions, inference safety, and gas estimation.
Bug 1 — TEE always picks the same node (closes #200)
File: `src/opengradient/client/tee_registry.py`

Root cause: `get_llm_tee()` always returned `tees[0]`, routing 100% of traffic to a single TEE with zero load distribution or failover.

Fix: Replace `tees[0]` with `random.choice(tees)` so each `LLM()` construction independently selects from the full pool of active, registry-verified TEEs.

Bug 2 — HTTP 402 swallowed as cryptic RuntimeError (closes #188)

File: `src/opengradient/client/llm.py`

Root cause: All HTTP errors (including 402 Payment Required) were caught by `except Exception` and re-raised as a generic `RuntimeError("TEE LLM chat failed: ...")`. Users had no idea they needed to call `ensure_opg_approval()` first.

Fix: Add `import httpx` and intercept `httpx.HTTPStatusError` before the generic handler. When `status_code == 402`, raise a `RuntimeError` with an explicit, actionable message pointing to `ensure_opg_approval()`. Applied to `_chat_request()`, `completion()`, and `_parse_sse_response()` (streaming path).

Bug 3 — Fixed-point int returns np.float32 instead of int (partially closes #103)

File: `src/opengradient/client/_conversions.py`

Root cause: `convert_to_float32(value, decimals)` always returned `np.float32`, even when `decimals == 0` (i.e., the original value was an integer). Users expecting integer outputs received `np.float32` and had to cast manually.

Fix: Add `convert_fixed_point_to_python(value, decimals)` that returns `int` when `decimals == 0` and `np.float32` otherwise. `np.array()` then infers the correct dtype (`int64` vs `float32`) automatically. The old `convert_to_float32` is kept as a deprecated alias for backward compatibility.

Bug 4 — `infer()` crashes with AttributeError when inference node returns None

File: `src/opengradient/client/alpha.py`

Root cause: `_get_inference_result_from_node()` returns `None` when the node has no result yet. This `None` was passed directly to `convert_to_model_output()`, which calls `event_data.get("output", {})` → `AttributeError: NoneType object has no attribute get`. Additionally, if `ModelInferenceEvent` logs were empty, `parsed_logs[0]` caused an `IndexError`.

Fix: After calling `_get_inference_result_from_node()`, check for `None` and raise a clear `RuntimeError` including the `inference_id` so callers know what to retry. Guard the precompile log fetch: if `parsed_logs` is empty, raise `RuntimeError` instead of crashing with an `IndexError`.

Bug 5 — `run_workflow()` uses hardcoded 30M gas

File: `src/opengradient/client/alpha.py`

Root cause: `run_workflow()` hardcoded `gas=30000000`, unlike `infer()` which correctly uses `estimate_gas()`. This is both wasteful (users overpay) and fragile on networks with lower block gas limits.

Fix: Call `estimate_gas()` first and multiply by 3 for headroom. Fall back to 30M only if estimation itself fails. Also replaces the hardcoded `timeout=60` in `new_workflow()` with the `INFERENCE_TX_TIMEOUT` constant.

Files Changed

- `src/opengradient/client/tee_registry.py`
- `src/opengradient/client/llm.py`
- `src/opengradient/client/_conversions.py`
- `src/opengradient/client/alpha.py`

Related Issues
Closes #200
Closes #188
Partially closes #103