Conversation


@noel2004 noel2004 commented Jul 21, 2025

This PR introduces a "coordinator_proxy" app to support the prover cluster mode.

Design doc: https://www.notion.so/scrollzkp/Almost-stateless-prover-proxy-2537792d22af80b08db3f8397e0824bd?d=29c7792d22af80ee98df001caf30b7e1#2537792d22af806880efc8cea92d0e19

Summary by CodeRabbit

  • New Features

    • Coordinator proxy: provers can connect via a proxy to multiple coordinators with intelligent task routing, priority upstream selection, and persistent per-prover sessions.
    • Proxy authentication & lifecycle: JWT-based proxy login, per-client sessions, automatic token management and background login maintenance.
    • New proxy API endpoints and a CLI/server entry to run the proxy service.
  • Chores

    • DB migrations, config files, Dockerfile, CI workflow, and build/Make targets to support the proxy module.


coderabbitai bot commented Jul 21, 2025

Walkthrough

Adds a coordinator proxy subsystem: CLI/server, upstream HTTP clients and managers, per-prover session management with optional persistence, proxy-aware auth/middleware, get_task/submit_proof proxy controllers, DB migrations, libzkp reorganizations, and related tests and CI/build artifacts. (49 words)

Changes

Cohort / File(s) Summary
Configuration & Build
coordinator/Makefile, coordinator/conf/config_proxy.json, coordinator/internal/config/proxy_config.go, build/dockerfiles/coordinator-proxy.Dockerfile, build/dockerfiles/coordinator-proxy.Dockerfile.dockerignore, .github/workflows/docker.yml
Add coordinator_proxy build target, Dockerfile and workflow; add proxy JSON config and loader NewProxyConfig with Normalize and env overrides.
Proxy CLI / Server
coordinator/cmd/proxy/main.go, coordinator/cmd/proxy/app/app.go, coordinator/cmd/proxy/app/flags.go
New proxy CLI entrypoint and app.Run; flag definitions; Gin server setup, observability, controller init, and graceful shutdown.
Upstream HTTP Client
coordinator/internal/controller/proxy/client.go
Implements upClient with Login/ProxyLogin/GetTask/SubmitProof flows, response parsing, compatibility signing, and error handling.
Client Manager
coordinator/internal/controller/proxy/client_manager.go
ClientManager derives private key, manages cached upClient, performs login/retry orchestration, exposes Reset/Client/Name.
Auth (API & Proxy)
coordinator/internal/controller/api/auth.go, coordinator/internal/controller/proxy/auth.go
API auth made proxy-aware; new proxy AuthController with Login, PayloadFunc, IdentityHandler; orchestrates per-client ProxyLogin and session creation.
Prover Session & Persistence
coordinator/internal/controller/proxy/prover_session.go, coordinator/internal/controller/proxy/persistent.go, database/migrate/migrations/proxy/0001_running_tables.sql
ProverManager + proverSession with async maintainLogin, ProxyLogin/GetTask/SubmitProof; GORM-backed persistence for prover sessions and priority upstreams; migration added.
Controllers: GetTask & SubmitProof
coordinator/internal/controller/proxy/get_task.go, coordinator/internal/controller/proxy/submit_proof.go, coordinator/internal/controller/proxy/controller.go
Add GetTaskController and SubmitProofController, PriorityUpstreamManager (in-memory + persistent), and InitController wiring clients/controllers.
Middleware & Routes
coordinator/internal/middleware/challenge_jwt.go, coordinator/internal/middleware/login_jwt.go, coordinator/internal/route/route.go
JWT middlewares refactored to accept Auth config, add ProxyLoginMiddleware, and expose proxy route group with /proxy_login, /get_task, /submit_proof.
Types & Tests
common/types/response.go, coordinator/internal/types/prover.go, coordinator/internal/types/response_test.go
Add Response.DecodeData (mapstructure JSON tag mapping); add ProverProviderTypeProxy; unit test for Response decode round-trip.
libzkp Reorg & Mocks
coordinator/internal/logic/libzkp/lib.go, .../message_types.go, .../universal_task.go, .../lib_mock.go, .../mock_universal_task.go
Reorganize task-type mapping, add GenerateUniversalTask wrappers, and introduce mock_verifier build variant with mock implementations.
Login Logic Changes
coordinator/internal/logic/auth/login.go
LoginLogic now accepts VerifierConfig, injects ChallengeDeduplicator, splits VerifyMsg and CompatiblityCheck, adds simple deduplicator constructor.
Migrations & DB Helpers
database/migrate/migrate.go
Replace SQL file embedding with directory; add module-scoped helpers: MigrateModule, RollbackModule, ResetModuleDB.
Tests / Test Helpers
coordinator/test/api_test.go, coordinator/test/mock_prover.go, coordinator/test/proxy_test.go, coordinator/internal/controller/proxy/prover_session_test.go
Add proxy integration/unit tests and scaffolding; mockProver token caching and test helpers for proxy scenarios.
Go Modules
common/go.mod
Add direct dependency on github.com/mitchellh/mapstructure v1.5.0.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant P as Prover (client)
    participant PX as Proxy Server
    participant UP as Upstream Coordinator

    note right of PX `#f0f4c3`: Proxy login & session orchestration
    P->>PX: POST /coordinator/v1/challenge
    PX->>P: challenge token
    P->>PX: POST /coordinator/v1/login
    PX->>UP: POST /coordinator/v1/challenge
    UP->>PX: upstream challenge
    PX->>UP: POST /coordinator/v1/login (proxy-signed)
    UP->>PX: login token
    PX->>PX: cache/persist session & token
    PX->>P: return JWT

    note right of PX `#e8f5e9`: Task retrieval flow
    P->>PX: POST /coordinator/v1/get_task (JWT)
    PX->>PX: resolve identity, pick upstream (priority)
    PX->>UP: POST /coordinator/v1/get_task (use token)
    UP->>PX: task
    PX->>PX: set priority for prover
    PX->>P: task (annotated with upstream)

    note right of PX `#fff3e0`: Proof submission flow
    P->>PX: POST /coordinator/v1/submit_proof (upstream-tagged taskID)
    PX->>PX: resolve upstream
    PX->>UP: POST /coordinator/v1/submit_proof (with token)
    UP->>PX: success
    PX->>P: success
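
For orientation, a minimal prover-side sketch of the three flows in the diagram follows. It is illustrative only: the endpoint paths follow the diagram, but the port, payload fields, and response shapes are simplified placeholders rather than the real coordinator/prover schema.

// Illustrative only: a bare-bones prover-side walk through challenge/login,
// get_task and submit_proof against a proxy. Payloads and the port are placeholders.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// apiResp loosely mirrors the generic {errcode, errmsg, data} envelope.
type apiResp struct {
	ErrCode int             `json:"errcode"`
	ErrMsg  string          `json:"errmsg"`
	Data    json.RawMessage `json:"data"`
}

func call(method, url, token string, body any) (*apiResp, error) {
	var rd io.Reader
	if body != nil {
		buf, err := json.Marshal(body)
		if err != nil {
			return nil, err
		}
		rd = bytes.NewReader(buf)
	}
	req, err := http.NewRequest(method, url, rd)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	out := &apiResp{}
	return out, json.NewDecoder(resp.Body).Decode(out)
}

func must(r *apiResp, err error) *apiResp {
	if err != nil {
		panic(err)
	}
	return r
}

func main() {
	base := "http://localhost:8390/coordinator/v1" // assumed proxy address

	// 1. Challenge + login: the proxy answers with its own JWT and, in the
	// background, maintains a per-prover session against each upstream coordinator.
	challenge := must(call(http.MethodGet, base+"/challenge", "", nil))
	login := must(call(http.MethodPost, base+"/login", "",
		map[string]any{"challenge": string(challenge.Data), "signature": "..."}))
	var auth struct {
		Token string `json:"token"`
	}
	_ = json.Unmarshal(login.Data, &auth)

	// 2. get_task: the proxy resolves the prover's identity, prefers its priority
	// upstream, and returns a task whose ID is tagged with the issuing upstream.
	task := must(call(http.MethodPost, base+"/get_task", auth.Token,
		map[string]any{"task_types": []int{1}}))

	// 3. submit_proof: the upstream-tagged task ID lets the proxy route the proof
	// back to the coordinator that issued the task.
	var got struct {
		TaskID string `json:"task_id"`
	}
	_ = json.Unmarshal(task.Data, &got)
	result := must(call(http.MethodPost, base+"/submit_proof", auth.Token,
		map[string]any{"task_id": got.TaskID, "proof": "..."}))
	fmt.Println("submit_proof errcode:", result.ErrCode, result.ErrMsg)
}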

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas needing extra attention:

  • coordinator/internal/controller/proxy/prover_session.go — concurrency, maintainLogin lifecycle, persistence sync.
  • coordinator/internal/controller/proxy/client_manager.go and client.go — private key derivation, retry/backoff, token lifecycle and compatibility signing.
  • coordinator/internal/controller/proxy/persistent.go and migration SQL — GORM upsert correctness and schema/table constraints.
  • get_task/submit_proof controllers and priority selection — taskID encoding/decoding and upstream failover/error mapping.
  • coordinator/internal/logic/auth/login.go and middleware changes — compatibility checks, deduplicator injection semantics, and auth claim wiring.

Suggested labels

bump-version

Suggested reviewers

  • georgehao
  • lispc
  • Thegaram

Poem

🐇 I hopped a proxy, quick and spry,

I signed a token, watched it fly,
Sessions nested, safe and sound,
Tasks returned from upstream ground,
I twiddle whiskers — builds compile, oh my! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description check (⚠️ Warning): The PR description is incomplete and lacks required template sections. Missing: purpose/design rationale explanation, PR title checkbox selection, deployment tag versioning checkbox, and breaking change label checkbox. Resolution: complete the PR description by filling all required template sections: explain the purpose and design rationale, select the appropriate checkboxes for PR title type, versioning, and breaking changes, and provide detailed context about the changes.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 50.00%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title check (✅ Passed): The PR title '[Feat] Support "proxy" arch between coordinator and prover' follows the conventional commits format with the 'feat' type and clearly describes the main change: adding proxy architecture support.


@noel2004 noel2004 marked this pull request as ready for review November 5, 2025 13:04

codecov-commenter commented Nov 5, 2025

Codecov Report

❌ Patch coverage is 15.63545% with 1009 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.62%. Comparing base (1985e54) to head (cf9e368).

Files with missing lines Patch % Lines
...inator/internal/controller/proxy/prover_session.go 19.69% 156 Missing and 3 partials ⚠️
coordinator/internal/controller/proxy/client.go 0.00% 150 Missing ⚠️
...inator/internal/controller/proxy/client_manager.go 0.00% 142 Missing ⚠️
coordinator/internal/controller/proxy/get_task.go 0.00% 139 Missing ⚠️
coordinator/internal/controller/proxy/auth.go 0.00% 94 Missing ⚠️
coordinator/cmd/proxy/app/app.go 0.00% 79 Missing ⚠️
...oordinator/internal/controller/proxy/persistent.go 5.71% 66 Missing ⚠️
...rdinator/internal/controller/proxy/submit_proof.go 0.00% 59 Missing ⚠️
database/migrate/migrate.go 4.00% 24 Missing ⚠️
coordinator/internal/config/proxy_config.go 0.00% 22 Missing ⚠️
... and 10 more
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1701      +/-   ##
===========================================
- Coverage    36.90%   35.62%   -1.28%     
===========================================
  Files          245      257      +12     
  Lines        20957    21992    +1035     
===========================================
+ Hits          7734     7835     +101     
- Misses       12401    13335     +934     
  Partials       822      822              
Flag Coverage Δ
common 29.49% <0.00%> (-0.26%) ⬇️
coordinator 28.49% <16.03%> (-4.89%) ⬇️
database 37.72% <4.00%> (-4.33%) ⬇️
rollup 35.81% <ø> (ø)

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

🧹 Nitpick comments (7)
coordinator/internal/logic/libzkp/message_types.go (1)

8-13: Consider enhancing the documentation for TaskType constants.

The comment "matching the Rust enum" is vague. Consider adding details about which Rust enum this matches and the purpose of these constants.

Example:

-// TaskType enum values matching the Rust enum
+// TaskType represents the type of ZK proof task (Chunk, Batch, or Bundle).
+// These values must match the corresponding Rust enum in libzkp.
 const (
-	TaskTypeChunk  = 0
-	TaskTypeBatch  = 1
-	TaskTypeBundle = 2
+	TaskTypeChunk  = 0 // Chunk-level proof
+	TaskTypeBatch  = 1 // Batch-level proof
+	TaskTypeBundle = 2 // Bundle-level proof
 )
coordinator/internal/logic/libzkp/lib_mock.go (2)

9-25: Remove commented-out code.

The commented functions appear to be deprecated. If these are no longer needed, remove them to improve code clarity. If they might be needed in the future, rely on version control history instead.

Apply this diff to clean up:

-// // InitVerifier is a no-op in the mock.
-// func InitVerifier(configJSON string) {}
-
-// // VerifyChunkProof returns a fixed success in the mock.
-// func VerifyChunkProof(proofData, forkName string) bool {
-// 	return true
-// }
-
-// // VerifyBatchProof returns a fixed success in the mock.
-// func VerifyBatchProof(proofData, forkName string) bool {
-// 	return true
-// }
-
-// // VerifyBundleProof returns a fixed success in the mock.
-// func VerifyBundleProof(proofData, forkName string) bool {
-// 	return true
-// }
-

27-29: Add documentation explaining why this function panics.

The panic with "should not run here" is unclear. Add a comment explaining that this function is not supported in mock mode and why.

Apply this diff:

+// UniversalTaskCompatibilityFix is not supported in mock_verifier mode.
+// This function should not be called when using the mock verifier.
 func UniversalTaskCompatibilityFix(taskJSON string) (string, error) {
-	panic("should not run here")
+	panic("UniversalTaskCompatibilityFix is not supported in mock_verifier mode")
 }
common/types/response.go (1)

17-28: Consider enabling WeaklyTypedInput for JSON compatibility.

The current decoder configuration may fail when JSON numeric types don't exactly match Go field types (e.g., JSON numbers decoded as float64 but target field is int). Since resp.Data typically holds unmarshaled JSON (map[string]any), enabling WeaklyTypedInput: true would allow weak type conversions and improve robustness.

Apply this diff to enable weak typing:

 func (resp *Response) DecodeData(out interface{}) error {
 	// Decode generically unmarshaled JSON (map[string]any, []any) into a typed struct
 	// honoring `json` tags and allowing weak type conversions.
 	dec, err := mapstructure.NewDecoder(&mapstructure.DecoderConfig{
-		TagName: "json",
-		Result:  out,
+		TagName:          "json",
+		WeaklyTypedInput: true,
+		Result:           out,
 	})
 	if err != nil {
 		return err
 	}
 	return dec.Decode(resp.Data)
 }
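
For illustration, a self-contained sketch of what WeaklyTypedInput changes, using stand-in types rather than the actual Response/GetTaskSchema definitions:

package main

import (
	"encoding/json"
	"fmt"

	"github.com/mitchellh/mapstructure"
)

// taskPayload is a stand-in target type; the field names are illustrative.
type taskPayload struct {
	TaskID   string `json:"task_id"`
	TaskType int    `json:"task_type"`
}

func main() {
	// Generic unmarshaling leaves JSON numbers as float64 inside map[string]any.
	var data any
	_ = json.Unmarshal([]byte(`{"task_id": 123, "task_type": "1"}`), &data)

	var out taskPayload
	dec, _ := mapstructure.NewDecoder(&mapstructure.DecoderConfig{
		TagName:          "json",
		WeaklyTypedInput: true, // permits number->string and string->number conversions
		Result:           &out,
	})
	if err := dec.Decode(data); err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%+v\n", out) // {TaskID:123 TaskType:1}
}

Without WeaklyTypedInput, the number-to-string and string-to-number mismatches above are decode errors.
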
coordinator/conf/config_proxy.json (1)

1-31: Document that this is a template configuration with insecure defaults.

The configuration file contains several settings unsuitable for production:

  • Placeholder secrets (lines 5, 8)
  • sslmode=disable for database connection (line 27)
  • HTTP instead of HTTPS for coordinator URL (line 19)

Consider adding a comment header or README documentation clarifying that this is a template/example configuration requiring customization for production use.

coordinator/internal/types/response_test.go (1)

11-48: Good test coverage, consider adding edge case tests.

The test properly validates the happy path for DecodeData. Consider adding tests for edge cases such as:

  • Empty or nil Data field
  • Type mismatches (e.g., JSON numbers vs. Go int types)
  • Invalid data structure
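
A rough sketch of what such edge-case tests could look like, assuming the Response.DecodeData added in this PR; the taskPayload type and the expected outcomes are illustrative and should be pinned to whatever behavior is intended:

package types_test

import (
	"testing"

	ctypes "scroll-tech/common/types" // module-local import path used elsewhere in the repo
)

// taskPayload is an illustrative decode target, not the real GetTaskSchema.
type taskPayload struct {
	TaskID   string `json:"task_id"`
	TaskType int    `json:"task_type"`
}

func TestDecodeDataEdgeCases(t *testing.T) {
	// Empty / nil Data: should not panic; pin down whether this errors or yields zero values.
	var out taskPayload
	resp := &ctypes.Response{Data: nil}
	if err := resp.DecodeData(&out); err != nil {
		t.Logf("nil data returns error: %v", err)
	}

	// Type mismatch: numbers where strings are expected (only passes if weak typing is enabled).
	resp = &ctypes.Response{Data: map[string]any{"task_id": 123, "task_type": 1}}
	if err := resp.DecodeData(&out); err == nil {
		t.Log("numeric task_id accepted; expected only with weak typing enabled")
	}

	// Invalid structure: Data is not an object at all, so decoding must fail.
	resp = &ctypes.Response{Data: "not-an-object"}
	if err := resp.DecodeData(&out); err == nil {
		t.Fatal("expected error when Data is not a map")
	}
}
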
coordinator/internal/config/proxy_config.go (1)

55-74: Consider calling Normalize() automatically in the constructor.

The Normalize method is defined but not called within NewProxyConfig. This requires callers to remember to normalize the config manually. For convenience and consistency, consider calling cfg.ProxyManager.Normalize() before returning.

Apply this diff to auto-normalize:

 	// Override config with environment variables
 	err = utils.OverrideConfigWithEnv(cfg, "SCROLL_COORDINATOR_PROXY")
 	if err != nil {
 		return nil, err
 	}
 
+	// Apply defaults from nested config
+	if cfg.ProxyManager != nil {
+		cfg.ProxyManager.Normalize()
+	}
+
 	return cfg, nil
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1985e54 and 20fde41.

📒 Files selected for processing (33)
  • common/types/response.go (2 hunks)
  • coordinator/Makefile (1 hunks)
  • coordinator/cmd/proxy/app/app.go (1 hunks)
  • coordinator/cmd/proxy/app/flags.go (1 hunks)
  • coordinator/cmd/proxy/main.go (1 hunks)
  • coordinator/conf/config_proxy.json (1 hunks)
  • coordinator/internal/config/proxy_config.go (1 hunks)
  • coordinator/internal/controller/api/auth.go (2 hunks)
  • coordinator/internal/controller/proxy/auth.go (1 hunks)
  • coordinator/internal/controller/proxy/client.go (1 hunks)
  • coordinator/internal/controller/proxy/client_manager.go (1 hunks)
  • coordinator/internal/controller/proxy/controller.go (1 hunks)
  • coordinator/internal/controller/proxy/get_task.go (1 hunks)
  • coordinator/internal/controller/proxy/persistent.go (1 hunks)
  • coordinator/internal/controller/proxy/prover_session.go (1 hunks)
  • coordinator/internal/controller/proxy/prover_session_test.go (1 hunks)
  • coordinator/internal/controller/proxy/submit_proof.go (1 hunks)
  • coordinator/internal/logic/auth/login.go (2 hunks)
  • coordinator/internal/logic/libzkp/lib.go (1 hunks)
  • coordinator/internal/logic/libzkp/lib_mock.go (1 hunks)
  • coordinator/internal/logic/libzkp/message_types.go (1 hunks)
  • coordinator/internal/logic/libzkp/mock_universal_task.go (2 hunks)
  • coordinator/internal/logic/libzkp/universal_task.go (2 hunks)
  • coordinator/internal/middleware/challenge_jwt.go (2 hunks)
  • coordinator/internal/middleware/login_jwt.go (1 hunks)
  • coordinator/internal/route/route.go (2 hunks)
  • coordinator/internal/types/prover.go (2 hunks)
  • coordinator/internal/types/response_test.go (1 hunks)
  • coordinator/test/api_test.go (7 hunks)
  • coordinator/test/mock_prover.go (9 hunks)
  • coordinator/test/proxy_test.go (1 hunks)
  • database/migrate/migrate.go (2 hunks)
  • database/migrate/migrations/proxy/0001_running_tables.sql (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (24)
coordinator/internal/middleware/challenge_jwt.go (2)
coordinator/internal/config/config.go (1)
  • Auth (44-48)
coordinator/internal/controller/api/controller.go (1)
  • Auth (22-22)
coordinator/internal/logic/libzkp/mock_universal_task.go (1)
coordinator/internal/logic/libzkp/universal_task.go (1)
  • GenerateUniversalTask (24-26)
coordinator/test/mock_prover.go (2)
coordinator/internal/types/prover.go (2)
  • ProverType (20-20)
  • MakeProverType (47-56)
common/types/message/message.go (1)
  • ProofType (14-14)
coordinator/cmd/proxy/app/app.go (8)
common/utils/flags.go (2)
  • CommonFlags (9-20)
  • ConfigFileFlag (30-34)
common/utils/logger.go (1)
  • LogSetup (15-42)
common/utils/simulation.go (2)
  • RegisterSimulation (30-40)
  • CoordinatorAPIApp (24-24)
coordinator/internal/config/proxy_config.go (3)
  • NewProxyConfig (55-74)
  • ProxyManager (13-19)
  • ProxyConfig (48-52)
common/database/db.go (2)
  • InitDB (48-80)
  • CloseDB (83-92)
common/observability/server.go (1)
  • Server (24-53)
coordinator/internal/controller/proxy/controller.go (1)
  • InitController (24-44)
coordinator/internal/route/route.go (1)
  • ProxyRoute (45-53)
coordinator/internal/logic/libzkp/lib_mock.go (1)
coordinator/internal/logic/libzkp/lib.go (4)
  • UniversalTaskCompatibilityFix (127-141)
  • GenerateWrappedProof (76-98)
  • DumpVk (101-117)
  • SetDynamicFeature (120-124)
coordinator/cmd/proxy/main.go (1)
coordinator/cmd/proxy/app/app.go (1)
  • Run (115-121)
coordinator/internal/controller/proxy/controller.go (6)
coordinator/internal/controller/proxy/get_task.go (3)
  • GetTaskController (102-109)
  • NewPriorityUpstreamManagerPersistent (56-61)
  • NewGetTaskController (112-119)
coordinator/internal/controller/proxy/submit_proof.go (2)
  • SubmitProofController (18-22)
  • NewSubmitProofController (25-31)
coordinator/internal/controller/proxy/auth.go (2)
  • AuthController (18-22)
  • NewAuthController (29-44)
coordinator/internal/controller/proxy/client_manager.go (2)
  • Client (18-22)
  • NewClientManager (56-70)
coordinator/internal/config/proxy_config.go (2)
  • ProxyConfig (48-52)
  • ProxyManager (13-19)
coordinator/internal/controller/proxy/prover_session.go (1)
  • NewProverManagerWithPersistent (34-41)
coordinator/internal/controller/api/auth.go (6)
coordinator/internal/logic/auth/login.go (3)
  • LoginLogic (22-29)
  • NewLoginLogic (48-50)
  • VerifyMsg (69-77)
coordinator/internal/controller/proxy/auth.go (2)
  • AuthController (18-22)
  • NewAuthController (29-44)
coordinator/internal/config/config.go (2)
  • Config (51-56)
  • ProverManager (13-29)
coordinator/internal/logic/verifier/types.go (1)
  • Verifier (11-17)
coordinator/internal/types/auth.go (4)
  • ProverProviderTypeKey (22-22)
  • LoginParameter (58-62)
  • Message (42-49)
  • PublicKey (16-16)
coordinator/internal/types/prover.go (1)
  • ProverProviderTypeProxy (82-82)
coordinator/internal/config/proxy_config.go (2)
coordinator/internal/config/config.go (1)
  • VerifierConfig (67-71)
common/utils/utils.go (1)
  • OverrideConfigWithEnv (87-135)
coordinator/internal/middleware/login_jwt.go (4)
coordinator/internal/controller/proxy/controller.go (1)
  • Auth (16-16)
coordinator/internal/config/config.go (1)
  • Auth (44-48)
coordinator/internal/controller/api/controller.go (1)
  • Auth (22-22)
coordinator/internal/types/auth.go (1)
  • PublicKey (16-16)
coordinator/internal/types/response_test.go (2)
coordinator/internal/types/get_task.go (1)
  • GetTaskSchema (12-19)
common/types/response.go (1)
  • Response (11-15)
coordinator/test/proxy_test.go (6)
coordinator/internal/config/proxy_config.go (2)
  • ProxyClient (32-36)
  • UpStream (39-45)
common/version/version.go (1)
  • Version (31-31)
coordinator/internal/controller/proxy/client_manager.go (2)
  • NewClientManager (56-70)
  • Client (18-22)
common/types/message/message.go (1)
  • ProofTypeChunk (33-33)
common/types/errno.go (1)
  • ErrCoordinatorEmptyProofData (28-28)
common/types/db.go (2)
  • ProvingStatus (140-140)
  • ProvingTaskVerified (152-152)
coordinator/internal/logic/libzkp/message_types.go (1)
common/types/message/message.go (4)
  • ProofType (14-14)
  • ProofTypeChunk (33-33)
  • ProofTypeBatch (35-35)
  • ProofTypeBundle (37-37)
coordinator/internal/route/route.go (6)
coordinator/internal/middleware/challenge_jwt.go (1)
  • ChallengeMiddleware (17-50)
coordinator/internal/controller/proxy/controller.go (3)
  • Auth (16-16)
  • GetTask (12-12)
  • SubmitProof (14-14)
coordinator/internal/config/config.go (1)
  • Auth (44-48)
coordinator/internal/middleware/login_jwt.go (2)
  • LoginMiddleware (21-46)
  • ProxyLoginMiddleware (49-74)
common/observability/middleware.go (1)
  • Use (11-18)
coordinator/internal/config/proxy_config.go (2)
  • ProxyConfig (48-52)
  • ProxyManager (13-19)
coordinator/internal/controller/proxy/auth.go (9)
coordinator/internal/controller/api/auth.go (3)
  • AuthController (18-20)
  • NewAuthController (29-33)
  • NewAuthControllerWithLogic (22-26)
coordinator/internal/controller/proxy/controller.go (1)
  • Clients (21-21)
coordinator/internal/controller/proxy/prover_session.go (1)
  • ProverManager (18-24)
coordinator/internal/config/proxy_config.go (2)
  • ProxyConfig (48-52)
  • ProxyManager (13-19)
coordinator/internal/logic/verifier/types.go (1)
  • Verifier (11-17)
coordinator/internal/logic/auth/login.go (1)
  • NewLoginLogicWithSimpleDeduplicator (43-45)
coordinator/internal/types/auth.go (7)
  • LoginParameterWithHardForkName (52-55)
  • LoginParameter (58-62)
  • Message (42-49)
  • PublicKey (16-16)
  • ProverName (18-18)
  • ProverVersion (20-20)
  • ProverProviderTypeKey (22-22)
coordinator/internal/types/prover.go (3)
  • ProverProviderType (59-59)
  • ProverProviderTypeProxy (82-82)
  • ProverType (20-20)
coordinator/internal/controller/proxy/client_manager.go (1)
  • Client (18-22)
coordinator/internal/controller/proxy/get_task.go (8)
coordinator/internal/types/auth.go (2)
  • PublicKey (16-16)
  • ProverName (18-18)
common/types/response.go (2)
  • RenderFailure (50-52)
  • RenderSuccess (45-47)
common/types/errno.go (3)
  • ErrCoordinatorParameterInvalidNo (22-22)
  • ErrCoordinatorGetTaskFailure (24-24)
  • ErrCoordinatorEmptyProofData (28-28)
coordinator/internal/controller/proxy/persistent.go (1)
  • NewProverPriorityPersist (87-89)
coordinator/internal/controller/proxy/prover_session.go (1)
  • ProverManager (18-24)
coordinator/internal/controller/proxy/controller.go (2)
  • Clients (21-21)
  • GetTask (12-12)
coordinator/internal/types/get_task.go (2)
  • GetTaskParameter (4-9)
  • GetTaskSchema (12-19)
coordinator/internal/controller/proxy/client_manager.go (1)
  • Client (18-22)
coordinator/internal/controller/proxy/prover_session_test.go (1)
coordinator/internal/controller/proxy/prover_session.go (1)
  • NewProverManager (26-32)
coordinator/internal/controller/proxy/submit_proof.go (6)
coordinator/internal/controller/proxy/prover_session.go (1)
  • ProverManager (18-24)
coordinator/internal/controller/proxy/controller.go (2)
  • Clients (21-21)
  • SubmitProof (14-14)
coordinator/internal/controller/proxy/get_task.go (1)
  • PriorityUpstreamManager (42-46)
coordinator/internal/types/submit_proof.go (1)
  • SubmitProofParameter (4-13)
common/types/response.go (2)
  • RenderFailure (50-52)
  • RenderSuccess (45-47)
common/types/errno.go (2)
  • ErrCoordinatorParameterInvalidNo (22-22)
  • ErrCoordinatorGetTaskFailure (24-24)
coordinator/internal/controller/proxy/client.go (4)
coordinator/internal/controller/proxy/client_manager.go (1)
  • Client (18-22)
coordinator/internal/config/proxy_config.go (1)
  • UpStream (39-45)
coordinator/internal/types/auth.go (3)
  • LoginParameter (58-62)
  • PublicKey (16-16)
  • Message (42-49)
common/types/response.go (1)
  • Response (11-15)
coordinator/internal/controller/proxy/prover_session.go (8)
coordinator/internal/controller/proxy/persistent.go (1)
  • NewProverDataPersist (17-19)
coordinator/internal/types/auth.go (5)
  • LoginSchema (28-31)
  • LoginParameter (58-62)
  • Message (42-49)
  • ProverName (18-18)
  • PublicKey (16-16)
coordinator/internal/controller/proxy/client_manager.go (1)
  • Client (18-22)
common/types/errno.go (1)
  • ErrJWTTokenExpired (12-12)
coordinator/internal/types/get_task.go (1)
  • GetTaskParameter (4-9)
common/types/response.go (1)
  • Response (11-15)
coordinator/internal/controller/proxy/auth.go (1)
  • LoginParamCache (24-24)
coordinator/internal/types/submit_proof.go (1)
  • SubmitProofParameter (4-13)
coordinator/internal/logic/auth/login.go (7)
coordinator/internal/config/config.go (1)
  • VerifierConfig (67-71)
coordinator/internal/logic/verifier/types.go (1)
  • Verifier (11-17)
coordinator/internal/orm/challenge.go (1)
  • NewChallenge (25-27)
coordinator/internal/types/auth.go (4)
  • LoginParameter (58-62)
  • Message (42-49)
  • ProverName (18-18)
  • ProverVersion (20-20)
common/version/prover_version.go (2)
  • CheckScrollRepoVersion (37-55)
  • CheckScrollProverVersion (11-28)
common/version/version.go (1)
  • Version (31-31)
coordinator/internal/types/prover.go (5)
  • ProverProviderType (59-59)
  • ProverProviderTypeInternal (78-78)
  • ProverProviderTypeExternal (80-80)
  • ProverProviderTypeProxy (82-82)
  • ProverProviderTypeUndefined (76-76)
coordinator/internal/controller/proxy/persistent.go (1)
coordinator/internal/types/auth.go (2)
  • PublicKey (16-16)
  • LoginSchema (28-31)
coordinator/internal/controller/proxy/client_manager.go (3)
coordinator/internal/config/proxy_config.go (2)
  • ProxyClient (32-36)
  • UpStream (39-45)
coordinator/internal/types/auth.go (5)
  • LoginParameter (58-62)
  • PublicKey (16-16)
  • Message (42-49)
  • ProverName (18-18)
  • ProverVersion (20-20)
coordinator/internal/types/prover.go (3)
  • ProverProviderType (59-59)
  • ProverProviderTypeProxy (82-82)
  • ProverType (20-20)
coordinator/test/api_test.go (7)
coordinator/internal/config/config.go (4)
  • Config (51-56)
  • VerifierConfig (67-71)
  • AssetConfig (59-64)
  • Auth (44-48)
coordinator/internal/config/proxy_config.go (3)
  • ProxyConfig (48-52)
  • UpStream (39-45)
  • ProxyManager (13-19)
database/migrate/migrate.go (2)
  • ResetModuleDB (56-61)
  • MigrateModule (30-38)
coordinator/internal/controller/proxy/client_manager.go (1)
  • Client (18-22)
coordinator/internal/controller/proxy/controller.go (2)
  • Auth (16-16)
  • InitController (24-44)
coordinator/internal/controller/api/controller.go (2)
  • Auth (22-22)
  • InitController (26-48)
coordinator/internal/route/route.go (1)
  • ProxyRoute (45-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: tests
  • GitHub Check: check
  • GitHub Check: tests
  • GitHub Check: tests
  • GitHub Check: check
  • GitHub Check: check
  • GitHub Check: tests
  • GitHub Check: check
  • GitHub Check: tests
🔇 Additional comments (8)
coordinator/internal/logic/libzkp/mock_universal_task.go (1)

18-20: LGTM: Consistent wrapper implementation.

The public wrapper correctly normalizes inputs (task type mapping and fork name lowercasing) before delegating to the internal implementation. This maintains API consistency between mock and production builds.

Note: The potential panic from fromMessageTaskType was already flagged in the review of message_types.go.

coordinator/internal/logic/libzkp/universal_task.go (1)

23-26: LGTM: Clean API wrapper with proper input normalization.

The public wrapper correctly maps the task type and normalizes the fork name before delegating to the internal C-binding layer. The implementation is consistent with the mock version, ensuring predictable behavior across build configurations.

Note: The potential panic from fromMessageTaskType was already flagged in the review of message_types.go.

coordinator/internal/logic/libzkp/lib.go (1)

1-2: LGTM: Proper build constraint added for code organization.

The build tag correctly separates mock and production implementations. The removed code has been appropriately relocated to message_types.go (TaskType constants and mapping function) and universal_task.go (GenerateUniversalTask wrapper), improving code organization and maintainability.

coordinator/internal/logic/libzkp/message_types.go (1)

15-26: Original review comment is incorrect and should be dismissed.

The concern about untrusted input reaching fromMessageTaskType is not valid. TaskType is always set internally from the message.ProofType enum constants (0, 1, or 2), never from external input. The function's panic in the default case is a programming error check for the internal codebase, not a reachable DoS vector.

Likely an incorrect or invalid review comment.

coordinator/cmd/proxy/app/flags.go (1)

1-30: LGTM!

The CLI flag definitions are well-structured with appropriate defaults. The restrictive localhost binding and disabled-by-default HTTP server provide a secure initial configuration.

coordinator/test/mock_prover.go (1)

37-38: LGTM!

The token caching implementation is well-structured and consistent across all methods. The conditional re-authentication logic and defensive assertions ensure correct token management in test scenarios.

Also applies to: 55-61, 128-128, 158-165, 191-198, 269-275

coordinator/test/proxy_test.go (1)

1-284: LGTM!

The test suite provides comprehensive coverage of proxy functionality including handshake, task retrieval, proof submission, and session persistence. The tests properly validate multi-coordinator routing and sticky session behavior.

coordinator/internal/middleware/challenge_jwt.go (1)

17-17: Good refactor to accept only required dependencies.

The signature change from *config.Config to *config.Auth follows the interface segregation principle. All call sites have been correctly updated to pass the Auth field: conf.Auth at route.go:29 and conf.ProxyManager.Auth at route.go:58.

Comment on lines 3 to 108
import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/scroll-tech/go-ethereum/log"
	"github.com/urfave/cli/v2"
	"gorm.io/gorm"

	"scroll-tech/common/database"
	"scroll-tech/common/observability"
	"scroll-tech/common/utils"
	"scroll-tech/common/version"

	"scroll-tech/coordinator/internal/config"
	"scroll-tech/coordinator/internal/controller/proxy"
	"scroll-tech/coordinator/internal/route"
)

var app *cli.App

func init() {
	// Set up coordinator app info.
	app = cli.NewApp()
	app.Action = action
	app.Name = "coordinator proxy"
	app.Usage = "Proxy for multiple Scroll L2 Coordinators"
	app.Version = version.Version
	app.Flags = append(app.Flags, utils.CommonFlags...)
	app.Flags = append(app.Flags, apiFlags...)
	app.Before = func(ctx *cli.Context) error {
		return utils.LogSetup(ctx)
	}
	// Register `coordinator-test` app for integration-test.
	utils.RegisterSimulation(app, utils.CoordinatorAPIApp)
}

func action(ctx *cli.Context) error {
	cfgFile := ctx.String(utils.ConfigFileFlag.Name)
	cfg, err := config.NewProxyConfig(cfgFile)
	if err != nil {
		log.Crit("failed to load config file", "config file", cfgFile, "error", err)
	}

	var db *gorm.DB
	if dbCfg := cfg.ProxyManager.DB; dbCfg != nil {
		db, err = database.InitDB(cfg.ProxyManager.DB)
		if err != nil {
			log.Crit("failed to init db connection", "err", err)
		}
		defer func() {
			if err = database.CloseDB(db); err != nil {
				log.Error("can not close db connection", "error", err)
			}
		}()
		observability.Server(ctx, db)
	}
	registry := prometheus.DefaultRegisterer

	apiSrv := server(ctx, cfg, db, registry)

	log.Info(
		"Start coordinator api successfully.",
		"version", version.Version,
	)

	// Catch CTRL-C to ensure a graceful shutdown.
	interrupt := make(chan os.Signal, 1)
	signal.Notify(interrupt, os.Interrupt)

	// Wait until the interrupt signal is received from an OS signal.
	<-interrupt
	log.Info("start shutdown coordinator proxy server ...")

	closeCtx, cancelExit := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancelExit()
	if err = apiSrv.Shutdown(closeCtx); err != nil {
		log.Warn("shutdown coordinator proxy server failure", "error", err)
		return nil
	}

	<-closeCtx.Done()
	log.Info("coordinator proxy server exiting success")
	return nil
}

func server(ctx *cli.Context, cfg *config.ProxyConfig, db *gorm.DB, reg prometheus.Registerer) *http.Server {
	router := gin.New()
	proxy.InitController(cfg, db, reg)
	route.ProxyRoute(router, cfg, reg)
	port := ctx.String(httpPortFlag.Name)
	srv := &http.Server{
		Addr:              fmt.Sprintf(":%s", port),
		Handler:           router,
		ReadHeaderTimeout: time.Minute,
	}

	go func() {
		if runServerErr := srv.ListenAndServe(); runServerErr != nil && !errors.Is(runServerErr, http.ErrServerClosed) {
			log.Crit("run coordinator proxy http server failure", "error", runServerErr)

⚠️ Potential issue | 🟠 Major

Handle SIGTERM for graceful shutdown.

Orchestrators (systemd, Kubernetes, Docker) issue SIGTERM on shutdown. Because the proxy only listens for os.Interrupt, SIGTERM bypasses this handler and the process exits immediately without running Server.Shutdown, dropping in-flight requests. Please register SIGTERM alongside Interrupt so the graceful shutdown path always runs.

@@
-import (
+import (
 	"context"
 	"errors"
 	"fmt"
 	"net/http"
 	"os"
 	"os/signal"
+	"syscall"
 	"time"
@@
-	signal.Notify(interrupt, os.Interrupt)
+	signal.Notify(interrupt, os.Interrupt, syscall.SIGTERM)
🤖 Prompt for AI Agents
In coordinator/cmd/proxy/app/app.go around lines 3 to 108, the shutdown handler
only listens for os.Interrupt so SIGTERM sent by orchestrators is ignored; add
syscall to the imports and register SIGTERM with signal.Notify (e.g.,
signal.Notify(interrupt, os.Interrupt, syscall.SIGTERM)) so the graceful
shutdown path (apiSrv.Shutdown) runs for both SIGINT and SIGTERM.
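
As an alternative to extending signal.Notify, the same idea can be expressed with the standard library's signal.NotifyContext (Go 1.16+); everything below other than the stdlib calls is a placeholder sketch, not the PR's actual wiring:

package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8390", ReadHeaderTimeout: time.Minute} // placeholder server

	// NotifyContext cancels ctx on SIGINT or SIGTERM, covering both interactive
	// Ctrl-C and orchestrator-initiated shutdowns (systemd, Kubernetes, Docker).
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatalf("http server failure: %v", err)
		}
	}()

	<-ctx.Done() // wait for either signal
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		log.Printf("graceful shutdown failed: %v", err)
	}
}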

Comment on lines +21 to +29
func (m *ProxyManager) Normalize() {
	if m.Client.Secret == "" {
		m.Client.Secret = m.Auth.Secret
	}

	if m.Client.ProxyVersion == "" {
		m.Client.ProxyVersion = m.Verifier.MinProverVersion
	}
}

⚠️ Potential issue | 🟠 Major

Add nil checks to prevent potential panic.

The Normalize method accesses nested fields without nil checks. If Client, Auth, or Verifier are nil, this will panic.

Apply this diff to add defensive nil checks:

 func (m *ProxyManager) Normalize() {
+	if m.Client == nil || m.Auth == nil || m.Verifier == nil {
+		return
+	}
+
 	if m.Client.Secret == "" {
 		m.Client.Secret = m.Auth.Secret
 	}
 
 	if m.Client.ProxyVersion == "" {
 		m.Client.ProxyVersion = m.Verifier.MinProverVersion
 	}
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (m *ProxyManager) Normalize() {
	if m.Client.Secret == "" {
		m.Client.Secret = m.Auth.Secret
	}

	if m.Client.ProxyVersion == "" {
		m.Client.ProxyVersion = m.Verifier.MinProverVersion
	}
}
func (m *ProxyManager) Normalize() {
	if m.Client == nil || m.Auth == nil || m.Verifier == nil {
		return
	}

	if m.Client.Secret == "" {
		m.Client.Secret = m.Auth.Secret
	}

	if m.Client.ProxyVersion == "" {
		m.Client.ProxyVersion = m.Verifier.MinProverVersion
	}
}
🤖 Prompt for AI Agents
In coordinator/internal/config/proxy_config.go around lines 21 to 29, the
Normalize method dereferences nested fields (m.Client, m.Auth, m.Verifier)
without nil checks which can panic; add defensive nil checks at the start
(return early if m == nil), ensure m.Client is non-nil (either return or
instantiate m.Client before assigning Secret/ProxyVersion), and check m.Auth and
m.Verifier are non-nil before reading their Secret or MinProverVersion and only
copy those values when the sources exist.

Comment on lines +41 to +45
var viaProxy bool
if proverType, proverTypeExist := c.Get(types.ProverProviderTypeKey); proverTypeExist {
	proverType := uint8(proverType.(float64))
	viaProxy = proverType == types.ProverProviderTypeProxy
}

⚠️ Potential issue | 🔴 Critical

Fix unsafe type assertion in proxy detection.

proxy.AuthController marks proxy-originated logins by storing types.ProverProviderTypeProxy in the context. That value is a typed integer, so asserting it to float64 will panic at runtime. Normalize the interface value with a type switch instead of forcing a float64.

@@
-	var viaProxy bool
-	if proverType, proverTypeExist := c.Get(types.ProverProviderTypeKey); proverTypeExist {
-		proverType := uint8(proverType.(float64))
-		viaProxy = proverType == types.ProverProviderTypeProxy
-	}
+	var viaProxy bool
+	if raw, exists := c.Get(types.ProverProviderTypeKey); exists {
+		switch v := raw.(type) {
+		case float64:
+			viaProxy = uint8(v) == types.ProverProviderTypeProxy
+		case int:
+			viaProxy = uint8(v) == types.ProverProviderTypeProxy
+		case uint8:
+			viaProxy = v == types.ProverProviderTypeProxy
+		case types.ProverProviderType:
+			viaProxy = v == types.ProverProviderTypeProxy
+		}
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
var viaProxy bool
if proverType, proverTypeExist := c.Get(types.ProverProviderTypeKey); proverTypeExist {
	proverType := uint8(proverType.(float64))
	viaProxy = proverType == types.ProverProviderTypeProxy
}
var viaProxy bool
if raw, exists := c.Get(types.ProverProviderTypeKey); exists {
	switch v := raw.(type) {
	case float64:
		viaProxy = uint8(v) == types.ProverProviderTypeProxy
	case int:
		viaProxy = uint8(v) == types.ProverProviderTypeProxy
	case uint8:
		viaProxy = v == types.ProverProviderTypeProxy
	case types.ProverProviderType:
		viaProxy = v == types.ProverProviderTypeProxy
	}
}
🤖 Prompt for AI Agents
In coordinator/internal/controller/api/auth.go around lines 41 to 45, the code
unsafely asserts the context value to float64 which can panic because the stored
prover type is a typed integer; replace the direct float64 assertion with a type
switch over the interface value (handle int, int32, int64, uint8, uint32,
float64, and the expected typed int) to normalize it to a uint8, then set
viaProxy = normalized == types.ProverProviderTypeProxy; ensure you also guard
for nil/missing values and avoid panics by defaulting to false on unexpected
types.
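
A standalone illustration of the failure mode and of the safe normalization, using stand-in types rather than the actual coordinator ones:

package main

import "fmt"

type providerType uint8

const providerTypeProxy providerType = 3 // stand-in value, not the real constant

// normalize converts whatever the middleware stored in the context into a uint8,
// defaulting to 0 on unexpected types instead of panicking.
func normalize(raw any) uint8 {
	switch v := raw.(type) {
	case float64: // JSON-decoded JWT claims arrive as float64
		return uint8(v)
	case int:
		return uint8(v)
	case uint8:
		return v
	case providerType: // value stored directly as the typed constant
		return uint8(v)
	default:
		return 0
	}
}

func main() {
	var stored any = providerTypeProxy // analogous to what proxy.AuthController stores

	fmt.Println(normalize(stored) == uint8(providerTypeProxy)) // true, no panic

	// The direct assertion in the original code panics for this value:
	// _ = stored.(float64) // panic: interface conversion: interface {} is main.providerType, not float64
}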

Comment on lines +95 to +99
func (p *PriorityUpstreamManager) Delete(key string) {
	p.Lock()
	defer p.Unlock()
	delete(p.data, key)
}

⚠️ Potential issue | 🟠 Major

Persist priority deletions as well.
Line 96 only evicts the in-memory record, but the row in priority_upstream stays untouched, so the very next Get call reloads the stale upstream from persistence and the proxy never actually drops priority. Please also delete the persistent entry before returning so failover can stick.

Apply this diff:

 func (p *PriorityUpstreamManager) Delete(key string) {
-	p.Lock()
-	defer p.Unlock()
-	delete(p.data, key)
+	if p.proverPriorityPersist != nil {
+		defer func() {
+			if err := p.proverPriorityPersist.Del(key); err != nil {
+				log.Error("persistent priority record delete failure", "error", err, "key", key)
+			}
+		}()
+	}
+	p.Lock()
+	defer p.Unlock()
+	delete(p.data, key)
 }
🤖 Prompt for AI Agents
In coordinator/internal/controller/proxy/get_task.go around lines 95 to 99, the
Delete method currently only evicts the in-memory entry and leaves the row in
priority_upstream, so a subsequent Get reloads the stale record; update Delete
to remove the persistent entry as well: call the persistence/store layer's
delete function for the given key (e.g., p.store.DeletePriorityUpstream or
equivalent) before removing p.data[key], handle and log any error returned (and
avoid silently proceeding if the persistent delete fails), and preserve
locking/consistency (acquire lock around the in-memory mutation but perform the
persistent delete such that the persistent state is removed before the in-memory
map is cleared).

Comment on lines +214 to +219
for _, n := range keys {
	if err, code := getTask(ptc.clients[n]); err == nil && code == 0 {
		// get task done
		return
	}
}

⚠️ Potential issue | 🟠 Major

Don't mask upstream failures as empty-task.
Lines 215-219 drop through whenever err != nil, so if every upstream call fails we end up responding with ErrCoordinatorEmptyProofData, hiding the outage from the prover. Capture the last error code/message and surface it via RenderFailure when we never get a successful/empty response.

Apply this diff:

-	for _, n := range keys {
-		if err, code := getTask(ptc.clients[n]); err == nil && code == 0 {
-			// get task done
-			return
-		}
-	}
-
-	log.Debug("get no task from upstream", "cli", proverName)
-	// if all get task failed, throw empty proof resp
-	types.RenderFailure(ctx, types.ErrCoordinatorEmptyProofData, fmt.Errorf("get empty prover task"))
+	var lastErr error
+	var lastCode int
+	for _, n := range keys {
+		if err, code := getTask(ptc.clients[n]); err == nil && code == 0 {
+			// get task done
+			return
+		} else if err != nil {
+			lastErr = err
+			lastCode = code
+		}
+	}
+
+	if lastErr != nil {
+		types.RenderFailure(ctx, lastCode, lastErr)
+		return
+	}
+
+	log.Debug("get no task from upstream", "cli", proverName)
+	// if all get task failed, throw empty proof resp
+	types.RenderFailure(ctx, types.ErrCoordinatorEmptyProofData, fmt.Errorf("get empty prover task"))

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In coordinator/internal/controller/proxy/get_task.go around lines 214-219, the
loop currently ignores all upstream errors and falls through to respond with
ErrCoordinatorEmptyProofData; instead declare variables lastErr error and
lastCode int before the loop, update them inside the loop whenever a call does
not satisfy err == nil && code == 0 (i.e., set lastErr = err and lastCode = code
for non-success cases), and after the loop check if lastErr != nil or lastCode
!= 0 — if so call RenderFailure with the captured lastErr and lastCode to
surface the upstream failure; keep the existing immediate return when a call
succeeds (err == nil && code == 0).

Comment on lines +288 to +291
newToken, err := c.maintainLogin(ctx, cliMgr, up, loginParam, token.phase)
if err != nil {
	return nil, fmt.Errorf("update prover token fail: %V", err)
}

⚠️ Potential issue | 🟠 Major

Use %v so the returned error is usable.
Line 290 formats the error with %V, which is not a valid fmt verb, so callers/logs get strings like %!V(MISSING) instead of the real failure reason when token refresh fails. Switch to %v to surface the upstream error.

Apply this diff:

-		return nil, fmt.Errorf("update prover token fail: %V", err)
+		return nil, fmt.Errorf("update prover token fail: %v", err)
🤖 Prompt for AI Agents
In coordinator/internal/controller/proxy/prover_session.go around lines 288 to
291, the error formatting uses the invalid fmt verb "%V" causing unusable
output; replace it with a valid verb (e.g., "%v") so the wrapped error is
rendered correctly (or use "%w" if you want to wrap the error for callers) —
update the fmt.Errorf call accordingly to return the real underlying error
message.
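
A tiny runnable demonstration of the difference between the verbs, independent of the PR code:

package main

import (
	"errors"
	"fmt"
)

func main() {
	err := errors.New("upstream login rejected")

	// Invalid verb: the output contains a %!V(...) bad-verb marker instead of a cleanly formatted message.
	fmt.Println(fmt.Errorf("update prover token fail: %V", err))

	// %v renders the error text; %w additionally wraps it for errors.Is / errors.As.
	fmt.Println(fmt.Errorf("update prover token fail: %v", err))
	wrapped := fmt.Errorf("update prover token fail: %w", err)
	fmt.Println(errors.Is(wrapped, err)) // true
}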

Comment on lines +78 to +83
resp, err := session.SubmitProof(ctx, &submitParameter, cli)
if err != nil {
	log.Error("Upstream has error resp for submit", "code", resp.ErrCode, "msg", resp.ErrMsg, "up", upstream, "cli", proverName)
	types.RenderFailure(ctx, types.ErrCoordinatorGetTaskFailure, err)
	return
} else if resp.ErrCode != 0 {

⚠️ Potential issue | 🔴 Critical

Guard against nil response on upstream submit failure

session.SubmitProof can return a non-nil error with a nil resp. Logging resp.ErrCode / resp.ErrMsg before checking for nil will panic and take down the handler on upstream failures. Please guard the error path so we never dereference a nil response object.

Apply this diff:

 resp, err := session.SubmitProof(ctx, &submitParameter, cli)
 if err != nil {
-	log.Error("Upstream has error resp for submit", "code", resp.ErrCode, "msg", resp.ErrMsg, "up", upstream, "cli", proverName)
+	if resp != nil {
+		log.Error("Upstream has error resp for submit", "code", resp.ErrCode, "msg", resp.ErrMsg, "up", upstream, "cli", proverName)
+	} else {
+		log.Error("Upstream submit failed", "error", err, "up", upstream, "cli", proverName)
+	}
 	types.RenderFailure(ctx, types.ErrCoordinatorGetTaskFailure, err)
 	return
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
resp, err := session.SubmitProof(ctx, &submitParameter, cli)
if err != nil {
	log.Error("Upstream has error resp for submit", "code", resp.ErrCode, "msg", resp.ErrMsg, "up", upstream, "cli", proverName)
	types.RenderFailure(ctx, types.ErrCoordinatorGetTaskFailure, err)
	return
} else if resp.ErrCode != 0 {
resp, err := session.SubmitProof(ctx, &submitParameter, cli)
if err != nil {
	if resp != nil {
		log.Error("Upstream has error resp for submit", "code", resp.ErrCode, "msg", resp.ErrMsg, "up", upstream, "cli", proverName)
	} else {
		log.Error("Upstream submit failed", "error", err, "up", upstream, "cli", proverName)
	}
	types.RenderFailure(ctx, types.ErrCoordinatorGetTaskFailure, err)
	return
} else if resp.ErrCode != 0 {
🤖 Prompt for AI Agents
In coordinator/internal/controller/proxy/submit_proof.go around lines 78 to 83,
the error branch logs resp.ErrCode and resp.ErrMsg without ensuring resp is
non-nil which can cause a panic when session.SubmitProof returns (nil, err);
modify the error handling to first check if resp is nil: if resp != nil log
using resp.ErrCode/resp.ErrMsg and upstream/cli, otherwise log the error using
err (and upstream/cli) and then call types.RenderFailure(ctx,
types.ErrCoordinatorGetTaskFailure, err) and return; keep the subsequent
resp.ErrCode != 0 check only after ensuring resp is non-nil.

Comment on lines +38 to 41
r.POST("/proxy_login", loginMiddleware.LoginHandler)
r.POST("/get_task", api.GetTask.GetTasks)
r.POST("/submit_proof", api.SubmitProof.SubmitProof)
}

⚠️ Potential issue | 🔴 Critical

Unreachable /proxy_login endpoint
/proxy_login is registered inside the block guarded by loginMiddleware.MiddlewareFunc(), so callers must already present a valid login token before they can obtain one. That makes the new login flow unusable. Please move the route outside the authenticated group (and wrap it with the challenge middleware, just like /login) so proxy clients can actually log in.

 func v1(router *gin.RouterGroup, conf *config.Config) {
 	r := router.Group("/v1")

 	challengeMiddleware := middleware.ChallengeMiddleware(conf.Auth)
 	r.GET("/challenge", challengeMiddleware.LoginHandler)

 	loginMiddleware := middleware.LoginMiddleware(conf.Auth)
 	r.POST("/login", challengeMiddleware.MiddlewareFunc(), loginMiddleware.LoginHandler)
+	r.POST("/proxy_login", challengeMiddleware.MiddlewareFunc(), loginMiddleware.LoginHandler)

 	// need jwt token api
 	r.Use(loginMiddleware.MiddlewareFunc())
 	{
-		r.POST("/proxy_login", loginMiddleware.LoginHandler)
 		r.POST("/get_task", api.GetTask.GetTasks)
 		r.POST("/submit_proof", api.SubmitProof.SubmitProof)
 	}
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
r.POST("/proxy_login", loginMiddleware.LoginHandler)
r.POST("/get_task", api.GetTask.GetTasks)
r.POST("/submit_proof", api.SubmitProof.SubmitProof)
}
func v1(router *gin.RouterGroup, conf *config.Config) {
r := router.Group("/v1")
challengeMiddleware := middleware.ChallengeMiddleware(conf.Auth)
r.GET("/challenge", challengeMiddleware.LoginHandler)
loginMiddleware := middleware.LoginMiddleware(conf.Auth)
r.POST("/login", challengeMiddleware.MiddlewareFunc(), loginMiddleware.LoginHandler)
r.POST("/proxy_login", challengeMiddleware.MiddlewareFunc(), loginMiddleware.LoginHandler)
// need jwt token api
r.Use(loginMiddleware.MiddlewareFunc())
{
r.POST("/get_task", api.GetTask.GetTasks)
r.POST("/submit_proof", api.SubmitProof.SubmitProof)
}
}
🤖 Prompt for AI Agents
In coordinator/internal/route/route.go around lines 38 to 41, the /proxy_login
route is incorrectly registered inside the group protected by
loginMiddleware.MiddlewareFunc(), making it unreachable for clients that need to
obtain a token; move r.POST("/proxy_login", loginMiddleware.LoginHandler) out of
that authenticated group and register it alongside /login, wrapped with the same
challenge middleware used for /login so proxy clients can access the login
endpoint without an existing token.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

♻️ Duplicate comments (2)
coordinator/internal/controller/proxy/prover_session.go (1)

289-292: Fix invalid fmt verb (duplicate issue).

Line 291 still uses the invalid fmt verb %V instead of %v, causing unusable error output. This issue was previously flagged but not yet addressed.

Apply this diff:

 	newToken, err := c.maintainLogin(ctx, cliMgr, up, loginParam, token.phase)
 	if err != nil {
-		return nil, fmt.Errorf("update prover token fail: %V", err)
+		return nil, fmt.Errorf("update prover token fail: %v", err)
 	}
coordinator/internal/controller/proxy/auth.go (1)

66-84: Verify the async fire-and-forget pattern for proxy login is intentional.

The Login method spawns goroutines to call ProxyLogin on all clients and returns immediately without waiting for completion. This creates a fire-and-forget pattern where:

  1. Line 66 uses context.Background() instead of c.Request.Context(), so proxy logins continue even if the original HTTP request is cancelled.
  2. Lines 73-77 log errors but don't propagate them to the caller.
  3. Line 86 returns success even if all proxy logins fail.

This appears intentional for async processing, but please confirm:

  • Is it acceptable for the login endpoint to return success while upstream logins might be failing?
  • Should the timeout context be derived from the request context for proper cancellation?

Additionally, note that the past review comment about gin.Context concurrency appears to be resolved—the goroutines no longer use c directly, only extracted data (loginParam) and a separate context (loginCtx).
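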

🧹 Nitpick comments (1)
coordinator/internal/controller/proxy/client_manager.go (1)

76-121: Consider adding retry limits or exponential backoff.

The login retry loop runs indefinitely with a fixed wait duration. While this may be intentional for a persistent background process, consider adding either:

  • A maximum retry count with eventual failure propagation
  • Exponential backoff to reduce load during extended outages
  • Periodic health checks to detect prolonged failures

This would improve observability and allow callers to detect persistent upstream issues.
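
A minimal sketch of a capped exponential backoff with a retry limit; the function name and parameters are illustrative and not tied to the PR's actual login loop:

package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// loginWithBackoff retries fn with exponential backoff, giving up after maxRetries
// attempts or when ctx is cancelled, so callers can observe persistent upstream failures.
func loginWithBackoff(ctx context.Context, maxRetries int, base, max time.Duration, fn func(context.Context) error) error {
	wait := base
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if lastErr = fn(ctx); lastErr == nil {
			return nil
		}
		if attempt == maxRetries {
			break
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(wait):
		}
		if wait *= 2; wait > max {
			wait = max // cap the backoff so retries keep happening at a bounded rate
		}
	}
	return fmt.Errorf("login failed after %d attempts: %w", maxRetries+1, lastErr)
}

func main() {
	err := loginWithBackoff(context.Background(), 3, 100*time.Millisecond, 2*time.Second,
		func(context.Context) error { return errors.New("upstream unavailable") })
	fmt.Println(err)
}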

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 20fde41 and c22d9ec.

📒 Files selected for processing (5)
  • common/go.mod (1 hunks)
  • coordinator/internal/controller/proxy/auth.go (1 hunks)
  • coordinator/internal/controller/proxy/client_manager.go (1 hunks)
  • coordinator/internal/controller/proxy/prover_session.go (1 hunks)
  • coordinator/internal/logic/libzkp/message_types.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
coordinator/internal/controller/proxy/auth.go (9)
coordinator/internal/controller/api/auth.go (3)
  • AuthController (18-20)
  • NewAuthController (29-33)
  • NewAuthControllerWithLogic (22-26)
coordinator/internal/controller/proxy/controller.go (1)
  • Clients (21-21)
coordinator/internal/controller/proxy/prover_session.go (1)
  • ProverManager (18-24)
coordinator/internal/config/proxy_config.go (2)
  • ProxyConfig (48-52)
  • ProxyManager (13-19)
coordinator/internal/logic/verifier/types.go (1)
  • Verifier (11-17)
coordinator/internal/logic/auth/login.go (1)
  • NewLoginLogicWithSimpleDeduplicator (43-45)
coordinator/internal/types/auth.go (6)
  • LoginParameterWithHardForkName (52-55)
  • LoginParameter (58-62)
  • Message (42-49)
  • PublicKey (16-16)
  • ProverName (18-18)
  • ProverVersion (20-20)
coordinator/internal/types/prover.go (3)
  • ProverProviderType (59-59)
  • ProverProviderTypeProxy (82-82)
  • ProverType (20-20)
coordinator/internal/controller/proxy/client_manager.go (1)
  • Client (18-22)
coordinator/internal/logic/libzkp/message_types.go (1)
common/types/message/message.go (4)
  • ProofType (14-14)
  • ProofTypeChunk (33-33)
  • ProofTypeBatch (35-35)
  • ProofTypeBundle (37-37)
coordinator/internal/controller/proxy/client_manager.go (3)
coordinator/internal/config/proxy_config.go (2)
  • ProxyClient (32-36)
  • UpStream (39-45)
coordinator/internal/types/auth.go (5)
  • LoginParameter (58-62)
  • PublicKey (16-16)
  • Message (42-49)
  • ProverName (18-18)
  • ProverVersion (20-20)
coordinator/internal/types/prover.go (3)
  • ProverProviderType (59-59)
  • ProverProviderTypeProxy (82-82)
  • ProverType (20-20)
coordinator/internal/controller/proxy/prover_session.go (8)
coordinator/internal/controller/proxy/persistent.go (1)
  • NewProverDataPersist (17-19)
coordinator/internal/types/auth.go (5)
  • LoginSchema (28-31)
  • LoginParameter (58-62)
  • Message (42-49)
  • ProverName (18-18)
  • PublicKey (16-16)
coordinator/internal/controller/proxy/client_manager.go (1)
  • Client (18-22)
common/types/errno.go (1)
  • ErrJWTTokenExpired (12-12)
coordinator/internal/types/get_task.go (1)
  • GetTaskParameter (4-9)
common/types/response.go (1)
  • Response (11-15)
coordinator/internal/controller/proxy/auth.go (1)
  • LoginParamCache (28-28)
coordinator/internal/types/submit_proof.go (1)
  • SubmitProofParameter (4-13)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: check
  • GitHub Check: tests
  • GitHub Check: tests
  • GitHub Check: tests
  • GitHub Check: check
  • GitHub Check: tests
  • GitHub Check: check
  • GitHub Check: tests
🔇 Additional comments (10)
coordinator/internal/logic/libzkp/message_types.go (1)

10-13: LGTM!

The task type constants are clearly defined and align with the message.ProofType values.

common/go.mod (1)

15-15: Address low test coverage for the new decoding functionality.

The Codecov report flags common/types/response.go with 0.00% patch coverage and 11 missing lines. Since the DecodeData method that uses this new mapstructure dependency appears to lack test coverage, consider adding unit tests to verify the decoding behavior, especially edge cases around JSON tag mapping.
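
A hedged sketch of such a test, written as an in-package test in common/types; the Response field names (ErrCode, Data) and the DecodeData signature are inferred from how the proxy client consumes the response and may need adjusting:

package types

import "testing"

func TestDecodeDataJSONTagMapping(t *testing.T) {
	type loginSchema struct {
		Token string `json:"token"`
		Time  string `json:"time"`
	}

	resp := Response{
		ErrCode: 0,
		Data: map[string]interface{}{
			"token": "jwt-token",
			"time":  "2025-07-21T00:00:00Z",
		},
	}

	var got loginSchema
	if err := resp.DecodeData(&got); err != nil {
		t.Fatalf("DecodeData returned error: %v", err)
	}
	if got.Token != "jwt-token" || got.Time != "2025-07-21T00:00:00Z" {
		t.Errorf("json tag mapping not applied, got %+v", got)
	}
}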

coordinator/internal/controller/proxy/client_manager.go (2)

136-184: Client initialization pattern looks correct.

The double-checked locking pattern and context management are appropriate. The use of context.Background() (line 164) for the login goroutine is intentional—it allows the login process to continue even if the initial caller cancels, which is correct for a persistent upstream connection.


186-210: The public key format is correct and follows the established pattern.

The codebase already uses common.Bytes2Hex without the "0x" prefix for public key generation in test files and mock code, exactly matching the code under review. The existing test suite (auth_test.go, mock_prover.go) validates this format works correctly with the LoginParameter.SignWithKey() and Verify() methods. The hex format without prefix is the intended standard for internal coordinator authentication, consistent throughout the codebase.
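
For context, a hedged one-line illustration of the format in question (assuming the go-ethereum crypto and common helpers used elsewhere in the package):

	publicKey := common.Bytes2Hex(crypto.CompressPubkey(&privateKey.PublicKey)) // compressed key hex, no "0x" prefix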

coordinator/internal/controller/proxy/prover_session.go (3)

110-200: Concurrency and retry logic looks correct.

The phase-based token versioning and completion context pattern properly handle concurrent login attempts. The JWT expiry retry (lines 164-179) aligns with the SDK behavior as noted in the comment.


222-258: GetTask error handling is correct.

The token refresh and error wrapping (line 253) properly uses %v, unlike the similar code in SubmitProof.


73-94: Concern partially addressed; consider explicit cleanup of deprecated data.

The willDeprecatedData mechanism safely promotes old sessions back to active storage when accessed via Get() (lines 60-62), and concurrent evictions are thread-safe due to proper locking. However, willDeprecatedData is only overwritten on the next cache rollover (line 88), meaning if the size limit isn't reached again, deprecated data persists indefinitely. To prevent potential memory accumulation in low-traffic scenarios, consider explicitly clearing willDeprecatedData when it's no longer needed, such as after a timeout or when promoting all remaining entries back to m.data.
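
One possible shape for that cleanup, sketched against the field names discussed above (data, willDeprecatedData, the embedded mutex); the method itself and the sweep interval are hypothetical:

// startDeprecatedSweep periodically drops the deprecated bucket so entries that
// were never promoted back do not linger forever. Illustrative only.
func (m *ProverManager) startDeprecatedSweep(ctx context.Context, grace time.Duration) {
	go func() {
		ticker := time.NewTicker(grace)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				m.Lock()
				m.willDeprecatedData = make(map[string]*proverSession)
				m.Unlock()
			}
		}
	}()
}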

coordinator/internal/controller/proxy/auth.go (3)

1-30: LGTM!

Imports and constants are well-organized and appropriate for the proxy authentication functionality.


59-61: LGTM on recursive proxy login guard.

This check prevents infinite recursion when a proxy tries to log in to another proxy.


35-39: No issues found—the dummy Verifier is correctly scoped and safe.

The login logic only accesses OpenVMVkMap, which is initialized in the dummy verifier. The uninitialized maps (ChunkVk, BatchVk, BundleVk) are never accessed in the login flow. The dummy verifier is locally scoped to NewAuthController and isolated from proof_receiver, which uses a separate verifier instance. No runtime panic risk exists.

github.com/gin-gonic/gin v1.9.1
github.com/mattn/go-colorable v0.1.13
github.com/mattn/go-isatty v0.0.20
github.com/mitchellh/mapstructure v1.5.0

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify mapstructure v1.5.0 is current and has no security advisories.

The dependency is being added as a direct dependency (moved from indirect) to support the new DecodeData method in common/types.Response. However, since my knowledge is current only through March 2025 and this PR is from July 2025, I cannot verify whether v1.5.0 is the latest/recommended version or if security advisories exist.


🌐 Web query:

What is the latest version of mapstructure package and are there any security advisories for v1.5.0?

💡 Result:

Latest release: go-viper/mapstructure v2.4.0 (release 15 Jul 2025). [1]

Security advisories: yes — multiple advisories (e.g., CVE-2025-52893 / GHSA-fv92-fjc5-jj9h) list mapstructure v1.5.0 as affected (sensitive data insertion into logs); upgrade to a fixed version (v2.3.0+ or v2.4.0) is recommended. [2][3]


Upgrade mapstructure to v2.4.0 to resolve security vulnerability CVE-2025-52893.

Version 1.5.0 is affected by security advisory CVE-2025-52893 / GHSA-fv92-fjc5-jj9h (sensitive data insertion into logs); upgrade to v2.3.0+ or v2.4.0 is required. The latest version is v2.4.0 (released July 15, 2025).

🤖 Prompt for AI Agents
In common/go.mod at line 15, update the mapstructure dependency to the maintained v2 module and fix imports: replace the existing
github.com/mitchellh/mapstructure v1.5.0 entry by requiring
github.com/go-viper/mapstructure/v2 v2.4.0 (the v2 module is published under
the go-viper organization, not mitchellh), then run `go get
github.com/go-viper/mapstructure/[email protected]` and `go mod tidy` to update go.sum;
also update any source files that import "github.com/mitchellh/mapstructure" to
import "github.com/go-viper/mapstructure/v2" and run the project's
unit/integration tests to verify nothing broke.

Comment on lines +89 to +104
// PayloadFunc returns jwt.MapClaims with {public key, prover name}.
func (a *AuthController) PayloadFunc(data interface{}) jwt.MapClaims {
	v, ok := data.(types.LoginParameter)
	if !ok {
		return jwt.MapClaims{}
	}

	return jwt.MapClaims{
		types.PublicKey:             v.PublicKey,
		types.ProverName:            v.Message.ProverName,
		types.ProverVersion:         v.Message.ProverVersion,
		types.ProverProviderTypeKey: v.Message.ProverProviderType,
		SignatureKey:                v.Signature,
		ProverTypesKey:              v.Message.ProverTypes,
	}
}

⚠️ Potential issue | 🟠 Major

Silent failure on type assertion may cause invalid JWT tokens.

Lines 91-94 return empty jwt.MapClaims if the type assertion fails. This will produce a JWT without required claims (public_key, prover_name, etc.), causing downstream authentication failures that may be difficult to debug.

Consider whether:

  1. This should panic (since it indicates a programming error).
  2. An error should be logged before returning empty claims.
  3. The function signature should return an error.
 func (a *AuthController) PayloadFunc(data interface{}) jwt.MapClaims {
 	v, ok := data.(types.LoginParameter)
 	if !ok {
+		log.Error("PayloadFunc received unexpected type", "type", fmt.Sprintf("%T", data))
 		return jwt.MapClaims{}
 	}

Comment on lines +107 to +149
func (a *AuthController) IdentityHandler(c *gin.Context) interface{} {
	claims := jwt.ExtractClaims(c)
	loginParam := &types.LoginParameter{}

	if proverName, ok := claims[types.ProverName]; ok {
		loginParam.Message.ProverName, _ = proverName.(string)
	}

	if proverVersion, ok := claims[types.ProverVersion]; ok {
		loginParam.Message.ProverVersion, _ = proverVersion.(string)
	}

	if providerType, ok := claims[types.ProverProviderTypeKey]; ok {
		num, _ := providerType.(float64)
		loginParam.Message.ProverProviderType = types.ProverProviderType(num)
	}

	if signature, ok := claims[SignatureKey]; ok {
		loginParam.Signature, _ = signature.(string)
	}

	if proverTypes, ok := claims[ProverTypesKey]; ok {
		arr, _ := proverTypes.([]any)
		for _, elm := range arr {
			num, _ := elm.(float64)
			loginParam.Message.ProverTypes = append(loginParam.Message.ProverTypes, types.ProverType(num))
		}
	}

	if publicKey, ok := claims[types.PublicKey]; ok {
		loginParam.PublicKey, _ = publicKey.(string)
	}

	if loginParam.PublicKey != "" {
		c.Set(LoginParamCache, loginParam)
		c.Set(types.ProverName, loginParam.Message.ProverName)
		// publickey will also be set since we have specified public_key as identical key
		return loginParam.PublicKey
	}

	return nil
}

🛠️ Refactor suggestion | 🟠 Major

Silent type assertion failures may create partially initialized LoginParameter.

The IdentityHandler extracts claims from JWT but silently ignores type assertion failures (lines 111-138). This can result in a partially populated LoginParameter being stored in the context, potentially causing subtle bugs downstream.

Consider:

  1. Logging warnings when type assertions fail.
  2. Validating that critical fields (PublicKey, ProverName) are present before proceeding.
  3. Validating numeric ranges for uint8 conversions (lines 121, 132) to prevent overflow when float64 values exceed 255.

Example improvement for critical fields:

 	if publicKey, ok := claims[types.PublicKey]; ok {
 		loginParam.PublicKey, _ = publicKey.(string)
 	}
+	
+	// Validate critical fields before storing in context
+	if loginParam.PublicKey == "" {
+		log.Warn("IdentityHandler: missing or invalid public_key in JWT claims")
+		return nil
+	}

For numeric conversions:

 	if providerType, ok := claims[types.ProverProviderTypeKey]; ok {
 		num, _ := providerType.(float64)
+		if num < 0 || num > 255 {
+			log.Warn("IdentityHandler: prover_provider_type out of range", "value", num)
+		}
 		loginParam.Message.ProverProviderType = types.ProverProviderType(num)
 	}
🤖 Prompt for AI Agents
In coordinator/internal/controller/proxy/auth.go around lines 107-149, the
IdentityHandler silently ignores type assertion failures causing a
partially-initialized LoginParameter to be stored; update the handler to check
each type assertion result and log a warning when an assertion fails, validate
that required fields (at minimum PublicKey and Message.ProverName) are non-empty
before calling c.Set and returning, and for numeric claims (provider type and
prover types) verify the float64 is within the valid uint8/enum range (0..255 or
defined enum bounds) before casting — if any critical validation fails, do not
set the context or return the public key (return nil) to avoid downstream
surprises. Ensure logging includes which claim failed and the raw value to aid
debugging.

Comment on lines +38 to +54
func buildPrivateKey(inputBytes []byte) (*ecdsa.PrivateKey, error) {
	// Try appending bytes from 0x0 to 0x20 until we get a valid private key
	for appendByte := byte(0x0); appendByte <= 0x20; appendByte++ {
		// Append the byte to input
		extendedBytes := append(inputBytes, appendByte)

		// Calculate 256-bit hash
		hash := crypto.Keccak256(extendedBytes)

		// Try to create private key from hash
		if k, err := crypto.ToECDSA(hash); err == nil {
			return k, nil
		}
	}

	return nil, fmt.Errorf("failed to generate valid private key from input bytes")
}

⚠️ Potential issue | 🟠 Major

Non-standard private key derivation approach.

The iterative append-and-hash approach to generate a valid ECDSA key is unusual and non-standard. While it ensures deterministic key generation from the secret, this pattern is not a recognized cryptographic key derivation function (KDF). Standard approaches like HKDF or PBKDF2 are more appropriate and well-vetted for deriving keys from secrets.

Consider using a standard KDF:

-func buildPrivateKey(inputBytes []byte) (*ecdsa.PrivateKey, error) {
-	// Try appending bytes from 0x0 to 0x20 until we get a valid private key
-	for appendByte := byte(0x0); appendByte <= 0x20; appendByte++ {
-		// Append the byte to input
-		extendedBytes := append(inputBytes, appendByte)
-
-		// Calculate 256-bit hash
-		hash := crypto.Keccak256(extendedBytes)
-
-		// Try to create private key from hash
-		if k, err := crypto.ToECDSA(hash); err == nil {
-			return k, nil
-		}
-	}
-
-	return nil, fmt.Errorf("failed to generate valid private key from input bytes")
+func buildPrivateKey(inputBytes []byte) (*ecdsa.PrivateKey, error) {
+	// Use a single hash of the input as the key material
+	// This is deterministic and simpler than iterating
+	hash := crypto.Keccak256(inputBytes)
+	return crypto.ToECDSA(hash)
}

Note: If the secret is already a valid 32-byte key material, the single-hash approach should work. If crypto.ToECDSA still fails, the secret itself may need to be regenerated rather than working around invalid key material.
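
For reference, a minimal sketch of the HKDF route mentioned above; it assumes golang.org/x/crypto/hkdf is added as a dependency and reuses the crypto helpers (crypto.ToECDSA) already imported in this file, with an arbitrary info string:

// derivePrivateKey derives a secp256k1 key from the secret via HKDF-SHA256,
// re-drawing output in the unlikely case the bytes are not a valid scalar.
func derivePrivateKey(secret []byte) (*ecdsa.PrivateKey, error) {
	kdf := hkdf.New(sha256.New, secret, nil, []byte("coordinator-proxy/login-key/v1"))
	for i := 0; i < 8; i++ {
		material := make([]byte, 32)
		if _, err := io.ReadFull(kdf, material); err != nil {
			return nil, err
		}
		if k, err := crypto.ToECDSA(material); err == nil {
			return k, nil
		}
	}
	return nil, fmt.Errorf("failed to derive a valid private key from secret")
}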


Comment on lines +44 to +71
func (m *ProverManager) Get(userKey string) (ret *proverSession) {
	defer func() {
		if ret == nil {
			var err error
			ret, err = m.persistent.Get(userKey)
			if err != nil {
				log.Error("Get persistent layer for prover tokens fail", "error", err)
			} else if ret != nil {
				fmt.Println("restore record from persistent", userKey, ret.proverToken)
				ret.persistent = m.persistent
			}
		}

		if ret != nil {
			m.Lock()
			m.data[userKey] = ret
			m.Unlock()
		}
	}()

	m.RLock()
	defer m.RUnlock()
	if r, existed := m.data[userKey]; existed {
		return r
	} else {
		return m.willDeprecatedData[userKey]
	}
}

⚠️ Potential issue | 🔴 Critical

Remove debug print statement.

Line 52 contains a fmt.Println debug statement that should be replaced with proper logging or removed entirely.

Apply this diff:

 			} else if ret != nil {
-				fmt.Println("restore record from persistent", userKey, ret.proverToken)
+				log.Info("restored prover session from persistent storage", "userKey", userKey)
 				ret.persistent = m.persistent
🤖 Prompt for AI Agents
In coordinator/internal/controller/proxy/prover_session.go around lines 44 to
71, remove the debug fmt.Println on line 52 and replace it with a proper logger
call (e.g. log.Info or log.Debug) that includes clear context (userKey and
restored session) or omit sensitive fields (mask the proverToken if it’s
secret); ensure the logger uses the existing log package and keep the message
concise and consistent with other logs.

Comment on lines +16 to +27
func fromMessageTaskType(taskType int) int {
	switch message.ProofType(taskType) {
	case message.ProofTypeChunk:
		return TaskTypeChunk
	case message.ProofTypeBatch:
		return TaskTypeBatch
	case message.ProofTypeBundle:
		return TaskTypeBundle
	default:
		panic(fmt.Sprintf("unsupported proof type: %d", taskType))
	}
}

⚠️ Potential issue | 🔴 Critical

Replace panic with error return and validate input range.

The function has two critical issues:

  1. Panic on unsupported type: The panic on line 25 will crash the entire coordinator/proxy application if an unexpected proof type is received. In a production proxy/coordinator architecture, this could cause cascading failures.

  2. Unsafe type conversion: Casting int to uint8 (message.ProofType) without validation is unsafe. Negative values or values >255 will be truncated/wrapped, potentially causing the switch to match incorrect cases or trigger the panic with a misleading error message.

Apply this diff to return an error instead of panicking and validate the input range:

-func fromMessageTaskType(taskType int) int {
+func fromMessageTaskType(taskType int) (int, error) {
+	if taskType < 0 || taskType > 255 {
+		return 0, fmt.Errorf("task type out of valid range: %d", taskType)
+	}
 	switch message.ProofType(taskType) {
 	case message.ProofTypeChunk:
-		return TaskTypeChunk
+		return TaskTypeChunk, nil
 	case message.ProofTypeBatch:
-		return TaskTypeBatch
+		return TaskTypeBatch, nil
 	case message.ProofTypeBundle:
-		return TaskTypeBundle
+		return TaskTypeBundle, nil
 	default:
-		panic(fmt.Sprintf("unsupported proof type: %d", taskType))
+		return 0, fmt.Errorf("unsupported proof type: %d", taskType)
 	}
 }

Note: This change will require updating all callers of fromMessageTaskType to handle the returned error.

🤖 Prompt for AI Agents
In coordinator/internal/logic/libzkp/message_types.go around lines 16 to 27,
replace the panic-based conversion with a safe, validated error-returning
implementation: change the function signature to return (int, error), first
validate that the incoming int is within the uint8 range (0..255) and return a
descriptive error if out of range, cast to message.ProofType only after
validation, map the supported cases to TaskTypeChunk/Batch/Bundle and return the
mapped int with nil error, and for any unsupported proof type return a clear
error instead of panicking; update all callers to handle the (int, error) return
and propagate or handle errors appropriately.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
coordinator/cmd/proxy/app/app.go (1)

76-83: Also handle SIGTERM for graceful shutdown.

SIGTERM is the default shutdown signal from systemd/Kubernetes/Docker. Right now only os.Interrupt is caught, so SIGTERM will bypass the shutdown path and drop in-flight requests—this was already flagged earlier. Please import syscall and register syscall.SIGTERM with signal.Notify.

+	"syscall"
@@
-	signal.Notify(interrupt, os.Interrupt)
+	signal.Notify(interrupt, os.Interrupt, syscall.SIGTERM)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c22d9ec and e9470ff.

📒 Files selected for processing (5)
  • .github/workflows/docker.yml (1 hunks)
  • build/dockerfiles/coordinator-proxy.Dockerfile (1 hunks)
  • build/dockerfiles/coordinator-proxy.Dockerfile.dockerignore (1 hunks)
  • coordinator/cmd/proxy/app/app.go (1 hunks)
  • coordinator/conf/config_proxy.json (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • coordinator/conf/config_proxy.json
🧰 Additional context used
🧬 Code graph analysis (1)
coordinator/cmd/proxy/app/app.go (8)
common/utils/flags.go (2)
  • CommonFlags (9-20)
  • ConfigFileFlag (30-34)
common/utils/logger.go (1)
  • LogSetup (15-42)
common/utils/simulation.go (2)
  • RegisterSimulation (30-40)
  • CoordinatorAPIApp (24-24)
coordinator/internal/config/proxy_config.go (3)
  • NewProxyConfig (55-74)
  • ProxyManager (13-19)
  • ProxyConfig (48-52)
common/database/db.go (2)
  • InitDB (48-80)
  • CloseDB (83-92)
common/observability/server.go (1)
  • Server (24-53)
coordinator/internal/controller/proxy/controller.go (1)
  • InitController (24-44)
coordinator/internal/route/route.go (1)
  • ProxyRoute (45-53)
🪛 actionlint (1.7.8)
.github/workflows/docker.yml

370-370: the runner of "docker/setup-qemu-action@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


372-372: the runner of "docker/setup-buildx-action@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


374-374: the runner of "docker/login-action@v2" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


393-393: the runner of "docker/build-push-action@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: tests
  • GitHub Check: tests
  • GitHub Check: check
  • GitHub Check: tests
  • GitHub Check: tests
  • GitHub Check: check
  • GitHub Check: check
  • GitHub Check: tests

Comment on lines +369 to +404
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2
      - name: check repo and create it if not exist
        env:
          REPOSITORY: coordinator-proxy
        run: |
          aws --region ${{ env.AWS_REGION }} ecr describe-repositories --repository-names ${{ env.REPOSITORY }} && : || aws --region ${{ env.AWS_REGION }} ecr create-repository --repository-name ${{ env.REPOSITORY }}
      - name: Build and push
        uses: docker/build-push-action@v3
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          REPOSITORY: coordinator-proxy
          IMAGE_TAG: ${{ github.ref_name }}
        with:
          context: .
          file: ./build/dockerfiles/coordinator-proxy.Dockerfile
          push: true
          tags: |
            scrolltech/${{ env.REPOSITORY }}:${{ env.IMAGE_TAG }}
            ${{ env.ECR_REGISTRY }}/${{ env.REPOSITORY }}:${{ env.IMAGE_TAG }}

⚠️ Potential issue | 🟠 Major

Upgrade Docker GitHub actions to Node20-compatible releases.

docker/setup-qemu-action@v2, docker/setup-buildx-action@v2, docker/login-action@v2, and docker/build-push-action@v3 all run on the removed Node16 runtime, so this job will fail on GitHub-hosted runners after Node16 was retired on November 12, 2024. Bump them to the Node20-ready releases (@v3 for the first three and @v5 for build-push) before merging. (github.blog)

-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v2
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3
@@
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v2
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
@@
-      - name: Login to Docker Hub
-        uses: docker/login-action@v2
+      - name: Login to Docker Hub
+        uses: docker/login-action@v3
@@
-      - name: Build and push
-        uses: docker/build-push-action@v3
+      - name: Build and push
+        uses: docker/build-push-action@v5

🤖 Prompt for AI Agents
.github/workflows/docker.yml around lines 369 to 404: the workflow uses Docker
GitHub Actions pinned to Node16-era releases which will fail on GitHub-hosted
runners; update the action versions to Node20-compatible releases by changing
docker/setup-qemu-action@v2 -> @v3, docker/setup-buildx-action@v2 -> @v3,
docker/login-action@v2 -> @v3, and docker/build-push-action@v3 -> @v5 so the job
runs on the supported Node20 runtime.


var db *gorm.DB
if dbCfg := cfg.ProxyManager.DB; dbCfg != nil {
log.Info("Apply persistent storage", "via", cfg.ProxyManager.DB.DSN)

⚠️ Potential issue | 🟠 Major

Stop logging the raw DB DSN.

Including cfg.ProxyManager.DB.DSN in INFO logs will leak database credentials or connection tokens into stdout/log aggregation; that’s a compliance/security violation. Please drop the DSN from the log entry (or explicitly redact it) before merging.

-        log.Info("Apply persistent storage", "via", cfg.ProxyManager.DB.DSN)
+        log.Info("Apply persistent storage")
🤖 Prompt for AI Agents
In coordinator/cmd/proxy/app/app.go around line 55, the log entry currently
prints cfg.ProxyManager.DB.DSN which leaks credentials; remove the raw DSN from
the log (or replace it with a redacted value such as "<redacted>" or just omit
the field) and instead log a non-sensitive message like "Apply persistent
storage" or include only a safe indicator (e.g., db configured=true). Ensure no
plaintext DSN is written to stdout/log aggregation.
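
If the target of the DSN should still be visible in logs, a hypothetical redaction helper (URL-style DSNs only, e.g. postgres://user:pass@host/db; key=value DSNs would need separate handling) could look like:

import "net/url"

// redactDSN hides credentials in URL-style DSNs. Hypothetical helper,
// not part of the current code.
func redactDSN(dsn string) string {
	u, err := url.Parse(dsn)
	if err != nil || u.Host == "" {
		return "<redacted>"
	}
	return u.Redacted() // masks any password as "xxxxx"
}

with the call site becoming log.Info("Apply persistent storage", "via", redactDSN(cfg.ProxyManager.DB.DSN)).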

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
coordinator/internal/controller/proxy/client_manager.go (1)

40-56: Non-standard private key derivation approach.

This iterative append-and-hash approach was already flagged in previous reviews. Standard KDFs like HKDF or PBKDF2 should be used instead.

🧹 Nitpick comments (1)
coordinator/internal/controller/proxy/client_manager.go (1)

138-186: Consider simplifying the double-checked locking pattern.

The current implementation correctly handles concurrent access, but the combination of double-checked locking, completion context tracking, and goroutine coordination adds significant complexity. Consider whether sync.Once or a simpler state machine might reduce cognitive load.
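
A sketch of the sync.Once direction, with hypothetical names; note it does not reproduce the completion-context behaviour (a get that captured a pre-reset Once could still store a stale client), so it is only a starting point:

// clientHolder caches one upClient per "reset generation" using a swappable
// sync.Once instead of double-checked locking. Illustrative only.
type clientHolder struct {
	mu   sync.Mutex
	once *sync.Once
	cli  *upClient
}

func newClientHolder() *clientHolder {
	return &clientHolder{once: new(sync.Once)}
}

// get runs login at most once per reset generation and returns the cached client.
func (h *clientHolder) get(login func() *upClient) *upClient {
	h.mu.Lock()
	once := h.once
	h.mu.Unlock()

	once.Do(func() {
		c := login()
		h.mu.Lock()
		h.cli = c
		h.mu.Unlock()
	})

	h.mu.Lock()
	defer h.mu.Unlock()
	return h.cli
}

// reset discards the cached client so the next get triggers a fresh login.
func (h *clientHolder) reset() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.once = new(sync.Once)
	h.cli = nil
}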

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e9470ff and cf9e368.

📒 Files selected for processing (1)
  • coordinator/internal/controller/proxy/client_manager.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
coordinator/internal/controller/proxy/client_manager.go (4)
coordinator/internal/config/proxy_config.go (2)
  • ProxyClient (32-36)
  • UpStream (39-45)
coordinator/internal/types/auth.go (5)
  • LoginParameter (58-62)
  • PublicKey (16-16)
  • Message (42-49)
  • ProverName (18-18)
  • ProverVersion (20-20)
common/version/version.go (1)
  • Version (31-31)
coordinator/internal/types/prover.go (3)
  • ProverProviderType (59-59)
  • ProverProviderTypeProxy (82-82)
  • ProverType (20-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: check
  • GitHub Check: tests
  • GitHub Check: tests
  • GitHub Check: tests
  • GitHub Check: tests
  • GitHub Check: check
  • GitHub Check: tests
  • GitHub Check: check
🔇 Additional comments (4)
coordinator/internal/controller/proxy/client_manager.go (4)

58-72: LGTM on constructor logic.

The initialization and configuration handling is correct. The security concern with key derivation is covered separately.


125-132: LGTM!

The Reset implementation correctly uses locking and pointer comparison to safely clear the cached client.


134-136: LGTM!

Simple getter method is correctly implemented.


188-212: LGTM!

The login parameter generation is correctly implemented with appropriate metadata, public key compression, and signature generation.

Comment on lines +92 to +122
for {
log.Info("proxy attempting login to upstream coordinator", "name", cliMgr.name)
loginResp, err := loginCli.Login(ctx, cliMgr.genLoginParam)
if err == nil && loginResp.ErrCode == 0 {
var loginResult loginSchema
err = loginResp.DecodeData(&loginResult)
if err != nil {
log.Error("login parsing data fail", "error", err)
} else {
loginCli.loginToken = loginResult.Token
log.Info("login to upstream coordinator successful", "name", cliMgr.name, "time", loginResult.Time)
// TODO: we need to parse time if we start making use of it
return
}
} else if err != nil {
log.Error("login process fail", "error", err)
} else {
log.Error("login get fail resp", "code", loginResp.ErrCode, "msg", loginResp.ErrMsg)
}

log.Info("login to upstream coordinator failed, retrying", "name", cliMgr.name, "error", err, "waitDuration", waitDuration)

timer := time.NewTimer(waitDuration)
select {
case <-ctx.Done():
timer.Stop()
return
case <-timer.C:
// Continue to next retry
}
}

⚠️ Potential issue | 🟠 Major

Retry loop ignores configured RetryCount.

The method retries indefinitely without checking cfg.RetryCount, which exists in the UpStream config. This could lead to excessive retry attempts when the upstream is persistently unavailable.

Consider adding retry count enforcement:

 func (cliMgr *ClientManager) doLogin(ctx context.Context, loginCli *upClient) {
 	if cliMgr.cfg.CompatibileMode {
 		loginCli.loginToken = "dummy"
 		log.Info("Skip login process for compatible mode")
 		return
 	}
 
 	// Calculate wait time between 2 seconds and cfg.RetryWaitTime
 	minWait := 2 * time.Second
 	waitDuration := time.Duration(cliMgr.cfg.RetryWaitTime) * time.Second
 	if waitDuration < minWait {
 		waitDuration = minWait
 	}
 
+	retryCount := uint(0)
 	for {
+		if cliMgr.cfg.RetryCount > 0 && retryCount >= cliMgr.cfg.RetryCount {
+			log.Error("login retry limit exceeded", "name", cliMgr.name, "retries", retryCount)
+			return
+		}
+		retryCount++
+		
 		log.Info("proxy attempting login to upstream coordinator", "name", cliMgr.name)
 		loginResp, err := loginCli.Login(ctx, cliMgr.genLoginParam)
 		if err == nil && loginResp.ErrCode == 0 {
🤖 Prompt for AI Agents
In coordinator/internal/controller/proxy/client_manager.go around lines 92 to
122, the login retry loop currently retries forever and ignores the UpStream
cfg.RetryCount; fix this by adding a retry counter (or loop over attempts) that
increments on each failed attempt and checks against cfg.RetryCount before
sleeping/retrying, log a clear error when retry limit is reached and stop
retrying (return or cancel), also ensure ctx.Done() is still respected and any
timer is stopped when exiting so we don't leak timers.

completionCtx = cliMgr.cachedCli.completionCtx
} else {
// Set new completion context and launch login goroutine
ctx, completionDone := context.WithCancel(context.TODO())

⚠️ Potential issue | 🟡 Minor

Replace context.TODO() with context.Background().

context.TODO() signals that the appropriate context is unclear or not yet determined. Since this is for a background login process, context.Background() is more appropriate.

Apply this diff:

-		ctx, completionDone := context.WithCancel(context.TODO())
+		ctx, completionDone := context.WithCancel(context.Background())
🤖 Prompt for AI Agents
In coordinator/internal/controller/proxy/client_manager.go around line 158,
replace context.TODO() with context.Background() so the background login process
uses the proper background context; update the call creating ctx, completionDone
to use context.Background() instead of context.TODO().

case <-ctx.Done():
return nil
case <-completionCtx.Done():
cli := completionCtx.Value(loginCliKey).(*upClient)

⚠️ Potential issue | 🟡 Minor

Unchecked type assertion could panic.

If the context value is missing or has the wrong type, this will panic. While it should always be *upClient in normal flow, defensive programming suggests checking.

Apply this diff:

-		cli := completionCtx.Value(loginCliKey).(*upClient)
+		cli, ok := completionCtx.Value(loginCliKey).(*upClient)
+		if !ok {
+			log.Error("unexpected context value type", "name", cliMgr.name)
+			return nil
+		}
 		return cli
🤖 Prompt for AI Agents
In coordinator/internal/controller/proxy/client_manager.go around line 183, the
code does an unchecked type assertion cli :=
completionCtx.Value(loginCliKey).(*upClient) which can panic if the context
value is missing or of the wrong type; change it to retrieve the value, perform
a comma-ok type assertion (v, ok :=
completionCtx.Value(loginCliKey).(*upClient)), check ok and v != nil, and handle
the error path by returning an appropriate error (or logging and aborting the
operation) instead of allowing a panic so callers get a clear failure instead of
a runtime crash.
