
Refactor existing logs and add logs#840

Open
hellolittlej wants to merge 2 commits into master from refactor-logs

Conversation


@hellolittlej hellolittlej commented Mar 29, 2026

Context

Not all scheduling constraints had enough workers available to fulfill the request
ResourceClusterActor.TaskExecutorBatchAssignmentRequest(allocationRequests=[TaskExecutorAllocationRequest(workerId=kafka-cluster-monitor-21-worker-18-168,
constraints=SchedulingConstraints(machineDefinition=MachineDefinition{cpuCores=2.0, memoryMB=14336.0, networkMbps=700.0, diskMB=65536.0, numPorts=1}, sizeName=Optional.empty,
 schedulingAttributes={jdk=17plus, jenkins_job=unknown, repo_name=corp/kafka-mantis-kafka-monitor}), jobMetadata=io.mantisrx.server.core.domain.JobMetadata@3240fd30,
stageNum=1, readyAt=-1, durationType=Perpetual)], clusterID=ClusterID(resourceID=mantisrc.kaasall),
reservation=Reservation(key=MantisResourceClusterReservationProto.ReservationKey(jobId=kafka-cluster-monitor-21, stageNumber=1),
schedulingConstraints=SchedulingConstraints(machineDefinition=MachineDefinition{cpuCores=2.0, memoryMB=14336.0, networkMbps=700.0, diskMB=65536.0, numPorts=1},
sizeName=Optional.empty, schedulingAttributes={jdk=17plus, jenkins_job=unknown, repo_name=corp/kafka-mantis-kafka-monitor}),
canonicalConstraintKey=md:2.0/14336.0/65536.0/700.0/1;size=~;attr=jdk=17plus,jenkins_job=unknown,repo_name=corp/kafka-mantis-kafka-monitor,, stageTargetSize=35,
priority=MantisResourceClusterReservationProto.ReservationPriority(type=REPLACE, tier=0, timestamp=1774746017312), createdAt=1774746017312))

Currently this log is far too long to read. It essentially just says we don't have enough workers to fulfill a request for a single worker, and the full MachineDefinition{cpuCores=2.0, memoryMB=14336.0, networkMbps=700.0, diskMB=65536.0, numPorts=1} dump doesn't need to be part of the details.

Besides, we have no logs explaining why we can't find a TaskExecutor (TE) for the worker even though the scheduler sees 2 idle TEs. This PR adds logs showing why a TE was not selected for a worker.
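A condensed warning could carry just the searchable identifiers. A minimal sketch, built from the fields visible in the pasted log; the helper method and class names are illustrative, not the actual Mantis code:

```java
// Sketch of a condensed "not enough workers" message built from the
// identifying fields in the pasted log. Names here are illustrative,
// not the actual Mantis types or API.
public class CondensedLogSketch {
    static String condensedWarning(String jobId, int stageNum,
                                   String clusterId, String constraintKey) {
        // Keep only what an operator needs to search by; drop the full
        // MachineDefinition/JobMetadata dump.
        return String.format(
            "Not enough workers for jobId=%s, stage=%d, cluster=%s, constraints=%s",
            jobId, stageNum, clusterId, constraintKey);
    }

    public static void main(String[] args) {
        System.out.println(condensedWarning(
            "kafka-cluster-monitor-21", 1, "mantisrc.kaasall",
            "md:2.0/14336.0/65536.0/700.0/1"));
    }
}
```

The canonicalConstraintKey already present in the reservation is a natural candidate for the constraints field, since it encodes the machine definition compactly.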

Checklist

  • ./gradlew build compiles code correctly
  • Added new tests where applicable
  • ./gradlew test passes all tests
  • Extended README or added javadocs where applicable


github-actions Bot commented Mar 29, 2026

Test Results

781 tests  ±0   770 ✅ ±0   10m 11s ⏱️ +3s
162 suites ±0    11 💤 ±0 
162 files   ±0     0 ❌ ±0 

Results for commit 859c95d. ± Comparison against base commit b9bc562.

♻️ This comment has been updated with latest results.


if (noResourcesAvailable) {
-    log.warn("Not all scheduling constraints had enough workers available to fulfill the request {}", request);
+    log.warn("Not all scheduling constraints had enough workers for jobId={}, cluster={}",
Collaborator


I think you still need the workerId + schedulingConstraint info.

Collaborator Author


The worker id and constraint info are already logged in findTaskExecutorsFor before we reach this log line.

I can include the constraints again in this log line. The worker id we can't output here, because it sits in nested array fields.
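If the workerIds did need to surface at this level despite being nested, one option is to flatten them first. A sketch, where AllocationRequest is a stand-in for the real request type:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: surfacing nested workerIds by mapping over the allocation
// requests. AllocationRequest here is a stand-in for the real type.
public class WorkerIdSketch {
    record AllocationRequest(String workerId) {}

    static String joinWorkerIds(List<AllocationRequest> requests) {
        return requests.stream()
                .map(AllocationRequest::workerId)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(joinWorkerIds(List.of(
            new AllocationRequest("worker-18-168"),
            new AllocationRequest("worker-18-169"))));
    }
}
```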

Collaborator


Maybe include the reservation instance too? That could make searching easier.
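A short reservation tag composed from the ReservationKey fields visible in the pasted log (jobId, stageNumber) could serve as that search handle. A sketch; the tag format is illustrative:

```java
// Sketch: a compact, grep-friendly reservation tag built from the
// ReservationKey fields (jobId, stageNumber) seen in the pasted log.
// The format is illustrative, not an existing Mantis convention.
public class ReservationTagSketch {
    static String tag(String jobId, int stageNumber) {
        return jobId + ":" + stageNumber;
    }

    public static void main(String[] args) {
        System.out.println(tag("kafka-cluster-monitor-21", 1));
    }
}
```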



3 participants