Description
Google Batch: Report actual zone where tasks execute in trace records
New feature
The Google Batch executor currently reports the configured region (e.g., europe-west2) in trace records rather than the actual zone where tasks execute (e.g., europe-west2-a, europe-west2-b, europe-west2-c). This prevents accurate cost estimation for spot instances, as spot pricing varies by zone rather than region.
Current behavior:
- CloudMachineInfo.zone is populated with config.location (region-level setting)
- Trace records contain only the region:
cloudZone = "europe-west2"
Desired behavior:
- CloudMachineInfo.zone should contain the actual zone where Google Batch allocated the task
- Trace records should contain the specific zone:
cloudZone = "europe-west2-a"
Use case
Cost estimation for Google Batch workloads:
When running workflows with spot instances on Google Batch, accurate cost tracking requires knowing the specific zone where each task executed. Cloud pricing databases (like those used by Seqera Platform) store spot prices per zone (e.g., europe-west2-a, europe-west2-b, europe-west2-c) rather than per region.
Current limitation:
- Spot price lookup fails because trace records contain the region (europe-west2) but the price database is keyed by zone (europe-west2-a)
- Cost estimates cannot be calculated for Google Batch workflows
- Users lack visibility into actual resource costs
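To make the mismatch concrete, here is a minimal sketch of a zone-keyed price lookup. The table, the lookupSpotPrice helper, and the prices are hypothetical placeholders for illustration; they are not taken from Seqera Platform or from any real price list.

```groovy
// Hypothetical spot price table keyed by zone.
// The values are made-up placeholders, not real prices.
Map<String, BigDecimal> spotPrices = [
    'europe-west2-a': 0.021,
    'europe-west2-b': 0.019,
    'europe-west2-c': 0.023,
]

// Hypothetical helper: returns the price for the zone found in a trace record,
// or null when the key is not a zone present in the table.
BigDecimal lookupSpotPrice(Map<String, BigDecimal> prices, String cloudZone) {
    return prices[cloudZone]
}

// What the trace record contains today (region): no match
assert lookupSpotPrice(spotPrices, 'europe-west2') == null

// What it would contain with this feature (zone): a match
assert lookupSpotPrice(spotPrices, 'europe-west2-b') == 0.019
```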
Deployment scenarios:
- Seqera Platform integration for cost tracking and billing
- Custom monitoring solutions that track per-task resource costs
- Audit and compliance reporting requiring accurate zone information
- Multi-zone resource optimization analysis
Suggested implementation
1. Retrieve zone information from Google Batch API:
After task completion, query the Google Batch API to retrieve the actual zone where the task executed:
// In GoogleBatchTaskHandler.groovy
def getTaskStatus() {
    final job = client.getJob(jobId)
    final taskStatus = job.getStatus()
    // Extract actual zone from task allocation metadata
    final actualZone = extractZoneFromTaskStatus(taskStatus)
    return actualZone
}
2. Update CloudMachineInfo with actual zone:
Modify GoogleBatchTaskHandler to populate the zone field with the actual execution zone rather than the configured region:
// In GoogleBatchTaskHandler.groovy, line ~351
machineInfo = new CloudMachineInfo(
    type: machineType.type,
    zone: getActualExecutionZone(), // Instead of machineType.location
    priceModel: machineType.priceModel
)
3. Google Batch API reference:
The zone information should be available from:
- Job status metadata after task allocation
- Instance policy or allocation policy fields
- Task group status details
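Whichever of these fields ends up carrying the zone, the value may come back as a resource-path style string rather than a bare zone name. The sketch below assumes a form like zones/europe-west2-a, which is the form the Batch location policy uses for allowedLocations; whether the job status exposes the zone in this form has not been verified. Both helper names are hypothetical.

```groovy
import java.util.regex.Matcher

// Hypothetical helper: pulls the zone name out of a resource-path style string.
// Returns null when the string has no "zones/<name>" segment (e.g. a region path).
String zoneFromLocation(String location) {
    if( !location ) return null
    Matcher m = (location =~ /(?:^|\/)zones\/([a-z0-9-]+)/)
    return m.find() ? m.group(1) : null
}

// Hypothetical helper: derives the region from a zone name.
// GCP zone names are the region name plus a single-letter suffix such as "-a";
// anything that does not end that way is returned unchanged.
String regionFromZone(String zone) {
    if( !zone ) return zone
    return (zone ==~ /.+-[a-z]/) ? zone[0..-3] : zone
}

assert zoneFromLocation('zones/europe-west2-a') == 'europe-west2-a'
assert zoneFromLocation('regions/europe-west2') == null
assert regionFromZone('europe-west2-a') == 'europe-west2'
```

Keeping regionFromZone next to the parser means consumers that still need the region can derive it from the new zone value.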
Alternative approach:
If retrieving zone information adds too much API overhead, consider:
- Lazy retrieval: Only fetch zone when trace records are generated
- Cache zone information per job to minimize API calls
- Make it optional via configuration flag
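The lazy retrieval and per-job caching ideas could be combined roughly as follows. This is only a sketch: ZoneCache and the fetchZone closure are hypothetical names, and the closure stands in for whatever Batch API call ends up returning the zone.

```groovy
import java.util.concurrent.ConcurrentHashMap

// Hypothetical per-job zone cache: the fetch closure runs at most once per job ID
// for which it returns a non-null zone, and only when the zone is first requested.
class ZoneCache {
    private final Map<String, String> zones = new ConcurrentHashMap<>()
    private final Closure<String> fetchZone

    ZoneCache(Closure<String> fetchZone) {
        this.fetchZone = fetchZone
    }

    String zoneFor(String jobId) {
        // computeIfAbsent stores nothing when the closure returns null,
        // so a job whose zone is not yet known is simply retried next time
        return zones.computeIfAbsent(jobId) { String id -> fetchZone.call(id) }
    }
}

// Usage: count how often the (stubbed) API call is made
def apiCalls = 0
def cache = new ZoneCache({ String id -> apiCalls++; 'europe-west2-a' })

assert cache.zoneFor('job-1') == 'europe-west2-a'
assert cache.zoneFor('job-1') == 'europe-west2-a'
assert apiCalls == 1
```

Because the lookup happens only on first request, calling zoneFor from the trace record generation path would give lazy retrieval without extra API calls during polling.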
Related components:
- plugins/nf-google/src/main/nextflow/cloud/google/batch/GoogleBatchTaskHandler.groovy (lines 351-356, 656-658)
- plugins/nf-google/src/main/nextflow/cloud/google/batch/client/BatchClient.groovy
- modules/nextflow/src/main/groovy/nextflow/cloud/types/CloudMachineInfo.groovy
- modules/nextflow/src/main/groovy/nextflow/trace/TraceRecord.groovy
Backwards compatibility:
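This change should be backwards compatible, as it improves the accuracy of existing data without changing the field structure or API contracts. Only the value of cloudZone changes, from a region name (europe-west2) to a zone name (europe-west2-a).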