Add Jobs 2.0 decision guidance for populate modes #117
+18,938
−9,678
Summary
Adds comprehensive decision guidance for choosing between different populate() modes, addressing a critical gap for users deploying distributed pipelines.
Problem
From cohesion review (COHESION-REVIEW.md Issue #9, Medium Priority):
run-computations.md doesn't explain when reserve_jobs=True is necessary
distributed-computing.md jumps to advanced patterns without decision criteria
Solution
New "When to Use Distributed Mode" section in how-to/run-computations.md
Content Structure
Three Populate Modes with Clear Criteria:
1. populate() (Default - Simple Mode)
Use when:
✅ Single worker
✅ Fast computations (< 1 minute each)
✅ Small job count (< 100 entries)
✅ Development/testing
Advantages:
2. populate(reserve_jobs=True) (Distributed Mode)
Use when:
✅ Multiple workers (different machines/processes)
✅ Long computations (> 1 minute each)
✅ Production pipelines
✅ Worker crashes expected
Advantages:
Performance note:
3. populate(reserve_jobs=True, processes=N) (Parallel Mode)
Use when:
✅ Multi-core machine
✅ CPU-bound tasks
✅ Independent computations
Advantages:
Caution: Don't set processes=N higher than the machine's CPU core count.
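One way to respect the core-count caution is to clamp the requested process count before building the populate() arguments. The following is a minimal sketch, not part of the PR: the helper name capped_process_count and the table name MyTable are hypothetical, and only reserve_jobs and processes come from the guidance above.

```python
import os


def capped_process_count(requested, available=None):
    """Clamp a requested process count to the number of CPU cores.

    `available` can be passed explicitly (useful for testing); otherwise
    it is read from the operating system.
    """
    if available is None:
        # os.cpu_count() may return None when the count cannot be determined.
        available = os.cpu_count() or 1
    # Never return fewer than 1 process or more than the available cores.
    return max(1, min(requested, available))


# Keyword arguments for Parallel Mode, capped at 8 cores in this example.
populate_kwargs = {
    "reserve_jobs": True,
    "processes": capped_process_count(16, available=8),
}

# Assumed usage, with MyTable standing in for a real table class:
# MyTable.populate(**populate_kwargs)
```

Passing available explicitly keeps the helper deterministic on any machine; in normal use it can be omitted so the value comes from os.cpu_count().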
Decision Tree
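The criteria listed for the three modes can be approximated as a small decision function. This is a hedged sketch built only from the "Use when" lists in this description: the function name, its parameters, and the order of the checks are assumptions, and treating any single distributed-mode criterion as sufficient is also an assumption.

```python
def choose_populate_kwargs(
    workers=1,
    seconds_per_job=0.0,
    production=False,
    crash_prone=False,
    cpu_bound=False,
    independent=True,
    cores=1,
):
    """Return keyword arguments for populate() based on the listed criteria."""
    # Parallel Mode: multi-core machine, CPU-bound, independent computations.
    if cpu_bound and independent and cores > 1:
        return {"reserve_jobs": True, "processes": cores}
    # Distributed Mode: multiple workers, long computations (> 1 minute),
    # production pipelines, or expected worker crashes.
    if workers > 1 or seconds_per_job > 60 or production or crash_prone:
        return {"reserve_jobs": True}
    # Simple Mode: single worker, fast computations, development/testing.
    return {}


# Three illustrative scenarios.
dev_kwargs = choose_populate_kwargs()
cluster_kwargs = choose_populate_kwargs(workers=4, seconds_per_job=300)
multicore_kwargs = choose_populate_kwargs(cpu_bound=True, cores=8)
```

The returned dictionary would be unpacked into the call, for example SomeTable.populate(**cluster_kwargs), where SomeTable is a placeholder for a real table class.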
User Impact
Before (Confusion)
After (Clarity)
Performance Guidance
Key insights added:
Placement
Inserted before the "Distributed Computing" section in run-computations.md.
Rationale: Users need decision guidance before learning distributed computing patterns.
Related
Completes Medium-Priority Cohesion Review
This PR completes all medium-priority issues from the cohesion review:
datajoint/datajoint-elements repository #7: Reference specs index (PR #115: Enhance specs index with reading order and cross-references)