-
Notifications
You must be signed in to change notification settings - Fork 83
Add 2 workflows (long and short-reads) for host and contamination removal from microbiome data #991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 2 workflows (long and short-reads) for host and contamination removal from microbiome data #991
Conversation
|
@bebatut @paulzierep just remembered something: https://github.com/bede/deacon Have you thought about this as an apparently very performant alternative? |
...ost-contamination-removal-short-reads/host-or-contamination-removal-on-short-reads-tests.yml
Outdated
Show resolved
Hide resolved
@santino is working on this galaxyproject/tools-iuc#7438 |
…tion-removal-short-reads/host-or-contamination-removal-on-short-reads-tests.yml Co-authored-by: paulzierep <[email protected]>
...ost-contamination-removal-short-reads/host-or-contamination-removal-on-short-reads-tests.yml
Outdated
Show resolved
Hide resolved
…tion-removal-short-reads/host-or-contamination-removal-on-short-reads-tests.yml Co-authored-by: paulzierep <[email protected]>
|
Not sure why the last run was cancelled, the artifact said the run was successful. @bebatut can you make a small update to retrigger the CI ? |
...ost-contamination-removal/host-contamination-removal-long-reads/plnmotmptestjob4912th98.json
Outdated
Show resolved
Hide resolved
|
@paulzierep I can add you to the org, but can you please go through the reviewer checklist ? In particular
|
…tion-removal-long-reads/plnmotmptestjob4912th98.json
sure, thanks, will make sure to check it |
I might have stupid questions. Can that be done in the workflow editor or only in the |
...removal/host-contamination-removal-long-reads/host-or-contamination-removal-on-long-reads.ga
Show resolved
Hide resolved
|
Here's the full review: PR #991 Review: Host and Contamination Removal Workflows Summary This PR adds two new workflows for host and contamination removal from microbiome data:
Detailed Review Against Checklist ✅ PASS: .dockstore.yml files
Files checked:
✅ PASS: Workflow is sufficiently generic
✅ PASS: Annotation field Both workflows have appropriate annotation fields: Long-reads workflow (host-or-contamination-removal-on-long-reads.ga:3): Short-reads workflow (host-or-contamination-removal-on-short-reads.ga:3): Both follow the recommended pattern and clearly describe what the workflow does. Long-reads workflow has THREE workflow outputs with underscore naming:
Short-reads workflow has TWO workflow outputs with underscore naming:
Note: contamination_filtered_reads should also be updated to "Contamination Filtered Reads" or "Reads without Host or Contamination" The test files (*-tests.yml) must also be updated to match the new labels. ✅ PASS: Workflow input labels All input labels are human-readable:
No underscores or unnecessary abbreviations. Issue: "long-reads" vs "long-read" as compound adjective The folder name and workflow names use "long-reads" (plural), but when used as a compound adjective modifying another noun, it should be singular:
Same issue with short-reads workflow:
Explanation: When a technical term is used as a compound adjective (modifying another noun), it should be singular. Examples:
In this case, "long-reads" appears to be part of the workflow context, but it's technically modifying the type of workflow/process. For consistency Both workflow names have inconsistent capitalization:
Should be: "Host or Contamination Removal on Long-Read" (or equivalent) ✅ PASS: Workflow folder naming
Both READMEs are good and explain:
Minor issues:
Suggested addition to both READMEs: For long-reads README: When to use this workflowUse this workflow for long-read sequencing data (e.g., Nanopore, PacBio). For short-read Illumina data, see the Host or Contamination removal on For short-reads README: When to use this workflowUse this workflow for short-read paired-end Illumina sequencing data. For long-read data (Nanopore, PacBio), see the Host or Contamination removal Both CHANGELOG files have placeholder dates: [0.1] yyyy-mm-ddAction needed: Replace yyyy-mm-dd with the actual release date (typically the PR merge date). ✅ PASS: Large files
Overall Recommendation: REQUEST CHANGES Required Changes:
Recommended Changes (not blocking):
Files That Need Updates: Must fix:
Recommended to fix:
|
Address PR review feedback for galaxyproject#991: - Update workflow output labels to use human-readable names without underscores: * Long-reads: "qualimap_stats" → "QualiMap Statistics" * Long-reads: "samtools_fastx" → "Reads without Host or Contamination" * Long-reads: "multiqc_html_report" → "MultiQC HTML Report" * Short-reads: "bowtie2_mapping_statistics" → "Bowtie2 Mapping Statistics" * Short-reads: "contamination_filtered_reads" → "Contamination Filtered Reads" * Short-reads: "multiqc_html_report" → "MultiQC HTML Report" - Update corresponding labels in test files to match workflow outputs - Fix workflow name capitalization: * "removal" → "Removal" in both workflow names - Update CHANGELOG dates from "yyyy-mm-dd" to actual date (2025-12-03) - Improve README documentation: * Fix step numbering in long-reads README (was: 1,2,3,2; now: 1,2,3,4) * Add "When to use this workflow" sections to both READMEs * Cross-reference between long-reads and short-reads workflows 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
|
I've pushed a commit that should address the review comments, please have a close look at the changes. |
|
I do not understand the error: The test passes when I test on EU |
|
For the record this is the error: https://github.com/galaxyproject/iwc/actions/runs/19961703316/job/57243548336?pr=991#step:7:2130. Most likely running out of memory, there are 2 minimap 2 jobs each consuming 13GB of memory, changing the index for the test might be helpful. |
Test Results (powered by Planemo)Test Summary
Failed Tests
Passed Tests
|

FOR CONTRIBUTOR:
FOR REVIEWERS:
This workflow does/runs/performs … xyz … to generate/analyze/etc …namefield should be human readable (spaces are fine, no underscore, dash only where spelling dictates it), no abbreviation unless generally understood-) over underscore (_), prefer all lowercase. Folder becomes repository in iwc-workflows organization and is included in TRS id