Add 2 workflows (long and short-reads) for host and contamination removal from microbiome data #991

bebatut · 2025-10-09T15:02:50Z

FOR CONTRIBUTOR:

I have read the Adding workflows guidelines
License permits unrestricted use (educational + commercial)
Please also take note of the reviewer guidelines below to facilitate a smooth review process.

FOR REVIEWERS:

.dockstore.yml: file is present and aligned with creator metadata in workflow. ORCID identifiers are strongly encouraged in creator metadata. The .dockstore.yml file is required to run tests
Workflow is sufficiently generic to be used with lab data and does not hardcode sample names, reference data and can be run without reading an accompanying tutorial.
In workflow: annotation field contains short description of what the workflow does. Should start with This workflow does/runs/performs … xyz … to generate/analyze/etc …
In workflow: workflow inputs and outputs have human readable names (spaces are fine, no underscore, dash only where spelling dictates it), no abbreviation unless it is generally understood. Altering input or output labels requires adjusting these labels in the workflow-tests.yml file as well
In workflow: name field should be human readable (spaces are fine, no underscore, dash only where spelling dictates it), no abbreviation unless generally understood
Workflow folder: prefer dash (-) over underscore (_), prefer all lowercase. Folder becomes repository in iwc-workflows organization and is included in TRS id
Readme explains what workflow does, what are valid inputs and what outputs users can expect. If a tutorial or other resources exist they can be linked. If a similar workflow exists in IWC readme should explain differences with existing workflow and when one might prefer one workflow over another
Changelog contains appropriate entries
Large files (> 100 KB) are uploaded to zenodo and location urls are used in test file

wm75 · 2025-10-10T14:11:17Z

@bebatut @paulzierep just remembered something: https://github.com/bede/deacon

Have you thought about this as an apparently very performant alternative?

...ost-contamination-removal-short-reads/host-or-contamination-removal-on-short-reads-tests.yml

paulzierep · 2025-11-10T14:52:58Z

> @bebatut @paulzierep just remembered something: https://github.com/bede/deacon
>
> Have you thought about this as an apparently very performant alternative?

@santino is working on this in galaxyproject/tools-iuc#7438.
We will add it as an alternative to this workflow once it's ready. For now, it would be good to merge this first version when the actions pass.

…tion-removal-short-reads/host-or-contamination-removal-on-short-reads-tests.yml Co-authored-by: paulzierep <[email protected]>

paulzierep · 2025-12-03T13:59:44Z

Not sure why the last run was cancelled; the artifact said the run was successful. @bebatut, can you make a small update to retrigger the CI?

...ost-contamination-removal/host-contamination-removal-long-reads/plnmotmptestjob4912th98.json

mvdbeek · 2025-12-03T14:11:23Z

@paulzierep I can add you to the org, but can you please go through the reviewer checklist? In particular:

> workflow inputs and outputs have human readable names

…tion-removal-long-reads/plnmotmptestjob4912th98.json

paulzierep · 2025-12-03T14:15:03Z

> @paulzierep I can add you to the org, but can you please go through the reviewer checklist? In particular
>
> workflow inputs and outputs have human readable names

Sure, thanks, will make sure to check it.

bebatut · 2025-12-03T14:20:23Z

> workflow inputs and outputs have human readable names

I might have stupid questions. Can that be done in the workflow editor or only in the .ga? Because in the editor, I only see the Label. And what is the difference between the Label and Name for an input?

...removal/host-contamination-removal-long-reads/host-or-contamination-removal-on-long-reads.ga

mvdbeek · 2025-12-03T15:30:40Z

> Can that be done in the workflow editor

Yes, find the tool and output, then in the right side

mvdbeek · 2025-12-03T15:37:35Z

Here's the full review:

PR #991 Review: Host and Contamination Removal Workflows

Summary

This PR adds two new workflows for host and contamination removal from microbiome data:

Long-reads workflow (Nanopore): Uses Minimap2 for mapping
Short-reads workflow (Illumina): Uses Bowtie2 for mapping

Detailed Review Against Checklist

✅ PASS: .dockstore.yml files

✅ Both workflows have .dockstore.yml files present
✅ ORCID identifiers included for both authors (Paul Zierep: 0000-0003-2982-388X, Bérénice Batut: 0000-0001-9852-1987)
✅ Creator metadata in .dockstore.yml aligns with workflow creator metadata
✅ Test files are correctly referenced

Files checked:

workflows/microbiome/host-contamination-removal/host-contamination-removal-long-reads/.dockstore.yml:10-13
workflows/microbiome/host-contamination-removal/host-contamination-removal-short-reads/.dockstore.yml:10-13
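As a quick sanity check on the ORCID identifiers listed above, the check character can be verified with the ISO 7064 MOD 11-2 scheme that ORCID iDs use. This is only a sketch: the function name `orcid_checksum_ok` is made up for illustration and is not part of any IWC tooling.

```python
def orcid_checksum_ok(orcid: str) -> bool:
    """Return True if the last character matches the MOD 11-2 check character."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15].upper() == expected


# The two identifiers quoted in the review.
for orcid in ("0000-0003-2982-388X", "0000-0001-9852-1987"):
    print(orcid, orcid_checksum_ok(orcid))
```

A passing checksum only shows the identifier is well formed; it does not confirm that it belongs to the named author.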

✅ PASS: Workflow is sufficiently generic

✅ No hardcoded sample names
✅ Reference genome is a parameter input (not hardcoded)
✅ Workflows can be run with any lab data
✅ Does not require reading a tutorial to understand usage

✅ PASS: Annotation field

Both workflows have appropriate annotation fields:

Long-reads workflow (host-or-contamination-removal-on-long-reads.ga:3):
"This workflow takes Nanopore fastq(.gz) files and runs Minimap2 to map the reads against a reference genome (human, by default). It filters the output
to keep only the unmapped reads and generates mapping statistics that are aggregated into a MultiQC report."

Short-reads workflow (host-or-contamination-removal-on-short-reads.ga:3):
"This workflow takes paired-end Illumina fastq(.gz) files and runs Bowtie to map the reads against a reference genome (human, by default) and keep only
the reads that do not align. MultiQC is used to aggregate the mapping reports."

Both follow the recommended pattern and clearly describe what the workflow does.

⚠️ NEEDS ATTENTION: Workflow output labels contain underscores

Long-reads workflow has THREE workflow outputs with underscore naming:

"label": "qualimap_stats" (host-or-contamination-removal-on-long-reads.ga:227)
- Should be: "QualiMap Statistics" or "QualiMap Stats"
"label": "samtools_fastx" (host-or-contamination-removal-on-long-reads.ga:373)
- Should be: "Reads without host or contamination" or "Filtered Reads"
"label": "multiqc_html_report" (host-or-contamination-removal-on-long-reads.ga:440)
- Should be: "MultiQC HTML Report"

Short-reads workflow has TWO workflow outputs with underscore naming:

"label": "bowtie2_mapping_statistics" (host-or-contamination-removal-on-short-reads.ga:167)
- Should be: "Bowtie2 Mapping Statistics"
"label": "multiqc_html_report" (host-or-contamination-removal-on-short-reads.ga:285)
- Should be: "MultiQC HTML Report"

Note: contamination_filtered_reads should also be updated to "Contamination Filtered Reads" or "Reads without Host or Contamination"

The test files (*-tests.yml) must also be updated to match the new labels.
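A check like the one above could be scripted rather than done by eye. The sketch below assumes the usual `.ga` layout (a JSON document with a `steps` mapping whose entries carry a `workflow_outputs` list of objects with a `label` key); the helper name and the inline example data are hypothetical.

```python
import json


def underscore_output_labels(workflow: dict) -> list[tuple[str, str]]:
    """Collect (step id, label) pairs for workflow outputs whose label contains '_'."""
    found = []
    for step_id, step in workflow.get("steps", {}).items():
        for output in step.get("workflow_outputs", []):
            label = output.get("label") or ""
            if "_" in label:
                found.append((step_id, label))
    return found


# Hypothetical, heavily trimmed stand-in for a .ga file.
example = json.loads("""
{
  "steps": {
    "0": {"workflow_outputs": []},
    "4": {"workflow_outputs": [{"label": "qualimap_stats"}]},
    "7": {"workflow_outputs": [{"label": "MultiQC HTML Report"}]}
  }
}
""")

print(underscore_output_labels(example))
```

For a real file, `json.load(open(path))` would replace the inline example. Any label renamed this way also has to be renamed in the matching `*-tests.yml`.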

✅ PASS: Workflow input labels

All input labels are human-readable:

✅ "Long-reads" (long-reads workflow)
✅ "Short-reads" (short-reads workflow)
✅ "Host/Contaminant Reference Genome" (both)
✅ "Profile of preset options for the mapping (long-read)" (long-reads workflow)

No underscores or unnecessary abbreviations.

⚠️ NEEDS ATTENTION: Grammar issue in naming

Issue: "long-reads" vs "long-read" as compound adjective

The folder name and workflow names use "long-reads" (plural), but when used as a compound adjective modifying another noun, it should be singular:

Folder: host-contamination-removal-long-reads
- Should be: host-contamination-removal-long-read
Workflow name: "Host or Contamination removal on long-reads"
- Should be: "Host or Contamination removal on long-read" (if keeping hyphenated form)
- OR better: "Host or Contamination Removal for Long-Read Data"

Same issue with short-reads workflow:

Folder: host-contamination-removal-short-reads
- Should be: host-contamination-removal-short-read
Workflow name: "Host or Contamination removal on short-reads"
- Should be: "Host or Contamination removal on short-read"
- OR better: "Host or Contamination Removal for Short-Read Data"

Explanation: When a technical term is used as a compound adjective (modifying another noun), it should be singular. Examples:

✅ "short-read quality control" (short-read modifies "quality control")
❌ "short-reads quality control"
✅ "single-cell RNA-seq" (single-cell modifies "RNA-seq")
❌ "single-cells RNA-seq"

In this case, "long-reads" appears to be part of the workflow context, but it's technically modifying the type of workflow/process. For consistency
with IWC naming conventions, consider using the singular form.

⚠️ NEEDS ATTENTION: Workflow name capitalization

Both workflow names have inconsistent capitalization:

"Host or Contamination removal on long-reads" - "removal" should be capitalized
"Host or Contamination removal on short-reads" - "removal" should be capitalized

Should be: "Host or Contamination Removal on Long-Read" (or equivalent)

✅ PASS: Workflow folder naming

✅ Both folders use dashes (not underscores)
✅ All lowercase
ℹ️ See grammar note above regarding "long-reads" vs "long-read"

⚠️ NEEDS MINOR IMPROVEMENT: README

Both READMEs are good and explain:

✅ What the workflow does
✅ Valid inputs
✅ Expected outputs

Minor issues:

Numbering error in long-reads README (README.md:9):
- Step 2 appears twice (lines 7 and 9)
- The fourth step (line 9) is numbered 2; it should be renumbered as step 4
Missing comparison: Since there are TWO workflows in this PR that serve similar purposes (host removal), the READMEs should explain:
- When to use the long-reads workflow vs the short-reads workflow
- What are the key differences

For long-reads README:

When to use this workflow

Use this workflow for long-read sequencing data (e.g., Nanopore, PacBio). For short-read Illumina data, see the Host or Contamination removal on
short-reads workflow.

For short-reads README:

When to use this workflow

Use this workflow for short-read paired-end Illumina sequencing data. For long-read data (Nanopore, PacBio), see the Host or Contamination removal
on long-reads workflow.

⚠️ NEEDS ATTENTION: Changelog

Both CHANGELOG files have placeholder dates:

[0.1] yyyy-mm-dd

Action needed: Replace yyyy-mm-dd with the actual release date (typically the PR merge date).
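A placeholder date like this is easy to miss, so a small check can help. The sketch below assumes changelog entries look like `[0.1] yyyy-mm-dd`, optionally preceded by Markdown heading markers; the function name is illustrative only.

```python
import re

# Matches e.g. "## [0.1] yyyy-mm-dd" or "[0.1] - yyyy-mm-dd" on a line of its own.
PLACEHOLDER = re.compile(
    r"^(?:#+[ \t]*)?\[([^\]]+)\][ \t]*-?[ \t]*yyyy-mm-dd[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def versions_with_placeholder_date(changelog_text: str) -> list[str]:
    """Return the versions whose release date is still the template placeholder."""
    return PLACEHOLDER.findall(changelog_text)


before = "# Changelog\n\n## [0.1] yyyy-mm-dd\n\nFirst release.\n"
after = "# Changelog\n\n## [0.1] 2025-12-03\n\nFirst release.\n"
print(versions_with_placeholder_date(before))
print(versions_with_placeholder_date(after))
```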

✅ PASS: Large files

✅ Test files use Zenodo URLs for large test data (> 100 KB)
✅ Example: https://zenodo.org/record/12190648/files/...
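The size rule can be checked locally before opening a PR. This is a minimal sketch that treats "100 KB" as 100 * 1024 bytes, which is an assumption; the guideline does not state the exact byte count, and the helper name is hypothetical.

```python
import tempfile
from pathlib import Path

LIMIT_BYTES = 100 * 1024  # assumption: 100 KB interpreted as 100 * 1024 bytes


def oversized_files(folder: Path, limit: int = LIMIT_BYTES) -> list[Path]:
    """List files under folder that are larger than the limit."""
    return sorted(
        path for path in folder.rglob("*")
        if path.is_file() and path.stat().st_size > limit
    )


# Demo on a throwaway directory instead of a real workflow folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "small-test.fastq").write_bytes(b"A" * 10)
    (root / "large-test.fastq").write_bytes(b"A" * (LIMIT_BYTES + 1))
    print([path.name for path in oversized_files(root)])
```

Files reported by such a check would be candidates for upload to Zenodo, with only the URL kept in the test file.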

Overall Recommendation: REQUEST CHANGES

Required Changes:

Fix workflow output labels to use human-readable names without underscores:
- Update all workflow_outputs labels in both .ga files
- Update corresponding labels in test files (*-tests.yml)
Update CHANGELOG dates from yyyy-mm-dd to actual date

Recommended Changes (not blocking):

Fix grammar: Consider changing folder names and workflow names from "long-reads"/"short-reads" to "long-read"/"short-read" for grammatical
consistency
Fix workflow name capitalization: "removal" should be "Removal"
Add comparison section to READMEs: Explain when to use each workflow
Fix numbering error in long-reads README (step numbering)

Files That Need Updates:

Must fix:

workflows/microbiome/host-contamination-removal/host-contamination-removal-long-reads/host-or-contamination-removal-on-long-reads.ga (lines 227, 373,
440)
workflows/microbiome/host-contamination-removal/host-contamination-removal-long-reads/host-or-contamination-removal-on-long-reads-tests.yml (lines
18, 168, 174)
workflows/microbiome/host-contamination-removal/host-contamination-removal-short-reads/host-or-contamination-removal-on-short-reads.ga (lines 167,
218, 285)
workflows/microbiome/host-contamination-removal/host-contamination-removal-short-reads/host-or-contamination-removal-on-short-reads-tests.yml (lines
21, 27)
Both CHANGELOG.md files (line 3)

Recommended to fix:

Both README.md files (add comparison sections)
Long-reads README.md (fix step numbering)
Consider renaming folders and updating workflow names for grammar consistency

Address PR review feedback for galaxyproject#991:

- Update workflow output labels to use human-readable names without underscores:
  * Long-reads: "qualimap_stats" → "QualiMap Statistics"
  * Long-reads: "samtools_fastx" → "Reads without Host or Contamination"
  * Long-reads: "multiqc_html_report" → "MultiQC HTML Report"
  * Short-reads: "bowtie2_mapping_statistics" → "Bowtie2 Mapping Statistics"
  * Short-reads: "contamination_filtered_reads" → "Contamination Filtered Reads"
  * Short-reads: "multiqc_html_report" → "MultiQC HTML Report"
- Update corresponding labels in test files to match workflow outputs
- Fix workflow name capitalization:
  * "removal" → "Removal" in both workflow names
- Update CHANGELOG dates from "yyyy-mm-dd" to actual date (2025-12-03)
- Improve README documentation:
  * Fix step numbering in long-reads README (was: 1,2,3,2; now: 1,2,3,4)
  * Add "When to use this workflow" sections to both READMEs
  * Cross-reference between long-reads and short-reads workflows

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

mvdbeek · 2025-12-03T15:43:53Z

I've pushed a commit that should address the review comments, please have a close look at the changes.

bebatut · 2025-12-05T12:50:53Z

I do not understand the error:

Error reading tool from path: /home/runner/work/iwc/iwc/workflows/microbiome/host-contamination-removal/host-contamination-removal-long-reads/host-or-contamination-removal-on-long-reads.ga
Traceback (most recent call last):
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/tool_util/toolbox/base.py", line 950, in _load_tool_tag_set
    tool = self.load_tool(concrete_path, use_cached=False)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/tool_util/toolbox/base.py", line 1203, in load_tool
    tool = self.create_tool(
           ^^^^^^^^^^^^^^^^^
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/tools/__init__.py", line 587, in create_tool
    tool_source = self.get_expanded_tool_source(config_file)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/tools/__init__.py", line 600, in get_expanded_tool_source
    raise e
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/tools/__init__.py", line 592, in get_expanded_tool_source
    return get_tool_source(
           ^^^^^^^^^^^^^^^^
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/tool_util/parser/factory.py", line 106, in get_tool_source
    tree, macro_paths = load_tool_with_refereces(config_file)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/util/xml_macros.py", line 36, in load_with_references
    tree = raw_xml_tree(path)
           ^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/util/xml_macros.py", line 90, in raw_xml_tree
    tree = parse_xml(path, strip_whitespace=False, remove_comments=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/util/__init__.py", line 392, in parse_xml
    tree = cast(ElementTree, etree.parse(f, parser=parser))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/lxml/etree.pyx", line 3711, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 2052, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 2070, in lxml.etree._parseFilelikeDocument
  File "src/lxml/parser.pxi", line 1965, in lxml.etree._parseDocFromFilelike
  File "src/lxml/parser.pxi", line 1254, in lxml.etree._BaseParser._parseDocFromFilelike
  File "src/lxml/parser.pxi", line 647, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 765, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 689, in lxml.etree._raiseParseError
  File "/home/runner/work/iwc/iwc/workflows/microbiome/host-contamination-removal/host-contamination-removal-long-reads/host-or-contamination-removal-on-long-reads.ga", line 1
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1

The test passes when I test on EU
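One way to read the traceback: it comes from a tool loader handing the `.ga` file to an XML parser. A `.ga` file is JSON, so any XML parser rejects it at the first character, which matches "Start tag expected, '<' not found, line 1, column 1". The sketch below reproduces that class of failure with the standard library parser rather than lxml, on a made-up one-line workflow; it says nothing about whether this message is the actual cause of the failed run.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, minimal stand-in for the content of a .ga file.
ga_text = '{"a_galaxy_workflow": "true", "name": "example"}'


def parses_as_xml(text: str) -> bool:
    """Return True if text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(ga_text))        # False
print(json.loads(ga_text)["name"])   # example
```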

mvdbeek · 2025-12-05T14:08:34Z

For the record, this is the error: https://github.com/galaxyproject/iwc/actions/runs/19961703316/job/57243548336?pr=991#step:7:2130. Most likely the run is running out of memory: there are two Minimap2 jobs, each consuming 13 GB of memory. Changing the index for the test might help.

github-actions · 2025-12-05T15:57:34Z

Test Results (powered by Planemo)

Test Summary

Test State	Count
Total	2
Passed	1
Error	0
Failure	1
Skipped	0

Failed Tests

❌ host-or-contamination-removal-on-long-reads.ga_0

Problems:

Output with path /tmp/tmp684z2veb/QualiMap BamQC__bb9f461e-a9cd-4fe9-862f-a05c8fe52b2e.txt different than expected
Expected text '3,209,286,105 bp' in output ('BamQC report
-----------------------------------

>>>>>>> Input

     bam file = Spike3bBarcode10
     outfile = results/genome_results.txt


>>>>>>> Reference

     number of bases = 586,300,787 bp
     number of contigs = 17


>>>>>>> Globals

     number of windows = 416

     number of reads = 24,877
     number of mapped reads = 15 (0.06%)
     number of supplementary alignments = 15 (0.06%)
     number of secondary alignments = 23

     number of mapped bases = 6,267 bp
     number of sequenced bases = 5,873 bp
     number of aligned bases = 0 bp
     number of duplicated reads (estimated) = 9
     duplication rate = 19.05%


>>>>>>> Insert size

     mean insert size = 0
     std insert size = 0
     median insert size = 0


>>>>>>> Mapping quality

     mean mapping quality = 0.295


>>>>>>> ACTG content

     number of A's = 822 bp (14%)
     number of C's = 1,331 bp (22.66%)
     number of T's = 1,549 bp (26.37%)
     number of G's = 2,171 bp (36.97%)
     number of N's = 0 bp (0%)

     GC percentage = 59.63%


>>>>>>> Mismatches and indels

    general error rate = 0.3094
    number of mismatches = 0
    number of insertions = 217
    mapped reads with insertion percentage = 173.33%
    number of deletions = 108
    mapped reads with deletion percentage = 146.67%
    homopolymer indels = 23.08%


>>>>>>> Coverage

     mean coverageData = 0X
     std coverageData = 0.0067X

     There is a 0% of reference with a coverageData >= 1X
     There is a 0% of reference with a coverageData >= 2X
     There is a 0% of reference with a coverageData >= 3X
     There is a 0% of reference with a coverageData >= 4X
     There is a 0% of reference with a coverageData >= 5X
     There is a 0% of reference with a coverageData >= 6X
     There is a 0% of reference with a coverageData >= 7X
     There is a 0% of reference with a coverageData >= 8X
     There is a 0% of reference with a coverageData >= 9X
     There is a 0% of reference with a coverageData >= 10X
     There is a 0% of reference with a coverageData >= 11X
     There is a 0% of reference with a coverageData >= 12X
     There is a 0% of reference with a coverageData >= 13X
     There is a 0% of reference with a coverageData >= 14X
     There is a 0% of reference with a coverageData >= 15X
     There is a 0% of reference with a coverageData >= 16X
     There is a 0% of reference with a coverageData >= 17X
     There is a 0% of reference with a coverageData >= 18X
     There is a 0% of reference with a coverageData >= 19X
     There is a 0% of reference with a coverageData >= 20X
     There is a 0% of reference with a coverageData >= 21X
     There is a 0% of reference with a coverageData >= 22X
     There is a 0% of reference with a coverageData >= 23X
     There is a 0% of reference with a coverageData >= 24X
     There is a 0% of reference with a coverageData >= 25X
     There is a 0% of reference with a coverageData >= 26X
     There is a 0% of reference with a coverageData >= 27X
     There is a 0% of reference with a coverageData >= 28X
     There is a 0% of reference with a coverageData >= 29X
     There is a 0% of reference with a coverageData >= 30X
     There is a 0% of reference with a coverageData >= 31X
     There is a 0% of reference with a coverageData >= 32X
     There is a 0% of reference with a coverageData >= 33X
     There is a 0% of reference with a coverageData >= 34X
     There is a 0% of reference with a coverageData >= 35X
     There is a 0% of reference with a coverageData >= 36X
     There is a 0% of reference with a coverageData >= 37X
     There is a 0% of reference with a coverageData >= 38X
     There is a 0% of reference with a coverageData >= 39X
     There is a 0% of reference with a coverageData >= 40X
     There is a 0% of reference with a coverageData >= 41X
     There is a 0% of reference with a coverageData >= 42X
     There is a 0% of reference with a coverageData >= 43X
     There is a 0% of reference with a coverageData >= 44X
     There is a 0% of reference with a coverageData >= 45X
     There is a 0% of reference with a coverageData >= 46X
     There is a 0% of reference with a coverageData >= 47X
     There is a 0% of reference with a coverageData >= 48X
     There is a 0% of reference with a coverageData >= 49X
     There is a 0% of reference with a coverageData >= 50X
     There is a 0% of reference with a coverageData >= 51X


>>>>>>> Coverage per contig

	Group10	11440700	0	0.0	0.0
	Group11	12576330	247	1.9640069877301247E-5	0.0035106469234644855
	Group12	9182753	0	0.0	0.0
	Group13	8929068	124	1.3887227647947132E-5	0.0034938767695826276
	Group14	8318479	0	0.0	0.0
	Group15	7856270	0	0.0	0.0
	Group16	5631066	0	0.0	0.0
	Group1	25854376	118	4.564024287416567E-6	0.002127281278587259
	Group2	14465785	646	4.465709949373643E-5	0.006461575798427947
	Group3	12341916	0	0.0	0.0
	Group4	10796202	0	0.0	0.0
	Group5	13386189	2572	1.9213833003553138E-4	0.03819991449493488
	Group6	14581788	610	4.183300429275203E-5	0.011327358078360415
	Group7	9974240	0	0.0	0.0
	Group8	11452794	77	6.723250239199273E-6	0.00259291439062409
	Group9	10282195	0	0.0	0.0
	GroupUn	399230636	1873	4.6915237236452965E-6	0.0031478625233298647


')
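The failure above looks like a stale assertion: the test expects the text '3,209,286,105 bp', while the report lists 586,300,787 bp for the reference actually used in this run. A minimal sketch of that comparison, using an excerpt of the report above (the helper name is illustrative):

```python
import re


def reference_bases(report: str) -> int:
    """Extract the reference size in bases from a QualiMap BamQC text report."""
    match = re.search(r"number of bases = ([\d,]+) bp", report)
    if match is None:
        raise ValueError("no 'number of bases' line found")
    return int(match.group(1).replace(",", ""))


report_excerpt = """>>>>>>> Reference

     number of bases = 586,300,787 bp
     number of contigs = 17
"""

expected_in_test = 3_209_286_105  # value asserted by the current test file
observed = reference_bases(report_excerpt)
print(observed, observed == expected_in_test)
```

If the smaller reference is kept for the test, the expected text in the `-tests.yml` file would need to change to the observed value.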

Workflow invocation details

Invocation Messages

Steps

Step 1: Host/Contaminant Reference Genome (long-reads):
- step_state: scheduled
Step 2: Long-reads:
- step_state: scheduled
Step 3: Profile of preset options for the mapping (long-read):
- step_state: scheduled

Step 4: minimap2:

step_state: scheduled

Jobs

Job 1:

Job state is ok

Command Line:

ln -f -s '/cvmfs/data.galaxyproject.org/byhand/apiMel3/seq/apiMel3.fa' reference.fa && minimap2 -x map-pb    --q-occ-frac 0.01       -t ${GALAXY_SLOTS:-4} reference.fa '/tmp/tmp6lbm2yua/files/6/f/5/dataset_6f5c06e5-9cc8-462d-bad0-8e53d2483203.dat' -a | samtools view --no-PG -hT reference.fa | samtools sort -@${GALAXY_SLOTS:-2} -T "${TMPDIR:-.}" -O BAM -o '/tmp/tmp6lbm2yua/job_working_directory/000/3/outputs/dataset_6b5acde1-9bd9-4e82-9359-ae879a76b274.dat'

Exit Code:

```
0
```

Standard Error:

[M::mm_idx_gen::17.979*0.40] collected minimizers
[M::mm_idx_gen::19.955*0.46] sorted minimizers
[M::main::19.955*0.46] loaded/built the index for 17 target sequence(s)
[M::mm_mapopt_update::20.249*0.47] mid_occ = 138
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 17
[M::mm_idx_stat::20.446*0.47] distinct minimizers: 17801079 (80.12% are singletons); average occurrences: 1.615; average spacing: 20.392; total length: 586300787
[M::worker_pipeline::22.293*0.51] mapped 24877 sequences
[M::main] Version: 2.28-r1209
[M::main] CMD: minimap2 -x map-pb --q-occ-frac 0.01 -t 1 -a reference.fa /tmp/tmp6lbm2yua/files/6/f/5/dataset_6f5c06e5-9cc8-462d-bad0-8e53d2483203.dat
[M::main] Real time: 22.304 sec; CPU: 11.358 sec; Peak RSS: 2.067 GB

Traceback:

Job Parameters:

Job parameter	Parameter value
__input_ext	`"input"`
__workflow_invocation_uuid__	`"cbd54a0ed1f111f0ae2b7ced8d8023f5"`
alignment_options	`{"A": null, "B": null, "E": null, "E2": null, "O": null, "O2": null, "no_end_flt": true, "s": null, "splicing": {"__current_case__": 0, "splice_mode": "preset"}, "z": null, "z2": null}`
chromInfo	`"/tmp/tmp6lbm2yua/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"`
dbkey	`"?"`
fastq_input	`{"__current_case__": 0, "analysis_type_selector": "map-pb", "fastq_input1": {"values": [{"id": 1, "src": "dce"}]}, "fastq_input_selector": "single"}`
indexing_options	`{"H": false, "I": null, "k": null, "w": null}`
io_options	`{"K": null, "L": false, "Q": false, "Y": false, "c": false, "cs": null, "eqx": false, "output_format": "BAM"}`
mapping_options	`{"F": null, "N": null, "X": false, "f": null, "g": null, "kmer_ocurrence_interval": {"__current_case__": 1, "interval": ""}, "m": null, "mask_len": null, "max_chain_iter": null, "max_chain_skip": null, "min_occ_floor": null, "n": null, "p": null, "q_occ_frac": "0.01", "r": null}`
reference_source	`{"__current_case__": 0, "ref_file": "apiMel3", "reference_source_selector": "cached"}`

Job 2:

Job state is ok

Command Line:

ln -f -s '/cvmfs/data.galaxyproject.org/byhand/apiMel3/seq/apiMel3.fa' reference.fa && minimap2 -x map-pb    --q-occ-frac 0.01       -t ${GALAXY_SLOTS:-4} reference.fa '/tmp/tmp6lbm2yua/files/a/e/c/dataset_aec527c0-5b27-481d-94c0-51d67018f616.dat' -a | samtools view --no-PG -hT reference.fa | samtools sort -@${GALAXY_SLOTS:-2} -T "${TMPDIR:-.}" -O BAM -o '/tmp/tmp6lbm2yua/job_working_directory/000/4/outputs/dataset_5cbf10ca-a189-41bf-a2bf-bd8862624762.dat'

Exit Code:

```
0
```

Standard Error:

[M::mm_idx_gen::18.024*0.40] collected minimizers
[M::mm_idx_gen::19.995*0.46] sorted minimizers
[M::main::19.996*0.46] loaded/built the index for 17 target sequence(s)
[M::mm_mapopt_update::20.284*0.47] mid_occ = 138
[M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 17
[M::mm_idx_stat::20.486*0.47] distinct minimizers: 17801079 (80.12% are singletons); average occurrences: 1.615; average spacing: 20.392; total length: 586300787
[M::worker_pipeline::22.162*0.51] mapped 16951 sequences
[M::main] Version: 2.28-r1209
[M::main] CMD: minimap2 -x map-pb --q-occ-frac 0.01 -t 1 -a reference.fa /tmp/tmp6lbm2yua/files/a/e/c/dataset_aec527c0-5b27-481d-94c0-51d67018f616.dat
[M::main] Real time: 22.178 sec; CPU: 11.258 sec; Peak RSS: 2.067 GB

Traceback:

Job Parameters:

Job parameter	Parameter value
__input_ext	`"input"`
__workflow_invocation_uuid__	`"cbd54a0ed1f111f0ae2b7ced8d8023f5"`
alignment_options	`{"A": null, "B": null, "E": null, "E2": null, "O": null, "O2": null, "no_end_flt": true, "s": null, "splicing": {"__current_case__": 0, "splice_mode": "preset"}, "z": null, "z2": null}`
chromInfo	`"/tmp/tmp6lbm2yua/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"`
dbkey	`"?"`
fastq_input	`{"__current_case__": 0, "analysis_type_selector": "map-pb", "fastq_input1": {"values": [{"id": 2, "src": "dce"}]}, "fastq_input_selector": "single"}`
indexing_options	`{"H": false, "I": null, "k": null, "w": null}`
io_options	`{"K": null, "L": false, "Q": false, "Y": false, "c": false, "cs": null, "eqx": false, "output_format": "BAM"}`
mapping_options	`{"F": null, "N": null, "X": false, "f": null, "g": null, "kmer_ocurrence_interval": {"__current_case__": 1, "interval": ""}, "m": null, "mask_len": null, "max_chain_iter": null, "max_chain_skip": null, "min_occ_floor": null, "n": null, "p": null, "q_occ_frac": "0.01", "r": null}`
reference_source	`{"__current_case__": 0, "ref_file": "apiMel3", "reference_source_selector": "cached"}`
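Since memory was the suspected problem, the peak memory per Minimap2 job can be read from its standard error. The sketch below parses the `Peak RSS` figure from a log line copied from the jobs above; the helper name is hypothetical.

```python
import re
from typing import Optional


def peak_rss_gb(stderr: str) -> Optional[float]:
    """Return the peak resident set size in GB reported by minimap2, if present."""
    match = re.search(r"Peak RSS: ([\d.]+) GB", stderr)
    return float(match.group(1)) if match else None


log_line = "[M::main] Real time: 22.304 sec; CPU: 11.358 sec; Peak RSS: 2.067 GB"
print(peak_rss_gb(log_line))
```

With the apiMel3 index both jobs report about 2 GB, well below the 13 GB per job mentioned earlier in the thread.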

Step 5: QualiMap:

step_state: scheduled

Jobs

Job 1:

Job state is ok

Command Line:

export JAVA_OPTS="-Djava.awt.headless=true -Xmx${GALAXY_MEMORY_MB:-1024}m" &&    ln -s '/tmp/tmp6lbm2yua/files/6/b/5/dataset_6b5acde1-9bd9-4e82-9359-ae879a76b274.dat' 'Spike3bBarcode10' &&  qualimap bamqc -bam 'Spike3bBarcode10' -outdir results -outformat html --collect-overlap-pairs -nw 400 --paint-chromosome-limits -hm 3  --skip-duplicated --skip-dup-mode 0 -nt ${GALAXY_SLOTS:-1} &&   sed 's|images_qualimapReport/||g;s|css/||g' results/qualimapReport.html > '/tmp/tmp6lbm2yua/job_working_directory/000/5/outputs/dataset_21d60356-05fe-44ec-b85b-681b72e24b09.dat' && mkdir '/tmp/tmp6lbm2yua/job_working_directory/000/5/outputs/dataset_21d60356-05fe-44ec-b85b-681b72e24b09_files' && mv results/css/*.css '/tmp/tmp6lbm2yua/job_working_directory/000/5/outputs/dataset_21d60356-05fe-44ec-b85b-681b72e24b09_files' && mv results/css/*.png '/tmp/tmp6lbm2yua/job_working_directory/000/5/outputs/dataset_21d60356-05fe-44ec-b85b-681b72e24b09_files' && if [ -d results/images_qualimapReport ]; then mv results/images_qualimapReport/* '/tmp/tmp6lbm2yua/job_working_directory/000/5/outputs/dataset_21d60356-05fe-44ec-b85b-681b72e24b09_files' && for file in $(ls -A results/raw_data_qualimapReport); do mv "results/raw_data_qualimapReport/$file" `echo "results/$file" | sed 's/(//;s/)//'`; done fi && mv results/genome_results.txt results/summary_report.txt

Exit Code:

```
0
```

Standard Output:

Java memory size is set to 1200M
Launching application...

detected environment java options -Djava.awt.headless=true -Xmx1024m
QualiMap v.2.3
Built on 2023-05-19 16:57

Selected tool: bamqc
Available memory (Mb): 253
Max memory (Mb): 1037
Starting bam qc....
Loading sam header...
Loading locator...
Loading reference...
Only flagged duplicate alignments will be skipped...
Number of windows: 400, effective number of windows: 416
Chunk of reads size: 1000
Number of threads: 1
Processed 50 out of 416 windows...
Processed 100 out of 416 windows...
Processed 150 out of 416 windows...
Processed 200 out of 416 windows...
Processed 250 out of 416 windows...
Processed 300 out of 416 windows...
Processed 350 out of 416 windows...
Processed 400 out of 416 windows...
Total processed windows:416
Number of reads: 24877
Number of valid reads: 30
Number of correct strand reads:0

Inside of regions...
Num mapped reads: 15
Num mapped first of pair: 0
Num mapped second of pair: 0
Num singletons: 0
Time taken to analyze reads: 4
Computing descriptors...
numberOfMappedBases: 6267
referenceSize: 586300787
numberOfSequencedBases: 5873
numberOfAs: 822
Computing per chromosome statistics...
Computing histograms...
Overall analysis time: 5
end of bam qc
Computing report...
Writing HTML report...
HTML report created successfully

Finished

Traceback:

Job Parameters:

Job parameter	Parameter value
__input_ext	`"input"`
__workflow_invocation_uuid__	`"cbd54a0ed1f111f0ae2b7ced8d8023f5"`
chromInfo	`"/cvmfs/data.galaxyproject.org/managed/len/ucsc/apiMel3.len"`
dbkey	`"apiMel3"`
duplicate_skipping	`"0"`
per_base_coverage	`false`
plot_specific	`{"genome_gc_distr": null, "homopolymer_size": "3", "n_bins": "400", "paint_chromosome_limits": true}`
stats_regions	`{"__current_case__": 0, "region_select": "all"}`

Job 2:

Job state is ok

Command Line:

export JAVA_OPTS="-Djava.awt.headless=true -Xmx${GALAXY_MEMORY_MB:-1024}m" &&    ln -s '/tmp/tmp6lbm2yua/files/5/c/b/dataset_5cbf10ca-a189-41bf-a2bf-bd8862624762.dat' 'Spike3bBarcode12' &&  qualimap bamqc -bam 'Spike3bBarcode12' -outdir results -outformat html --collect-overlap-pairs -nw 400 --paint-chromosome-limits -hm 3  --skip-duplicated --skip-dup-mode 0 -nt ${GALAXY_SLOTS:-1} &&   sed 's|images_qualimapReport/||g;s|css/||g' results/qualimapReport.html > '/tmp/tmp6lbm2yua/job_working_directory/000/6/outputs/dataset_ebaeac9e-4746-41da-9145-f966441d0619.dat' && mkdir '/tmp/tmp6lbm2yua/job_working_directory/000/6/outputs/dataset_ebaeac9e-4746-41da-9145-f966441d0619_files' && mv results/css/*.css '/tmp/tmp6lbm2yua/job_working_directory/000/6/outputs/dataset_ebaeac9e-4746-41da-9145-f966441d0619_files' && mv results/css/*.png '/tmp/tmp6lbm2yua/job_working_directory/000/6/outputs/dataset_ebaeac9e-4746-41da-9145-f966441d0619_files' && if [ -d results/images_qualimapReport ]; then mv results/images_qualimapReport/* '/tmp/tmp6lbm2yua/job_working_directory/000/6/outputs/dataset_ebaeac9e-4746-41da-9145-f966441d0619_files' && for file in $(ls -A results/raw_data_qualimapReport); do mv "results/raw_data_qualimapReport/$file" `echo "results/$file" | sed 's/(//;s/)//'`; done fi && mv results/genome_results.txt results/summary_report.txt

Exit Code:

```
0
```

Standard Output:

Java memory size is set to 1200M
Launching application...

detected environment java options -Djava.awt.headless=true -Xmx1024m
QualiMap v.2.3
Built on 2023-05-19 16:57

Selected tool: bamqc
Available memory (Mb): 253
Max memory (Mb): 1037
Starting bam qc....
Loading sam header...
Loading locator...
Loading reference...
Only flagged duplicate alignments will be skipped...
Number of windows: 400, effective number of windows: 416
Chunk of reads size: 1000
Number of threads: 1
Processed 50 out of 416 windows...
Processed 100 out of 416 windows...
Processed 150 out of 416 windows...
Processed 200 out of 416 windows...
Processed 250 out of 416 windows...
Processed 300 out of 416 windows...
Processed 350 out of 416 windows...
Processed 400 out of 416 windows...
Total processed windows:416
Number of reads: 16951
Number of valid reads: 10
Number of correct strand reads:0

Inside of regions...
Num mapped reads: 10
Num mapped first of pair: 0
Num mapped second of pair: 0
Num singletons: 0
Time taken to analyze reads: 4
Computing descriptors...
numberOfMappedBases: 2240
referenceSize: 586300787
numberOfSequencedBases: 2165
numberOfAs: 635
Computing per chromosome statistics...
Computing histograms...
Overall analysis time: 4
end of bam qc
Computing report...
Writing HTML report...
HTML report created successfully

Finished

Traceback:

Job Parameters:

Job parameter	Parameter value
__input_ext	`"input"`
__workflow_invocation_uuid__	`"cbd54a0ed1f111f0ae2b7ced8d8023f5"`
chromInfo	`"/cvmfs/data.galaxyproject.org/managed/len/ucsc/apiMel3.len"`
dbkey	`"apiMel3"`
duplicate_skipping	`"0"`
per_base_coverage	`false`
plot_specific	`{"genome_gc_distr": null, "homopolymer_size": "3", "n_bins": "400", "paint_chromosome_limits": true}`
stats_regions	`{"__current_case__": 0, "region_select": "all"}`

Step 6: Split BAM:

step_state: scheduled

Jobs

Job 1:

Job state is ok

Command Line:

ln -s '/tmp/tmp6lbm2yua/files/6/b/5/dataset_6b5acde1-9bd9-4e82-9359-ae879a76b274.dat' 'localbam.bam' && ln -s '/tmp/tmp6lbm2yua/files/_metadata_files/1/a/1/metadata_1a1759c2-d97f-41b5-8460-604d89109b6b.dat' 'localbam.bam.bai' && bamtools split -mapped -in localbam.bam -stub split_bam

Exit Code:

```
0
```

Traceback:

Job Parameters:

Job parameter	Parameter value
__input_ext	`"bam"`
__workflow_invocation_uuid__	`"cbd54a0ed1f111f0ae2b7ced8d8023f5"`
chromInfo	`"/cvmfs/data.galaxyproject.org/managed/len/ucsc/apiMel3.len"`
dbkey	`"apiMel3"`

Job 2:

Job state is ok

Command Line:

ln -s '/tmp/tmp6lbm2yua/files/5/c/b/dataset_5cbf10ca-a189-41bf-a2bf-bd8862624762.dat' 'localbam.bam' && ln -s '/tmp/tmp6lbm2yua/files/_metadata_files/b/d/2/metadata_bd2699ec-24f6-4aeb-8bb5-5f329386c8d0.dat' 'localbam.bam.bai' && bamtools split -mapped -in localbam.bam -stub split_bam

Exit Code:

```
0
```

Traceback:

Job Parameters:

Job parameter	Parameter value
__input_ext	`"bam"`
__workflow_invocation_uuid__	`"cbd54a0ed1f111f0ae2b7ced8d8023f5"`
chromInfo	`"/cvmfs/data.galaxyproject.org/managed/len/ucsc/apiMel3.len"`
dbkey	`"apiMel3"`

Step 7: Flatten collection:
- step_state: scheduled
- Jobs
  - Job 1:
    - Job state is ok
    Traceback:
    Job Parameters:
    - Job parameter Parameter value
      
      __workflow_invocation_uuid__ "cbd54a0ed1f111f0ae2b7ced8d8023f5"
      
      input {"values": [{"id": 3, "src": "hdca"}]}
      
      join_identifier "_"

Step 8: toolshed.g2.bx.psu.edu/repos/iuc/samtools_fastx/samtools_fastx/1.21+galaxy0:

step_state: scheduled

Jobs

Job 1:

Job state is ok

Command Line:

addthreads=${GALAXY_SLOTS:-1} && (( addthreads-- )) &&  samtools sort -@ $addthreads -n '/tmp/tmp6lbm2yua/files/9/4/2/dataset_942a870c-8ec0-4806-b233-981750cae363.dat' -T "${TMPDIR:-.}" > input &&   samtools fastq       -f 0   -F 2304   -G 0  input  > output.fastqsanger && ln -s output.fastqsanger output

Exit Code:

```
0
```

Standard Error:

[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 24862 reads

Traceback:

Job Parameters:

Job parameter	Parameter value
__input_ext	`"input"`
__workflow_invocation_uuid__	`"cbd54a0ed1f111f0ae2b7ced8d8023f5"`
chromInfo	`"/cvmfs/data.galaxyproject.org/managed/len/ucsc/apiMel3.len"`
copy_arb_tags	`None`
copy_tags	`false`
dbkey	`"apiMel3"`
exclusive_filter	`["256", "2048"]`
exclusive_filter_all	`None`
idxout_cond	`{"__current_case__": 0, "idxout_select": "no"}`
inclusive_filter	`None`
output_fmt_cond	`{"__current_case__": 0, "default_quality": null, "ilumina_casava": false, "output_fmt_select": "fastqsanger", "output_quality": false}`
outputs	`"other"`
read_numbering	`""`

Job 2:

Job state is ok

Command Line:

addthreads=${GALAXY_SLOTS:-1} && (( addthreads-- )) &&  samtools sort -@ $addthreads -n '/tmp/tmp6lbm2yua/files/1/9/b/dataset_19b6b2ee-9ee3-43d7-8814-fec586098884.dat' -T "${TMPDIR:-.}" > input &&   samtools fastq       -f 0   -F 2304   -G 0  input  > output.fastqsanger && ln -s output.fastqsanger output

Exit Code:

```
0
```

Standard Error:

[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 16941 reads

Traceback:

Job Parameters:

Job parameter	Parameter value
__input_ext	`"input"`
__workflow_invocation_uuid__	`"cbd54a0ed1f111f0ae2b7ced8d8023f5"`
chromInfo	`"/cvmfs/data.galaxyproject.org/managed/len/ucsc/apiMel3.len"`
copy_arb_tags	`None`
copy_tags	`false`
dbkey	`"apiMel3"`
exclusive_filter	`["256", "2048"]`
exclusive_filter_all	`None`
idxout_cond	`{"__current_case__": 0, "idxout_select": "no"}`
inclusive_filter	`None`
output_fmt_cond	`{"__current_case__": 0, "default_quality": null, "ilumina_casava": false, "output_fmt_select": "fastqsanger", "output_quality": false}`
outputs	`"other"`
read_numbering	`""`
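
The `-F 2304` in the two `samtools fastq` command lines above lines up with the `exclusive_filter` values `"256"` and `"2048"` in the job parameters. A minimal sketch of that arithmetic, assuming the standard SAM flag meanings (256 = secondary alignment, 2048 = supplementary alignment); the variable names are illustrative only and are not part of the workflow:

```shell
# SAM flag bits listed in exclusive_filter (names here are hypothetical).
secondary=256        # 0x100: secondary alignment
supplementary=2048   # 0x800: supplementary alignment

# Bitwise OR of the two bits gives the combined mask used with -F.
mask=$(( secondary | supplementary ))
echo "$mask"
```

Reads with either bit set are excluded, so each read should be written to the FASTQ output once, from its primary record.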

Step 9: MultiQC:

step_state: scheduled

Jobs

Job 1:

Job state is ok

Command Line:

die() { echo "$@" 1>&2 ; exit 1; } &&  mkdir multiqc_WDir &&   mkdir multiqc_WDir/qualimap_0 &&  sample="$(grep 'bam file = ' /tmp/tmp6lbm2yua/files/b/b/9/dataset_bb9f461e-a9cd-4fe9-862f-a05c8fe52b2e.dat | sed 's/bam file = //g' | sed 's: ::g')" && dir_name="multiqc_WDir/qualimap_0/${sample}" && mkdir -p ${dir_name} && filepath_1="${dir_name}/genome_results.txt" && ln -sf '/tmp/tmp6lbm2yua/files/b/b/9/dataset_bb9f461e-a9cd-4fe9-862f-a05c8fe52b2e.dat' ${filepath_1} && nested_dir_name="${dir_name}/raw_data_qualimapReport/" && mkdir -p ${nested_dir_name} && filepath_2="${nested_dir_name}/coverage_histogram.txt" && ln -sf '/tmp/tmp6lbm2yua/files/c/2/0/dataset_c20ae67a-7924-4a77-84af-3e135378cd00.dat' ${filepath_2} && nested_dir_name="${dir_name}/raw_data_qualimapReport/" && mkdir -p ${nested_dir_name} && filepath_3="${nested_dir_name}/mapped_reads_gc-content_distribution.txt" && ln -sf '/tmp/tmp6lbm2yua/files/d/e/6/dataset_de65f4dc-20e4-40d4-b06f-c87c8b03c094.dat' ${filepath_3} && sample="$(grep 'bam file = ' /tmp/tmp6lbm2yua/files/6/3/c/dataset_63cc325a-03f9-4ca3-abbb-4680a263aaf8.dat | sed 's/bam file = //g' | sed 's: ::g')" && dir_name="multiqc_WDir/qualimap_0/${sample}" && mkdir -p ${dir_name} && filepath_1="${dir_name}/genome_results.txt" && ln -sf '/tmp/tmp6lbm2yua/files/6/3/c/dataset_63cc325a-03f9-4ca3-abbb-4680a263aaf8.dat' ${filepath_1} && nested_dir_name="${dir_name}/raw_data_qualimapReport/" && mkdir -p ${nested_dir_name} && filepath_2="${nested_dir_name}/coverage_histogram.txt" && ln -sf '/tmp/tmp6lbm2yua/files/4/4/a/dataset_44a4239e-aaed-42c5-b3d1-5483008c5bd5.dat' ${filepath_2} && nested_dir_name="${dir_name}/raw_data_qualimapReport/" && mkdir -p ${nested_dir_name} && filepath_3="${nested_dir_name}/mapped_reads_gc-content_distribution.txt" && ln -sf '/tmp/tmp6lbm2yua/files/9/3/f/dataset_93f22b45-b5ec-4ff8-8414-0b0778fc979e.dat' ${filepath_3} &&    multiqc multiqc_WDir --filename 'report'  --title 'HostContamination Removal'      && mkdir -p ./plots && ls -l ./report_data/ && cp ./report_data/*plot*.txt ./plots/ | true

Exit Code:

```
0
```

Standard Error:

/// MultiQC 🔍 v1.27

     update_config | Report title: HostContamination Removal
     version_check | MultiQC Version v1.32 now available!
       file_search | Search path: /tmp/tmp6lbm2yua/job_working_directory/000/12/working/multiqc_WDir

          qualimap | Found 2 BamQC reports

     write_results | Data        : report_data
     write_results | Report      : report.html
           multiqc | MultiQC complete
cp: cannot stat './report_data/*plot*.txt': No such file or directory

Standard Output:

total 188
-rw-r--r-- 1 1001 1001    531 Dec  5 15:50 multiqc.log
-rw-r--r-- 1 1001 1001    186 Dec  5 15:50 multiqc_citations.txt
-rw-r--r-- 1 1001 1001 158476 Dec  5 15:50 multiqc_data.json
-rw-r--r-- 1 1001 1001    704 Dec  5 15:50 multiqc_general_stats.txt
-rw-r--r-- 1 1001 1001    422 Dec  5 15:50 multiqc_qualimap_bamqc_genome_results.txt
-rw-r--r-- 1 1001 1001   1136 Dec  5 15:50 multiqc_sources.txt
-rw-r--r-- 1 1001 1001    246 Dec  5 15:50 qualimap_coverage_histogram.txt
-rw-r--r-- 1 1001 1001   2552 Dec  5 15:50 qualimap_gc_content.txt
-rw-r--r-- 1 1001 1001    759 Dec  5 15:50 qualimap_genome_fraction.txt

Traceback:

Job Parameters:

Job parameter	Parameter value
__input_ext	`"input"`
__workflow_invocation_uuid__	`"cbd54a0ed1f111f0ae2b7ced8d8023f5"`
chromInfo	`"/cvmfs/data.galaxyproject.org/managed/len/ucsc/apiMel3.len"`
comment	`""`
dbkey	`"apiMel3"`
export	`false`
flat	`false`
image_content_input	`None`
results	`[{"__index__": 0, "software_cond": {"__current_case__": 20, "input": {"values": [{"id": 7, "src": "hdca"}]}, "software": "qualimap"}}]`
title	`"Host/Contamination Removal"`
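
The Standard Error of this job shows `cp: cannot stat './report_data/*plot*.txt'`, yet the exit code is 0. That is consistent with the trailing `| true` in the command line: without `pipefail`, a shell pipeline reports the exit status of its last command. A small sketch of that behaviour, assuming a bash-like shell with `pipefail` unset; the directory name is hypothetical and chosen so that the glob matches nothing:

```shell
# The glob matches no files, so cp fails, but `true` is the last command
# in the pipeline and therefore determines the pipeline's exit status.
cp ./no_such_report_data/*plot*.txt ./plots/ 2>/dev/null | true
status=$?
echo "$status"
```

In the Qualimap-based report no `*plot*.txt` files were written to `report_data`, so the copy step had nothing to copy and the job still finished successfully.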

Other invocation details
- history_id
  - 4eae5114ab312710
- history_state
  - ok
- invocation_id
  - 4eae5114ab312710
- invocation_state
  - scheduled
- workflow_id
  - 4eae5114ab312710

Passed Tests

✅ host-or-contamination-removal-on-short-reads.ga_0

Workflow invocation details

Invocation Messages

Steps

Step 1: Short-reads:
- step_state: scheduled
Step 2: Host/Contaminant Reference Genome:
- step_state: scheduled

Step 3: Bowtie2:

step_state: scheduled

Jobs

Job 1:

Job state is ok

Command Line:

set -o | grep -q pipefail && set -o pipefail;   ln -f -s '/tmp/tmpakt3d503/files/0/1/0/dataset_010887f7-8c2a-4f52-b640-8287c47d9112.dat' input_f.fastq.gz &&  ln -f -s '/tmp/tmpakt3d503/files/d/f/5/dataset_df519121-ede9-434c-bc12-6a8617b966da.dat' input_r.fastq.gz &&    THREADS=${GALAXY_SLOTS:-4} && if [ "$THREADS" -gt 1 ]; then (( THREADS-- )); fi &&   bowtie2  -p "$THREADS"  -x '/cvmfs/data.galaxyproject.org/byhand/hg38/hg38full/bowtie2_index/hg38full'   -1 'input_f.fastq.gz' -2 'input_r.fastq.gz' --un-conc-gz '/tmp/tmpakt3d503/job_working_directory/000/3/outputs/dataset_78be4b6d-81bd-49a9-a7fc-31226db3d9e0.dat'                2> >(tee '/tmp/tmpakt3d503/job_working_directory/000/3/outputs/dataset_f5653d43-a8bc-406e-bfdb-ee3303ba91e9.dat' >&2)  | samtools sort -l 0 -T "${TMPDIR:-.}" -O bam | samtools view --no-PG -O bam -@ ${GALAXY_SLOTS:-1} -o '/tmp/tmpakt3d503/job_working_directory/000/3/outputs/dataset_fa253e38-9dde-4dc8-bed0-96f3ff77a3b7.dat'  && mv '/tmp/tmpakt3d503/job_working_directory/000/3/outputs/dataset_78be4b6d-81bd-49a9-a7fc-31226db3d9e0.1.dat' '/tmp/tmpakt3d503/job_working_directory/000/3/outputs/dataset_78be4b6d-81bd-49a9-a7fc-31226db3d9e0.dat' && mv '/tmp/tmpakt3d503/job_working_directory/000/3/outputs/dataset_78be4b6d-81bd-49a9-a7fc-31226db3d9e0.2.dat' '/tmp/tmpakt3d503/job_working_directory/000/3/outputs/dataset_9c2499c0-a007-4b4c-8f1a-4e3a1ec7be6e.dat'

Exit Code:

```
0
```

Standard Error:

9462 reads; of these:
  9462 (100.00%) were paired; of these:
    9462 (100.00%) aligned concordantly 0 times
    0 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    9462 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    9462 pairs aligned 0 times concordantly or discordantly; of these:
      18924 mates make up the pairs; of these:
        18924 (100.00%) aligned 0 times
        0 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
0.00% overall alignment rate

Traceback:

Job Parameters:

Job parameter	Parameter value
__input_ext	`"input"`
__workflow_invocation_uuid__	`"5a13a61cd1f211f0ae2b00224808a459"`
analysis_type	`{"__current_case__": 0, "analysis_type_selector": "simple", "presets": "no_presets"}`
chromInfo	`"/tmp/tmpakt3d503/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"`
dbkey	`"?"`
library	`{"__current_case__": 2, "aligned_file": false, "input_1": {"values": [{"id": 1, "src": "dce"}]}, "paired_options": {"__current_case__": 1, "paired_options_selector": "no"}, "type": "paired_collection", "unaligned_file": true}`
reference_genome	`{"__current_case__": 0, "index": "hg38", "source": "indexed"}`
rg	`{"__current_case__": 3, "rg_selector": "do_not_set"}`
sam_options	`{"__current_case__": 1, "sam_options_selector": "no"}`
save_mapping_stats	`true`
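
The Bowtie2 summary above can be checked mechanically. The following is a hedged sketch, not part of the workflow: it copies the summary into a hypothetical file `bowtie2_summary.txt`, then uses `awk` to pull out the number of read pairs and the overall alignment rate, and derives the number of mates as twice the number of pairs.

```shell
# Hypothetical file name; the content is the Standard Error shown above.
cat > bowtie2_summary.txt <<'EOF'
9462 reads; of these:
  9462 (100.00%) were paired; of these:
    9462 (100.00%) aligned concordantly 0 times
    0 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    9462 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    9462 pairs aligned 0 times concordantly or discordantly; of these:
      18924 mates make up the pairs; of these:
        18924 (100.00%) aligned 0 times
        0 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
0.00% overall alignment rate
EOF

# First field of the first line is the number of read pairs.
pairs=$(awk 'NR == 1 { print $1 }' bowtie2_summary.txt)
# Each pair contributes two mates.
mates=$(( pairs * 2 ))
# Strip the percent sign from the first field of the final summary line.
rate=$(awk '/overall alignment rate/ { sub(/%/, "", $1); print $1 }' bowtie2_summary.txt)

echo "pairs=$pairs mates=$mates rate=$rate"
```

With an overall alignment rate of 0.00%, none of the test reads matched the hg38 index, so all pairs would be expected to end up in the unaligned output written via `--un-conc-gz`.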

Step 4: Create a paired collection:
- step_state: scheduled
- Jobs
  - Job 1:
    - Job state is ok
    Traceback:
    Job Parameters:
    - Job parameter Parameter value
      
      __workflow_invocation_uuid__ "5a13a61cd1f211f0ae2b00224808a459"

Step 5: MultiQC:

step_state: scheduled

Jobs

Job 1:

Job state is ok

Command Line:

die() { echo "$@" 1>&2 ; exit 1; } &&  mkdir multiqc_WDir &&   mkdir multiqc_WDir/bowtie2_0 &&         grep -Pq '% overall alignment rate' /tmp/tmpakt3d503/files/f/5/6/dataset_f5653d43-a8bc-406e-bfdb-ee3303ba91e9.dat || die "Module 'bowtie2: '% overall alignment rate' not found in the file 'pair'" && ln -s '/tmp/tmpakt3d503/files/f/5/6/dataset_f5653d43-a8bc-406e-bfdb-ee3303ba91e9.dat' 'multiqc_WDir/bowtie2_0/pair'  &&      multiqc multiqc_WDir --filename 'report'  --title 'Host Removal'      && mkdir -p ./plots && ls -l ./report_data/ && cp ./report_data/*plot*.txt ./plots/ | true

Exit Code:

```
0
```

Standard Error:

/// MultiQC 🔍 v1.27

     update_config | Report title: Host Removal
     version_check | MultiQC Version v1.32 now available!
       file_search | Search path: /tmp/tmpakt3d503/job_working_directory/000/5/working/multiqc_WDir

           bowtie2 | Found 1 reports

     write_results | Data        : report_data
     write_results | Report      : report.html
           multiqc | MultiQC complete

Standard Output:

total 56
-rw-r--r-- 1 1001 1001    43 Dec  5 15:56 bowtie2_pe_plot.txt
-rw-r--r-- 1 1001 1001   511 Dec  5 15:56 multiqc.log
-rw-r--r-- 1 1001 1001   368 Dec  5 15:56 multiqc_bowtie2.txt
-rw-r--r-- 1 1001 1001   245 Dec  5 15:56 multiqc_citations.txt
-rw-r--r-- 1 1001 1001 30791 Dec  5 15:56 multiqc_data.json
-rw-r--r-- 1 1001 1001    55 Dec  5 15:56 multiqc_general_stats.txt
-rw-r--r-- 1 1001 1001   151 Dec  5 15:56 multiqc_sources.txt

Traceback:

Job Parameters:

Job parameter	Parameter value
__input_ext	`"input"`
__workflow_invocation_uuid__	`"5a13a61cd1f211f0ae2b00224808a459"`
chromInfo	`"/cvmfs/data.galaxyproject.org/managed/len/ucsc/hg38.len"`
comment	`""`
dbkey	`"hg38"`
export	`false`
flat	`false`
image_content_input	`None`
results	`[{"__index__": 0, "software_cond": {"__current_case__": 3, "input": {"values": [{"id": 5, "src": "hdca"}]}, "software": "bowtie2"}}]`
title	`"Host Removal"`

Other invocation details
- history_id
  - 5fc334fd834f9169
- history_state
  - ok
- invocation_id
  - 5fc334fd834f9169
- invocation_state
  - scheduled
- workflow_id
  - 5fc334fd834f9169

bebatut added 2 commits October 9, 2025 16:31

Add long reads workflow for host or contamination removal

0dd9ce8

Add short reads workflow for host or contamination removal

8d2a487

paulzierep reviewed Nov 10, 2025

...ost-contamination-removal-short-reads/host-or-contamination-removal-on-short-reads-tests.yml (outdated, resolved)

paulzierep approved these changes Nov 10, 2025

Update workflows/microbiome/host-contamination-removal/host-contamination-removal-short-reads/host-or-contamination-removal-on-short-reads-tests.yml

Co-authored-by: paulzierep

d118fe5

paulzierep reviewed Dec 3, 2025

...ost-contamination-removal-short-reads/host-or-contamination-removal-on-short-reads-tests.yml (outdated, resolved)

Update workflows/microbiome/host-contamination-removal/host-contamination-removal-short-reads/host-or-contamination-removal-on-short-reads-tests.yml

Co-authored-by: paulzierep

c03b9dc

paulzierep approved these changes Dec 3, 2025

mvdbeek reviewed Dec 3, 2025

...ost-contamination-removal/host-contamination-removal-long-reads/plnmotmptestjob4912th98.json (outdated, resolved)

Delete workflows/microbiome/host-contamination-removal/host-contamination-removal-long-reads/plnmotmptestjob4912th98.json

d99c889

paulzierep reviewed Dec 3, 2025

...removal/host-contamination-removal-long-reads/host-or-contamination-removal-on-long-reads.ga (resolved)

Try to fix tests

1659aac

Use smaller index

dc70072

Fix tests

7420550

mvdbeek approved these changes Dec 5, 2025

mvdbeek merged commit c911863 into galaxyproject:main Dec 5, 2025
8 checks passed

bebatut deleted the host-contamination-removal branch December 5, 2025 17:52

bebatut commented Oct 9, 2025

wm75 commented Oct 10, 2025

paulzierep commented Nov 10, 2025

paulzierep commented Dec 3, 2025 (edited)

mvdbeek commented Dec 3, 2025

paulzierep commented Dec 3, 2025 (edited)

bebatut commented Dec 3, 2025

mvdbeek commented Dec 3, 2025

mvdbeek commented Dec 3, 2025

When to use this workflow

[0.1] yyyy-mm-dd

mvdbeek commented Dec 3, 2025

bebatut commented Dec 5, 2025

mvdbeek commented Dec 5, 2025

github-actions bot commented Dec 5, 2025

Test Results (powered by Planemo)