
Conversation

@bebatut
Member

@bebatut bebatut commented Oct 9, 2025

FOR CONTRIBUTOR:

  • I have read the Adding workflows guidelines
  • License permits unrestricted use (educational + commercial)
  • Please also take note of the reviewer guidelines below to facilitate a smooth review process.

FOR REVIEWERS:

  • .dockstore.yml: file is present and aligned with creator metadata in workflow. ORCID identifiers are strongly encouraged in creator metadata. The .dockstore.yml file is required to run tests
  • Workflow is sufficiently generic to be used with lab data: it does not hardcode sample names or reference data, and it can be run without reading an accompanying tutorial.
  • In workflow: the annotation field contains a short description of what the workflow does. It should start with This workflow does/runs/performs … xyz … to generate/analyze/etc …
  • In workflow: workflow inputs and outputs have human-readable names (spaces are fine, no underscores, dashes only where spelling dictates it), no abbreviations unless generally understood. Altering input or output labels requires adjusting these labels in the workflow-tests.yml file as well
  • In workflow: name field should be human readable (spaces are fine, no underscore, dash only where spelling dictates it), no abbreviation unless generally understood
  • Workflow folder: prefer dash (-) over underscore (_), prefer all lowercase. Folder becomes repository in iwc-workflows organization and is included in TRS id
  • Readme explains what the workflow does, what valid inputs are, and what outputs users can expect. If a tutorial or other resources exist, they can be linked. If a similar workflow exists in IWC, the readme should explain the differences from the existing workflow and when one might prefer one workflow over the other
  • Changelog contains appropriate entries
  • Large files (> 100 KB) are uploaded to zenodo and location urls are used in test file

@wm75
Member

wm75 commented Oct 10, 2025

@bebatut @paulzierep just remembered something: https://github.com/bede/deacon

Have you thought about this as an apparently very performant alternative?

@paulzierep
Collaborator

@bebatut @paulzierep just remembered something: https://github.com/bede/deacon

Have you thought about this as an apparently very performant alternative?

@santino is working on this galaxyproject/tools-iuc#7438
We will add it as an alternative to this workflow once it's ready; for now it would be good to merge this first version once the actions pass.

…tion-removal-short-reads/host-or-contamination-removal-on-short-reads-tests.yml

Co-authored-by: paulzierep <[email protected]>
…tion-removal-short-reads/host-or-contamination-removal-on-short-reads-tests.yml

Co-authored-by: paulzierep <[email protected]>
@paulzierep
Collaborator

paulzierep commented Dec 3, 2025

Not sure why the last run was cancelled; the artifact said the run was successful. @bebatut can you make a small update to retrigger the CI?

@mvdbeek
Member

mvdbeek commented Dec 3, 2025

@paulzierep I can add you to the org, but can you please go through the reviewer checklist? In particular:

workflow inputs and outputs have human readable names

…tion-removal-long-reads/plnmotmptestjob4912th98.json
@paulzierep
Collaborator

paulzierep commented Dec 3, 2025

@paulzierep I can add you to the org, but can you please go through the reviewer checklist? In particular:

workflow inputs and outputs have human readable names

sure, thanks, will make sure to check it

@bebatut
Member Author

bebatut commented Dec 3, 2025

workflow inputs and outputs have human readable names

I might have a stupid question: can that be done in the workflow editor, or only in the .ga? In the editor, I only see the Label. And what is the difference between the Label and the Name for an input?

@mvdbeek
Member

mvdbeek commented Dec 3, 2025

Can that be done in the workflow editor

Yes: find the tool and its output, then edit the label in the panel on the right side

[Screenshot 2025-12-03 at 16:29:50: the output label field in the workflow editor's right-side panel]
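For reference, the same change can also be made directly in the .ga JSON: each step's workflow_outputs entry carries the label. A minimal sketch (the output_name and uuid here are placeholders, not values from this PR):

```json
{
  "workflow_outputs": [
    {
      "output_name": "output_stats",
      "label": "QualiMap Statistics",
      "uuid": "00000000-0000-0000-0000-000000000000"
    }
  ]
}
```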

@mvdbeek
Member

mvdbeek commented Dec 3, 2025

Here's the full review:

PR #991 Review: Host and Contamination Removal Workflows

Summary

This PR adds two new workflows for host and contamination removal from microbiome data:

  • Long-reads workflow (Nanopore): Uses Minimap2 for mapping
  • Short-reads workflow (Illumina): Uses Bowtie2 for mapping

Detailed Review Against Checklist

✅ PASS: .dockstore.yml files

  • ✅ Both workflows have .dockstore.yml files present
  • ✅ ORCID identifiers included for both authors (Paul Zierep: 0000-0003-2982-388X, Bérénice Batut: 0000-0001-9852-1987)
  • ✅ Creator metadata in .dockstore.yml aligns with workflow creator metadata
  • ✅ Test files are correctly referenced

Files checked:

  • workflows/microbiome/host-contamination-removal/host-contamination-removal-long-reads/.dockstore.yml:10-13
  • workflows/microbiome/host-contamination-removal/host-contamination-removal-short-reads/.dockstore.yml:10-13
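For orientation, a .dockstore.yml of the kind being checked here looks roughly like this (illustrative sketch; descriptor and test paths are placeholders modeled on the long-reads workflow file names, the ORCID iDs are the ones listed above):

```yaml
version: 1.2
workflows:
  - name: main
    subclass: Galaxy
    primaryDescriptorPath: /host-or-contamination-removal-on-long-reads.ga
    testParameterFiles:
      - /host-or-contamination-removal-on-long-reads-tests.yml
    authors:
      - name: Paul Zierep
        orcid: 0000-0003-2982-388X
      - name: Bérénice Batut
        orcid: 0000-0001-9852-1987
```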

✅ PASS: Workflow is sufficiently generic

  • ✅ No hardcoded sample names
  • ✅ Reference genome is a parameter input (not hardcoded)
  • ✅ Workflows can be run with any lab data
  • ✅ Does not require reading a tutorial to understand usage

✅ PASS: Annotation field

Both workflows have appropriate annotation fields:

Long-reads workflow (host-or-contamination-removal-on-long-reads.ga:3):
"This workflow takes Nanopore fastq(.gz) files and runs Minimap2 to map the reads against a reference genome (human, by default). It filters the output
to keep only the unmapped reads and generates mapping statistics that are aggregated into a MultiQC report."

Short-reads workflow (host-or-contamination-removal-on-short-reads.ga:3):
"This workflow takes paired-end Illumina fastq(.gz) files and runs Bowtie to map the reads against a reference genome (human, by default) and keep only
the reads that do not align. MultiQC is used to aggregate the mapping reports."

Both follow the recommended pattern and clearly describe what the workflow does.


⚠️ NEEDS ATTENTION: Workflow output labels contain underscores

Long-reads workflow has THREE workflow outputs with underscore naming:

  1. "label": "qualimap_stats" (host-or-contamination-removal-on-long-reads.ga:227)
    - Should be: "QualiMap Statistics" or "QualiMap Stats"
  2. "label": "samtools_fastx" (host-or-contamination-removal-on-long-reads.ga:373)
    - Should be: "Reads without host or contamination" or "Filtered Reads"
  3. "label": "multiqc_html_report" (host-or-contamination-removal-on-long-reads.ga:440)
    - Should be: "MultiQC HTML Report"

Short-reads workflow has TWO workflow outputs with underscore naming:

  1. "label": "bowtie2_mapping_statistics" (host-or-contamination-removal-on-short-reads.ga:167)
    - Should be: "Bowtie2 Mapping Statistics"
  2. "label": "multiqc_html_report" (host-or-contamination-removal-on-short-reads.ga:285)
    - Should be: "MultiQC HTML Report"

Note: contamination_filtered_reads should also be updated to "Contamination Filtered Reads" or "Reads without Host or Contamination"

The test files (*-tests.yml) must also be updated to match the new labels.
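As a sketch of what that test-file update involves: in the Planemo-style *-tests.yml, outputs are keyed by the workflow output label, so renaming a label means renaming the matching key (the assertion shown is a hypothetical example, not taken from this PR's test files):

```yaml
# Before: the key mirrors the underscore label
# multiqc_html_report:
#   asserts:
#     has_text:
#       text: "MultiQC"

# After: the key must match the new human-readable label exactly
MultiQC HTML Report:
  asserts:
    has_text:
      text: "MultiQC"
```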


✅ PASS: Workflow input labels

All input labels are human-readable:

  • ✅ "Long-reads" (long-reads workflow)
  • ✅ "Short-reads" (short-reads workflow)
  • ✅ "Host/Contaminant Reference Genome" (both)
  • ✅ "Profile of preset options for the mapping (long-read)" (long-reads workflow)

No underscores or unnecessary abbreviations.


⚠️ NEEDS ATTENTION: Grammar issue in naming

Issue: "long-reads" vs "long-read" as compound adjective

The folder name and workflow names use "long-reads" (plural), but when used as a compound adjective modifying another noun, it should be singular:

  • Folder: host-contamination-removal-long-reads
    • Should be: host-contamination-removal-long-read
  • Workflow name: "Host or Contamination removal on long-reads"
    • Should be: "Host or Contamination removal on long-read" (if keeping hyphenated form)
    • OR better: "Host or Contamination Removal for Long-Read Data"

Same issue with short-reads workflow:

  • Folder: host-contamination-removal-short-reads
    • Should be: host-contamination-removal-short-read
  • Workflow name: "Host or Contamination removal on short-reads"
    • Should be: "Host or Contamination removal on short-read"
    • OR better: "Host or Contamination Removal for Short-Read Data"

Explanation: When a technical term is used as a compound adjective (modifying another noun), it should be singular. Examples:

  • ✅ "short-read quality control" (short-read modifies "quality control")
  • ❌ "short-reads quality control"
  • ✅ "single-cell RNA-seq" (single-cell modifies "RNA-seq")
  • ❌ "single-cells RNA-seq"

In this case, "long-reads" appears to be part of the workflow context, but it's technically modifying the type of workflow/process. For consistency
with IWC naming conventions, consider using the singular form.


⚠️ NEEDS ATTENTION: Workflow name capitalization

Both workflow names have inconsistent capitalization:

  • "Host or Contamination removal on long-reads" - "removal" should be capitalized
  • "Host or Contamination removal on short-reads" - "removal" should be capitalized

Should be: "Host or Contamination Removal on Long-Read" (or equivalent)


✅ PASS: Workflow folder naming

  • ✅ Both folders use dashes (not underscores)
  • ✅ All lowercase
  • ℹ️ See grammar note above regarding "long-reads" vs "long-read"

⚠️ NEEDS MINOR IMPROVEMENT: README

Both READMEs are good and explain:

  • ✅ What the workflow does
  • ✅ Valid inputs
  • ✅ Expected outputs

Minor issues:

  1. Numbering error in long-reads README (README.md:9):
    - Step 2 appears twice (lines 7 and 9)
    - The second "step 2" (line 9) should be renumbered as step 4
  2. Missing comparison: Since there are TWO workflows in this PR that serve similar purposes (host removal), the READMEs should explain:
    - When to use the long-reads workflow vs the short-reads workflow
    - What are the key differences

Suggested addition to both READMEs:

For long-reads README:

When to use this workflow

Use this workflow for long-read sequencing data (e.g., Nanopore, PacBio). For short-read Illumina data, see the Host or Contamination removal on
short-reads
workflow.

For short-reads README:

When to use this workflow

Use this workflow for short-read paired-end Illumina sequencing data. For long-read data (Nanopore, PacBio), see the Host or Contamination removal
on long-reads
workflow.


⚠️ NEEDS ATTENTION: Changelog

Both CHANGELOG files have placeholder dates:

[0.1] yyyy-mm-dd

Action needed: Replace yyyy-mm-dd with the actual release date (typically the PR merge date).
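Concretely, using the merge date given later in this thread (2025-12-03), the entry would read:

```
[0.1] 2025-12-03
```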


✅ PASS: Large files


Overall Recommendation: REQUEST CHANGES

Required Changes:

  1. Fix workflow output labels to use human-readable names without underscores:
    - Update all workflow_outputs labels in both .ga files
    - Update corresponding labels in test files (*-tests.yml)
  2. Update CHANGELOG dates from yyyy-mm-dd to actual date

Recommended Changes (not blocking):

  1. Fix grammar: Consider changing folder names and workflow names from "long-reads"/"short-reads" to "long-read"/"short-read" for grammatical
    consistency
  2. Fix workflow name capitalization: "removal" should be "Removal"
  3. Add comparison section to READMEs: Explain when to use each workflow
  4. Fix numbering error in long-reads README (step numbering)

Files That Need Updates:

Must fix:

  • workflows/microbiome/host-contamination-removal/host-contamination-removal-long-reads/host-or-contamination-removal-on-long-reads.ga (lines 227, 373, 440)
  • workflows/microbiome/host-contamination-removal/host-contamination-removal-long-reads/host-or-contamination-removal-on-long-reads-tests.yml (lines
    18, 168, 174)
  • workflows/microbiome/host-contamination-removal/host-contamination-removal-short-reads/host-or-contamination-removal-on-short-reads.ga (lines 167,
    218, 285)
  • workflows/microbiome/host-contamination-removal/host-contamination-removal-short-reads/host-or-contamination-removal-on-short-reads-tests.yml (lines
    21, 27)
  • Both CHANGELOG.md files (line 3)

Recommended to fix:

  • Both README.md files (add comparison sections)
  • Long-reads README.md (fix step numbering)
  • Consider renaming folders and updating workflow names for grammar consistency

Address PR review feedback for galaxyproject#991:

- Update workflow output labels to use human-readable names without underscores:
  * Long-reads: "qualimap_stats" → "QualiMap Statistics"
  * Long-reads: "samtools_fastx" → "Reads without Host or Contamination"
  * Long-reads: "multiqc_html_report" → "MultiQC HTML Report"
  * Short-reads: "bowtie2_mapping_statistics" → "Bowtie2 Mapping Statistics"
  * Short-reads: "contamination_filtered_reads" → "Contamination Filtered Reads"
  * Short-reads: "multiqc_html_report" → "MultiQC HTML Report"

- Update corresponding labels in test files to match workflow outputs

- Fix workflow name capitalization:
  * "removal" → "Removal" in both workflow names

- Update CHANGELOG dates from "yyyy-mm-dd" to actual date (2025-12-03)

- Improve README documentation:
  * Fix step numbering in long-reads README (was: 1,2,3,2; now: 1,2,3,4)
  * Add "When to use this workflow" sections to both READMEs
  * Cross-reference between long-reads and short-reads workflows

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@mvdbeek
Member

mvdbeek commented Dec 3, 2025

I've pushed a commit that should address the review comments, please have a close look at the changes.

@bebatut
Member Author

bebatut commented Dec 5, 2025

I do not understand the error:

Error reading tool from path: /home/runner/work/iwc/iwc/workflows/microbiome/host-contamination-removal/host-contamination-removal-long-reads/host-or-contamination-removal-on-long-reads.ga
Traceback (most recent call last):
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/tool_util/toolbox/base.py", line 950, in _load_tool_tag_set
    tool = self.load_tool(concrete_path, use_cached=False)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/tool_util/toolbox/base.py", line 1203, in load_tool
    tool = self.create_tool(
           ^^^^^^^^^^^^^^^^^
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/tools/__init__.py", line 587, in create_tool
    tool_source = self.get_expanded_tool_source(config_file)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/tools/__init__.py", line 600, in get_expanded_tool_source
    raise e
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/tools/__init__.py", line 592, in get_expanded_tool_source
    return get_tool_source(
           ^^^^^^^^^^^^^^^^
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/tool_util/parser/factory.py", line 106, in get_tool_source
    tree, macro_paths = load_tool_with_refereces(config_file)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/util/xml_macros.py", line 36, in load_with_references
    tree = raw_xml_tree(path)
           ^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/util/xml_macros.py", line 90, in raw_xml_tree
    tree = parse_xml(path, strip_whitespace=False, remove_comments=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp7aen1w7b/galaxy-dev/lib/galaxy/util/__init__.py", line 392, in parse_xml
    tree = cast(ElementTree, etree.parse(f, parser=parser))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/lxml/etree.pyx", line 3711, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 2052, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 2070, in lxml.etree._parseFilelikeDocument
  File "src/lxml/parser.pxi", line 1965, in lxml.etree._parseDocFromFilelike
  File "src/lxml/parser.pxi", line 1254, in lxml.etree._BaseParser._parseDocFromFilelike
  File "src/lxml/parser.pxi", line 647, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 765, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 689, in lxml.etree._raiseParseError
  File "/home/runner/work/iwc/iwc/workflows/microbiome/host-contamination-removal/host-contamination-removal-long-reads/host-or-contamination-removal-on-long-reads.ga", line 1
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1

The test passes when I run it on EU.

@mvdbeek
Member

mvdbeek commented Dec 5, 2025

For the record this is the error: https://github.com/galaxyproject/iwc/actions/runs/19961703316/job/57243548336?pr=991#step:7:2130. Most likely running out of memory: there are 2 minimap2 jobs, each consuming 13 GB of memory; changing the index used for the test might help.

@github-actions

github-actions bot commented Dec 5, 2025

Test Results (powered by Planemo)

Test Summary

Test State  Count
Total       2
Passed      1
Error       0
Failure     1
Skipped     0
Failed Tests
  • ❌ host-or-contamination-removal-on-long-reads.ga_0

    Problems:

    • Output with path /tmp/tmp684z2veb/QualiMap BamQC__bb9f461e-a9cd-4fe9-862f-a05c8fe52b2e.txt different than expected
      Expected text '3,209,286,105 bp' in output ('BamQC report
      -----------------------------------
      
      >>>>>>> Input
      
           bam file = Spike3bBarcode10
           outfile = results/genome_results.txt
      
      
      >>>>>>> Reference
      
           number of bases = 586,300,787 bp
           number of contigs = 17
      
      
      >>>>>>> Globals
      
           number of windows = 416
      
           number of reads = 24,877
           number of mapped reads = 15 (0.06%)
           number of supplementary alignments = 15 (0.06%)
           number of secondary alignments = 23
      
           number of mapped bases = 6,267 bp
           number of sequenced bases = 5,873 bp
           number of aligned bases = 0 bp
           number of duplicated reads (estimated) = 9
           duplication rate = 19.05%
      
      
      >>>>>>> Insert size
      
           mean insert size = 0
           std insert size = 0
           median insert size = 0
      
      
      >>>>>>> Mapping quality
      
           mean mapping quality = 0.295
      
      
      >>>>>>> ACTG content
      
           number of A's = 822 bp (14%)
           number of C's = 1,331 bp (22.66%)
           number of T's = 1,549 bp (26.37%)
           number of G's = 2,171 bp (36.97%)
           number of N's = 0 bp (0%)
      
           GC percentage = 59.63%
      
      
      >>>>>>> Mismatches and indels
      
          general error rate = 0.3094
          number of mismatches = 0
          number of insertions = 217
          mapped reads with insertion percentage = 173.33%
          number of deletions = 108
          mapped reads with deletion percentage = 146.67%
          homopolymer indels = 23.08%
      
      
      >>>>>>> Coverage
      
           mean coverageData = 0X
           std coverageData = 0.0067X
      
           There is a 0% of reference with a coverageData >= 1X
           There is a 0% of reference with a coverageData >= 2X
           There is a 0% of reference with a coverageData >= 3X
           There is a 0% of reference with a coverageData >= 4X
           There is a 0% of reference with a coverageData >= 5X
           There is a 0% of reference with a coverageData >= 6X
           There is a 0% of reference with a coverageData >= 7X
           There is a 0% of reference with a coverageData >= 8X
           There is a 0% of reference with a coverageData >= 9X
           There is a 0% of reference with a coverageData >= 10X
           There is a 0% of reference with a coverageData >= 11X
           There is a 0% of reference with a coverageData >= 12X
           There is a 0% of reference with a coverageData >= 13X
           There is a 0% of reference with a coverageData >= 14X
           There is a 0% of reference with a coverageData >= 15X
           There is a 0% of reference with a coverageData >= 16X
           There is a 0% of reference with a coverageData >= 17X
           There is a 0% of reference with a coverageData >= 18X
           There is a 0% of reference with a coverageData >= 19X
           There is a 0% of reference with a coverageData >= 20X
           There is a 0% of reference with a coverageData >= 21X
           There is a 0% of reference with a coverageData >= 22X
           There is a 0% of reference with a coverageData >= 23X
           There is a 0% of reference with a coverageData >= 24X
           There is a 0% of reference with a coverageData >= 25X
           There is a 0% of reference with a coverageData >= 26X
           There is a 0% of reference with a coverageData >= 27X
           There is a 0% of reference with a coverageData >= 28X
           There is a 0% of reference with a coverageData >= 29X
           There is a 0% of reference with a coverageData >= 30X
           There is a 0% of reference with a coverageData >= 31X
           There is a 0% of reference with a coverageData >= 32X
           There is a 0% of reference with a coverageData >= 33X
           There is a 0% of reference with a coverageData >= 34X
           There is a 0% of reference with a coverageData >= 35X
           There is a 0% of reference with a coverageData >= 36X
           There is a 0% of reference with a coverageData >= 37X
           There is a 0% of reference with a coverageData >= 38X
           There is a 0% of reference with a coverageData >= 39X
           There is a 0% of reference with a coverageData >= 40X
           There is a 0% of reference with a coverageData >= 41X
           There is a 0% of reference with a coverageData >= 42X
           There is a 0% of reference with a coverageData >= 43X
           There is a 0% of reference with a coverageData >= 44X
           There is a 0% of reference with a coverageData >= 45X
           There is a 0% of reference with a coverageData >= 46X
           There is a 0% of reference with a coverageData >= 47X
           There is a 0% of reference with a coverageData >= 48X
           There is a 0% of reference with a coverageData >= 49X
           There is a 0% of reference with a coverageData >= 50X
           There is a 0% of reference with a coverageData >= 51X
      
      
      >>>>>>> Coverage per contig
      
      	Group10	11440700	0	0.0	0.0
      	Group11	12576330	247	1.9640069877301247E-5	0.0035106469234644855
      	Group12	9182753	0	0.0	0.0
      	Group13	8929068	124	1.3887227647947132E-5	0.0034938767695826276
      	Group14	8318479	0	0.0	0.0
      	Group15	7856270	0	0.0	0.0
      	Group16	5631066	0	0.0	0.0
      	Group1	25854376	118	4.564024287416567E-6	0.002127281278587259
      	Group2	14465785	646	4.465709949373643E-5	0.006461575798427947
      	Group3	12341916	0	0.0	0.0
      	Group4	10796202	0	0.0	0.0
      	Group5	13386189	2572	1.9213833003553138E-4	0.03819991449493488
      	Group6	14581788	610	4.183300429275203E-5	0.011327358078360415
      	Group7	9974240	0	0.0	0.0
      	Group8	11452794	77	6.723250239199273E-6	0.00259291439062409
      	Group9	10282195	0	0.0	0.0
      	GroupUn	399230636	1873	4.6915237236452965E-6	0.0031478625233298647
      
      
      ')
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: Host/Contaminant Reference Genome (long-reads):

        • step_state: scheduled
      • Step 2: Long-reads:

        • step_state: scheduled
      • Step 3: Profile of preset options for the mapping (long-read):

        • step_state: scheduled
      • Step 4: minimap2:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • ln -f -s '/cvmfs/data.galaxyproject.org/byhand/apiMel3/seq/apiMel3.fa' reference.fa && minimap2 -x map-pb    --q-occ-frac 0.01       -t ${GALAXY_SLOTS:-4} reference.fa '/tmp/tmp6lbm2yua/files/6/f/5/dataset_6f5c06e5-9cc8-462d-bad0-8e53d2483203.dat' -a | samtools view --no-PG -hT reference.fa | samtools sort -@${GALAXY_SLOTS:-2} -T "${TMPDIR:-.}" -O BAM -o '/tmp/tmp6lbm2yua/job_working_directory/000/3/outputs/dataset_6b5acde1-9bd9-4e82-9359-ae879a76b274.dat'

            Exit Code:

            • 0

            Standard Error:

            • [M::mm_idx_gen::17.979*0.40] collected minimizers
              [M::mm_idx_gen::19.955*0.46] sorted minimizers
              [M::main::19.955*0.46] loaded/built the index for 17 target sequence(s)
              [M::mm_mapopt_update::20.249*0.47] mid_occ = 138
              [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 17
              [M::mm_idx_stat::20.446*0.47] distinct minimizers: 17801079 (80.12% are singletons); average occurrences: 1.615; average spacing: 20.392; total length: 586300787
              [M::worker_pipeline::22.293*0.51] mapped 24877 sequences
              [M::main] Version: 2.28-r1209
              [M::main] CMD: minimap2 -x map-pb --q-occ-frac 0.01 -t 1 -a reference.fa /tmp/tmp6lbm2yua/files/6/f/5/dataset_6f5c06e5-9cc8-462d-bad0-8e53d2483203.dat
              [M::main] Real time: 22.304 sec; CPU: 11.358 sec; Peak RSS: 2.067 GB
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "cbd54a0ed1f111f0ae2b7ced8d8023f5"
              alignment_options {"A": null, "B": null, "E": null, "E2": null, "O": null, "O2": null, "no_end_flt": true, "s": null, "splicing": {"__current_case__": 0, "splice_mode": "preset"}, "z": null, "z2": null}
              chromInfo "/tmp/tmp6lbm2yua/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              fastq_input {"__current_case__": 0, "analysis_type_selector": "map-pb", "fastq_input1": {"values": [{"id": 1, "src": "dce"}]}, "fastq_input_selector": "single"}
              indexing_options {"H": false, "I": null, "k": null, "w": null}
              io_options {"K": null, "L": false, "Q": false, "Y": false, "c": false, "cs": null, "eqx": false, "output_format": "BAM"}
              mapping_options {"F": null, "N": null, "X": false, "f": null, "g": null, "kmer_ocurrence_interval": {"__current_case__": 1, "interval": ""}, "m": null, "mask_len": null, "max_chain_iter": null, "max_chain_skip": null, "min_occ_floor": null, "n": null, "p": null, "q_occ_frac": "0.01", "r": null}
              reference_source {"__current_case__": 0, "ref_file": "apiMel3", "reference_source_selector": "cached"}
          • Job 2:

            • Job state is ok

            Command Line:

            • ln -f -s '/cvmfs/data.galaxyproject.org/byhand/apiMel3/seq/apiMel3.fa' reference.fa && minimap2 -x map-pb    --q-occ-frac 0.01       -t ${GALAXY_SLOTS:-4} reference.fa '/tmp/tmp6lbm2yua/files/a/e/c/dataset_aec527c0-5b27-481d-94c0-51d67018f616.dat' -a | samtools view --no-PG -hT reference.fa | samtools sort -@${GALAXY_SLOTS:-2} -T "${TMPDIR:-.}" -O BAM -o '/tmp/tmp6lbm2yua/job_working_directory/000/4/outputs/dataset_5cbf10ca-a189-41bf-a2bf-bd8862624762.dat'

            Exit Code:

            • 0

            Standard Error:

            • [M::mm_idx_gen::18.024*0.40] collected minimizers
              [M::mm_idx_gen::19.995*0.46] sorted minimizers
              [M::main::19.996*0.46] loaded/built the index for 17 target sequence(s)
              [M::mm_mapopt_update::20.284*0.47] mid_occ = 138
              [M::mm_idx_stat] kmer size: 19; skip: 10; is_hpc: 1; #seq: 17
              [M::mm_idx_stat::20.486*0.47] distinct minimizers: 17801079 (80.12% are singletons); average occurrences: 1.615; average spacing: 20.392; total length: 586300787
              [M::worker_pipeline::22.162*0.51] mapped 16951 sequences
              [M::main] Version: 2.28-r1209
              [M::main] CMD: minimap2 -x map-pb --q-occ-frac 0.01 -t 1 -a reference.fa /tmp/tmp6lbm2yua/files/a/e/c/dataset_aec527c0-5b27-481d-94c0-51d67018f616.dat
              [M::main] Real time: 22.178 sec; CPU: 11.258 sec; Peak RSS: 2.067 GB
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "cbd54a0ed1f111f0ae2b7ced8d8023f5"
              alignment_options {"A": null, "B": null, "E": null, "E2": null, "O": null, "O2": null, "no_end_flt": true, "s": null, "splicing": {"__current_case__": 0, "splice_mode": "preset"}, "z": null, "z2": null}
              chromInfo "/tmp/tmp6lbm2yua/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              fastq_input {"__current_case__": 0, "analysis_type_selector": "map-pb", "fastq_input1": {"values": [{"id": 2, "src": "dce"}]}, "fastq_input_selector": "single"}
              indexing_options {"H": false, "I": null, "k": null, "w": null}
              io_options {"K": null, "L": false, "Q": false, "Y": false, "c": false, "cs": null, "eqx": false, "output_format": "BAM"}
              mapping_options {"F": null, "N": null, "X": false, "f": null, "g": null, "kmer_ocurrence_interval": {"__current_case__": 1, "interval": ""}, "m": null, "mask_len": null, "max_chain_iter": null, "max_chain_skip": null, "min_occ_floor": null, "n": null, "p": null, "q_occ_frac": "0.01", "r": null}
              reference_source {"__current_case__": 0, "ref_file": "apiMel3", "reference_source_selector": "cached"}
      • Step 5: QualiMap:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • export JAVA_OPTS="-Djava.awt.headless=true -Xmx${GALAXY_MEMORY_MB:-1024}m" &&    ln -s '/tmp/tmp6lbm2yua/files/6/b/5/dataset_6b5acde1-9bd9-4e82-9359-ae879a76b274.dat' 'Spike3bBarcode10' &&  qualimap bamqc -bam 'Spike3bBarcode10' -outdir results -outformat html --collect-overlap-pairs -nw 400 --paint-chromosome-limits -hm 3  --skip-duplicated --skip-dup-mode 0 -nt ${GALAXY_SLOTS:-1} &&   sed 's|images_qualimapReport/||g;s|css/||g' results/qualimapReport.html > '/tmp/tmp6lbm2yua/job_working_directory/000/5/outputs/dataset_21d60356-05fe-44ec-b85b-681b72e24b09.dat' && mkdir '/tmp/tmp6lbm2yua/job_working_directory/000/5/outputs/dataset_21d60356-05fe-44ec-b85b-681b72e24b09_files' && mv results/css/*.css '/tmp/tmp6lbm2yua/job_working_directory/000/5/outputs/dataset_21d60356-05fe-44ec-b85b-681b72e24b09_files' && mv results/css/*.png '/tmp/tmp6lbm2yua/job_working_directory/000/5/outputs/dataset_21d60356-05fe-44ec-b85b-681b72e24b09_files' && if [ -d results/images_qualimapReport ]; then mv results/images_qualimapReport/* '/tmp/tmp6lbm2yua/job_working_directory/000/5/outputs/dataset_21d60356-05fe-44ec-b85b-681b72e24b09_files' && for file in $(ls -A results/raw_data_qualimapReport); do mv "results/raw_data_qualimapReport/$file" `echo "results/$file" | sed 's/(//;s/)//'`; done fi && mv results/genome_results.txt results/summary_report.txt

            Exit Code:

            • 0

            Standard Output:

            • Java memory size is set to 1200M
              Launching application...
              
              detected environment java options -Djava.awt.headless=true -Xmx1024m
              QualiMap v.2.3
              Built on 2023-05-19 16:57
              
              Selected tool: bamqc
              Available memory (Mb): 253
              Max memory (Mb): 1037
              Starting bam qc....
              Loading sam header...
              Loading locator...
              Loading reference...
              Only flagged duplicate alignments will be skipped...
              Number of windows: 400, effective number of windows: 416
              Chunk of reads size: 1000
              Number of threads: 1
              Processed 50 out of 416 windows...
              Processed 100 out of 416 windows...
              Processed 150 out of 416 windows...
              Processed 200 out of 416 windows...
              Processed 250 out of 416 windows...
              Processed 300 out of 416 windows...
              Processed 350 out of 416 windows...
              Processed 400 out of 416 windows...
              Total processed windows:416
              Number of reads: 24877
              Number of valid reads: 30
              Number of correct strand reads:0
              
              Inside of regions...
              Num mapped reads: 15
              Num mapped first of pair: 0
              Num mapped second of pair: 0
              Num singletons: 0
              Time taken to analyze reads: 4
              Computing descriptors...
              numberOfMappedBases: 6267
              referenceSize: 586300787
              numberOfSequencedBases: 5873
              numberOfAs: 822
              Computing per chromosome statistics...
              Computing histograms...
              Overall analysis time: 5
              end of bam qc
              Computing report...
              Writing HTML report...
              HTML report created successfully
              
              Finished
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "cbd54a0ed1f111f0ae2b7ced8d8023f5"
              chromInfo "/cvmfs/data.galaxyproject.org/managed/len/ucsc/apiMel3.len"
              dbkey "apiMel3"
              duplicate_skipping "0"
              per_base_coverage false
              plot_specific {"genome_gc_distr": null, "homopolymer_size": "3", "n_bins": "400", "paint_chromosome_limits": true}
              stats_regions {"__current_case__": 0, "region_select": "all"}
          • Job 2:

            • Job state is ok

            Command Line:

            • export JAVA_OPTS="-Djava.awt.headless=true -Xmx${GALAXY_MEMORY_MB:-1024}m" &&    ln -s '/tmp/tmp6lbm2yua/files/5/c/b/dataset_5cbf10ca-a189-41bf-a2bf-bd8862624762.dat' 'Spike3bBarcode12' &&  qualimap bamqc -bam 'Spike3bBarcode12' -outdir results -outformat html --collect-overlap-pairs -nw 400 --paint-chromosome-limits -hm 3  --skip-duplicated --skip-dup-mode 0 -nt ${GALAXY_SLOTS:-1} &&   sed 's|images_qualimapReport/||g;s|css/||g' results/qualimapReport.html > '/tmp/tmp6lbm2yua/job_working_directory/000/6/outputs/dataset_ebaeac9e-4746-41da-9145-f966441d0619.dat' && mkdir '/tmp/tmp6lbm2yua/job_working_directory/000/6/outputs/dataset_ebaeac9e-4746-41da-9145-f966441d0619_files' && mv results/css/*.css '/tmp/tmp6lbm2yua/job_working_directory/000/6/outputs/dataset_ebaeac9e-4746-41da-9145-f966441d0619_files' && mv results/css/*.png '/tmp/tmp6lbm2yua/job_working_directory/000/6/outputs/dataset_ebaeac9e-4746-41da-9145-f966441d0619_files' && if [ -d results/images_qualimapReport ]; then mv results/images_qualimapReport/* '/tmp/tmp6lbm2yua/job_working_directory/000/6/outputs/dataset_ebaeac9e-4746-41da-9145-f966441d0619_files' && for file in $(ls -A results/raw_data_qualimapReport); do mv "results/raw_data_qualimapReport/$file" `echo "results/$file" | sed 's/(//;s/)//'`; done fi && mv results/genome_results.txt results/summary_report.txt

            Exit Code:

            • 0

            Standard Output:

            • Java memory size is set to 1200M
              Launching application...
              
              detected environment java options -Djava.awt.headless=true -Xmx1024m
              QualiMap v.2.3
              Built on 2023-05-19 16:57
              
              Selected tool: bamqc
              Available memory (Mb): 253
              Max memory (Mb): 1037
              Starting bam qc....
              Loading sam header...
              Loading locator...
              Loading reference...
              Only flagged duplicate alignments will be skipped...
              Number of windows: 400, effective number of windows: 416
              Chunk of reads size: 1000
              Number of threads: 1
              Processed 50 out of 416 windows...
              Processed 100 out of 416 windows...
              Processed 150 out of 416 windows...
              Processed 200 out of 416 windows...
              Processed 250 out of 416 windows...
              Processed 300 out of 416 windows...
              Processed 350 out of 416 windows...
              Processed 400 out of 416 windows...
              Total processed windows:416
              Number of reads: 16951
              Number of valid reads: 10
              Number of correct strand reads:0
              
              Inside of regions...
              Num mapped reads: 10
              Num mapped first of pair: 0
              Num mapped second of pair: 0
              Num singletons: 0
              Time taken to analyze reads: 4
              Computing descriptors...
              numberOfMappedBases: 2240
              referenceSize: 586300787
              numberOfSequencedBases: 2165
              numberOfAs: 635
              Computing per chromosome statistics...
              Computing histograms...
              Overall analysis time: 4
              end of bam qc
              Computing report...
              Writing HTML report...
              HTML report created successfully
              
              Finished
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "cbd54a0ed1f111f0ae2b7ced8d8023f5"
              chromInfo "/cvmfs/data.galaxyproject.org/managed/len/ucsc/apiMel3.len"
              dbkey "apiMel3"
              duplicate_skipping "0"
              per_base_coverage false
              plot_specific {"genome_gc_distr": null, "homopolymer_size": "3", "n_bins": "400", "paint_chromosome_limits": true}
              stats_regions {"__current_case__": 0, "region_select": "all"}
      • Step 6: Split BAM:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • ln -s '/tmp/tmp6lbm2yua/files/6/b/5/dataset_6b5acde1-9bd9-4e82-9359-ae879a76b274.dat' 'localbam.bam' && ln -s '/tmp/tmp6lbm2yua/files/_metadata_files/1/a/1/metadata_1a1759c2-d97f-41b5-8460-604d89109b6b.dat' 'localbam.bam.bai' && bamtools split -mapped -in localbam.bam -stub split_bam

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "bam"
              __workflow_invocation_uuid__ "cbd54a0ed1f111f0ae2b7ced8d8023f5"
              chromInfo "/cvmfs/data.galaxyproject.org/managed/len/ucsc/apiMel3.len"
              dbkey "apiMel3"
          • Job 2:

            • Job state is ok

            Command Line:

            • ln -s '/tmp/tmp6lbm2yua/files/5/c/b/dataset_5cbf10ca-a189-41bf-a2bf-bd8862624762.dat' 'localbam.bam' && ln -s '/tmp/tmp6lbm2yua/files/_metadata_files/b/d/2/metadata_bd2699ec-24f6-4aeb-8bb5-5f329386c8d0.dat' 'localbam.bam.bai' && bamtools split -mapped -in localbam.bam -stub split_bam

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "bam"
              __workflow_invocation_uuid__ "cbd54a0ed1f111f0ae2b7ced8d8023f5"
              chromInfo "/cvmfs/data.galaxyproject.org/managed/len/ucsc/apiMel3.len"
              dbkey "apiMel3"
      • Step 7: Flatten collection:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "cbd54a0ed1f111f0ae2b7ced8d8023f5"
              input {"values": [{"id": 3, "src": "hdca"}]}
              join_identifier "_"
      • Step 8: toolshed.g2.bx.psu.edu/repos/iuc/samtools_fastx/samtools_fastx/1.21+galaxy0:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • addthreads=${GALAXY_SLOTS:-1} && (( addthreads-- )) &&  samtools sort -@ $addthreads -n '/tmp/tmp6lbm2yua/files/9/4/2/dataset_942a870c-8ec0-4806-b233-981750cae363.dat' -T "${TMPDIR:-.}" > input &&   samtools fastq       -f 0   -F 2304   -G 0  input  > output.fastqsanger && ln -s output.fastqsanger output

            Exit Code:

            • 0

            Standard Error:

            • [M::bam2fq_mainloop] discarded 0 singletons
              [M::bam2fq_mainloop] processed 24862 reads
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "cbd54a0ed1f111f0ae2b7ced8d8023f5"
              chromInfo "/cvmfs/data.galaxyproject.org/managed/len/ucsc/apiMel3.len"
              copy_arb_tags None
              copy_tags false
              dbkey "apiMel3"
              exclusive_filter ["256", "2048"]
              exclusive_filter_all None
              idxout_cond {"__current_case__": 0, "idxout_select": "no"}
              inclusive_filter None
              output_fmt_cond {"__current_case__": 0, "default_quality": null, "ilumina_casava": false, "output_fmt_select": "fastqsanger", "output_quality": false}
              outputs "other"
              read_numbering ""
          • Job 2:

            • Job state is ok

            Command Line:

            • addthreads=${GALAXY_SLOTS:-1} && (( addthreads-- )) &&  samtools sort -@ $addthreads -n '/tmp/tmp6lbm2yua/files/1/9/b/dataset_19b6b2ee-9ee3-43d7-8814-fec586098884.dat' -T "${TMPDIR:-.}" > input &&   samtools fastq       -f 0   -F 2304   -G 0  input  > output.fastqsanger && ln -s output.fastqsanger output

            Exit Code:

            • 0

            Standard Error:

            • [M::bam2fq_mainloop] discarded 0 singletons
              [M::bam2fq_mainloop] processed 16941 reads
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "cbd54a0ed1f111f0ae2b7ced8d8023f5"
              chromInfo "/cvmfs/data.galaxyproject.org/managed/len/ucsc/apiMel3.len"
              copy_arb_tags None
              copy_tags false
              dbkey "apiMel3"
              exclusive_filter ["256", "2048"]
              exclusive_filter_all None
              idxout_cond {"__current_case__": 0, "idxout_select": "no"}
              inclusive_filter None
              output_fmt_cond {"__current_case__": 0, "default_quality": null, "ilumina_casava": false, "output_fmt_select": "fastqsanger", "output_quality": false}
              outputs "other"
              read_numbering ""
      • Step 9: MultiQC:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • die() { echo "$@" 1>&2 ; exit 1; } &&  mkdir multiqc_WDir &&   mkdir multiqc_WDir/qualimap_0 &&  sample="$(grep 'bam file = ' /tmp/tmp6lbm2yua/files/b/b/9/dataset_bb9f461e-a9cd-4fe9-862f-a05c8fe52b2e.dat | sed 's/bam file = //g' | sed 's: ::g')" && dir_name="multiqc_WDir/qualimap_0/${sample}" && mkdir -p ${dir_name} && filepath_1="${dir_name}/genome_results.txt" && ln -sf '/tmp/tmp6lbm2yua/files/b/b/9/dataset_bb9f461e-a9cd-4fe9-862f-a05c8fe52b2e.dat' ${filepath_1} && nested_dir_name="${dir_name}/raw_data_qualimapReport/" && mkdir -p ${nested_dir_name} && filepath_2="${nested_dir_name}/coverage_histogram.txt" && ln -sf '/tmp/tmp6lbm2yua/files/c/2/0/dataset_c20ae67a-7924-4a77-84af-3e135378cd00.dat' ${filepath_2} && nested_dir_name="${dir_name}/raw_data_qualimapReport/" && mkdir -p ${nested_dir_name} && filepath_3="${nested_dir_name}/mapped_reads_gc-content_distribution.txt" && ln -sf '/tmp/tmp6lbm2yua/files/d/e/6/dataset_de65f4dc-20e4-40d4-b06f-c87c8b03c094.dat' ${filepath_3} && sample="$(grep 'bam file = ' /tmp/tmp6lbm2yua/files/6/3/c/dataset_63cc325a-03f9-4ca3-abbb-4680a263aaf8.dat | sed 's/bam file = //g' | sed 's: ::g')" && dir_name="multiqc_WDir/qualimap_0/${sample}" && mkdir -p ${dir_name} && filepath_1="${dir_name}/genome_results.txt" && ln -sf '/tmp/tmp6lbm2yua/files/6/3/c/dataset_63cc325a-03f9-4ca3-abbb-4680a263aaf8.dat' ${filepath_1} && nested_dir_name="${dir_name}/raw_data_qualimapReport/" && mkdir -p ${nested_dir_name} && filepath_2="${nested_dir_name}/coverage_histogram.txt" && ln -sf '/tmp/tmp6lbm2yua/files/4/4/a/dataset_44a4239e-aaed-42c5-b3d1-5483008c5bd5.dat' ${filepath_2} && nested_dir_name="${dir_name}/raw_data_qualimapReport/" && mkdir -p ${nested_dir_name} && filepath_3="${nested_dir_name}/mapped_reads_gc-content_distribution.txt" && ln -sf '/tmp/tmp6lbm2yua/files/9/3/f/dataset_93f22b45-b5ec-4ff8-8414-0b0778fc979e.dat' ${filepath_3} &&    multiqc multiqc_WDir --filename 'report'  --title 'HostContamination Removal'      && mkdir -p ./plots && ls -l ./report_data/ && cp ./report_data/*plot*.txt ./plots/ | true

            Exit Code:

            • 0

            Standard Error:

            • /// MultiQC 🔍 v1.27
              
                   update_config | Report title: HostContamination Removal
                   version_check | MultiQC Version v1.32 now available!
                     file_search | Search path: /tmp/tmp6lbm2yua/job_working_directory/000/12/working/multiqc_WDir
              
                        qualimap | Found 2 BamQC reports
              
                   write_results | Data        : report_data
                   write_results | Report      : report.html
                         multiqc | MultiQC complete
              cp: cannot stat './report_data/*plot*.txt': No such file or directory
              

            Standard Output:

            • total 188
              -rw-r--r-- 1 1001 1001    531 Dec  5 15:50 multiqc.log
              -rw-r--r-- 1 1001 1001    186 Dec  5 15:50 multiqc_citations.txt
              -rw-r--r-- 1 1001 1001 158476 Dec  5 15:50 multiqc_data.json
              -rw-r--r-- 1 1001 1001    704 Dec  5 15:50 multiqc_general_stats.txt
              -rw-r--r-- 1 1001 1001    422 Dec  5 15:50 multiqc_qualimap_bamqc_genome_results.txt
              -rw-r--r-- 1 1001 1001   1136 Dec  5 15:50 multiqc_sources.txt
              -rw-r--r-- 1 1001 1001    246 Dec  5 15:50 qualimap_coverage_histogram.txt
              -rw-r--r-- 1 1001 1001   2552 Dec  5 15:50 qualimap_gc_content.txt
              -rw-r--r-- 1 1001 1001    759 Dec  5 15:50 qualimap_genome_fraction.txt
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "cbd54a0ed1f111f0ae2b7ced8d8023f5"
              chromInfo "/cvmfs/data.galaxyproject.org/managed/len/ucsc/apiMel3.len"
              comment ""
              dbkey "apiMel3"
              export false
              flat false
              image_content_input None
              results [{"__index__": 0, "software_cond": {"__current_case__": 20, "input": {"values": [{"id": 7, "src": "hdca"}]}, "software": "qualimap"}}]
              title "Host/Contamination Removal"
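Note the masked error in this job's stderr: `cp: cannot stat './report_data/*plot*.txt': No such file or directory`. The wrapper's trailing `| true` swallows the non-zero exit when MultiQC emits no `*plot*.txt` tables (as here, where the qualimap tables use different names). A hedged sketch of a glob-safe alternative, assuming bash and hypothetical directory names matching the wrapper's layout:

```shell
#!/usr/bin/env bash
# Glob-safe copy: with nullglob, an unmatched pattern expands to an empty
# array instead of the literal string, so cp is only invoked when there is
# something to copy. Directory names mirror the wrapper's ./report_data
# and ./plots; created here only so the sketch is self-contained.
mkdir -p report_data plots
shopt -s nullglob
tables=(report_data/*plot*.txt)
if (( ${#tables[@]} > 0 )); then
    cp "${tables[@]}" plots/
else
    echo "no plot tables to copy"
fi
```

This keeps the job green without relying on `| true` to mask every possible `cp` failure (including real ones such as a missing `plots/` directory).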
    • Other invocation details
      • history_id

        • 4eae5114ab312710
      • history_state

        • ok
      • invocation_id

        • 4eae5114ab312710
      • invocation_state

        • scheduled
      • workflow_id

        • 4eae5114ab312710
Passed Tests
  • ✅ host-or-contamination-removal-on-short-reads.ga_0

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: Short-reads:

        • step_state: scheduled
      • Step 2: Host/Contaminant Reference Genome:

        • step_state: scheduled
      • Step 3: Bowtie2:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • set -o | grep -q pipefail && set -o pipefail;   ln -f -s '/tmp/tmpakt3d503/files/0/1/0/dataset_010887f7-8c2a-4f52-b640-8287c47d9112.dat' input_f.fastq.gz &&  ln -f -s '/tmp/tmpakt3d503/files/d/f/5/dataset_df519121-ede9-434c-bc12-6a8617b966da.dat' input_r.fastq.gz &&    THREADS=${GALAXY_SLOTS:-4} && if [ "$THREADS" -gt 1 ]; then (( THREADS-- )); fi &&   bowtie2  -p "$THREADS"  -x '/cvmfs/data.galaxyproject.org/byhand/hg38/hg38full/bowtie2_index/hg38full'   -1 'input_f.fastq.gz' -2 'input_r.fastq.gz' --un-conc-gz '/tmp/tmpakt3d503/job_working_directory/000/3/outputs/dataset_78be4b6d-81bd-49a9-a7fc-31226db3d9e0.dat'                2> >(tee '/tmp/tmpakt3d503/job_working_directory/000/3/outputs/dataset_f5653d43-a8bc-406e-bfdb-ee3303ba91e9.dat' >&2)  | samtools sort -l 0 -T "${TMPDIR:-.}" -O bam | samtools view --no-PG -O bam -@ ${GALAXY_SLOTS:-1} -o '/tmp/tmpakt3d503/job_working_directory/000/3/outputs/dataset_fa253e38-9dde-4dc8-bed0-96f3ff77a3b7.dat'  && mv '/tmp/tmpakt3d503/job_working_directory/000/3/outputs/dataset_78be4b6d-81bd-49a9-a7fc-31226db3d9e0.1.dat' '/tmp/tmpakt3d503/job_working_directory/000/3/outputs/dataset_78be4b6d-81bd-49a9-a7fc-31226db3d9e0.dat' && mv '/tmp/tmpakt3d503/job_working_directory/000/3/outputs/dataset_78be4b6d-81bd-49a9-a7fc-31226db3d9e0.2.dat' '/tmp/tmpakt3d503/job_working_directory/000/3/outputs/dataset_9c2499c0-a007-4b4c-8f1a-4e3a1ec7be6e.dat'

            Exit Code:

            • 0

            Standard Error:

            • 9462 reads; of these:
                9462 (100.00%) were paired; of these:
                  9462 (100.00%) aligned concordantly 0 times
                  0 (0.00%) aligned concordantly exactly 1 time
                  0 (0.00%) aligned concordantly >1 times
                  ----
                  9462 pairs aligned concordantly 0 times; of these:
                    0 (0.00%) aligned discordantly 1 time
                  ----
                  9462 pairs aligned 0 times concordantly or discordantly; of these:
                    18924 mates make up the pairs; of these:
                      18924 (100.00%) aligned 0 times
                      0 (0.00%) aligned exactly 1 time
                      0 (0.00%) aligned >1 times
              0.00% overall alignment rate
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "5a13a61cd1f211f0ae2b00224808a459"
              analysis_type {"__current_case__": 0, "analysis_type_selector": "simple", "presets": "no_presets"}
              chromInfo "/tmp/tmpakt3d503/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              library {"__current_case__": 2, "aligned_file": false, "input_1": {"values": [{"id": 1, "src": "dce"}]}, "paired_options": {"__current_case__": 1, "paired_options_selector": "no"}, "type": "paired_collection", "unaligned_file": true}
              reference_genome {"__current_case__": 0, "index": "hg38", "source": "indexed"}
              rg {"__current_case__": 3, "rg_selector": "do_not_set"}
              sam_options {"__current_case__": 1, "sam_options_selector": "no"}
              save_mapping_stats true
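The two `mv` commands at the end of the Bowtie2 command line exist because `--un-conc-gz PATH` makes bowtie2 insert `.1`/`.2` before the final extension of `PATH` (here `…e0.dat` becomes `…e0.1.dat` and `…e0.2.dat`, the forward and reverse unaligned mates), while Galaxy expects each dataset at its exact declared path. A sketch of that rename step, using hypothetical placeholder names and empty files in place of bowtie2's output:

```shell
#!/usr/bin/env bash
# Stand-ins for what bowtie2 would write for --un-conc-gz out.dat:
# out.1.dat (forward unaligned mates) and out.2.dat (reverse unaligned mates).
workdir=$(mktemp -d) && cd "$workdir"
touch out.1.dat out.2.dat

# The wrapper then moves each mate file onto its Galaxy dataset path
# (hypothetical names here; the real paths are the dataset_*.dat files above).
mv out.1.dat forward_unaligned.dat
mv out.2.dat reverse_unaligned.dat
ls
```

Because the run above had a 0.00% alignment rate against hg38, both renamed files contain every input pair, which is the expected outcome for this host-removal test where none of the reads are human.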
      • Step 4: Create a paired collection:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __workflow_invocation_uuid__ "5a13a61cd1f211f0ae2b00224808a459"
      • Step 5: MultiQC:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • die() { echo "$@" 1>&2 ; exit 1; } &&  mkdir multiqc_WDir &&   mkdir multiqc_WDir/bowtie2_0 &&         grep -Pq '% overall alignment rate' /tmp/tmpakt3d503/files/f/5/6/dataset_f5653d43-a8bc-406e-bfdb-ee3303ba91e9.dat || die "Module 'bowtie2: '% overall alignment rate' not found in the file 'pair'" && ln -s '/tmp/tmpakt3d503/files/f/5/6/dataset_f5653d43-a8bc-406e-bfdb-ee3303ba91e9.dat' 'multiqc_WDir/bowtie2_0/pair'  &&      multiqc multiqc_WDir --filename 'report'  --title 'Host Removal'      && mkdir -p ./plots && ls -l ./report_data/ && cp ./report_data/*plot*.txt ./plots/ | true

            Exit Code:

            • 0

            Standard Error:

            • /// MultiQC 🔍 v1.27
              
                   update_config | Report title: Host Removal
                   version_check | MultiQC Version v1.32 now available!
                     file_search | Search path: /tmp/tmpakt3d503/job_working_directory/000/5/working/multiqc_WDir
              
                         bowtie2 | Found 1 reports
              
                   write_results | Data        : report_data
                   write_results | Report      : report.html
                         multiqc | MultiQC complete
              

            Standard Output:

            • total 56
              -rw-r--r-- 1 1001 1001    43 Dec  5 15:56 bowtie2_pe_plot.txt
              -rw-r--r-- 1 1001 1001   511 Dec  5 15:56 multiqc.log
              -rw-r--r-- 1 1001 1001   368 Dec  5 15:56 multiqc_bowtie2.txt
              -rw-r--r-- 1 1001 1001   245 Dec  5 15:56 multiqc_citations.txt
              -rw-r--r-- 1 1001 1001 30791 Dec  5 15:56 multiqc_data.json
              -rw-r--r-- 1 1001 1001    55 Dec  5 15:56 multiqc_general_stats.txt
              -rw-r--r-- 1 1001 1001   151 Dec  5 15:56 multiqc_sources.txt
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "5a13a61cd1f211f0ae2b00224808a459"
              chromInfo "/cvmfs/data.galaxyproject.org/managed/len/ucsc/hg38.len"
              comment ""
              dbkey "hg38"
              export false
              flat false
              image_content_input None
              results [{"__index__": 0, "software_cond": {"__current_case__": 3, "input": {"values": [{"id": 5, "src": "hdca"}]}, "software": "bowtie2"}}]
              title "Host Removal"
    • Other invocation details
      • history_id

        • 5fc334fd834f9169
      • history_state

        • ok
      • invocation_id

        • 5fc334fd834f9169
      • invocation_state

        • scheduled
      • workflow_id

        • 5fc334fd834f9169

@mvdbeek mvdbeek merged commit c911863 into galaxyproject:main Dec 5, 2025
8 checks passed
@bebatut bebatut deleted the host-contamination-removal branch December 5, 2025 17:52