HISAT2.wdl: replace output command substitutions with explicit fifo/wait#233
HISAT2.wdl: replace output command substitutions with explicit fifo/wait#233
Conversation
gbggrant
left a comment
There was a problem hiding this comment.
Looks good. Could you provide a test (or a set of example inputs) that can be run to verify that this works?
|
Given the wide impact to multiple pipelines of changes to this step it would be great if we had a test specifically for this step |
|
Hi @mlin - Any chance you're planning to add a test to this PR? I suspect it's also a bit out of date with the codebase at this point. Let us know if you don't have the right access to add a test. |
|
Hi @kbergin, all, The new lines of code proposed here run unconditionally and thus, should be exercised in any existing tests which call on the HISAT2 tasks. They're a mechanical restructuring of the existing commands, eliminating the concurrency race condition described above. I think it would be disproportionately difficult to design a test that specifically provokes the race condition. |
The hisat2 tasks stream output into samtools to avoid having to materialize a giant text SAM file on the scratch disk. This is a good idea but it's implemented in a slightly risky way, using an output command substitution like
hisat2 ... -S >(samtools view -o output.bam ...). In this construct samtools is spawned as a background process, and bash does not wait for it before proceeding to the next command or exiting at the end of the script. Furthermore according to this Q&A it does not even provide a way to wait for it!This creates a race condition where the next step is liable to start reading a partial BAM file, including the runtime system potentially outputting a truncated file (cf. chanzuckerberg/miniwdl#211).
Here we replace the output command substitutions with a less-elegant but hopefully reliable construct, which allows us to explicitly wait for samtools before proceeding.