HISAT2.wdl: replace output command substitutions with explicit fifo/wait by mlin · Pull Request #233 · HumanCellAtlas/skylab

mlin · 2019-07-18T21:43:00Z

The hisat2 tasks stream output into samtools to avoid materializing a giant text SAM file on the scratch disk. This is a good idea, but it's implemented in a slightly risky way, using an output command substitution (bash calls this process substitution) like hisat2 ... -S >(samtools view -o output.bam ...). In this construct, samtools is spawned as a background process, and bash does not wait for it before proceeding to the next command or exiting at the end of the script. Furthermore, according to this Q&A, bash does not even provide a way to wait for it!

This creates a race condition: the next step is liable to start reading a partial BAM file, and the runtime system may even collect a truncated file as the task output (cf. chanzuckerberg/miniwdl#211).

Here we replace the output command substitutions with a less-elegant but hopefully reliable construct, which allows us to explicitly wait for samtools before proceeding.
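For readers without the diff at hand, here is a minimal sketch of the general fifo/wait pattern, not the exact commands from this PR. The `produce` and `consume` functions are hypothetical stand-ins (here `seq` and `gzip`) for hisat2 writing SAM text and samtools compressing it; the paths and variable names are also assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical stand-ins, NOT the PR's commands: `produce` plays the role of
# hisat2 emitting text, `consume` plays the role of samtools compressing it.
workdir=$(mktemp -d)
fifo="$workdir/stream.fifo"
out="$workdir/output.gz"

produce() { seq 1 100000; }
consume() { gzip -c > "$out"; }

# A named pipe carries the stream, so no large intermediate file hits disk.
mkfifo "$fifo"

# Start the consumer in the background and remember its PID.
consume < "$fifo" &
consumer_pid=$!

# Run the producer in the foreground, writing into the FIFO.
produce > "$fifo"

# Explicitly block until the consumer has finished writing its output.
# Under `set -e`, a non-zero consumer exit status aborts the script here.
wait "$consumer_pid"

rm -f "$fifo"

# Only now is it safe for a later step to read the output file.
lines=$(gzip -dc "$out" | wc -l | tr -d ' ')
echo "consumer finished; output holds $lines lines"
```

In the real tasks, the producer would presumably be hisat2 with -S pointed at the FIFO, and the consumer would be samtools view reading from it. The point of the pattern is that the background PID is known, so `wait` can both synchronize on it and surface its exit status.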

gbggrant

Looks good. Could you provide a test (or a set of example inputs) that can be run to verify that this works?

barkasn · 2019-07-19T15:40:36Z

Given how widely changes to this step affect multiple pipelines, it would be great to have a test specifically for this step.

kbergin · 2019-11-07T20:54:32Z

Hi @mlin - Any chance you're planning to add a test to this PR? I suspect it's also a bit out of date with the codebase at this point. Let us know if you don't have the right access to add a test.

mlin · 2019-11-08T09:07:04Z

Hi @kbergin, all,

The new lines of code proposed here run unconditionally, and thus should be exercised by any existing tests that call the HISAT2 tasks. They're a mechanical restructuring of the existing commands, eliminating the concurrency race condition described above. I think it would be disproportionately difficult to design a test that specifically provokes the race condition.

kbergin · 2019-11-09T10:48:53Z

That seems reasonable to me, Mike, but I was mostly just bumping the PR to help get it resolved. @barkasn @gbggrant opinions here?

HISAT2.wdl: replace output command substitutions with explicit fifo/wait

e9f5177

mlin requested a review from mckinsel July 18, 2019 21:43

[squash] fix typo

e225018

pullapprove bot requested a review from gbggrant July 18, 2019 22:22

gbggrant approved these changes Jul 19, 2019

mlin and others added 2 commits November 13, 2019 13:59

Merge branch 'master' into mlin-hisat2-command-sub

5b2264e

Merge branch 'master' into mlin-hisat2-command-sub

fa7524a

khajoue2 approved these changes Jul 20, 2020

mlin wants to merge 4 commits into master from mlin-hisat2-command-sub

6 participants