
📋 FlyDSL upstream tracker: jhinpan issues & PRs #7

@jhinpan

Living tracker for all FlyDSL (ROCm/FlyDSL) issues filed and PRs opened by @jhinpan. Last updated 2026-06-17.

Tally: 11 issues, 22 PRs (= 18 distinct PR work items; 4 PRs are closed->reopened duplicates, see ♻️).

Recent tracker corrections since 2026-06-11: PR #639 and PR #675 are now merged; PR #670 is closed unmerged because merged PR #683 absorbed the relevant flash-attn work; PR #637 is closed unmerged; issue #614 is closed.

Live refresh 2026-06-17: PR #685 is no longer a draft and is now ready for review (retitled to "batch-aware dense seq_len routing (DUALWAVE_SWP vs generic)"; mergeable, review required, no CI checks reported yet). No new jhinpan-authored issues or PRs since the last update; all other PR and issue states below are unchanged and were re-verified against GitHub.


🟢 Open PRs / review-CI state

  • FMHA gfx950: batch-aware dense seq_len routing (DUALWAVE_SWP vs generic) (ready for review) - PR #685 (no issue) · un-drafted 2026-06-17; mergeable, review required (blocked by branch protection), no CI checks reported yet
  • Complete BasisAttr support in IntTupleBuilder - PR #638 → issue #574 · GitHub marks it conflicting / dirty against main; last CI run is stale (mixed pass/fail across reruns, untouched since 2026-06-11); needs a rebase before merge review can proceed
  • Layout-algebra inference diagnostics / verifiers (draft) - PR #648 → issue #583 · review required, no checks reported yet

✅ Merged

  • rmsnorm: known_block_size on large-M small-N path - PR #639 → fixed issue #614 · merged 2026-06-16; #614 closed 2026-06-16
  • CI performance dashboard at /ci-dashboard - PR #675 (no issue) · merged 2026-06-16
  • Onboarding notebooks (2/n): layout algebra - PR #665 → issue #573 · merged 2026-06-11; #573 kept open for remaining notebooks
  • Onboarding notebooks (1/n): expr foundation - PR #635 → issue #573 · merged 2026-06-06; #573 kept open for remaining notebooks
  • run_benchmark: report base norm bandwidth, not the last variant - PR #654 → fixed issue #655 · merged 2026-06-06; reported CI layernorm bandwidth 1.69 -> 5.6 TB/s
  • docs: refresh CLAUDE.md (2-month changes + expr-neutral / helper-placement conventions) - PR #659 (no issue) · merged 2026-06-06
  • softmax: enable vectorized fast path - PR #650 → fixed issue #627 · merged 2026-06-04
  • moe_blockscale: fix broken e2e test - PR #643 → fixed issue #642 · merged 2026-06-04
  • vec_add: wider vector copy chunks - PR #564 → issue #563 (issue by others) · merged 2026-06-02
  • CI: allow PR authors to request failed CI reruns - PR #602 (no issue) · merged 2026-06-02
  • docs: sync compile pipeline references - PR #572 (no issue) · merged 2026-05-27

❌ Closed unmerged / inactive PRs

  • flash_attn gfx950: dwordx4 O stores + flash-decoding split-K (draft) - PR #670 (no issue) · closed 2026-06-16 unmerged because merged #683 absorbed this line of work, including pieces from the draft such as num_kv_splits plumbing and the gfx950 O-store work; CI was green before close, but the standalone branch is now conflicting / dirty
  • Fix stale JIT dependency caches - PR #637 → issue #453 (issue by others) · closed 2026-06-16 unmerged after a maintainer noted that another PR already fixed it; #453 itself is still open
  • source_loc: finer ROCprof ATT source mapping - PR #593 → issue #587 · closed 2026-06-02 unmerged; #587 was closed via upstream PR #586
  • Include static tensor layouts in JIT cache keys - PR #562 → issue #317 (issue by others) · closed 2026-05-28 unmerged; #317 later closed

πŸ› Issues β€” status

Issue State Addressed by
#655 run_benchmark mislabels layernorm bandwidth closed #654 βœ…
#653 device printf invisible in Jupyter / piped stdout open β€” (no PR yet)
#642 moe_blockscale e2e harness bug closed #643 βœ…
#627 softmax fast path dead-coded off closed #650 βœ…
#614 rmsnorm large-M small-N crash closed #639 βœ…
#612 Discussion: make autotune usable open β€” (discussion)
#587 ATT source-location granularity closed upstream #586 βœ…; jhinpan #593 ❌ closed unmerged
#585 const-folded fp8 cast rounding closed β€” (working-as-intended)
#583 layout-algebra op verification open #648 🟒 draft
#574 Complete BasisAttr support open #638 🟑 open; conflicts with main (CI stale), needs rebase
#573 Onboarding Jupyter notebook open #635 βœ… (1/n merged), #665 βœ… (2/n merged); open for remaining notebooks

♻️ Superseded PR pairs (closed -> reopened, same work)

The old head repo was deleted during a fork rename, so GitHub lost the head association and these were recreated on the same branch:

  • #615 -> #639 ✅ merged (issue #614 closed)
  • #605 -> #638 🟡 open; conflicts with main (issue #574 open)
  • #584 -> #635 ✅ merged (issue #573 still open for remaining notebooks)
  • #565 -> #637 ❌ closed unmerged after a maintainer said another PR already fixed it (issue #453 still open)
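
As a quick sanity check, the tally at the top (11 issues, 22 PRs = 18 distinct work items + 4 closed -> reopened duplicates) can be re-derived from the lists above. This is only an illustrative sketch: the PR and issue numbers are hand-copied from this tracker, and the variable names are made up for the check, not part of any FlyDSL or GitHub tooling.

```python
# Hand-copied from the sections of this tracker (illustrative only).
open_prs = {685, 638, 648}
merged_prs = {639, 675, 665, 635, 654, 659, 650, 643, 564, 602, 572}
closed_unmerged_prs = {670, 637, 593, 562}

# Closed -> reopened pairs: old PR number -> recreated PR number.
superseded = {615: 639, 605: 638, 584: 635, 565: 637}

# Issues filed by jhinpan; issues filed by others (#563, #453, #317) are excluded.
issues = {655, 653, 642, 627, 614, 612, 587, 585, 583, 574, 573}

# Each distinct work item is counted once, under its current PR number.
distinct_prs = open_prs | merged_prs | closed_unmerged_prs

# Every recreated PR must be a tracked work item, and no old PR may be double-counted.
assert set(superseded.values()) <= distinct_prs
assert not (set(superseded) & distinct_prs)

total_prs = len(distinct_prs) + len(superseded)
print(f"{len(issues)} issues, {total_prs} PRs "
      f"(= {len(distinct_prs)} distinct PR work items; {len(superseded)} duplicates)")
# prints: 11 issues, 22 PRs (= 18 distinct PR work items; 4 duplicates)
```

Keeping the superseded pairs as a mapping (rather than a flat set) makes it easy to confirm that each old PR number points at a work item that is still tracked under its new number.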

📌 Follow-ups to decide

  • #638 / #574: rebase or refresh the branch; GitHub marks the PR conflicting / dirty and its CI is stale.
  • #685: the flash-attn/FMHA routing PR is now ready for review (un-drafted 2026-06-17). Next: nudge a reviewer and confirm CI starts on the self-hosted GPU runners.
  • #648 / #583: draft verifier PR with no checks yet; decide whether to continue or rescope.
  • #670: closed flash_attn perf draft; no standalone revival is needed for the absorbed work because merged #683 already picked up the relevant pieces. Revisit only if a gap remains after #683.
  • #637 / #453: the PR was closed as already fixed elsewhere, but the linked issue remains open; decide whether to verify the fix and close the issue, or ask upstream.
  • #653 (device printf invisible in Jupyter): no PR yet; relates to onboarding-notebook UX. Decide whether to drive a fix (libc stdout buffering).
  • #612 (autotune): discussion; decide whether to drive it to a PR.
  • #573 (onboarding notebook series): 1/n and 2/n are merged; continue with the remaining notebooks.
  • #585 (const-fold fp8 cast): resolved; closed as working-as-intended (apply fp8/bf16 rounding to runtime values, not via a compile-time constant round-trip).
