
@jatinwadhwa921

Backmerging with Msft commits

zhaoxul-qti and others added 30 commits April 24, 2025 09:09
### Description
Add support for the Upsample operator to the op builder in QNN-EP.

### Motivation and Context
- Enhance QNN-EP support for Upsample operator.
- Add unit test for Upsample operator in QNN-EP.
### Description
Add 8-bit support for MatMulNBits on x86

__AVX512 VNNI__
| M | N | K | 8-bit Time (ns) | 4-bit Time (ns) | Slow down (8-bit / 4-bit) |
|:-----:|:-------:|:-------:|:----------------:|:----------------:|:------------------------:|
| 1 | 4096 | 4096 | 34145 | 27723 | **1.23×** |
| 1 | 11008 | 4096 | 415285 | 68656 | **6.05×** |
| 1 | 4096 | 11008 | 407801 | 68061 | **5.99×** |
| 1 | 11008 | 11008 | 2674538 | 1003532 | **2.67×** |
| 4096 | 4096 | 4096 | 80338759 | 86321713 | **0.93×** |
| 4096 | 11008 | 4096 | 213421935 | 225245276 | **0.95×** |
| 4096 | 4096 | 11008 | 240164365 | 228966953 | **1.05×** |
| 4096 | 11008 | 11008 | 628352046 | 596738340 | **1.05×** |

__AVX512__
| M | N | K | 8-bit Time (ns) | 4-bit Time (ns) | Slow down (8-bit / 4-bit) |
|:-----:|:-------:|:-------:|:----------------:|:----------------:|:------------------------:|
| 1 | 4096 | 4096 | 53324 | 37882 | **1.41×** |
| 1 | 11008 | 4096 | 244560 | 103255 | **2.37×** |
| 1 | 4096 | 11008 | 435131 | 95734 | **4.55×** |
| 1 | 11008 | 11008 | 2790710 | 1075216 | **2.60×** |
| 4096 | 4096 | 4096 | 200629000 | 132841540 | **1.51×** |
| 4096 | 11008 | 4096 | 532141914 | 350613184 | **1.52×** |
| 4096 | 4096 | 11008 | 544011977 | 351679619 | **1.55×** |
| 4096 | 11008 | 11008 | 1421865147 | 925593210 | **1.54×** |

Token generation is bottlenecked by memory access; the 8-bit model's 2x size is
the major reason for the token-generation slowdown.

On non-VNNI platforms, the sum of four 8-bit products cannot be held in an int16
intermediate without risking overflow, so extra instructions are needed to avoid
it. This is the major reason for the non-VNNI slowdown.
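
To make the overflow concrete, here is a minimal scalar sketch (not the MLAS kernel; it assumes the usual u8 activation x s8 weight convention) showing why even two 8-bit products exceed the int16 range while the 4-bit case stays comfortably inside it:

```cpp
// Worst-case magnitudes for the intermediate sums that the u8 x s8
// dot-product instructions accumulate in int16 on non-VNNI hardware.
#include <cstdint>
#include <cstdio>

int main() {
  // 4-bit weights: |q - zp| <= 15, activation <= 255.
  int pair_4bit = 255 * 15 + 255 * 15;    // 7650   -> fits in int16
  // 8-bit weights: a single product is already close to INT16_MAX.
  int pair_8bit = 255 * 127 + 255 * 127;  // 64770  -> overflows int16
  // VNNI (VPDPBUSD) sums four such products directly into an int32 accumulator.
  int quad_8bit = 4 * 255 * 127;          // 129540 -> fine in int32

  std::printf("4-bit pair: %d, 8-bit pair: %d (INT16_MAX=%d), 8-bit quad in int32: %d\n",
              pair_4bit, pair_8bit, INT16_MAX, quad_8bit);
  return 0;
}
```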

### Motivation and Context
The MatMul4Bits model has a repetition issue; the 6b model resolved this issue.
This PR fixes an incorrect input/output shape. Following the [DML EP's
implementation](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/dml/DmlExecutionProvider/src/Operators/DmlOperatorRotaryEmbedding.cpp#L142C47-L142C94),
we should ensure the input shape is [batch_size, sequence_length,
num_heads, head_size].
…osoft#24533)

### Description

This PR fixes the program variable data type and revises `ProgramInput`:
- add support for int4/uint4
- fix inconsistency in handling the number of components for int8/uint8
in `ToProgramVariableDataType`
- add a constructor for `ProgramInput` to allow "flattening" the shape
easily
- fix DequantizeLinear
…wnstream node is not QuantizeLinear (microsoft#24537)

### Description
Updates the WeightBiasQuantization optimizer to skip processing on
Conv/Gemm nodes if the downstream child node is not a QuantizeLinear.

#### Before this PR
Original graph:
```
input_0 -> DQ -> Conv -> graph_output (or non-Q node)
                 ^  ^
                 |  |
weights_f32------+
                    |
bias_f32------------+
```
Becomes:

```
input_0 -> DQ ------> Conv -> graph_output (or non-Q node)
                      ^  ^
                      |  |
weights_quant -> DQ --+
                         |
bias_quant -> DQ --------+
```
The above is **NOT** a valid QDQ node unit for Conv because the Conv's
output is not consumed by a QuantizeLinear node.

#### With this PR
The above example graph remains unchanged after L1 optimizations:
```
input_0 -> DQ -> Conv -> graph_output (or non-Q node)
                 ^  ^
                 |  |
weights_f32------+
                    |
bias_f32------------+
```


### Motivation and Context
The previous behavior caused inaccuracy for a customer model. Automatically quantizing the
weights and biases of a Conv/Gemm is detrimental if the output of the
Conv/Gemm is not consumed by a QuantizeLinear node. In this scenario,
the whole node group is not considered a valid QDQ node unit, and so the
EP has to run the Conv/Gemm as float32/float16 anyway. If the Conv/Gemm
is running as float32/float16, then quantizing the weights and biases
introduces inaccuracy for no gain.

PR that originally added this optimizer:
microsoft#22969
### Description
<!-- Describe your changes. -->
Add wrappers for the AutoEP C API changes to the C++ API.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
…24525)

### Description
<!-- Describe your changes. -->

An additional check for non-constant inputs was added to
ConvActivationFusion in microsoft#20282. This was to avoid fusing an Add in a
Conv+Add+Relu that has another non-constant input.


https://github.com/microsoft/onnxruntime/blob/6c8cb6a6d1993f84fcf4008f468a071c0b73aad3/onnxruntime/core/optimizer/conv_activation_fusion.cc#L26-L39

However, this check fails to account for implicit inputs and will read
past the end of a node's explicit input defs if any implicit inputs are
present.

Moreover, this check is no longer necessary after microsoft#19470 removed
Conv+Add+Relu fusion from ConvActivationFusion.

This change removes the check and some other unused code.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Fix microsoft#24473.
…crosoft#24492)

### Description
<!-- Describe your changes. -->

Fix MatMulScaleFusion handling of scales with leading dimensions. The
previous approach accepted a Mul/Div with a scale that broadcasted
additional leading dimensions to its output shape. This caused a shape
mismatch in the fused replacement.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Fix microsoft#24407.
### Description
Fixes a segfault that occurs when an EP library is re-loaded in the same
process.


### Motivation and Context
A recent [PR](microsoft#24430)
updated the Environment to unload all EP libraries on destruction of
`OrtEnv`, but we forgot to update the state to mark the EP library
as unloaded. This caused a segfault when the EP library was
re-loaded.
Fixed the bug in microsoft#24228 which caused incorrect results for Phi models
when flash attention is disabled.
…nt, etc. (microsoft#24527)

### Description
Fixed a few issues related to Conv2dMM and MatMul in the Native WebGPU
backend.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
Support 8 bits in MatMulNBits cuda kernel.

The `MatMulFloat8bKernel` CUDA kernel performs a matrix-vector
multiplication (GEMM) where the matrix B is quantized per block using
8-bit integers.

The kernel computes $Output = A \times B$, where:
* $A$ is a row vector (shape `[M, K]`) of type `T` (`float` or `half`).
* $B$ is a matrix (shape `[K, N]`) quantized using 8-bit unsigned
integers (`uint8_t`) with a block structure. It's stored as `[N,
K/block_size, block_size]`.
* `scales_data` contains the dequantization scales (shape `[N,
K/block_size]`).
* `zero_points` contains the dequantization zero points (shape `[N,
K/block_size]`), if used (`has_zero_point` is true).
* `output` is the resulting row vector (shape `[M, N]`).

The kernel uses a thread block structure of `(kWarpSize,
kColsPerThreadBlock)`, meaning each block handles `kColsPerThreadBlock`
(which is 8) columns of the output. Each warp within the block is
responsible for one output element (`[m_id, n_id]`). Threads within a
warp cooperate to compute the dot product along the K dimension. Each
thread (`lane_id`) handles `kElementsPerThreadPerIteration` (which is 8)
elements of the K dimension in each step.
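
For reference, the computation the kernel parallelizes can be written as a plain scalar loop (a sketch of the semantics only, assuming the `[N, K/block_size, block_size]` layout for B and a default zero point of 128 when none is provided; the real CUDA code distributes this work across warps as described above):

```cpp
#include <cstdint>

// Scalar reference for Output[m, n] = sum_k A[m, k] * dequant(B[n, k]).
// B is stored as [N, K/block_size, block_size] uint8, with one scale and
// (optionally) one zero point per [n, k-block].
void MatMulNBitsRef(const float* A, const uint8_t* B, const float* scales,
                    const uint8_t* zero_points, float* output,
                    int M, int N, int K, int block_size) {
  const int blocks_per_K = K / block_size;
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      float sum = 0.0f;
      for (int kb = 0; kb < blocks_per_K; ++kb) {
        // Scale/zero point are fetched once per block, as in kKernelAlgo = 2.
        const float scale = scales[n * blocks_per_K + kb];
        const float zp = zero_points
                             ? static_cast<float>(zero_points[n * blocks_per_K + kb])
                             : 128.0f;  // assumed default; check the kernel
        const uint8_t* b_block =
            B + (static_cast<size_t>(n) * blocks_per_K + kb) * block_size;
        for (int i = 0; i < block_size; ++i) {
          const int k = kb * block_size + i;
          sum += A[m * K + k] * (static_cast<float>(b_block[i]) - zp) * scale;
        }
      }
      output[m * N + n] = sum;
    }
  }
}
```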

Here's a breakdown of the three algorithms (`kKernelAlgo`):

1.  **`kKernelAlgo = 0` (Unrolling):**
* **Strategy:** This algorithm processes the K dimension by iterating in
large steps (`k_per_iter = kWarpSize * kElementsPerThreadPerIteration =
32 * 8 = 256`). Inside the main loop, it uses a macro
(`UnRollReduction`) with `#pragma unroll` directives to aggressively
unroll the innermost computations. It tries unrolling factors of 16, 4,
and 1 sequentially to cover as much of the K dimension as possible with
unrolled code.
* **Pros:** Can significantly reduce loop overhead (branching
instructions, counter updates) and expose more instruction-level
parallelism, potentially hiding memory latency.
* **Cons:** Can lead to a large increase in compiled code size (register
pressure, potential instruction cache misses). The effectiveness heavily
depends on the compiler and the specific GPU architecture. The
multi-stage unrolling adds complexity. It requires `k_per_iter` to be a
multiple of `block_size` for correct scale/zp indexing within the
unrolled loop.
* **Performance Expectation:** Potentially the highest performance *if*
the unrolling is effective on the target hardware and doesn't cause
resource issues (registers, cache). Often good for compute-bound or
latency-bound scenarios where loop overhead is a bottleneck.

2.  **`kKernelAlgo = 1` (Simple Loop):**
* **Strategy:** This algorithm also iterates along the K dimension in
steps of `k_per_iter` (256), but uses a simple `for` loop without
explicit `#pragma unroll`. It relies on the compiler's default loop
optimization capabilities.
* **Pros:** Simpler code, smaller code size compared to Algorithm 0.
Less likely to cause register pressure or instruction cache issues.
Easier for the compiler to reason about.
* **Cons:** May incur higher loop overhead compared to effective
unrolling. Performance might be lower if loop overhead is significant.
* **Performance Expectation:** A solid baseline. Might be close to
Algorithm 0 if the compiler performs implicit unrolling effectively, or
faster if Algorithm 0 suffers from code bloat penalties.

3.  **`kKernelAlgo = 2` (Block Size Iteration):**
* **Strategy:** This algorithm changes the iteration strategy
fundamentally. Instead of iterating in fixed steps of `k_per_iter`, it
iterates based on the quantization `block_size`. The outer loop runs
`blocks_per_K` (`K / block_size`) times. Inside this loop, the scale and
zero point for the *entire block* are fetched once per warp. Then, each
thread checks if its assigned K-elements (`lane_offset`) fall within the
current `block_size` chunk and processes them using the fetched
scale/zp.
* **Pros:** Directly aligns with the block quantization data structure.
Fetches scale/zero-point values less frequently (once per `block_size`
chunk per warp), potentially reducing shared memory bank conflicts or
register usage compared to calculating the index (`current_meta_k`) in
every inner step as in Algo 0/1. Might have better memory access
patterns for scale/zp data.
* **Cons:** The outer loop iterates `K / block_size` times. If
`block_size` is small (e.g., 16, 32), this could be many iterations. The
logic inside the loop (`if (current_k_base < k_end_block ...)`) adds
conditional execution.
* **Performance Expectation:** Performance depends heavily on the
`block_size`. If `block_size` is large (e.g., 128, 256), the number of
outer loop iterations is small, and the efficiency gain from fetching
scale/zp once per block might outweigh the overhead. If `block_size` is
small, the overhead of the outer loop might dominate.

**Next Step:**

1. **Profile:** The most reliable way is to benchmark all three
algorithms (`kKernelAlgo = 0, 1, 2`) on your target GPU hardware with
representative input sizes (`N`, `K`), data types (`T`), and
`block_size` values. Use profiling tools like NVIDIA Nsight Compute to
analyze performance metrics (execution time, occupancy, instruction
throughput, memory bandwidth, cache hit rates, register spills).
2.  **Hypothesize based on `block_size`:**
* For **large `block_size`** (e.g., 128, 256), Algorithm 2 might be
competitive or even the best due to efficient scale/ZP handling.
Algorithm 0 could also be very fast.
* For **small `block_size`** (e.g., 16, 32), Algorithm 0 (unroll) or
Algorithm 1 (simple loop) might outperform Algorithm 2 due to lower loop
overhead in the K dimension.
3. Compare performance with TRT LLM FpA IntB GEMM.

### Motivation and Context
4-bit quantization has accuracy loss for some LLMs; more bits are needed for some layers.
…t#24371)

### Description
<!-- Describe your changes. -->
ONNX Runtime manages a number of CPU-based accelerators, i.e. those that
can operate on CPU-based inputs.
However, several of them, like `Qnn`, `Openvino` and `Vitis`, may require
CPU-based inputs to be aligned to 4K so they can be memory mapped, or may
prefer to override the device with their own CPU-accessible allocator.

To mitigate that, we introduce a new CPU-based allocator that produces
4K-aligned memory.

We also adjust the allocation planner to override the plain CPU device. When we
detect a compiled CPU-based EP, we adjust the device accordingly by
requesting the EP to return `OrtMemType::OrtMemTypeCPUInput`. This gives
the EP an opportunity to return either a GPU/NPU device or a CPU device
depending on the mode it is operating in.

We select the device with the larger alignment between CPU default devices.

We also adjust memory patterns to make sure 4K alignment is respected in
the contiguous buffers when appropriate.
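
The alignment part of the new allocator can be sketched with standard C++ as follows (illustrative only; the real allocator plugs into ORT's IAllocator interface and the function names here are made up):

```cpp
#include <cstdlib>
#include <new>

constexpr std::size_t kPageAlignment = 4096;  // 4K, so buffers can be memory mapped / shared

// Round the size up to a multiple of the alignment, as std::aligned_alloc requires,
// then allocate page-aligned memory.
void* AllocAligned4K(std::size_t size) {
  const std::size_t rounded = (size + kPageAlignment - 1) & ~(kPageAlignment - 1);
  void* p = std::aligned_alloc(kPageAlignment, rounded);  // _aligned_malloc on MSVC
  if (p == nullptr) throw std::bad_alloc();
  return p;
}

void FreeAligned4K(void* p) {
  std::free(p);  // _aligned_free on MSVC
}
```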

### Motivation and Context
CPU-based providers notably accept CPU-based inputs, but they require
4K-aligned allocations; otherwise the input incurs an extra copy.
This is especially noticeable with intermediate values that are produced
by upstream CPU-based nodes.

Qnn has its own allocator when it is enabled; we make sure it is correctly advertised to the allocation
planner. This PR excludes Qnn allocator usage for intermediate values
due to the overhead contributed by memhandle management.


Cc: @quic-ashigarg

---------

Co-authored-by: edgchen1 <[email protected]>
### Description
1. Update the GitHub Actions pipelines' triggers and make all of them the
same.
2. Format the YAML files.

Before this change, the pipelines' triggers were set as following:

```
on:
  push:
    branches: [ main, 'rel-*']
  pull_request:
    branches: [ main, 'rel-*']
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```

I set "cancel-in-progress: true" because for pipeline runs triggered by
pull requests if the pull request was updated(a new commit was added
there), the old pipeline runs can be cancelled. However, this setting
doesn't work well for the runs triggered by "push" events for the main
branch. Let's say, we merged a PR , then it triggered this pipeline.
Then before the pipeline is finished, we merged another PR. Then the old
pipeline run will be cancelled. But we do want it to be cancelled. Each
commit in the main branch should be verified.

### Motivation and Context
### Description
<!-- Describe your changes. -->

Fix memleakdbg call stack output.

The call stack output was getting clobbered:

`C:\dev\onnxruntime\build\Debug\_deps\googletest-src\googletest\include\gtest\internal\gtest-port.h(1631):
l\gtest-port.h(1631): eadLocal<testing::Sequence *>::GetOrCreateValue`

I think the issue is that this aliasing of `buffer` and `symbol`:

https://github.com/microsoft/onnxruntime/blob/173a11a4e7a2f7a360c9db6abbe601a06a16f004/onnxruntime/core/platform/windows/debug_alloc.cc#L97-L100

does not play nicely with a call to `_snprintf_s` like this:

https://github.com/microsoft/onnxruntime/blob/173a11a4e7a2f7a360c9db6abbe601a06a16f004/onnxruntime/core/platform/windows/debug_alloc.cc#L115

The clobbered output does not match the predefined, ignored patterns, so
we see spurious mem leak check output.

This change updates the memleakdbg output generation to use C++ ostreams
instead of fixed-size buffers and `_snprintf_s`.
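
Schematically, the aliasing problem and the ostream-based replacement look like this (a simplified sketch, not the actual debug_alloc.cc code):

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Problematic pattern: 'symbol' points into 'buffer', and 'buffer' is also the
// destination of the snprintf-style call, so the source gets overwritten mid-format.
void FormatFrameBroken(char (&buffer)[1024], const char* file, int line, const char* symbol) {
  // symbol may alias buffer here -> clobbered output (original code used _snprintf_s)
  std::snprintf(buffer, sizeof(buffer), "%s(%d): %s", file, line, symbol);
}

// Ostream-based version: build the line in a separate string, so no aliasing is possible.
std::string FormatFrame(const char* file, int line, const char* symbol) {
  std::ostringstream oss;
  oss << file << "(" << line << "): " << symbol;
  return oss.str();
}
```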

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Fix spurious mem leak check output.
Fix microsoft#24535.
### Description
This PR updates ONNX Runtime's LLM conversion tools to use [PyTorch
2.7](https://pytorch.org/blog/pytorch-2-7/) and reduces memory usage
during export.

### Motivation and Context
Importing the `transformers` package with `import transformers` will
take a long time because of the many namespaces it has at the top level.
It is more efficient to only import the desired class names.
Additionally, the benchmarking of the PyTorch model includes the deep
copy of the inputs when it does not need to. The deep copy can be
performed before measuring latency.
…icrosoft#24196)

### Description
During inference, using the QNN EP option to set enable_htp_shared_memory_allocator gives a hint that we use RPC allocated buffers to avoid buffer copy between CPU and NPU.

With the current PR, we add hints in the compilation phase that if RPC memory is going to be used, any additional  allocations done on the CPU can be avoided.

### Motivation and Context
This should help reduce peak CPU memory consumption while running AI workloads using shared memory.

Related PR: microsoft#23136

Co-authored-by: Ashish Garg (AISW) <[email protected]>
These are source files, not executables; do not set the executable
permission bit on them.
### Description
1. Add benchmark script for MatMulNBits. 
2. Update kernel based on benchmark results:
  - Change kernel back to handle m=1
  - Use simple loop kernel instead of unrolling
  - Change partial sum to float type to trade off precision and
    performance (less precision loss, no obvious performance drop)

Example output of benchmark:
```
------------------------------------------------------------------------------------------------------------------------
Benchmarking MatMulNBits on NVIDIA A100-SXM4-80GB (Compute Capability: 8.0)
------------------------------------------------------------------------------------------------------------------------
CUDA Graph   | M        | N        | K        | Bits   | Block Size | Threads  | Latency (us)    | StdDev (us)  | TFLOPS
------------------------------------------------------------------------------------------------------------------------
True         | 1        | 3072     | 8192     | 4      | 32         | 0        | 95.7            | 5.7          | 0.526
True         | 1        | 3072     | 8192     | 8      | 32         | 0        | 110.7           | 81.1         | 0.454
True         | 1        | 3072     | 8192     | 4      | 128        | 0        | 93.7            | 41.2         | 0.537
True         | 1        | 3072     | 8192     | 8      | 128        | 0        | 105.0           | 129.3        | 0.479
True         | 1        | 5120     | 3072     | 4      | 32         | 0        | 86.7            | 49.9         | 0.363
True         | 1        | 5120     | 3072     | 8      | 32         | 0        | 90.1            | 41.1         | 0.349
True         | 1        | 5120     | 3072     | 4      | 128        | 0        | 83.9            | 46.7         | 0.375
True         | 1        | 5120     | 3072     | 8      | 128        | 0        | 85.2            | 57.1         | 0.369
True         | 1        | 8192     | 3072     | 4      | 32         | 0        | 107.3           | 29.2         | 0.469
True         | 1        | 8192     | 3072     | 8      | 32         | 0        | 102.3           | 57.1         | 0.492
True         | 1        | 8192     | 3072     | 4      | 128        | 0        | 99.2            | 61.2         | 0.507
True         | 1        | 8192     | 3072     | 8      | 128        | 0        | 97.5            | 47.4         | 0.516
True         | 1        | 200064   | 3072     | 4      | 32         | 0        | 1456.4          | 11.0         | 0.844
True         | 1        | 200064   | 3072     | 8      | 32         | 0        | 1336.4          | 10.3         | 0.920
True         | 1        | 200064   | 3072     | 4      | 128        | 0        | 1261.6          | 16.6         | 0.974
True         | 1        | 200064   | 3072     | 8      | 128        | 0        | 1232.6          | 17.9         | 0.997
True         | 256      | 3072     | 8192     | 4      | 32         | 0        | 211.1           | 5.8          | 61.030
True         | 256      | 3072     | 8192     | 8      | 32         | 0        | 217.8           | 62.8         | 59.154
True         | 256      | 3072     | 8192     | 4      | 128        | 0        | 208.7           | 63.3         | 61.751
True         | 256      | 3072     | 8192     | 8      | 128        | 0        | 213.0           | 58.2         | 60.491
True         | 256      | 5120     | 3072     | 4      | 32         | 0        | 151.9           | 57.4         | 53.028
True         | 256      | 5120     | 3072     | 8      | 32         | 0        | 156.2           | 71.1         | 51.554
True         | 256      | 5120     | 3072     | 4      | 128        | 0        | 151.4           | 22.6         | 53.198
True         | 256      | 5120     | 3072     | 8      | 128        | 0        | 154.6           | 47.1         | 52.092
True         | 256      | 8192     | 3072     | 4      | 32         | 0        | 219.0           | 4.4          | 58.847
True         | 256      | 8192     | 3072     | 8      | 32         | 0        | 226.6           | 14.5         | 56.860
True         | 256      | 8192     | 3072     | 4      | 128        | 0        | 206.7           | 39.9         | 62.333
True         | 256      | 8192     | 3072     | 8      | 128        | 0        | 216.2           | 41.3         | 59.587
True         | 256      | 200064   | 3072     | 4      | 32         | 0        | 3110.9          | 11.3         | 101.152
True         | 256      | 200064   | 3072     | 8      | 32         | 0        | 3290.9          | 8.3          | 95.619
True         | 256      | 200064   | 3072     | 4      | 128        | 0        | 3055.2          | 10.2         | 102.995
True         | 256      | 200064   | 3072     | 8      | 128        | 0        | 3220.4          | 9.8          | 97.712
True         | 1024     | 3072     | 8192     | 4      | 32         | 0        | 363.6           | 40.2         | 141.754
True         | 1024     | 3072     | 8192     | 8      | 32         | 0        | 369.0           | 46.0         | 139.669
True         | 1024     | 3072     | 8192     | 4      | 128        | 0        | 362.8           | 55.6         | 142.052
True         | 1024     | 3072     | 8192     | 8      | 128        | 0        | 367.5           | 56.5         | 140.256
True         | 1024     | 5120     | 3072     | 4      | 32         | 0        | 221.6           | 58.1         | 145.383
True         | 1024     | 5120     | 3072     | 8      | 32         | 0        | 225.4           | 56.6         | 142.938
True         | 1024     | 5120     | 3072     | 4      | 128        | 0        | 220.2           | 36.9         | 146.306
True         | 1024     | 5120     | 3072     | 8      | 128        | 0        | 224.1           | 57.8         | 143.751
True         | 1024     | 8192     | 3072     | 4      | 32         | 0        | 346.2           | 41.8         | 148.854
True         | 1024     | 8192     | 3072     | 8      | 32         | 0        | 352.8           | 21.6         | 146.097
True         | 1024     | 8192     | 3072     | 4      | 128        | 0        | 344.5           | 18.9         | 149.627
True         | 1024     | 8192     | 3072     | 8      | 128        | 0        | 350.6           | 10.6         | 147.016
True         | 1024     | 200064   | 3072     | 4      | 32         | 0        | 6822.0          | 44.1         | 184.504
True         | 1024     | 200064   | 3072     | 8      | 32         | 0        | 7018.5          | 38.4         | 179.339
True         | 1024     | 200064   | 3072     | 4      | 128        | 0        | 6757.8          | 51.5         | 186.257
True         | 1024     | 200064   | 3072     | 8      | 128        | 0        | 6947.7          | 38.1         | 181.167
------------------------------------------------------------------------------------------------------------------------
```
### Motivation and Context
Follow-up to microsoft#24509
)

### Description

This PR updates how the K path is identified in Phi-4 multimodal.

### Motivation and Context

This is needed as part of the updates made to the rewritten modeling
code for the speech component of Phi-4 multimodal.
Added context to command line example when specifying platform.

### Description
Docker image build fails due to missing context in the command line
example when specifying the platform.

The dockerfiles directory is assumed to be the current directory in the
command example, so the parent directory must be specified as the
context of the `docker build` command.
…icrosoft#24220)

### Description
<!-- Describe your changes. -->

This PR adds a new CMake option:
onnxruntime_ENABLE_CONVSYMKERNELAVX2_SAT_CHECKER. When enabled, this
option activates a saturation checker for the VPMADDUBSW instruction
used in the ConvSymKernelAvx2 path.

The checker works by calling a helper function before each VPMADDUBSW
instruction. This function simulates the computation using C++ and
intrinsics with higher-precision types (int32_t) to detect whether the
result exceeds the bounds of int16_t (i.e., greater than INT16_MAX or
less than INT16_MIN).

By default, the checker logs a warning only once per inference session.
However, the logic can be easily extended to print more frequently if
needed. Developers can also reuse this pattern to implement similar
saturation checks for other instructions.
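
A sketch of what such a checker helper can look like (illustrative; the actual checker guarded by onnxruntime_ENABLE_CONVSYMKERNELAVX2_SAT_CHECKER may differ): it redoes the VPMADDUBSW pairwise u8 x s8 sums in int32_t and flags any result outside the int16_t range.

```cpp
#include <cstdint>
#include <cstdio>

// Simulate one VPMADDUBSW (u8 x s8 -> pairwise-summed i16 with saturation) in int32_t
// and warn if any pairwise sum would saturate. 'a' holds 32 unsigned bytes, 'b' 32 signed bytes.
bool CheckMaddubsSaturation(const uint8_t a[32], const int8_t b[32]) {
  bool saturated = false;
  for (int i = 0; i < 32; i += 2) {
    const int32_t sum = static_cast<int32_t>(a[i]) * b[i] +
                        static_cast<int32_t>(a[i + 1]) * b[i + 1];
    if (sum > INT16_MAX || sum < INT16_MIN) {
      saturated = true;
    }
  }
  if (saturated) {
    std::fprintf(stderr, "Warning: VPMADDUBSW would saturate; results may lose accuracy.\n");
  }
  return saturated;
}
```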

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

On some models running with AVX2 (instead of AVX-VNNI), we've observed
accuracy degradation due to saturation in vectorized instructions. This
saturation checker provides a way to debug and detect those cases by
reporting potential overflow in intermediate computations.
There are 2 tests that appear twice in the same list, so I removed the
duplicates:
- `^test_batchnorm_example_training_mode`
- `^test_batchnorm_epsilon_training_mode`

The other 3 tests passed locally, so I am enabling them to see if they
also pass on the pipelines:
- `test_batchnorm_epsilon_old`
- `test_batchnorm_example_old`
- `test_gathernd_example_int32_batch_dim1`

Sample run:
```
> .\build\Windows\Debug\Debug\onnx_test_runner.exe "C:\work\onnxruntime\build\Windows\Debug\_deps\onnx-src\onnx\backend\test\data\node\test_gathernd_example_int32_batch_dim1"
Load Test Case: gathernd_example_int32_batch_dim1 in C:\work\onnxruntime\build\Windows\Debug\_deps\onnx-src\onnx\backend\test\data\node\test_gathernd_example_int32_batch_dim1
result:
        Models: 1
        Total test cases: 1
                Succeeded: 1
                Not implemented: 0
                Failed: 0
        Stats by Operator type:
                Not implemented(0):
                Failed:
```
…oft#24575)

Bumps
[microsoft/onnxruntime-github-actions](https://github.com/microsoft/onnxruntime-github-actions)
from 0.0.5 to 0.0.6.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/microsoft/onnxruntime-github-actions/commit/9e3f6d0517cad4c4055d8ddd8b8bbadcc08e4e9a"><code>9e3f6d0</code></a>
Release artifacts for v0.0.6</li>
<li><a
href="https://github.com/microsoft/onnxruntime-github-actions/commit/4bc5bccb384a9785d4cbe25104735780bf10a27b"><code>4bc5bcc</code></a>
Initial commit on orphan branch</li>
<li>See full diff in <a
href="https://github.com/microsoft/onnxruntime-github-actions/compare/v0.0.5...v0.0.6">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=microsoft/onnxruntime-github-actions&package-manager=github_actions&previous-version=0.0.5&new-version=0.0.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.9.5 to 0.11.6.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/releases">ruff's
releases</a>.</em></p>
<blockquote>
<h2>0.11.6</h2>
<h2>Release Notes</h2>
<h3>Preview features</h3>
<ul>
<li>Avoid adding whitespace to the end of a docstring after an escaped
quote (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17216">#17216</a>)</li>
<li>[<code>airflow</code>] Extract <code>AIR311</code> from
<code>AIR301</code> rules (<code>AIR301</code>, <code>AIR311</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17310">#17310</a>,
<a
href="https://redirect.github.com/astral-sh/ruff/pull/17422">#17422</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>Raise syntax error when <code>\</code> is at end of file (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17409">#17409</a>)</li>
</ul>
<h2>Contributors</h2>
<ul>
<li><a
href="https://github.com/AlexWaygood"><code>@​AlexWaygood</code></a></li>
<li><a
href="https://github.com/BurntSushi"><code>@​BurntSushi</code></a></li>
<li><a href="https://github.com/Lee-W"><code>@​Lee-W</code></a></li>
<li><a
href="https://github.com/MatthewMckee4"><code>@​MatthewMckee4</code></a></li>
<li><a
href="https://github.com/MichaReiser"><code>@​MichaReiser</code></a></li>
<li><a
href="https://github.com/cake-monotone"><code>@​cake-monotone</code></a></li>
<li><a href="https://github.com/carljm"><code>@​carljm</code></a></li>
<li><a
href="https://github.com/charliermarsh"><code>@​charliermarsh</code></a></li>
<li><a
href="https://github.com/dcreager"><code>@​dcreager</code></a></li>
<li><a
href="https://github.com/dhruvmanila"><code>@​dhruvmanila</code></a></li>
<li><a
href="https://github.com/github-actions"><code>@​github-actions</code></a></li>
<li><a
href="https://github.com/maxmynter"><code>@​maxmynter</code></a></li>
<li><a
href="https://github.com/mishamsk"><code>@​mishamsk</code></a></li>
<li><a href="https://github.com/mtshiba"><code>@​mtshiba</code></a></li>
<li><a href="https://github.com/ntBre"><code>@​ntBre</code></a></li>
<li><a
href="https://github.com/renovate"><code>@​renovate</code></a></li>
<li><a href="https://github.com/sharkdp"><code>@​sharkdp</code></a></li>
</ul>
<h2>Install ruff 0.11.6</h2>
<h3>Install prebuilt binaries via shell script</h3>
<pre lang="sh"><code>curl --proto '=https' --tlsv1.2 -LsSf
https://github.com/astral-sh/ruff/releases/download/0.11.6/ruff-installer.sh
| sh
</code></pre>
<h3>Install prebuilt binaries via powershell script</h3>
<pre lang="sh"><code>powershell -ExecutionPolicy Bypass -c &quot;irm
https://github.com/astral-sh/ruff/releases/download/0.11.6/ruff-installer.ps1
| iex&quot;
</code></pre>
<h2>Download ruff 0.11.6</h2>
<table>
<thead>
<tr>
<th>File</th>
<th>Platform</th>
<th>Checksum</th>
</tr>
</thead>
</table>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md">ruff's
changelog</a>.</em></p>
<blockquote>
<h2>0.11.6</h2>
<h3>Preview features</h3>
<ul>
<li>Avoid adding whitespace to the end of a docstring after an escaped
quote (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17216">#17216</a>)</li>
<li>[<code>airflow</code>] Extract <code>AIR311</code> from
<code>AIR301</code> rules (<code>AIR301</code>, <code>AIR311</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17310">#17310</a>,
<a
href="https://redirect.github.com/astral-sh/ruff/pull/17422">#17422</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>Raise syntax error when <code>\</code> is at end of file (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17409">#17409</a>)</li>
</ul>
<h2>0.11.5</h2>
<h3>Preview features</h3>
<ul>
<li>[<code>airflow</code>] Add missing <code>AIR302</code> attribute
check (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17115">#17115</a>)</li>
<li>[<code>airflow</code>] Expand module path check to individual
symbols (<code>AIR302</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17278">#17278</a>)</li>
<li>[<code>airflow</code>] Extract <code>AIR312</code> from
<code>AIR302</code> rules (<code>AIR302</code>, <code>AIR312</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17152">#17152</a>)</li>
<li>[<code>airflow</code>] Update oudated <code>AIR301</code>,
<code>AIR302</code> rules (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17123">#17123</a>)</li>
<li>[syntax-errors] Async comprehension in sync comprehension (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17177">#17177</a>)</li>
<li>[syntax-errors] Check annotations in annotated assignments (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17283">#17283</a>)</li>
<li>[syntax-errors] Extend annotation checks to <code>await</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17282">#17282</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>[<code>flake8-pie</code>] Avoid false positive for multiple
assignment with <code>auto()</code> (<code>PIE796</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17274">#17274</a>)</li>
</ul>
<h3>Rule changes</h3>
<ul>
<li>[<code>ruff</code>] Fix <code>RUF100</code> to detect unused
file-level <code>noqa</code> directives with specific codes (<a
href="https://redirect.github.com/astral-sh/ruff/issues/17042">#17042</a>)
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/17061">#17061</a>)</li>
<li>[<code>flake8-pytest-style</code>] Avoid false positive for legacy
form of <code>pytest.raises</code> (<code>PT011</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17231">#17231</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>Fix formatting of &quot;See Style Guide&quot; link (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17272">#17272</a>)</li>
</ul>
<h2>0.11.4</h2>
<h3>Preview features</h3>
<ul>
<li>[<code>ruff</code>] Implement <code>invalid-rule-code</code> as
<code>RUF102</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17138">#17138</a>)</li>
<li>[syntax-errors] Detect duplicate keys in <code>match</code> mapping
patterns (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17129">#17129</a>)</li>
<li>[syntax-errors] Detect duplicate attributes in <code>match</code>
class patterns (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17186">#17186</a>)</li>
<li>[syntax-errors] Detect invalid syntax in annotations (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17101">#17101</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>[syntax-errors] Fix multiple assignment error for class fields in
<code>match</code> patterns (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17184">#17184</a>)</li>
<li>Don't skip visiting non-tuple slice in <code>typing.Annotated</code>
subscripts (<a
href="https://redirect.github.com/astral-sh/ruff/pull/17201">#17201</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/astral-sh/ruff/commit/fcd50a0496d725f773c6da149035f98bd90b6a30"><code>fcd50a0</code></a>
Bump 0.11.6 (<a
href="https://redirect.github.com/astral-sh/ruff/issues/17449">#17449</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/3ada36b766583c92c82bccce3519a467ae068630"><code>3ada36b</code></a>
Auto generate <code>visit_source_order</code> (<a
href="https://redirect.github.com/astral-sh/ruff/issues/17180">#17180</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/bd8983821289e436c2d4c1463c118baa02c7ef5b"><code>bd89838</code></a>
[red-knot] Initial tests for protocols (<a
href="https://redirect.github.com/astral-sh/ruff/issues/17436">#17436</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/b32407b6f3c300650b8a3b0a6cb1ce3c5f812c84"><code>b32407b</code></a>
[red-knot] Dataclasses: synthesize <code>__init__</code> with proper
signature (<a
href="https://redirect.github.com/astral-sh/ruff/issues/17428">#17428</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/b4de245a5accc5ebe35e580a73040da8d99ed566"><code>b4de245</code></a>
[red-knot] Dataclasses: support <code>order=True</code> (<a
href="https://redirect.github.com/astral-sh/ruff/issues/17406">#17406</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/914095d08f02ed91b1acf807aca89723f3632fb9"><code>914095d</code></a>
[red-knot] Super-basic generic inference at call sites (<a
href="https://redirect.github.com/astral-sh/ruff/issues/17301">#17301</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/5350288d0773f986e90653c44a6304d9411b5782"><code>5350288</code></a>
[red-knot] Check assignability of bound methods to callables (<a
href="https://redirect.github.com/astral-sh/ruff/issues/17430">#17430</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/649610cc98add11d8ff48c6d0fba928fb1e00262"><code>649610c</code></a>
[red-knot] Support <code>super</code> (<a
href="https://redirect.github.com/astral-sh/ruff/issues/17174">#17174</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/1a79722ee0fb160f8929612508d5ee88b7838d09"><code>1a79722</code></a>
[<code>airflow</code>] Extend <code>AIR311</code> rules (<a
href="https://redirect.github.com/astral-sh/ruff/issues/17422">#17422</a>)</li>
<li><a
href="https://github.com/astral-sh/ruff/commit/b67590bfde9de44757a3365d43040b8f93c10f35"><code>b67590b</code></a>
[red-knot] simplify union size limit handling (<a
href="https://redirect.github.com/astral-sh/ruff/issues/17429">#17429</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/ruff/compare/0.9.5...0.11.6">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ruff&package-manager=pip&previous-version=0.9.5&new-version=0.11.6)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Changming Sun <[email protected]>
skottmckay and others added 27 commits April 29, 2025 13:52
…ull knowledge (microsoft#24568)

### Description
<!-- Describe your changes. -->
GetDeviceInfoIfSupported -> GetSupportedDevices

The EP sees all devices so it can make decisions with full knowledge. This
is mainly applicable to GPU EPs like WebGPU.

The EP has to iterate over the devices and call CreateEpDevice for the devices it
supports.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
### Description
<!-- Describe your changes. -->
Fix the DML autoep selection test. It should only select one device, as that's
all the test infrastructure is set up to handle.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
…microsoft#24587)

### Description
`LoadPluginOrProviderBridge` is called when attempting to load a Plugin.
It uses the passed `library_path` to attempt to load the Plugin as a
`Provider` - using `ProviderLibrary` - to see if it can be treated as a
'ProviderBridge'. `ProviderLibrary` attempts to load the Provider by
prefixing the path to the onnxruntime.dll. Plugins needn't be
redistributed with OnnxRuntime, so the path to the Plugin _may_ be an
absolute path, and if so `ProviderLibrary` fails. At the same time -
however - `LoadPluginOrProviderBridge` needs to support
OnnxRuntime-relative paths: As 'Providers' are migrated to 'Plugins',
existing Providers should be usable as Plugins. To accommodate both
scenarios, this PR:

1. Adds support to `ProviderLibrary` to be created with an absolute
path.
2. Validates the path passed to `LoadPluginOrProviderBridge`:
   1. if it is absolute, the same absolute path is passed to
      `ProviderLibrary` and `EpLibraryPlugin`.
   2. if the path is not absolute, it is converted to an absolute path by
      prefixing the OnnxRuntime location, and the same path is passed to
      `ProviderLibrary` and `EpLibraryPlugin`.
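
The path handling can be sketched with std::filesystem as follows (names are illustrative, not the actual ORT helpers):

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// If 'library_path' is already absolute, use it as-is; otherwise treat it as
// relative to the directory containing onnxruntime.dll / libonnxruntime.so.
fs::path ResolveEpLibraryPath(const fs::path& library_path, const fs::path& ort_dir) {
  if (library_path.is_absolute()) {
    return library_path;
  }
  return ort_dir / library_path;  // OnnxRuntime-relative path
}
```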

### Motivation and Context
This PR enables `LoadPluginOrProviderBridge` to be called with an
absolute path to the Plugin, allowing it to be used as a
'ProviderBridge', or with an OnnxRuntime-relative path to the Plugin.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
<!-- Describe your changes. -->

Add some logic to detect whether I8MM is actually supported.

This info can be read from the registry. See the helpful comments here
for more details:

https://github.com/Dr-Noob/cpufetch/blob/a0c08ccc0b64b524ad2122e0595099f73cbba9c4/src/arm/midr.c#L30-L52

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Detect I8MM correctly to enable better performance.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…icrosoft#24534)

1. Migrate "Linux CPU Minimal Build E2E CI Pipeline" and
"onnxruntime-binary-size-checks-ci-pipeline" to Github Action
2. Add support for building the ONNX Runtime minimal build with vcpkg.
3. Auto format the yaml files with ruamel.yaml 
4. Update vcpkg to the latest release.
- Registered the ScatterND Op in QNN EP
- Created the op as part of the Simple Op Builder
- Added unit test to verify the Op runs on QNN
- Skipping ScatterND on QNN CPU (To Do)

### Description

Add ScatterND Op Support in QNN EP



### Motivation and Context

Performance improvement, as the ScatterND op currently falls back to ORT CPU due to missing
support.
Compute 'total_sequence_length' the same way as JSEP.
### Description
<!-- Describe your changes. -->

The PR adds CPU support by following the release logistics in
https://github.com/onnx/onnx/wiki/Logistics-for-ONNX-Release-1.18.0. The
goal is to make the minimal changes needed to ensure ONNX Runtime works
fine with ONNX 1.18.0.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Essentially, incoming ONNX 1.18.0 provides the following
(1) Introduce opset 23 (included in this PR)
(2) Support Attention, RMSNormalization, and RotaryEmbedding (**NOT**
included in this PR)
(3) Support float4e2m1 (**NOT** included in this PR)

### Remaining Issues

1. onnx.patch
* ONNXRUNTIME is using static functions (shape inference) from ONNX
(microsoft#24558)
* GroupNormalization-18 is deprecated because its spec was wrong
(microsoft#24560)
* Contrib op registration api from ONNX: OpSchemaRegisterOnce is changed
to explicit, and ONNXRUNTIME was leveraging it to do fluent-chaining
style. (microsoft#24561)
2. Support float4e2m1
(microsoft#24553)
3. Support
Attention(microsoft#24554),
RMSNormalization(microsoft#24555),
and
RotaryEmbedding(microsoft#24556)
4. Disable QNN tests
### Description
Fix a corner case for Expand when the output size is 0



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This fix is required to pass YOLOv9
Fix:
```
/local/mnt/workspace/onnxruntime-qnn-ep/onnxruntime/core/providers/qnn/builder/opbuilder/softmax_op_builder.cc:
In function ‘std::vector<unsigned int>
onnxruntime::qnn::FlattenShapeFromAxis(std::vector<unsigned int>&,
int32_t)’:

/local/mnt/workspace/onnxruntime-qnn-ep/onnxruntime/core/providers/qnn/builder/opbuilder/softmax_op_builder.cc:47:28:
error: comparison of integer expressions of different signedness:
‘int32_t’ {aka ‘int’} and ‘std::vector<unsigned int>::size_type’ {aka
‘long unsigned int’} [-Werror=sign-compare]
   47 |   assert(axis >= 0 && axis < input_shape.size());
      |
```
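
The usual fix for this class of warning is to compare values of the same signedness, for example (illustrative, not necessarily the exact change made):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

void Example(const std::vector<unsigned int>& input_shape, int32_t axis) {
  // Compare like with like: cast the already-validated non-negative axis
  // to the unsigned size type before comparing with size().
  assert(axis >= 0 && static_cast<size_t>(axis) < input_shape.size());
}
```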
…t#24578)

### Description

Fix warning caused by `-Wstrict-aliasing`.
Fix transpose store op.

Test results:
```
$ ./onnxruntime_test_all
[...]
[----------] Global test environment tear-down
[==========] 4761 tests from 311 test suites ran. (47828 ms total)
[  PASSED  ] 4759 tests.
[  SKIPPED ] 2 tests, listed below:
[  SKIPPED ] MatMulFpQ4.MatMul2DSym
[  SKIPPED ] MatMulFpQ4.MatMul2DBlkZp

  YOU HAVE 6 DISABLED TESTS
```
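
For context, `-Wstrict-aliasing` typically flags loads/stores done through a type-punned pointer; the portable fix is to go through `memcpy`, which compilers optimize to the same single store. A generic sketch (not the actual transpose store change):

```cpp
#include <cstdint>
#include <cstring>

// Flagged by -Wstrict-aliasing: writing a float object through a uint32_t*.
void StoreBitsPunned(float* dst, uint32_t bits) {
  *reinterpret_cast<uint32_t*>(dst) = bits;  // aliasing violation
}

// Well-defined alternative: copy the raw bytes with memcpy; compilers emit the same store.
void StoreBits(float* dst, uint32_t bits) {
  std::memcpy(dst, &bits, sizeof(bits));
}
```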
…transformers/models/stable_diffusion/requirements (microsoft#24591)

Bumps [transformers](https://github.com/huggingface/transformers) from
4.41.2 to 4.50.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/huggingface/transformers/releases">transformers's
releases</a>.</em></p>
<blockquote>
<h1>Release v4.50.0</h1>
<h2>New Model Additions</h2>
<h3>Model-based releases</h3>
<p>Starting with version v4.49.0, we have been doing model-based
releases, additionally to our traditional, software-based monthly
releases. These model-based releases provide a tag from which models may
be installed.</p>
<p>Contrarily to our software-releases; these are not pushed to pypi and
are kept on our GitHub. Each release has a tag attributed to it, such
as:</p>
<ul>
<li><code>v4.49.0-Gemma-3</code></li>
<li><code>v4.49.0-AyaVision</code></li>
</ul>
<p>⚠️ As bugs are identified and fixed on each model, the release tags
are updated so that installing from that tag always gives the best
experience possible with that model.</p>
<p>Each new model release will always be based on the current state of
the main branch at the time of its creation. This ensures that new
models start with the latest features and fixes available.</p>
<p>For example, if two models—Gemma-3 and AyaVision—are released from
main, and then a fix for gemma3 is merged, it will look something like
this:</p>
<pre><code> o---- v4.49.0-Gemma-3 (includes AyaVision, plus main fixes)
            /                  \  
---o--o--o--o--o-- (fix for gemma3) --o--o--o main
       \          
        o---- v4.49.0-AyaVision
</code></pre>
<p>We strive to merge model specific fixes on their respective branches
as fast as possible!</p>
<h3>Gemma 3</h3>
<p><img
src="https://github.com/user-attachments/assets/2b7f31b3-02bd-496a-9d4e-a1867bd6d9d4"
alt="image" /></p>
<p>Gemma 3 is heavily referenced in the following <a
href="https://github.com/huggingface/transformers/releases/tag/v4.49.0-Gemma-3">model-based
release</a> and we recommend reading these if you want all the
information relative to that model.</p>
<p>The Gemma 3 model was proposed by Google. It is a vision-language
model composed by a <a
href="https://huggingface.co/docs/transformers/model_doc/siglip">SigLIP</a>
vision encoder and a <a
href="https://huggingface.co/docs/transformers/model_doc/gemma_2">Gemma
2</a> language decoder linked by a multimodal linear projection.</p>
<p>It cuts an image into a fixed number of tokens same way as Siglip if
the image does not exceed certain aspect ratio. For images that exceed
the given aspect ratio, it crops the image into multiple smaller pacthes
and concatenates them with the base image embedding.</p>
<p>One particularity is that the model uses bidirectional attention on
all the image tokens. Also, the model interleaves sliding window local
attention with full causal attention in the language backbone, where
each sixth layer is a full causal attention layer.</p>
<ul>
<li>Gemma3 by <a
href="https://github.com/RyanMullins"><code>@​RyanMullins</code></a> in
<a
href="https://redirect.github.com/huggingface/transformers/issues/36658">#36658</a></li>
</ul>
<h3>Shield Gemma2</h3>
<p>ShieldGemma 2 is built on <a
href="https://ai.google.dev/gemma/docs/core/model_card_3">Gemma 3</a>,
is a 4 billion (4B) parameter model that checks the safety of both
synthetic and natural images against key categories to help you build
robust datasets and models. With this addition to the Gemma family of
models, researchers and developers can now easily minimize the risk of
harmful content in their models across key areas of harm as defined
below:</p>
<ul>
<li>No Sexually Explicit content: The image shall not contain content
that depicts explicit or graphic sexual acts (e.g., pornography, erotic
nudity, depictions of rape or sexual assault).</li>
<li>No Dangerous Content: The image shall not contain content that
facilitates or encourages activities that could cause real-world harm
(e.g., building firearms and explosive devices, promotion of terrorism,
instructions for suicide).</li>
<li>No Violence/Gore content: The image shall not contain content that
depicts shocking, sensational, or gratuitous violence (e.g., excessive
blood and gore, gratuitous violence against animals, extreme injury or
moment of death).</li>
</ul>
<p>We recommend using ShieldGemma 2 as an input filter to vision
language models, or as an output filter of image generation systems. To
train a robust image safety model, we curated training datasets of
natural and synthetic images and instruction-tuned Gemma 3 to
demonstrate strong performance.</p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/huggingface/transformers/commit/0b057e66b52556da3a1cbc29e2a98c0784ea9c33"><code>0b057e6</code></a>
fix import issue</li>
<li><a
href="https://github.com/huggingface/transformers/commit/26fbd6919af810bf508eaea8b9eb9dcee829e228"><code>26fbd69</code></a>
v 4.50.0</li>
<li><a
href="https://github.com/huggingface/transformers/commit/523f6e743c74ecea90d0c37a172c9819b5691a19"><code>523f6e7</code></a>
Fix: dtype cannot be str (<a
href="https://redirect.github.com/huggingface/transformers/issues/36262">#36262</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/3f9ff19b4ec7dcf4112225079f26ea756aafd211"><code>3f9ff19</code></a>
Minor Gemma 3 fixes (<a
href="https://redirect.github.com/huggingface/transformers/issues/36884">#36884</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/f94b0c59f20447c0e6bdb6d381ea014fa47ecac8"><code>f94b0c5</code></a>
Use <code>deformable_detr</code> kernel from the Hub (<a
href="https://redirect.github.com/huggingface/transformers/issues/36853">#36853</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/2638d54e7851f1323dc78a8b513b041835aba27b"><code>2638d54</code></a>
Gemma 3 tests expect greedy decoding (<a
href="https://redirect.github.com/huggingface/transformers/issues/36882">#36882</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/b8aadc31d56e49d8b9075e73e5c433f7c5b4e04b"><code>b8aadc3</code></a>
:red_circle: :red_circle: :red_circle: supersede paligemma forward to
shift p...</li>
<li><a
href="https://github.com/huggingface/transformers/commit/6321876b5bac106d7e7c84b53418ea31fe1d9754"><code>6321876</code></a>
add eustlb as an actor</li>
<li><a
href="https://github.com/huggingface/transformers/commit/94f487626a296deac0022dda6462c0d9f2336106"><code>94f4876</code></a>
[generate] model defaults being inherited only happens for newer models
(<a
href="https://redirect.github.com/huggingface/transformers/issues/36881">#36881</a>)</li>
<li><a
href="https://github.com/huggingface/transformers/commit/f19d018bfff1613ba05dcbf7e82c461d49aee73e"><code>f19d018</code></a>
Revert &quot;Update deprecated Jax calls (<a
href="https://redirect.github.com/huggingface/transformers/issues/35919">#35919</a>)&quot;
(<a
href="https://redirect.github.com/huggingface/transformers/issues/36880">#36880</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/huggingface/transformers/compare/v4.41.2...v4.50.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=transformers&package-manager=pip&previous-version=4.41.2&new-version=4.50.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description
After deinitialize_onnxruntime_vitisai_ep() has run, s_domains_vitisaiep becomes invalid, which may cause an exception.

### Motivation and Context
Call deregister_xir_ops() before deinitialize_onnxruntime_vitisai_ep() to avoid dangling pointers. A sketch of the intended ordering follows.
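A minimal sketch of the teardown order, with assumed declarations; the actual signatures live in the VitisAI EP sources:

```c++
#include <vector>

struct OrtCustomOpDomain;  // opaque here; the real definition is in the ORT C API

// Assumed declarations for this sketch; the real ones live in the VitisAI EP.
extern std::vector<OrtCustomOpDomain*> s_domains_vitisaiep;
void deregister_xir_ops(const std::vector<OrtCustomOpDomain*>& domains);
void deinitialize_onnxruntime_vitisai_ep();

// The fix is purely an ordering change: deregister the XIR op domains while
// s_domains_vitisaiep is still valid, then tear down the EP library state.
void ShutdownVitisAIEp() {
  deregister_xir_ops(s_domains_vitisaiep);
  deinitialize_onnxruntime_vitisai_ep();
}
```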

Co-authored-by: GenMing Zhong <[email protected]>
)

This PR enables matmul8bits for the dp4/subgroupMatrix path in webgpu.

This PR is separated from microsoft#24546 for easier review.
### Description
This PR incorporates the changes requested in PR 24394.

Changes are summarized below:
1. Reordered enable_ovep_qdq_optimizer to appear before all output
parameters, as suggested in review. Other parameters were also reordered
for clarity.
2. Replaced the non-release build check with the RELEASE flag for clarity.
This allows every build configuration except release to dump the model, as
sketched below.
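A rough sketch of the gate described in item 2; only the RELEASE macro comes from the PR text, and the helper shown here is hypothetical:

```c++
#include <string>

// Hypothetical helper; stands in for whatever the OVEP uses to write the model out.
void DumpSerializedModel(const std::string& serialized_model, const std::string& path);

void MaybeDumpModel(const std::string& serialized_model) {
#ifndef RELEASE
  // Every build configuration except release keeps model dumping compiled in.
  DumpSerializedModel(serialized_model, "ovep_model_dump.onnx");
#else
  (void)serialized_model;  // compiled out entirely in release builds
#endif
}
```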
### Description
Add int64 as a supported datatype for moving nodes to the CoreML EP.

We already convert constants automatically from int64 to int32 for
CoreML by calling narrow.

This PR adds the same conversion for outputs as well; a minimal sketch of the checked narrowing follows.
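The sketch below illustrates the kind of checked narrowing involved; the helper is illustrative only, not the EP's actual narrow() call:

```c++
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Illustrative checked narrowing from int64 to int32: values CoreML can represent
// pass through; anything out of range fails loudly instead of silently wrapping.
std::vector<int32_t> NarrowInt64ToInt32(const std::vector<int64_t>& values) {
  std::vector<int32_t> out;
  out.reserve(values.size());
  for (int64_t v : values) {
    if (v < std::numeric_limits<int32_t>::min() || v > std::numeric_limits<int32_t>::max()) {
      throw std::runtime_error("int64 value out of int32 range; cannot map to CoreML");
    }
    out.push_back(static_cast<int32_t>(v));
  }
  return out;
}
```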

### Motivation and Context
- More nodes supported on CoreML

### Note on the Unsqueeze op
According to microsoft#22975, there is a bug with the Unsqueeze op with scalar
inputs on x86.

I was running into a bug with Unsqueeze ops that expand a scalar input
into a tensor of shape [1], since CoreML's MLProgram does not support
scalar values. I adapted the HandleX86ArchUnsqueeze method; alternatives
would be to replace the op with an Identity operator or to add some
additional checks. I went with adapting HandleX86ArchUnsqueeze since it
seemed like the fastest solution.
### Description
- Introduces `USE_<EP>_PROVIDER_INTERFACE` pre-processor macros that
indicate when an EP interface is enabled but the full EP is not being
compiled (see the sketch after this list).
- Previously, the CMake configuration turned on `USE_<EP>` for both use
cases. This prevented tests from determining whether the full EP or only
the interface was available, which caused test failures. It also turned
on all EP code paths in core ORT code at the same time, which caused
compilation and logic errors.
- Adds the new NV EP to the list of EPs whose interface is enabled when ORT
is built with `--enable_generic_interface`.
- Updates the Windows Arm64 QNN CI Pipeline to actually use the
`--enable_generic_interface` flag.
- Previously, it was not actually being passed to the build command, so
no unit tests were being run with the flag enabled.
- Adds unit tests to check that adding an EP to the session options
fails when only the generic interface (but not the full EP) is built.
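A rough sketch of how the new macros separate the two build modes; the guarded bodies here are placeholder comments, not actual ORT code:

```c++
// USE_QNN                    -> the full QNN EP is compiled in.
// USE_QNN_PROVIDER_INTERFACE -> only the EP interface is available (no kernels).
#if defined(USE_QNN) || defined(USE_QNN_PROVIDER_INTERFACE)
// Code that only needs the EP to be addressable (e.g. adding it to session
// options by name) compiles in either configuration.
#endif

#if defined(USE_QNN)
// Code and tests that need the full EP implementation stay guarded on the
// original macro, so interface-only builds no longer pull them in.
#endif
```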

#### CI Pipelines that use --enable_generic_interface
- Windows ARM64 QNN CI Pipeline:
- Builds ORT with `--use_qnn --enable_generic_interface` and runs all
normal QNN EP unit tests.
- Builds ORT with `--use_qnn --enable_generic_interface` and runs new
unit tests that try to add the following EPs to the session options
(expect failure): OpenVINO, CUDA, NV, TensorRT, VitisAI
- Build and Test OpenVINO EP (AlmaLinux8, Py3.12) / build_test_pipeline:
- Builds ORT with `--use_openvino --enable_generic_interface` and runs
all normal OpenVINO EP unit tests.
- Builds ORT with `--use_openvino --enable_generic_interface` and runs
new unit tests that try to add the following EPs to the session options
(expect failure): QNN, CUDA, NV, TensorRT, VitisAI
- windows_x64_release_ep_generic_interface
- Builds ORT with `--enable_generic_interface` and now runs CPU EP unit
tests (didn't previously).

### Motivation and Context
Fix use of `--enable_generic_interface` and make sure tests actually
run.
### Description
Update the QNN nuget package to use the Arm64x binary.
Enable building with the generic interface.
Copy the QNN libs as part of the QNN EP project build instead of the test_all project.
Update the DML nuget package to enable the generic interface, and pack the shared.dll into the package.
…ders/impl/gather_op_builder.cc. (microsoft#24609)

### Description

Fix unused variable warning in
onnxruntime/core/providers/coreml/builders/impl/gather_op_builder.cc.

### Motivation and Context

Fix build.
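For illustration only, this is the usual shape such a fix takes; the actual change in the gather op builder may simply delete the declaration:

```c++
// A value that is only needed in some build configurations can be kept but
// annotated, which silences -Wunused-variable without changing behavior.
int Example(int input) {
  [[maybe_unused]] const int saved = input;
  return input + 1;
}
```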
…lls (microsoft#24606)

### Description
Fixes microsoft#24500

- Fixes local build of onnxruntime.dll to have a valid version, such as
"1.23.0", instead of the literal string "ORT_VERSION"
- Adds version info to onnxruntime_providers_qnn.dll,
onnxruntime_providers_cuda.dll, and onnxruntime_providers_tensorrt.dll.
It was missing completely. This was done by adding
`onnxruntime_providers_*.rc` files to define each EP's [DLL version
info](https://learn.microsoft.com/en-us/windows/win32/menurc/versioninfo-resource).

Fixed onnxruntime.dll version info (local non-ADO build): screenshot attached to the PR.

Fixed onnxruntime_providers_qnn.dll version info (adds the QNN SDK version too): screenshot attached to the PR.


### Motivation and Context
We create dlls with invalid or missing version info.
### Description

This PR adds support for atomic types for program output. Applying an atomic
type to a program output can be done in the following way:
```c++
program.AddOutput({output_tensor, ProgramTensorMetadataDependency::TypeAndRank, ProgramOutput::Atomic});
```
The last argument, `ProgramOutput::Atomic`, marks the output as atomic.

The support for atomic types is minimal. According to the [spec](https://www.w3.org/TR/WGSL/#atomic-types), the only valid operations on atomic objects are the [atomic builtin functions](https://www.w3.org/TR/WGSL/#atomic-builtin-functions). This means atomic values cannot be read or written the normal way: the indices helper's Get* and Set* functions will not work on atomic types, so use the WGSL builtin functions directly. The OffsetToIndices and IndicesToOffset functions still work.
### Description

While cleaning up the options, I missed the part in the provider bridge
that translates session options to TRT options.
To better integrate with current IHV work, I adopted the approach that
QNN and OV use to pipe through session options. Since all of this is
string-based, it would be great to be able to access a general point of
truth like `EpContextModelGenerationOptions` in the provider wrapped types.

https://github.com/microsoft/onnxruntime/blob/6df620675290d97d7e406faf232b8b521333b6e8/onnxruntime/core/framework/session_options.h#L73

This is a fix on top of microsoft#24456; @ankan-ban and @chilo-ms to review. A sketch of the string-based lookup is below.
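To make the string-based piping concrete, here is a sketch of the kind of lookup involved. The config keys are the standard ORT EP-context session option keys; the settings struct and destination fields are illustrative only:

```c++
#include <string>

// Illustrative target for the translated options; not the real TRT option struct.
struct TrtEpContextSettings {
  bool dump_ep_context_model = false;
  std::string ep_context_file_path;
};

// Works with any config-options object exposing GetConfigOrDefault(key, default),
// which is how QNN and OV read these settings from the session options today.
template <typename ConfigOptionsLike>
TrtEpContextSettings ReadEpContextSettings(const ConfigOptionsLike& config_options) {
  TrtEpContextSettings settings;
  settings.dump_ep_context_model =
      config_options.GetConfigOrDefault("ep.context_enable", "0") == "1";
  settings.ep_context_file_path =
      config_options.GetConfigOrDefault("ep.context_file_path", "");
  return settings;
}
```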
### Description
The Windows TRT version was set to 10.8 when CI was migrating to GitHub Actions;
reset it to the latest 10.9.
The Linux TRT CI and other packaging CIs have no issue, as they are correctly
set to 10.9.


[QNN EP] Add Einsum support for some equations. The intent is not to support all equations, but to enable them case by case to improve performance.
@jatinwadhwa921 jatinwadhwa921 requested a review from ankitm3k May 2, 2025 05:54
@jatinwadhwa921 jatinwadhwa921 merged commit e354009 into ovep-develop May 2, 2025
4 of 7 checks passed