[triage] upstream#7345: PFI (Permutation Feature Importance) takes forever on some regression models created using AutoML

**Upstream:** https://github.com/dotnet/machinelearning/issues/7345
**Status:** COMPLETE
**Classification:** bug-report
**Confidence:** 0.75
**Reproduced:** ⏭️ Skipped
**Area:** AutoML
**Investigated at:** 2026-03-07T08:51:18Z

---

## Triage Summary

**Category:** Bug Report
**Reasoning:** The user reports that PFI runs so long it appears to hang on certain AutoML-generated regression models, while completing normally on others trained against the same data. A contributor identified the proximate cause: AutoML generated an abnormally large number of features for the slow model, and because PFI runtime scales directly with feature count, the evaluation does not finish in a practical amount of time. The unexpected explosion in feature count from AutoML is the defect; why it happens is still open.

**Summary:** When AutoML is used to create regression models, PFI (Permutation Feature Importance) appears to hang on some models while completing quickly on others trained with the same data. Investigation in the issue comments shows that AutoML generated an unusually large number of features for the slow model. A secondary concern is that PFI offers no cancellation or progress mechanism.

**Suggested Labels:** bug, needs-info

## Reproduction Results

Reproduction was skipped per workflow configuration.

## Additional Context

From the issue comments:
- The fast model has a manageable number of features
- The slow model has an unexpectedly large number of features generated by AutoML
- PFI runtime grows linearly with the number of features (one re-scoring pass over the data per feature), so an explosion in feature count directly produces the apparent hang
- The open question is **why AutoML generates that many features** for some training runs
- Contributors `@michaelgsharp` and `@LittleLittleCloud` (assignee) are investigating
- User provided learner code and data in a follow-up ZIP attachment

**Root Cause Analysis:**
The slowdown traces to AutoML's feature engineering pipeline emitting an excessive number of features for certain inputs. The mechanism is not yet confirmed; likely candidates are interaction or cross-product feature generation over categorical columns with many distinct values, or some other unbounded feature expansion. PFI permutes each feature independently and re-scores the full dataset after each permutation, so the work is on the order of n_features × dataset_size row predictions. That becomes prohibitively slow when n_features is very large.
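
The cost argument above can be illustrated with a minimal, generic sketch of permutation feature importance. This is not ML.NET's implementation; the function name `permutation_importance`, the row-wise `predict` callable, and the `permutation_count` parameter are assumptions made for illustration. The sketch counts row-level predictions so the linear dependence on feature count is visible.

```python
import random


def permutation_importance(predict, rows, labels, n_features,
                           permutation_count=1, seed=0):
    """Generic PFI sketch (hypothetical helper, not the ML.NET API).

    Returns (importances, row_predictions), where importances[j] is the
    mean increase in MSE after shuffling feature j, and row_predictions
    is the total number of calls made to `predict`.
    """
    rng = random.Random(seed)
    row_predictions = 0

    def mse(data):
        nonlocal row_predictions
        row_predictions += len(data)  # one prediction per row
        return sum((predict(r) - y) ** 2 for r, y in zip(data, labels)) / len(data)

    baseline = mse(rows)
    importances = []
    for j in range(n_features):  # one full re-scoring pass per feature
        deltas = []
        for _ in range(permutation_count):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            deltas.append(mse(permuted) - baseline)
        importances.append(sum(deltas) / len(deltas))
    return importances, row_predictions


# Toy data: the label depends only on feature 0; feature 1 is noise to the model.
rows = [[float(i), float(i % 3)] for i in range(50)]
labels = [2.0 * r[0] for r in rows]
importances, row_predictions = permutation_importance(
    lambda r: 2.0 * r[0], rows, labels, n_features=2)
```

In this sketch the number of row predictions is `(1 + n_features * permutation_count) * len(rows)`, so a model with hundreds of thousands of generated features needs that many full passes over the data.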

## Suggested Fix

**Files:**
- `src/Microsoft.ML.AutoML/` — investigate feature engineering pipeline for unbounded expansion
- `src/Microsoft.ML.Transforms/` — PFI implementation could benefit from cancellation token support

**Description:** Two complementary fixes: (1) add a cap or warning when AutoML's feature engineering produces an unusually large number of output features; (2) expose a `CancellationToken` parameter in the PFI API so callers can time out or cancel long-running evaluations.
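
A language-neutral sketch of the two fixes, written in Python for brevity, might look like the following. Everything here is hypothetical: `check_feature_count`, `run_per_feature`, `PfiCancelled`, and the `warn_above` threshold of 10,000 are illustrative names and values, not existing ML.NET APIs, and `threading.Event` stands in for a .NET `CancellationToken`.

```python
import threading
import warnings


class PfiCancelled(Exception):
    """Raised when the caller cancels; records how many features finished."""

    def __init__(self, completed, total):
        super().__init__(f"cancelled after {completed} of {total} features")
        self.completed = completed
        self.total = total


def check_feature_count(n_features, warn_above=10_000):
    """Fix (1): warn when the feature count looks unusually large.

    The threshold is an arbitrary placeholder, not a recommended value.
    """
    if n_features > warn_above:
        warnings.warn(
            f"{n_features} features exceeds {warn_above}; "
            "per-feature evaluation may take a very long time")
        return True
    return False


def run_per_feature(n_features, score_feature, cancel_event):
    """Fix (2): check for cancellation between per-feature evaluations."""
    results = []
    for j in range(n_features):
        if cancel_event.is_set():
            raise PfiCancelled(completed=j, total=n_features)
        results.append(score_feature(j))
    return results
```

Checking the cancellation signal once per feature keeps the overhead negligible while bounding the wait to roughly one re-scoring pass; a caller could set the event from a timer thread to enforce a timeout.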

**Complexity:** Medium




> Generated by [Triage Single Issue](https://github.com/JanKrivanek/machinelearning/actions/runs/22795914014) · [◷](https://github.com/search?q=repo%3AJanKrivanek%2Fmachinelearning+is%3Aissue+%22gh-aw-workflow-call-id%3A+JanKrivanek%2Fmachinelearning%2Ftriage-single-issue%22&type=issues)
