Upstream: dotnet#7345
Status: COMPLETE
Classification: bug-report
Confidence: 0.75
Reproduced: ⏭️ Skipped
Area: AutoML
Investigated at: 2026-03-07T08:51:18Z
Triage Summary
Category: Bug Report
Reasoning: The user reports PFI taking forever (hanging) on certain AutoML-generated regression models, while working fine on others trained against the same data. A contributor confirmed the root cause — AutoML generated an abnormally large number of features for one model, and since PFI runtime scales directly with feature count, this causes the hang. The unexpected explosion in feature count from AutoML is the defect.
Summary: When using AutoML to create regression models, PFI (Permutation Feature Importance) hangs indefinitely on some models while completing quickly on others trained with the same data. Investigation in comments reveals AutoML generated an unusually large number of features for the slow model. The secondary concern is the lack of a cancellation/progress mechanism for PFI.
Suggested Labels: bug, needs-info
Reproduction Results
Reproduction was skipped per workflow configuration.
Additional Context
From the issue comments:
- The fast model has a manageable number of features
- The slow model has an unexpectedly large number of features generated by AutoML
- PFI runtime is O(n_features), so an explosion in feature count directly causes the hang
- The open question is why AutoML generates that many features for some training runs
- Contributors @michaelgsharp and @LittleLittleCloud (assignee) are investigating
- User provided learner code and data in a follow-up ZIP attachment
Root Cause Analysis:
The performance issue is caused by AutoML's feature engineering pipeline generating an excessive number of features for certain inputs, likely due to interaction or cross-product feature generation with many categorical values or unbounded feature expansion. PFI permutes each feature independently, so O(n_features × dataset_size) evaluations are required, making this prohibitively slow when n_features is very large.
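To make the scaling concrete, here is a minimal, language-agnostic sketch of permutation feature importance in Python (not the ML.NET implementation): it shuffles one feature column at a time and re-scores the model, so the total work is one full-dataset scoring pass per feature, i.e. O(n_features × dataset_size). The `predict` and `metric` callables are hypothetical placeholders.

```python
import random

def permutation_importance(predict, X, y, metric, seed=0):
    """Minimal PFI sketch (illustrative, not the ML.NET API).

    For each feature column: shuffle it, re-score the model, and record
    the metric drop. One full pass over the dataset per feature, hence
    O(n_features * dataset_size) predictions overall -- which is why an
    exploded feature count makes PFI appear to hang.
    """
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):  # one scoring pass per feature
        col = [row[j] for row in X]
        shuffled = col[:]
        rng.shuffle(shuffled)
        # Rebuild the dataset with only column j permuted
        Xp = [row[:j] + [s] + row[j + 1:] for row, s in zip(X, shuffled)]
        score = metric(y, [predict(row) for row in Xp])
        importances.append(baseline - score)  # metric drop = importance
    return importances
```

With a model that ignores a feature, permuting that feature leaves the score unchanged, so its importance comes out as exactly zero.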
Suggested Fix
Files:
src/Microsoft.ML.AutoML/ — investigate feature engineering pipeline for unbounded expansion
src/Microsoft.ML.Transforms/ — PFI implementation could benefit from cancellation token support
Description: Two complementary fixes: (1) add a cap or warning when AutoML's feature engineering produces an unusually large number of output features; (2) expose a CancellationToken parameter in the PFI API so callers can time out or cancel long-running evaluations.
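The cancellation half of the fix can be sketched as a cooperative check between per-feature evaluations. This is an illustrative Python sketch, not a patch: `threading.Event` stands in for .NET's `CancellationToken`, and `evaluate_one` is a hypothetical per-feature scoring callback.

```python
import threading

def pfi_with_cancellation(feature_count, evaluate_one, cancel):
    """Sketch of the proposed fix: poll a cancellation signal between
    per-feature evaluations so a caller can abort a long PFI run.

    cancel: threading.Event, standing in for a CancellationToken.
    """
    results = []
    for j in range(feature_count):
        if cancel.is_set():  # cooperative cancellation point, once per feature
            raise RuntimeError(
                f"PFI cancelled after {j} of {feature_count} features"
            )
        results.append(evaluate_one(j))
    return results
```

Checking once per feature keeps the overhead negligible while bounding the worst-case delay between a cancel request and the abort to a single feature's evaluation.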
Complexity: Medium
Generated by Triage Single Issue