Flutter plugin for ONNX Runtime inference via Dart FFI. Load .onnx models and run them natively on Android, iOS, and Linux.
ONNX Runtime version: 1.24.1
| Platform | Minimum Version | Execution Providers | Status |
|---|---|---|---|
| Android | API 24 (Android 7.0) | WebGPU, NNAPI, XNNPACK, CPU | ✅ Full support |
| iOS | iOS 15.1 | CoreML, CPU | ✅ Full support |
| Linux | Any | CPU only | ✅ CPU support |
dependencies:
  flutter_ort_plugin:
    git:
      url: https://github.com/adrinator/flutter_ort_plugin.git
- Minimum version: iOS 15.1
- Dependency: CocoaPods
- Pod: onnxruntime-c (linked automatically by the plugin)
Recommended steps:
cd ios
pod install
If you run into linking/symbol issues when using DynamicLibrary.process() on iOS, ensure the plugin is properly registered in your app (Flutter plugin registrant) and do a clean build.
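For context, on iOS the ONNX Runtime symbols are resolved from the running process, while Android and Linux open a shared library by name. A minimal sketch of the usual Dart FFI pattern (the plugin does this internally; the libonnxruntime.so name is illustrative):

```dart
import 'dart:ffi';
import 'dart:io';

// Resolve the ONNX Runtime native library for the current platform.
DynamicLibrary loadOrtLibrary() {
  if (Platform.isIOS) {
    // Symbols are resolved from the running process (see the note above).
    return DynamicLibrary.process();
  }
  // Android/Linux: open the shared library by name (name is illustrative).
  return DynamicLibrary.open('libonnxruntime.so');
}
```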
- Minimum SDK: Android 24 (minSdk 24)
- Runtime: Custom-compiled ONNX Runtime 1.24.1 from the official repository (includes WebGPU/NNAPI/XNNPACK providers)
- NDK: required to build the plugin FFI target (see the NDK version configured in the plugin)
The plugin supports two different ONNX Runtime builds for Android:
| Strategy | Description | Size | When to Use |
|---|---|---|---|
| Standard | Basic CPU execution only | Smaller | Simple models, CPU-only inference |
| Providers | Full provider support (WebGPU, NNAPI, XNNPACK) | Larger | Performance-critical apps with GPU/NPU |
By default, the plugin uses the standard build. To use the providers build with WebGPU/NNAPI/XNNPACK support:
# Build with providers (includes WebGPU, NNAPI, XNNPACK)
flutter build apk --android-project-arg=ORT_STRATEGY=providers
# Or for App Bundle
flutter build appbundle --android-project-arg=ORT_STRATEGY=providers
# For debug builds
flutter build apk --debug --android-project-arg=ORT_STRATEGY=providers
- WebGPU: Requires Android device with GPU support and Vulkan drivers
- NNAPI: Requires Android API 27+ for best compatibility
- XNNPACK: Works on all ARM devices (NEON SIMD)
The providers build is larger but enables hardware acceleration. Use the standard build for smaller app size or if you only need CPU inference.
CPU vs Providers: Many models actually perform better with CPU inference than with hardware providers, especially:
- Small to medium-sized models (<50MB)
- Models with many small operations
- Models not optimized for mobile GPUs/NPUs
- First inference runs (warm-up overhead on providers)
Recommendation: Always test both strategies with your specific model:
# Test standard CPU build
flutter build apk --debug
# Run benchmarks with your model
# Test providers build
flutter build apk --debug --android-project-arg=ORT_STRATEGY=providers
# Run benchmarks with your model
# Compare inference latency and accuracy
The providers build shines with large models (>100MB) and operations well-suited for parallel GPU execution, but don't assume it's always faster.
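For in-app benchmarking, a simple warm-up-then-time loop is usually enough to compare the two builds. A minimal sketch using the basic API shown in the next example (the model path, input shape, and output size are placeholders for your model):

```dart
import 'dart:typed_data';
import 'package:flutter_ort_plugin/flutter_ort_plugin.dart';

void benchmark() {
  final runtime = OnnxRuntime.instance
    ..initialize()
    ..createEnvironment();
  final session = OrtSessionWrapper.create('path/to/model.onnx');
  final input = OrtValueWrapper.fromFloat(
    runtime,
    [1, 3, 224, 224],
    Float32List(1 * 3 * 224 * 224),
  );
  final feeds = {session.inputNames.first: input};

  // Warm-up: the first runs pay one-time provider/graph initialization costs.
  for (var i = 0; i < 5; i++) {
    session.runFloat(feeds, [1000]);
  }

  // Timed runs: compare this number between the standard and providers builds.
  const runs = 50;
  final sw = Stopwatch()..start();
  for (var i = 0; i < runs; i++) {
    session.runFloat(feeds, [1000]);
  }
  sw.stop();
  print('Average latency: ${sw.elapsedMilliseconds / runs} ms');

  input.release();
  session.dispose();
  runtime.dispose();
}
```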
import 'dart:typed_data';
import 'package:flutter_ort_plugin/flutter_ort_plugin.dart';
// 1. Initialize runtime (once)
final runtime = OnnxRuntime.instance;
runtime.initialize();
runtime.createEnvironment();
// 2. Load model (auto-selects best provider for the platform)
final session = OrtSessionWrapper.create('path/to/model.onnx');
// 3. Create input tensor
final input = OrtValueWrapper.fromFloat(
runtime,
[1, 3, 224, 224], // shape
Float32List(1 * 3 * 224 * 224), // data
);
// 4. Run inference -> pure Dart output
final results = session.runFloat(
{session.inputNames.first: input},
[1000], // output element count
);
final predictions = results.first; // Float32List
// 5. Cleanup
input.release();
session.dispose();
runtime.dispose();
FFI calls are synchronous. Heavy model loading or inference can block the Flutter UI thread.
Use OrtIsolateSession to run everything in a background isolate:
final runtime = OnnxRuntime.instance;
runtime.initialize();
runtime.createEnvironment();
final session = await OrtIsolateSession.create(
OrtIsolateSessionConfig(modelPath: 'path/to/model.onnx'),
);
final input = OrtIsolateInput(
shape: [1, 1, 28, 28],
data: Float32List(28 * 28),
);
final outputs = await session.runFloat(
{session.inputNames.first: input},
[10],
);
await session.dispose();
The plugin auto-detects the best provider per platform:
| Platform | Default providers | Supported | Notes |
|---|---|---|---|
| iOS | CoreML, CPU | ✅ Fully | CoreML via dedicated config |
| Android | WebGPU, NNAPI, XNNPACK, CPU | ✅ Fully | WebGPU via Dawn, NNAPI flags, XNNPACK threads |
| Linux | CPU | ✅ CPU only | CPU execution provider |
Note: Android providers (WebGPU, NNAPI, XNNPACK) require building with --android-project-arg=ORT_STRATEGY=providers. See Android setup section for details.
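If you are unsure whether the APK you are testing was built with the providers strategy, you can check availability at runtime and fall back to CPU. A sketch built on the OrtProviders and createWithProviders APIs documented below:

```dart
import 'package:flutter_ort_plugin/flutter_ort_plugin.dart';

OrtSessionWrapper createSessionWithBestProvider(String modelPath) {
  final providers = OrtProviders(OnnxRuntime.instance);

  // Prefer hardware providers when present in this build, then fall back to CPU.
  final preferred = <OrtProvider>[
    if (providers.isProviderAvailable(OrtProvider.webGpu)) OrtProvider.webGpu,
    if (providers.isProviderAvailable(OrtProvider.nnapi)) OrtProvider.nnapi,
    if (providers.isProviderAvailable(OrtProvider.xnnpack)) OrtProvider.xnnpack,
    OrtProvider.cpu,
  ];

  return OrtSessionWrapper.createWithProviders(modelPath, providers: preferred);
}
```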
| Provider | Status | Platform | Notes |
|---|---|---|---|
| CPU | ✅ Ready | All | Always available, built-in |
| CoreML | ✅ Ready | iOS | Apple Neural Engine/GPU acceleration |
| WebGPU | ✅ Ready | Android | GPU acceleration via Dawn/WebGPU support |
| NNAPI | ✅ Ready | Android | NPU/GPU with FP16/NCHW/CPU-disabled flags |
| XNNPACK | ✅ Ready | Android | CPU SIMD optimization with thread config |
| QNN |  | Android | Qualcomm NPU via generic API |
// Providers are selected automatically
final session = OrtSessionWrapper.create('model.onnx');
// Or select providers manually:
final session = OrtSessionWrapper.createWithProviders(
'model.onnx',
providers: [OrtProvider.coreML, OrtProvider.cpu],
providerOptions: {
OrtProvider.coreML: {'MLComputeUnits': 'ALL'},
},
);
XNNPACK is the recommended provider for Android devices without a dedicated NPU:
import 'package:flutter_ort_plugin/flutter_ort_plugin.dart';
final session = OrtSessionWrapper.createWithProviders(
'model.onnx',
providers: [OrtProvider.xnnpack, OrtProvider.cpu],
providerOptions: {
OrtProvider.xnnpack: XnnpackOptions(
numThreads: 4, // Use 4 threads (default: all cores)
).toMap(),
},
);
NNAPI supports hardware acceleration but may have compatibility issues with some models:
final session = OrtSessionWrapper.createWithProviders(
'model.onnx',
providers: [OrtProvider.nnapi, OrtProvider.cpu],
providerOptions: {
OrtProvider.nnapi: {
'use_fp16': 'true', // Use FP16 for faster inference
'use_nchw': 'false', // Keep NHWC format
},
},
);
WebGPU provides hardware-accelerated inference on Android devices with GPU support:
final session = OrtSessionWrapper.createWithProviders(
'model.onnx',
providers: [OrtProvider.webGpu, OrtProvider.cpu],
providerOptions: {
// WebGPU options can be added here if needed
OrtProvider.webGpu: {},
},
);
// Query which execution providers are available at runtime
final providers = OrtProviders(OnnxRuntime.instance);
providers.getAvailableProviders();
// ['WebGpuExecutionProvider', 'NnapiExecutionProvider', 'CPUExecutionProvider']
providers.isProviderAvailable(OrtProvider.webGpu); // true
Fine-tune session options for optimal performance on your target device:
import 'package:flutter_ort_plugin/flutter_ort_plugin.dart';
final session = OrtSessionWrapper.create(
'model.onnx',
sessionConfig: SessionConfig(
intraOpThreads: 4, // Threads within ops (0 = ORT default)
interOpThreads: 1, // Threads across ops (0 = ORT default)
graphOptimizationLevel: GraphOptLevel.all, // Max graph optimizations
executionMode: ExecutionMode.sequential, // Better on mobile
),
);
For Android devices with heterogeneous cores, limit intra-op threads to avoid contention:
final session = OrtSessionWrapper.create(
'model.onnx',
sessionConfig: SessionConfig.androidOptimized, // Pre-configured for Android
);
| Option | Values | Description |
|---|---|---|
| intraOpThreads | 0 (auto) or integer | Parallelism within a single operation |
| interOpThreads | 0 (auto) or integer | Parallelism across independent nodes |
| graphOptimizationLevel | disabled/basic/extended/all | Graph transformation aggressiveness |
| executionMode | sequential/parallel | Node execution order |
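As a counterpoint to the mobile preset above, a desktop/Linux session can usually afford more parallelism. A sketch (the values are illustrative, and ExecutionMode.parallel is assumed to exist alongside the sequential mode shown earlier; benchmark on your hardware):

```dart
final session = OrtSessionWrapper.create(
  'model.onnx',
  sessionConfig: SessionConfig(
    intraOpThreads: 0,                          // 0 = let ORT use all cores
    interOpThreads: 2,                          // overlap independent graph nodes
    graphOptimizationLevel: GraphOptLevel.all,  // maximum graph optimizations
    executionMode: ExecutionMode.parallel,      // parallel node scheduling
  ),
);
```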
| Class | Purpose |
|---|---|
| OrtSessionWrapper | Load model, run inference, manage lifecycle |
| OrtValueWrapper | Create/read tensors with Dart types |
| OrtProviders | Query and configure execution providers |
| OrtIsolateSession | Run inference off the UI thread (background isolate) |
// Auto providers
OrtSessionWrapper.create(modelPath);
OrtSessionWrapper.create(modelPath, providerOptions: { ... });
// Manual providers
OrtSessionWrapper.createWithProviders(modelPath, providers: [...]);
// Inference
session.run(inputs) // -> List<OrtValueWrapper>
session.runFloat(inputs, outputSizes) // -> List<Float32List>
// Metadata
session.inputNames // List<String>
session.outputNames // List<String>
session.dispose();
// Create
OrtValueWrapper.fromFloat(runtime, shape, float32Data);
OrtValueWrapper.fromInt64(runtime, shape, int64Data);
// Read
value.toFloatList(elementCount); // -> Float32List
value.release();
For advanced use cases, OnnxRuntime and OrtTensor expose the full C API with raw pointers. The generated bindings are also exported for direct access.
final rt = OnnxRuntime.instance;
final options = rt.createSessionOptions();
final session = rt.createSession('model.onnx', options);
final tensor = OrtTensor(rt);
final input = tensor.createFloat([1, 3], data);
final outputs = rt.run(session,
inputNames: ['input'],
inputValues: [input],
outputNames: ['output'],
);
final result = tensor.getDataFloat(outputs.first, 10);
// Manual cleanup required
tensor.release(input);
for (final o in outputs) { tensor.release(o); }
rt.releaseSession(session);
rt.releaseSessionOptions(options);
The example/ app demonstrates real-world computer vision inference with YOLO models and includes comprehensive performance tuning:
- YOLO Setup: Model selection, provider configuration, and performance tuning UI
- Camera Detection: Real-time YOLO inference on camera feed with FPS/inference stats
- Image Detection: Static image inference with bounding box overlay
- Video Detection: Frame-by-frame inference on video with detection overlay
- Performance Tuning: Configure threading, graph optimization, and execution mode
- Execution Providers: Test different providers (WebGPU, NNAPI, XNNPACK, CoreML)
Features demonstrated:
- Dynamic model loading (.onnx/.ort formats)
- Platform-aware provider selection (WebGPU/NNAPI/XNNPACK on Android, CoreML on iOS)
- Session configuration for Android Big.LITTLE optimization
- Provider-specific options (NNAPI flags, XNNPACK threads, CoreML compute units)
- Background isolate inference to prevent UI freezes
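Not shown above is how an input tensor is typically filled from an image. As a sketch, assuming you already have raw interleaved RGBA pixel bytes for a frame (how you obtain them depends on your camera/image pipeline) and a model expecting YOLO-style 0–1 normalization (adjust for your model):

```dart
import 'dart:typed_data';

// Convert interleaved RGBA bytes (width * height * 4) into a normalized
// NCHW Float32List suitable for a [1, 3, height, width] input tensor.
Float32List rgbaToNchw(Uint8List rgba, int width, int height) {
  final planeSize = width * height;
  final out = Float32List(3 * planeSize);
  for (var y = 0; y < height; y++) {
    for (var x = 0; x < width; x++) {
      final pixel = (y * width + x) * 4;
      final idx = y * width + x;
      out[idx] = rgba[pixel] / 255.0;                     // R plane
      out[planeSize + idx] = rgba[pixel + 1] / 255.0;     // G plane
      out[2 * planeSize + idx] = rgba[pixel + 2] / 255.0; // B plane
    }
  }
  return out;
}
```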
cd example
flutter run
To regenerate the FFI bindings:
dart run ffigen --config ffigen.yaml
- WebGPU Support: Added WebGPU execution provider for Android GPU acceleration
- Session Configuration: New SessionConfig class for fine-tuning performance
  - Intra-op/inter-op thread control
  - Graph optimization levels (disabled → all)
  - Execution modes (sequential/parallel)
  - Android Big.LITTLE optimization preset
- Performance Tuning UI: Example app now includes comprehensive tuning controls
- Video Detection: Fixed playback stuttering with self-scheduling inference loop
- Provider Summary: Fixed provider options display to respect manual selection
- Android now prioritizes GPU providers: WebGPU → NNAPI → XNNPACK → CPU
- iOS: CoreML → CPU
- Linux: CPU only
MIT. See LICENSE.