Commit b1b5508

Update OpenVINO ExecutionProvider documentation version 2
1 parent bf84c2f commit b1b5508


docs/execution-providers/OpenVINO-ExecutionProvider.md

Lines changed: 130 additions & 13 deletions
@@ -66,12 +66,6 @@ $ source <openvino_install_directory>/setupvars.sh
To use the C# API with the OpenVINO Execution Provider, create a custom NuGet package. Follow the instructions [here](../build/inferencing.md#build-nuget-packages) to install the prerequisites for NuGet creation. Once the prerequisites are installed, follow the instructions to [build the OpenVINO Execution Provider](../build/eps.md#openvino) and add the extra flag `--build_nuget` to create NuGet packages. Two NuGet packages will be created: Microsoft.ML.OnnxRuntime.Managed and Intel.ML.OnnxRuntime.Openvino.
## Configuration Options

@@ -195,23 +189,146 @@ Enables model caching to significantly reduce subsequent load times. Supports CP

### `load_config`

`load_config` is the **recommended configuration method** for setting OpenVINO runtime properties, providing direct access to OpenVINO properties through a JSON configuration file at runtime.

#### Overview

`load_config` enables fine-grained control over OpenVINO inference behavior by loading properties from a JSON file. This is the **preferred method** for configuring advanced OpenVINO features, offering:

- Direct access to OpenVINO runtime properties
- Device-specific configuration
- Better compatibility with future OpenVINO releases
- No property name translation required

#### JSON Configuration Format

```json
{
  "DEVICE_NAME": {
    "PROPERTY_KEY": "value"
  }
}
```

**Supported Device Names:**

- `"CPU"` - Intel CPU
- `"GPU"` - Intel integrated/discrete GPU
- `"NPU"` - Intel Neural Processing Unit
- `"AUTO"` - Automatic device selection

**Property Precedence**: `load_config` properties override legacy provider options when both are specified.

---
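As an illustrative end-to-end sketch, the configuration file can be generated with the standard library and its path passed through the `load_config` provider option. The session-creation lines are commented out and hedged: they assume an ONNX Runtime build with the OpenVINO Execution Provider, a placeholder `model.onnx`, and the `device_type` provider option described elsewhere in this document.

```python
import json
import tempfile

# Device-keyed properties, following the JSON format above.
config = {
    "GPU": {
        "PERFORMANCE_HINT": "LATENCY",
        "CACHE_DIR": "./ov_cache",
    }
}

# Write the configuration to a file that load_config can point at.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(config, f, indent=2)
    config_path = f.name

# Hedged sketch of session creation (requires onnxruntime with the
# OpenVINO EP installed; "model.onnx" is a placeholder path):
# import onnxruntime as ort
# session = ort.InferenceSession(
#     "model.onnx",
#     providers=["OpenVINOExecutionProvider"],
#     provider_options=[{"device_type": "GPU", "load_config": config_path}],
# )

print(json.load(open(config_path))["GPU"]["PERFORMANCE_HINT"])  # prints LATENCY
```

Generating the file programmatically keeps the configuration valid JSON, which matters because malformed files are the most common `load_config` failure mode.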

#### Popular OpenVINO Properties

The following properties are commonly used for optimizing inference performance. For complete property definitions and all possible values, refer to the [OpenVINO properties](https://github.com/openvinotoolkit/openvino/blob/master/src/inference/include/openvino/runtime/properties.hpp) header file.

##### Performance & Execution Hints

| Property | Valid Values | Description |
|----------|-------------|-------------|
| `PERFORMANCE_HINT` | `"LATENCY"`, `"THROUGHPUT"` | High-level performance optimization goal |
| `EXECUTION_MODE_HINT` | `"ACCURACY"`, `"PERFORMANCE"` | Accuracy vs. performance trade-off |
| `INFERENCE_PRECISION_HINT` | `"f32"`, `"f16"`, `"bf16"` | Explicit inference precision |
| `MODEL_PRIORITY` | `"LOW"`, `"MEDIUM"`, `"HIGH"`, `"DEFAULT"` | Model resource allocation priority |

**PERFORMANCE_HINT:**
- `"LATENCY"`: Optimizes for low-latency, single-stream inference
- `"THROUGHPUT"`: Optimizes for high-throughput, multi-stream inference

**EXECUTION_MODE_HINT:**
- `"ACCURACY"`: Maintains model precision via dynamic precision selection
- `"PERFORMANCE"`: Optimizes for speed and may use lower precision

**INFERENCE_PRECISION_HINT:**
- `"f16"`: FP16 precision - recommended for GPU/NPU performance
- `"f32"`: FP32 precision - highest accuracy
- `"bf16"`: BF16 precision - a balance between f16 and f32

> **Note:** CPU accepts the `"f16"` hint in configuration but upcasts to FP32 during execution, as the CPU natively supports only FP32 precision.
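As a sketch, a latency-oriented GPU configuration combining the hint properties above might look like the following. The property names come from the table; the specific values are illustrative, not tuned recommendations for any particular model.

```python
import json

# Illustrative GPU hint configuration: low latency, speed over accuracy,
# FP16 precision (recommended for GPU in the table above).
gpu_hints = {
    "GPU": {
        "PERFORMANCE_HINT": "LATENCY",
        "EXECUTION_MODE_HINT": "PERFORMANCE",
        "INFERENCE_PRECISION_HINT": "f16",
    }
}

# Serialize to the JSON form expected by a load_config file.
print(json.dumps(gpu_hints, indent=2))
```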

##### Threading & Streams

| Property | Valid Values | Description |
|----------|-------------|-------------|
| `NUM_STREAMS` | Positive integer (e.g., `"1"`, `"4"`, `"8"`) | Number of parallel execution streams |
| `INFERENCE_NUM_THREADS` | Integer | Maximum number of inference threads |
| `COMPILATION_NUM_THREADS` | Integer | Maximum number of compilation threads |

**NUM_STREAMS:**
- Controls parallel execution streams for throughput optimization
- Higher values increase throughput for batch processing
- Lower values optimize latency for real-time inference

**INFERENCE_NUM_THREADS:**
- Controls the CPU thread count for inference execution
- An explicit value fixes the thread count (e.g., `"4"` limits inference to 4 threads)
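The threading properties above combine naturally with `PERFORMANCE_HINT`. As an illustrative sketch (values are examples, not recommendations), a throughput-oriented CPU configuration could cap inference threads while running several streams. Note that numeric values are written as strings, matching the examples in the table.

```python
import json

# Illustrative CPU throughput configuration: several parallel streams,
# with inference threads capped to leave cores free for the application.
cpu_throughput = {
    "CPU": {
        "PERFORMANCE_HINT": "THROUGHPUT",
        "NUM_STREAMS": "4",
        "INFERENCE_NUM_THREADS": "8",
    }
}

print(json.dumps(cpu_throughput))
```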
##### Caching Properties

| Property | Valid Values | Description |
|----------|-------------|-------------|
| `CACHE_DIR` | File path string | Model cache directory |
| `CACHE_MODE` | `"OPTIMIZE_SIZE"`, `"OPTIMIZE_SPEED"` | Cache optimization strategy |

**CACHE_MODE:**
- `"OPTIMIZE_SPEED"`: Faster model loading, larger cache files
- `"OPTIMIZE_SIZE"`: Smaller cache files, slower loading

##### Logging Properties

| Property | Valid Values | Description |
|----------|-------------|-------------|
| `LOG_LEVEL` | `"LOG_NONE"`, `"LOG_ERROR"`, `"LOG_WARNING"`, `"LOG_INFO"`, `"LOG_DEBUG"`, `"LOG_TRACE"` | Logging verbosity level |

> **Note:** `LOG_LEVEL` is not supported on GPU devices. Use it with CPU or NPU for debugging purposes.
##### AUTO Device Properties

| Property | Valid Values | Description |
|----------|-------------|-------------|
| `ENABLE_STARTUP_FALLBACK` | `"YES"`, `"NO"` | Enable device fallback during model loading |
| `ENABLE_RUNTIME_FALLBACK` | `"YES"`, `"NO"` | Enable device fallback during inference runtime |
| `DEVICE_PROPERTIES` | Nested JSON string | Device-specific property configuration |

**DEVICE_PROPERTIES Syntax:**

Used to configure properties for individual devices when using AUTO mode.

```json
{
  "AUTO": {
    "DEVICE_PROPERTIES": "{CPU:{PROPERTY:value},GPU:{PROPERTY:value}}"
  }
}
```

**Syntax Rules:**
- Entire value is a single JSON string (enclosed in quotes)
- No spaces between properties
- No quotes around property names/values inside the nested structure
- Device names are uppercase (CPU, GPU, NPU)
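Because the `DEVICE_PROPERTIES` value is an unquoted, space-free string rather than ordinary JSON, it is easy to get wrong by hand. The syntax rules above can be encoded in a small helper; `build_device_properties` is a hypothetical function for illustration, not part of ONNX Runtime or OpenVINO.

```python
# Hypothetical helper (not part of ONNX Runtime) that renders a plain dict
# into the unquoted, space-free string DEVICE_PROPERTIES expects:
# uppercase device names, no spaces, no quotes inside the nested structure.
def build_device_properties(per_device: dict) -> str:
    parts = []
    for device, props in per_device.items():
        inner = ",".join(f"{k}:{v}" for k, v in props.items())
        parts.append(f"{device.upper()}:{{{inner}}}")
    return "{" + ",".join(parts) + "}"

value = build_device_properties({
    "CPU": {"NUM_STREAMS": 2},
    "GPU": {"PERFORMANCE_HINT": "LATENCY"},
})
print(value)  # {CPU:{NUM_STREAMS:2},GPU:{PERFORMANCE_HINT:LATENCY}}
```

The resulting string would then be placed under `"AUTO" → "DEVICE_PROPERTIES"` in the `load_config` JSON, as in the example above.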

#### Property Reference Documentation

For complete property definitions and advanced options, refer to the official OpenVINO properties header:

**[OpenVINO Runtime Properties](https://github.com/openvinotoolkit/openvino/blob/master/src/inference/include/openvino/runtime/properties.hpp)**

Property keys used in the `load_config` JSON must match the string literals defined in the properties header file.

#### Migration from Legacy Provider Options

**Deprecation Notice**

The following provider options are **deprecated** and should be migrated to `load_config` for better compatibility with future releases.

| Deprecated Provider Option | `load_config` Equivalent | Recommended Migration |
|---------------------------|------------------------|----------------------|
| `precision="FP16"` | `INFERENCE_PRECISION_HINT` | `{"GPU": {"INFERENCE_PRECISION_HINT": "f16"}}` |
| `precision="FP32"` | `INFERENCE_PRECISION_HINT` | `{"GPU": {"INFERENCE_PRECISION_HINT": "f32"}}` |
| `precision="ACCURACY"` | `EXECUTION_MODE_HINT` | `{"GPU": {"EXECUTION_MODE_HINT": "ACCURACY"}}` |
| `num_of_threads=8` | `INFERENCE_NUM_THREADS` | `{"CPU": {"INFERENCE_NUM_THREADS": "8"}}` |
| `num_streams=4` | `NUM_STREAMS` | `{"GPU": {"NUM_STREAMS": "4"}}` |
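The migration table can be mechanized. The sketch below is a hypothetical converter (not an ONNX Runtime API) that maps the deprecated options to a `load_config`-style dict, using the same device choices as the table; those device targets are illustrative.

```python
# Hypothetical translation of deprecated provider options into a
# load_config dict, mirroring the migration table above.
def migrate_legacy_options(precision=None, num_of_threads=None, num_streams=None):
    config = {}
    if precision in ("FP16", "FP32"):
        # "FP16" -> "f16", "FP32" -> "f32", placed on GPU as in the table.
        config.setdefault("GPU", {})["INFERENCE_PRECISION_HINT"] = precision.replace("FP", "f")
    elif precision == "ACCURACY":
        config.setdefault("GPU", {})["EXECUTION_MODE_HINT"] = "ACCURACY"
    if num_of_threads is not None:
        config.setdefault("CPU", {})["INFERENCE_NUM_THREADS"] = str(num_of_threads)
    if num_streams is not None:
        config.setdefault("GPU", {})["NUM_STREAMS"] = str(num_streams)
    return config

print(migrate_legacy_options(precision="FP16", num_of_threads=8))
```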

---