AGENTS.md: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ If I tell you to remember something, you do the same, update
- Image placeholders must emit Markdown image links (`![alt](path)`) that reference persisted artifacts; only fall back to bold text when no file is available.
- If AI image enrichment yields no insight, log and continue instead of throwing—treat empty payloads as a soft failure.
- When executing tests, always include the `ManualConversionDebugTests` suite; treat its failures as blocking.
- Always run the full test suite after making changes and share the results with the user.
- Telemetry work: instrument both overall document processing time and per-page duration with real metrics alongside traces—include histogram/counter coverage so latency is observable at both levels.
- For large converters, structure them as partial classes and split related files into a dedicated subfolder.
- Markdown hygiene: strip non-breaking, zero-width, or other non-printable spaces; replace them with regular ASCII spaces so output never contains invisible characters like the long space before `Add`.
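The Markdown hygiene rule above could be implemented along these lines. This is a minimal sketch, not the project's actual code: the `MarkdownHygiene` class and its `NormalizeSpaces` method are hypothetical names, and treating every Unicode space separator, format character, and control character as "invisible" is an assumption about how broadly the rule should apply.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper illustrating the hygiene rule; not part of MarkItDown.
public static class MarkdownHygiene
{
    // Replaces non-breaking, zero-width, and other non-printable characters
    // with a regular ASCII space, leaving newlines and tabs untouched.
    public static string NormalizeSpaces(string input)
    {
        var builder = new StringBuilder(input.Length);
        foreach (var ch in input)
        {
            // Keep structural whitespace that Markdown relies on.
            if (ch == '\n' || ch == '\r' || ch == '\t')
            {
                builder.Append(ch);
                continue;
            }

            var category = CharUnicodeInfo.GetUnicodeCategory(ch);
            bool invisible =
                category == UnicodeCategory.SpaceSeparator // e.g. U+00A0, U+2003 (and U+0020 itself)
                || category == UnicodeCategory.Format      // e.g. U+200B, U+FEFF
                || category == UnicodeCategory.Control;    // remaining control characters

            builder.Append(invisible ? ' ' : ch);
        }
        return builder.ToString();
    }
}
```

For example, `MarkdownHygiene.NormalizeSpaces("Add\u00A0item")` returns `"Add item"`. One caveat of this broad approach: the format category also contains the zero-width joiner used in some emoji sequences, so a stricter allow-list may be preferable if such content must survive intact.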
- `DocumentConverterResult` exposes `Markdown`, `Title`, `Segments`, `Artifacts`, and `Metadata` for downstream processing.
- Apply custom behaviour through `MarkItDownOptions` (segment settings, AI providers, middleware) when constructing the client.

### Metadata Keys

The `MetadataKeys` static class centralises every metadata field the converters emit so you never have to guess string names. Use these constants when inspecting `DocumentConverterResult.Metadata`, per-segment metadata, or artifact metadata:
```csharp
// `path` is the input file path, defined elsewhere.
await using var client = new MarkItDownClient();
var result = await client.ConvertAsync(path);

// Document-level metadata: look up the detected title by its well-known key.
if (result.Metadata.TryGetValue(MetadataKeys.DocumentTitle, out var title))
{
    Console.WriteLine($"Detected title: {title}");
}

// Artifact-level metadata: each extracted table carries its own metadata.
foreach (var table in result.Artifacts.Tables)
{
    if (table.Metadata.TryGetValue(MetadataKeys.TableComment, out var comment))
    {
        Console.WriteLine(comment);
    }
}
```
Notable keys include `MetadataKeys.TableComment` (table span hints), `MetadataKeys.EmailAttachments` (EML attachment summary), `MetadataKeys.NotebookCellsCount` (Jupyter statistics), and `MetadataKeys.ArchiveEntry` (ZIP entry provenance). Refer to `src/MarkItDown/Utilities/MetadataKeys.cs` for the full catalog; new format handlers add their metadata there so downstream consumers can rely on stable identifiers.

### CLI

Prefer a guided experience? Run the bundled CLI to batch files or URLs:
@@ -276,6 +300,18 @@ dotnet run --project src/MarkItDown.Cli -- path/to/input
Use `dotnet publish` with your preferred runtime identifier if you need a self-contained binary.

Each run now surfaces the document title plus quick stats (pages, images, tables, attachments) in the conversion summary. These numbers come straight from `MetadataKeys` so the CLI mirrors what you see when processing results programmatically.

#### Cloud Provider Configuration Prompts

Choose **Configure cloud providers** in the CLI to register AI integrations without writing code. The prompts map directly to the corresponding option objects:
- **Azure** → `AzureIntelligenceOptions` (`DocumentIntelligence`, `Vision`, `Media`), covering endpoints, API keys/tokens, and Video Indexer account metadata.
- **Google** → `GoogleIntelligenceOptions` with credentials for Vertex AI or Speech services.
- **AWS** → `AwsIntelligenceOptions` for Rekognition/Transcribe-style integrations.

You can leave a prompt blank to keep the current value, or enter `-` to clear it. The saved settings are applied to every subsequent conversion until you change them or use **Clear all**. Combine these prompts with the metadata counts above to validate that enrichment providers are wired up correctly.
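The blank and `-` conventions described above can be sketched as a small helper. This is an illustration only: `PromptInput.Resolve` is a hypothetical name rather than the CLI's actual code, and trimming surrounding whitespace is an assumption.

```csharp
#nullable enable

// Hypothetical helper showing the prompt semantics; not part of MarkItDown.Cli.
public static class PromptInput
{
    // Returns the value a setting should have after the user answers a prompt:
    //   blank input -> keep the current value
    //   "-"         -> clear the value (null)
    //   anything else -> use the trimmed input as the new value
    public static string? Resolve(string? current, string? input)
    {
        if (string.IsNullOrWhiteSpace(input))
        {
            return current;
        }

        var trimmed = input.Trim();
        return trimmed == "-" ? null : trimmed;
    }
}
```

With this sketch, `PromptInput.Resolve("old-key", "")` keeps `"old-key"`, while `PromptInput.Resolve("old-key", "-")` returns `null`.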