audit-cli

A Go CLI tool for performing audit-related tasks in the MongoDB documentation monorepo.

Overview

This CLI tool helps with maintenance and audit-related tasks across MongoDB's documentation by:

  1. Extracting code examples or procedures from RST files into individual, testable files
  2. Searching files for specific patterns or substrings
  3. Analyzing reference relationships to understand file dependencies
  4. Comparing file contents across documentation versions to identify differences
  5. Following include directives to process entire documentation trees
  6. Counting documentation pages or tested code examples to track coverage and quality metrics

This CLI provides built-in handling for MongoDB-specific conventions such as steps files, extracts, versioned directory structures, and template variables.

Installation

Build from Source

cd audit-cli/bin
go build ../

This creates an audit-cli executable in the bin directory.

Run Without Building

cd audit-cli
go run main.go [command] [flags]

Configuration

Monorepo Path Configuration

Some commands require a monorepo path (e.g., analyze composables, count tested-examples, count pages). You can configure the monorepo path in three ways, listed in order of priority:

1. Command-Line Argument (Highest Priority)

Pass the path directly to the command:

./audit-cli analyze composables /path/to/docs-monorepo
./audit-cli count tested-examples /path/to/docs-monorepo
./audit-cli count pages /path/to/docs-monorepo

2. Environment Variable

Set the AUDIT_CLI_MONOREPO_PATH environment variable:

export AUDIT_CLI_MONOREPO_PATH=/path/to/docs-monorepo
./audit-cli analyze composables
./audit-cli count tested-examples
./audit-cli count pages

3. Config File (Lowest Priority)

Create a .audit-cli.yaml file in either:

  • Current directory: ./.audit-cli.yaml
  • Home directory: ~/.audit-cli.yaml

Config file format:

monorepo_path: /path/to/docs-monorepo

Example:

# Create config file
cat > .audit-cli.yaml << EOF
monorepo_path: /Users/username/mongodb/docs-monorepo
EOF

# Now you can run commands without specifying the path
./audit-cli analyze composables
./audit-cli count tested-examples --for-product pymongo
./audit-cli count pages --count-by-project

Priority Example:

If you have all three configured, the command-line argument takes precedence:

# Config file has: monorepo_path: /config/path
# Environment has: AUDIT_CLI_MONOREPO_PATH=/env/path
# Command-line argument: /cmd/path

./audit-cli analyze composables /cmd/path  # Uses /cmd/path
./audit-cli analyze composables             # Uses /env/path (env overrides config)
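
The resolution order is roughly equivalent to the following Go sketch (a simplified illustration with a hypothetical helper, not the actual implementation):

package main

import (
	"errors"
	"os"
)

// resolveMonorepoPath applies the priority order: command-line argument,
// then the AUDIT_CLI_MONOREPO_PATH environment variable, then the value
// read from .audit-cli.yaml.
func resolveMonorepoPath(cliArg, configValue string) (string, error) {
	if cliArg != "" {
		return cliArg, nil // 1. command-line argument (highest priority)
	}
	if env := os.Getenv("AUDIT_CLI_MONOREPO_PATH"); env != "" {
		return env, nil // 2. environment variable
	}
	if configValue != "" {
		return configValue, nil // 3. config file (lowest priority)
	}
	return "", errors.New("monorepo path not configured")
}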

File Path Resolution

File-based commands (e.g., extract code-examples, analyze usage, compare file-contents) support flexible path resolution. Paths can be specified in three ways:

1. Absolute Path

./audit-cli extract code-examples /full/path/to/file.rst
./audit-cli analyze usage /full/path/to/includes/fact.rst

2. Relative to Monorepo Root (if monorepo is configured)

If you have a monorepo path configured (via config file or environment variable), you can use paths relative to the monorepo root:

# With monorepo_path configured as /Users/username/mongodb/docs-monorepo
./audit-cli extract code-examples manual/manual/source/tutorial.rst
./audit-cli analyze usage manual/manual/source/includes/fact.rst
./audit-cli compare file-contents manual/manual/source/file.rst

3. Relative to Current Directory (fallback)

If the path doesn't exist relative to the monorepo, it falls back to the current directory:

./audit-cli extract code-examples ./local-file.rst
./audit-cli analyze includes ../other-dir/file.rst

Priority Order:

  1. If path is absolute → use as-is
  2. If monorepo is configured and path exists relative to monorepo → use monorepo-relative path
  3. Otherwise → resolve relative to current directory

This makes it convenient to work with files in the monorepo without typing full paths every time!
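
A minimal Go sketch of this priority order (illustrative only; the helper name is hypothetical and the real resolver may differ):

package main

import (
	"os"
	"path/filepath"
)

// resolveInputPath mirrors the priority order above: absolute paths pass
// through unchanged, monorepo-relative paths win when they exist on disk,
// and everything else falls back to the current working directory.
func resolveInputPath(input, monorepoRoot string) string {
	if filepath.IsAbs(input) {
		return input // 1. absolute path: use as-is
	}
	if monorepoRoot != "" {
		candidate := filepath.Join(monorepoRoot, input)
		if _, err := os.Stat(candidate); err == nil {
			return candidate // 2. exists relative to the monorepo root
		}
	}
	return filepath.Clean(input) // 3. resolve relative to the current directory
}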

Usage

The CLI is organized into parent commands with subcommands:

audit-cli
├── extract          # Extract content from RST files
│   ├── code-examples
│   └── procedures
├── search           # Search through extracted content or source files
│   └── find-string
├── analyze          # Analyze RST file structures
│   ├── includes
│   ├── usage
│   ├── procedures
│   └── composables
├── compare          # Compare files across versions
│   └── file-contents
└── count            # Count code examples and documentation pages
    ├── tested-examples
    └── pages

Extract Commands

extract code-examples

Extract code examples from reStructuredText files into individual files. For details about what code example directives are supported and how, refer to the Supported rST Directives - Code Example Extraction section below.

Use Cases:

This command helps writers:

  • Examine all the code examples that make up a specific page or section
  • Split out code examples into individual files for migration to test infrastructure
  • Report on the number of code examples by language
  • Report on the number of code examples by directive type
  • Use additional commands, such as search, to find strings within specific code examples

Basic Usage:

# Extract from a single file
./audit-cli extract code-examples path/to/file.rst -o ./output

# Extract from a directory (non-recursive)
./audit-cli extract code-examples path/to/docs -o ./output

# Extract recursively from all subdirectories
./audit-cli extract code-examples path/to/docs -o ./output -r

# Extract recursively and preserve directory structure
./audit-cli extract code-examples path/to/docs -o ./output -r --preserve-dirs

# Follow include directives
./audit-cli extract code-examples path/to/file.rst -o ./output -f

# Combine recursive scanning and include following
./audit-cli extract code-examples path/to/docs -o ./output -r -f

# Dry run (show what would be extracted without writing files)
./audit-cli extract code-examples path/to/file.rst -o ./output --dry-run

# Verbose output
./audit-cli extract code-examples path/to/file.rst -o ./output -v

Flags:

  • -o, --output <dir> - Output directory for extracted files (default: ./output)
  • -r, --recursive - Recursively scan directories for RST files. Without this flag, the tool only processes the specified file, or the RST files directly inside the specified directory. With it, the tool scans all subdirectories and extracts code examples from every RST file it finds.
  • --preserve-dirs - Preserve directory structure in output (use with --recursive). By default, all extracted files are written to a flat structure in the output directory. When this flag is enabled with --recursive, the tool will preserve the directory structure relative to the input directory. For example, if extracting from docs/source/ and a file is located at docs/source/includes/example.rst, the output will be written to output/includes/example.*.ext instead of output/example.*.ext.
  • -f, --follow-includes - Follow .. include:: directives in RST files. Without this flag, the tool only extracts code examples from the top-level RST file. With it, the tool follows any .. include:: directives and also extracts code examples from the included files. When combined with -r, the tool recursively scans all subdirectories and follows includes in every file. If an include path points outside the input directory, -r alone never reaches it, but -f still follows the directive and parses the included file. Starting from a page's root .txt file, this effectively lets you process every file that makes up that page.
  • --dry-run - Show what would be extracted without writing files
  • -v, --verbose - Show detailed processing information

Output Format:

Extracted files are named: {source-base}.{directive-type}.{index}.{ext}

Examples:

  • my-doc.code-block.1.js - First code-block from my-doc.rst
  • my-doc.literalinclude.2.py - Second literalinclude from my-doc.rst
  • my-doc.io-code-block.1.input.js - Input from first io-code-block
  • my-doc.io-code-block.1.output.json - Output from first io-code-block

Report:

After extraction, the code extraction report shows:

  • Number of files traversed
  • Number of output files written
  • Code examples by language
  • Code examples by directive type

extract procedures

Extract unique procedures from reStructuredText files into individual files. This command parses procedures and creates one file per unique procedure (grouped by heading and content). Each procedure file represents a distinct piece of content, even if it appears in multiple selections or variations.

Use Cases:

This command helps writers:

  • Extract all unique procedures from a page for testing or migration
  • Generate individual procedure files for each distinct procedure
  • Understand how many different procedures exist in a document
  • Create standalone procedure files for reuse or testing
  • See which selections each procedure appears in

Basic Usage:

# Extract all unique procedures from a file
./audit-cli extract procedures path/to/file.rst -o ./output

# Extract only procedures that appear in a specific selection
./audit-cli extract procedures path/to/file.rst -o ./output --selection "driver, nodejs"

# Dry run (show what would be extracted without writing files)
./audit-cli extract procedures path/to/file.rst -o ./output --dry-run

# Verbose output (shows all selections each procedure appears in)
./audit-cli extract procedures path/to/file.rst -o ./output -v

# Expand include directives inline
./audit-cli extract procedures path/to/file.rst -o ./output --expand-includes

Flags:

  • -o, --output <dir> - Output directory for extracted procedure files (default: ./output)
  • --selection <value> - Extract only procedures that appear in a specific selection (e.g., "python", "driver, nodejs")
  • --expand-includes - Expand include directives inline instead of preserving them
  • --dry-run - Show what would be extracted without writing files
  • -v, --verbose - Show detailed processing information including all selections each procedure appears in

Output Format:

Extracted files are named: {heading}_{first-step-title}_{hash}.rst

The filename includes:

  • Heading: The section heading above the procedure
  • First step title: The title of the first step (for readability)
  • Hash: A short 6-character hash of the content (for uniqueness)

Examples:

  • before-you-begin_pull-the-mongodb-docker-image_e8eeec.rst
  • install-mongodb-community-edition_download-the-tarball_44c437.rst
  • configuration_create-the-data-and-log-directories_f1d35b.rst
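
A rough sketch of how such a filename could be assembled; the slug rules and hash function here are assumptions inferred from the examples above, not the exact implementation:

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// slugify lowercases a title and collapses runs of non-alphanumeric
// characters into hyphens, e.g. "Before You Begin" -> "before-you-begin".
func slugify(s string) string {
	return strings.Trim(nonAlnum.ReplaceAllString(strings.ToLower(s), "-"), "-")
}

// procedureFilename builds {heading}_{first-step-title}_{hash}.rst, using
// the first six hex characters of a SHA-256 content hash for uniqueness.
func procedureFilename(heading, firstStepTitle, content string) string {
	sum := sha256.Sum256([]byte(content))
	shortHash := hex.EncodeToString(sum[:])[:6]
	return fmt.Sprintf("%s_%s_%s.rst", slugify(heading), slugify(firstStepTitle), shortHash)
}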

Verbose Output:

With the -v flag, the command shows detailed information about each procedure:

Found 36 unique procedures:

1. Before You Begin
   Output file: before-you-begin_pull-the-mongodb-docker-image_e8eeec.rst
   Steps: 5
   Appears in 2 selections:
     - docker, None, None, None, None, None, without-search-docker
     - docker, None, None, None, None, None, with-search-docker

2. Install MongoDB Community Edition
   Output file: install-mongodb-community-edition_download-the-tarball_44c437.rst
   Steps: 4
   Appears in 1 selections:
     - macos, None, None, tarball, None, None, None

Supported Procedure Types:

The command recognizes and extracts:

  • .. procedure:: directives with .. step:: directives
  • Ordered lists (numbered or lettered) as procedures
  • .. tabs:: directives with :tabid: options for variations
  • .. composable-tutorial:: directives with .. selected-content:: blocks
  • Sub-procedures (ordered lists within steps)
  • YAML steps files (automatically converted to RST format)

How Uniqueness is Determined:

Procedures are grouped by:

  1. Heading: The section heading above the procedure
  2. Content hash: A hash of the procedure's steps and content

This means:

  • Procedures with the same heading but different content are treated as separate unique procedures
  • Procedures with identical content that appear in multiple selections are extracted once
  • The output file shows all selections where that procedure appears (visible with -v flag)
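
Conceptually, the grouping can be pictured as a map keyed by heading plus content hash (a simplified sketch; the type names are hypothetical):

package main

// parsedProcedure is one procedure occurrence as found in a single
// selection (hypothetical shape for illustration).
type parsedProcedure struct {
	Heading     string
	ContentHash string
	Selection   string
}

// procedureKey identifies a unique procedure: same heading plus same
// content hash means the same procedure, regardless of selection.
type procedureKey struct {
	Heading     string
	ContentHash string
}

// groupProcedures deduplicates occurrences by heading + content hash while
// recording every selection each unique procedure appears in.
func groupProcedures(occurrences []parsedProcedure) map[procedureKey][]string {
	selectionsByProcedure := make(map[procedureKey][]string)
	for _, p := range occurrences {
		key := procedureKey{Heading: p.Heading, ContentHash: p.ContentHash}
		selectionsByProcedure[key] = append(selectionsByProcedure[key], p.Selection)
	}
	return selectionsByProcedure
}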

Report:

After extraction, the report shows:

  • Number of unique procedures extracted
  • Number of files written
  • Detailed list of procedures with step counts and selections (with -v flag)

Search Commands

search find-string

Search files for a specific substring. The command can search extracted code example files, extracted procedure files, or RST source files.

Default Behavior:

  • Case-insensitive search (matches "curl", "CURL", "Curl", etc.)
  • Exact word matching (excludes partial matches like "curl" in "libcurl")

Use --case-sensitive to make the search case-sensitive, or --partial-match to allow matching the substring as part of larger words.
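
A minimal sketch of how this matching behavior can be expressed with Go's regexp package (illustrative; the actual matcher may differ):

package main

import (
	"fmt"
	"regexp"
)

// buildMatcher compiles a matcher for the search term. By default the
// pattern is case-insensitive ((?i)) and anchored on word boundaries (\b),
// so "curl" does not match inside "libcurl".
func buildMatcher(term string, caseSensitive, partialMatch bool) (*regexp.Regexp, error) {
	pattern := regexp.QuoteMeta(term)
	if !partialMatch {
		pattern = `\b` + pattern + `\b`
	}
	if !caseSensitive {
		pattern = `(?i)` + pattern
	}
	return regexp.Compile(pattern)
}

func main() {
	m, _ := buildMatcher("curl", false, false)
	fmt.Println(m.MatchString("Run CURL to test"))      // true
	fmt.Println(m.MatchString("built against libcurl")) // false
}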

Use Cases:

This command helps writers:

  • Find specific strings across documentation files or pages
    • Search for product names, command names, API methods, or other strings that may need to be updated
  • Understand the number of references and impact of changes across documentation files or pages
  • Identify files that need to be updated when a string needs to be changed
  • Scope work related to specific changes

Basic Usage:

# Search in a single file (case-insensitive, exact word match)
./audit-cli search find-string path/to/file.js "curl"

# Search in a directory (non-recursive)
./audit-cli search find-string path/to/output "substring"

# Search recursively
./audit-cli search find-string path/to/output "substring" -r

# Search an RST file and all files it includes
./audit-cli search find-string path/to/source.rst "substring" -f

# Search a directory recursively and follow includes in RST files
./audit-cli search find-string path/to/source "substring" -r -f

# Verbose output (show file paths and language breakdown)
./audit-cli search find-string path/to/output "substring" -r -v

# Case-sensitive search (only matches exact case)
./audit-cli search find-string path/to/output "CURL" --case-sensitive

# Partial match (includes "curl" in "libcurl")
./audit-cli search find-string path/to/output "curl" --partial-match

# Combine flags for case-sensitive partial matching
./audit-cli search find-string path/to/output "curl" --case-sensitive --partial-match

Flags:

  • -r, --recursive - Recursively scan directories for RST files. Without this flag, the tool only searches the specified file, or the files directly inside the specified directory. With it, the tool scans all subdirectories and searches every file it finds.
  • -f, --follow-includes - Follow .. include:: directives in RST files. Without this flag, the tool searches only the top-level RST file or directory. With it, the tool follows any .. include:: directives in the RST files it processes and also searches the included files. When combined with -r, the tool recursively scans all subdirectories and follows includes in every file. If an include path points outside the input directory, -r alone never reaches it, but -f still follows the directive and searches the included file. Starting from a page's root .txt file, this effectively lets you search every file that makes up that page.
  • -v, --verbose - Show file paths and language breakdown
  • --case-sensitive - Make search case-sensitive (default: case-insensitive)
  • --partial-match - Allow partial matches within words (default: exact word matching)

Report:

The search report shows:

  • Number of files scanned
  • Number of files containing the substring (each file counted once)

With -v flag, also shows:

  • List of file paths where substring appears
  • Count broken down by language (file extension)

Analyze Commands

analyze includes

Analyze include directive relationships in RST files to understand file dependencies.

This command recursively follows .. include:: directives to show all files that are referenced from a starting file. This helps you understand which content is transcluded into a page.

Use Cases:

This command helps writers:

  • Understand the impact of changes to widely-included files
  • Identify files included multiple times
  • Document file relationships for maintenance
  • Plan refactoring of complex include structures
  • See what content is actually pulled into a page

Basic Usage:

# Analyze a single file (shows summary)
./audit-cli analyze includes path/to/file.rst

# Show hierarchical tree structure
./audit-cli analyze includes path/to/file.rst --tree

# Show flat list of all included files
./audit-cli analyze includes path/to/file.rst --list

# Show both tree and list
./audit-cli analyze includes path/to/file.rst --tree --list

# Verbose output (show processing details)
./audit-cli analyze includes path/to/file.rst --tree -v

Flags:

  • --tree - Display results as a hierarchical tree structure
  • --list - Display results as a flat list of all files
  • -v, --verbose - Show detailed processing information

Output Formats:

Summary (default - no flags):

============================================================
INCLUDE ANALYSIS SUMMARY
============================================================
Root File: /path/to/file.rst
Unique Files: 18
Include Directives: 56
Max Depth: 2
============================================================

Use --tree to see the hierarchical structure
Use --list to see a flat list of all files

The summary shows:

  • Root file path
  • Number of unique files discovered
  • Total number of include directive instances (counting duplicates)
  • Maximum depth of include nesting
  • Hints to use --tree or --list for more details

Tree (--tree flag):

  • Hierarchical tree structure showing include relationships
  • Uses box-drawing characters for visual clarity
  • Shows which files include which other files
  • Displays directory paths to help disambiguate files with the same name
    • Files in includes directories: includes/filename.rst
    • Files outside includes: path/from/source/filename.rst

List (--list flag):

  • Flat numbered list of all unique files
  • Files listed in depth-first traversal order
  • Shows absolute paths to all files

Verbose (-v flag):

  • Shows complete dependency tree with all nodes (including duplicates)
  • Each file displays the number of include directives it contains
  • Uses visual indicators to show duplicate includes:
    • Filled bullet (•) - First occurrence of a file
    • Hollow bullet (◦) - Subsequent occurrences (duplicates)
  • Example output:
• get-started.txt (24 include directives)
  • get-started/node/language-connection-steps.rst (3 include directives)
    • includes/load-sample-data.rst
    • includes/connection-string-note.rst
    • includes/application-output.rst
  • includes/next-steps.rst
  • get-started/python/language-connection-steps.rst (3 include directives)
    ◦ includes/load-sample-data.rst
    ◦ includes/connection-string-note.rst
    ◦ includes/application-output.rst
  ◦ includes/next-steps.rst

Note on File Counting:

The command reports two distinct metrics:

  1. Unique Files: Number of distinct files discovered through include directives. If a file is included multiple times (e.g., file A includes file C, and file B also includes file C), the file is counted only once.

  2. Include Directives: Total number of include directive instances across all files. This counts every occurrence, including duplicates. For example, if load-sample-data.rst is included 12 times across different files, it contributes 12 to this count.

In verbose mode, the tree view shows files in all locations where they appear. Duplicate occurrences are marked with a hollow bullet (◦) to help you identify files that are included multiple times.
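
The difference between the two metrics boils down to a set versus a running total, as in this simplified sketch:

package main

// includeStats tallies the two metrics: unique files (a set) and include
// directive instances (a counter that keeps duplicates).
type includeStats struct {
	uniqueFiles     map[string]bool
	totalDirectives int
}

func newIncludeStats() *includeStats {
	return &includeStats{uniqueFiles: make(map[string]bool)}
}

// record is called once for every include directive encountered while
// walking the tree.
func (s *includeStats) record(includedPath string) {
	s.totalDirectives++                // every occurrence counts
	s.uniqueFiles[includedPath] = true // duplicates collapse in the set
}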

Note on Toctree:

This command does not follow .. toctree:: entries. Toctree entries are navigation links to other pages, not content that's transcluded into the page. If you need to find which files reference a target file through toctree entries, use the analyze usage command with the --include-toctree flag.

analyze usage

Find all files that use a target file through RST directives. This performs reverse dependency analysis, showing which files reference the target file through include, literalinclude, io-code-block, or toctree directives.

The command searches all RST files (.rst and .txt extensions) and YAML files (.yaml and .yml extensions) in the source directory tree. YAML files are included because extract and release files contain RST directives within their content blocks.

Use Cases:

By default, this command searches for content inclusion directives (include, literalinclude, io-code-block) that transclude content into pages. Use --include-toctree to also search for toctree entries, which are navigation links rather than content transclusion.

This command helps writers:

  • Understand the impact of changes to a file (what pages will be affected)
  • Find all usages of an include file across the documentation
  • Track where code examples are referenced
  • Plan refactoring by understanding file dependencies

Basic Usage:

# Find what uses an include file (content inclusion only)
./audit-cli analyze usage path/to/includes/fact.rst

# Find what uses a code example
./audit-cli analyze usage path/to/code-examples/example.js

# Include toctree references (navigation links)
./audit-cli analyze usage path/to/file.rst --include-toctree

# Get JSON output for automation
./audit-cli analyze usage path/to/file.rst --format json

# Show detailed information with line numbers
./audit-cli analyze usage path/to/file.rst --verbose

Flags:

  • --format <format> - Output format: text (default) or json
  • -v, --verbose - Show detailed information including line numbers and reference paths
  • -c, --count-only - Only show the count of usages (useful for quick checks and scripting)
  • --paths-only - Only show the file paths, one per line (useful for piping to other commands)
  • --summary - Only show summary statistics (total files and usages by type, without file list)
  • -t, --directive-type <type> - Filter by directive type: include, literalinclude, io-code-block, or toctree
  • --include-toctree - Include toctree entries (navigation links) in addition to content inclusion directives
  • --exclude <pattern> - Exclude paths matching this glob pattern (e.g., */archive/* or */deprecated/*)

Understanding the Counts:

The command shows two metrics:

  • Total Files: Number of unique files that use the target (deduplicated)
  • Total Usages: Total number of directive occurrences (includes duplicates)

When a file includes the target multiple times, it counts as:

  • 1 file (in Total Files)
  • Multiple usages (in Total Usages)

This helps identify both the impact scope (how many files) and duplicate includes (when usages > files).

Supported Directive Types:

By default, the command tracks content inclusion directives:

  1. .. include:: - RST content includes (transcluded)

    .. include:: /includes/intro.rst
  2. .. literalinclude:: - Code file references (transcluded)

    .. literalinclude:: /code-examples/example.py
       :language: python
  3. .. io-code-block:: - Input/output examples with file arguments (transcluded)

    .. io-code-block::
    
       .. input:: /code-examples/query.js
          :language: javascript
    
       .. output:: /code-examples/result.json
          :language: json

With --include-toctree, also tracks:

  1. .. toctree:: - Table of contents entries (navigation links, not transcluded)
    .. toctree::
       :maxdepth: 2
    
       intro
       getting-started

Note: Only file-based references are tracked. Inline content (e.g., .. input:: with :language: but no file path) is not tracked since it doesn't reference external files.

Output Formats:

Text (default):

============================================================
USAGE ANALYSIS
============================================================
Target File: /path/to/includes/intro.rst
Total Files: 3
Total Usages: 4
============================================================

include             : 3 files, 4 usages

  1. [include] duplicate-include-test.rst (2 usages)
  2. [include] include-test.rst
  3. [include] page.rst

Text with --verbose:

============================================================
USAGE ANALYSIS
============================================================
Target File: /path/to/includes/intro.rst
Total Files: 3
Total Usages: 4
============================================================

include             : 3 files, 4 usages

  1. [include] duplicate-include-test.rst (2 usages)
     Line 6: /includes/intro.rst
     Line 13: /includes/intro.rst
  2. [include] include-test.rst
     Line 6: /includes/intro.rst
  3. [include] page.rst
     Line 12: /includes/intro.rst

JSON (--format json):

{
  "target_file": "/path/to/includes/intro.rst",
  "source_dir": "/path/to/source",
  "total_files": 3,
  "total_usages": 4,
  "using_files": [
    {
      "file_path": "/path/to/duplicate-include-test.rst",
      "directive_type": "include",
      "usage_path": "/includes/intro.rst",
      "line_number": 6
    },
    {
      "file_path": "/path/to/duplicate-include-test.rst",
      "directive_type": "include",
      "usage_path": "/includes/intro.rst",
      "line_number": 13
    },
    {
      "file_path": "/path/to/include-test.rst",
      "directive_type": "include",
      "usage_path": "/includes/intro.rst",
      "line_number": 6
    }
  ]
}

Examples:

# Check if an include file is being used
./audit-cli analyze usage ~/docs/source/includes/fact-atlas.rst

# Find all pages that use a specific code example
./audit-cli analyze usage ~/docs/source/code-examples/connect.py

# Get machine-readable output for scripting
./audit-cli analyze usage ~/docs/source/includes/fact.rst --format json | jq '.total_usages'

# See exactly where a file is referenced (with line numbers)
./audit-cli analyze usage ~/docs/source/includes/intro.rst --verbose

# Quick check: just show the count
./audit-cli analyze usage ~/docs/source/includes/fact.rst --count-only
# Output: 5

# Show summary statistics only
./audit-cli analyze usage ~/docs/source/includes/fact.rst --summary
# Output:
# Total Files: 3
# Total Usages: 5
#
# By Type:
#   include             : 3 files, 5 usages

# Get list of files for piping to other commands
./audit-cli analyze usage ~/docs/source/includes/fact.rst --paths-only
# Output:
# page1.rst
# page2.rst
# page3.rst

# Filter to only show include directives (not literalinclude or io-code-block)
./audit-cli analyze usage ~/docs/source/includes/fact.rst --directive-type include

# Filter to only show literalinclude usages
./audit-cli analyze usage ~/docs/source/code-examples/example.py --directive-type literalinclude

# Combine filters: count only literalinclude usages
./audit-cli analyze usage ~/docs/source/code-examples/example.py -t literalinclude -c

# Combine filters: list files that use this as an io-code-block
./audit-cli analyze usage ~/docs/source/code-examples/query.js -t io-code-block --paths-only

# Exclude archived or deprecated files from search
./audit-cli analyze usage ~/docs/source/includes/fact.rst --exclude "*/archive/*"
./audit-cli analyze usage ~/docs/source/includes/fact.rst --exclude "*/deprecated/*"

analyze procedures

Analyze procedures in reStructuredText files to understand procedure complexity, uniqueness, and how they appear across different selections.

This command parses procedures from RST files and provides statistics about:

  • Total number of unique procedures (grouped by heading and content)
  • Total number of procedure appearances across all selections
  • Implementation types (procedure directive vs ordered list)
  • Step counts for each procedure
  • Detection of sub-procedures (ordered lists within steps)
  • All selections where each procedure appears

Use Cases:

This command helps writers:

  • Understand the complexity of procedures in a document
  • Count how many unique procedures exist vs. how many times they appear
  • Identify procedures that use different implementation approaches
  • See which selections each procedure appears in
  • Plan testing coverage for procedure variations
  • Scope work related to procedure updates

Basic Usage:

# Get summary count of unique procedures and total appearances
./audit-cli analyze procedures path/to/file.rst

# Show summary with incremental reporting flags
./audit-cli analyze procedures path/to/file.rst --list-summary

# List all unique procedures with full details
./audit-cli analyze procedures path/to/file.rst --list-all

# Expand include directives inline before analyzing
./audit-cli analyze procedures path/to/file.rst --expand-includes

Flags:

  • --list-summary - Show summary statistics plus a list of procedure headings
  • --list-all - Show full details for each procedure including steps, selections, and implementation
  • --expand-includes - Expand include directives inline instead of preserving them

Output:

Default output (summary only):

File: path/to/file.rst
Total unique procedures: 36
Total procedure appearances: 93

With --list-summary:

File: path/to/file.rst
Total unique procedures: 36
Total procedure appearances: 93

Unique Procedures:
  1. Before You Begin
  2. Install MongoDB Community Edition
  3. Configuration
  4. Run MongoDB Community Edition
  ...

With --list-all:

File: path/to/file.rst
Total unique procedures: 36
Total procedure appearances: 93

================================================================================
Procedure Details
================================================================================

1. Before You Begin
   Line: 45
   Implementation: procedure-directive
   Steps: 5
   Contains sub-procedures: no
   Appears in 2 selections:
     - docker, None, None, None, None, None, without-search-docker
     - docker, None, None, None, None, None, with-search-docker

   Steps:
     1. Pull the MongoDB Docker Image
     2. Run the MongoDB Docker Container
     3. Verify MongoDB is Running
     4. Connect to MongoDB
     5. Stop the MongoDB Docker Container

2. Install MongoDB Community Edition
   Line: 123
   Implementation: ordered-list
   Steps: 4
   Contains sub-procedures: yes
   Appears in 10 selections:
     - linux, None, None, tarball, None, None, with-search
     - linux, None, None, tarball, None, None, without-search
     ...

   Steps:
     1. Download the tarball
     2. Extract the files from the tarball
     3. Ensure the binaries are in a directory listed in your PATH
     4. Run MongoDB Community Edition

Understanding the Counts:

The command reports two key metrics:

  1. Total unique procedures: Number of distinct procedures (grouped by heading and content hash)

    • Procedures with the same heading but different content are counted separately
    • Procedures with identical content are counted once, even if they appear in multiple selections
  2. Total procedure appearances: Total number of times procedures appear across all selections

    • If a procedure appears in 5 different selections, it contributes 5 to this count
    • This represents the total number of procedure instances a user might encounter

Example:

  • A file might have 36 unique procedures that appear a total of 93 times across different selections
  • This means some procedures appear in multiple selections (e.g., a "Before You Begin" procedure that's the same for Docker with and without search)

Supported Procedure Types:

The command recognizes:

  • .. procedure:: directives with .. step:: directives
  • Ordered lists (numbered or lettered) as procedures
  • .. tabs:: directives with :tabid: options for variations
  • .. composable-tutorial:: directives with .. selected-content:: blocks
  • Sub-procedures (ordered lists within steps)
  • YAML steps files (automatically converted to RST format)

Deterministic Parsing:

The parser ensures deterministic results by:

  • Sorting all map iterations to ensure consistent ordering
  • Sorting procedures by line number
  • Computing content hashes in a consistent manner

This guarantees that the same file always produces the same counts and groupings.
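
For example, because Go randomizes map iteration order, deterministic output requires collecting and sorting keys before iterating, roughly like this:

package main

import (
	"fmt"
	"sort"
)

// printInStableOrder collects and sorts map keys before iterating, because
// Go map iteration order is randomized between runs.
func printInStableOrder(stepCountsByHeading map[string]int) {
	headings := make([]string, 0, len(stepCountsByHeading))
	for heading := range stepCountsByHeading {
		headings = append(headings, heading)
	}
	sort.Strings(headings)
	for _, heading := range headings {
		fmt.Printf("%s: %d steps\n", heading, stepCountsByHeading[heading])
	}
}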

For more details about procedure parsing logic, refer to docs/PROCEDURE_PARSING.md.

analyze composables

Analyze composable definitions in snooty.toml files across the MongoDB documentation monorepo. This command helps identify consolidation opportunities and track composable usage.

Composables are configuration elements in snooty.toml that define content variations for different contexts (e.g., different programming languages, deployment types, or interfaces). They're used in .. composable-tutorial:: directives to create context-specific documentation.

Use Cases:

This command helps writers:

  • Inventory all composables across projects and versions
  • Identify identical composables that could be consolidated across projects
  • Find similar composables with different IDs but overlapping options (potential consolidation candidates)
  • Track where composables are used in RST files
  • Identify unused composables that may be candidates for removal
  • Understand the scope of changes when updating a composable

Basic Usage:

# Analyze all composables in the monorepo
./audit-cli analyze composables /path/to/docs-monorepo

# Use configured monorepo path (from config file or environment variable)
./audit-cli analyze composables

# Analyze composables for a specific project
./audit-cli analyze composables --for-project atlas

# Analyze only current versions
./audit-cli analyze composables --current-only

# Show full option details with titles
./audit-cli analyze composables --verbose

# Find consolidation candidates
./audit-cli analyze composables --find-similar

# Find where composables are used
./audit-cli analyze composables --find-usages

# Include canonical rstspec.toml composables
./audit-cli analyze composables --with-rstspec --find-similar

# Combine flags for comprehensive analysis
./audit-cli analyze composables --for-project atlas --find-similar --find-usages --verbose

Flags:

  • --for-project <project> - Only analyze composables for a specific project
  • --current-only - Only analyze composables in current versions (skips versioned directories)
  • -v, --verbose - Show full option details with titles instead of just IDs
  • --find-similar - Show identical and similar composables for consolidation
  • --find-usages - Show where each composable is used in RST files
  • --with-rstspec - Include composables from the canonical rstspec.toml file in the snooty-parser repository

Output:

Default output (summary and table):

Composables Analysis
====================

Total composables found: 24

Composables by ID:
  - deployment-type: 1
  - interface: 1
  - language: 1
  ...

All Composables
===============

Project              Version         ID                             Title                          Options
------------------------------------------------------------------------------------------------------------------------
atlas                (none)          deployment-type                Deployment Type                atlas, local, self, local-onprem
atlas                (none)          interface                      Interface                      compass, mongosh, atlas-ui, driver
atlas                (none)          language                       Language                       c, csharp, cpp, go, java-async, ...

With --find-similar:

Shows two types of consolidation opportunities:

  1. Identical Composables - Same ID, title, and options across different projects/versions

    Identical Composables (Consolidation Candidates)
    ================================================
    
    ID: connection-mechanism
    Occurrences: 15
    Title: Connection Mechanism
    Default: connection-string
    Options: connection-string, mongocred
    
    Found in:
      - java/current
      - java/v5.1
      - kotlin/current
      ...
    
  2. Similar Composables - Different IDs but similar option sets (60%+ overlap)

    Similar Composables (Review Recommended)
    ========================================
    
    Similar Composables (100.0% similarity)
    Composables: 2
    
    Composables in this group:
    
      1. ID: interface-atlas-only
         Location: atlas
         Title: Interface
         Default: driver
         Options: atlas-ui, driver, mongosh
    
      2. ID: interface-local-only
         Location: atlas
         Title: Interface
         Default: driver
         Options: atlas-ui, driver, mongosh
    

With --find-usages:

Shows where each composable is used in .. composable-tutorial:: directives:

Composable Usages
=================

Composable ID: deployment-type
Total usages: 28

  atlas: 28 usages

Composable ID: interface
Total usages: 35

  atlas: 35 usages

Unused Composables
------------------

  connection-type:
    - atlas

With --verbose and --find-usages:

Shows file paths where each composable is used:

Composable ID: interface-atlas-only
Total usages: 1

  atlas: 1 usages
    - content/atlas/source/atlas-vector-search/tutorials/vector-search-quick-start.txt

Understanding Composables:

Composables are defined in snooty.toml files:

[[composables]]
id = "language"
title = "Language"
default = "nodejs"

[[composables.options]]
id = "python"
title = "Python"

[[composables.options]]
id = "nodejs"
title = "Node.js"

They're used in RST files with .. composable-tutorial:: directives:

.. composable-tutorial::
   :options: language, interface
   :defaults: nodejs, driver

   .. procedure::

      .. step:: Install dependencies

         .. selected-content::
            :selections: language=nodejs

            npm install mongodb

         .. selected-content::
            :selections: language=python

            pip install pymongo
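
As a rough illustration, definitions like the snooty.toml snippet above map naturally onto Go structs. The sketch below assumes the BurntSushi/toml package; the actual parser may work differently:

package main

import (
	"fmt"

	"github.com/BurntSushi/toml"
)

// Composable mirrors a [[composables]] table in snooty.toml.
type Composable struct {
	ID      string   `toml:"id"`
	Title   string   `toml:"title"`
	Default string   `toml:"default"`
	Options []Option `toml:"options"`
}

// Option mirrors a [[composables.options]] table.
type Option struct {
	ID    string `toml:"id"`
	Title string `toml:"title"`
}

type snootyConfig struct {
	Composables []Composable `toml:"composables"`
}

// loadComposables decodes the composable definitions from a snooty.toml file.
func loadComposables(path string) ([]Composable, error) {
	var cfg snootyConfig
	if _, err := toml.DecodeFile(path, &cfg); err != nil {
		return nil, fmt.Errorf("parsing %s: %w", path, err)
	}
	return cfg.Composables, nil
}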

Consolidation Analysis:

The command uses Jaccard similarity (intersection / union) to compare option sets between composables with different IDs. A 60% similarity threshold is used to identify potential consolidation candidates.

For example, if you have:

  • language with 15 options
  • language-atlas-only with 14 options (13 in common with language)
  • language-local-only with 14 options (13 in common with language)

These would be flagged as similar composables (each pair with language has a Jaccard similarity of 13 / 16, roughly 81%) and treated as potential consolidation candidates.
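
A minimal sketch of the Jaccard computation over two option-ID sets (illustrative only):

package main

// jaccardSimilarity returns |A ∩ B| / |A ∪ B| for two lists of option IDs.
// Pairs of composables with different IDs that score at or above the 60%
// threshold are reported as potential consolidation candidates.
func jaccardSimilarity(a, b []string) float64 {
	setA := make(map[string]bool, len(a))
	for _, id := range a {
		setA[id] = true
	}
	setB := make(map[string]bool, len(b))
	for _, id := range b {
		setB[id] = true
	}
	intersection := 0
	for id := range setA {
		if setB[id] {
			intersection++
		}
	}
	union := len(setA) + len(setB) - intersection
	if union == 0 {
		return 0
	}
	return float64(intersection) / float64(union)
}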

Compare Commands

compare file-contents

Compare file contents to identify differences between files. Supports two modes:

  1. Direct comparison - Compare two specific files
  2. Version comparison - Compare the same file across multiple documentation versions

Use Cases:

This command helps writers:

  • Identify content drift across documentation versions
  • Verify that updates have been applied consistently
  • Scope maintenance work when updating shared content
  • Understand how files have diverged over time

Basic Usage:

# Direct comparison of two files
./audit-cli compare file-contents file1.rst file2.rst

# Compare with diff output
./audit-cli compare file-contents file1.rst file2.rst --show-diff

# Version comparison - auto-discovers all versions
./audit-cli compare file-contents \
  /path/to/manual/manual/source/includes/example.rst

# Version comparison - specific versions only
./audit-cli compare file-contents \
  /path/to/manual/manual/source/includes/example.rst \
  --versions manual,upcoming,v8.0,v7.0

# Show which files differ
./audit-cli compare file-contents \
  /path/to/manual/manual/source/includes/example.rst \
  --show-paths

# Show detailed diffs
./audit-cli compare file-contents \
  /path/to/manual/manual/source/includes/example.rst \
  --show-diff

# Verbose output (show processing details and auto-discovered versions)
./audit-cli compare file-contents \
  /path/to/manual/manual/source/includes/example.rst \
  -v

Flags:

  • -V, --versions <list> - Comma-separated list of versions (optional; auto-discovers all versions if not specified)
  • --show-paths - Display file paths grouped by status (matching, differing, not found)
  • -d, --show-diff - Display unified diff output (implies --show-paths)
  • -v, --verbose - Show detailed processing information (including auto-discovered versions and product directory)

Comparison Modes:

1. Direct Comparison (Two Files)

Provide two file paths as arguments:

./audit-cli compare file-contents path/to/file1.rst path/to/file2.rst

This mode:

  • Compares exactly two files
  • Reports whether they are identical or different
  • Can show unified diff with --show-diff

2. Version Comparison (Product Directory)

Provide one file path. The product directory and versions are automatically detected from the file path:

# Auto-discover all versions
./audit-cli compare file-contents \
  /path/to/manual/manual/source/includes/example.rst

# Or specify specific versions
./audit-cli compare file-contents \
  /path/to/manual/manual/source/includes/example.rst \
  --versions manual,upcoming,v8.0

This mode:

  • Automatically detects the product directory from the file path
  • Auto-discovers all available versions (unless --versions is specified)
  • Extracts the relative path from the reference file
  • Resolves the same relative path in each version directory
  • Compares all versions against the reference file
  • Reports matching, differing, and missing files

Version Directory Structure:

The tool expects MongoDB documentation to be organized as:

product-dir/
├── manual/
│   └── source/
│       └── includes/
│           └── example.rst
├── upcoming/
│   └── source/
│       └── includes/
│           └── example.rst
└── v8.0/
    └── source/
        └── includes/
            └── example.rst
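
A simplified sketch of resolving the same relative path across version directories in that layout (the helper name is hypothetical):

package main

import (
	"os"
	"path/filepath"
)

// resolveAcrossVersions joins the reference file's path relative to its
// version directory (e.g. "source/includes/example.rst") onto each version
// directory under the product directory, and reports which versions
// contain that file.
func resolveAcrossVersions(productDir, relPath string, versions []string) map[string]bool {
	found := make(map[string]bool, len(versions))
	for _, version := range versions {
		candidate := filepath.Join(productDir, version, relPath)
		_, err := os.Stat(candidate)
		found[version] = err == nil
	}
	return found
}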

Output Formats:

Summary (default - no flags):

  • Total number of versions compared
  • Count of matching, differing, and missing files
  • Hints to use --show-paths or --show-diff for more details

With --show-paths:

  • Summary (as above)
  • List of files that match (with ✓)
  • List of files that differ (with ✗)
  • List of files not found (with -)

With --show-diff:

  • Summary and paths (as above)
  • Unified diff output for each differing file
  • Shows added lines (prefixed with +)
  • Shows removed lines (prefixed with -)
  • Shows context lines around changes

Examples:

# Check if a file is consistent across all versions (auto-discovered)
./audit-cli compare file-contents \
  ~/workspace/docs-mongodb-internal/content/manual/manual/source/includes/fact-atlas-search.rst

# Find differences and see what changed (all versions)
./audit-cli compare file-contents \
  ~/workspace/docs-mongodb-internal/content/manual/manual/source/includes/fact-atlas-search.rst \
  --show-diff

# Compare across specific versions only
./audit-cli compare file-contents \
  ~/workspace/docs-mongodb-internal/content/manual/manual/source/includes/fact-atlas-search.rst \
  --versions manual,upcoming,v8.0,v7.0,v6.0

# Compare two specific versions of a file directly
./audit-cli compare file-contents \
  ~/workspace/docs-mongodb-internal/content/manual/manual/source/includes/example.rst \
  ~/workspace/docs-mongodb-internal/content/manual/v8.0/source/includes/example.rst \
  --show-diff

Exit Codes:

  • 0 - Success (files compared successfully, regardless of whether they match)
  • 1 - Error (invalid arguments, file not found, read error, etc.)

Note on Missing Files:

Files that don't exist in certain versions are reported separately and do not cause errors. This is expected behavior since features may be added or removed across versions.

Count Commands

count tested-examples

Count tested code examples in the MongoDB documentation monorepo.

This command navigates to the content/code-examples/tested directory from the monorepo root and counts all files recursively. The tested directory has a two-level structure: L1 (language directories) and L2 (product directories).

Use Cases:

This command helps writers and maintainers:

  • Track the total number of tested code examples
  • Monitor code example coverage by product
  • Identify products with few or many examples
  • Count only source files (excluding output files)

Basic Usage:

# Get total count of all tested code examples
./audit-cli count tested-examples /path/to/docs-monorepo

# Use configured monorepo path (from config file or environment variable)
./audit-cli count tested-examples

# Count examples for a specific product
./audit-cli count tested-examples --for-product pymongo

# Show counts broken down by product
./audit-cli count tested-examples --count-by-product

# Count only source files (exclude .txt and .sh output files)
./audit-cli count tested-examples --exclude-output

Flags:

  • --for-product <product> - Only count code examples for a specific product
  • --count-by-product - Display counts for each product
  • --exclude-output - Only count source files (exclude .txt and .sh files)

Current Valid Products:

  • mongosh - MongoDB Shell
  • csharp/driver - C#/.NET Driver
  • go/driver - Go Driver
  • go/atlas-sdk - Atlas Go SDK
  • java/driver-sync - Java Sync Driver
  • javascript/driver - Node.js Driver
  • pymongo - PyMongo Driver

Output:

By default, prints a single integer (total count) for use in CI or scripting. With --count-by-product, displays a formatted table with product names and counts.

count pages

Count documentation pages (.txt files) in the MongoDB documentation monorepo.

This command navigates to the content directory and recursively counts all .txt files, which represent documentation pages that resolve to unique URLs. The command automatically excludes certain directories and file types that don't represent actual documentation pages.

Use Cases:

This command helps writers and maintainers:

  • Track the total number of documentation pages across the monorepo
  • Monitor documentation coverage by product/project
  • Identify projects with extensive or minimal documentation
  • Exclude auto-generated or deprecated content from counts
  • Count only current versions of versioned documentation
  • Compare page counts across different documentation versions

Automatic Exclusions:

The command automatically excludes:

  • Files in code-examples directories at the root of content or source (these contain plain text examples, not pages)
  • Files in the following directories at the root of content:
    • 404 - Error pages
    • docs-platform - Documentation for the MongoDB website and meta content
    • meta - MongoDB Meta Documentation - style guide, tools, etc.
    • table-of-contents - Navigation files
  • All non-.txt files (configuration files, YAML, etc.)
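
The counting walk can be pictured roughly like this sketch, which hard-codes the exclusions as a set; the real command scopes its exclusion rules to specific locations in the tree:

package main

import (
	"io/fs"
	"path/filepath"
	"strings"
)

// countPages walks contentDir and counts .txt files, skipping any directory
// whose name appears in excludeDirs (e.g. "404", "meta", "code-examples",
// plus anything passed via --exclude-dirs).
func countPages(contentDir string, excludeDirs map[string]bool) (int, error) {
	count := 0
	err := filepath.WalkDir(contentDir, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if d.IsDir() && excludeDirs[d.Name()] {
			return filepath.SkipDir // prune excluded subtrees entirely
		}
		if !d.IsDir() && strings.HasSuffix(d.Name(), ".txt") {
			count++
		}
		return nil
	})
	return count, err
}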

Basic Usage:

# Get total count of all documentation pages
./audit-cli count pages /path/to/docs-monorepo

# Use configured monorepo path (from config file or environment variable)
./audit-cli count pages

# Count pages for a specific project
./audit-cli count pages --for-project manual

# Show counts broken down by project
./audit-cli count pages --count-by-project

# Exclude specific directories from counting
./audit-cli count pages --exclude-dirs api-reference,generated

# Count only current versions (for versioned projects)
./audit-cli count pages --current-only

# Show counts by project and version
./audit-cli count pages --by-version

# Combine flags: count pages for a specific project, excluding certain directories
./audit-cli count pages /path/to/docs-monorepo --for-project atlas --exclude-dirs deprecated

Flags:

  • --for-project <project> - Only count pages for a specific project (directory name under content/)
  • --count-by-project - Display counts for each project in a formatted table
  • --exclude-dirs <dirs> - Comma-separated list of directory names to exclude from counting (e.g., deprecated,archive)
  • --current-only - Only count pages in the current version (for versioned projects, counts only current or manual version directories; for non-versioned projects, counts all pages)
  • --by-version - Display counts grouped by project and version (shows version breakdown for versioned projects; non-versioned projects show as "(no version)")

Output:

By default, prints a single integer (total count) for use in CI or scripting. With --count-by-project, displays a formatted table with project names and counts. With --by-version, displays a hierarchical breakdown by project and version.

Versioned Documentation:

Some MongoDB documentation projects contain multiple versions, represented as distinct directories between the project directory and the source directory:

  • Versioned project structure: content/{project}/{version}/source/...
  • Non-versioned project structure: content/{project}/source/...

Version directory names follow these patterns:

  • current or manual - The current/latest version
  • upcoming - Pre-release version
  • v{number} - Specific version (e.g., v8.0, v7.0)

The --current-only flag counts only files in the current version directory (current or manual) for versioned projects, while counting all files for non-versioned projects.

The --by-version flag shows a breakdown of page counts for each version within each project.

Note: The --current-only and --by-version flags are mutually exclusive.

Examples:

# Quick count for CI/CD
TOTAL_PAGES=$(./audit-cli count pages ~/docs-monorepo)
echo "Total documentation pages: $TOTAL_PAGES"

# Detailed breakdown by project
./audit-cli count pages ~/docs-monorepo --count-by-project
# Output:
# Page Counts by Project:
#
#   app-services                       245
#   atlas                              512
#   manual                            1024
#   ...
#
# Total: 2891

# Count only Atlas pages
./audit-cli count pages ~/docs-monorepo --for-project atlas
# Output: 512

# Exclude deprecated content
./audit-cli count pages ~/docs-monorepo --exclude-dirs deprecated,archive --count-by-project

# Count only current versions
./audit-cli count pages ~/docs-monorepo --current-only
# Output: 1245 (only counts current/manual versions)

# Show breakdown by version
./audit-cli count pages ~/docs-monorepo --by-version
# Output:
# Project: drivers
#   manual                           150
#   upcoming                         145
#   v8.0                             140
#   v7.0                             135
#
# Project: atlas
#   (no version)                     200
#
# Total: 770

# Count current version for a specific project
./audit-cli count pages ~/docs-monorepo --for-project drivers --current-only
# Output: 150

Development

Project Structure

audit-cli/
├── main.go                                  # CLI entry point
├── commands/                                # Command implementations
│   ├── extract/                             # Extract parent command
│   │   ├── extract.go                       # Parent command definition
│   │   ├── code-examples/                   # Code examples subcommand
│   │   │   ├── code_examples.go             # Command logic
│   │   │   ├── code_examples_test.go        # Tests
│   │   │   ├── parser.go                    # RST directive parsing
│   │   │   ├── writer.go                    # File writing logic
│   │   │   ├── report.go                    # Report generation
│   │   │   ├── types.go                     # Type definitions
│   │   │   └── language.go                  # Language normalization
│   │   └── procedures/                      # Procedures extraction subcommand
│   │       ├── procedures.go                # Command logic
│   │       ├── procedures_test.go           # Tests
│   │       ├── parser.go                    # Filename generation and filtering
│   │       ├── writer.go                    # RST file writing
│   │       └── types.go                     # Type definitions
│   ├── search/                              # Search parent command
│   │   ├── search.go                        # Parent command definition
│   │   └── find-string/                     # Find string subcommand
│   │       ├── find_string.go               # Command logic
│   │       ├── types.go                     # Type definitions
│   │       └── report.go                    # Report generation
│   ├── analyze/                             # Analyze parent command
│   │   ├── analyze.go                       # Parent command definition
│   │   ├── composables/                     # Composables analysis subcommand
│   │   │   ├── composables.go               # Command logic
│   │   │   ├── composables_test.go          # Tests
│   │   │   ├── analyzer.go                  # Composable analysis logic
│   │   │   ├── parser.go                    # Snooty.toml parsing
│   │   │   ├── rstspec_adapter.go           # Rstspec.toml adapter
│   │   │   ├── rstspec_adapter_test.go      # Rstspec adapter tests
│   │   │   ├── usage_finder.go              # Usage finding logic
│   │   │   ├── output.go                    # Output formatting
│   │   │   └── types.go                     # Type definitions
│   │   ├── includes/                        # Includes analysis subcommand
│   │   │   ├── includes.go                  # Command logic
│   │   │   ├── analyzer.go                  # Include tree building
│   │   │   ├── output.go                    # Output formatting
│   │   │   └── types.go                     # Type definitions
│   │   ├── procedures/                      # Procedures analysis subcommand
│   │   │   ├── procedures.go                # Command logic
│   │   │   ├── procedures_test.go           # Tests
│   │   │   ├── analyzer.go                  # Procedure analysis logic
│   │   │   ├── output.go                    # Output formatting
│   │   │   └── types.go                     # Type definitions
│   │   └── usage/                           # Usage analysis subcommand
│   │       ├── usage.go                     # Command logic
│   │       ├── usage_test.go                # Tests
│   │       ├── analyzer.go                  # Reference finding logic
│   │       ├── output.go                    # Output formatting
│   │       └── types.go                     # Type definitions
│   ├── compare/                             # Compare parent command
│   │   ├── compare.go                       # Parent command definition
│   │   └── file-contents/                   # File contents comparison subcommand
│   │       ├── file_contents.go             # Command logic
│   │       ├── file_contents_test.go        # Tests
│   │       ├── comparer.go                  # Comparison logic
│   │       ├── differ.go                    # Diff generation
│   │       ├── output.go                    # Output formatting
│   │       ├── types.go                     # Type definitions
│   │       └── version_resolver.go          # Version path resolution
│   └── count/                               # Count parent command
│       ├── count.go                         # Parent command definition
│       ├── tested-examples/                 # Tested examples counting subcommand
│       │   ├── tested_examples.go           # Command logic
│       │   ├── tested_examples_test.go      # Tests
│       │   ├── counter.go                   # Counting logic
│       │   ├── output.go                    # Output formatting
│       │   └── types.go                     # Type definitions
│       └── pages/                           # Pages counting subcommand
│           ├── pages.go                     # Command logic
│           ├── pages_test.go                # Tests
│           ├── counter.go                   # Counting logic
│           ├── output.go                    # Output formatting
│           └── types.go                     # Type definitions
├── internal/                                # Internal packages
│   ├── config/                              # Configuration management
│   │   ├── config.go                        # Config loading and path resolution
│   │   └── config_test.go                   # Config tests
│   ├── projectinfo/                         # Project structure and info utilities
│   │   ├── pathresolver.go                  # Core path resolution
│   │   ├── pathresolver_test.go             # Tests
│   │   ├── source_finder.go                 # Source directory detection
│   │   ├── version_resolver.go              # Version path resolution
│   │   └── types.go                         # Type definitions
│   └── rst/                                 # RST parsing utilities
│       ├── parser.go                        # Generic parsing with includes
│       ├── include_resolver.go              # Include directive resolution
│       ├── directive_parser.go              # Directive parsing
│       ├── directive_regex.go               # Directive regex patterns
│       ├── parse_procedures.go              # Procedure parsing (core logic)
│       ├── parse_procedures_test.go         # Procedure parsing tests
│       ├── get_procedure_variations.go      # Variation extraction logic
│       ├── get_procedure_variations_test.go # Variation tests
│       ├── procedure_types.go               # Procedure type definitions
│       ├── rstspec.go                       # Rstspec.toml fetching and parsing
│       ├── rstspec_test.go                  # Rstspec tests
│       └── file_utils.go                    # File utilities
└── testdata/                                # Test fixtures
    ├── input-files/                         # Test RST files
    │   └── source/                          # Source directory (required)
    │       ├── *.rst                        # Test files
    │       ├── includes/                    # Included RST files
    │       └── code-examples/               # Code files for literalinclude
    ├── expected-output/                     # Expected extraction results
    ├── composables-test/                    # Composables analysis test data
    │   └── content/                         # Test monorepo structure
    ├── compare/                             # Compare command test data
    │   ├── product/                         # Version structure tests
    │   │   ├── manual/                      # Manual version
    │   │   ├── upcoming/                    # Upcoming version
    │   │   └── v8.0/                        # v8.0 version
    │   └── *.txt                            # Direct comparison tests
    ├── count-test-monorepo/                 # Count command test data
    │   └── content/code-examples/tested/    # Tested examples structure
    └── search-test-files/                   # Search command test data

Adding New Commands

1. Adding a New Subcommand to an Existing Parent

Example: Adding extract tables subcommand

  1. Create the subcommand directory:

    mkdir -p commands/extract/tables
  2. Create the command file (commands/extract/tables/tables.go):

    package tables
    
    import (
        "github.com/spf13/cobra"
    )
    
    func NewTablesCommand() *cobra.Command {
        cmd := &cobra.Command{
            Use:   "tables [filepath]",
            Short: "Extract tables from RST files",
            Args:  cobra.ExactArgs(1),
            RunE: func(cmd *cobra.Command, args []string) error {
                // Implementation here
                return nil
            },
        }
    
        // Add flags
        cmd.Flags().StringP("output", "o", "./output", "Output directory")
    
        return cmd
    }
  3. Register the subcommand in commands/extract/extract.go:

    import (
        "github.com/grove-platform/audit-cli/commands/extract/tables"
    )
    
    func NewExtractCommand() *cobra.Command {
        cmd := &cobra.Command{...}
    
        cmd.AddCommand(codeexamples.NewCodeExamplesCommand())
        cmd.AddCommand(tables.NewTablesCommand())  // Add this line
    
        return cmd
    }

2. Adding a New Parent Command

Example: Adding analyze parent command

  1. Create the parent directory:

    mkdir -p commands/analyze
  2. Create the parent command (commands/analyze/analyze.go):

    package analyze
    
    import (
        "github.com/spf13/cobra"
    )
    
    func NewAnalyzeCommand() *cobra.Command {
        cmd := &cobra.Command{
            Use:   "analyze",
            Short: "Analyze extracted content",
        }
    
        // Add subcommands here
    
        return cmd
    }
  3. Register in main.go:

    import (
        "github.com/grove-platform/audit-cli/commands/analyze"
    )
    
    func main() {
        rootCmd.AddCommand(extract.NewExtractCommand())
        rootCmd.AddCommand(search.NewSearchCommand())
        rootCmd.AddCommand(analyze.NewAnalyzeCommand())  // Add this line
    }

Testing

Running Tests

# Run all tests
cd audit-cli
go test ./...

# Run tests for a specific package
go test ./commands/extract/code-examples -v

# Run a specific test
go test ./commands/extract/code-examples -run TestRecursiveDirectoryScanning -v

# Run tests with coverage
go test ./... -cover

Test Structure

Tests use a table-driven approach with test fixtures in the testdata/ directory:

  • Input files: testdata/input-files/source/ - RST files and referenced code
  • Expected output: testdata/expected-output/ - Expected extracted files
  • Test pattern: Compare actual extraction output against expected files

Note: The testdata directory name is special in Go - it's automatically ignored during builds, which is important since it contains non-Go files (.cpp, .rst, etc.).
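
A minimal sketch of the comparison step in that pattern; compareFiles is a hypothetical helper shown for illustration, not a function in this codebase:

import (
    "bytes"
    "os"
    "testing"
)

// compareFiles fails the test unless the two files have byte-for-byte
// identical content.
func compareFiles(t *testing.T, actualPath, expectedPath string) {
    t.Helper()

    actual, err := os.ReadFile(actualPath)
    if err != nil {
        t.Fatalf("failed to read actual output %s: %v", actualPath, err)
    }

    expected, err := os.ReadFile(expectedPath)
    if err != nil {
        t.Fatalf("failed to read expected output %s: %v", expectedPath, err)
    }

    if !bytes.Equal(actual, expected) {
        t.Errorf("content mismatch for %s", actualPath)
    }
}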

Adding New Tests

  1. Create test input files in testdata/input-files/source/:

    # Create a new test RST file
    cat > testdata/input-files/source/my-test.rst << 'EOF'
    .. code-block:: javascript
    
       console.log("Hello, World!");
    EOF
  2. Generate expected output:

    ./audit-cli extract code-examples testdata/input-files/source/my-test.rst \
      -o testdata/expected-output
  3. Verify the output is correct before committing

  4. Add test case in the appropriate *_test.go file:

    func TestMyNewFeature(t *testing.T) {
        testDataDir := filepath.Join("..", "..", "..", "testdata")
        inputFile := filepath.Join(testDataDir, "input-files", "source", "my-test.rst")
        expectedDir := filepath.Join(testDataDir, "expected-output")
    
        tempDir, err := os.MkdirTemp("", "test-*")
        if err != nil {
            t.Fatalf("Failed to create temp directory: %v", err)
        }
        defer os.RemoveAll(tempDir)
    
        report, err := RunExtract(inputFile, tempDir, false, false, false, false)
        if err != nil {
            t.Fatalf("RunExtract failed: %v", err)
        }
    
        // Add assertions here
    }

Test Conventions

  • Relative paths: Tests use filepath.Join("..", "..", "..", "testdata") to reference test data (three levels up from commands/extract/code-examples/)
  • Temporary directories: Use os.MkdirTemp() for test output, clean up with defer os.RemoveAll()
  • Exact content matching: Tests compare byte-for-byte content
  • No trailing newlines: Expected output files should not have trailing blank lines

Updating Expected Output

If you've changed the parsing logic and need to regenerate expected output:

cd audit-cli

# Update all expected outputs
./audit-cli extract code-examples testdata/input-files/source/literalinclude-test.rst \
  -o testdata/expected-output

./audit-cli extract code-examples testdata/input-files/source/code-block-test.rst \
  -o testdata/expected-output

./audit-cli extract code-examples testdata/input-files/source/nested-code-block-test.rst \
  -o testdata/expected-output

./audit-cli extract code-examples testdata/input-files/source/io-code-block-test.rst \
  -o testdata/expected-output

./audit-cli extract code-examples testdata/input-files/source/include-test.rst \
  -o testdata/expected-output -f

Important: Always verify the new output is correct before committing!

Code Patterns

1. Command Structure Pattern

All commands follow this pattern:

package mycommand

import "github.com/spf13/cobra"

func NewMyCommand() *cobra.Command {
    var flagVar string

    cmd := &cobra.Command{
        Use:   "my-command [args]",
        Short: "Brief description",
        Long:  "Detailed description",
        Args:  cobra.ExactArgs(1),  // Or MinimumNArgs, etc.
        RunE: func(cmd *cobra.Command, args []string) error {
            // flagVar is already bound by StringVarP below, so use it directly

            // Call the main logic function
            return RunMyCommand(args[0], flagVar)
        },
    }

    // Define flags
    cmd.Flags().StringVarP(&flagVar, "flag-name", "f", "default", "Description")

    return cmd
}

// Separate logic function for testability
func RunMyCommand(arg string, flagValue string) error {
    // Implementation here
    return nil
}

Why this pattern?

  • Separates command definition from logic
  • Makes logic testable without Cobra
  • Consistent across all commands

2. Error Handling Pattern

Use descriptive error wrapping:

import "fmt"

// Wrap errors with context
file, err := os.Open(filePath)
if err != nil {
    return fmt.Errorf("failed to open file %s: %w", filePath, err)
}

// Check for specific conditions
if !fileInfo.IsDir() {
    return fmt.Errorf("path %s is not a directory", path)
}

3. File Processing Pattern

Use the scanner pattern for line-by-line processing:

import (
    "bufio"
    "fmt"
    "os"
)

func processFile(filePath string) error {
    file, err := os.Open(filePath)
    if err != nil {
        return fmt.Errorf("failed to open file: %w", err)
    }
    defer file.Close()

    scanner := bufio.NewScanner(file)
    lineNum := 0

    for scanner.Scan() {
        lineNum++
        line := scanner.Text()

        // Process line
    }

    if err := scanner.Err(); err != nil {
        return fmt.Errorf("error reading file: %w", err)
    }

    return nil
}

4. Directory Traversal Pattern

Use filepath.Walk for recursive traversal:

import (
    "os"
    "path/filepath"
)

func traverseDirectory(rootPath string, recursive bool) ([]string, error) {
    var files []string

    err := filepath.Walk(rootPath, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }

        // Skip subdirectories if not recursive
        if !recursive && info.IsDir() && path != rootPath {
            return filepath.SkipDir
        }

        // Collect files
        if !info.IsDir() {
            files = append(files, path)
        }

        return nil
    })

    return files, err
}

Path Resolution for File-Based Commands:

Commands that accept file paths should use config.ResolveFilePath() to support flexible path resolution:

import "github.com/grove-platform/audit-cli/internal/config"

RunE: func(cmd *cobra.Command, args []string) error {
    // Resolve file path (supports absolute, monorepo-relative, or cwd-relative)
    filePath, err := config.ResolveFilePath(args[0])
    if err != nil {
        return err
    }

    // Use the resolved absolute path
    return processFile(filePath)
}

This allows users to specify paths as:

  • Absolute: /full/path/to/file.rst
  • Monorepo-relative: manual/manual/source/file.rst (if monorepo configured)
  • Current directory-relative: ./file.rst

5. Testing Pattern

Use table-driven tests where appropriate:

func TestLanguageNormalization(t *testing.T) {
    tests := []struct {
        name     string
        input    string
        expected string
    }{
        {"TypeScript", "ts", "typescript"},
        {"C++", "c++", "cpp"},
        {"Golang", "golang", "go"},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            result := NormalizeLanguage(tt.input)
            if result != tt.expected {
                t.Errorf("NormalizeLanguage(%q) = %q, want %q",
                    tt.input, result, tt.expected)
            }
        })
    }
}

6. Verbose Output Pattern

Use a consistent pattern for verbose logging:

func processWithVerbose(filePath string, verbose bool) error {
    if verbose {
        fmt.Printf("Processing: %s\n", filePath)
    }

    // Do work

    if verbose {
        fmt.Printf("Completed: %s\n", filePath)
    }

    return nil
}

Supported RST Directives

Code Example Extraction

The tool extracts code examples from the following reStructuredText directives:

1. literalinclude

Extracts code from external files with support for partial extraction and dedenting.

Syntax:

.. literalinclude:: /path/to/file.py
   :language: python
   :start-after: start-tag
   :end-before: end-tag
   :dedent:

Supported Options:

  • :language: - Specifies the programming language (normalized: ts → typescript, c++ → cpp, golang → go)
  • :start-after: - Extract content after this tag (skips the entire line containing the tag)
  • :end-before: - Extract content before this tag (cuts before the entire line containing the tag)
  • :dedent: - Remove common leading whitespace from the extracted content

Example:

Given code-examples/example.py:

def main():
    # start-example
    result = calculate(42)
    print(result)
    # end-example

And RST:

.. literalinclude:: /code-examples/example.py
   :language: python
   :start-after: start-example
   :end-before: end-example
   :dedent:

Extracts:

result = calculate(42)
print(result)
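
Conceptually, the tag-based extraction keeps only the lines after the line containing the start tag and before the line containing the end tag. A minimal sketch of that behavior; extractBetweenTags is an illustrative helper, not the tool's actual implementation:

import "strings"

// extractBetweenTags keeps only the lines after the line containing startTag
// and before the line containing endTag; both marker lines are skipped,
// matching the :start-after: / :end-before: semantics described above.
func extractBetweenTags(content, startTag, endTag string) string {
    var result []string
    collecting := startTag == ""

    for _, line := range strings.Split(content, "\n") {
        if !collecting {
            if strings.Contains(line, startTag) {
                collecting = true // start after the line that holds the tag
            }
            continue
        }
        if endTag != "" && strings.Contains(line, endTag) {
            break // cut before the line that holds the tag
        }
        result = append(result, line)
    }
    return strings.Join(result, "\n")
}

Applied to the example above, extractBetweenTags(content, "start-example", "end-example") returns the result/print lines still indented; the :dedent: option then removes the common leading whitespace.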

2. code-block

Inline code blocks with automatic dedenting based on the first line's indentation.

Syntax:

.. code-block:: javascript
   :copyable: false
   :emphasize-lines: 2,3

   const greeting = "Hello, World!";
   console.log(greeting);

Supported Options:

  • Language argument - .. code-block:: javascript (optional, defaults to txt)
  • :language: - Alternative way to specify language
  • :copyable: - Parsed but not used for extraction
  • :emphasize-lines: - Parsed but not used for extraction

Automatic Dedenting:

The content is automatically dedented based on the indentation of the first content line. For example:

.. note::

   .. code-block:: python

      def hello():
          print("Hello")

The code has 6 spaces of indentation (3 from note, 3 from code-block). The tool automatically removes these 6 spaces, resulting in:

def hello():
    print("Hello")

3. io-code-block

Input/output code blocks for interactive examples with nested sub-directives.

Syntax:

.. io-code-block::
   :copyable: true

   .. input::
      :language: javascript

      db.restaurants.aggregate([
         { $match: { category: "cafe" } }
      ])

   .. output::
      :language: json

      [
         { _id: 1, category: 'café', status: 'Open' }
      ]

Supported Options:

  • :copyable: - Parsed but not used for extraction
  • Nested .. input:: sub-directive (required)
    • Can have filepath argument: .. input:: /path/to/file.js
    • Or inline content with :language: option
  • Nested .. output:: sub-directive (optional)
    • Can have filepath argument: .. output:: /path/to/output.txt
    • Or inline content with :language: option

File-based Content:

.. io-code-block::

   .. input:: /code-examples/query.js
      :language: javascript

   .. output:: /code-examples/result.json
      :language: json

Output Files:

Generates two files:

  • {source}.io-code-block.{index}.input.{ext} - The input code
  • {source}.io-code-block.{index}.output.{ext} - The output (if present)

Example: my-doc.io-code-block.1.input.js and my-doc.io-code-block.1.output.json
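
A sketch of how names following this pattern could be assembled; buildIoCodeBlockName is a hypothetical helper:

import "fmt"

// buildIoCodeBlockName assembles a name such as my-doc.io-code-block.1.input.js.
// role is "input" or "output"; ext is the normalized extension without the dot.
func buildIoCodeBlockName(source string, index int, role, ext string) string {
    return fmt.Sprintf("%s.io-code-block.%d.%s.%s", source, index, role, ext)
}

For example, buildIoCodeBlockName("my-doc", 1, "input", "js") returns my-doc.io-code-block.1.input.js.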

Include handling

4. include

Follows include directives to process entire documentation trees (when -f flag is used).

Syntax:

.. include:: /includes/intro.rst

Special MongoDB Conventions:

The tool handles several MongoDB-specific include patterns:

Steps Files

Converts directory-based paths to filename-based paths:

  • Input: /includes/steps/run-mongodb-on-linux.rst
  • Resolves to: /includes/steps-run-mongodb-on-linux.yaml

Extracts and Release Files

Resolves ref-based includes by searching YAML files:

  • Input: /includes/extracts/install-mongodb.rst
  • Searches: /includes/extracts-*.yaml for ref: install-mongodb
  • Resolves to: The YAML file containing that ref

Template Variables

Resolves template variables from YAML replacement sections:

replacement:
  release_specification_default: "/includes/release/install-windows-default.rst"

  • Input: {{release_specification_default}}
  • Resolves to: /includes/release/install-windows-default.rst
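
A rough sketch of the first of these conventions, the steps-file path rewrite; resolveStepsInclude is illustrative, not the tool's actual resolver:

import (
    "path"
    "strings"
)

// resolveStepsInclude rewrites a directory-based steps include such as
// /includes/steps/run-mongodb-on-linux.rst into the filename-based YAML
// path /includes/steps-run-mongodb-on-linux.yaml.
func resolveStepsInclude(includePath string) string {
    const prefix = "/includes/steps/"
    if !strings.HasPrefix(includePath, prefix) {
        return includePath
    }

    name := strings.TrimSuffix(path.Base(includePath), path.Ext(includePath))
    return "/includes/steps-" + name + ".yaml"
}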

Source Directory Resolution:

The tool walks up the directory tree to find a directory named "source" or containing a "source" subdirectory. This is used as the base for resolving relative include paths.
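
A minimal sketch of that walk-up, using a hypothetical findSourceDir helper rather than the tool's actual code:

import (
    "fmt"
    "os"
    "path/filepath"
)

// findSourceDir walks up from startDir until it reaches a directory named
// "source" or one that contains a "source" subdirectory.
func findSourceDir(startDir string) (string, error) {
    dir := startDir
    for {
        if filepath.Base(dir) == "source" {
            return dir, nil
        }

        candidate := filepath.Join(dir, "source")
        if info, err := os.Stat(candidate); err == nil && info.IsDir() {
            return candidate, nil
        }

        parent := filepath.Dir(dir)
        if parent == dir { // reached the filesystem root
            return "", fmt.Errorf("no source directory found above %s", startDir)
        }
        dir = parent
    }
}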

Internal Packages

internal/config

Provides configuration management for the CLI tool:

  • Config file loading - Loads .audit-cli.yaml from current or home directory
  • Environment variable support - Reads AUDIT_CLI_MONOREPO_PATH environment variable
  • Monorepo path resolution - Resolves monorepo path with priority: CLI arg > env var > config file
  • File path resolution - Resolves file paths as absolute, monorepo-relative, or cwd-relative

Key Functions:

  • LoadConfig() - Loads configuration from file or environment
  • GetMonorepoPath(cmdLineArg string) - Resolves monorepo path with priority order
  • ResolveFilePath(pathArg string) - Resolves file paths with flexible resolution

Priority Order for Monorepo Path:

  1. Command-line argument (highest priority)
  2. Environment variable AUDIT_CLI_MONOREPO_PATH
  3. Config file .audit-cli.yaml (lowest priority)

Priority Order for File Paths:

  1. Absolute path (used as-is)
  2. Relative to monorepo root (if monorepo configured and file exists there)
  3. Relative to current directory (fallback)

See the code in internal/config/ for implementation details.
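
A minimal usage sketch for the monorepo path, in the same style as the command pattern above; runAnalysis is a placeholder, and the (string, error) return of GetMonorepoPath is an assumption:

import "github.com/grove-platform/audit-cli/internal/config"

RunE: func(cmd *cobra.Command, args []string) error {
    // The monorepo path argument is optional on the command line; pass
    // whatever was given so the CLI arg > env var > config file priority applies.
    var cmdLineArg string
    if len(args) > 0 {
        cmdLineArg = args[0]
    }

    monorepoPath, err := config.GetMonorepoPath(cmdLineArg) // assumed to return (string, error)
    if err != nil {
        return err
    }

    return runAnalysis(monorepoPath) // placeholder for the command's own logic
}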

internal/projectinfo

Provides centralized utilities for understanding MongoDB documentation project structure:

  • Source directory detection - Finds the documentation root by walking up the directory tree
  • Project info detection - Identifies product directory, version, and whether a project is versioned
  • Version discovery - Automatically discovers all available versions in a product directory
  • Version path resolution - Resolves file paths across multiple documentation versions
  • Relative path resolution - Resolves paths relative to the source directory

Key Functions:

  • FindSourceDirectory(filePath string) - Finds the source directory for a given file
  • DetectProjectInfo(filePath string) - Detects project structure information
  • DiscoverAllVersions(productDir string) - Discovers all available versions in a product
  • ResolveVersionPaths(referenceFile, productDir string, versions []string) - Resolves paths across versions
  • ResolveRelativeToSource(sourceDir, relativePath string) - Resolves relative paths

See the code in internal/projectinfo/ for implementation details.
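
A rough usage sketch; listVersionsFor is a hypothetical helper, and the assumption that FindSourceDirectory returns (string, error) and DiscoverAllVersions returns ([]string, error) is ours, not taken from the code:

import (
    "fmt"

    "github.com/grove-platform/audit-cli/internal/projectinfo"
)

// listVersionsFor locates the documentation root for a file, then lists
// the versions available in its product directory.
func listVersionsFor(filePath, productDir string) error {
    sourceDir, err := projectinfo.FindSourceDirectory(filePath) // assumed (string, error)
    if err != nil {
        return err
    }

    versions, err := projectinfo.DiscoverAllVersions(productDir) // assumed ([]string, error)
    if err != nil {
        return err
    }

    fmt.Printf("source: %s, versions: %v\n", sourceDir, versions)
    return nil
}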

internal/rst

Provides reusable utilities for parsing and processing RST files:

  • Include resolution - Handles all include directive patterns
  • Directory traversal - Recursive file scanning
  • Directive parsing - Extracts structured data from RST directives
  • Procedure parsing - Parses procedure directives, ordered lists, and variations
  • Procedure variations - Extracts variations from composable tutorials and tabs
  • Rstspec.toml fetching - Fetches and parses canonical composable definitions from snooty-parser
  • Template variable resolution - Resolves YAML-based template variables
  • Source directory detection - Finds the documentation root

Key Functions:

  • ParseFileWithIncludes(filePath string) - Parses RST file with include expansion
  • ParseDirectives(content string) - Extracts directive information from RST content
  • ParseProcedures(filePath string, expandIncludes bool) - Parses procedures from RST file
  • GetProcedureVariations(filePath string) - Extracts procedure variations
  • FetchRstspec() - Fetches and parses canonical rstspec.toml from snooty-parser repository

Rstspec.toml Support: The FetchRstspec() function retrieves the canonical composable definitions from the snooty-parser repository. This provides:

  • Standard composable IDs (e.g., interface, language, deployment-type)
  • Composable titles and descriptions
  • Default values for each composable
  • Available options for each composable

This is used by the analyze composables command to show canonical definitions alongside project-specific ones.

See the code in internal/rst/ for implementation details.
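
A rough usage sketch; countDirectives is a hypothetical helper, and the return shapes of ParseFileWithIncludes and ParseDirectives are assumptions, not taken from the code:

import (
    "fmt"

    "github.com/grove-platform/audit-cli/internal/rst"
)

// countDirectives expands includes, then counts the directives in the result.
func countDirectives(filePath string) error {
    content, err := rst.ParseFileWithIncludes(filePath) // assumed to return (string, error)
    if err != nil {
        return err
    }

    directives, err := rst.ParseDirectives(content) // assumed to return a slice plus an error
    if err != nil {
        return err
    }

    fmt.Printf("found %d directives in %s\n", len(directives), filePath)
    return nil
}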

Language Normalization

The tool normalizes language identifiers to standard file extensions:

Input            Normalized    Extension
bash             bash          .sh
c                c             .c
c++              cpp           .cpp
c#               csharp        .cs
console          console       .sh
cpp              cpp           .cpp
cs               csharp        .cs
csharp           csharp        .cs
go               go            .go
golang           go            .go
java             java          .java
javascript       javascript    .js
js               javascript    .js
kotlin           kotlin        .kt
kt               kotlin        .kt
php              php           .php
powershell       powershell    .ps1
ps1              powershell    .ps1
ps5              ps5           .ps1
py               python        .py
python           python        .py
rb               ruby          .rb
rs               rust          .rs
ruby             ruby          .rb
rust             rust          .rs
scala            scala         .scala
sh               shell         .sh
shell            shell         .sh
swift            swift         .swift
text             text          .txt
ts               typescript    .ts
txt              text          .txt
typescript       typescript    .ts
(empty string)   undefined     .txt
none             undefined     .txt
(unknown)        (unchanged)   .txt

Notes:

  • Language identifiers are case-insensitive
  • Unknown languages are returned unchanged by NormalizeLanguage() but map to .txt extension
  • The normalization handles common aliases (e.g., ts → typescript, golang → go, c++ → cpp)
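
A minimal sketch of this kind of alias-and-extension mapping, with only a few representative entries from the table above; it is illustrative and not the actual NormalizeLanguage implementation:

import "strings"

// A few representative aliases and extensions from the table above.
var aliases = map[string]string{
    "ts": "typescript", "c++": "cpp", "golang": "go", "js": "javascript",
}

var extensions = map[string]string{
    "typescript": ".ts", "cpp": ".cpp", "go": ".go", "javascript": ".js",
}

// normalize lower-cases the identifier, resolves known aliases, and falls
// back to the unchanged name with a .txt extension.
func normalize(lang string) (name, ext string) {
    name = strings.ToLower(strings.TrimSpace(lang))
    if canonical, ok := aliases[name]; ok {
        name = canonical
    }
    if e, ok := extensions[name]; ok {
        return name, e
    }
    if name == "" || name == "none" {
        return "undefined", ".txt"
    }
    return name, ".txt"
}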

Contributing

When contributing to this project:

  1. Follow the established patterns - Use the command structure, error handling, and testing patterns described above
  2. Write tests - All new functionality should have corresponding tests
  3. Update documentation - Keep this README up to date with new features
  4. Run tests before committing - Ensure go test ./... passes
  5. Use meaningful commit messages - Describe what changed and why
