Skip to content

Conversation

@cardmagic
Copy link
Owner

Summary

  • Force UTF-8 encoding on HTTP response body in CLI to fix encoding errors in non-UTF8 locales
  • Add prominent CLI documentation to README with remote model examples

Problem

When downloading remote models in environments with C/POSIX locales (like Homebrew's test sandbox), the HTTP response body defaults to ASCII-8BIT encoding, causing:

Error: "\xC2" from ASCII-8BIT to UTF-8

Solution

Call force_encoding(Encoding::UTF_8) on the response body in fetch_github_file.

README Updates

Added a new "Command Line" section after Installation showing:

  • Pre-trained model usage with -r flag
  • Training custom models from files
  • Listing available models

Also adds prominent CLI documentation to README with examples of
using pre-trained remote models.

Bump version to 2.3.2
@cardmagic cardmagic merged commit 14c5d85 into master Jan 1, 2026
6 checks passed
@cardmagic cardmagic deleted the fix-http-response-encoding-and-cli-docs branch January 1, 2026 18:39
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 1, 2026

Greptile Summary

This PR adds a targeted encoding fix and improves CLI documentation. The changes build on PR #130's global encoding defaults by adding an explicit force_encoding(UTF-8) call on HTTP response bodies in lib/classifier/cli.rb:877. This defense-in-depth approach prevents encoding errors when downloading remote models in C/POSIX locale environments.

The README additions provide clear CLI examples with pre-trained models (-r flag) and custom training workflows, improving discoverability for command-line users.

Key changes:

  • Added force_encoding(UTF-8) to HTTP response body in fetch_github_file method
  • Added "Command Line" section to README with practical examples
  • Added Homebrew installation instructions
  • Version bump to 2.3.2

Confidence Score: 5/5

  • safe to merge - targeted bug fix with improved documentation
  • single-line encoding fix addresses a specific bug, README additions are documentation-only with no runtime impact, version bump follows semantic versioning, and one minor style suggestion doesn't affect correctness
  • No files require special attention

Important Files Changed

Filename Overview
lib/classifier/cli.rb added UTF-8 encoding enforcement on HTTP response body to prevent ASCII-8BIT encoding errors
README.md added CLI documentation section with pre-trained model examples and Homebrew installation instructions

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI
    participant GitHub
    
    User->>CLI: classify with -r flag
    CLI->>CLI: fetch_github_file(registry, file_path)
    CLI->>GitHub: GET /registry/main/file_path
    alt Success
        GitHub-->>CLI: HTTP 200 + body (ASCII-8BIT)
        CLI->>CLI: force_encoding(UTF-8)
        CLI-->>User: classification result
    else Main branch fails
        GitHub-->>CLI: HTTP 404/error
        CLI->>GitHub: GET /registry/master/file_path
        alt Master success
            GitHub-->>CLI: HTTP 200 + body (ASCII-8BIT)
            CLI->>CLI: force_encoding(UTF-8)
            CLI-->>User: classification result
        else Master fails
            GitHub-->>CLI: HTTP 404/error
            CLI-->>User: error message
        end
    end
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. lib/classifier/cli.rb, line 864-869 (link)

    style: nested conditionals checking same condition, refactor with early return

    Or inline:

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

3 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants