jrgochan/DocDoc2

DocDoc2

DocDoc2 is a document chat application that lets you upload, process, and chat with your documents using local AI models. It uses Ollama to run the models and supports several document formats, including PDF, DOCX, and TXT.

Features

  • Upload and organize documents (PDF, DOCX, TXT, etc.)
  • Process documents with local AI models
  • Extract text from PDF files with proper page structure
  • Chat with your documents using a conversational interface
  • Vector search for finding relevant content in your documents
  • Fully local - no data sent to external servers

Setup and Installation

Prerequisites

  • Node.js 18+ and npm
  • Ollama installed and running

Installation

  1. Clone the repository:

    git clone https://github.com/jrgochan/DocDoc2.git
    cd DocDoc2
  2. Install dependencies:

    npm install
  3. Set up Ollama models:

    # Make the script executable if needed
    chmod +x setup-ollama.sh
    
    # Run the setup script
    ./setup-ollama.sh
  4. Start the development server:

    npm run dev
  5. Open your browser to http://localhost:5000

Using DocDoc2

Uploading Documents

  1. Click the "Upload" button in the sidebar or top bar
  2. Drag and drop files or click to browse
  3. Select the documents you want to upload (PDF, DOCX, TXT, etc.)
  4. Configure processing options (optional)
  5. Click "Upload" to start the process

The maximum file size is 500MB per document.

Document Processing

After uploading, documents will be automatically processed:

  • Text extraction (including proper PDF page extraction)
  • Text chunking for better handling of large documents
  • Vector embedding generation for semantic search
  • Status will change from "Processing" to "Ready" when complete
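
The chunking step above can be illustrated with a minimal sketch. It assumes fixed-size character windows with a small overlap; the `TextChunk` shape, the `chunkText` name, and the default sizes are hypothetical and are not taken from the DocDoc2 source, which may chunk by sentences or tokens instead.

```typescript
// Hypothetical chunk shape for illustration; not the DocDoc2 data model.
interface TextChunk {
  index: number; // position of the chunk in the sequence
  start: number; // character offset in the original text (inclusive)
  end: number;   // character offset in the original text (exclusive)
  text: string;
}

// Split text into fixed-size windows that overlap by `overlap` characters,
// so content cut at one boundary still appears whole in a neighboring chunk.
function chunkText(text: string, chunkSize = 1000, overlap = 200): TextChunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("chunkSize must be positive and overlap must be in [0, chunkSize)");
  }
  const chunks: TextChunk[] = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({ index: chunks.length, start, end, text: text.slice(start, end) });
    if (end === text.length) break; // the last window reached the end of the text
  }
  return chunks;
}
```

Keeping the `start` and `end` offsets on each chunk is a deliberate choice here: it makes it possible to trace a chunk back to its position in the source document later.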

Chatting with Documents

  1. Click the chat button in the bottom right corner
  2. Ask questions about your documents
  3. The system will search for relevant information and provide answers
  4. Sources will be displayed under each answer, showing the document name and page number
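
As a rough illustration of step 3, the search for relevant information can be thought of as ranking stored chunks by how similar their embeddings are to the embedding of the question. The sketch below assumes cosine similarity over in-memory vectors; the `EmbeddedChunk` shape and the function names are hypothetical, and the actual DocDoc2 search may work differently.

```typescript
// Hypothetical record for a chunk that already has an embedding vector.
interface EmbeddedChunk {
  documentName: string;
  page: number;
  text: string;
  embedding: number[];
}

// Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal ones.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding, best match first.
function topK(query: number[], chunks: EmbeddedChunk[], k = 3) {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Because each result carries `documentName` and `page`, the same records could be used to build the source list shown under an answer.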

PDF Support

DocDoc2 includes special support for PDF documents:

  • Proper text extraction with page numbers
  • Accurate page count detection
  • Page-based source citations
  • Chunk-to-page mapping for better context
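
One way to picture chunk-to-page mapping is shown below. The sketch assumes the extractor yields one string per page and that pages are joined with a separator before chunking; the function names are hypothetical and the real DocDoc2 mapping may be implemented differently.

```typescript
// Record the character offset at which each page starts once all pages
// are joined into a single string with `separator` between them.
function pageOffsets(pages: string[], separator = "\n"): number[] {
  const offsets: number[] = [];
  let position = 0;
  for (const page of pages) {
    offsets.push(position);
    position += page.length + separator.length;
  }
  return offsets;
}

// Return the 1-based page number that contains the given character offset.
// Offsets that fall on a separator are attributed to the preceding page.
function pageForOffset(offsets: number[], offset: number): number {
  let page = 1;
  for (let i = 0; i < offsets.length; i++) {
    if (offsets[i] <= offset) page = i + 1;
    else break; // offsets are ascending, so later pages start too late
  }
  return page;
}
```

Looking up the start offset of a chunk with `pageForOffset` gives the page to cite for that chunk.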

To test PDF functionality:

# Generate a test PDF document
./test-pdf.sh

# Upload the test-document.pdf file
# Ask questions about its content

PDF Troubleshooting

If the application fails at startup with "Error: ENOENT: no such file or directory, open './test/data/05-versions-space.pdf'", run the following command to patch the pdf-parse library:

# Fix the pdf-parse module initialization issue
node fix-pdf-parse.cjs

This issue occurs because the pdf-parse library expects a test file to exist during initialization. The fix script creates the necessary directory and file structure.

The application automatically applies this fix during startup (via the predev script), but you can run it manually if needed.
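
For readers curious about what such a fix involves, here is a minimal sketch of the idea: create the directory and a placeholder file at the path named in the error message. This is an assumption about the approach, not a copy of fix-pdf-parse.cjs, which may differ (for example, it may write a real PDF rather than an empty file). The `ensurePlaceholderPdf` name is hypothetical.

```typescript
import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Create a placeholder file at the path from the error message, relative to
// `baseDir`, if it is missing. Returns true when a file was created and
// false when one already existed.
function ensurePlaceholderPdf(baseDir: string): boolean {
  const target = join(baseDir, "test", "data", "05-versions-space.pdf");
  if (existsSync(target)) return false;
  mkdirSync(dirname(target), { recursive: true }); // creates test/ and test/data/
  writeFileSync(target, ""); // empty placeholder (an assumption, see above)
  return true;
}

// Example: run against a scratch directory rather than the real project root.
const scratch = mkdtempSync(join(tmpdir(), "docdoc2-"));
console.log(ensurePlaceholderPdf(scratch)); // first call creates the file
console.log(ensurePlaceholderPdf(scratch)); // second call finds it in place
```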

Troubleshooting

Chat Not Working

If you can upload documents but chat isn't working:

  1. Make sure Ollama is running:

    ollama serve
  2. Make sure the required models are installed by re-running the setup script:

    ./setup-ollama.sh
  3. Restart the application:

    npm run dev
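
The first two checks above can also be done programmatically. The sketch below asks a running Ollama server which models are installed through its `/api/tags` endpoint and reports any that are missing. It assumes Ollama listens on its default address, `http://localhost:11434`; the `missingModels` helper and the `FetchLike` type are hypothetical names introduced for illustration, and the fetch function is passed in so the logic can be exercised without a live server.

```typescript
// The part of the /api/tags response this sketch reads.
interface TagsResponse {
  models: { name: string }[];
}

// Narrow view of fetch, so a stub can stand in for the real function.
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Return the names from `required` that the Ollama server does not list.
// Rejects if the server cannot be reached or answers with an error status.
async function missingModels(
  required: string[],
  fetchFn: FetchLike,
  baseUrl = "http://localhost:11434", // assumed default Ollama address
): Promise<string[]> {
  const response = await fetchFn(`${baseUrl}/api/tags`);
  if (!response.ok) throw new Error("Ollama responded with an error status");
  const body = (await response.json()) as TagsResponse;
  // Installed names carry a tag, such as "llama3:latest"; compare base names.
  const installed = body.models.map((m) => m.name.split(":")[0]);
  return required.filter((name) => !installed.includes(name));
}
```

On Node.js 18+, the global `fetch` can be passed as `fetchFn`, for example `missingModels(["llama3", "nomic-embed-text"], fetch)`. A rejected promise usually means Ollama is not running, and a non-empty result lists the models that still need to be pulled.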

PDF Processing Issues

If PDF documents aren't processing correctly:

  1. Make sure pdf-parse is installed:

    npm install pdf-parse
  2. Check the document size (limit is 20MB)

  3. Try with a simple test PDF:

    ./test-pdf.sh

Technology Stack

  • Frontend: React, TypeScript, Tailwind CSS
  • Backend: Node.js, Express
  • AI: Ollama (llama3 for chat, nomic-embed-text for embeddings)
  • PDF Processing: pdf-parse
  • Storage: SQLite (local file-based)

License

MIT
