DocDoc2 is a document chat application that allows you to upload, process, and chat with your documents using local AI models. It leverages Ollama for AI functionality and works with various document formats including PDF.
- Upload and organize documents (PDF, DOCX, TXT, etc.)
- Process documents with local AI models
- Extract text from PDF files with proper page structure
- Chat with your documents using a conversational interface
- Vector search for finding relevant content in your documents
- Fully local - no data sent to external servers
Prerequisites:
- Node.js 18+ and npm
- Ollama installed and running
- Clone the repository:
    git clone https://github.com/yourusername/DocDoc2.git
    cd DocDoc2
- Install dependencies:
    npm install
- Set up Ollama models:
    # Make the script executable if needed
    chmod +x setup-ollama.sh
    # Run the setup script
    ./setup-ollama.sh
- Start the development server:
    npm run dev
- Open your browser to http://localhost:5000
- Click the "Upload" button in the sidebar or top bar
- Drag and drop files or click to browse
- Select the documents you want to upload (PDF, DOCX, TXT, etc.)
- Configure processing options (optional)
- Click "Upload" to start the process
The maximum file size limit is 500MB per document.
After uploading, documents will be automatically processed:
- Text extraction (including proper PDF page extraction)
- Text chunking for better handling of large documents
- Vector embedding generation for semantic search
- Status will change from "Processing" to "Ready" when complete
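The chunking step above can be illustrated with a minimal sketch. The helper name `chunkText` and the default sizes are hypothetical and chosen only for illustration; DocDoc2's actual chunking strategy and parameters are not documented here. The sketch assumes fixed-size character windows with a small overlap, so that a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.

```typescript
// Hypothetical helper: split text into overlapping, fixed-size chunks.
// chunkSize and overlap are illustrative values, not DocDoc2's real settings.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}
```

Each returned chunk would then be embedded separately. A larger overlap keeps more context across boundaries at the cost of more chunks to embed and store.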
- Click the chat button in the bottom right corner
- Ask questions about your documents
- The system will search for relevant information and provide answers
- Sources will be displayed under each answer, showing the document name and page number
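As a rough illustration of how the search step might select sources, the sketch below ranks stored chunks by cosine similarity to a query embedding. The `IndexedChunk` type and the `topMatches` helper are hypothetical, and cosine similarity is an assumed metric; the README does not say which similarity measure or index DocDoc2 uses.

```typescript
// Hypothetical record: an embedded chunk plus the source details shown in chat.
interface IndexedChunk {
  document: string;    // document name
  page: number;        // page the chunk came from
  embedding: number[]; // vector produced by the embedding model
}

// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding, best match first.
function topMatches(query: number[], chunks: IndexedChunk[], k = 3): IndexedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

The `document` and `page` fields of the returned chunks are what a chat interface would need in order to list sources under an answer.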
DocDoc2 includes special support for PDF documents:
- Proper text extraction with page numbers
- Accurate page count detection
- Page-based source citations
- Chunk-to-page mapping for better context
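One simple way to get chunk-to-page mapping is to chunk each page separately, so that no chunk spans two pages. The sketch below assumes the extractor yields one string per page; the `PageChunk` type, the `chunkPages` helper, and the `maxChars` default are hypothetical and not taken from DocDoc2's code.

```typescript
// Hypothetical type: a chunk tagged with the page it was cut from.
interface PageChunk {
  page: number; // 1-based page number
  text: string;
}

// Split each page into chunks of at most maxChars characters.
// Chunks never cross a page boundary, so each maps to exactly one page.
function chunkPages(pages: string[], maxChars = 1000): PageChunk[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  const result: PageChunk[] = [];
  pages.forEach((pageText, index) => {
    for (let start = 0; start < pageText.length; start += maxChars) {
      result.push({ page: index + 1, text: pageText.slice(start, start + maxChars) });
    }
  });
  return result;
}
```

Empty pages produce no chunks but still count toward page numbering, so citations stay aligned with the PDF's own page numbers.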
To test PDF functionality:
    # Generate a test PDF document
    ./test-pdf.sh
    # Upload the test-document.pdf file
    # Ask questions about its content

If you encounter the error "Error: ENOENT: no such file or directory, open './test/data/05-versions-space.pdf'" when starting the application, run the following command to fix the pdf-parse library:

    # Fix the pdf-parse module initialization issue
    node fix-pdf-parse.cjs

This issue occurs because the pdf-parse library expects a test file to exist during initialization. The fix script creates the necessary directory and file structure.
The application automatically applies this fix during startup (via the predev script), but you can run it manually if needed.
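For readers curious what such a fix might look like, here is a minimal sketch. It is not the contents of fix-pdf-parse.cjs, which are not shown in this README. The helper name `ensurePdfParsePlaceholder` is hypothetical, and it is an assumption that an empty placeholder file at the path from the error message is enough to get past initialization.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Relative path taken from the error message above.
const PLACEHOLDER = path.join("test", "data", "05-versions-space.pdf");

// Hypothetical helper: create the missing directory and an empty placeholder
// file under `root`. Returns true if a file was created, false if it existed.
function ensurePdfParsePlaceholder(root: string): boolean {
  const target = path.join(root, PLACEHOLDER);
  if (fs.existsSync(target)) return false; // never overwrite an existing file
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.writeFileSync(target, ""); // assumption: an empty file is sufficient
  return true;
}
```

Calling `ensurePdfParsePlaceholder(process.cwd())` from the project root would create the file where the error message expects it, since the path in the error is relative to the working directory.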
If you can upload documents but chat isn't working:
- Make sure Ollama is running:
    ollama serve
- Check that you have the required models:
    ./setup-ollama.sh
- Restart the application:
    npm run dev
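To check the models by hand, you can compare what Ollama reports against the models this project uses. Ollama lists installed models at `GET /api/tags` (by default on http://localhost:11434), returning a `models` array whose entries have a `name` such as `llama3:latest`. The sketch below works on that response body; the helper name `findMissingModels` is hypothetical, and ignoring the tag after the colon is an assumption, since setup-ollama.sh may pull a specific tag.

```typescript
// Model names taken from the stack list in this README.
const REQUIRED_MODELS = ["llama3", "nomic-embed-text"];

// The part of Ollama's GET /api/tags response this sketch relies on.
interface OllamaTagsResponse {
  models: { name: string }[];
}

// Hypothetical helper: return the required models that are not installed.
function findMissingModels(
  tags: OllamaTagsResponse,
  required: string[] = REQUIRED_MODELS,
): string[] {
  // Drop the tag suffix, e.g. "llama3:latest" -> "llama3" (assumption: any tag counts).
  const installed = new Set(tags.models.map((m) => m.name.split(":")[0]));
  return required.filter((name) => !installed.has(name));
}
```

An empty result means both models are present. Anything else names what still needs to be pulled, for example with `ollama pull nomic-embed-text`.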
If PDF documents aren't processing correctly:
- Make sure pdf-parse is installed:
    npm install pdf-parse
- Check the document size (limit is 20MB)
- Try with a simple test PDF:
    ./test-pdf.sh
- Frontend: React, TypeScript, Tailwind CSS
- Backend: Node.js, Express
- AI: Ollama (llama3 for chat, nomic-embed-text for embeddings)
- PDF Processing: pdf-parse
- Storage: SQLite (local file-based)
License: MIT