DocDoc2

DocDoc2 is a document chat application that lets you upload, process, and chat with your documents using local AI models. It uses Ollama to run the models and supports several document formats, including PDF.

Features

Upload and organize documents (PDF, DOCX, TXT, etc.)
Process documents with local AI models
Extract text from PDF files while preserving page structure
Chat with your documents using a conversational interface
Vector search for finding relevant content in your documents
Fully local: no data is sent to external servers

Setup and Installation

Prerequisites

Node.js 18+ and npm
Ollama installed and running

Installation

Clone the repository:

```
git clone https://github.com/yourusername/DocDoc2.git
cd DocDoc2
```

Install dependencies:
```
npm install
```

Set up Ollama models:

```
# Make the script executable if needed
chmod +x setup-ollama.sh

# Run the setup script
./setup-ollama.sh
```

Start the development server:
```
npm run dev
```
Open your browser to http://localhost:5000

Using DocDoc2

Uploading Documents

Click the "Upload" button in the sidebar or top bar
Drag and drop files or click to browse
Select the documents you want to upload (PDF, DOCX, TXT, etc.)
Configure processing options (optional)
Click "Upload" to start the process

The maximum file size limit is 500MB per document.
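A pre-upload check along these lines could reject files that are too large or of an unexpected type before they are sent. It is a minimal sketch, not DocDoc2's validation code: it assumes "500MB" means 500 * 1024 * 1024 bytes, limits the accepted extensions to the three formats named above (the real list is longer), and `checkUpload` is a hypothetical name.

```typescript
// Hypothetical pre-upload check; DocDoc2's real validation may differ.
// Assumption: "500MB" is interpreted as 500 * 1024 * 1024 bytes.
const MAX_UPLOAD_BYTES = 500 * 1024 * 1024;

// Assumption: only the three formats named in this README; the app accepts more ("etc.").
const ACCEPTED_EXTENSIONS = [".pdf", ".docx", ".txt"];

type UploadCheck = { ok: true } | { ok: false; reason: string };

function checkUpload(fileName: string, sizeBytes: number): UploadCheck {
  // Compare extensions case-insensitively, so "report.PDF" is accepted.
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  if (!ACCEPTED_EXTENSIONS.includes(ext)) {
    return { ok: false, reason: `unsupported file type: ${ext || "(none)"}` };
  }
  // A file exactly at the limit is allowed; anything larger is rejected.
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: `file exceeds ${MAX_UPLOAD_BYTES} bytes` };
  }
  return { ok: true };
}
```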

Document Processing

After uploading, documents will be automatically processed:

Text extraction (including proper PDF page extraction)
Text chunking for better handling of large documents
Vector embedding generation for semantic search
Status will change from "Processing" to "Ready" when complete
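The chunking and embedding steps above can be sketched as follows. This is a minimal illustration, not DocDoc2's code: the character-based splitting, the defaults of 1000 characters with 200 characters of overlap, and the names `chunkText` and `embedChunks` are assumptions; only the `nomic-embed-text` model comes from the Technology Stack section. The embedding call assumes Ollama's default address (`http://localhost:11434`) and its `/api/embed` endpoint.

```typescript
// Hypothetical chunking + embedding sketch; DocDoc2's actual chunk size,
// overlap, and request code are not documented in this README.

// Split text into fixed-size character windows that overlap by `overlap` chars,
// so text cut at a window boundary also appears, with context, in the next chunk.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Request one embedding vector per chunk from a local Ollama server.
// Assumes /api/embed accepts an array of strings as `input` and returns
// the vectors, in the same order, in an `embeddings` field.
async function embedChunks(
  chunks: string[],
  model = "nomic-embed-text",
  baseUrl = "http://localhost:11434",
): Promise<number[][]> {
  const res = await fetch(`${baseUrl}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, input: chunks }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as { embeddings: number[][] };
  return data.embeddings;
}
```

Overlap trades a little extra storage for fewer answers that miss context split across two chunks.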

Chatting with Documents

Click the chat button in the bottom right corner
Ask questions about your documents
The system will search for relevant information and provide answers
Sources will be displayed under each answer, showing the document name and page number
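The search-and-cite flow above roughly corresponds to a nearest-neighbour lookup over the stored chunk embeddings. The sketch below assumes cosine similarity, an in-memory list of chunks, and a default of four results; `StoredChunk`, `topK`, and `formatSource` are hypothetical names, and DocDoc2's real retrieval code and citation format may differ.

```typescript
// Hypothetical retrieval step: rank stored chunks by cosine similarity to the
// question's embedding and keep the top k, carrying document name and page so
// answers can cite sources. DocDoc2's actual scoring and k are not documented here.
interface StoredChunk {
  documentName: string;
  page: number;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk, sort best-first, and keep the k highest-scoring ones.
function topK(query: number[], chunks: StoredChunk[], k = 4) {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Format a source line from the document name and page number.
function formatSource(c: StoredChunk): string {
  return `${c.documentName}, p. ${c.page}`;
}
```

The question must be embedded with the same model as the chunks (here assumed to be `nomic-embed-text`), otherwise the vectors are not comparable.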

PDF Support

DocDoc2 includes special support for PDF documents:

Proper text extraction with page numbers
Accurate page count detection
Page-based source citations
Chunk-to-page mapping for better context
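One way to implement chunk-to-page mapping is to record the character offset at which each page starts when the page texts are joined, then look up which page a chunk's start offset falls in. This sketch assumes the extracted PDF text has already been split into per-page strings and uses hypothetical names (`joinPages`, `pageForOffset`); it is not taken from DocDoc2's source.

```typescript
// Hypothetical chunk-to-page mapping: join per-page text into one string while
// recording where each page starts, then map an offset back to a page number.
interface PagedText {
  text: string;
  pageStarts: number[]; // pageStarts[i] = offset where page i + 1 begins
}

function joinPages(pages: string[], separator = "\n\n"): PagedText {
  const pageStarts: number[] = [];
  let text = "";
  pages.forEach((page, i) => {
    if (i > 0) text += separator;
    pageStarts.push(text.length);
    text += page;
  });
  return { text, pageStarts };
}

// Binary search for the last page whose start is <= offset; returns a 1-based page.
function pageForOffset(pageStarts: number[], offset: number): number {
  let lo = 0, hi = pageStarts.length - 1, found = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (pageStarts[mid] <= offset) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found + 1;
}
```

In this sketch a chunk that spans a page break is attributed to the page where it starts; citing a page range would also require the chunk's end offset.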

To test PDF functionality:

```
# Generate a test PDF document
./test-pdf.sh
```

Then upload the generated test-document.pdf file and ask questions about its content.

PDF Troubleshooting

If the application fails to start with the error `Error: ENOENT: no such file or directory, open './test/data/05-versions-space.pdf'`, run the following command to work around the issue in the pdf-parse library:

```
# Fix the pdf-parse module initialization issue
node fix-pdf-parse.cjs
```

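This issue occurs because pdf-parse's entry file runs a built-in debug routine that reads this test file when `module.parent` is undefined, which can happen when the package is loaded from an ES module or a bundle instead of being required directly. The fix script creates the directory and file that the routine expects.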

The application automatically applies this fix during startup (via the predev script), but you can run it manually if needed.
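The workaround amounts to making sure the file named in the error message exists relative to the directory the app is started from, since `./test/data/05-versions-space.pdf` is a relative path resolved against the working directory. A minimal sketch, assuming the contents of any valid PDF (for example the repository's test-document.pdf) are an acceptable stand-in; `ensurePdfParseTestFile` is a hypothetical name, and the real fix-pdf-parse.cjs may work differently.

```typescript
import { existsSync, mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical stand-in for the fix script: ensure test/data/05-versions-space.pdf
// exists under `baseDir` (default: the current working directory).
// The real fix-pdf-parse.cjs may do more, or something different.
function ensurePdfParseTestFile(pdfBytes: Uint8Array, baseDir: string = process.cwd()): string {
  const dir = join(baseDir, "test", "data");
  const target = join(dir, "05-versions-space.pdf");
  if (!existsSync(target)) {
    mkdirSync(dir, { recursive: true }); // creates test/ and test/data/ if missing
    writeFileSync(target, pdfBytes); // assumption: any valid PDF works as a stand-in
  }
  return target; // an existing file is left untouched
}

// Example, run from the project root with readFileSync from node:fs:
// ensurePdfParseTestFile(readFileSync("test-document.pdf"));
```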

Troubleshooting

Chat Not Working

If you can upload documents but chat isn't working:

Make sure Ollama is running:
```
ollama serve
```
Check that you have the required models:
```
./setup-ollama.sh
```
Restart the application:
```
npm run dev
```
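The first two checks above can also be done programmatically. This sketch assumes Ollama's default address and its `/api/tags` endpoint, which lists installed models with tags such as `llama3:latest`; the required model names come from the Technology Stack section, and `findMissingModels` and `checkOllama` are hypothetical helpers, not part of DocDoc2 or setup-ollama.sh.

```typescript
// Hypothetical health check: ask a local Ollama server which models it has
// and report any required ones that are missing.
const REQUIRED_MODELS = ["llama3", "nomic-embed-text"];

// Ollama lists models with a tag; an untagged name is treated as ":latest".
function normalizeModelName(name: string): string {
  return name.includes(":") ? name : `${name}:latest`;
}

function findMissingModels(installed: string[], required: string[] = REQUIRED_MODELS): string[] {
  const have = new Set(installed.map(normalizeModelName));
  return required.filter((m) => !have.has(normalizeModelName(m)));
}

async function checkOllama(baseUrl = "http://localhost:11434"): Promise<string[]> {
  // fetch rejects (e.g. connection refused) if `ollama serve` is not running.
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as { models: { name: string }[] };
  return findMissingModels(data.models.map((m) => m.name));
}
```

If `checkOllama()` rejects with a connection error, start Ollama; if it resolves to a non-empty list, run ./setup-ollama.sh again.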

PDF Processing Issues

If PDF documents aren't processing correctly:

Make sure pdf-parse is installed:
```
npm install pdf-parse
```
Check the document size (limit is 20MB)
Try with a simple test PDF:
```
./test-pdf.sh
```

Technology Stack

Frontend: React, TypeScript, Tailwind CSS
Backend: Node.js, Express
AI: Ollama (llama3 for chat, nomic-embed-text for embeddings)
PDF Processing: pdf-parse
Storage: SQLite (local file-based)
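At answer time these pieces fit together roughly as follows: retrieved chunks are passed as context to llama3 through Ollama's `/api/chat` endpoint. This is a sketch under assumptions (default Ollama address, non-streaming responses, illustrative prompt wording); `buildMessages` and `askOllama` are hypothetical names, not DocDoc2's code.

```typescript
// Hypothetical answer step: put retrieved excerpts in a system message and
// send them, with the user's question, to llama3 via Ollama's /api/chat.
interface Source {
  documentName: string;
  page: number;
  text: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildMessages(question: string, sources: Source[]): ChatMessage[] {
  // Number each excerpt and label it with its document and page for citation.
  const context = sources
    .map((s, i) => `[${i + 1}] ${s.documentName}, p. ${s.page}:\n${s.text}`)
    .join("\n\n");
  return [
    {
      role: "system",
      content: "Answer using only the numbered excerpts below and cite them by number.\n\n" + context,
    },
    { role: "user", content: question },
  ];
}

async function askOllama(
  question: string,
  sources: Source[],
  model = "llama3",
  baseUrl = "http://localhost:11434",
): Promise<string> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // stream: false requests a single JSON response instead of streamed chunks
    body: JSON.stringify({ model, messages: buildMessages(question, sources), stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as { message: ChatMessage };
  return data.message.content;
}
```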

License

MIT
