generated from cording12/next-fast-turbo
Open
Labels
enhancement (New feature or request)
Description
Feature Description
Users can now add any website URL to their knowledge base. The system will automatically crawl the website using FireCrawl API, extract the content in clean markdown format, and make it searchable.
Target Deployment
- [ ] SurfSense Cloud (hosted version)
- [x] Self-hosted version
- [ ] Both
Problem Statement
Users couldn't easily add content from websites to their knowledge base. They would have to manually copy and paste content, or save pages as PDFs first. This made it hard to quickly build up their knowledge base with web resources like documentation, articles, blog posts, or research papers.
Proposed Solution
Add a URL crawling feature that works as follows:
- The user pastes any website URL into SurfSense
- The system automatically crawls the URL using the FireCrawl API
- It extracts clean content in markdown format (no ads, navigation, or clutter)
- It saves page metadata (title, description, language)
- It indexes the content so users can search it later
- It provides a fallback crawler for users without FireCrawl API access
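The crawl-with-fallback flow above could be sketched roughly as follows. This is only an illustration, not SurfSense's actual implementation: `CrawledPage`, `is_crawlable_url`, and `crawl_url` are hypothetical names, and the two crawlers are passed in as plain callables so that a FireCrawl-backed crawler and a simpler fallback can be swapped freely.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional
from urllib.parse import urlparse


@dataclass
class CrawledPage:
    """Hypothetical container for one crawled page."""
    url: str
    markdown: str                                  # cleaned page content
    metadata: dict = field(default_factory=dict)   # e.g. title, description, language
    crawler: str = ""                              # which crawler produced the result


# A crawler is any callable that takes a URL and returns a CrawledPage.
Crawler = Callable[[str], CrawledPage]


def is_crawlable_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that have a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def crawl_url(url: str, primary: Optional[Crawler], fallback: Crawler) -> CrawledPage:
    """Try the primary crawler first (assumed here to wrap FireCrawl).

    If no primary crawler is configured (for example, no API key) or it
    raises an error, fall back to the secondary crawler.
    """
    if not is_crawlable_url(url):
        raise ValueError(f"Not a crawlable http(s) URL: {url!r}")
    if primary is not None:
        try:
            return primary(url)
        except Exception:
            # Any primary failure falls through to the fallback crawler.
            pass
    return fallback(url)
```

Passing `primary=None` models a user without a FireCrawl API key, so the same call site serves both setups.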
Benefits
- ✅ Quick way to add web content to knowledge base
- ✅ Gets clean, readable content without ads or clutter
- ✅ Works with modern websites that use JavaScript
- ✅ Extracts useful metadata automatically (titles, descriptions)
- ✅ No manual copying and pasting needed
- ✅ Works for documentation, articles, blogs, research papers, etc.
Use Case Examples
- Developer saving documentation: A developer finds helpful API documentation at https://docs.example.com/api/quickstart, pastes the URL into SurfSense, and the content is crawled and added to their knowledge base for future reference
- Student researching a topic: A student finds a useful article, adds the URL, and can now search and reference it later alongside their other materials
- Team building knowledge base: Team members can quickly add relevant blog posts, guides, and resources by just sharing URLs
Implementation Considerations
- This may require frontend changes - Yes (add URL input field)
- This may require backend changes - Yes (crawling logic, processing pipeline)
- This may require database changes - Yes (store crawled documents)
- This may affect existing features - No
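As one possible piece of the backend processing pipeline, the fallback crawler could pull the page metadata mentioned earlier (title, description, language) out of raw HTML using only the Python standard library. The sketch below is an assumption about how that step might look, and `MetadataParser` and `extract_metadata` are hypothetical names.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects <title>, <meta name="description">, and <html lang>."""

    def __init__(self):
        super().__init__()
        self.metadata = {"title": "", "description": "", "language": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # attribute values may be None
        if tag == "html" and attr.get("lang"):
            self.metadata["language"] = attr["lang"]
        elif tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.metadata["description"] = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text can arrive in several chunks, so accumulate it.
        if self._in_title:
            self.metadata["title"] += data


def extract_metadata(html: str) -> dict:
    """Return title, description, and language; missing fields stay empty."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    parser.metadata["title"] = parser.metadata["title"].strip()
    return parser.metadata
```

Fields that are absent from the page are returned as empty strings, so the stored document always has the same metadata keys.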
Requirements:
- FireCrawl API key (optional, but recommended for best results)
- firecrawl-py library version 4.5.0+
- Background task processing (Celery)
Checklist
- I have searched existing issues/feature requests to ensure this is not a duplicate
- I have provided a clear description of the feature
- I have added appropriate labels (enhancement, deployment type)
apolmig