
[FEATURE] Add web crawling capability to allow users to add websites to their knowledge base #468

@samkul-swe


Feature Description

Allow users to add any website URL to their knowledge base. The system automatically crawls the website using the FireCrawl API, extracts the content in clean Markdown format, and makes it searchable.

Target Deployment

  • [ ] SurfSense Cloud (hosted version)
  • [x] Self-hosted version
  • [ ] Both

Problem Statement

Users couldn't easily add content from websites to their knowledge base. They would have to manually copy and paste content, or save pages as PDFs first. This made it hard to quickly build up their knowledge base with web resources like documentation, articles, blog posts, or research papers.

Proposed Solution

Add a URL crawling feature with the following flow (a rough code sketch follows the list):

  1. Paste any website URL into SurfSense
  2. System automatically crawls the URL using FireCrawl API
  3. Extracts clean content in markdown format (no ads, navigation, or clutter)
  4. Saves page metadata (title, description, language)
  5. Indexes the content so users can search it later
  6. Provides a fallback crawler for users without FireCrawl API access
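
A rough sketch of how the backend side of this flow could look, assuming the firecrawl-py SDK's `FirecrawlApp.scrape_url` interface (exact class and method names vary between firecrawl-py releases) and using `httpx` plus `markdownify` as a hypothetical fallback crawler; all function and field names here are illustrative, not SurfSense's actual implementation:

```python
# Hypothetical crawl step; names are illustrative, not SurfSense's actual code.
from dataclasses import dataclass

import httpx
from markdownify import markdownify  # assumed fallback HTML -> Markdown converter


@dataclass
class CrawledPage:
    url: str
    markdown: str
    title: str | None = None
    description: str | None = None
    language: str | None = None


def crawl_url(url: str, firecrawl_api_key: str | None = None) -> CrawledPage:
    """Crawl a URL with FireCrawl when a key is configured, else use the fallback."""
    if firecrawl_api_key:
        # FireCrawl path: JS-rendered crawl returning clean Markdown plus metadata.
        # NOTE: the client class / call signature differs across firecrawl-py
        # versions; this assumes the FirecrawlApp.scrape_url style.
        from firecrawl import FirecrawlApp

        app = FirecrawlApp(api_key=firecrawl_api_key)
        doc = app.scrape_url(url, formats=["markdown"])
        meta = doc.metadata or {}
        return CrawledPage(
            url=url,
            markdown=doc.markdown or "",
            title=meta.get("title"),
            description=meta.get("description"),
            language=meta.get("language"),
        )

    # Fallback path: plain HTTP fetch converted to Markdown (no JS rendering,
    # so results are rougher than the FireCrawl path).
    response = httpx.get(url, follow_redirects=True, timeout=30.0)
    response.raise_for_status()
    return CrawledPage(url=url, markdown=markdownify(response.text))
```

The crawled Markdown and metadata would then be handed off to the existing document-processing pipeline for chunking and indexing.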

Benefits

  • ✅ Quick way to add web content to knowledge base
  • ✅ Gets clean, readable content without ads or clutter
  • ✅ Works with modern websites that use JavaScript
  • ✅ Extracts useful metadata automatically (titles, descriptions)
  • ✅ No manual copying and pasting needed
  • ✅ Works for documentation, articles, blogs, research papers, etc.

Use Case Examples

  1. Developer saving documentation: A developer finds helpful API documentation at https://docs.example.com/api/quickstart, pastes the URL into SurfSense, and the content is crawled and added to their knowledge base for future reference.
  2. Student researching a topic: A student finds a useful article, adds the URL, and can then search and reference it later alongside their other materials.
  3. Team building a knowledge base: Team members can quickly add relevant blog posts, guides, and resources by just sharing URLs.

Implementation Considerations

  • This may require frontend changes - Yes (add URL input field)
  • This may require backend changes - Yes (crawling logic, processing pipeline)
  • This may require database changes - Yes (store crawled documents; see the model sketch after this list)
  • This may affect existing features - No
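
Since crawled documents need to be persisted, here is a minimal sketch of what the stored record might look like, written in SQLAlchemy 2.0 style; the table and column names are assumptions for illustration and not taken from SurfSense's actual schema:

```python
# Hypothetical shape of a stored crawled document; names are illustrative only.
from datetime import datetime, timezone

from sqlalchemy import DateTime, String, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class CrawledDocument(Base):
    __tablename__ = "crawled_documents"  # assumed table name

    id: Mapped[int] = mapped_column(primary_key=True)
    url: Mapped[str] = mapped_column(String(2048), index=True)
    title: Mapped[str | None] = mapped_column(String(512))
    description: Mapped[str | None] = mapped_column(Text)
    language: Mapped[str | None] = mapped_column(String(16))
    content_markdown: Mapped[str] = mapped_column(Text)
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=lambda: datetime.now(timezone.utc)
    )
```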

Requirements:

  • FireCrawl API key (optional, but recommended for best results)
  • firecrawl-py library version 4.5.0+
  • Background task processing (Celery) - see the task sketch after this list
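
Because the crawl runs out-of-band, a hedged sketch of how it could be wired into a Celery task follows; the broker URL, task options, and the `save_and_index` helper are placeholders, and `crawl_url` is the function from the earlier crawl sketch:

```python
# Hypothetical Celery wiring for the background crawl; placeholders throughout.
from celery import Celery

celery_app = Celery("surfsense", broker="redis://localhost:6379/0")  # assumed broker


@celery_app.task(bind=True, max_retries=3, default_retry_delay=30)
def crawl_and_index_url(self, url: str, firecrawl_api_key: str | None = None) -> None:
    """Crawl the URL in the background, then persist and index the result."""
    try:
        page = crawl_url(url, firecrawl_api_key)  # from the crawl sketch above
    except Exception as exc:
        # Retry transient failures (network errors, FireCrawl rate limits).
        raise self.retry(exc=exc)

    # Placeholder persistence + indexing step: e.g. write a CrawledDocument row,
    # then push the Markdown into the search index.
    save_and_index(page)  # hypothetical helper
```

The API endpoint that accepts the URL would simply enqueue `crawl_and_index_url.delay(url, api_key)` and return immediately, so the user is not blocked while the page is fetched.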

Checklist

  • I have searched existing issues/feature requests to ensure this is not a duplicate
  • I have provided a clear description of the feature
  • I have added appropriate labels (enhancement, deployment type)

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)
