Destro28/js-rendering-service

JS Rendering Service (v1)

This service provides a robust and efficient way to render JavaScript-heavy web pages using FastAPI and Playwright. It's designed to handle dynamic content and evade common bot detection mechanisms.

Key Features (v1)

  • FastAPI Backend: A high-performance, asynchronous web service built with FastAPI.
  • Playwright Integration: Leverages Playwright for headless browser automation, enabling rendering of modern web applications.
  • Flexible Rendering Modes:
    • fast mode: Blocks non-essential resources (images, stylesheets, fonts, media) to speed up page load.
    • full mode: Renders the complete page content.
  • Dynamic User-Agent Rotation: Randomly selects from a pool of user agents to mimic diverse browser types.
  • Robust Waiting Strategy: Supports an optional wait_for_selector to ensure specific content is loaded before returning HTML, falling back to domcontentloaded and networkidle states.
  • Automatic Dialog Dismissal: Automatically dismisses browser dialogs (e.g., alerts, confirms) to prevent rendering hangs.
  • Randomized Viewport: Sets a random viewport size for each rendering request to help evade browser fingerprinting.
  • Stealth Capabilities: Integrates playwright-stealth to patch Playwright and hide common automation traces, making the service harder to detect as a bot.
  • Health Check Endpoint: A simple /healthcheck endpoint to monitor service status.
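
As a rough illustration of how several of the features above could fit together, here is a minimal sketch, not the service's actual code. The names `should_block`, `random_viewport`, `render`, `USER_AGENTS`, the viewport ranges, and the timeout are all hypothetical. The waiting logic is one possible reading of the strategy described above. The playwright-stealth step is omitted because its API differs between versions.

```python
import random

# Resource types skipped in "fast" mode, as listed in the feature description.
BLOCKED_IN_FAST_MODE = {"image", "stylesheet", "font", "media"}

# Hypothetical pool; the service's real user-agent list is not shown in this README.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]


def should_block(resource_type: str, mode: str) -> bool:
    """Return True when a request should be aborted for the given mode."""
    return mode == "fast" and resource_type in BLOCKED_IN_FAST_MODE


def random_viewport() -> dict:
    """Pick a viewport size at random (the ranges here are assumptions)."""
    return {
        "width": random.randint(1280, 1920),
        "height": random.randint(720, 1080),
    }


async def render(url, mode="fast", wait_for_selector=None, timeout_ms=15000) -> str:
    """Render a page and return its HTML (assumed flow, see lead-in)."""
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.async_api import async_playwright

    async def on_dialog(dialog):
        # Dismiss alerts/confirms so they cannot stall the render.
        await dialog.dismiss()

    async def handle_route(route):
        # Abort non-essential resources in fast mode, pass everything else through.
        if should_block(route.request.resource_type, mode):
            await route.abort()
        else:
            await route.continue_()

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            context = await browser.new_context(
                user_agent=random.choice(USER_AGENTS),
                viewport=random_viewport(),
            )
            page = await context.new_page()
            page.on("dialog", on_dialog)
            await page.route("**/*", handle_route)

            await page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            if wait_for_selector:
                await page.wait_for_selector(wait_for_selector, timeout=timeout_ms)
            else:
                # Without a selector, wait for the network to go quiet.
                await page.wait_for_load_state("networkidle", timeout=timeout_ms)
            return await page.content()
        finally:
            await browser.close()
```

With Playwright and a Chromium build installed, the sketch could be exercised with something like `asyncio.run(render("https://example.com", mode="fast"))`. Keeping the blocking decision in a plain function such as `should_block` makes the mode logic easy to unit test without launching a browser.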

Future Enhancements

This service is designed with future scalability and robustness in mind. Here are some planned features and areas for improvement:

  • Proxy Rotation: Integration with a proxy management system to rotate IP addresses, crucial for large-scale scraping and avoiding IP bans.
  • Session Management: Implementation of mechanisms to handle cookies and local storage, enabling persistent sessions and rendering of pages behind authentication.
  • Advanced Anti-Bot Techniques: Continuous research and integration of more sophisticated evasion methods to counter evolving bot detection.
  • Enhanced Error Handling & Logging: More granular error reporting and comprehensive logging for better debugging and monitoring.
  • Performance Optimizations: Further fine-tuning for speed, resource utilization, and efficient Playwright browser instance management.
  • Scalability & Deployment: Considerations for horizontal scaling and optimized deployment strategies (e.g., Kubernetes).

About

A JavaScript rendering service for an automated web scraping application.
