Kafka-based components for Scrapy. There are 2 components:
- A custom `Spider` that waits for URLs to crawl via a Kafka topic. When there are no more messages to read for the topic, the `Spider` just stays idle.
- A custom `ItemPipeline` component that stores a JSON-ified `Item` back into another Kafka topic.
Please see the example directory for how to use these components.
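
To make the pipeline idea concrete, here is a minimal sketch of a component that JSON-ifies each item and sends it to a topic. It is not the scrapy-kafka implementation: the names `JsonKafkaPipeline` and `InMemoryProducer` are hypothetical, and the producer is assumed to be any object with a `send(topic, value)` method. An in-memory stand-in is used so the sketch runs without a broker.

```python
import json


class JsonKafkaPipeline:
    """Hypothetical pipeline: serializes each item to JSON and hands it to a producer."""

    def __init__(self, producer, topic):
        # Assumption: `producer` exposes send(topic, value); `topic` is the output topic name.
        self.producer = producer
        self.topic = topic

    def process_item(self, item, spider):
        # dict(item) accepts plain dicts and other mapping-like items.
        payload = json.dumps(dict(item), sort_keys=True).encode("utf-8")
        self.producer.send(self.topic, payload)
        # Return the item so any later pipeline stages still receive it.
        return item


class InMemoryProducer:
    """Stand-in for a real Kafka producer; records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))


if __name__ == "__main__":
    producer = InMemoryProducer()
    pipeline = JsonKafkaPipeline(producer, "scraped-items")
    pipeline.process_item({"url": "http://example.com", "title": "Example"}, spider=None)
    print(producer.sent)
```

In a real project the stand-in would be replaced by an actual Kafka client, and the topic name would come from the project settings; the example directory shows the configuration this package expects.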
Contributors to scrapy-kafka, listed alphabetically: