Missing Parent Request #, Duration, and Response Size fields #78

@BurnzZ

Description

Currently, requests coming from scrapy_zyte_api.providers.ZyteApiProvider don't get the Parent Request # field in Scrapy Cloud.

image

In the example above, Request 1 should have a Parent Request # field which is missing.

Note that reverting the changes from PR #73 brings the Parent Request # field back. The field comes from the other request, which is filtered out in the new scrapinghub-entrypoint-scrapy version.

It seems that, after one of the duplicate requests is filtered out, the parent ID set via request.meta.setdefault(HS_PARENT_ID_KEY) should somehow be copied into the request that remains (code ref).
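A rough sketch of that idea, using plain dicts in place of request.meta so it runs without Scrapy. The helper name inherit_parent_id is hypothetical, and the value given to HS_PARENT_ID_KEY here is an assumption; the real constant is defined in scrapinghub-entrypoint-scrapy. This is not the actual fix, only an illustration of copying the parent ID across.

```python
# Assumed value for illustration; the real constant is defined in
# scrapinghub-entrypoint-scrapy.
HS_PARENT_ID_KEY = "_hsparent"


def inherit_parent_id(kept_meta, dropped_meta, key=HS_PARENT_ID_KEY):
    """Copy the parent ID from the filtered request's meta into the kept one.

    Hypothetical helper: both arguments are plain dicts standing in for
    request.meta. An existing parent ID on the kept request is not overwritten.
    """
    if key in dropped_meta:
        # setdefault only writes when the kept request has no parent ID yet.
        kept_meta.setdefault(key, dropped_meta[key])
    return kept_meta


# The filtered duplicate carried the parent ID; the kept request did not.
kept = {}
dropped = {HS_PARENT_ID_KEY: 0}
inherit_parent_id(kept, dropped)
```

The membership check (rather than a truthiness check) matters because a parent request ID of 0 is a valid value.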

Reproducible example:

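import scrapy
from scrapy_poet import DummyResponse
from zyte_common_items import Product, ProductNavigation


class ParentSpider(scrapy.Spider):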
    name = "parent"

    def start_requests(self):
        yield scrapy.Request(
            url="https://books.toscrape.com",
            callback=self.parse_nav,
        )

    def parse_nav(self, response: DummyResponse, navigation: ProductNavigation):
        for request in navigation.items:
            yield request.to_scrapy(
                callback=self.parse_item,
            )

    def parse_item(self, response: DummyResponse, product: Product):
        yield product
