Support item definitions using dataclasses #30

@BurnzZ

Currently, using JSON Schema items with dataclasses (or attrs) doesn't work. Here's a quick example:

from dataclasses import dataclass
from typing import Optional

from scrapy_jsonschema.item import JsonSchemaItem


class BookSchemaItem(JsonSchemaItem):
    jsonschema = {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": "Book",
        "description": "A Book item extracted from books.toscrape.com",
        "type": "object",
        "properties": {
            "url": {
                "description": "Book's URL",
                "type": "string",
                "pattern": "^https?://[\\S]+$"
            },
            "title": {
                "description": "Book's title",
                "type": "string"
            }
        },
        "required": ["url"]
    }


@dataclass
class BookItem(BookSchemaItem):
    url: str
    title: Optional[str] = None

The main cause is how scrapy-jsonschema defines the item's fields (see https://github.com/scrapy-plugins/scrapy-jsonschema/blob/master/scrapy_jsonschema/item.py#L77-L79), which differs from how dataclasses and attrs create their fields.
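To illustrate the difference, here is a simplified, stdlib-only sketch. It is not the library's actual code: `SCHEMA`, `schema_field_names`, and `Book` are names made up for this example. A schema-driven item takes its field names from the schema's `properties`, while a dataclass collects them from class-level annotations when the decorator runs.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Illustrative schema (a trimmed-down version of the Book schema above).
SCHEMA = {
    "type": "object",
    "properties": {
        "url": {"type": "string"},
        "title": {"type": "string"},
    },
    "required": ["url"],
}

# Schema-driven approach: field names are the keys of "properties".
schema_field_names = sorted(SCHEMA["properties"])


# Annotation-driven approach: @dataclass inspects the class annotations.
@dataclass
class Book:
    url: str
    title: Optional[str] = None


dataclass_field_names = sorted(f.name for f in fields(Book))
```

Both approaches end up with the same names here, but they are computed by two independent mechanisms, so combining them in one class hierarchy needs explicit coordination.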

We should provide a better way to define the JSON Schema inside dataclasses, which will likely require rewriting some portions of the library.
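One possible direction, sketched here purely as an assumption: attach the schema to a plain dataclass through a decorator instead of inheriting from `JsonSchemaItem`. The `with_jsonschema` decorator below is hypothetical and does not exist in scrapy-jsonschema. It only checks that the dataclass fields match the schema's top-level properties and performs no real JSON Schema validation.

```python
from dataclasses import asdict, dataclass, fields
from typing import Optional

BOOK_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Book",
    "type": "object",
    "properties": {
        "url": {"type": "string"},
        "title": {"type": "string"},
    },
    "required": ["url"],
}


def with_jsonschema(schema):
    """Hypothetical decorator: attach `schema` to a dataclass and verify
    that its fields match the schema's top-level properties."""
    def decorator(cls):
        declared = {f.name for f in fields(cls)}
        expected = set(schema.get("properties", {}))
        if declared != expected:
            mismatch = sorted(declared ^ expected)
            raise TypeError(f"fields do not match schema: {mismatch}")
        # Stored as a plain class attribute, so it is not a dataclass field.
        cls.jsonschema = schema
        return cls
    return decorator


@with_jsonschema(BOOK_SCHEMA)
@dataclass
class BookItem:
    url: str
    title: Optional[str] = None


item = BookItem(url="http://books.toscrape.com/")
# Required properties that are still unset on this instance.
missing = [name for name in BOOK_SCHEMA["required"] if getattr(item, name) is None]
```

With this shape, `asdict(item)` yields a plain dict that could be handed to a JSON Schema validator in a pipeline, and the field definitions stay entirely under the control of `dataclasses`.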
