Merged
Changes from 4 commits
33 changes: 17 additions & 16 deletions .github/workflows/test.yml
@@ -18,34 +18,35 @@ jobs:
matrix:
include:
- python-version: '3.9'
toxenv: pinned-scrapy-2x0
toxenv: min-scrapy-2x0
- python-version: '3.9'
toxenv: pinned-scrapy-2x1
toxenv: min-scrapy-2x1
- python-version: '3.9'
toxenv: pinned-scrapy-2x3
toxenv: min-scrapy-2x3
- python-version: '3.9'
toxenv: pinned-scrapy-2x4
toxenv: min-scrapy-2x4
- python-version: '3.9'
toxenv: pinned-scrapy-2x5
toxenv: min-scrapy-2x5
- python-version: '3.9'
toxenv: pinned-scrapy-2x6
toxenv: min-scrapy-2x6
- python-version: '3.9'
toxenv: pinned-scrapy-2x7
toxenv: min-scrapy-2x7
- python-version: '3.9'
toxenv: min-extra
- python-version: '3.9'
toxenv: min-provider
- python-version: '3.10'
toxenv: min-x402
- python-version: '3.10'
- python-version: '3.11'
- python-version: '3.12'
- python-version: '3.13'

- python-version: '3.9'
toxenv: pinned-provider
- python-version: '3.13'
toxenv: extra
- python-version: '3.13'
toxenv: provider

- python-version: '3.9'
toxenv: pinned-extra
- python-version: '3.13'
toxenv: extra

toxenv: x402
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
@@ -70,7 +71,7 @@ jobs:
fail-fast: false
matrix:
python-version: ["3.12"] # Keep in sync with .readthedocs.yml
tox-job: ["mypy", "pre-commit", "twine-check", "docs"]
tox-job: ["pre-commit", "mypy", "twine", "docs"]

steps:
- uses: actions/checkout@v4
4 changes: 2 additions & 2 deletions .gitignore
@@ -1,4 +1,4 @@
.coverage
.coverage*
.mypy_cache/
.tox/
dist/
@@ -8,4 +8,4 @@ docs/_build
*.egg-info/
__pycache__/
/test-results/
build/
build/
6 changes: 6 additions & 0 deletions CHANGES.rst
@@ -1,6 +1,12 @@
Changes
=======

0.31.0 (unreleased)
-------------------

- Added :ref:`x402 support <x402>`.


0.30.0 (2025-05-13)
-------------------

73 changes: 60 additions & 13 deletions docs/setup.rst
@@ -22,11 +22,13 @@ You need at least:

- Scrapy 2.0.1+

:doc:`scrapy-poet <scrapy-poet:index>` integration requires higher versions:
:doc:`scrapy-poet <scrapy-poet:index>` integration requires Scrapy 2.6+.

- Scrapy 2.6+
:ref:`x402 support <x402>` requires Python 3.10+.


.. _install:

Installation
============

@@ -42,31 +44,76 @@ For :ref:`scrapy-poet integration <scrapy-poet>`:

pip install scrapy-zyte-api[provider]

For :ref:`x402 support <x402>`, make sure you have Python 3.10+ and install
the ``x402`` extra:

.. code-block:: shell

pip install scrapy-zyte-api[x402]

Note that you can install multiple extras_:

.. _extras: https://setuptools.pypa.io/en/latest/userguide/dependency_management.html#optional-dependencies

.. code-block:: shell

pip install scrapy-zyte-api[provider,x402]


Configuration
=============

To configure scrapy-zyte-api, :ref:`set your API key <config-api-key>` and
either :ref:`enable the add-on <config-addon>` (Scrapy ≥ 2.10) or
:ref:`configure all components separately <config-components>`.
To configure scrapy-zyte-api, :ref:`set up authentication <auth>` and either
:ref:`enable the add-on <config-addon>` (Scrapy ≥ 2.10) or :ref:`configure all
components separately <config-components>`.

.. warning:: :ref:`reactor-change`.

.. _auth:
.. _config-api-key:

Setting your API key
--------------------
Authentication
--------------

Add your `Zyte API key`_, and add it to your project ``settings.py``:
If you `sign up for a Zyte API account
<https://app.zyte.com/account/signup/zyteapi>`_, copy `your API key
<https://app.zyte.com/o/zyte-api/api-access>`_ and do either of the following:

.. _Zyte API key: https://app.zyte.com/o/zyte-api/api-access
- Define an environment variable named ``ZYTE_API_KEY`` with your API key.

.. code-block:: python
- Add your API key to your settings module:

.. code-block:: python
:caption: settings.py

ZYTE_API_KEY = "YOUR_API_KEY"

.. _x402:

To use :ref:`python-zyte-api:x402` instead:

Review comment (Member): I wonder if we should downplay it a bit for now: add a separate section about x402 and link to it, while keeping it outside of the "main" setup flow.

#. Read the `Zyte Terms of Service`_. By using Zyte API, you are accepting
them.

.. _Zyte Terms of Service: https://www.zyte.com/terms-policies/terms-of-service/

#. During :ref:`installation <install>`, make sure to install the ``x402``
extra.

#. Configure the *private* key of your Ethereum_ account to authorize by doing
either of the following:

.. _Ethereum: https://ethereum.org/

- Define an environment variable named ``ZYTE_API_ETH_KEY`` with your
Ethereum private key.

- Add your Ethereum private key to your settings module:

ZYTE_API_KEY = "YOUR_API_KEY"
.. code-block:: python
:caption: settings.py

Alternatively, you can set your API key in the ``ZYTE_API_KEY`` environment
variable instead.
ZYTE_API_ETH_KEY = "YOUR_ETH_PRIVATE_KEY"


.. _config-addon:
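
As a rough illustration of the settings-based option in the docs hunk above, the fragment below fills `ZYTE_API_ETH_KEY` from a differently named environment variable, so the private key does not have to be committed with the project. `MY_WALLET_KEY` is a made-up name for this sketch; scrapy-zyte-api does not read it on its own.

```python
# settings.py (sketch, not part of this PR)
import os

# MY_WALLET_KEY is a hypothetical variable name chosen for this example.
# If it is unset, the setting becomes an empty string, which the handler
# code in this PR skips instead of passing it to the client.
ZYTE_API_ETH_KEY = os.environ.get("MY_WALLET_KEY", "")
```

If the environment variable is already called `ZYTE_API_ETH_KEY`, the documented behaviour makes this fragment unnecessary.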
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -38,6 +38,9 @@ provider = [
"web-poet>=0.17.0",
"zyte-common-items>=0.27.0",
]
x402 = [
"zyte-api[x402]>=0.8.0",
]

[project.urls]
source = "https://github.com/scrapy-plugins/scrapy-zyte-api"
49 changes: 32 additions & 17 deletions scrapy_zyte_api/handler.py
@@ -4,6 +4,7 @@
from copy import deepcopy
from typing import Any, Generator, Optional, Union


from scrapy import Spider, signals
from scrapy.crawler import Crawler
from scrapy.exceptions import NotConfigured
@@ -15,12 +16,12 @@
from twisted.internet.defer import Deferred, inlineCallbacks
from zyte_api import AsyncZyteAPI, RequestError
from zyte_api.apikey import NoApiKey
from zyte_api.constants import API_URL

from ._params import _ParamParser
from .responses import ZyteAPIResponse, ZyteAPITextResponse, _process_response
from .utils import (
_AUTOTHROTTLE_DONT_ADJUST_DELAY_SUPPORT,
_X402_SUPPORT,
USER_AGENT,
_build_from_crawler,
)
@@ -95,7 +96,7 @@ def __init__(
# https://github.com/scrapy-plugins/scrapy-zyte-api/issues/58
crawler.zyte_api_client = client # type: ignore[attr-defined]
self._client: AsyncZyteAPI = crawler.zyte_api_client # type: ignore[attr-defined]
logger.info("Using a Zyte API key starting with %r", self._client.api_key[:7])
self._log_auth()
verify_installed_reactor(
"twisted.internet.asyncioreactor.AsyncioSelectorReactor"
)
@@ -133,6 +134,21 @@ def __init__(
def from_crawler(cls, crawler):
return cls(crawler.settings, crawler)

def _log_auth(self):
if _X402_SUPPORT:
auth_type = (
"a Zyte API key"
if self._client.auth.type == "zyte"
else "an Ethereum private key"
)
logger.info(
f"Using {auth_type} starting with {self._client.auth.key[:7]!r}"
)
else:
logger.info(
f"Using a Zyte API key starting with {self._client.api_key[:7]!r}"
)

async def engine_started(self):
self._session = self._client.session(trust_env=self._trust_env)
if not self._cookies_enabled:
@@ -153,27 +169,26 @@ async def engine_started(self):

@staticmethod
def _build_client(settings):
kwargs = {}
if api_key := settings.get("ZYTE_API_KEY"):
kwargs["api_key"] = api_key
if _X402_SUPPORT and (eth_key := settings.get("ZYTE_API_ETH_KEY")):
kwargs["eth_key"] = eth_key
if api_url := settings.get("ZYTE_API_URL"):
kwargs["api_url"] = api_url
try:
return AsyncZyteAPI(
# To allow users to have a key defined in Scrapy settings and
# in an environment variable, and be able to cause the
# environment variable to be used instead of the setting by
# overriding the setting on the command-line to be an empty
# string, we do not support setting empty string keys through
# settings.
api_key=settings.get("ZYTE_API_KEY") or None,
api_url=settings.get("ZYTE_API_URL") or API_URL,
n_conn=settings.getint("CONCURRENT_REQUESTS"),
user_agent=settings.get("_ZYTE_API_USER_AGENT", default=USER_AGENT),
user_agent=settings.get("_ZYTE_API_USER_AGENT", USER_AGENT),
**kwargs,
)
except NoApiKey:
logger.warning(
"'ZYTE_API_KEY' must be set in the spider settings or env var "
"in order for ScrapyZyteAPIDownloadHandler to work."
)
raise NotConfigured(
"Your Zyte API key is not set. Set ZYTE_API_KEY to your API key."
message = (
"No authentication data provided. See "
"https://scrapy-zyte-api.readthedocs.io/en/latest/setup.html#auth"
)
logger.warning(message)
raise NotConfigured(message)

def _create_handler(self, path: Any) -> Any:
dhcls = load_object(path)
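
The keyword-building logic in the new `_build_client` can be tried in isolation. The sketch below is a minimal stand-in, assuming a plain `dict` in place of Scrapy's settings object and a boolean argument in place of the module-level `_X402_SUPPORT` flag; `build_client_kwargs` is a hypothetical name and not a function in this PR.

```python
def build_client_kwargs(settings: dict, x402_support: bool) -> dict:
    """Collect only the non-empty auth/URL settings, as _build_client does."""
    kwargs = {}
    # An empty string is falsy, so an empty setting is not forwarded at all.
    if api_key := settings.get("ZYTE_API_KEY"):
        kwargs["api_key"] = api_key
    # The Ethereum key is only considered when x402 support is available.
    if x402_support and (eth_key := settings.get("ZYTE_API_ETH_KEY")):
        kwargs["eth_key"] = eth_key
    if api_url := settings.get("ZYTE_API_URL"):
        kwargs["api_url"] = api_url
    return kwargs


print(build_client_kwargs({"ZYTE_API_KEY": "", "ZYTE_API_ETH_KEY": "0xabc"}, True))
# {'eth_key': '0xabc'}
print(build_client_kwargs({"ZYTE_API_KEY": "k", "ZYTE_API_ETH_KEY": "0xabc"}, False))
# {'api_key': 'k'}
```

Leaving out empty values, instead of passing `None` or a default URL as the removed code did, leaves any fallback to the client itself.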
7 changes: 7 additions & 0 deletions scrapy_zyte_api/utils.py
@@ -62,3 +62,10 @@ def _build_from_crawler(
_SCRAPY_POET_VERSION = Version(version("scrapy-poet"))
_SCRAPY_POET_0_26_0 = Version("0.26.0")
_POET_ADDON_SUPPORT = _SCRAPY_POET_VERSION >= _SCRAPY_POET_0_26_0

try:
from zyte_api import AuthInfo # noqa: F401
except ImportError:
_X402_SUPPORT = False
else:
_X402_SUPPORT = True
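
The `try`/`except ImportError`/`else` block above is feature detection by import: the flag depends on whether the installed `zyte_api` exposes `AuthInfo`. A generic version of the same idea, using only standard-library modules so it runs anywhere, might look like the sketch below. `supports` is a hypothetical helper, and it uses `hasattr` where the PR uses a `from ... import`.

```python
import importlib


def supports(module_name: str, attr: str) -> bool:
    """Return True if the module can be imported and exposes the attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is not installed at all.
        return False
    # The module exists; check whether this version provides the name.
    return hasattr(module, attr)


print(supports("json", "dumps"))         # True
print(supports("json", "no_such_name"))  # False
```

The PR's form is shorter because `from zyte_api import AuthInfo` raises `ImportError` both when the package is missing and when the name is absent.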