Merged
33 changes: 17 additions & 16 deletions .github/workflows/test.yml
@@ -18,34 +18,35 @@ jobs:
matrix:
include:
- python-version: '3.9'
toxenv: pinned-scrapy-2x0
toxenv: min-scrapy-2x0
- python-version: '3.9'
toxenv: pinned-scrapy-2x1
toxenv: min-scrapy-2x1
- python-version: '3.9'
toxenv: pinned-scrapy-2x3
toxenv: min-scrapy-2x3
- python-version: '3.9'
toxenv: pinned-scrapy-2x4
toxenv: min-scrapy-2x4
- python-version: '3.9'
toxenv: pinned-scrapy-2x5
toxenv: min-scrapy-2x5
- python-version: '3.9'
toxenv: pinned-scrapy-2x6
toxenv: min-scrapy-2x6
- python-version: '3.9'
toxenv: pinned-scrapy-2x7
toxenv: min-scrapy-2x7
- python-version: '3.9'
toxenv: min-extra
- python-version: '3.9'
toxenv: min-provider
- python-version: '3.10'
toxenv: min-x402
- python-version: '3.10'
- python-version: '3.11'
- python-version: '3.12'
- python-version: '3.13'

- python-version: '3.9'
toxenv: pinned-provider
- python-version: '3.13'
toxenv: extra
- python-version: '3.13'
toxenv: provider

- python-version: '3.9'
toxenv: pinned-extra
- python-version: '3.13'
toxenv: extra

toxenv: x402
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
@@ -70,7 +71,7 @@ jobs:
fail-fast: false
matrix:
python-version: ["3.12"] # Keep in sync with .readthedocs.yml
tox-job: ["mypy", "pre-commit", "twine-check", "docs"]
tox-job: ["pre-commit", "mypy", "twine", "docs"]

steps:
- uses: actions/checkout@v4
4 changes: 2 additions & 2 deletions .gitignore
@@ -1,4 +1,4 @@
.coverage
.coverage*
.mypy_cache/
.tox/
dist/
@@ -8,4 +8,4 @@ docs/_build
*.egg-info/
__pycache__/
/test-results/
build/
build/
6 changes: 6 additions & 0 deletions CHANGES.rst
@@ -1,6 +1,12 @@
Changes
=======

0.31.0 (unreleased)
-------------------

- Added :ref:`x402 support <x402>`.


0.30.0 (2025-05-13)
-------------------

110 changes: 97 additions & 13 deletions docs/setup.rst
@@ -22,11 +22,13 @@ You need at least:

- Scrapy 2.0.1+

:doc:`scrapy-poet <scrapy-poet:index>` integration requires higher versions:
:doc:`scrapy-poet <scrapy-poet:index>` integration requires Scrapy 2.6+.

- Scrapy 2.6+
:ref:`x402 support <x402>` requires Python 3.10+.


.. _install:

Installation
============

@@ -42,31 +44,63 @@ For :ref:`scrapy-poet integration <scrapy-poet>`:

pip install scrapy-zyte-api[provider]

For :ref:`x402 support <x402>`, make sure you have Python 3.10+ and install
the ``x402`` extra:

.. code-block:: shell

    pip install scrapy-zyte-api[x402]

Note that you can install multiple extras_:

.. _extras: https://setuptools.pypa.io/en/latest/userguide/dependency_management.html#optional-dependencies

.. code-block:: shell

    pip install scrapy-zyte-api[provider,x402]


Configuration
=============

To configure scrapy-zyte-api, :ref:`set your API key <config-api-key>` and
either :ref:`enable the add-on <config-addon>` (Scrapy ≥ 2.10) or
:ref:`configure all components separately <config-components>`.
To configure scrapy-zyte-api, :ref:`set up authentication <auth>` and either
:ref:`enable the add-on <config-addon>` (Scrapy ≥ 2.10) or :ref:`configure all
components separately <config-components>`.

.. warning:: :ref:`reactor-change`.

.. _auth:
.. _config-api-key:

Setting your API key
--------------------
Authentication
--------------

Add your `Zyte API key`_, and add it to your project ``settings.py``:
`Sign up for a Zyte API account
<https://app.zyte.com/account/signup/zyteapi>`_, copy `your API key
<https://app.zyte.com/o/zyte-api/api-access>`_ and do either of the following:

.. _Zyte API key: https://app.zyte.com/o/zyte-api/api-access
- Define an environment variable named ``ZYTE_API_KEY`` with your API key:

.. code-block:: python
- On Windows’ CMD:

ZYTE_API_KEY = "YOUR_API_KEY"
.. code-block:: shell

Alternatively, you can set your API key in the ``ZYTE_API_KEY`` environment
variable instead.
> set ZYTE_API_KEY=YOUR_API_KEY

- On macOS and Linux:

.. code-block:: shell

$ export ZYTE_API_KEY=YOUR_API_KEY

- Add your API key to your settings module:

.. code-block:: python
:caption: settings.py

ZYTE_API_KEY = "YOUR_API_KEY"

To use `x402 <https://www.x402.org/>`__ instead, see :ref:`x402`.


.. _config-addon:
@@ -175,3 +209,53 @@ your existing code may need changes, such as:
some Scrapy functions and methods. For example, when you yield the
return value of ``self.crawler.engine.download()`` from a spider
callback, you are yielding a Deferred.


.. _x402:

x402
====

It is possible to use :ref:`Zyte API <zyte-api>` without a Zyte API account by
using the `x402 <https://www.x402.org/>`__ protocol to handle payments:

#.  Read the `Zyte Terms of Service`_. By using Zyte API, you are accepting
    them.

    .. _Zyte Terms of Service: https://www.zyte.com/terms-policies/terms-of-service/

#.  During :ref:`installation <install>`, make sure to install the ``x402``
    extra.

#.  :ref:`Configure <eth-key>` the *private* key of your Ethereum_ account to
    authorize payments.

    .. _Ethereum: https://ethereum.org/

.. _eth-key:

Configuring your Ethereum private key
-------------------------------------

It is recommended to configure your Ethereum private key through an environment
variable, so that it also works when you use :doc:`python-zyte-api
<python-zyte-api:index>`:

- On Windows’ CMD:

  .. code-block:: shell

      > set ZYTE_API_ETH_KEY=YOUR_ETH_PRIVATE_KEY

- On macOS and Linux:

  .. code-block:: shell

      $ export ZYTE_API_ETH_KEY=YOUR_ETH_PRIVATE_KEY

Alternatively, you can add your Ethereum private key to the settings module:

.. code-block:: python
    :caption: settings.py

    ZYTE_API_ETH_KEY = "YOUR_ETH_PRIVATE_KEY"
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -38,6 +38,9 @@ provider = [
"web-poet>=0.17.0",
"zyte-common-items>=0.27.0",
]
x402 = [
"zyte-api[x402]>=0.8.0",
]

[project.urls]
source = "https://github.com/scrapy-plugins/scrapy-zyte-api"
49 changes: 32 additions & 17 deletions scrapy_zyte_api/handler.py
@@ -4,6 +4,7 @@
from copy import deepcopy
from typing import Any, Generator, Optional, Union


from scrapy import Spider, signals
from scrapy.crawler import Crawler
from scrapy.exceptions import NotConfigured
@@ -15,12 +16,12 @@
from twisted.internet.defer import Deferred, inlineCallbacks
from zyte_api import AsyncZyteAPI, RequestError
from zyte_api.apikey import NoApiKey
from zyte_api.constants import API_URL

from ._params import _ParamParser
from .responses import ZyteAPIResponse, ZyteAPITextResponse, _process_response
from .utils import (
_AUTOTHROTTLE_DONT_ADJUST_DELAY_SUPPORT,
_X402_SUPPORT,
USER_AGENT,
_build_from_crawler,
)
@@ -95,7 +96,7 @@ def __init__(
# https://github.com/scrapy-plugins/scrapy-zyte-api/issues/58
crawler.zyte_api_client = client # type: ignore[attr-defined]
self._client: AsyncZyteAPI = crawler.zyte_api_client # type: ignore[attr-defined]
logger.info("Using a Zyte API key starting with %r", self._client.api_key[:7])
self._log_auth()
verify_installed_reactor(
"twisted.internet.asyncioreactor.AsyncioSelectorReactor"
)
Expand Down Expand Up @@ -133,6 +134,21 @@ def __init__(
def from_crawler(cls, crawler):
return cls(crawler.settings, crawler)

    def _log_auth(self):
        if _X402_SUPPORT:
            auth_type = (
                "a Zyte API key"
                if self._client.auth.type == "zyte"
                else "an Ethereum private key"
            )
            logger.info(
                f"Using {auth_type} starting with {self._client.auth.key[:7]!r}"
            )
        else:
            logger.info(
                f"Using a Zyte API key starting with {self._client.api_key[:7]!r}"
            )
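
Reviewer sketch (not part of the diff): the branching in ``_log_auth`` can be exercised in isolation with stand-in objects. The helper name ``describe_auth``, the ``SimpleNamespace`` stand-ins and the ``"eth"`` auth type value are assumptions made for illustration; only the ``"zyte"`` value and the ``auth.type``, ``auth.key`` and ``api_key`` attributes appear in the change itself.

```python
from types import SimpleNamespace


def describe_auth(client, x402_support: bool) -> str:
    """Return the message that the new logging method would emit (sketch)."""
    if x402_support:
        # Newer clients are assumed to expose ``auth.type`` and ``auth.key``.
        auth_type = (
            "a Zyte API key"
            if client.auth.type == "zyte"
            else "an Ethereum private key"
        )
        key = client.auth.key
    else:
        # Older clients only expose ``api_key``.
        auth_type = "a Zyte API key"
        key = client.api_key
    # Only the first 7 characters are included, so the full secret never
    # reaches the log output.
    return f"Using {auth_type} starting with {key[:7]!r}"


# Hypothetical stand-ins for real client objects.
new_client = SimpleNamespace(
    auth=SimpleNamespace(type="eth", key="0x1234567890abcdef")
)
old_client = SimpleNamespace(api_key="abcdefghijklmnop")

print(describe_auth(new_client, x402_support=True))
print(describe_auth(old_client, x402_support=False))
```

Returning the message instead of logging it keeps the sketch easy to assert on; the real method logs through ``logger.info``.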

async def engine_started(self):
self._session = self._client.session(trust_env=self._trust_env)
if not self._cookies_enabled:
@@ -153,27 +169,26 @@ async def engine_started(self):

@staticmethod
def _build_client(settings):
kwargs = {}
if api_key := settings.get("ZYTE_API_KEY"):
kwargs["api_key"] = api_key
if _X402_SUPPORT and (eth_key := settings.get("ZYTE_API_ETH_KEY")):
kwargs["eth_key"] = eth_key
if api_url := settings.get("ZYTE_API_URL"):
kwargs["api_url"] = api_url
try:
return AsyncZyteAPI(
# To allow users to have a key defined in Scrapy settings and
# in a environment variable, and be able to cause the
# environment variable to be used instead of the setting by
# overriding the setting on the command-line to be an empty
# string, we do not support setting empty string keys through
# settings.
api_key=settings.get("ZYTE_API_KEY") or None,
api_url=settings.get("ZYTE_API_URL") or API_URL,
n_conn=settings.getint("CONCURRENT_REQUESTS"),
user_agent=settings.get("_ZYTE_API_USER_AGENT", default=USER_AGENT),
user_agent=settings.get("_ZYTE_API_USER_AGENT", USER_AGENT),
**kwargs,
)
except NoApiKey:
logger.warning(
"'ZYTE_API_KEY' must be set in the spider settings or env var "
"in order for ScrapyZyteAPIDownloadHandler to work."
)
raise NotConfigured(
"Your Zyte API key is not set. Set ZYTE_API_KEY to your API key."
message = (
"No authentication data provided. See "
"https://scrapy-zyte-api.readthedocs.io/en/latest/setup.html#auth"
)
logger.warning(message)
raise NotConfigured(message)
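
Reviewer sketch (not part of the diff): the new ``_build_client`` only forwards settings that are non-empty, which presumably lets the client fall back to its own defaults, such as environment variables, for anything omitted. The helper below reproduces that filtering over a plain mapping; the name ``build_client_kwargs`` and the use of a ``dict`` in place of Scrapy's ``Settings`` object are assumptions for illustration.

```python
from typing import Any, Dict, Mapping


def build_client_kwargs(
    settings: Mapping[str, Any], x402_support: bool = True
) -> Dict[str, Any]:
    """Collect only the non-empty auth and URL settings (sketch)."""
    kwargs: Dict[str, Any] = {}
    # An empty string or a missing key is falsy, so it is skipped.
    if api_key := settings.get("ZYTE_API_KEY"):
        kwargs["api_key"] = api_key
    # The Ethereum key is ignored when x402 support is unavailable.
    if x402_support and (eth_key := settings.get("ZYTE_API_ETH_KEY")):
        kwargs["eth_key"] = eth_key
    if api_url := settings.get("ZYTE_API_URL"):
        kwargs["api_url"] = api_url
    return kwargs


print(build_client_kwargs({"ZYTE_API_KEY": ""}))
print(build_client_kwargs({"ZYTE_API_ETH_KEY": "0xabc"}))
```

With this shape, overriding ``ZYTE_API_KEY`` to an empty string on the command line still results in no ``api_key`` argument being passed, which matches the behavior described in the removed comment.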

def _create_handler(self, path: Any) -> Any:
dhcls = load_object(path)
7 changes: 7 additions & 0 deletions scrapy_zyte_api/utils.py
@@ -62,3 +62,10 @@ def _build_from_crawler(
_SCRAPY_POET_VERSION = Version(version("scrapy-poet"))
_SCRAPY_POET_0_26_0 = Version("0.26.0")
_POET_ADDON_SUPPORT = _SCRAPY_POET_VERSION >= _SCRAPY_POET_0_26_0

try:
    from zyte_api import AuthInfo  # noqa: F401
except ImportError:
    _X402_SUPPORT = False
else:
    _X402_SUPPORT = True
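
Reviewer sketch (not part of the diff): the ``try``/``except ImportError`` block above is a feature-detection pattern, where support is inferred from whether a name can be imported. A roughly equivalent generic helper, using only the standard library, might look as follows; the name ``supports`` is hypothetical, and ``hasattr`` is a close but not exact substitute for ``from module import name``.

```python
import importlib


def supports(module_name: str, attribute: str) -> bool:
    """Report whether ``module_name`` can be imported and exposes ``attribute``."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed at all.
        return False
    # The package is installed; check for the specific name.
    return hasattr(module, attribute)


print(supports("json", "loads"))
print(supports("json", "no_such_name"))
```

Detecting the feature by importing a name, rather than comparing version numbers, keeps the check valid even if the name is backported or the package is installed from a development branch.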