Description
Hi!
I've been experimenting, and I've found that extra_headers does not seem to apply to cross-origin frames. To test this, I set extra_headers to {"warcprox-meta": """{"warc-prefix": "special-warc"}"""}, brozzled https://thetechrobo.ca/brozzler-iframe-test.html (which has a YouTube embed), and looked at which requests ended up in which WARC. Intuitively, special-warc should contain every request related to the page being brozzled. But grepping for WARC-Target-URI (| uniq) tells a different story:
special-warc-20250827035114778-00000-6i5ps4mg.warc
WARC-Target-URI: https://thetechrobo.ca/brozzler-iframe-test.html
WARC-Target-URI: https://thetechrobo.ca/does/not/exist.png
WARC-Target-URI: https://www.youtube.com/embed/aPg2V5RVh7U
WARC-Target-URI: https://thetechrobo.ca/favicon.ico
WARCPROX-20250827035113470-00000-wrmp3cvu.warc
WARC-Target-URI: http://clients2.google.com/time/1/current?cup2key=9:rgqGXb-a_ZszmhF-iGROG6F-JO_DSPJoG_P-_VgbnpM&cup2hreq=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
WARC-Target-URI: https://safebrowsingohttpgateway.googleapis.com/v1/ohttp/hpkekeyconfig?key=AIzaSyBqJZh-7pA44blAaAkH6490hUFOwX0KCYM
WARC-Target-URI: https://accounts.google.com/ListAccounts?gpsia=1&source=ChromiumBrowser&laf=b64bin&json=standard
WARC-Target-URI: https://www.youtube.com/s/player/6742b2b9/www-player.css
WARC-Target-URI: https://www.youtube.com/s/player/6742b2b9/player_ias.vflset/en_GB/embed.js
WARC-Target-URI: https://fonts.gstatic.com/s/roboto/v18/KFOlCnqEu92Fr1MmEU9fBBc4.woff2
WARC-Target-URI: https://fonts.gstatic.com/s/roboto/v18/KFOmCnqEu92Fr1Mu4mxK.woff2
WARC-Target-URI: https://www.youtube.com/s/player/6742b2b9/www-embed-player.vflset/www-embed-player.js
WARC-Target-URI: https://www.youtube.com/s/player/6742b2b9/player_ias.vflset/en_GB/base.js
WARC-Target-URI: https://www.youtube.com/s/player/6742b2b9/player_ias.vflset/en_GB/remote.js
WARC-Target-URI: https://www.google.com/js/th/z1P_mE5apSVCd16CrsEwj7UAJuHEPotZNGO7bYrdVCQ.js
WARC-Target-URI: https://i.ytimg.com/vi/aPg2V5RVh7U/default.jpg?v=682ce3a3
WARC-Target-URI: https://yt3.ggpht.com/d2sGw3qXN-qcwvaTBtCDWHXSj_LTcFzwEQpHtma55tFPMlL0x6mLkfIwbQRqxFy5y3idvPFKbpw=s68-c-k-c0x00ffffff-no-rj
WARC-Target-URI: https://jnn-pa.googleapis.com/$rpc/google.internal.waa.v1.Waa/Create
WARC-Target-URI: https://www.gstatic.com/cv/js/sender/v1/cast_sender.js
WARC-Target-URI: https://www.youtube.com/generate_204?T2xb7Q
WARC-Target-URI: https://www.gstatic.com/eureka/clank/139/cast_sender.js
WARC-Target-URI: https://jnn-pa.googleapis.com/$rpc/google.internal.waa.v1.Waa/GenerateIT
WARC-Target-URI: https://play.google.com/log?hasfast=true&authuser=0&format=json
WARC-Target-URI: https://android.clients.google.com/checkin
WARC-Target-URI: https://android.clients.google.com/c2dm/register3
WARC-Refers-To-Target-URI: https://android.clients.google.com/c2dm/register3
WARC-Target-URI: https://android.clients.google.com/c2dm/register3
WARC-Target-URI: https://www.youtube.com/youtubei/v1/log_event?alt=json
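For anyone who wants to repeat the check, here is a rough Python equivalent of the grep | uniq step. It is only a sketch: the function names are made up for illustration, it assumes uncompressed .warc files like the two above (use gzip.open for .warc.gz), and it only matches header lines that start with WARC-Target-URI.

```python
from itertools import groupby


def target_uris(lines):
    """Yield WARC-Target-URI header lines, collapsing adjacent repeats like `uniq`."""
    hits = (
        line.strip().decode("utf-8", "replace")
        for line in lines
        if line.startswith(b"WARC-Target-URI:")
    )
    # groupby merges only consecutive identical hits, which is what `uniq` does.
    for uri, _group in groupby(hits):
        yield uri


def list_warc(path):
    """Return the collapsed WARC-Target-URI lines of one uncompressed WARC file."""
    with open(path, "rb") as f:
        return list(target_uris(f))
```

Calling list_warc() on each file in the warcprox output directory gives one list per WARC, which makes it easy to see which prefix each request was written under.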
Some of what ends up in the default WARCPROX WARC is expected, like the ListAccounts call that the browser makes on its own. But all of the frame's requisites are there as well. This likely has the same root cause as the frames caveat in #394, since headers are set per websocket connection (and each frame has its own websocket).
This could be fixed by watching for new frames and reconfiguring them as they appear. But that would likely need some refactoring of the websocket thread (to allow for multiple?), and we'd probably miss some requests during the time it takes to find and connect to the new websocket.
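To make the idea a bit more concrete, here is an untested sketch of the DevTools protocol messages such a fix might send. It relies on Target.setAutoAttach with waitForDebuggerOnStart, which pauses a newly attached target until Runtime.runIfWaitingForDebugger is sent, so it might narrow the window in which requests are missed. The helper names are hypothetical and none of this is existing brozzler code. It also assumes flattened sessions (child targets addressed by sessionId on the same connection) rather than one websocket per frame, which may or may not fit the current websocket thread.

```python
import itertools
import json

_next_id = itertools.count(1)

# Assumed auto-attach settings: pause new targets until we have configured them.
AUTO_ATTACH = {"autoAttach": True, "waitForDebuggerOnStart": True, "flatten": True}


def cdp_command(method, params=None, session_id=None):
    """Build one CDP command as a dict, ready for json.dumps()."""
    msg = {"id": next(_next_id), "method": method, "params": params or {}}
    if session_id is not None:
        msg["sessionId"] = session_id
    return msg


def commands_for_page():
    """Commands to send once on the top-level page connection."""
    return [cdp_command("Target.setAutoAttach", AUTO_ATTACH)]


def commands_for_attached_target(event, extra_headers):
    """Commands to send when a Target.attachedToTarget event arrives."""
    session_id = event["params"]["sessionId"]
    return [
        cdp_command("Network.enable", session_id=session_id),
        cdp_command(
            "Network.setExtraHTTPHeaders",
            {"headers": extra_headers},
            session_id=session_id,
        ),
        # Nested frames would need the same treatment.
        cdp_command("Target.setAutoAttach", AUTO_ATTACH, session_id=session_id),
        # Resume the paused target only after the headers are in place.
        cdp_command("Runtime.runIfWaitingForDebugger", session_id=session_id),
    ]


if __name__ == "__main__":
    fake_event = {
        "method": "Target.attachedToTarget",
        "params": {"sessionId": "EXAMPLE-SESSION", "waitingForDebugger": True},
    }
    headers = {"warcprox-meta": json.dumps({"warc-prefix": "special-warc"})}
    for command in commands_for_attached_target(fake_event, headers):
        print(json.dumps(command))
```

The sketch only builds the messages. Sending them, and routing events that carry a sessionId back to the right handler, is where the refactoring mentioned above would come in.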
I haven't tested it, but this likely also affects the logic inside the websocket thread, such as proxy error detection, on_request/on_response, console output, etc. Anything originating from a frame won't show up there.