-
-
Notifications
You must be signed in to change notification settings - Fork 20
Description
Long ago, we worked around an issue where we were getting lots of connection failures from Wayback with a dirty hack: if we ran out of retries but still had a failure to establish a new connection, we’d try resetting the whole session (only once!) and start over:
web-monitoring-processing/web_monitoring/cli.py
Lines 264 to 273 in 74b31b3
| except Exception as error: | |
| # On connection failures, reset the session and try again. If we | |
| # don't do this, the connection pool for this thread is pretty much | |
| # dead. It's not clear to me whether there is a problem in urllib3 | |
| # or Wayback's servers that requires this. | |
| # This unfortunately requires string checking because the error can | |
| # get wrapped up into multiple kinds of higher-level errors :( | |
| if retry_connection_failures and ('failed to establish a new connection' in str(error).lower()): | |
| self.wayback.session.reset() | |
| return self.process_record(record) |
At the time, I knew this was a kind of ugly hack, and probably masking some underlying issues. After some discussion with @danielballan and some spelunking through the source of requests and urllib3 today, we realized there were two problems:
-
We might be holding open connections for an unnecessarily long time. This can be fixed by explicitly closing our response objects.
-
@danielballan did this for
WaybackClient.searchin Explicitly close response when search is done. wayback#19 -
I need to do this when we load Mementos here in
cli.py:web-monitoring-processing/web_monitoring/cli.py
Lines 260 to 263 in 74b31b3
memento = self.wayback.get_memento(record.raw_url, exact_redirects=False) return self.format_memento(memento, record, self.maintainers, self.tags) (Update: we reliably read the whole response, which closes the connection, so we are actually OK here.)
-
-
But that doesn’t cover everything. Ultimately,
requestsburies some features around connection pooling fromurllib3in a way that means eachWaybackSession(we have one per thread, for 36 in production right now) could possibly create infinity connections and then hold onto up to 10. (so we could wind up attempting to hold 360 open connections to Wayback!) Some things we can do to be smarter about this:-
Just reduce the number of threads in production! Already on this, but it doesn’t make the problem obvious or clear if you scale up too far. We should ultimately do better.
-
When
get_memento()redirects, make sure we read the body and close the response before moving onto the next in the redirect chain. (Release connections for memento redirects wayback#20) -
Automatically step down the number of memento threads (to a point) if one gets too many connections refused. (How Many is Too Many? Two? Five? Probably > 1 to differentiate between something spurious and Wayback actually telling us we have too many connections open. Then again, maybe the retry configuration can be partially responsible for this? Much nuance here.) We have to make sure we re-queue the
CdxRecordthat was being loaded in this case. The error we’d see to trigger this would includeFailed to establish a new connection: [Errno 111] Connection refused(Update: I prototyped this out and it didn’t actually improve things over just sharing a connection pool between threads. See Use a shared HTTPAdapter across all threads #551 for details.)
-
Expose ways to set
pool_maxsizeandpool_blockon therequests.HTTPAdapterinstances used byWaybackSessionso users can control this better. (Update: see Use a shared HTTPAdapter across all threads #551 and Sketch out a way to support multithreading wayback#23) -
Make sure
WaybackClientandWaybackSessionandrequests.Sessionandrequests.HTTPAdapterare thread-safe, and use one client with appropriatepool_maxsizeandpool_blockparameters on itsHTTPAdapterinstead of one client per thread. This is the most resilient solution, but most questionable: we can make our stuff thread-safe by fixingDisableAfterCloseSession, but there seems to deep confusion among both users and maintainers ofrequestsas to its thread-safety (urllib3is all good).(Update: Given the issues in the requests package, see Use a shared HTTPAdapter across all threads #551 for a practical, short-term workaround in this repo and Sketch out a way to support multithreading wayback#23 for one approach to solving this better.)
-
Metadata
Metadata
Assignees
Labels
Type
Projects
Status