Try to fix django package importer #544
base: main
Maybe this is the cause of our crawler getting blocked by djangopackages.org Cloudflare protection?
    headers = {
        "Accept": "application/json",
        "User-Agent": "Wagtail.org Packages Importer",
    }
I don't think the headers are needed.
Hey @p-r-a-v-i-n, the goal was to seem less suspicious to Cloudflare by not using the default User-Agent header set by requests.
This did not work, so I might change it to a browser's User-Agent header to see if that works instead.
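For reference, a minimal sketch of what swapping in a browser-style User-Agent could look like. The User-Agent string and the `https://example.com/api/` URL are illustrative assumptions, not values from this PR, and there is no guarantee Cloudflare treats such a request any differently. The request is only prepared, not sent, so the outgoing headers can be inspected locally.

```python
import requests

# What requests sends when no User-Agent is set, e.g. "python-requests/2.x.y".
default_ua = requests.utils.default_user_agent()

# Assumption: an illustrative browser-style User-Agent string, not one
# taken from this PR.
browser_headers = {
    "Accept": "application/json",
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) "
        "Gecko/20100101 Firefox/128.0"
    ),
}

session = requests.Session()
session.headers.update(browser_headers)

# Prepare (but do not send) a request to a placeholder URL so we can
# inspect the headers that would go out.
prepared = session.prepare_request(
    requests.Request("GET", "https://example.com/api/")
)

print("default:", default_ua)
print("would send:", prepared.headers["User-Agent"])
```

Using a `Session` keeps the headers in one place if the importer makes several requests in a row.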
    response = requests.get(url, headers=headers, timeout=10)
    if not response.ok:
        raise ValueError(f"Failed to fetch data from {url}: {response.status_code}")

    grid_data = response.json()
LGTM! Just an opinion, though:
what do you think of wrapping this in a try/except block to raise the error? If the server is down, `requests.get` can raise an exception before we ever reach the `response.ok` check.
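A rough sketch of that suggestion, assuming the fetch is pulled into a helper. The name `fetch_grid_data` and the injectable `get` parameter are hypothetical and only there to make the example testable without a network; `requests.exceptions.RequestException` is the base class for connection errors and timeouts in requests.

```python
import requests


def fetch_grid_data(url, headers=None, get=requests.get):
    """Fetch JSON from url, turning network failures into ValueError.

    `fetch_grid_data` and `get` are hypothetical names for this sketch.
    """
    try:
        response = get(url, headers=headers, timeout=10)
    except requests.exceptions.RequestException as exc:
        # Raised when the server is unreachable, the connection is
        # refused, or the request times out, so no response exists.
        raise ValueError(f"Failed to reach {url}: {exc}") from exc

    if not response.ok:
        raise ValueError(
            f"Failed to fetch data from {url}: {response.status_code}"
        )

    return response.json()
```

Callers then only need to handle `ValueError`, whether the server answered with an error status or did not answer at all.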
Hello @p-r-a-v-i-n, feel free to continue my work on this PR; it is not on my list of priorities. It might be hard to reproduce the issue with Cloudflare, since it only seems to occur when deployed to staging or production. My guess is that the IP addresses used by our Heroku hosting are on a list at Cloudflare, and that this, together with the way the import behaves, makes Cloudflare think we are a malicious bot. Things that could be changed:
Thanks. I think you are right about Heroku's IPs; they often get flagged by Cloudflare. I don't have much experience with this, though.
This seems like a hole.
See #501
Some changes that I want to test on staging, to see if they stop the bot protection from triggering.