I'm trying to scrape an Amazon product page (https://www.amazon.com/dp/B0B6TR2GTJ). My code is below:
import requests
from pyquery import PyQuery as pq  # pq is used below but was never imported
url = "https://www.amazon.com/dp/B0B6TR2GTJ"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'Accept-Language': 'en-US, en;q=0.5'
}
r = requests.get(url, headers=headers)
print(r.status_code)
print("-------------------")
doc = pq(r.text)
print(doc("title"))
print("-------------------")
print(r.text)
The result is below (the request was flagged as a bot). I tried all sorts of header variations and got the same result every time.
503
-------------------
<title>Sorry! Something went wrong!</title>
-------------------
<!--
To discuss automated access to Amazon data please contact [email protected].
For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.com/ref=rm_5_sv, or our Product Advertising API at https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html/ref=rm_5_ac for advertising use cases.
-->
<!doctype html>
......
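For reference, a minimal sketch of how the script could at least recognize this bot-check response before passing the body to `pq`, and compute wait times between retries. It does not get around the block. The helper names `looks_blocked` and `backoff_delays`, the marker strings (copied from the output above), and the retry parameters are all assumptions made for illustration, not part of any library.

```python
import random

# Marker strings copied from the blocked response shown above (assumed to be
# stable enough to use as a heuristic; Amazon may change them at any time).
BLOCK_MARKERS = (
    "To discuss automated access to Amazon data",
    "Sorry! Something went wrong!",
)


def looks_blocked(status_code, body):
    """Return True if a response resembles the bot-check page shown above."""
    if status_code == 503:
        return True
    return any(marker in body for marker in BLOCK_MARKERS)


def backoff_delays(retries=4, base=2.0, cap=60.0):
    """Yield exponentially growing wait times in seconds, with random jitter.

    Hypothetical defaults: 4 retries, starting at 2 s, capped at 60 s
    before jitter is added.
    """
    for attempt in range(retries):
        delay = min(cap, base * 2 ** attempt)
        # Add up to 50% jitter so retries are not evenly spaced.
        yield delay + random.uniform(0, delay / 2)
```

With the script above, this would be called as `looks_blocked(r.status_code, r.text)` right after `requests.get`, sleeping for the next value from `backoff_delays()` before trying again. If every attempt still comes back blocked, the comment in the response itself points to the Product Advertising API as the supported route.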
I'm still a beginner at web scraping. Could anyone with more experience help me out? Many thanks.