【Posted】: 2020-03-01 16:56:34
【Question】:
As the title says, urlopen hangs while opening the URL.
Code:
from bs4 import BeautifulSoup as soup # HTML data structure
from urllib.request import urlopen as uReq # Web client
page_url = "https://store.hp.com/us/en/pdp/hp-laserjet-pro-m404n?jumpid=ma_weekly-deals_product-tile_printers_3_w1a52a_hp-laserjet-pro-m404"
uClient = uReq(page_url)
# parses html into a soup data structure to traverse html
# as if it were a json data type.
page_soup = soup(uClient.read(), "html.parser")
uClient.close()
print(page_soup)
Problem: the script hangs at the uReq call. However, if page_url is replaced with the following link, everything works.
page_url= "http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=-1&IsNodeId=1&Description=GTX&bop=And&Page=1&PageSize=36&order=BESTMATCH"
Error: timeout error
How can I open the given URL for web scraping purposes?
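One thing that may be worth trying is sketched below. It assumes (this is not confirmed anywhere in the post) that the server stalls requests carrying urllib's default User-Agent, so it sends browser-like headers instead, and it passes an explicit timeout so that a stalled connection fails quickly rather than hanging. The names BROWSER_HEADERS, build_request and fetch_html, the header values, and the 10-second timeout are all illustrative choices.

```python
import socket
from urllib.error import URLError
from urllib.request import Request, urlopen

# Assumed browser-like header values; they are illustrative and not taken from the post.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/80.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url):
    """Wrap the URL in a Request that carries the browser-like headers."""
    return Request(url, headers=BROWSER_HEADERS)


def fetch_html(url, timeout=10):
    """Return the response body as bytes, or None if the request fails or times out."""
    try:
        with urlopen(build_request(url), timeout=timeout) as response:
            return response.read()
    except (URLError, socket.timeout) as exc:
        # HTTPError is a subclass of URLError, so HTTP error statuses land here too.
        print(f"Request failed: {exc}")
        return None
```

Calling fetch_html(page_url) and, when the result is not None, passing it to soup(html, "html.parser") would slot into the code above. If the request still times out, the site may be relying on other bot-detection measures that changing headers does not address.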
【Comments】:
Tags: python web-scraping soap request urllib