【发布时间】:2020-11-10 04:23:27
【问题描述】:
我正在尝试对表格进行网络抓取,这是我正在使用的代码。我尝试了很多方法,但我是 Python 新手,但它们不起作用。有人有想法吗?请在您的回答中包括该部分代码的插入位置。
import urllib
import urllib.request
from bs4 import BeautifulSoup
def make_soup(url):
thepage = urllib.request.urlopen(url)
soupdata = BeautifulSoup(thepage, "html.parser")
return soupdata
soup = make_soup('https://www.transfermarkt.com/transfers/saisontransfers/statistik?land_id=0&ausrichtung=&spielerposition_id=&altersklasse=&leihe=&transferfenster=&saison-id=2020&plus=1')
我收到错误:
Traceback (most recent call last):
File "C:\Users\GBEM\AppData\Local\Programs\Python\Python38-32\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\GBEM\AppData\Local\Programs\Python\Python38-32\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\GBEM\PycharmProjects\tablepractice\tablescrape.py", line 11, in <module>
soup = make_soup('https://www.transfermarkt.com/transfers/saisontransfers/statistik?land_id=0&ausrichtung=&spielerposition_id=&altersklasse=&leihe=&transferfenster=&saison-id=2020&plus=1')
File "C:\Users\GBEM\PycharmProjects\tablepractice\tablescrape.py", line 7, in make_soup
thepage = urllib.request.urlopen(url)
File "C:\Users\GBEM\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\GBEM\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\GBEM\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "C:\Users\GBEM\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Users\GBEM\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "C:\Users\GBEM\AppData\Local\Programs\Python\Python38-32\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
【问题讨论】: