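【Posted】: 2020-06-17 15:36:40
【Question】:
I am trying to use requests and BeautifulSoup to parse the Human Development Index (HDI) from this site: http://hdr.undp.org/en/indicators/137506# . Inspecting the page in the browser, I found this table: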
<div id="indcontent">
<table id="table"><thead><tr><th style="width:auto;">HDI Rank</th><th style="width:auto;">Country</th><th style="width:auto;">1990</th><th style="width:auto;"></th><th style="width:auto;">1991</th><th
.
.
style="width:auto;"></th><th style="width:auto;">2016</th><th style="width:auto;"></th><th style="width:auto;">2017</th><th style="width:auto;"></th><th style="width:auto;">2018</th><th style="width:auto;"></th></tr><tr><th class="indName">Human Development Index (HDI)
null
Dimension: Composite indices
Definition: A composite index measuring average achievement in three basic dimensions of human development—a long and healthy life, knowledge and a decent standard of living. See Technical note 1 at http://hdr.undp.org/sites/default/files/hdr2019_technical_notes.pdf for details on how the HDI is calculated.
Source: HDRO calculations based on data from UNDESA (2019b), UNESCO Institute for Statistics (2019), United Nations Statistics Division (2019b), World Bank (2019a), Barro and Lee (2018) and IMF (2019).</th></tr></thead><tbody><tr class="row-even"><td>170</td><td><img src="/sites/default/files/Country-Profiles/AFG.GIF" style="width:20px; height:auto;"> <a href="/countries/profiles/AFG">Afghanistan</a></td><td>0.298</td><td></td><td>0.304</td><td></td><td>0.312</td><td></td><td>0.308</td><td></td><td>0.303</td><td></td><td>0.327</td><td></td><td>0.331</td><td></td><td>0.335</td><td></td>
.
.
<td>0.339</td><td></td><td>0.343</td><td></td><td>0.345</td><td></td><td>0.347</td><td></td><td>0.378</td><td></td><td>0.387</td><td></td><td>0.400</td><td></td><td>0.410</td><td></td><td>0.419</td><td></td><td>0.431</td><td></td><td>0.436</td><td></td><td>0.447</td><td></td><td>0.464</td><td></td><td>0.465</td><td></td><td>0.479</td><td></td><td>0.485</td><td></td><td>0.708</td><td></td><td>0.713</td><td></td><td>0.718</td><td></td><td>0.722</td><td></td><td>0.727</td><td></td><td>0.729</td><td></td><td>0.731</td><td></td></tr><tr><td class="footnotestable"></td></tr></tbody><tfoot></tfoot></table></div>
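The excerpt above has two quirks that matter for parsing: every year column is followed by an empty spacer cell, and the body ends with a single-cell footnote row. Below is a minimal sketch of handling both, assuming the HTML is already available as a string. `SAMPLE_HTML` is a hand-shortened stand-in for the real table and `parse_hdi_table` is a hypothetical helper name; neither comes from the site.

```python
from bs4 import BeautifulSoup

# Hand-shortened stand-in for the table shown above (assumption: same layout,
# with an empty spacer cell after every year column).
SAMPLE_HTML = """
<div id="indcontent">
<table id="table"><thead>
<tr><th>HDI Rank</th><th>Country</th><th>1990</th><th></th><th>1991</th><th></th></tr>
<tr><th class="indName">Human Development Index (HDI)</th></tr>
</thead><tbody>
<tr class="row-even"><td>170</td><td><img src="/sites/default/files/Country-Profiles/AFG.GIF"> <a href="/countries/profiles/AFG">Afghanistan</a></td><td>0.298</td><td></td><td>0.304</td><td></td></tr>
<tr><td class="footnotestable"></td></tr>
</tbody></table>
</div>
"""


def parse_hdi_table(html):
    """Return (header, rows) with the empty spacer columns dropped."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"id": "table"})
    # Only the first header row carries the column names.
    header_cells = [th.get_text(strip=True) for th in table.thead.tr.find_all("th")]
    # Keep only the column positions whose header is non-empty.
    keep = [i for i, text in enumerate(header_cells) if text]
    header = [header_cells[i] for i in keep]
    rows = []
    for tr in table.tbody.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) != len(header_cells):
            continue  # skip the footnote row, which has a single cell
        rows.append([cells[i] for i in keep])
    return header, rows


header, rows = parse_hdi_table(SAMPLE_HTML)
print(header)
print(rows)
```

This only works once the table is actually present in the HTML string being parsed.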
Whenever I run my code
from bs4 import BeautifulSoup
import requests
url="http://hdr.undp.org/en/indicators/137506#"
html_table = requests.get(url)
soup = BeautifulSoup(html_table.content, "html.parser")
# print(soup.prettify()) # print the parsed data of html to test it!
hdi_table = soup.find("div", attrs={"id": "indcontent"})
print(hdi_table)
to check whether there is anything inside the container, it returns
<div id="indcontent">
</div>
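An empty `<div id="indcontent">` in the parsed response, while the browser inspector shows a full table, suggests that the table is not part of the HTML the server sends. A common cause is that the page fills the container with JavaScript after loading, though that is an assumption here and not confirmed for this site. The sketch below tells the two cases apart on any HTML string; `container_has_table` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def container_has_table(html, container_id="indcontent"):
    """Return True if the container exists and holds a <table> element."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", attrs={"id": container_id})
    if container is None:
        return False
    return container.find("table") is not None


# What requests received, according to the output above.
received = '<div id="indcontent">\n</div>'
# What the browser inspector shows (shortened stand-in).
inspected = '<div id="indcontent"><table id="table"><tbody></tbody></table></div>'

print(container_has_table(received))
print(container_has_table(inspected))
```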
hdi_table = soup.find("table", attrs={"id": "table"})
rows = hdi_table.table.find_all("tr")
to get at whatever is inside the table, but it fails with a NoneType error. After this step I want to include
import csv

# First row holds the column headers
headers = rows[0]
header_text = []
for th in headers.find_all('th'):
    header_text.append(th.text)

# Remaining rows hold the data
row_text_array = []
for row in rows[1:]:
    row_text = []
    for row_element in row.find_all(['th', 'td']):
        row_text.append(row_element.text.replace('\n', '').strip())
    row_text_array.append(row_text)

# newline="" keeps csv.writer from emitting blank lines on Windows
with open("out.csv", "w", newline="") as f:
    wr = csv.writer(f)
    wr.writerow(header_text)
    for row_text_single in row_text_array:
        wr.writerow(row_text_single)
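The header, row, and CSV steps above fail with a NoneType error as soon as `soup.find` returns `None`. Below is a sketch of the same steps wrapped in a function that reports a missing table instead of crashing. `table_html_to_csv` is a hypothetical name, the `demo` string is made-up sample input, and the CSV goes to a string so it can be inspected without touching disk.

```python
import csv
import io

from bs4 import BeautifulSoup


def table_html_to_csv(html, table_id="table"):
    """Convert the table with the given id to CSV text.

    Raises ValueError if the table is absent from the HTML.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"id": table_id})
    if table is None:
        raise ValueError(f"no <table id={table_id!r}> in the HTML")

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # One CSV line per <tr>, header and data cells alike.
    for tr in table.find_all("tr"):
        cells = [c.get_text(" ", strip=True) for c in tr.find_all(["th", "td"])]
        writer.writerow(cells)
    return buffer.getvalue()


demo = (
    '<table id="table">'
    "<tr><th>HDI Rank</th><th>Country</th><th>1990</th></tr>"
    "<tr><td>170</td><td>Afghanistan</td><td>0.298</td></tr>"
    "</table>"
)
print(table_html_to_csv(demo))

try:
    table_html_to_csv('<div id="indcontent"></div>')
except ValueError as err:
    print("missing table:", err)
```

Raising an explicit error makes it obvious that the problem lies in the downloaded HTML and not in the CSV logic.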
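Thank you very much for your help! I am trying to put the code together as a whole so that it converts the table to CSV. I have already tried the XPath //*[@id="indcontent"], but could not get it to work.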
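On the XPath attempt: BeautifulSoup does not evaluate XPath expressions. Its closest built-in equivalent of `//*[@id="indcontent"]` is the CSS selector `#indcontent` passed to `select_one`. Here is a small sketch on a static string, where the `html` value is a stand-in and not the real page:

```python
from bs4 import BeautifulSoup

html = '<div id="indcontent"><table id="table"><tr><td>0.298</td></tr></table></div>'
soup = BeautifulSoup(html, "html.parser")

# CSS counterpart of the XPath //*[@id="indcontent"]
container = soup.select_one("#indcontent")
# Descendant selector: every <td> inside the table inside the container.
cells = [td.get_text(strip=True) for td in soup.select("#indcontent table#table td")]

print(container["id"])
print(cells)
```

Whichever selector is used, it can only match what is present in the downloaded HTML, so on the response shown earlier it would still return the empty container.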
【Comments】:
Tags: html python-3.x beautifulsoup html-table python-requests