Beautifulsoup: get the text of the table cell next to a located label cell (answers)

【Question title】: Beautifulsoup get text from table grid under located words' grid
【Posted】: 2021-10-12 05:50:33
【Question】:

I want to extract information from this table into a CSV file, but only the grade and age values, without the "Grade:" and "Age:" labels:

<table>
<tbody>
        <tr>
            <td><b>Grade:</b></td>
            <td>11</td>
        </tr>
                
        <tr>
            <td><b>Age:</b></td>
            <td>15</td>
        </tr>
</tbody>
</table>

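Most of the tutorials I found only show how to dump an entire table into a CSV file, not how to pick out the cell that follows a particular label: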

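import csv
from bs4 import BeautifulSoup as bs

with open("1.html") as fp:
    soup = bs(fp, 'html.parser')

filename = "input.csv"
# newline='' stops the csv module from writing blank rows on Windows;
# the with-block closes the file once every row is written
with open(filename, 'w', newline='') as out:
    csv_writer = csv.writer(out)

    for tr in soup.find_all("tr"):
        # Header row: write the <th> cells and move on to the next row
        data = []
        for th in tr.find_all("th"):
            data.append(th.text)
        if data:
            csv_writer.writerow(data)
            continue

        # Data row: this writes every <td>, labels included
        for td in tr.find_all("td"):
            if td.a:
                data.append(td.a.text.strip())
            else:
                data.append(td.text.strip())
        if data:
            csv_writer.writerow(data)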

How can I do this? Thanks!

【Comments】:

Tags: python html csv parsing beautifulsoup

【Solution 1】:

You can use the find_next() method to search for the <td> that follows each <b>:

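from bs4 import BeautifulSoup

with open("1.html") as fp:  # the file name used in the question
    soup = BeautifulSoup(fp, "html.parser")

# Each <b> label sits in the first cell; the next <td> holds the value
for tag in soup.select("table tr > td > b"):
    print(tag.find_next("td").text)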
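
Since the question asks for a CSV file rather than printed output, the same find_next() idea could be extended roughly as follows. This is only a sketch: the helper names extract_values and write_row, the inlined HTML string, and the output file name output.csv are assumptions made for illustration and do not come from the original answer.

```python
import csv
from bs4 import BeautifulSoup

# Sample markup copied from the question, inlined so the sketch is self-contained
HTML = """
<table>
<tbody>
        <tr>
            <td><b>Grade:</b></td>
            <td>11</td>
        </tr>
        <tr>
            <td><b>Age:</b></td>
            <td>15</td>
        </tr>
</tbody>
</table>
"""


def extract_values(html):
    """Return the text of the <td> that follows each bold label (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    values = []
    for label in soup.select("table tr > td > b"):
        value_cell = label.find_next("td")
        if value_cell is not None:  # skip a label that has no following cell
            values.append(value_cell.get_text(strip=True))
    return values


def write_row(values, filename="output.csv"):
    """Write the values as one CSV row (the file name is an assumption)."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(values)


if __name__ == "__main__":
    values = extract_values(HTML)
    print(values)  # ['11', '15']
    write_row(values)
```

Collecting the values in a list first keeps the parsing separate from the file writing, so the same list could instead be written as one value per row if that layout suits the data better.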

【Comments】: