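Scraping Multiple Tables from a Webpage in Python: Answers

【Question title】: Scraping Multiple Tables from webpage Python
【Posted】: 2018-01-01 16:35:13
【Question description】:

I am trying to scrape multiple tables from the webpage below. However, my code only gets the first table, even though all the tables are nested within the same tr and td tags. Here is my attempt: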

 import requests
 from time import sleep
 from bs4 import BeautifulSoup

 url = "http://zipnet.in/index.php?page=missing_person_search&criteria=browse_all&Page_No=1"
 r = requests.get(url)
 soup = BeautifulSoup(r.content, 'html.parser')
 tables = soup.find('table', border=1)
 for row in tables.findAll('tr'):
     sleep(3)
     col = row.findAll('td')
     fields = col[0].string
     details = col[1].string
     record = (fields, details)
     print(record)

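What am I missing here?

【Comments】:

tables = soup.findAll("table"), or use pandas (if you have it installed): pandas.read_html(str(soup))
@AlbinPaul, when I used soup.findAll I got this error: AttributeError: ResultSet object has no attribute 'findAll'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
Always put the full error message (traceback) in the question, as text and not as a screenshot. It may contain other useful information.
Your current problem is probably that you have two findAll calls, so you end up with something like soup.findAll('table', border=1).findAll('tr'). After the first findAll you have to use a for loop (for item in tables:) and call the second findAll on each table separately: item.findAll().
Maybe you should use soup.find('table', border=1).findAll('table') to search for all the tables inside the first table.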
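The loop suggested in the comments can be sketched as follows. This is only an illustration, not the asker's final code: SAMPLE_HTML is hypothetical stand-in markup (the real page's structure may differ), and extract_records is a helper name made up for this example. The point is that find_all() returns a ResultSet, which is a list of tables, so the second find_all() has to be called on each table inside a for loop.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real zipnet.in page may be structured differently.
SAMPLE_HTML = """
<table border="1"><tr><td>Name</td><td>A</td></tr><tr><td>Age</td><td>30</td></tr></table>
<table border="1"><tr><td>Name</td><td>B</td></tr><tr><td>Age</td><td>41</td></tr></table>
"""


def extract_records(html):
    """Return (field, detail) pairs from every matching table (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # find_all() returns a ResultSet (a list), so iterate over it
    # instead of calling find_all() on the ResultSet itself.
    for table in soup.find_all("table", border="1"):
        for row in table.find_all("tr"):
            cells = row.find_all("td")
            if len(cells) < 2:
                continue  # skip rows that do not hold a field/detail pair
            records.append((cells[0].get_text(strip=True),
                            cells[1].get_text(strip=True)))
    return records


records = extract_records(SAMPLE_HTML)
print(records)
```

On a page with nested tables, table.find_all("tr") also returns the rows of inner tables, so the same row may show up more than once; whether that matters depends on the real markup.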

Tags: python web-scraping beautifulsoup python-requests

【Solution 1】:

Give this a try. It fetches all the available tables on that page, in particular the ones containing the required records:

import requests 
from bs4 import BeautifulSoup

url = "http://zipnet.in/index.php?page=missing_person_search&criteria=browse_all&Page_No=1"
res = requests.get(url)
soup = BeautifulSoup(res.text, 'lxml')
for trow in soup.select("table#AutoNumber15"):
    data = [[' '.join(item.text.split()) for item in tcel.select("td")]
            for tcel in trow.select("tr")]
    print(data)

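【Discussion】:

By the way, you can always drop any parts you don't want to keep. If you only want the first few records of each profile, a small change such as for tcel in trow.select("tr")[0:8] will do. That's it.
Thanks a lot, Shahin.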
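The row-slicing tweak mentioned in the discussion could look roughly like this. It is a sketch under assumptions: SAMPLE_HTML is a made-up ten-row table that reuses the id AutoNumber15 from the answer, and first_rows with its limit parameter is a hypothetical helper, not part of the original solution.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one table with ten rows, reusing the id from the answer.
ROWS = "".join(
    f"<tr><td>Field {i}</td><td>Value   {i}</td></tr>" for i in range(10)
)
SAMPLE_HTML = f'<table id="AutoNumber15">{ROWS}</table>'


def first_rows(html, limit=8):
    """Keep only the first `limit` rows of each matching table (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    result = []
    for table in soup.select("table#AutoNumber15"):
        # Slice the list of rows before extracting the cell text;
        # split() plus join() collapses runs of whitespace inside a cell.
        data = [[" ".join(cell.text.split()) for cell in row.select("td")]
                for row in table.select("tr")[:limit]]
        result.append(data)
    return result


tables = first_rows(SAMPLE_HTML)
print(tables)
```

Slicing the list returned by select("tr") leaves the parsed document untouched, so the remaining rows are still available if they are needed later.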