Python / Selenium: how to iterate over tr in a dynamically generated table

[Question title]: Python / Selenium: how to iterate over tr in dynamically-generated table
[Posted]: 2018-07-01 07:40:51
[Question description]:

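I am trying to download the table from this website, https://coinmunity.co/, so that I can then manipulate the data in a simple way with Pandas. The problem is that the table is generated dynamically, so I cannot easily understand its structure or identify the "tr" elements I need to loop over. I previously tried Requests and BeautifulSoup, which did not work, so someone here recommended Selenium but did not tell me much more.

With Selenium I have tried many things, including XPaths, CSS selectors and so on, but nothing has worked. My idea is to extract the data in an orderly fashion for each row, but the rows seem to have a very strange name, including "_ngcontent", that I cannot make sense of.

This is my (non-working) code: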

from selenium import webdriver
import pandas as pd
import time
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.implicitly_wait(10)
#driver.get("https://coinmunity.co/")
url = 'file:///C:/Users/nique/PycharmProjects/untitled/test1.html'
driver.get(url)
html = driver.page_source.encode('utf-8')

#html = driver.page_source.encode('utf-8')
soup = BeautifulSoup(html, 'lxml')

results = []
symbol_list = []

#items = driver.find_elements_by_class_name('coin-link')
items = driver.find_elements_by_css_selector('.inner-container > table:nth-child(1) > tbody:nth-child(2) > tr:nth-child(2)')
#how_many = driver.find_elements_by_css_selector('html body app-root app-home div.outer-container div.inner-container table tbody tr')

count = 1
for el in range(1,3):
    #row = driver.find_elements_by_css_selector('.inner-container > table:nth-child(1) > tbody:nth-child(2) > tr:nth-child((count))')
    row = driver.find_elements_by_xpath('/html/body/app-root/app-home/div/div/table/tbody/tr[count]')

    symbol = row.find_element_by_class_name('coin-link')
    followers = driver.find_elements_by_class_name('stats')[0]
    changefollowers = driver.find_elements_by_class_name('stats')[1]
    # subscribers = driver.find_elements_by_class_name('stats')[2]
    # changesubscribers = driver.find_elements_by_class_name('stats')[3]
    # price = driver.find_elements_by_class_name('stats')[4]
    # changeprice = driver.find_elements_by_class_name('stats')[5]
    count += 1
    print(symbol)

    # results.append({'Symbol': symbol.text, 'TFollowers': followers.text, 'ChangeFollowers': changefollowers.text,'Subscribers': subscribers.text,'ChangeSubscribers': changesubscribers.text,'Price': price.text, 'ChangePrice': changeprice.text})

print(symbol_list)
print(results)

How can I download this information and get it ready for Pandas in the simplest, tidiest way? Thanks.

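[Question comments]:

Can you elaborate on what exactly you mean by "extract the data in an orderly fashion for each row"? What are the exact manual steps you are trying to automate?
At this point my goal is just to download the data in a way that I can easily visualize with Pandas. What I mean is that I don't want to overcomplicate things, mainly because I'm not an advanced programmer either.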

Tags: python selenium dictionary html-table

[Solution 1]:

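There are two approaches to the problem you describe:

1. Iterate over all the rows of the table using driver.find_elements_by_<something>().
   I recommend this for static web pages. It is more natural, because you actually take advantage of Selenium's capabilities.
2. Download the HTML code of the web page, then parse and manipulate it "offline".
   This works better when the page is constantly updated, and it guarantees you will not reference stale elements in the page. However, it forces you to parse HTML code, which is never fun.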
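
As a rough illustration of the second ("offline") approach, here is a minimal sketch that turns an HTML string into a list of rows. It uses only the standard library's html.parser so that it runs without extra packages; BeautifulSoup, which the question already imports, would be a common alternative. The names RowCollector and parse_table are hypothetical, and the sample string is an invented stand-in for driver.page_source. For a dynamically generated page, the HTML has to be taken after the browser has rendered it.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html):
    """Return the table rows found in html as a list of lists of strings."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample markup; with Selenium this would be driver.page_source.
sample = """
<table><tbody>
<tr _ngcontent-c1=""><td><a class="coin-link">BTC</a></td><td class="stats">100</td></tr>
<tr _ngcontent-c1=""><td><a class="coin-link">ETH</a></td><td class="stats">50</td></tr>
</tbody></table>
"""
print(parse_table(sample))
```

The resulting list of lists can be passed to pandas.DataFrame together with a columns list of your choosing. Attributes such as "_ngcontent" are simply ignored here, since only tag names are inspected.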

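Choose whichever suits you best, and then you can move on to the more technical issues.
Good luck!

Edit: Note that the method is named find_elements (plural), so you should not specify an element index. In your case, you can therefore use: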

row = driver.find_elements_by_xpath('/html/body/app-root/app-home/div/div/table/tbody/tr')

# And not:
row = driver.find_elements_by_xpath('/html/body/app-root/app-home/div/div/table/tbody/tr[number]')

It will return all the elements that match the given criterion (in this case, the given XPath).
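
To show how the returned list might then be iterated, here is a minimal sketch, not part of the original answer. It assumes the class names 'coin-link' and 'stats' from the question's code, and the helper name scrape_rows is hypothetical. It calls the generic find_elements(by, value) form, because the find_elements_by_* helpers were removed in later Selenium 4 releases; the locator strings below are the same values that selenium's By.XPATH and By.CLASS_NAME hold.

```python
# Locator strategies as plain strings; these are the values of selenium's
# By.XPATH and By.CLASS_NAME, so this helper needs no selenium import itself.
XPATH = "xpath"
CLASS_NAME = "class name"

# XPath taken from the answer: no index, so it matches every row.
ROW_XPATH = "/html/body/app-root/app-home/div/div/table/tbody/tr"


def scrape_rows(driver):
    """Return one dict per table row (hypothetical helper)."""
    results = []
    # find_elements (plural) returns a list containing every matching <tr>.
    for row in driver.find_elements(XPATH, ROW_XPATH):
        # Search inside the row, not the whole page, so the values
        # belong to the same row as the symbol.
        links = row.find_elements(CLASS_NAME, "coin-link")
        stats = row.find_elements(CLASS_NAME, "stats")
        if not links or len(stats) < 2:
            continue  # header or incomplete row: skip it
        results.append({
            "Symbol": links[0].text,
            "TFollowers": stats[0].text,
            "ChangeFollowers": stats[1].text,
        })
    return results
```

With a live driver, something like pd.DataFrame(scrape_rows(driver)) should give a table ready for Pandas. Looking elements up on row rather than on driver is the main difference from the question's loop, which always read the first 'stats' elements of the whole page.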

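[Comments]:

Could you give me a code example of how to "iterate over all the rows of the table using driver.getElements()"? I'm not sure what you mean. I'm not advanced yet. I think I have to put something between the parentheses, and that's where I get stuck. What goes between the "()" in this case? I'm confused.
@skeitel I've edited my answer. Let me know how it goes.
The line of code you wrote works great, @GalAbra. That was the missing piece. Thank you very much! But just so I understand... how did you come up with the exact syntax needed to reach the tr we need? Where can I learn more about it?
@skeitel I'm glad to hear that! I've been working with Selenium for a while, so I can assure you the best way to learn is to practice and get stuck ;) You can read more about find_elements_by_... here or here. Good luck!!
Oh, thanks... getting stuck seems to be my specialty... if that's the best way to learn, I'll be very good someday... thank you!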