【Question title】: Cannot access cell text inside an HTML table (Selenium, Python)
【Posted】: 2017-07-20 07:43:43
【Question】:

I have spent several hours trying, in vain, to extract the text from one specific cell of the following table:

<tbody class="table-body">
   <tr class=" " data-blah="25293454534534513" data-currency="1">
      <td class="action-cell no-sort">
         <a href="" class="buy-btn tooltip" data-tooltip="Buy the bond"></a><a href="" class="sell-btn tooltip" data-tooltip="Sell the bond"></a>
      </td>
      <td class="col1 id">
         <a class="alert-ico " data-tooltip=""></a>
         <a class="isin-btn " data-tooltip="" id="isin" data-portfolioid="2423424" data-status="0">US3</a>
      </td>
      <td class="col2 name hide">4%</td>
      <td class="col9 colNo.9" title="Bid: 101.23; Mid: 101.28; Ask: 101.33; 
         Liquidity Score: -*/5*; Merit: -/4;" data-bprice="101.28" data-uprice="101.28">101.28<span class="estim-star">*</span></td>
      <td class="col10 price_change" nowrap="" data-sort="0.02"><span class="positive-change">0.02%</span><span class="change-sign positive-change">↑</span></td>
      <td class="col11 yield yield-val" title="" data-sort="3.33" data-byield="3.33" data-uyield="3.34%">3.33%</td>
      <td class="col12 purchase_price" data-bprice="101.28" data-uprice="101.28" data-sort="101.28"><input type="text" name="purchase_price" class="positive-num-only default" value="101.28"></td>
      <td class="col13 margin_bond" data-bond="sec" data-sort="0"><input type="text" name="margin_bond" maxlength="3" class="positive-num-only default" value="0"></td>
   </tr>
</tbody>

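I am trying to extract the text of the "price change" column (col10) with lxml.html, which lets me pull data out of large tables in a few seconds. This is how I do it: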

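import lxml.html
import pandas as pd

# self.driver is the Selenium WebDriver (this snippet runs inside a class method)
root = lxml.html.fromstring(self.driver.page_source)
data = []
for row in root.xpath(".//*[@id='main']/div[5]/div[2]/table/tbody/tr"):
    # text() only returns text nodes that are direct children of each td
    cells = row.xpath('.//td/text()')
    data.append(cells)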

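This extracts the whole table successfully, with one exception: column 10 ("price change"). For that column I tried the following, and each attempt returned an empty string (""):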

  1. row.xpath('.//tr[1]/td[11][@data-sort]/text()')

  2. row.xpath(".//*[@id='main']/div[5]/div[2]/table/tbody/tr[1]/td[11]/span/text()")

  3. row.xpath(".//*[@id='main']/div[5]/div[2]/table/tbody/tr[1]/td[11]/text()")

I do not want to extract the text through WebElement; I want to use only the lxml.html library.

Thanks!

【Comments】:

  • What does root.xpath("//tbody[@class='table-body']//td[contains(@class,'col10')]/span") return? Check the value in a watch list or in the debugger. What text does it contain?

Tags: python pandas selenium automation lxml.html


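【Solution 1】:

There are two problems:

  1. The row has 8 td elements in total, not 11, and the td you are interested in is the 5th, not the 11th.
  2. That td contains two spans and no text of its own, so td/text() has nothing to return; you have to select the span you want.

This code works: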
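
To illustrate the second point, here is a minimal sketch using a cut-down copy of the price-change cell. The surrounding table and the variable names are assumptions made for the example; only the cell markup comes from the question.

```python
from lxml import html

# Cut-down copy of two cells from the question, wrapped in a table
# so that the fragment parses on its own (the wrapper is an assumption).
snippet = (
    '<table><tr>'
    '<td class="col9">101.28<span class="estim-star">*</span></td>'
    '<td class="col10 price_change" data-sort="0.02">'
    '<span class="positive-change">0.02%</span>'
    '<span class="change-sign positive-change">&#8593;</span>'
    '</td>'
    '</tr></table>'
)
row = html.fromstring(snippet).xpath('.//tr')[0]

# text() matches only text nodes that are direct children of the td;
# here all of the text sits inside the two spans.
direct_text = row.xpath('./td[2]/text()')

# Stepping into the first span reaches the percentage.
first_span = row.xpath('./td[2]/span[1]/text()')

print(direct_text)  # []
print(first_span)   # ['0.02%']
```

The same reasoning explains why the other columns came through in the question's loop: their text is a direct child of the td.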

html_code = """
<tbody class="table-body">
   <tr class=" " data-blah="25293454534534513" data-currency="1">
      <td class="action-cell no-sort">
         <a href="" class="buy-btn tooltip" data-tooltip="Buy the bond"></a><a href="" class="sell-btn tooltip" data-tooltip="Sell the bond"></a>
      </td>
      <td class="col1 id">
         <a class="alert-ico " data-tooltip=""></a>
         <a class="isin-btn " data-tooltip="" id="isin" data-portfolioid="2423424" data-status="0">US3</a>
      </td>
      <td class="col2 name hide">4%</td>
      <td class="col9 colNo.9" title="Bid: 101.23; Mid: 101.28; Ask: 101.33;
         Liquidity Score: -*/5*; Merit: -/4;" data-bprice="101.28" data-uprice="101.28">101.28<span class="estim-star">*</span></td>
      <td class="col10 price_change" nowrap="" data-sort="0.02">
        <span class="positive-change">0.02%</span>
        <span class="change-sign positive-change">↑</span></td>
      <td class="col11 yield yield-val" title="" data-sort="3.33" data-byield="3.33" data-uyield="3.34%">3.33%</td>
      <td class="col12 purchase_price" data-bprice="101.28" data-uprice="101.28" data-sort="101.28"><input type="text" name="purchase_price" class="positive-num-only default" value="101.28"></td>
      <td class="col13 margin_bond" data-bond="sec" data-sort="0"><input type="text" name="margin_bond" maxlength="3" class="positive-num-only default" value="0"></td>
   </tr>
</tbody>
"""


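from lxml import html  # provides html.fromstring

tree = html.fromstring(html_code)

# First span of the price-change cell, selected by class and then by position
print("price change is %s" % tree.xpath(".//td[contains(@class,'col10')]/span[1]/text()")[0])
print("price change is %s" % tree.xpath(".//td[5]/span[1]/text()")[0])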
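
If the goal is to read every cell of every row in one pass, as in the question's loop, one possible approach is to take the text of each cell together with its descendants instead of td/text(). The sketch below assumes that is acceptable; cell_value and table_rows are hypothetical helper names, and the sample markup is a shortened version of the row from the question.

```python
from lxml import html

def cell_value(td):
    """Return the value of the cell's <input> if it has one, else its full text."""
    values = td.xpath('./input/@value')
    if values:
        return values[0]
    # text_content() joins the text of the element and all of its descendants
    return td.text_content().strip()

def table_rows(page_source, row_xpath='.//tr'):
    """Parse the page and return one list of cell values per matched row."""
    root = html.fromstring(page_source)
    return [[cell_value(td) for td in tr.xpath('./td')]
            for tr in root.xpath(row_xpath)]

# Shortened sample row; with Selenium this would be driver.page_source.
sample = (
    '<table><tbody class="table-body"><tr>'
    '<td class="col2 name hide">4%</td>'
    '<td class="col10 price_change" data-sort="0.02">'
    '<span class="positive-change">0.02%</span>'
    '<span class="change-sign positive-change">&#8593;</span></td>'
    '<td class="col12 purchase_price">'
    '<input type="text" name="purchase_price" value="101.28"></td>'
    '</tr></tbody></table>'
)

rows = table_rows(sample)
print(rows)
```

Note that the full text of the price-change cell includes the arrow from the second span; to get only the percentage, select span[1] as shown above. On the real page, row_xpath could be the row expression from the question.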
