Can't get text values using XPATH in Python: answers

【Question title】: Can't get text values using XPATH in python
【Posted】: 2015-11-21 21:44:09
【Question】:

I am trying to parse currency rates from this bank website. The code:

import requests
import time
import logging
from retrying import retry
from lxml import html

logging.basicConfig(filename='info.log', format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

@retry(wait_fixed=5000)
def fetch_data_from_nb_ved_ru():
    try:
        page = requests.get('http://www.nbu.com/exchange_rates')
        #print page.text
        tree = (html.fromstring(page.text))
        #fetched_ved_usd_buy = tree.xpath('//div[@class="exchangeRates"]/table/tbody/tr[5]/td[5]')
        fetched_ved_usd_buy = tree.xpath('/html/body/div[1]/div//div[7]/div/div/div[1]//text()')
        print fetched_ved_usd_buy
        fetched_ved_usd_sell = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[6]/td[6]/text()')).strip()
        fetched_ved_eur_buy = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[7]/td[5]/text()')).strip()
        fetched_ved_eur_sell = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[7]/td[6]/text()')).strip()
        fetched_cb_eur = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[7]/td[4]/text()')).strip()
        fetched_cb_rub = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[18]/td[4]/text()')).strip()
        fetched_cb_usd = str(tree.xpath('/html/body/div[1]/div/div[7]/div/div/div[1]/table/tbody/tr[6]/td[4]/text()')).strip()
    except:
        logging.warning("NB VED UZ fetch failed")
        raise IOError("NB VED UZ  fetch failed")
    return fetched_ved_usd_buy, fetched_ved_usd_sell, fetched_cb_usd, fetched_ved_eur_buy, fetched_ved_eur_sell,\
        fetched_cb_eur, fetched_cb_rub

while True:
    f = open('values_uzb.txt', 'w')
    ved_usd_buy, ved_usd_sell, cb_usd, ved_eur_buy, ed_eur_sell, cb_eur, cb_rub = fetch_data_from_nb_ved_ru()
    f.write(str(ved_usd_buy)+'\n'+str(ved_usd_sell)+'\n'+str(cb_usd)+'\n'+str(ved_eur_buy)+'\n'+str(ed_eur_sell)+'\n'
            + str(cb_eur)+'\n'+str(cb_rub))

    f.close()
    time.sleep(120)

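But it always returns empty strings, even though when I do print page.text I can see the values in their places. I got that XPath from Firebug, and Chrome gives the same XPath. I also tried to build my own XPath, //div[@class="exchangeRates"]/table/tbody/tr[5]/td[5], but it turned out not to work either.

Any suggestions? Thanks.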

【Comments】:

Try it without tbody in the XPath.
Looks like the nbu site is down.
@AnandSKumar, I removed tbody and the result is the same :(

Tags: python html xpath request lxml

【Solution 1】:

I use the following statement (Selenium WebDriver), and it works fine for me.

ActualValue = driver.find_element_by_xpath("//div/div[2]/div").text

【Comments】:

【Solution 2】:

Try this XPath:

tree.xpath('//div[@class="exchangeRates"]//tr[NUMBER OF TR]/td[5]/text()')

One more thing: I think you could improve your code by writing it like this:

trs = tree.xpath('//div[@class="exchangeRates"]//tr')
for tr in trs:
    # xpath() returns a list of text nodes, so join them before stripping
    currency_code = ''.join(tr.xpath('./td[7]/text()')).strip()

    if currency_code == 'USD':
        usd_buy = ''.join(tr.xpath('./td[5]/text()')).strip()
        usd_sell = ''.join(tr.xpath('./td[6]/text()')).strip()
        usd_cb = ''.join(tr.xpath('./td[4]/text()')).strip()

Then continue in the same way for the other currencies you need.

This is just quick code; reply if you need more details.
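The row-by-row idea in this answer can be extended to collect every currency in one pass. What follows is only a minimal sketch under assumptions: the markup and the numbers are made up, the column positions (td[4] central-bank rate, td[5] buy, td[6] sell, td[7] code) are taken from the answer above, and the standard library's xml.etree.ElementTree stands in for lxml so the example runs without third-party packages. The helper names cell_text and rates_by_currency are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows with made-up numbers; column order follows the answer:
# td[4] = central-bank rate, td[5] = buy, td[6] = sell, td[7] = currency code.
SAMPLE = """
<div class="exchangeRates">
  <table>
    <tr><td>1</td><td>US dollar</td><td>1</td><td>10.00</td><td>11.00</td><td>12.00</td><td> USD </td></tr>
    <tr><td>2</td><td>Euro</td><td>1</td><td>20.00</td><td></td><td>22.00</td><td>EUR</td></tr>
  </table>
</div>
"""


def cell_text(tr, position):
    """Return the stripped text of the 1-based td at `position`, or '' if missing or empty."""
    td = tr.find("td[%d]" % position)
    if td is None or td.text is None:
        return ""
    return td.text.strip()


def rates_by_currency(root):
    """Map each currency code to its central-bank, buy and sell strings."""
    rates = {}
    for tr in root.findall(".//table/tr"):
        code = cell_text(tr, 7)
        if not code:
            continue  # skip header rows or rows without a currency code
        rates[code] = {
            "cb": cell_text(tr, 4),
            "buy": cell_text(tr, 5),
            "sell": cell_text(tr, 6),
        }
    return rates


rates = rates_by_currency(ET.fromstring(SAMPLE))
print(rates["USD"]["buy"])        # 11.00
print(repr(rates["EUR"]["buy"]))  # '' because that cell has no text
```

Looking rows up by currency code rather than by row number means the lookup should keep working if the bank reorders or adds rows, and an empty cell yields an empty string instead of an exception.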

【Comments】:

【Solution 3】:

I'm not sure exactly what you are looking for, but this works:

tree.xpath("/html/body/div[1]/div[7]/div/div/div[1]//text()")

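As for starting from the exchangeRates class: by using tree.xpath("//div[@class='exchangeRates']/table")[0].getchildren(), I found that the table has no tbody child, even though the browser says it does. See this SO question for an explanation. Removing tbody from the original XPath does work. However, the cell you selected (td[5]) is empty, so [] is returned. Try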

tree.xpath("//div[@class='exchangeRates']/table/tr[5]/td[4]//text()")
# ['706.65']

or

tree.xpath("//div[@class='exchangeRates']/table/tr[6]/td[5]//text()")
# ['2638.00']
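The tbody mismatch can be reproduced without the live site. Browsers add a tbody element when they build the DOM, so developer tools show one even when the HTML source has none. The sketch below assumes a made-up table fragment, and it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without third-party packages; its findall() supports only a subset of XPath, so the cell text is read through .text rather than text().

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: like the page source described above, the table has
# no <tbody>, even though browser developer tools would display one.
SAMPLE = """
<html><body>
  <div class="exchangeRates">
    <table>
      <tr><td>Code</td><td>Rate</td></tr>
      <tr><td>USD</td><td>706.65</td></tr>
    </table>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# Path as a browser would report it: it includes tbody, so nothing matches.
with_tbody = root.findall(".//div[@class='exchangeRates']/table/tbody/tr[2]/td[2]")

# Same path without tbody: it matches the cell that is really in the source.
without_tbody = root.findall(".//div[@class='exchangeRates']/table/tr[2]/td[2]")

print(with_tbody)                         # []
print([td.text for td in without_tbody])  # ['706.65']
```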

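【Comments】:

Thanks a lot, it works great! But how did you find the path "/html/body/div[1]/div[7]/div/div/div[1]//text()"?
Just trial and error. I started with "/html" and added one child at a time until I found where your first guess stopped working.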
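The trial-and-error approach described in the last comment can be automated. This is a rough sketch, not the answerer's actual procedure: the page skeleton is made up, the helper name longest_matching_prefix is hypothetical, and xml.etree.ElementTree is used in place of lxml so the example has no third-party dependencies. Paths are written relative to the root html element because ElementTree's find() does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical page skeleton, only for illustration.
SAMPLE = """
<html><body>
  <div>
    <div><table><tr><td>706.65</td></tr></table></div>
  </div>
</body></html>
"""


def longest_matching_prefix(root, steps):
    """Extend the path one step at a time.

    Returns (matched_steps, failed_step); failed_step is None when every
    step matches.
    """
    matched = []
    for step in steps:
        candidate = "/".join(matched + [step])
        if root.find(candidate) is None:
            return matched, step
        matched.append(step)
    return matched, None


root = ET.fromstring(SAMPLE)  # root is the <html> element
# A guessed path that contains a tbody step, as a browser would report it.
guess = ["body", "div[1]", "div[1]", "table", "tbody", "tr[1]", "td[1]"]
matched, failed = longest_matching_prefix(root, guess)

print("/".join(matched))  # body/div[1]/div[1]/table
print(failed)             # tbody
```

The first step that fails to match is usually where the browser's view of the page and the parsed source diverge.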