【Question title】: How to scrape table data using Selenium?
【Posted】: 2021-08-10 12:17:26
【Question】:
from selenium import webdriver
import time
import datetime

driver = webdriver.Chrome(
executable_path=r'C:\Users\Kashi\Downloads\Compressed\chromedriver_win32/chromedriver.exe')
driver.get('https://www.mql5.com/en/quotes/currencies')
driver.find_element_by_xpath('//*[@id="list-view-btn"]').click()
time.sleep(15)
values = [[ [] for c in range(4)] for r in range(4)]

def scrape():
    for i in range (2,8):
        if i%2 == 0 :
            ask = driver.find_element_by_xpath('//*[@id="ticker_ask_ + str(i) + "]').text
            values[i][0].append(ask)
            bid = driver.find_element_by_xpath('//*[@id="ticker_bid_ + str(i) + "]').text
            values[i][1].append(bid)
            high = driver.find_element_by_xpath('//*[@id="ticker_high_ + str(i) + "]').text
            values[i][2].append(high)
            low = driver.find_element_by_xpath('//*[@id="ticker_low_ + str(i) + "]').text
            values[i][3].append(low)
            print ( values[i][0] ,' , ',values[i][1] ,' , ',values[i][2] ,' , ',values[i][3]  )
scrape()

How can I scrape multiple rows of data with Selenium? I am unable to scrape more than one row. I am using Jupyter Notebook and Python 3. I get this error:

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="ticker_ask_ + str(i) + "]"}

【Comments】:

    Tags: python python-3.x selenium web-scraping


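    【Solution 1】:

    We can use an f-string with a {i} placeholder to insert the value of i into the XPath passed to find_element_by_xpath:

    Try this: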

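    # One row per index 0..7, so that values[2], values[4] and values[6] all exist
    # (with only 4 rows, values[4] would raise IndexError).
    values = [[[] for c in range(4)] for r in range(8)]

    def scrape():
        # Only the even-numbered ids (2, 4, 6) are read.
        for i in range(2, 8, 2):
            # The f-string substitutes i into the id, e.g. ticker_ask_2.
            # Note: Selenium 4.3+ removed find_element_by_xpath; there the
            # equivalent call is driver.find_element(By.XPATH, ...).
            ask = driver.find_element_by_xpath(f"//*[@id='ticker_ask_{i}']").text
            values[i][0].append(ask)
            bid = driver.find_element_by_xpath(f"//*[@id='ticker_bid_{i}']").text
            values[i][1].append(bid)
            high = driver.find_element_by_xpath(f"//*[@id='ticker_high_{i}']").text
            values[i][2].append(high)
            low = driver.find_element_by_xpath(f"//*[@id='ticker_low_{i}']").text
            values[i][3].append(low)
            print(values[i][0], ' , ', values[i][1], ' , ', values[i][2], ' , ', values[i][3])
    scrape()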
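
    The locator-building step can be separated from the browser calls, which makes it easy to check the strings before handing them to Selenium. The sketch below is only an illustration: the helper names ticker_xpath and build_locators are hypothetical, and the id pattern ticker_<field>_<i> is taken from the code above rather than verified against the live page. Keying the result by row index also avoids having to pre-size a nested list.

```python
FIELDS = ("ask", "bid", "high", "low")


def ticker_xpath(field, i):
    """Build the locator for one cell; the id pattern is assumed from the answer above."""
    return f"//*[@id='ticker_{field}_{i}']"


def build_locators(indices):
    """Map each row index to its four locators, keyed by field name."""
    return {i: {field: ticker_xpath(field, i) for field in FIELDS} for i in indices}


# Rows 2, 4 and 6, matching the even indices used in the answer.
locators = build_locators(range(2, 8, 2))
print(ticker_xpath("ask", 2))  # //*[@id='ticker_ask_2']
```

    Each string in locators[i] could then be passed to the driver's XPath lookup in place of the hand-written f-strings.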
    

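    【Comments】:

    • NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="ticker_ask_ + 2"]"}
    • @KashifHussain: I have updated the post above; please check and let me know.
    • The drawback of this solution is that it breaks as soon as the HTML structure changes. Calling the API (at https://www.mql5.com/en/quotes/symbols/json) avoids that problem.
    • @balderman: This is basically substituting i into the XPath in Python; if the locator is correct, it is a good approach.
    • @cruisepandey Scraping is simple (I agree), but calling an API, when one exists, is always safer and cleaner. In this case there is an API endpoint. Why not use it?
    【Solution 2】: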

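    You can send an HTTP POST request to the endpoint https://www.mql5.com/en/quotes/symbols/json. This is the endpoint that returns the data to the browser.
    In the browser, press F12 and open Network -> Fetch/XHR.
    Navigate within the currency rates table and watch the HTTP POST requests that the browser sends.

    curl example below: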

    curl 'https://www.mql5.com/en/quotes/symbols/json' \
      -H 'authority: www.mql5.com' \
      -H 'pragma: no-cache' \
      -H 'cache-control: no-cache' \
      -H 'sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92"' \
      -H 'x-requested-with: XMLHttpRequest' \
      -H 'sec-ch-ua-mobile: ?0' \
      -H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36' \
      -H 'content-type: application/x-www-form-urlencoded' \
      -H 'accept: */*' \
      -H 'origin: https://www.mql5.com' \
      -H 'sec-fetch-site: same-origin' \
      -H 'sec-fetch-mode: cors' \
      -H 'sec-fetch-dest: empty' \
      -H 'referer: https://www.mql5.com/en/quotes/currencies' \
      -H 'accept-language: en-US,en;q=0.9,el;q=0.8,he;q=0.7,de;q=0.6,fr;q=0.5,it;q=0.4,es;q=0.3' \
      -H 'cookie: sid=21maju3l3pnzvqri0chcalbt; lang=en; _fz_uniq=5044349715386065060; _fz_fvdt=1628597925; _fz_ssn=1628597925063698947' \
      --data-raw 'symbols=EURCAD%2CCADCHF%2CNZDCAD%2CCADMXN&__signature=245f9caa9d5d04eac315c5ade7b83b55' \
      --compressed
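
    For use from Python, the same POST can be sketched with the standard library. This is a minimal sketch under assumptions: the function names are hypothetical, only three of the headers from the curl command are kept, and it is not confirmed here whether the server requires the cookie or the remaining headers. The __signature value appears to be tied to the request captured in the browser, so it is passed in as a parameter rather than computed.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://www.mql5.com/en/quotes/symbols/json"


def build_quotes_request(symbols, signature):
    """Prepare (but do not send) the form-encoded POST seen in the curl example."""
    body = urllib.parse.urlencode(
        {"symbols": ",".join(symbols), "__signature": signature}
    ).encode("ascii")
    headers = {
        "X-Requested-With": "XMLHttpRequest",
        "Content-Type": "application/x-www-form-urlencoded",
        "Referer": "https://www.mql5.com/en/quotes/currencies",
    }
    return urllib.request.Request(ENDPOINT, data=body, headers=headers, method="POST")


def fetch_quotes(symbols, signature, timeout=10):
    """Send the request and decode the JSON reply (performs network access)."""
    request = build_quotes_request(symbols, signature)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

    A call would look like fetch_quotes(["EURCAD", "CADCHF"], signature), where signature is copied from a request observed in the browser's Network tab.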
    

    Response:

    [{"id":17,"name":"EURCAD","description":"Euro vs Canadian Dollar","chart":"1.47673,1.47261,1.47932,1.48741,1.48959,1.50352,1.49338,1.48128,1.47798,1.47882,1.48048,1.48890,1.48325,1.47912,1.47983,1.48445,1.48709,1.48405,1.47919,1.47668,1.47562,1.47286","c1":"eu","c2":"ca","growth":0.00262754097470208,"digits":5,"ask":1.47286,"bid":1.47286,"dir":0,"chart_date":1626037200},{"id":20,"name":"CADCHF","description":"Canadian Dollar vs Swiss Franc","chart":"0.73461,0.73344,0.73135,0.72833,0.72851,0.71856,0.72641,0.72995,0.73079,0.73165,0.72877,0.72525,0.72563,0.72739,0.72601,0.72309,0.72052,0.72282,0.72435,0.72856,0.73064,0.73375","c1":"ca","c2":"ch","growth":0.00117206132879044,"digits":5,"ask":0.73375,"bid":0.73375,"dir":0,"chart_date":1626037200},{"id":552,"name":"NZDCAD","description":"New Zealand Dollar vs Canadian Dollar","chart":"0.86911,0.86889,0.87900,0.87887,0.88323,0.88483,0.87695,0.87441,0.87577,0.87663,0.87799,0.87630,0.87097,0.87216,0.86933,0.87092,0.87939,0.88297,0.88116,0.88022,0.87925,0.87885","c1":"nz","c2":"ca","growth":0.0112068667947671,"digits":5,"ask":0.87885,"bid":0.87885,"dir":0,"chart_date":1626037200},{"id":21412,"name":"CADMXN","description":"Canadian Dollar vs Mexican Peso","chart":"15.90559,15.90099,15.89632,15.89409,15.72787,15.78864,15.98092,15.98750,15.95864,15.95734,15.92996,15.92746,15.90106,15.83825,15.82930,15.94008,15.95096","c1":"ca","c2":"mx","growth":0.00285245627480646,"digits":5,"ask":15.95096,"bid":15.95096,"dir":0,"chart_date":1626037200}]
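
    Once the reply is available as text, the fields of interest can be pulled out with the json module. The sketch below uses a shortened copy of the response above (two symbols, with the chart and a few other keys dropped for brevity); the function name parse_quotes is hypothetical, and the key names are taken from that response.

```python
import json

# Shortened sample of the response shown above; some keys omitted for brevity.
SAMPLE = (
    '[{"id":17,"name":"EURCAD","description":"Euro vs Canadian Dollar",'
    '"digits":5,"ask":1.47286,"bid":1.47286},'
    '{"id":20,"name":"CADCHF","description":"Canadian Dollar vs Swiss Franc",'
    '"digits":5,"ask":0.73375,"bid":0.73375}]'
)


def parse_quotes(payload):
    """Return one (name, ask, bid) tuple per symbol in the JSON array."""
    return [(item["name"], item["ask"], item["bid"]) for item in json.loads(payload)]


quotes = parse_quotes(SAMPLE)
for name, ask, bid in quotes:
    print(f"{name}: ask={ask} bid={bid}")
```

    Note that the response shown carries ask and bid but no high or low keys, so those two columns may still have to come from the page itself.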
    

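    【Comments】:

    • @KashifHussain - It does work. You should use an HTTP POST call with the correct parameters (not HTTP GET).
    • I didn't follow that, but thanks for the reply.
    • Let me rephrase. You can get the data you are looking for by making an HTTP call to the endpoint I mentioned. You don't need to do any scraping.
    【Solution 3】:

    I prefer to use the code below to scrape this site.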

    driver.implicitly_wait(10)
    driver.get("https://www.mql5.com/en/quotes/currencies")
    driver.find_element_by_id("list-view-btn").click()
    time.sleep(2)
    tabledata = driver.find_elements_by_xpath("//tbody/tr")
    for table in tabledata:
        time.sleep(2)
        title = table.find_element_by_xpath(".//div[contains(@class,'descr')]").text
        ask = table.find_element_by_xpath(".//div[contains(@id,'ask')]").text
        bid = table.find_element_by_xpath(".//div[contains(@id,'bid')]").text
        high = table.find_element_by_xpath(".//div[contains(@id,'high')]").text
        low = table.find_element_by_xpath(".//div[contains(@id,'low')]").text
        print("{}: {},{},{},{}".format(title,ask,bid,high,low))
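
    The row-by-row idea above can also be written against the Selenium 4 call style, find_element(by, value), collecting each row into a dictionary instead of printing it. This is a sketch only: the function names are hypothetical, the XPaths are copied from the code above and not re-checked against the page, and the locator strategy is given as the plain string "xpath", which is the value of Selenium's By.XPATH constant.

```python
XPATH = "xpath"  # same string as selenium.webdriver.common.by.By.XPATH

# Row-relative locators, copied from the answer above.
FIELDS = {
    "title": ".//div[contains(@class,'descr')]",
    "ask": ".//div[contains(@id,'ask')]",
    "bid": ".//div[contains(@id,'bid')]",
    "high": ".//div[contains(@id,'high')]",
    "low": ".//div[contains(@id,'low')]",
}


def row_to_record(row):
    """Read every field from one table row using locators relative to that row."""
    return {name: row.find_element(XPATH, locator).text for name, locator in FIELDS.items()}


def scrape_table(driver):
    """Return one dictionary per <tr> found under any <tbody> on the current page."""
    rows = driver.find_elements(XPATH, "//tbody/tr")
    return [row_to_record(row) for row in rows]
```

    With a live driver that has already loaded the page and switched to the list view, scrape_table(driver) would return a list such as [{"title": ..., "ask": ..., ...}, ...].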
    

    【Comments】:

      【Solution 4】:

      Try this:

      def scrape():
          for i in range (2,8):
              if i%2 == 0 :
                  ask = driver.find_element_by_xpath('//*[@id="ticker_ask_" + str(i) + "]"').text
                  values[i][0].append(ask)
                  bid = driver.find_element_by_xpath('//*[@id="ticker_bid_" + str(i) + "]"').text
                  values[i][1].append(bid)
                  high = driver.find_element_by_xpath('//*[@id="ticker_high_" + str(i) + "]"').text
                  values[i][2].append(high)
                  low = driver.find_element_by_xpath('//*[@id="ticker_low_" + str(i) + "]"').text
                  values[i][3].append(low)
                  print ( values[i][0] ,' , ',values[i][1] ,' , ',values[i][2] ,' , ',values[i][3]  )
      scrape()
      

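      【Comments】:

      • InvalidSelectorException: Message: invalid selector: Unable to locate an element with the xpath expression //*[@id="ticker_ask_" + str(i) + "] because of the following error: SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//*[@id="ticker_ask_" + str(i) + "]' is not a valid XPath expression.
      • InvalidSelectorException: Message: invalid selector: Unable to locate an element with the xpath expression //*[@id="ticker_ask_" + str(i) + "]" because of the following error: SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//*[@id="ticker_ask_" + str(i) + "]"' is not a valid XPath expression.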
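
      The errors reported in these comments follow from how Python reads the quotes: the whole locator in Solution 4 is one single-quoted string, so + str(i) + is never executed and reaches the browser as literal text. A small sketch of the difference, assuming the same ticker_ask_<i> id pattern used throughout this thread:

```python
i = 2

# As written in Solution 4: one Python string, with the concatenation inside it.
broken = '//*[@id="ticker_ask_" + str(i) + "]"'

# Closing the Python string before each + lets str(i) actually run.
fixed = '//*[@id="ticker_ask_' + str(i) + '"]'

print(broken)  # //*[@id="ticker_ask_" + str(i) + "]"
print(fixed)   # //*[@id="ticker_ask_2"]
```

      The second form produces the same kind of locator as the f-string in Solution 1, only with double quotes around the id value.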