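【Question title】: Scraping data displayed when moving the mouse over a plot
【Posted】: 2020-03-09 15:15:06
【Question】:

I am interested in automatically scraping web pages such as https://www.hltv.org/team/7532/big. More precisely, I would like to extract the date and #ranking from the box that appears when you hover the mouse over the plot (see the screenshot below).

I tried using Python with Selenium, but I don't really know how to proceed, even though I went through several tutorials. I feel that I need to change the top and left values of the style attribute, but I don't know how to do that, or whether I should use XPath, CSS selectors, or something else. Here is a piece of my code, which returns the WebElement I am interested in (presumably), but I have not managed to extract anything from it :(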

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
executable_path = r'C:/Users/fabbe/Documents/Python Scripts/hltv/chromedriver/chromedriver.exe'
# Selenium 4 style: the driver path is passed through a Service object
driver = webdriver.Chrome(service=Service(executable_path), options=options)

driver.get("https://www.hltv.org/team/7532/big")

# Look up the tooltip element by its id
elements = driver.find_elements(By.XPATH, "//*[@id='fusioncharts-tooltip-element']")

screenshot

【Comments】:

  • If you are using FusionCharts, you can use its API events to get the values when hovering over a data plot. The dataPlotRollOver event does this; here is a demo: jsfiddle.net/fusioncharts/w5tcppk8

Tags: python selenium screen-scraping


【Solution 1】:

I would take a different approach to get the chart data, so that you don't have to hover the mouse over every part of the chart.

You will need the following additional imports.

import json
from lxml import html
from selenium.webdriver.common.by import By

Code:

url = "https://www.hltv.org/team/7532/BIG"
driver.get(url)
# The chart's full configuration is stored as JSON in a data attribute
graph_data = driver.find_element(By.CSS_SELECTOR, '.chart-container.core-chart-container .border-box .graph').get_attribute('data-fusionchart-config')
# Each data point carries its tooltip as an HTML string under 'tooltext'
graph_text = json.loads(graph_data)['dataSource']['dataset'][0]['data']
for graph_item in graph_text:
    tree = html.fromstring(graph_item['tooltext'])
    print("Date:" + tree.xpath("//div[@class='subtitle']//text()")[0])
    print("Rank:" + tree.xpath("(//div[@class='ranking-development-top-info']//div[@class='title'])[2]//text()")[0])
# quit() closes the browser and also stops the chromedriver process
driver.quit()

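This reads the chart configuration from the page and parses it as JSON. It then loops over every data point and extracts only the fields we are interested in (the date and the rank) from each point's tooltip HTML.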
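The parsing step can be tried without a browser. The following is a minimal sketch, not the answer's exact method: it swaps lxml for the standard library's html.parser so it runs with no extra packages. SAMPLE_CONFIG, TOOLTIP, TooltipParser and extract_points are hypothetical names, and the tooltip markup is an assumption reconstructed from the class names in the XPath expressions above (the first "title" div holding the team name is a guess).

```python
import json
from html.parser import HTMLParser

# ASSUMPTION: tooltip markup reconstructed from the class names used in the
# answer's XPath expressions; the first "title" div (team name) is a guess.
TOOLTIP = (
    '<div class="ranking-development-top-info">'
    '<div class="title">BIG</div><div class="title">{rank}</div>'
    '</div><div class="subtitle">{date}</div>'
)

# Hypothetical stand-in for the data-fusionchart-config attribute value.
# The nesting dataSource -> dataset[0] -> data[i] -> tooltext follows the answer.
SAMPLE_CONFIG = json.dumps({
    "dataSource": {"dataset": [{"data": [
        {"tooltext": TOOLTIP.format(rank="#11", date="24th December 2018")},
        {"tooltext": TOOLTIP.format(rank="#11", date="31st December 2018")},
    ]}]}
})


class TooltipParser(HTMLParser):
    """Collect the text of each <div>, grouped by the div's class attribute."""

    def __init__(self):
        super().__init__()
        self._stack = []         # class attribute of every currently open <div>
        self.text_by_class = {}  # class name -> list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._stack.append(dict(attrs).get("class") or "")

    def handle_endtag(self, tag):
        if tag == "div" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            # Attribute the text to the innermost open <div>
            self.text_by_class.setdefault(self._stack[-1], []).append(text)


def extract_points(config_json):
    """Return a list of (date_text, rank_text) tuples, one per data point."""
    points = json.loads(config_json)["dataSource"]["dataset"][0]["data"]
    rows = []
    for point in points:
        parser = TooltipParser()
        parser.feed(point["tooltext"])
        parser.close()
        date_text = parser.text_by_class["subtitle"][0]
        # Second div with class "title"; the answer's XPath additionally
        # scopes this to the ranking-development-top-info container.
        rank_text = parser.text_by_class["title"][1]
        rows.append((date_text, rank_text))
    return rows


print(extract_points(SAMPLE_CONFIG))
```

With a live page, the string returned by get_attribute('data-fusionchart-config') would take the place of SAMPLE_CONFIG, provided the real markup matches the assumed shape.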

Running the Selenium code prints the following output.

Date:24th December 2018
Rank:#11
Date:31st December 2018
Rank:#11
Date:7th January 2019
Rank:#11
Date:14th January 2019
Rank:#12
Date:21st January 2019
Rank:#13
Date:28th January 2019
Rank:#13
Date:4th February 2019
Rank:#15
Date:11th February 2019
Rank:#12
Date:18th February 2019
Rank:#14
Date:25th February 2019
Rank:#15
Date:4th March 2019
Rank:#18
Date:11th March 2019
Rank:#16
Date:18th March 2019
Rank:#18
Date:25th March 2019
Rank:#18
Date:1st April 2019
Rank:#18
Date:8th April 2019
Rank:#18
Date:15th April 2019
Rank:#18
Date:22nd April 2019
Rank:#19
Date:29th April 2019
Rank:#19
Date:6th May 2019
Rank:#18
Date:13th May 2019
Rank:#18
Date:20th May 2019
Rank:#20
Date:27th May 2019
Rank:#22
Date:3rd June 2019
Rank:#22
Date:10th June 2019
Rank:#22
Date:17th June 2019
Rank:#26
Date:24th June 2019
Rank:#30
Date:1st July 2019
Rank:#34
Date:8th July 2019
Rank:#23
Date:15th July 2019
Rank:#27
Date:22nd July 2019
Rank:#22
Date:29th July 2019
Rank:#23
Date:5th August 2019
Rank:#28
Date:12th August 2019
Rank:#25
Date:19th August 2019
Rank:#24
Date:26th August 2019
Rank:#26
Date:2nd September 2019
Rank:#28
Date:9th September 2019
Rank:#24
Date:16th September 2019
Rank:#22
Date:23rd September 2019
Rank:#22
Date:30th September 2019
Rank:#21
Date:7th October 2019
Rank:#27
Date:14th October 2019
Rank:#24
Date:21st October 2019
Rank:#26
Date:28th October 2019
Rank:#24
Date:4th November 2019
Rank:#24
Date:11th November 2019
Rank:#24
Date:18th November 2019
Rank:#28
Date:25th November 2019
Rank:#26
Date:2nd December 2019
Rank:#26
Date:9th December 2019
Rank:#29
Date:16th December 2019
Rank:#33
Date:23rd December 2019
Rank:#40
Date:30th December 2019
Rank:#39
Date:6th January 2020
Rank:#46
Date:13th January 2020
Rank:#46
Date:20th January 2020
Rank:#46
Date:27th January 2020
Rank:#22
Date:3rd February 2020
Rank:#22
Date:10th February 2020
Rank:#23
Date:17th February 2020
Rank:#25
Date:24th February 2020
Rank:#26
Date:2nd March 2020
Rank:#21
Date:9th March 2020
Rank:#20
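If the values are needed for further processing rather than display, the printed strings can be converted into real dates and integers. The sketch below is one possible way to do that using only the standard library; parse_hltv_date, parse_rank and parse_output are hypothetical helper names, and the date format is assumed to always match the "24th December 2018" pattern seen in the output above.

```python
import re
from datetime import date, datetime

# Matches an ordinal suffix only when it directly follows a digit,
# so the "st" in "August" is left alone.
ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def parse_hltv_date(text):
    """Convert a string such as '24th December 2018' to a datetime.date."""
    cleaned = ORDINAL_SUFFIX.sub("", text.strip())
    # %B expects English month names, which holds in the default C locale
    return datetime.strptime(cleaned, "%d %B %Y").date()


def parse_rank(text):
    """Convert a string such as '#11' to the integer 11."""
    return int(text.strip().lstrip("#"))


def parse_output(lines):
    """Pair alternating 'Date:...' / 'Rank:...' lines into (date, int) tuples."""
    rows = []
    current_date = None
    for line in lines:
        label, _, value = line.partition(":")
        if label == "Date":
            current_date = parse_hltv_date(value)
        elif label == "Rank" and current_date is not None:
            rows.append((current_date, parse_rank(value)))
            current_date = None
    return rows


sample = [
    "Date:24th December 2018",
    "Rank:#11",
    "Date:1st April 2019",
    "Rank:#18",
]
for day, rank in parse_output(sample):
    print(day.isoformat(), rank)
```

The same two helpers could also be called inside the scraping loop in place of the print statements, which avoids the round trip through text.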
