【Question title】: How to click the CSV button on a website and download the data in Python
【Posted】: 2020-08-10 09:28:37
【Question description】:

I am trying to download the CSV and JSON data from the following website: https://worldpopulationreview.com/countries/countries-by-gdp/#worldCountries

How can I simulate a click on the CSV button?

import pandas as pd
import requests
from lxml import html,etree

url = "https://worldpopulationreview.com/countries/countries-by-gdp/#worldCountries"

# now I am not sure, how to click csv button of actual website
# also I am not sure how it will download the csv file
# to DOWNLOADS as like when I click the page

I can already scrape the page as shown below, but I want to learn how to click the button:

import pandas as pd
import requests

url = "https://worldpopulationreview.com/countries/countries-by-gdp/#worldCountries"

r = requests.get(url)
df = pd.read_html(r.text)[0]
df.to_csv('data.csv')

【Question discussion】:

    Tags: python pandas beautifulsoup lxml


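    【Solution 1】:

    You need to install Selenium with pip install selenium. If you use Chrome, also download the Chrome driver that matches your browser version - Chrome driver. Then find the xpath of the button/link; I used Inspect Element to find the xpath: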

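    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    # Selenium 3 style; Selenium 4 passes the driver path through a Service object instead
    driver = webdriver.Chrome(executable_path='/Users/xxx/Downloads/chromedriver-1')
    driver.get('https://worldpopulationreview.com/countries/countries-by-gdp')  # put the address of your page here
    # Full XPath of the CSV link, copied from the browser's Inspect Element panel
    btn = driver.find_element(By.XPATH, '/html/body/div[1]/div/div[1]/div[2]/div[2]/div[1]/div/div/div/div[2]/div[1]/a[2]')
    btn.click()  # the browser saves data.csv to its download folder
    # Assumes the download has finished by the time the file is read
    df = pd.read_csv('/Users/xxx/Downloads/data.csv')
    print(df.head())
    driver.quit()  # end the browser session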
    
       rank         country        imfGDP           unGDP  gdpPerCapita          pop
    0     1   United States  2.219812e+13  18624475000000    67063.2695   331002.651
    1     2           China  1.546810e+13  11218281029298    10746.7828  1439323.776
    2     3           Japan  5.495420e+12   4936211827875    43450.1405   126476.461
    3     4         Germany  4.157120e+12   3477796274497    49617.1450    83783.942
    4     6  United Kingdom  2.927080e+12   2647898654635    43117.5725    67886.011
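
Reading the file immediately after btn.click() can fail if the browser has not finished writing it yet. A minimal sketch of a polling helper follows, using only the standard library. The name wait_for_download and its parameters are hypothetical, and the ".crdownload" suffix is an assumption about how Chrome names an in-progress download; other browsers use different names.

```python
import time
from pathlib import Path


def wait_for_download(path, timeout=30.0, poll=0.5):
    """Wait until `path` exists and no partial-download file sits next to it.

    Assumption: Chrome keeps an in-progress download as '<name>.crdownload'.
    """
    target = Path(path)
    partial = target.with_name(target.name + ".crdownload")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # The final file is present and the temporary file is gone
        if target.exists() and not partial.exists():
            return target
        time.sleep(poll)
    raise TimeoutError(f"{target} was not fully downloaded within {timeout} s")
```

It could be called between the click and the read, for example pd.read_csv(wait_for_download('/Users/xxx/Downloads/data.csv')).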
    

    Image showing how to find the xpath:

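    【Discussion】:

    • Thanks a lot, but it gives me the following error: WebDriverException: Message: unknown error: Element <a>...</a> is not clickable at point. Did you test the code, and does it work for you?
    • Yes, I have the following executable: /usr/local/bin/chromedriver
    • Also, could you tell me why you have so many div/div/div? I would appreciate it if you could post a screenshot showing how to get the required XPath.
    • Try finding the button's xpath by inspecting it in Chrome.
    • Right-click the CSV button and choose Inspect, then right-click the highlighted HTML and copy the full xpath.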
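
The "not clickable at point" error reported above usually means the element is not yet ready or is covered by another element when the click is sent. One possible way to handle it is sketched below: wait explicitly, scroll the element into view, and fall back to a JavaScript click. The helper name click_when_ready is hypothetical, and the short XPath in CSV_BUTTON_XPATH is only an assumption that the link's visible text is "CSV"; it has not been checked against the page, so the full XPath copied from Inspect may still be needed.

```python
# Assumed, unverified locator: an <a> element whose visible text is exactly "CSV"
CSV_BUTTON_XPATH = "//a[normalize-space()='CSV']"


def click_when_ready(driver, xpath, timeout=15):
    """Wait for the element at `xpath` to become clickable, then click it."""
    # Selenium is imported inside the function so that this module can be
    # imported even where Selenium is not installed.
    from selenium.common.exceptions import WebDriverException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    element = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.XPATH, xpath))
    )
    # Bring the element to the middle of the viewport, away from sticky headers
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
    try:
        element.click()
    except WebDriverException:
        # Fallback: click through JavaScript if something still covers the element
        driver.execute_script("arguments[0].click();", element)
    return element
```

With an open driver, the call would be click_when_ready(driver, CSV_BUTTON_XPATH), or the same call with the full XPath from Solution 1.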
    【Solution 2】:

    You can use Selenium to simulate a click on the CSV download button: https://selenium-python.readthedocs.io/getting-started.html#example-explained
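
When the click is driven by Selenium, it can help to tell Chrome where to save the file instead of relying on the default Downloads folder. The sketch below sets Chrome's download preferences through ChromeOptions. The function names chrome_download_prefs and make_driver are hypothetical, and the code assumes that a chromedriver matching the installed Chrome can be found by Selenium.

```python
from pathlib import Path


def chrome_download_prefs(download_dir):
    """Return Chrome preferences that save downloads to `download_dir` without a prompt."""
    return {
        # Chrome expects an absolute path here
        "download.default_directory": str(Path(download_dir).resolve()),
        "download.prompt_for_download": False,
    }


def make_driver(download_dir):
    """Start Chrome with downloads redirected to `download_dir`."""
    # Imported here so the helper above can be used without Selenium installed
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", chrome_download_prefs(download_dir))
    # Assumes chromedriver is discoverable (on PATH or via Selenium's own lookup)
    return webdriver.Chrome(options=options)
```

After driver = make_driver('downloads'), clicking the CSV button should place data.csv in that folder, where pandas can read it.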

    【Discussion】:
